From hlapp at gmx.net Wed Dec 1 01:11:04 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Dec 1 01:09:11 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

On Monday, November 29, 2004, at 02:24 PM, Allen Day wrote:

> primary_tag() and source_tag() being separate from the AC was an
> oversight on my part. the intention was to move all of the feature's
> tag attributes into the collection.

Honestly, I tend to second Chris' earlier suggestion that this be
written up in some way. 'This' meaning, what is the targeted behavior
that you want to become binding for all SeqFeatureI implementations
that may call themselves compliant.

I doubt that a) nobody is confused, and that b) everybody is on the
same page ...

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Wed Dec 1 01:18:18 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Dec 1 01:15:52 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

On Monday, November 29, 2004, at 01:22 PM, Chris Mungall wrote:

>> It would still require a working XML parser installation, no?
>
> yes - it would require some kind of third-party XML module or modules.
> I guess this may be slightly problematic as these are currently all
> optional for bioperl, yep?

Yep.

To me, it's primarily an issue of cross-platform compatibility, unless
I'm mistaken and there are reasonable pure-perl XML parsers available.
In my admittedly cursory picture more or less all XML parsers (or those
of reasonable speed anyway, if the ones frequently used are any
indication) are based on the expat library.
So far I've managed to install those modules on all Linux and MacOSX
platforms I've worked with, sometimes with little and sometimes with a
little more trouble, but it remains to be seen whether this is an issue
for Windows or not.

Anybody out there who can report on installing Expat and XML::Parser
and friends?

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Wed Dec 1 01:23:32 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Dec 1 01:21:16 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To: <02C65092-4210-11D9-8243-000D93392082@pcbi.upenn.edu>
Message-ID: <86180B9F-4361-11D9-8ED7-000A959EB4C4@gmx.net>

Sounds very reasonable Aaron, and thanks for your gentle herding
efforts. :-)

For the bioperl developers, this will also mean that for the lifetime
of the 1.5 branch, efforts should be made to merge fixes applied to the
HEAD to the branch right afterwards.

-hilmar

On Monday, November 29, 2004, at 06:07 AM, Aaron J. Mackey wrote:

> Yep, OK, I hear you. I really thought all this was going to be
> contained to Bio::SeqFeature::Annotated, but I see now that with all
> sorts of implementation happening in the interfaces (ugh!), this can't
> happen. Woe is me.
>
> Here's what I'm willing to do to keep Allen from pulling his hair out:
> there have been very few changes on the development trunk since RC1
> that aren't Annotated.pm-related; therefore, (if this makes sense to
> everyone) I will branch 1.5.0 off of RC1 and merge only those patches
> that are Annotated.pm-*unrelated* to the 1.5.0 branch. I will then
> tag the branch at RC2 (and similarly tag the HEAD, so that any later
> merging can be done relative to those tags). Make sense?
>
> Then, the rest of you (Allen, Hilmar, Steffen, etc) need to figure out
> the cleanest path for 1.6.0, in which all things may change (with an
> eye towards at least some backwards compatibility); my vote would be
> that there remain some separation between "heavy" and "light" feature
> types. I don't expect/need my Bio::SeqFeature::Simple to implement
> AnnotationCollection!
>
> Thanks again to everyone; let me know if the CVS plan above sounds
> reasonable ...
>
> -Aaron
>
> On Nov 28, 2004, at 10:08 PM, Hilmar Lapp wrote:
>
>> I'm not saying this change of direction may be a show-stopper for any
>> dependent package like bioperl-db. All I'm suggesting is let's be
>> clear that this *is* a change of direction for a core interface, and
>> let's give it some time to phase it in and to iron out wrinkles, both
>> on the end of bioperl itself as well as the end of people who write
>> software against bioperl. Let's give it some time to see how it
>> works, and how it works under stress, before letting it loose on the
>> general public who just wanted to get some bugfixes on the 1.4.0
>> release or some additional parsers.
>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania email: amackey@pcbi.upenn.edu
> 415 S. University Avenue office: 215-898-1205
> Philadelphia, PA 19104-6017 fax: 215-746-6697
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From nathanhaigh at ukonline.co.uk Wed Dec 1 03:35:38 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Dec 1 03:33:15 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

I'm on WinXP and as far as I remember (or don't remember as the case
is), I suspect I mustn't have had too bad a time installing them!
However, I can test this out on a clean Win98 (virtual machine)
installation more or less immediately and, if required, on a clean
WinXP installation (a bit more hassle and time required, as I'll have
to install WinXP in a virtual machine). If all goes smoothly, or there
are some funny quirks etc., I could write a quick installation guide
for BioPerlers if that's any use!

The only possible problem I can see for the moment is XML::Parser. I'm
not sure what version is installed with ActiveState, but if BioPerl
requires a later version of this module, it can be a bit of a swine to
install if you only use ppm. The problem is this: if someone tries to
update XML-Parser or Compress-Zlib (the latter is not used by BioPerl,
so it shouldn't be a problem) using ppm from ActiveState Perl on Win95
or Win98, it will do one of two things:

1) either will not update those modules at all; or
2) will try to update those modules, fail part way through and screw
   up those modules!

The latter is the worst that can happen, since ppm requires these two
modules to function. After this happened to me, I had to reinstall
Perl and all the Perl modules I had. See
http://aspn.activestate.com/ASPN/Downloads/ActivePerl/PPM/ for a few
details.

I think all the following XML modules are available as ppd files for
ppm installs on Windows, as I included them in a possible BioPerl v1.5
ppd file:

XML-DOM
XML-DOM-XPath
XML-Node
XML-Parser
XML-SAX
XML-SAX-Base
XML-SAX-Writer
XML-Twig
XML-Writer

I'll have a run through on the Win98 system over the next day or two
and let you know how I get on!

BTW, has the problem with the quote line in the BEGIN subs of BioPerl
1.5.0-RC1 been fixed? Or should I just use the latest CVS BioPerl?

Nathan

From nathanhaigh at ukonline.co.uk Wed Dec 1 07:21:19 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Dec 1 07:19:50 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To: <86180B9F-4361-11D9-8ED7-000A959EB4C4@gmx.net>
Message-ID:

Which BioPerl test(s) should determine if expat is working correctly?

Nathan

From brian_osborne at cognia.com Wed Dec 1 07:57:56 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Dec 1 07:56:48 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

Nathan,

t/Biblio.t is one.

It tests Bio/Biblio/IO/medlinexml.pm, which uses XML::Parser, which
uses expat.

Brian O.
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From nathanhaigh at ukonline.co.uk Wed Dec 1 08:53:43 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Dec 1 08:51:35 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

Forgive me if I'm being a bit dense! But, if XML::Parser is included in
the ActiveState Perl distribution, doesn't this mean that expat (or
similar) is also installed?

Nathan

From brian_osborne at cognia.com Wed Dec 1 09:07:49 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Dec 1 09:06:16 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

Nathan,

I don't know. But if ActiveState can do the equivalent of "force
install" then the answer is "not necessarily".

Brian O.

From aqureshi at cs.odu.edu Wed Dec 1 09:19:58 2004
From: aqureshi at cs.odu.edu (Affan Qureshi)
Date: Wed Dec 1 09:29:33 2004
Subject: [Bioperl-l] Discovering new genes
In-Reply-To:
References:
Message-ID: <33600.24.254.228.186.1101910798.squirrel@cartero.cs.odu.edu>

Hi,

This is a newbie question, maybe not even directly related to BioPerl.
I am a student working on a set of newly discovered DNA sequence files
for an organism. I would like to know the steps involved in discovering
and analyzing the DNA sequences. I want to classify the various
functional units in the genes represented by the DNA sequences.

I have conducted a Blastx search for all the 100 files and have found
some very good matches (e.g. an e-value of 1e-120). I have separated
out the big match sequences into a separate file. I want to know what
the next steps are. Maybe I should compare the matched sequences
(proteins) against the corresponding ones in other organisms like
yeast, worm, Drosophila etc. and find out whether there is a similarity
between the neighboring proteins.
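
The filtering step described above (keeping only the strong BLASTX matches, such as those with an e-value around 1e-120) could be sketched roughly as follows. This is only a minimal illustration in plain Python rather than BioPerl, and it rests on assumptions that are not in the original message: that the search results were saved in BLAST's 12-column tab-separated format (query ID in column 1, subject ID in column 2, e-value in column 11). The cutoff `EVALUE_CUTOFF`, the function name `best_hits`, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical cutoff: keep only hits at least as strong as this e-value.
EVALUE_CUTOFF = 1e-50


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} for the best hit per query.

    Assumes 12-column tab-separated BLAST output, where column 1 is the
    query ID, column 2 the subject ID, and column 11 the e-value.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # weaker than the cutoff, so not a "big match"
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Made-up example rows (not real results), in the assumed 12-column layout.
SAMPLE = (
    "seq001\tP12345\t78.2\t300\t65\t0\t1\t900\t1\t300\t1e-120\t430\n"
    "seq001\tQ99999\t40.1\t120\t70\t2\t10\t370\t5\t124\t2e-08\t55.1\n"
    "seq002\tP54321\t35.0\t100\t60\t3\t1\t300\t1\t100\t0.003\t38.5\n"
)

if __name__ == "__main__":
    for query, (subject, evalue) in sorted(best_hits(SAMPLE).items()):
        print(query, subject, evalue)
```

Keeping one best hit per query gives a short list of candidate proteins to look up in other organisms. The cutoff is a judgment call and would need tuning for the data at hand.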
I tried this comparison at the SPRING website and looked up the matches
on SWISS-PROT etc.

I thought since you guys are experienced bioinformaticists and do this
type of work every day, maybe you could give me some pointers. Sorry
about the off-topic question.

Thanks,
Affan

From nathanhaigh at ukonline.co.uk Wed Dec 1 09:37:30 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Dec 1 09:36:03 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

I tried installing expat on what I thought was a clean Win98 system,
but I'd actually previously installed ActiveState Perl and most of
BioPerl's Perl dependencies.

Using the current 1.95.8 release of the expat_win32bin file, I
installed to the default location of c:\Expat-1.95.8\ and no other
configuration options were available. I don't know if I need to point
XML::Parser to this location somehow, but when I ran the t\biblio.t
test using the current CVS release of BioPerl, all the tests ran fine
without any warnings. So unless XML::Parser is using some other
expat-like module/program that was installed by ActiveState Perl,
things seem to run fine on my setup.

However, I do need to verify the following points:

1) Does ActiveState already have an expat-like program/module
   installed, since it comes shipped with XML::Parser as standard?
2) If so, do I need to "tell" XML::Parser to use the expat I just
   installed rather than the implementation that may be being used
   in 1?
3) How do I verify that the expat I installed is actually being used?

I'll run a few scenarios over the next few days/week!

Nathan
> > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Nathan Haigh > Sent: Wednesday, December 01, 2004 8:54 AM > To: 'Brian Osborne'; 'Hilmar Lapp' > Cc: 'Bioperl' > Subject: RE: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes > > > Forgive me if I'm being a bit dense! But, if XML::Parser is included in the > ActiveState Perl distribution, doesn't this mean that > expat (or similar) is also installed? > > Nathan > > > -----Original Message----- > > From: Brian Osborne [mailto:brian_osborne@cognia.com] > > Sent: 01 December 2004 12:58 > > To: nathanhaigh@ukonline.co.uk; 'Hilmar Lapp' > > Cc: 'Bioperl' > > Subject: RE: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes > > > > Nathan, > > > > t/Biblio.t is one. > > > > It tests Bio/Biblio/IO/medlinexml.pm, which uses XML::Parser, which uses > > expat. > > > > Brian O. > > > > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Nathan Haigh > > Sent: Wednesday, December 01, 2004 7:21 AM > > To: 'Hilmar Lapp' > > Cc: 'Bioperl' > > Subject: RE: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes > > > > > > Which BioPerl test(s) should determine if expat is working correctly? > > > > Nathan > > > > > -----Original Message----- > > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp > > > Sent: 01 December 2004 06:24 > > > To: Aaron J. Mackey > > > Cc: Allen Day; Bioperl > > > Subject: Re: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes > > > > > > Sounds very reasonable Aaron, and thanks for your gentle herding > > > efforts. :-) > > > > > > For the bioperl developers, this will also mean that for the lifetime > > > of the 1.5 branch, efforts should be made to merge fixes applied to the > > > HEAD to the branch right afterwards. 
> > > > > > -hilmar > > > > > > On Monday, November 29, 2004, at 06:07 AM, Aaron J. Mackey wrote: > > > > > > > > > > > Yep, OK, I hear you. I really thought all this was going to be > > > > contained to Bio::SeqFeature::Annotated, but I see now that with all > > > > sorts of implementation happening in the interfaces (ugh!), this can't > > > > happen. Woe is me. > > > > > > > > Here's what I'm willing to do to keep Allen from pulling his hair out: > > > > there have been very few changes on the development trunk since RC1 > > > > that aren't Annotated.pm-related; therefore, (if this makes sense to > > > > everyone) I will branch 1.5.0 off of RC1 and merge only those patches > > > > that are Annotated.pm-*unrelated* to the 1.5.0 branch. I will then > > > > tag the branch at RC2 (and similarly tag the HEAD, so that any later > > > > merging can be done relative to those tags). Make sense? > > > > > > > > Then, the rest of you (Allen, Hilmar, Steffen, etc) need to figure out > > > > the cleanest path for 1.6.0, in which all things may change (with an > > > > eye towards at least some backwards compatibility); my vote would be > > > > that there remain some separation between "heavy" and "light" feature > > > > types. I don't expect/need my Bio::SeqFeature::Simple to implement > > > > AnnotationCollection! > > > > > > > > Thanks again to everyone; let me know if the CVS plan above sounds > > > > reasonable ... > > > > > > > > -Aaron > > > > > > > > On Nov 28, 2004, at 10:08 PM, Hilmar Lapp wrote: > > > > > > > >> I'm not saying this change of direction may be a show-stopper for any > > > >> dependent package like bioperl-db. All I'm suggesting is let's be > > > >> clear that this *is* a change of direction for a core interface, and > > > >> let's give it some time to phase it in and to iron out wrinkles, both > > > >> on the end of bioperl itself as well as the end of people who write > > > >> software against bioperl. 
Let's give it some time to see how it > > > >> works, and how it works under stress, before letting it lose on the > > > >> general public who just wanted to get some bugfixes on the 1.4.0 > > > >> release or some additional parsers. > > > > > > > > -- > > > > Aaron J. Mackey, Ph.D. > > > > Dept. of Biology, Goddard 212 > > > > University of Pennsylvania email: amackey@pcbi.upenn.edu > > > > 415 S. University Avenue office: 215-898-1205 > > > > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > > > > > > > > > > -- > > > ------------------------------------------------------------- > > > Hilmar Lapp email: lapp at gnf.org > > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > > ------------------------------------------------------------- > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > --- > > > avast! Antivirus: Inbound message clean. > > > Virus Database (VPS): 0449-0, 30/11/2004 > > > Tested on: 01/12/2004 07:58:52 > > > avast! is copyright (c) 2000-2003 ALWIL Software. > > > http://www.avast.com > > > > > > > > > > --- > > avast! Antivirus: Outbound message clean. > > Virus Database (VPS): 0449-0, 30/11/2004 > > Tested on: 01/12/2004 12:21:16 > > avast! is copyright (c) 2000-2003 ALWIL Software. > > http://www.avast.com > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0449-0, 30/11/2004 > Tested on: 01/12/2004 13:53:41 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > --- > avast! 
Antivirus: Inbound message clean. > Virus Database (VPS): 0449-0, 30/11/2004 > Tested on: 01/12/2004 14:26:15 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > From nathanhaigh at ukonline.co.uk Wed Dec 1 09:52:33 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Dec 1 09:51:21 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: ActiveState Perl has a module called XML::Parser::expat. Could this be a perl implementation of the expat library? Nathan From nathanhaigh at ukonline.co.uk Wed Dec 1 10:07:47 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Dec 1 10:05:39 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: ActiveState Perl has a module called XML::Parser::expat.
Could this be a perl implementation of the expat library? [Actually there appears to be an expat dir in the XML::Parser dir - need to investigate] Possible helpful links: http://www.talkaboutprogramming.com/group/comp.lang.perl.modules/messages/64363.html Nathan From Mikko.Arvas at vtt.fi Wed Dec 1 11:16:46 2004 From: Mikko.Arvas at vtt.fi (Mikko Arvas) Date: Wed Dec 1 11:14:32 2004 Subject: [Bioperl-l] bad entries in interpro again Message-ID: <4.3.2.7.2.20041201160437.00cd4cb0@vttmail.vtt.fi> Hi, we've been discussing the problems of interpro parsing. I have a friend who is going to interpro consortium meeting next week and I could send some regards through him. After reading your e-mails, I am (being quite a newbie) a little bit confused of what kind of regards would you like to send if any? Is the &apos the source of the problem? Is it really a problem in BioPerl or in expat? Is somebody trying to solve the problem for Bioperl now and is there any sensible thing that the interpro team could do to help?
Cheers, mikko Mikko Arvas VTT Biotechnology e-mail: mikko.arvas@vtt.fi tel: +358-(0)9-456 5827 mobile: +358-(0)44-381 0502 fax: +358-(0)9-455 2103 mail: Tietotie 2, Espoo P.O. Box 1500 FIN-02044 VTT, Finland From hlapp at gmx.net Wed Dec 1 16:49:48 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 1 16:47:22 2004 Subject: [Bioperl-l] Re: bad entries in interpro again In-Reply-To: <4.3.2.7.2.20041201160437.00cd4cb0@vttmail.vtt.fi> Message-ID: On Wednesday, December 1, 2004, at 08:16 AM, Mikko Arvas wrote: > Is the &apos the source of the problem? Did you try to take it out and see what happens? I.e., you can answer this yourself easily. I would have thought that it's not the problem, but it'd be great if you or somebody else helps out by testing what was suggested. > Is it really a problem in BioPerl or in expat? If the problem is outside of Interpro, it's Expat, not Bioperl. It's the XML parser library that threw up. > Is somebody trying to solve the problem for Bioperl now > and is there any sensible thing that the interpro team could do to > help? Depends on where the problem is. It appears that the Interpro team already eliminated the double quotes in names. There is some hard-coded stuff in interpro.pm that needs to be removed, and I heard Allen say he'll work on that. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Dec 1 16:51:48 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 1 16:49:14 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: <33AB540E-43E3-11D9-A508-000A959EB4C4@gmx.net> On Wednesday, December 1, 2004, at 06:52 AM, Nathan Haigh wrote: > ActiveState Perl has a module called XML::Parser::expat. Could this be > a perl implementation of the expat library? > No.
It's the wrapper that will dynamically load the expat library. If you can 'use' this module like in $ perl -e 'use XML::Parser::Expat;' then probably you have expat installed and dynamically loadable. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Dec 1 16:59:34 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 1 16:57:01 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: <49542859-43E4-11D9-A508-000A959EB4C4@gmx.net> On Wednesday, December 1, 2004, at 05:53 AM, Nathan Haigh wrote: > Forgive me if I'm being a bit dense! But, if XML::Parser is included > in the ActiveState Perl distribution, doesn't this mean that > expat (or similar) is also installed? Don't know how this works on Windows ... On the Unix'es I've worked with you have to install expat separately. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cldwalker at chwhat.com Wed Dec 1 19:23:42 2004 From: cldwalker at chwhat.com (Gabriel Horner) Date: Wed Dec 1 19:26:43 2004 Subject: [Bioperl-l] bio{perl,python,ruby} comparison Message-ID: <20041202002342.GA5179@bigmama.chwhat.com> Hi everyone, I've been using bioperl and bioperl-run for a few months. You guys have done a great job! Based on this minimal experience, I'm writing an article/report about perl and bioinformatics for a bioinformatics graduate class. In this report I'm mentioning bioinformatic applications written in perl. So far I have: g-language, ensembl, genquire, gbrowse and other apps at gmod.org, MLST, biomolquest, ESTminer, CD-HIT, MuGeN, PerlPrimer, Shrubberies. Any other main ones?
Also do you know of any articles that compare bioperl with biopython? Thanks, Gabriel -- my looovely website -- http://www.chwhat.com BTW, IF chwhat.com goes down email me at gabriel.horner@cern.ch From Mikko.Arvas at vtt.fi Thu Dec 2 05:11:44 2004 From: Mikko.Arvas at vtt.fi (Mikko Arvas) Date: Thu Dec 2 05:09:26 2004 Subject: [Bioperl-l] Re: bad entries in interpro again In-Reply-To: References: <4.3.2.7.2.20041201160437.00cd4cb0@vttmail.vtt.fi> Message-ID: <4.3.2.7.2.20041202094926.00cb63e8@vttmail.vtt.fi> Hi, At 13:49 1.12.2004 -0800, Hilmar Lapp wrote: >On Wednesday, December 1, 2004, at 08:16 AM, Mikko Arvas wrote: > >>Is the &apos the source of the problem? >Did you try to take it out and see what happens? I.e., you can answer this >yourself easily. >I would have thought that it's not the problem, but it'd be great if you >or somebody else helps out by testing what was suggested. Sorry about that I should have tested it before mailing. The problem is not non-ascii characters it seems to be specifically the combination of two & inside individual <>. I tried various combinations and other non-ascii characters (even in abundance) don't break it and a single & does neither. Here is again the problematic line: And its error: not well-formed (invalid token) at line 2, column 54, byte 132 at /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi/XML/Parser.pm line 187 So which way to proceed? >>Is it really a problem in BioPerl or in expat? > >If the problem is outside of Interpro, it's Expat, not Bioperl. It's the >XML parser library that threw up. > >> Is somebody trying to solve the problem for Bioperl now >>and is there any sensible thing that the interpro team could do to help? > >Depends on where the problem is. It appears that the Interpro team already >eliminated the double quotes in names. The is some hard-coded stuff in >interpro.pm that needs to be removed, and I heard Allen say he'll work on >that. 
> > -hilmar Cheers, mikko Mikko Arvas VTT Biotechnology e-mail: mikko.arvas@vtt.fi tel: +358-(0)9-456 5827 mobile: +358-(0)44-381 0502 fax: +358-(0)9-455 2103 mail: Tietotie 2, Espoo P.O. Box 1500 FIN-02044 VTT, Finland From nathanhaigh at ukonline.co.uk Thu Dec 2 07:41:00 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Thu Dec 2 07:39:14 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: <33AB540E-43E3-11D9-A508-000A959EB4C4@gmx.net> Message-ID: This is what I've found so far: My System: ActiveState Perl 5.8.0 build 804 has: XML::Parser (v2.31) XML::Parser::Expat (v2.31) expat libs (v1.95.5) which are dynamically loaded. I'm not sure what versions of these files Perl 5.6 has; if it's important to know, I could find out. The t\Biblio.t test ran without errors or problems. The only possible future problem I can see is if BioPerl needs to use a recent version of expat that would require Windows users to update their expat/XML::Parser, but aside from that things seem to work with absolutely no problems whatsoever - there's a first! :o) Is this what you wanted to hear? If there are any more Windows-related questions, I'll do my best to help out! Nathan > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gmx.net] > Sent: 01 December 2004 21:52 > To: nathanhaigh@ukonline.co.uk > Cc: 'Brian Osborne'; 'Bioperl' > Subject: Re: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes > > > On Wednesday, December 1, 2004, at 06:52 AM, Nathan Haigh wrote: > > > ActiveState Perl has a module called XML::Parser::expat. Could this be > > a perl implementation of the expat library? > > > > No. It's the wrapper that will dynamically load the expat library. If > you can 'use' this module like in > > $ perl -e 'use XML::Parser::Expat;' > > then probably you have expat installed and dynamically loadable.
> > -hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- From dhoworth at mrc-lmb.cam.ac.uk Thu Dec 2 09:04:46 2004 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Thu Dec 2 09:02:59 2004 Subject: [Bioperl-l] Re: bad entries in interpro again In-Reply-To: <4.3.2.7.2.20041202094926.00cb63e8@vttmail.vtt.fi> References: <4.3.2.7.2.20041201160437.00cd4cb0@vttmail.vtt.fi> <4.3.2.7.2.20041202094926.00cb63e8@vttmail.vtt.fi> Message-ID: <41AF20FE.1030808@mrc-lmb.cam.ac.uk> Mikko Arvas wrote: > Sorry about that I should have tested it before mailing. The problem is > not non-ascii characters it seems to be specifically the combination of > two & inside individual <>. I tried various combinations and other > non-ascii characters (even in abundance) don't break it and a single & > does neither. > > Here is again the problematic line: > > > And its error: > not well-formed (invalid token) at line 2, column 54, byte 132 at > /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi/XML/Parser.pm > line 187 > > So which way to proceed? I think some extra details might make it easier to see what is going on. Which file are you scanning? Since your original post a new version of Interpro has been released so I suggest giving a URL on the Interpro FTP site so everybody can be sure of looking at the same file.
I have just run the Sun XML validator on ftp://ftp.ebi.ac.uk/pub/databases/interpro/match.xml.gz (after unpacking it) and it validates as correct XML. What version of XML::Parser are you using? I have just parsed that file with no errors using XML::Parser V2.34 on Suse 9.1 and this test script:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Parser;

    my $pl = new XML::Parser();
    $pl->parsefile('match.xml');

So on the surface, the problem doesn't seem to be with either the Interpro data or the XML parser. The file contains many lines identical to the one cited, which are all valid XML in accordance with the Interpro DTD, but none are line 2! So it looks like different data has been passed to XML::Parser. Cheers, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From hlapp at gnf.org Thu Dec 2 16:28:16 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Dec 2 16:25:59 2004 Subject: [Bioperl-l] Re: bad entries in interpro again In-Reply-To: <4.3.2.7.2.20041202094926.00cb63e8@vttmail.vtt.fi> References: <4.3.2.7.2.20041201160437.00cd4cb0@vttmail.vtt.fi> <4.3.2.7.2.20041202094926.00cb63e8@vttmail.vtt.fi> Message-ID: <140C7AFA-44A9-11D9-AA31-000A95AE92B0@gnf.org> This sounds more like an expat or XML::Parser problem. Have you tried to upgrade either? Maybe check with the authors of those modules? -hilmar On Dec 2, 2004, at 2:11 AM, Mikko Arvas wrote: > > Hi, > > At 13:49 1.12.2004 -0800, Hilmar Lapp wrote: > >> On Wednesday, December 1, 2004, at 08:16 AM, Mikko Arvas wrote: >> >>> Is the &apos the source of the problem? >> Did you try to take it out and see what happens? I.e., you can answer >> this yourself easily. >> I would have thought that it's not the problem, but it'd be great if >> you or somebody else helps out by testing what was suggested. > > Sorry about that I should have tested it before mailing.
> The problem is not non-ascii characters it seems to be specifically the
> combination of two & inside individual <>. I tried various
> combinations and other non-ascii characters (even in abundance) don't
> break it and a single & does neither.
>
> Here is again the problematic line:
>
>
> And its error:
> not well-formed (invalid token) at line 2, column 54, byte 132 at
> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi/XML/Parser.pm
> line 187
>
> So which way to proceed?
>
>>> Is it really a problem in BioPerl or in expat?
>>
>> If the problem is outside of Interpro, it's Expat, not Bioperl. It's
>> the XML parser library that threw up.
>>
>>> Is somebody trying to solve the problem for Bioperl now
>>> and is there any sensible thing that the interpro team could do to
>>> help?
>>
>> Depends on where the problem is. It appears that the Interpro team
>> already eliminated the double quotes in names. The is some hard-coded
>> stuff in interpro.pm that needs to be removed, and I heard Allen say
>> he'll work on that.
>>
>> -hilmar
>
> Cheers,
> mikko
>
> Mikko Arvas
> VTT Biotechnology
>
> e-mail: mikko.arvas@vtt.fi
> tel: +358-(0)9-456 5827
> mobile: +358-(0)44-381 0502
> fax: +358-(0)9-455 2103
> mail: Tietotie 2, Espoo
> P.O. Box 1500
> FIN-02044 VTT, Finland
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Thu Dec 2 16:36:39 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Dec 2 16:34:16 2004
Subject: [Bioperl-l] Bioperl platforms and XML parser
In-Reply-To:
References:
Message-ID: <3FFA2CF9-44AA-11D9-AA31-000A95AE92B0@gnf.org>

This sounds very encouraging with respect to the Windows front.
So at least, if I take your reports as representative of the Windows situation, then no show-stopper has been shown yet for requiring an XML parser. Hmm.

Again, this is a good opportunity for anybody out there to voice the trouble you had when trying to install an XML parser on your platform. It almost sounds too good to be true that there isn't one ...

Would anybody be scared if Bioperl wouldn't run w/o an XML parser? Trying to solicit some feedback on how audacious you want, or don't want, the developers to go.

	-hilmar

On Dec 2, 2004, at 4:41 AM, Nathan Haigh wrote:

> This is what I've found so far:
> My System:
> ActiveState Perl 5.8.0 build 804 has:
> XML::Parser (v2.31)
> XML::Parser::Expat (v2.31)
> expat libs (v1.95.5) which are dynamically loaded.
> I'm not sure what versions of these file Perl 5.6 has, if it's
> important to know, I could find out.
>
> The t\Biblio.t test ran without errors or problems. The only possible
> future problem I can see, is if BioPerl needs to use a recent
> version of expat the would require windows user to update their
> expat/XML::Parser, but aside from that things seem to work with
> absolutely no problems whatsoever - there's a first! :o)
>
> Is this what you wanted to hear? If there are any more windows related
> questions, I'll do my best to help out!
> Nathan
>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp@gmx.net]
>> Sent: 01 December 2004 21:52
>> To: nathanhaigh@ukonline.co.uk
>> Cc: 'Brian Osborne'; 'Bioperl'
>> Subject: Re: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
>>
>>
>> On Wednesday, December 1, 2004, at 06:52 AM, Nathan Haigh wrote:
>>
>>> ActiveState Perl has a module called XML::Parser::expat. Could this
>>> be a perl implementation of the expat library?
>>>
>>
>> No. It's the wrapper that will dynamically load the expat library.
>> If you can 'use' this module like in
>>
>> $ perl -e 'use XML::Parser::Expat;'
>>
>> then probably you have expat installed and dynamically loadable.
>>
>> -hilmar
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Thu Dec 2 16:45:52 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Dec 2 16:43:27 2004
Subject: [Bioperl-l] Re: bad entries in interpro again
In-Reply-To: <41AF20FE.1030808@mrc-lmb.cam.ac.uk>
References: <4.3.2.7.2.20041201160437.00cd4cb0@vttmail.vtt.fi>
	<4.3.2.7.2.20041202094926.00cb63e8@vttmail.vtt.fi>
	<41AF20FE.1030808@mrc-lmb.cam.ac.uk>
Message-ID: <89F6E812-44AB-11D9-AA31-000A95AE92B0@gnf.org>

On Dec 2, 2004, at 6:04 AM, Dave Howorth wrote:

> The file contains many lines identical to the one cited, which are all
> valid XML in accordance with the Interpro DTD, but none are line 2! So
> it looks like different data has been passed to XML::Parser.

Well, yes, you can't translate the line# given by the error message into line# in the source file.
SeqIO::interpro chops up the input at ... and then passes each chunk to the XML::Parser instance. There is no other editing of the chunks going on though except for a haphazard substitution of certain double-quotes.

In order to see the chunk before it gets sent to the parser instance, edit Bio/SeqIO/interpro.pm and before the line

    $self->parse_xml($xml_fragment);

put a print statement that prints out the content of $xml_fragment. That should also give the exact source XML that trips up the parser.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From bfontain at iupui.edu Thu Dec 2 13:42:02 2004
From: bfontain at iupui.edu (Fontaine, Burr R)
Date: Thu Dec 2 20:46:43 2004
Subject: [Bioperl-l] NCBI/Swissprot cross-ref
Message-ID: <710690625AD28941BDAD9D2271264C5802502C51@iu-mssg-mbx08.exchange.iu.edu>

Hi,

Does anyone know if BioPERL can help me cross-reference gene and SNP ID's between NCBI and Swissprot? I can't find anything at NCBI or Swissprot that does this directly.

The closest thing we've found so far for this is the kgxref table at UCSC, but this table does not include SNP's. Also, this table appears to include Swiss-prot ID's for both proteins and genes in the same field, and I'm not sure how to sort these out.

#kgID     mRNA      spID      spDisplayID  geneSymbol  refseq     protAcc    description
AY231461  AY231461  AAO84335  AAO84335     TAZ         NM_000116  NP_000107  Tafazzin exon 5 deleted variant long form.
AY231462  AY231462  AAO84336  AAO84336     TAZ         NM_000116  NP_000107  Tafazzin exon 7 deleted variant long form.
AY231463  AY231463  Q86XR0    Q86XR0       TAZ         NM_000116  NP_000107  Tafazzin exon 5 and exon 7 deleted variant long form.
AY258036  AY258036  Q86XQ9    Q86XQ9       TAZ         NM_000116  NP_000107  Tafazzin short form.
AY258037  AY258037  Q86XQ8    Q86XQ8       TAZ         NM_000116  NP_000107  Tafazzin exon 5 and exon 7 deleted variant short form.
AY258038  AY258038  Q86XQ7    Q86XQ7       TAZ         NM_000116  NP_000107  Tafazzin exon 7 deleted variant short form.
AY258039  AY258039  Q86XQ6    Q86XQ6       TAZ         NM_000116  NP_000107  Tafazzin exon 5 deleted variant short form.
BC005062  BC005062  Q7Z6N8    Q7Z6N8       TAZ         NM_000116  NP_000107  Tafazzin, isoform 5.
BC011515  BC011515  Q96F92    Q96F92       TAZ         NM_000116  NP_000107  Similar to tafazzin (cardiomyopathy, dilated 3A (X-linked), endocardial fibroelastosis 2, Barth syndrome).
X92762    X92762    Q16635    TFZ_HUMAN    TAZ         NM_000116  NP_000107  tafazzin (cardiomyopathy, dilated 3A (X-linked); endocardial fibroelastosis 2; Barth syndrome)

Thanks in advance for your help.

Burr Fontaine

From cldwalker at chwhat.com Thu Dec 2 05:13:36 2004
From: cldwalker at chwhat.com (Gabriel Horner)
Date: Thu Dec 2 20:47:08 2004
Subject: [Bioperl-l] bioperl article/report
Message-ID: <20041202101336.GA7006@bigmama.chwhat.com>

Hi again,

I've written a 1500 word paper on perl and bioinformatics for a bioinformatics class. It was geared towards my teacher (surprise ;), meaning I talk about Bioperl more from a biological, qualitative viewpoint. Feel free to post this up if you think it'll be of use to anyone. I'm enclosing it below and attaching it.

Good day,
Gabriel

Gabriel Horner
12/2/4

Perl and Bioinformatics

The focus of this paper is to explore what is available for the bioinformatician in the world of Perl. First, we'll explore the Bioperl modules, which are Perl's main organized bioinformatic code base. Then we'll look at some programs that use Perl. From these two explorations I aim to show some of Perl's strengths and weaknesses in the bioinformatics field and help the bioinformatician better choose when to use Perl.

The Bioperl modules are under the Bio::* namespace on CPAN (http://www.cpan.org). These modules began in 1995 when few other biological toolkits existed. After almost a decade, the project has grown to over 300 modules with at least 20 developers.
The modules themselves are not meant to be out-of-the-box programs but rather reusable chunks that can be combined to create a wide variety of functionality. This functionality can be divided into the following topics, each to be covered in more detail:

Sequences: Manipulate them, e.g. read, write, translate
Conversion: Convert between different sequence or alignment files
Databases: Remote and local database access for sequences and references
Graphics: Draw sequences by displaying discrete ranges on a number line, e.g. annotations and contig maps
Alignments: Create from sequences, manipulate, multiple alignment analysis
External analysis: Perform queries through other programs, e.g. ClustalW, TCoffee, EMBOSS, BLAST and at least 15 others

Sequences, usually abbreviated as Seq, are perhaps the most widely used object. They are used to represent DNA, RNA or protein sequences. There are several different types including Bio::PrimarySeq for lightweight use and Bio::Seq::RichSeq which supports richer annotations. A standard Bio::Seq object can manipulate and save features and annotations as well as truncate, translate and reverse complement the sequence. Basic annotations are implemented using Bio::AnnotationI. Basic features are represented by Bio::SeqFeature::Generic objects. Features can be associated with other features, called sub-features, and with annotations. Features that map to particular positions on a sequence do so through a location object. A location is its own object because locations can vary considerably. For example, an exon (a feature) may have multiple locations, or in an unfinished genome a location may carry some uncertainty.

A sequence is usually read from a file or a database, although it is possible to simply create a sequence from a given string. Reading in files is done through Bio::SeqIO. This object also acts as a stream, as it has a sequence iterator method.
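As a rough illustration of the stream behaviour just described, the sketch below reads FASTA records through Bio::SeqIO's next_seq iterator and applies a few of the Bio::Seq methods mentioned above. The record name demo1, the sequence itself and the in-memory filehandle are made up for the example; a real script would normally pass -file instead, and details may differ between Bioperl releases.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical FASTA record held in a string so the example needs no input file.
my $fasta = ">demo1 made-up coding sequence\nATGGCCAAGTAA\n";
open my $fh, '<', \$fasta or die "cannot open in-memory handle: $!";

# -fh hands SeqIO an already opened handle; -file => 'name.fa' is the usual form.
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @records;
# next_seq returns one sequence object per call and undef at end of input.
while (my $seq = $in->next_seq) {
    push @records, {
        id      => $seq->display_id,
        length  => $seq->length,
        first6  => $seq->trunc(1, 6)->seq,   # subsequence, 1-based inclusive
        revcom  => $seq->revcom->seq,        # reverse complement
        protein => $seq->translate->seq,     # translation of reading frame 1
    };
}

# One tab-separated summary line per record.
printf "%s\t%d\t%s\t%s\t%s\n", @{$_}{qw(id length first6 revcom protein)}
    for @records;
```

Because every step returns a new object, the calls can be chained as shown without modifying the sequence that was read.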
Using this class, it is possible to convert between several different formats including Ace database, BSML, Chaos XML, EMBL, FASTA, GenBank, GCG, PIR, PLN, NCBI and SwissProt. Since a SeqIO object can define a filehandle, converting between formats is as simple as [1]:

    use Bio::SeqIO;

    $in  = Bio::SeqIO->newFh(-file => "inputfilename", '-format' => 'Fasta');
    $out = Bio::SeqIO->newFh('-format' => 'EMBL');

    # World's shortest Fasta<->EMBL format converter:
    print $out $_ while <$in>;

Reading in sequences from databases can be approached in two ways. The first way is to use the correct Bio::DB::* module for the known database type, i.e. indexed flat-file, local relational or remote relational. Some currently supported remote databases include GenBank, GenPept, SwissProt, BioFetch and EMBL. For now, sequences are mainly retrieved by id or accession number. An example of retrieving a sequence is [2]:

    use Bio::DB::GenBank;

    $gb = new Bio::DB::GenBank();

    # this returns a Seq object:
    $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');

Bio::Index::* modules are used to read and write local flat files. Since sequences are local and indexed, this is a fast way of retrieving sequences by unique keys. The second way of accessing data is via the OBDA (Open Bioinformatics Data Access) Registry system. This system, used by Bio{Perl,Java,Ruby,Python} programs, makes it easy to change a program's database source by changing parameters in a configuration file.

Displaying sequences is done through the Bio::Graphics modules, which are built on the GD module. These modules can draw 'any type of map in which a set of discrete ranges need to be laid out on the number line.' [3] Usually the ranges describe a feature's location on a sequence and map onto a 'track' spanning the width of the display. It is possible for more than one feature to occupy the same track.

A basic sequence alignment is represented by a Bio::SimpleAlign object.
Its methods include adding and deleting sequences, fetching sequences and manipulating characters of all sequences. The actual alignment is done through interfaces to external programs under Bio::Tools::Run::Alignment::*. There are interfaces to ClustalW, BLAST's bl2seq, Lagan, TCoffee, StandAloneFasta and pSW. Like sequences, alignments can be read from files or other sources via Bio::AlignIO.

Bioperl has an optional bundle of modules on CPAN called bioperl-run, under Bio::Tools::Run. The common theme of this bundle is that its modules provide a Perl interface to external programs. A majority of the bundle interfaces to two main applications, EMBOSS and Pise, which are themselves bundles of programs. Pise, http://www.pasteur.fr/recherche/unites/sis/Pise/, is a web-interface generator that wraps around commandline programs. By providing a more user-friendly and uniform interface to a variety of commands, it aims at overcoming the difficulty of learning a variety of commandline programs. Since it was developed with bioinformatic commands in mind, commands that take longer than a minute are assigned a job id and email the user when the job is done. It currently interfaces to about 150 commandline programs but can easily be extended to others. For a list of currently interfaced programs check http://search.cpan.org/~birney/bioperl-run-1.4/ and look at everything under the Bio::Tools::Run::PiseApplication namespace. EMBOSS is a package of about 100 commands aimed at the molecular biology community. The package covers areas such as alignment, nucleic structure, enzyme kinetics, feature tables, phylogeny and protein structure and composition.

Having covered the basics of the Bioperl bundle, let us examine some common Perl bioinformatic applications. Ensembl, http://www.ensembl.org, is an open-source project that aims to organize data around the sequences of large genomes with an emphasis on human and mammalian genomes.
In other words, it comprehensively annotates a genome and tries to link as many similar functional elements across genomes as possible. This is done in a thorough manner, as they coordinate annotations with other groups that specialize in parts of a genome. When no annotations are provided, they generate annotations. When a feature is difficult to predict, such as gene structures, a best guess is calculated and the evidence leading to this guess is linked for users to explore. The implementation details of Ensembl are beyond the scope of this paper but there are a few interesting points that the writers make. Perl was chosen for a few reasons, the main ones being its quick implementation time and Ensembl's heavy reliance on the Bioperl toolkit. Ensembl borrowed heavily from Bioperl's object model and used its parsing of several sequence formats to its advantage. Four years since its inception, Ensembl now barely relies on Bioperl (a few Seq and SeqIO objects when I grepped inside its main library directory). According to Stabenau et al. [4], disadvantages of using Perl have been its absence of compile time checking of function prototypes and its reference-count-based garbage collector. The former has led to many runtime errors.

Another annotation Perl program is GBrowse, http://www.gmod.org/ggb/. This program is part of a larger project called GMOD whose goal is 'to develop reusable software components for model organism system databases' [5] or MODs. MODs collect data from research and experiments in efforts 'to connect genomic features to the classical biology of the organism' [6]. GBrowse aims for this goal by providing the biologist with the ability to view public annotations, search the full text of features, edit annotations with private annotations and publish the modified annotations. Its code is almost entirely based on Bioperl, with the rendering of images handled by Bio::Graphics modules and the communication with databases handled by Bio::DB modules.
Perl was chosen because the authors believe its users would be more likely to know how to use it and extend GBrowse than with a language like C. Another reason was Bioperl's richness in the functionality it needed: graphics and a variety of database back ends.

The final program this paper summarizes is MuGeN, http://www-mig.jouy.inra.fr/bdsi/MuGeN/. MuGeN can display multiple annotated genome portions from both local and remote sources. These maps can be combined with analysis results loaded from XML files. Some of the functionality overlaps with the previously mentioned programs as well as Entrez's Map Viewer and UCSC's Human Genome Browser. Unlike most of these programs, MuGeN also offers a batch mode from the commandline for a series of annotated images. Perl was chosen for this program because Bioperl offered the parsers for sequence files and Gtk-Perl offered a decent GUI toolkit.

From this article, we can see that Perl's strength in the bioinformatics world is largely due to Bioperl. Of course it's also due to CPAN, which offers the variety of modules that made GUI, web and XML programming easy for the programs reviewed here. As for Perl's weakness mentioned by the Ensembl team, I agree weak prototyping can cause significant headaches in a large project. But if you test thoroughly as you write code, most of the headaches can be avoided. I must note that not all of Bioperl's functionality was covered, most notably representation of non-sequence data.

Footnotes
1. From perl documentation on Bio::SeqIO, http://search.cpan.org/perldoc?Bio::SeqIO
2. Reference 1
3. From perl documentation Bio::Graphics::Panel, http://search.cpan.org/perldoc?Bio::Graphics::Panel
4. Reference 8
5. Reference 10, pg 1
6. Reference 10, pg 1

References
1. Birney, E. BioPerlTutorial. http://search.cpan.org/~birney/bioperl-1.4/bptutorial.pl
2. Birney, E. et al. An Overview of Ensembl. Genome Research 2004 14: 925-928.
3. Hoebeke, M. et al.
   MuGeN: simultaneous exploration of multiple genomes and computer analysis results. Bioinformatics, 2003; 19: 859-864.
4. Letondal, C. Bioperl course. http://www.pasteur.fr/recherche/unites/sis/formation/bioperl/
5. Letondal, C., 2001. A Web interface generator for molecular biology programs in Unix. Bioinformatics, Jan 2001; 17: 73 - 82.
6. Osborne, B. http://bioperl.org/HOWTOs/Feature-Annotation/Feature-Annotation.txt
7. Rice, P. et al. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics June 2000; Vol 16, No 6. pp.276-277
8. Stabenau, A. et al. The Ensembl Core Software Libraries. Genome Res. 2004 14: 929-933.
9. Stajich, J. The Bioperl Toolkit: Perl Modules for the Life Sciences.
10. Stein, L. et al. The Generic Genome Browser: A Building Block for a Model Organism System Database.

-- 
my looovely website -- http://www.chwhat.com
BTW, IF chwhat.com goes down email me at gabriel.horner@cern.ch

From MAG at Stowers-Institute.org Thu Dec 2 12:16:34 2004
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Thu Dec 2 20:48:08 2004
Subject: [Bioperl-l] Module for secondary structure feature recognition ?
Message-ID: <200412021714.iB2HE5Kr029496@portal.open-bio.org>

Hi All,

I am new to list as well as Bioperl, with some experience with perl. I need a perl script that does the following:

I have a multiple sequence alignment, from which I want to extract blocks of alignment. This can possibly be taken care of easily by Bio::SimpleAlign::slice, BUT my problem is that I want some module that will decide what regions to cut depending on the secondary structure features of the sequences.

In other words, I want a program to be able to judge the secondary str of all the sequences in the multiple sequence alignment and extract the region where all sequences have a consensus of the sec str. All sequences in the alignment have known pdb structures, so the secondary str information could be available in multiple formats, like from the pdb file itself or a dssp output file etc.

I have gone through the FAQS and HOWTO's at Bioperl but could not come up with anything suitable. If anyone can please guide me to any such existing modules that even approximate the task, I should probably be able to put it together to fit the bill or at least know where to get started.
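A minimal sketch of the consensus step described above, in plain Perl and not an existing Bioperl module. It assumes (hypothetically) that each aligned sequence already comes with a secondary-structure string padded to the alignment length, using H for helix, E for strand, C for coil and - for gap columns; producing those strings from PDB or DSSP output is not shown. The name consensus_blocks and the minimum block length are invented for the illustration.

```perl
use strict;
use warnings;

# Return [start, end, state] for every run of alignment columns in which
# all sequences share the same helix (H) or strand (E) state.
# Columns are 1-based and inclusive; runs shorter than $min_len are dropped.
sub consensus_blocks {
    my ($min_len, @ss) = @_;
    my $ncol = length $ss[0];
    die "structure strings differ in length\n"
        if grep { length($_) != $ncol } @ss;

    my @blocks;
    my ($run_state, $run_start) = ('', 0);
    for my $col (0 .. $ncol) {            # the extra pass closes a trailing run
        my $state = '';
        if ($col < $ncol) {
            my %seen;
            $seen{ substr($_, $col, 1) } = 1 for @ss;
            my @states = keys %seen;
            # consensus only if every sequence agrees on H or on E
            $state = $states[0] if @states == 1 && $states[0] =~ /^[HE]$/;
        }
        next if $state eq $run_state;
        # the state changed here, so the previous run ends at column $col
        push @blocks, [ $run_start + 1, $col, $run_state ]
            if $run_state ne '' && $col - $run_start >= $min_len;
        ($run_state, $run_start) = ($state, $col);
    }
    return @blocks;
}

# Made-up structure strings for three aligned sequences.
my @blocks = consensus_blocks(3,
    'CCHHHHHCCEEEC',
    'CHHHHHHCCEEEC',
    'CCHHHHHC-EEEC',
);
print join("\t", @$_), "\n" for @blocks;
```

Each start/end pair could then be handed to the slice method of Bio::SimpleAlign mentioned above, provided the structure strings were built column for column against the same alignment.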
Thanks, -Manisha From MatsallaC at AGR.GC.CA Wed Dec 1 19:31:21 2004 From: MatsallaC at AGR.GC.CA (Matsalla, Chad) Date: Thu Dec 2 20:49:09 2004 Subject: [Bioperl-l] writing gff - what's the minumum set of objects? Message-ID: <40D827FE3890C1489175D6B8F3E602CB0F2DBE@onncrxms5.agr.gc.ca> Greetings all, I'm trying to determine the **minimum** required set of objects required to write a line of GFF. I'm doing this so that I can expand the types of annotations that can be geenrated from TIGR's Arabidopsis annotations. Parsing the XML is not a problem. What _is_ a problem is writing the gff. So far, I have found that this[1], which writes this[2], is the minimum required set of operations required to write, say, part of the reference sequence. Am I missing something or is this **really** complicated? Is there a simpler way to do this? As part of this effort, I've done a bit of work in gff.pm and Annotation.pm. I hope that this doesn't colide with similar work going on. In addition can Hilmar or someone familiar tell me why the 'type' annotation must be a Bio::Annotation::OntologyTerm ? Why can't it be an arbitrary string? An additional question is: in gff.pm, can I write an algorithm such that if a certain annotation occurs in an Annotated(say, GROUP) then it's used in the ninth field of the GFF rather than anything else. The reason for that is that if people want to control what grouping is used they need control over that field of the gff. Thanks for the input, Chad Matsalla [1] sub get_annotated_feature { my ($name,$coordset,$type,$annotation_string) = @_; my $feature = new Bio::SeqFeature::Annotated( -seq_id => $name, -source => "example", -primary => $name, -type => new Bio::Annotation::OntologyTerm(-name => 'Chromosome'), -start => $coordset->getEND5()->getData(), -end => $coordset->getEND3()->getData(), -strand => '+' ); $feature->add_Annotation('Name',new Bio::Annotation::SimpleValue($name)); return $feature; } [2] F8G22 example Chromosome 1 42801 . + . 
Name=F8G22 From ybcho at biomics.org Wed Dec 1 21:59:52 2004 From: ybcho at biomics.org (ybcho) Date: Thu Dec 2 20:49:10 2004 Subject: [Bioperl-l] SeqIO::refseq Message-ID: <200412020307.iB23781m016479@saju.kaist.ac.kr> I have been parsing RefSeq gpff files using Bioperl-1.4. But I found that $taxon_id was missed while printing with below script And it produced below taxon:9606 taxon:9606 taxon:9606 taxon:9606 GeneID:26278 LocusID:26278 MIM:604490 ..... from print "@db_xref\n"; I can not find why this happened. But, I can take taxonomy id from $taxonomy_id = $species->ncbi_taxid; After removing "&& ( $species->ncbi_taxid())" in 508 line of genbank.pm Because it has null value all the time. Can any one correct these? Cheers. ============= refseq parsing script ==================== foreach $feature(@features = $seq->get_SeqFeatures){ $location_type = $feature->location->location_type; $feature_type = $feature->primary_tag; %seen_tag = (); @tags = (); foreach $tag (@tags = $feature->get_all_tags){ $seen_tag{$tag}++; } $organism = $db_xref = $taxonomy_id = $strain = $plasmid = (); if ($feature_type eq "source"){ @db_xref = $feature->get_tag_values('db_xref') if exists $seen_tag{'map'}; print "@db_xref\n"; ($taxonomy_id) = $db_xref[0] =~ /taxon\:(\d+)/; ($strain) = $feature->get_tag_values('strain') if exists $seen_tag{'strain'}; ($plasmid) = $feature->get_tag_values('plasmid') if exists $seen_tag{'plasmid'}; print "$internal_id\t$organism\t$strain\t$taxonomy_id\t$plasmid\n" } ............. } ================================================================ From Russell.Smithies at agresearch.co.nz Wed Dec 1 22:38:23 2004 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu Dec 2 20:49:11 2004 Subject: [Bioperl-l] Blast graphic and image map? Message-ID: Hi all, I'm a noobie to Perl (and even newer to BioPerl) and want to produce a simple blast graphic and image map. 
I've got the example (render_blast4.pl) from the demos working fine and producing the .png image. The demo suggests I can get the image_map with something similar to "my @boxes = $panel->boxes " But from there, I'm not sure where to go. Can anyone point me to some code showing/explaining the next step? Or is there a better method? thanx, Russell Smithies > Bioinformatics Software Developer > AgResearch NZ > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From sdavis2 at mail.nih.gov Thu Dec 2 21:59:53 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Dec 2 21:57:32 2004 Subject: [Bioperl-l] writing gff - what's the minumum set of objects? References: <40D827FE3890C1489175D6B8F3E602CB0F2DBE@onncrxms5.agr.gc.ca> Message-ID: <001001c4d8e4$29b1ae70$7d75f345@WATSON> Chad, is this what you are looking for? http://www.sanger.ac.uk/Software/formats/GFF/ Sean ----- Original Message ----- From: "Matsalla, Chad" To: Sent: Wednesday, December 01, 2004 7:31 PM Subject: [Bioperl-l] writing gff - what's the minumum set of objects? > > Greetings all, > > I'm trying to determine the **minimum** required set of objects > required to write a line of GFF. I'm doing this so that I can expand the > types of annotations that can be geenrated from TIGR's Arabidopsis > annotations. > > Parsing the XML is not a problem. What _is_ a problem is writing the > gff. 
> > So far, I have found that this[1], which writes this[2], is the minimum > required set of operations required to write, say, part of the reference > sequence. > > Am I missing something or is this **really** complicated? Is there a > simpler way to do this? > > As part of this effort, I've done a bit of work in gff.pm and > Annotation.pm. I hope that this doesn't colide with similar work going > on. > > In addition can Hilmar or someone familiar tell me why the 'type' > annotation must be a Bio::Annotation::OntologyTerm ? Why can't it be an > arbitrary string? > > An additional question is: > in gff.pm, can I write an algorithm such that if a certain annotation > occurs in an Annotated(say, GROUP) then it's used in the ninth field of > the GFF rather than anything else. > > The reason for that is that if people want to control what grouping is > used they need control over that field of the gff. > > Thanks for the input, > > Chad Matsalla > > > > [1] > sub get_annotated_feature { > my ($name,$coordset,$type,$annotation_string) = @_; > my $feature = new Bio::SeqFeature::Annotated( > -seq_id => $name, > -source => "example", > -primary => $name, > -type => new Bio::Annotation::OntologyTerm(-name => > 'Chromosome'), > -start => $coordset->getEND5()->getData(), > -end => $coordset->getEND3()->getData(), > -strand => '+' > ); > $feature->add_Annotation('Name',new > Bio::Annotation::SimpleValue($name)); > return $feature; > } > > [2] > F8G22 example Chromosome 1 42801 . + . 
> Name=F8G22 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From sdavis2 at mail.nih.gov Thu Dec 2 22:05:44 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Dec 2 22:03:20 2004 Subject: [Bioperl-l] NCBI/Swissprot cross-ref References: <710690625AD28941BDAD9D2271264C5802502C51@iu-mssg-mbx08.exchange.iu.edu> Message-ID: <004001c4d8e4$fb17ca80$7d75f345@WATSON> Burr, You can simply use the ucsc table browser to find the overlap of SNPs with genes by doing an intersection. Note that for human, only the July 2003 build has a SNP table right now. Perhaps in the next week, I think, they will have the SNP table in the 2004 build (build 35). Sean ----- Original Message ----- From: "Fontaine, Burr R" To: Sent: Thursday, December 02, 2004 1:42 PM Subject: [Bioperl-l] NCBI/Swissprot cross-ref Hi, Does anyone know if BioPERL can help me cross-reference gene and SNP ID's between NCBI and Swissprot? I can't find anything at NCBI or Swissprot that does this directly. The closest thing we've found so far for this is the kgxref table at UCSC, but this table does not includes SNP's. Also, this table appears to include Swiss-prot ID's for both proteins and genes in the same field, and I'm not sure how to sort these out. #kgID mRNA spID spDisplayID geneSymbol refseq protAcc description AY231461 AY231461 AAO84335 AAO84335 TAZ NM_000116 NP_000107 Tafazzin exon 5 deleted variant long form. AY231462 AY231462 AAO84336 AAO84336 TAZ NM_000116 NP_000107 Tafazzin exon 7 deleted variant long form. AY231463 AY231463 Q86XR0 Q86XR0 TAZ NM_000116 NP_000107 Tafazzin exon 5 and exon 7 deleted variant long form. AY258036 AY258036 Q86XQ9 Q86XQ9 TAZ NM_000116 NP_000107 Tafazzin short form. AY258037 AY258037 Q86XQ8 Q86XQ8 TAZ NM_000116 NP_000107 Tafazzin exon 5 and exon 7 deleted variant short form. 
AY258038 AY258038 Q86XQ7 Q86XQ7 TAZ NM_000116 NP_000107 Tafazzin exon 7 deleted variant short form.
AY258039 AY258039 Q86XQ6 Q86XQ6 TAZ NM_000116 NP_000107 Tafazzin exon 5 deleted variant short form.
BC005062 BC005062 Q7Z6N8 Q7Z6N8 TAZ NM_000116 NP_000107 Tafazzin, isoform 5.
BC011515 BC011515 Q96F92 Q96F92 TAZ NM_000116 NP_000107 Similar to tafazzin (cardiomyopathy, dilated 3A (X-linked), endocardial fibroelastosis 2, Barth syndrome).
X92762 X92762 Q16635 TFZ_HUMAN TAZ NM_000116 NP_000107 tafazzin (cardiomyopathy, dilated 3A (X-linked); endocardial fibroelastosis 2; Barth syndrome)

Thanks in advance for your help.

Burr Fontaine

From ed at compbio.berkeley.edu Fri Dec 3 03:01:34 2004
From: ed at compbio.berkeley.edu (Ed Green)
Date: Fri Dec 3 02:59:11 2004
Subject: [Bioperl-l] Module for secondary structure feature recognition ?
In-Reply-To: <200412021714.iB2HE5Kr029496@portal.open-bio.org>
References: <200412021714.iB2HE5Kr029496@portal.open-bio.org>
Message-ID: <41B01D5E.4000402@compbio.berkeley.edu>

Hi Manisha,

Check out Bio::Structure::SecStr::STRIDE or Bio::Structure::SecStr::DSSP

These modules parse the output of the STRIDE and DSSP programs. Both of these programs take a pdb file as input and calculate secondary structure based on similar, but not identical, criteria. You can then "objectify" this output using the modules listed above.

Unfortunately, these modules are not well integrated with the rest of bioperl (my fault). It may also be somewhat tricky to map PDB residues to residues in your sequence alignment - you may have to do another alignment for this. The modules are well-documented, though (try perldoc Bio::Structure::SecStr::STRIDE::Res), and the analysis you are describing certainly sounds feasible.
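
The residue-to-alignment mapping mentioned above can be sketched without any Bioperl calls, under the assumption that the PDB chain sequence is identical to the ungapped alignment row (which, as noted, may not hold and may need a separate alignment first). The sub name `residue_to_column` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given one gapped row of an alignment, return a list
# whose element i-1 is the 1-based alignment column of residue i.
# Gap characters ('-', '.', '~') occupy columns but are not residues.
sub residue_to_column {
    my ($gapped) = @_;
    my @col_of;
    my $col = 0;
    for my $char (split //, $gapped) {
        $col++;
        next if $char =~ /[-.~]/;
        push @col_of, $col;
    }
    return @col_of;
}

my @col_of = residue_to_column('MK--LV-A');
# Residue 3 (the L) sits in alignment column 5.
print "residue $_ -> column $col_of[$_ - 1]\n" for 1 .. scalar @col_of;
```

With a lookup like this, secondary-structure boundaries expressed in residue numbers can be translated into alignment columns before slicing.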
I would try this: 1. Do alignment and get a Bio::SimpleAlign of your sequences 2. Map pdb residues to residues of sequences in Bio::SimpleAlign object, somehow 3. Run STRIDE on pdb of each sequence, objectify using Bio::Structure::SecStr::STRIDE::Res 4. Call secBounds method on each object to get the boundaries of each secondary structure element. 5. Map these boundaries through pdb-to-alignment sequence mapping found in #2. 6. Extract sequence slices from #1 7. Make brilliant observation(s) about alignments of secondary structure elements. Good luck, Ed Green UC Berkeley Goel, Manisha wrote: > Hi All, > > I am new to list as well as Bioperl, with some experience with perl. > > > I need a perl script that does the following: > > I have a multiple sequence alignment, from which I want to extract > blocks of alignment. This can possibly be taken care of easily by the > Bio/SimpleAlign::slice > > BUT my problem being that I want some module that will decide what > regions to cut dpending on the secondary structure features of the > sequences. > > In other words.. I want a program to be able to judge the secondary str > of all the sequences in the multiple sequence alignment and extract the > region where all sequences have a consensus of the sec str. > > All sequences in the alignment have known pdb structures, so the > secondary str information could be available in multiple formats- like > from pdb file itself or dssp output file etc. > > I have gone through the FAQS and HOWTO's at Bioperl but could not come > up with anything suitable. > If anyone can please guide me to any such existing modules that even > approximate the task.. I should probably be able to put it together to > fit the bill or at least know where to get started. 
> > Thanks, > -Manisha > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From birney at ebi.ac.uk Fri Dec 3 03:57:51 2004 From: birney at ebi.ac.uk (Ewan Birney) Date: Fri Dec 3 03:55:32 2004 Subject: [Bioperl-l] NCBI/Swissprot cross-ref In-Reply-To: <710690625AD28941BDAD9D2271264C5802502C51@iu-mssg-mbx08.exchange.iu.edu> Message-ID: On Thu, 2 Dec 2004, Fontaine, Burr R wrote: > Hi, > > > > Does anyone know if BioPERL can help me cross-reference gene and SNP > ID's between NCBI and Swissprot? I can't find anything at NCBI or > Swissprot that does this directly. > Do you mean SNP ids to Variation IDs in Swissprot? In the swissprot files some variations do have dbSNP ids (I believe in the feature table) and I think there is a goal to get this done better in the future in swissprot. Swissprot definitely holds many more variants which are just mentioned in papers, which are often the ones with phenotypic effects. > > > The closest thing we've found so far for this is the kgxref table at > UCSC, but this table does not includes SNP's. Also, this table appears > to include Swiss-prot ID's for both proteins and genes in the same > field, and I'm not sure how to sort these out. > > > > #kgID mRNA spID spDisplayID geneSymbol refseq > protAcc description > > AY231461 AY231461 AAO84335 AAO84335 TAZ NM_000116 > NP_000107 Tafazzin exon 5 deleted variant long form. > > AY231462 AY231462 AAO84336 AAO84336 TAZ NM_000116 > NP_000107 Tafazzin exon 7 deleted variant long form. > > AY231463 AY231463 Q86XR0 Q86XR0 TAZ NM_000116 > NP_000107 Tafazzin exon 5 and exon 7 deleted variant long form. > > AY258036 AY258036 Q86XQ9 Q86XQ9 TAZ NM_000116 > NP_000107 Tafazzin short form. > > AY258037 AY258037 Q86XQ8 Q86XQ8 TAZ NM_000116 > NP_000107 Tafazzin exon 5 and exon 7 deleted variant short form. 
> > AY258038 AY258038 Q86XQ7 Q86XQ7 TAZ NM_000116 > NP_000107 Tafazzin exon 7 deleted variant short form. > > AY258039 AY258039 Q86XQ6 Q86XQ6 TAZ NM_000116 > NP_000107 Tafazzin exon 5 deleted variant short form. > > BC005062 BC005062 Q7Z6N8 Q7Z6N8 TAZ NM_000116 > NP_000107 Tafazzin, isoform 5. > > BC011515 BC011515 Q96F92 Q96F92 TAZ NM_000116 > NP_000107 Similar to tafazzin (cardiomyopathy, dilated 3A > (X-linked), endocardial fibroelastosis 2, Barth syndrome). > > X92762 X92762 Q16635 TFZ_HUMAN TAZ NM_000116 > NP_000107 tafazzin (cardiomyopathy, dilated 3A (X-linked); endocardial > fibroelastosis 2; Barth syndrome) > > > > Thanks in advance for your help. > > > > Burr Fontaine > > > > From gongwuming at gmail.com Fri Dec 3 05:48:23 2004 From: gongwuming at gmail.com (Wuming Gong) Date: Fri Dec 3 05:46:25 2004 Subject: [Bioperl-l] Given a protein sequence (and its Genbank accession number), how to get the accession number of corresponding mRNA sequence? Message-ID: <24d6fd0504120302482ff0848b@mail.gmail.com> Hi list, Could you please give me some clues on how to map a protein sequences which accession number is already known to a mRNA sequences in GenBank? Thanks! Wuming From gongwuming at gmail.com Fri Dec 3 05:56:05 2004 From: gongwuming at gmail.com (Wuming Gong) Date: Fri Dec 3 05:53:36 2004 Subject: [Bioperl-l] Re: Given a protein sequence (and its Genbank accession number), how to get the accession number of corresponding mRNA sequence? In-Reply-To: <24d6fd0504120302482ff0848b@mail.gmail.com> References: <24d6fd0504120302482ff0848b@mail.gmail.com> Message-ID: <24d6fd050412030256b449968@mail.gmail.com> Hi list, In fact, I wanna get the coding sequences of the given proteins. Can I perform the job by tblastn the proteins to refseq databases followed by extracting the CDS sequences according to the start and end locus of CDS in the SeqFeature section ? Thanks. 
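
The coordinate-based extraction asked about here comes down to a substring and, for a minus-strand feature, a reverse complement. A minimal plain-Perl sketch, assuming a single-segment CDS with 1-based inclusive coordinates; `extract_cds` is a hypothetical name, and joined (multi-exon) locations are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a single-segment CDS out of a nucleotide
# sequence. $start and $end are 1-based and inclusive; $strand is 1 or -1.
sub extract_cds {
    my ($seq, $start, $end, $strand) = @_;
    die "bad coordinates\n"
        if $start < 1 || $end > length($seq) || $start > $end;
    my $cds = substr($seq, $start - 1, $end - $start + 1);
    if (defined $strand && $strand < 0) {
        $cds = reverse $cds;            # reverse the string ...
        $cds =~ tr/ACGTacgt/TGCAtgca/;  # ... then complement each base
    }
    return $cds;
}

print extract_cds('GGATGAAATAGCC', 3, 11,  1), "\n";  # plus strand
print extract_cds('GGCTATTTCATCC', 3, 11, -1), "\n";  # same CDS on the minus strand
```

Both calls print ATGAAATAG, since the second input is the reverse complement of the first.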
Wuming On Fri, 3 Dec 2004 18:48:23 +0800, Wuming Gong wrote: > Hi list, > > Could you please give me some clues on how to map a protein sequences > which accession number is already known to a mRNA sequences in > GenBank? > > Thanks! > > Wuming > From vseri at cse.unl.edu Thu Dec 2 21:30:19 2004 From: vseri at cse.unl.edu (Vishal Seri) Date: Fri Dec 3 09:34:47 2004 Subject: [Bioperl-l] NCBI/Whitehead cross-ref Message-ID: Hi, I have all the protein sequences for Neurospora crassa in both NCBI accession number and Whitehead formats. I would like to map each NCBI accession numbers to the corresponding Whitehead accession number. Is there any way to do this in bio-perl? thanks for any help rendered, Vishal, ========================================================================= Vishal Kumar Seri Graduate Student Department of Computer Science University of Nebraska-Lincoln. e-mail : vseri@cse.unl.edu From ybcho at biomics.org Thu Dec 2 23:10:54 2004 From: ybcho at biomics.org (ybcho) Date: Fri Dec 3 09:34:49 2004 Subject: [Bioperl-l] SeqIO::genbank.pm error. Message-ID: <200412030418.iB34IC1m002638@saju.kaist.ac.kr> I have been parsing RefSeq gpff files using Bioperl-1.4. But I found that $taxon_id was missed while printing with below script And it produced below taxon:9606 taxon:9606 taxon:9606 taxon:9606 GeneID:26278 LocusID:26278 MIM:604490 <== erroneous print out. ..... from print "@db_xref\n"; I can not find why this happened. But, I can take taxonomy id from $taxonomy_id = $species->ncbi_taxid; After removing "&& ( $species->ncbi_taxid())" in 508 line of genbank.pm Because it has null value all the time. Can any one correct these? Cheers. 
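
Independently of the genbank.pm question, the script below takes the taxon from the first db_xref value only, and its guard tests `$seen_tag{'map'}` rather than `$seen_tag{'db_xref'}`, so `@db_xref` may be left holding values from an earlier feature. A more defensive lookup, sketched in plain Perl with the hypothetical sub name `taxon_from_xrefs`, searches every value for the `taxon:` prefix:

```perl
use strict;
use warnings;

# Hypothetical helper: return the NCBI taxonomy id found in a list of
# db_xref strings such as "taxon:9606", or undef if none is present.
sub taxon_from_xrefs {
    my @xrefs = @_;
    for my $xref (@xrefs) {
        return $1 if $xref =~ /^taxon:(\d+)$/;
    }
    return;
}

my @db_xref = ('GeneID:26278', 'LocusID:26278', 'taxon:9606', 'MIM:604490');
my $taxonomy_id = taxon_from_xrefs(@db_xref);
print defined $taxonomy_id ? "$taxonomy_id\n" : "no taxon db_xref\n";
```

Declaring `@db_xref` with `my` inside the loop body would also keep values from leaking between features.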
============= refseq parsing script ==================== foreach $feature(@features = $seq->get_SeqFeatures){ $location_type = $feature->location->location_type; $feature_type = $feature->primary_tag; %seen_tag = (); @tags = (); foreach $tag (@tags = $feature->get_all_tags){ $seen_tag{$tag}++; } $organism = $db_xref = $taxonomy_id = $strain = $plasmid = (); if ($feature_type eq "source"){ @db_xref = $feature->get_tag_values('db_xref') if exists $seen_tag{'map'}; print "@db_xref\n"; ($taxonomy_id) = $db_xref[0] =~ /taxon\:(\d+)/; ($strain) = $feature->get_tag_values('strain') if exists $seen_tag{'strain'}; ($plasmid) = $feature->get_tag_values('plasmid') if exists $seen_tag{'plasmid'}; print "$internal_id\t$organism\t$strain\t$taxonomy_id\t$plasmid\n" } ............. } ================================================================ From malatorr at genoma.ciencias.uchile.cl Fri Dec 3 13:17:06 2004 From: malatorr at genoma.ciencias.uchile.cl (Mariano Latorre A) Date: Fri Dec 3 13:15:43 2004 Subject: [Bioperl-l] problems parsing EBI interposscan.xml Message-ID: <1102097826.5668.10.camel@peach4> I installed Bioperl 1.4 (also installed dependencies Heap and Graph). I need to parser interproscan xml reports. When I run "make test" it passed the Interproscan_parser test ok. But when I perform a Interproscan at EBI I get a XML that can not be parsed. Bioperl says: Can't call method "identifier" on an undefined value at /usr/lib/perl5/site_perl/5.8.3/Bio/Ontology/SimpleOntologyEngine.pm line 410. So I go to the test directory inside the bioperl installation and check the differences and the xml generated by EBI and the one provided by bioperl installation package and notice that they are totally different!!! I paste both file beginings (as you'll see they uses different tags...): Thanks! Mariano 1.- the one provided for bioperl testing: Kringle Kringles are autonomous structural domains, found throughout the blood clotting and fibrinolytic proteins. 
Kringle domains are believed to play a role in binding mediators (e.g., membranes, other proteins or phospholipids), and in the regulation of proteolytic activity. Kringle domains are characterised by a triple loop, 3-disulphide bridge structure, whose conformation is defined by a number of hydrogen bonds and small pieces of anti-parallel beta-sheet. They are found in a varying number of copies, in some serine proteases and plasma proteins.
Blood coagulation factor XII (Hageman factor) (1 copy)
Urokinase-type plasminogen activator (1 copy)
Hepatocyte growth factor (HGF) (4 copies)
Hepatocyte growth factor activator (1 copy)
Plasminogen (5 copies)
Hepatocyte growth factor like protein (4 copies)

2.- The output from EBI INTERPRO:
Molecular Function methionine adenosyltransferase activity
Molecular Function ATP binding
Biological Process one-carbon compound metabolism

From barry.moore at genetics.utah.edu Fri Dec 3 17:57:21 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Sat Dec 4 13:49:03 2004
Subject: [Bioperl-l] Re: Given a protein sequence (and its Genbank accession number)
Message-ID: <41B0EF51.9050104@genetics.utah.edu>

Wuming-

Try the file gene2refseq and/or gene2accession from ftp://ftp.ncbi.nih.gov/gene for the mapping. It lists mRNA and protein ids. After you've got your mRNA ids, use Bio::DB::RefSeq to grab the mRNA, extract the CDS coordinates from the features table as you suggested, and then the sequence.

B

Wuming Gong wrote:

>Hi list,
>
>In fact, I wanna get the coding sequences of the given proteins. Can I
>perform the job by tblastn the proteins to refseq databases followed
>by extracting the CDS sequences according to the start and end locus
>of CDS in the SeqFeature section ? Thanks.
>
>Wuming
>
>
>On Fri, 3 Dec 2004 18:48:23 +0800, Wuming Gong wrote:
>
>
>>Hi list,
>>
>>Could you please give me some clues on how to map a protein sequences
>>which accession number is already known to a mRNA sequences in
>>GenBank?
>>
>>Thanks!
>>
>>Wuming
>>

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From denisemdolan at hotmail.com Fri Dec 3 15:21:03 2004
From: denisemdolan at hotmail.com (denise dolan)
Date: Sat Dec 4 13:49:20 2004
Subject: [Bioperl-l] SWAT Implementation
Message-ID:

An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041203/54e942b8/attachment.htm From ERubin at CGR.Harvard.edu Fri Dec 3 09:59:25 2004 From: ERubin at CGR.Harvard.edu (Eitan Rubin) Date: Sat Dec 4 13:49:39 2004 Subject: [Bioperl-l] New item for Briefings in bioinformatics? Message-ID: <339D68B133EAD311971E009027DC47970218EE4E@montecarlo.cgr.harvard.edu> Hi, I find the progress of the BioPerl very exciting, and would like to include a news item in the next column I am editing for Briefings in Bioinformatics. Would anyone who is familiar with the recent progress in Bioperl be willing to write a short (500-2000) words news item? While this is not a peer-reviewed publication, the abstract of the new items go into pub-med. Eitan Rubin -------------------- Eitan Rubin, PhD News Column Editor, Briefings in Bioinformatics Head of Bioinformatics The Bauer Center for Genomics Research Harvard University Tel: 617-496-5649 Fax: 617-495-2196 From aqureshi at cs.odu.edu Sat Dec 4 15:07:34 2004 From: aqureshi at cs.odu.edu (Affan Qureshi) Date: Sat Dec 4 15:17:08 2004 Subject: [Bioperl-l] New item for Briefings in bioinformatics? In-Reply-To: <339D68B133EAD311971E009027DC47970218EE4E@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC47970218EE4E@montecarlo.cgr.harvard.edu> Message-ID: <32791.68.10.90.185.1102190854.squirrel@cartero.cs.odu.edu> Thanks for your interest. If you search the archives of this list many people have written this type of papers/articles for BioPerl. You might find those helpful. > Hi, > > > > I find the progress of the BioPerl very exciting, and would like to > include a news item in the next column I am editing for Briefings in > Bioinformatics. Would anyone who is familiar with the recent progress in > Bioperl be willing to write a short (500-2000) words news item? While > this is not a > peer-reviewed publication, the abstract of the new items go into > pub-med. 
> > > > Eitan Rubin > > > > -------------------- > > Eitan Rubin, PhD > > News Column Editor, Briefings in Bioinformatics > > > > Head of Bioinformatics > > The Bauer Center for Genomics Research > > Harvard University > > Tel: 617-496-5649 Fax: 617-495-2196 > > From amackey at pcbi.upenn.edu Sat Dec 4 18:17:50 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Sat Dec 4 18:16:07 2004 Subject: [Bioperl-l] SWAT Implementation In-Reply-To: References: Message-ID: <41B2459E.7050201@pcbi.upenn.edu> The FASTA search suite includes the "reference" implementation of Smith-Waterman, called "ssearch"; ssearch in fact implements the SWAT optimization, while the program "osearch" implements the "traditional" S-W algorithm. You can find the code for each in dropnsw.c (normal smith waterman) and dropgsw.c (green smith waterman) from the FASTA pacakge (ftp://ftp.virginia.edu/pub/fasta/). -Aaron denise dolan wrote: > Hi. I am working on a program for my thesis. It uses the smith-waterman > algorithm and i want to use the SWAT implementation to speed it up. I > can't seem to find an accurate description or code for it and was > wondering if anyone could help. Would be really grateful of any info as > i am totally stuck. Thanks > > ------------------------------------------------------------------------ > Express yourself instantly with MSN Messenger! MSN Messenger > Download today it's FREE! > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From amackey at pcbi.upenn.edu Sat Dec 4 18:18:57 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Sat Dec 4 18:16:11 2004 Subject: [Bioperl-l] New item for Briefings in bioinformatics? 
In-Reply-To: <339D68B133EAD311971E009027DC47970218EE4E@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC47970218EE4E@montecarlo.cgr.harvard.edu> Message-ID: <41B245E1.8000402@pcbi.upenn.edu> I'm happy to write this up as a companion to the upcoming 1.5 release ... but are there any other takers? Perhaps a GMOD companion piece could be arranged? -Aaron Eitan Rubin wrote: > Hi, > > > > I find the progress of the BioPerl very exciting, and would like to include > a news item in the next column I am editing for Briefings in Bioinformatics. > Would anyone who is familiar with the recent progress in Bioperl be willing > to write a short (500-2000) words news item? While this is not a > peer-reviewed publication, the abstract of the new items go into pub-med. > > > > Eitan Rubin > > > > -------------------- > > Eitan Rubin, PhD > > News Column Editor, Briefings in Bioinformatics > > > > Head of Bioinformatics > > The Bauer Center for Genomics Research > > Harvard University > > Tel: 617-496-5649 Fax: 617-495-2196 > > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Sat Dec 4 18:51:00 2004 From: skirov at utk.edu (Stefan Kirov) Date: Sat Dec 4 18:49:08 2004 Subject: [Bioperl-l] New item for Briefings in bioinformatics? Message-ID: <41B24D64.7070501@utk.edu> Aaron is probably the most obvious person, maybe Brian Osborn, Jason Stajich and few others as well (just looking at the code committed by each of them in the past month). The other very interesting project is BioSQL (more or less overlapping with BioPerl). Perhaps you might be interested in that too or Aaron and I guess Hilmar Lapp could join efforts... Just a humble suggestion. 
Stefan From tc.jones at jones.tc Sun Dec 5 13:30:36 2004 From: tc.jones at jones.tc (Terry Jones) Date: Sun Dec 5 13:28:09 2004 Subject: [Bioperl-l] Simple question on reading a .msf file Message-ID: <16819.21452.472745.136814@terry.jones.tc> Hi. I'm new to bioperl (though not to perl itself). I have what I think should be a simple task, and yet after reading the FAQ and various POD and web pages, I can't make it work. I'm probably doing something fundamentally wrong. I've been sent a file with a .msf suffix. The start of the file is at the bottom of this email. A bit of googling leads me to think I have a multiple sequence alignment file of type msf, as produced by a tool like clustalw. I want to extract the sequences, do various things to them, write them out as fasta, etc.
After some reading and experimenting, it seems that to read the file I should do my $in = Bio::AlignIO->new(-file => $file, -format => 'msf'); This works, but it seems there is a problem because a subsequent call to $in->next_aln() results in an exception: -------------------- WARNING --------------------- MSG: seq doesn't validate, mismatch is 1 --------------------------------------------------- ------------- EXCEPTION ------------- MSG: Attempting to set the sequence to [TTCC (some lines of DNA bases deleted here) ACGGTTGG~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] which does not look healthy STACK Bio::PrimarySeq::seq /Library/Perl/5.8.1/Bio/PrimarySeq.pm:264 STACK Bio::PrimarySeq::new /Library/Perl/5.8.1/Bio/PrimarySeq.pm:214 STACK Bio::LocatableSeq::new /Library/Perl/5.8.1/Bio/LocatableSeq.pm:100 STACK Bio::AlignIO::msf::next_aln /Library/Perl/5.8.1/Bio/AlignIO/msf.pm:131 STACK toplevel ./simple.pl:9 So it seems to me that I'm going about this the wrong way. Other .msf files I've seen, from other sources, also contain ~ characters, and I can't read these as above either. Is there something obviously wrong in what I'm doing? One thing that makes me feel I'm partly on the right track is that if I simply remove all the ~ characters from the .msf file, then next_aln works and I can call $s = $aln->each_seq() on that and then $s->seq() on that, and get out sequences. But this is surely wrong, as deleting all the ~ characters changes the sequences. I've tried $in->gap_char('~'); but this fails, telling me Can't locate object method "gap_char" via package "Bio::AlignIO::msf" at ./simple.pl line 11. gap_char is defined in Bio::SimpleAlign, but when I try using Bio::SimpleAlign instead of Bio::AlignIO, the method next_aln() is no longer available. So that's not right either. Can someone tell me what I'm doing wrong here? Regards, Terry. 
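
Deleting the '~' characters shifts every later column, which is why that result looks wrong. If the tildes are only padding for unaligned ends (an assumption about these PileUp files, not something confirmed in this thread), replacing each '~' with a gap character of the same width keeps the columns in register. A plain-Perl sketch of that substitution on a single sequence string follows; `normalize_gaps` is a hypothetical name, and this is a workaround on raw strings, not a Bio::AlignIO call.

```perl
use strict;
use warnings;

# Hypothetical helper: swap '~' and '.' padding for '-' one-for-one so the
# string keeps its length and every residue keeps its column.
sub normalize_gaps {
    my ($aligned) = @_;
    (my $fixed = $aligned) =~ tr/~./-/;
    return $fixed;
}

my $row   = '~~~TTCC..ACGG~~~~';
my $fixed = normalize_gaps($row);
print "$fixed\n";
printf "length kept: %s\n", length($row) == length($fixed) ? 'yes' : 'no';
```

This should only be applied to the sequence rows; run over a whole MSF file it would also rewrite the dots in the header lines.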
-------------------------------------------------------------------- The start of my .msf file: !!NA_MULTIPLE_ALIGNMENT 1.0 PileUp of: *.ha1 Symbol comparison table: GenRunData:pileupdna.cmp CompCheck: 3341 GapWeight: 5 GapLengthWeight: 1 file.msf MSF: 1106 Type: N November 12, 2003 11:38 Check: 9322 .. Name: jjtll Len: 1032 Check: 2l50 Weight: 1.00 Name: ltkaa Len: 1032 Check: 1129 Weight: 1.00 From brian_osborne at cognia.com Sun Dec 5 14:12:52 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Sun Dec 5 14:11:09 2004 Subject: [Bioperl-l] Simple question on reading a .msf file In-Reply-To: <16819.21452.472745.136814@terry.jones.tc> Message-ID: Terry, Try map_chars() rather than gap_chars, see if that works. It's described in the Bio::AlignIO::msf POD. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Terry Jones Sent: Sunday, December 05, 2004 1:31 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Simple question on reading a .msf file Hi. I'm new to bioperl (though not to perl itself). I have what I think should be a simple task, and yet after reading the FAQ and various POD and web pages, I can't make it work. I'm probably doing something fundamentally wrong. I've been sent a file with a .msf suffix. The start of the file is at the bottom of this email. A bit of googling leads me to think I have a multiple sequence alignment file of type msf, as produced by a tool like clustalw. I want to extract the sequences, do various things to them, write them out as fasta, etc. 
After some reading and experimenting, it seems that to read the file I should do my $in = Bio::AlignIO->new(-file => $file, -format => 'msf'); This works, but it seems there is a problem because a subsequent call to $in->next_aln() results in an exception: -------------------- WARNING --------------------- MSG: seq doesn't validate, mismatch is 1 --------------------------------------------------- ------------- EXCEPTION ------------- MSG: Attempting to set the sequence to [TTCC (some lines of DNA bases deleted here) ACGGTTGG~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] which does not look healthy STACK Bio::PrimarySeq::seq /Library/Perl/5.8.1/Bio/PrimarySeq.pm:264 STACK Bio::PrimarySeq::new /Library/Perl/5.8.1/Bio/PrimarySeq.pm:214 STACK Bio::LocatableSeq::new /Library/Perl/5.8.1/Bio/LocatableSeq.pm:100 STACK Bio::AlignIO::msf::next_aln /Library/Perl/5.8.1/Bio/AlignIO/msf.pm:131 STACK toplevel ./simple.pl:9 So it seems to me that I'm going about this the wrong way. Other .msf files I've seen, from other sources, also contain ~ characters, and I can't read these as above either. Is there something obviously wrong in what I'm doing? One thing that makes me feel I'm partly on the right track is that if I simply remove all the ~ characters from the .msf file, then next_aln works and I can call $s = $aln->each_seq() on that and then $s->seq() on that, and get out sequences. But this is surely wrong, as deleting all the ~ characters changes the sequences. I've tried $in->gap_char('~'); but this fails, telling me Can't locate object method "gap_char" via package "Bio::AlignIO::msf" at ./simple.pl line 11. gap_char is defined in Bio::SimpleAlign, but when I try using Bio::SimpleAlign instead of Bio::AlignIO, the method next_aln() is no longer available. So that's not right either. Can someone tell me what I'm doing wrong here? Regards, Terry. 
-------------------------------------------------------------------- The start of my .msf file: !!NA_MULTIPLE_ALIGNMENT 1.0 PileUp of: *.ha1 Symbol comparison table: GenRunData:pileupdna.cmp CompCheck: 3341 GapWeight: 5 GapLengthWeight: 1 file.msf MSF: 1106 Type: N November 12, 2003 11:38 Check: 9322 .. Name: jjtll Len: 1032 Check: 2l50 Weight: 1.00 Name: ltkaa Len: 1032 Check: 1129 Weight: 1.00 _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From rkostadi at gmail.com Sun Dec 5 15:14:15 2004 From: rkostadi at gmail.com (Rumen Kostadinov) Date: Sun Dec 5 15:12:02 2004 Subject: [Bioperl-l] How to retrieve/parse RefSeq contig entries with Bio::DB::Query::GenBank? Message-ID: Hi, Is there a way to retrieve and parse RefSeq contig entries with bioperl using the Bio::DB::Query::GenBank? e.g. NT_079581 I get weird parsing when doing: my $query_string = param('query'); my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide', -query=>$query_string); my $count = $query->count; my @ids = $query->ids; # get a genbank database handle my $gb = new Bio::DB::GenBank; my $stream = $gb->get_Stream_by_query($query); while (my $seq = $stream->next_seq) { print ""; print ''; print $seq->accession_number(), ' '; print "", $seq->desc(), br; } Sincerely, Rumen Kostadinov From jason.stajich at duke.edu Sun Dec 5 15:25:21 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Dec 5 15:22:47 2004 Subject: [Bioperl-l] How to retrieve/parse RefSeq contig entries with Bio::DB::Query::GenBank? In-Reply-To: References: Message-ID: I think we deliberately bail on these - need someone to figure out how to parse them correctly. -jason On Dec 5, 2004, at 3:14 PM, Rumen Kostadinov wrote: > Hi, > > > Is there a way to retrieve and parse RefSeq contig entries with bioperl > using the Bio::DB::Query::GenBank? > > e.g. 
> NT_079581 > > I get weird parsing when doing: > > my $query_string = param('query'); > my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide', > -query=>$query_string); > my $count = $query->count; > my @ids = $query->ids; > > # get a genbank database handle > my $gb = new Bio::DB::GenBank; > > my $stream = $gb->get_Stream_by_query($query); > while (my $seq = $stream->next_seq) { > print ""; > print ''; > print $seq->accession_number(), ' '; > print "", $seq->desc(), br; > } > > Sincerely, > Rumen Kostadinov > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From tc.jones at jones.tc Sun Dec 5 15:45:29 2004 From: tc.jones at jones.tc (Terry Jones) Date: Sun Dec 5 15:42:55 2004 Subject: [Bioperl-l] Simple question on reading a .msf file In-Reply-To: Your message at 14:12:52 on Sunday, 5 December 2004 References: <16819.21452.472745.136814@terry.jones.tc> Message-ID: <16819.29545.176118.907324@terry.jones.tc> >>>>> "Brian" == Brian Osborne writes: Brian> Try map_chars() rather than gap_chars, see if that works. It's Brian> described in the Bio::AlignIO::msf POD. It's not in my perldoc Bio::AlignIO::msf. It is in Bio::SimpleAlign though. But I tried this originally, and I can call map_chars but then the code breaks because Bio::SimpleAlign has no next_aln(). I got another suggestion, from Stefan Kirov, to try the same thing (map_chars). He also suggested replacing '~' with '-' outside bioperl, and this does work. I'd like to avoid it though, because I'd rather not touch the .msf files outside of bioperl. Thanks a lot, Terry From jason.stajich at duke.edu Sun Dec 5 15:49:28 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Dec 5 15:46:55 2004 Subject: [Bioperl-l] How to retrieve/parse RefSeq contig entries with Bio::DB::Query::GenBank? 
In-Reply-To: References: Message-ID: <2835EC6A-46FF-11D9-89DB-000393C44276@duke.edu> On Dec 5, 2004, at 3:30 PM, Rumen Kostadinov wrote: > well I use the following: > > 1.open ncbi > 2.type the accession NT_079581 > 3.click on "Click here to see the sequence of this contig record." - a > recent feature they added that displays the whole sequence with > features, etc. > 4.copy/paste it into my web program > and use > my $stringfh = new IO::String($seqstr); > my $stream = new Bio::SeqIO(-fh => $stringfh, > -format => 'GenBank'); > while (my $seq = $stream->next_seq) { > bla; > } > to proceed. > it works perfectly, but I wanted to use my easier method of > retrieving by just pasteing the accession number > and letting Bio::DB::Query::GenBank do the job. > Yep would be nice to have - but we're limited by the interfaces made available through the ncbi tools scripts. The full genbank record is not what the web query returns but the contig format where there are cross references to the accession numbers of the assembled pieces that make up the contig. So someone has to figure out how to make it work, probably by parsing the contig format and then doing additional subqueries. > > Thanks for your response! > Rumen Kostadinov > > > > On Sun, 5 Dec 2004 15:25:21 -0500, Jason Stajich > wrote: >> I think we deliberately bail on these - need someone to figure out how >> to parse them correctly. >> >> -jason >> >> >> On Dec 5, 2004, at 3:14 PM, Rumen Kostadinov wrote: >> >>> Hi, >>> >>> >>> Is there a way to retrieve and parse RefSeq contig entries with >>> bioperl >>> using the Bio::DB::Query::GenBank? >>> >>> e.g. 
>>> NT_079581 >>> >>> I get weird parsing when doing: >>> >>> my $query_string = param('query'); >>> my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide', >>> -query=>$query_string); >>> my $count = $query->count; >>> my @ids = $query->ids; >>> >>> # get a genbank database handle >>> my $gb = new Bio::DB::GenBank; >>> >>> my $stream = $gb->get_Stream_by_query($query); >>> while (my $seq = $stream->next_seq) { >>> print ""; >>> print ''; >>> print $seq->accession_number(), ' '; >>> print "", $seq->desc(), br; >>> } >>> >>> Sincerely, >>> Rumen Kostadinov >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From s_waechter at gmx.net Sun Dec 5 16:39:33 2004 From: s_waechter at gmx.net (=?ISO-8859-1?Q?Stefan_W=E4chter?=) Date: Sun Dec 5 16:37:05 2004 Subject: [Bioperl-l] Remote Blast error - 500 Too many open files In-Reply-To: <34348.68.10.90.185.1100021113.squirrel@cartero.cs.odu.edu> References: <7D136634-3189-11D9-B9AF-000393BC20D0@research.dfci.harvard.edu> <200411081527.iA8FRxKr031547@portal.open-bio.org> <34348.68.10.90.185.1100021113.squirrel@cartero.cs.odu.edu> Message-ID: <41B38015.7050105@gmx.net> Hi all, I get the same error like Affan Qureshi a couple of weeks ago. I wrote a script, that send a couple of hundreds sequences in portions of one hundred to NCBI Blast Server using the RemoteBlast module. After a while I the script breaks with an error like that below. For me it seems that the remote blast module stores the HTML output from NCBI in this cryptic filenames in the tmp directory. First the page with the RID is stored followed by the complete result page from NCBI . All these tmp-files are deleted, when the script is finished. 
The problem seems to be that these files accumulate while the script is running. I assume that there is a limit on how many files can be stored in a Unix/Linux directory. Another question: why are these files still open? So far I haven't found a solution to this problem. Does anyone know how to deal with it? Is there a way to tell the RemoteBlast module to delete these tmp files as soon as they are no longer required? Thanks for your help Stefan Affan Qureshi wrote: >I tried a remote blastx search today around 1:00pm and got this error >message after a long wait. Also it seemed that the NCBI web interface was >taking too long for BLAST searches. > >MSG: >An Error Occurred > >


>500 Cannot write to '/tmp/hLFqVvHO2D': Too many open files > > > >Is this a remote server error or am I doing something wrong? Anyone else >got this error? > >Thanks, >Affan > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From brian_osborne at cognia.com Sun Dec 5 16:55:26 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Sun Dec 5 16:53:30 2004 Subject: [Bioperl-l] Simple question on reading a .msf file In-Reply-To: <16819.29545.176118.907324@terry.jones.tc> Message-ID: Terry, I'm puzzled about something. I'm looking at msf.pm and it looks like it _should_ work, even with the tilde's in there. Can you send me the file? BIO -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Terry Jones Sent: Sunday, December 05, 2004 3:45 PM To: Brian Osborne Cc: bioperl-l@bioperl.org Subject: RE: [Bioperl-l] Simple question on reading a .msf file >>>>> "Brian" == Brian Osborne writes: Brian> Try map_chars() rather than gap_chars, see if that works. It's Brian> described in the Bio::AlignIO::msf POD. It's not in my perldoc Bio::AlignIO::msf. It is in Bio::SimpleAlign though. But I tried this originally, and I can call map_chars but then the code breaks because Bio::SimpleAlign has no next_aln(). I got another suggestion, from Stefan Kirov, to try the same thing (map_chars). He also suggested replacing '~' with '-' outside bioperl, and this does work. I'd like to avoid it though, because I'd rather not touch the .msf files outside of bioperl. 
Thanks a lot, Terry _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From tc.jones at jones.tc Sun Dec 5 19:15:13 2004 From: tc.jones at jones.tc (Terry Jones) Date: Sun Dec 5 19:13:31 2004 Subject: [Bioperl-l] Simple question on reading a .msf file In-Reply-To: Your message at 16:55:26 on Sunday, 5 December 2004 References: <16819.29545.176118.907324@terry.jones.tc> Message-ID: <16819.42129.74444.443521@terry.jones.tc> This is now solved, thanks to a few mails from Brian Osborne. It was due to an old version of bioperl (with an old version (1.16) of Bio/AlignIO/msf.pm). I don't know how I ended up with that, having installed only last week. I guess a mirror is out of date, or similar. Terry From allenday at ucla.edu Mon Dec 6 18:46:37 2004 From: allenday at ucla.edu (Allen Day) Date: Mon Dec 6 17:44:19 2004 Subject: [Bioperl-l] New item for Briefings in bioinformatics? In-Reply-To: <41B245E1.8000402@pcbi.upenn.edu> References: <339D68B133EAD311971E009027DC47970218EE4E@montecarlo.cgr.harvard.edu> <41B245E1.8000402@pcbi.upenn.edu> Message-ID: i second the movement for a gmod companion piece. when is the article deadline? -allen On Sat, 4 Dec 2004, Aaron J. Mackey wrote: > I'm happy to write this up as a companion to the upcoming 1.5 release > ... but are there any other takers? Perhaps a GMOD companion piece > could be arranged? > > -Aaron > > Eitan Rubin wrote: > > > Hi, > > > > > > > > I find the progress of the BioPerl very exciting, and would like to include > > a news item in the next column I am editing for Briefings in Bioinformatics. > > Would anyone who is familiar with the recent progress in Bioperl be willing > > to write a short (500-2000) words news item? While this is not a > > peer-reviewed publication, the abstract of the new items go into pub-med. 
> > > > > > > > Eitan Rubin > > > > > > > > -------------------- > > > > Eitan Rubin, PhD > > > > News Column Editor, Briefings in Bioinformatics > > > > > > > > Head of Bioinformatics > > > > The Bauer Center for Genomics Research > > > > Harvard University > > > > Tel: 617-496-5649 Fax: 617-495-2196 > > > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From davalos at email.arizona.edu Mon Dec 6 13:32:13 2004 From: davalos at email.arizona.edu (Liliana Davalos) Date: Mon Dec 6 21:04:08 2004 Subject: [Bioperl-l] Bio::SearchIO::psiblast Message-ID: <261AE308-47B5-11D9-B5E1-00039357DFEA@email.arizona.edu> Hi, I am trying to parse the result of a psi-blast search using SearchIO. I get thrown with the message at the bottom, although I am using the textbook example: use Bio::SearchIO; my $in = Bio::SearchIO->new( -format => 'psiblast', -file => 'report.blastp' ); while ( my $blast = $in->next_result() ) { foreach my $hit ( $blast->hits ) { print "Hit: $hit\n"; } } Error message: Can't locate object method "result_factory" via package "Bio::SearchIO::psiblast" at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/psiblast.pm line 668, line 16. Does anyone have working examples? 
Thanks, Liliana From jason.stajich at duke.edu Mon Dec 6 21:11:54 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Dec 6 21:09:21 2004 Subject: [Bioperl-l] Bio::SearchIO::psiblast In-Reply-To: <261AE308-47B5-11D9-B5E1-00039357DFEA@email.arizona.edu> References: <261AE308-47B5-11D9-B5E1-00039357DFEA@email.arizona.edu> Message-ID: <5DA47BA0-47F5-11D9-9B7E-000393C44276@duke.edu> I think psiblast is deprecated all the functionality to parse PSIBLAST reports are in Bio::SearchIO::blast. -jason On Dec 6, 2004, at 1:32 PM, Liliana Davalos wrote: > Hi, > > I am trying to parse the result of a psi-blast search using SearchIO. > I get thrown with the message at the bottom, although I am using the > textbook example: > > use Bio::SearchIO; > > my $in = Bio::SearchIO->new( -format => 'psiblast', > -file => 'report.blastp' ); > > while ( my $blast = $in->next_result() ) { > foreach my $hit ( $blast->hits ) { > print "Hit: $hit\n"; > } > } > > Error message: > > Can't locate object method "result_factory" via package > "Bio::SearchIO::psiblast" at > /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/psiblast.pm line > 668, line 16. > > Does anyone have working examples? > > Thanks, Liliana > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jaredfox at ucla.edu Mon Dec 6 22:59:41 2004 From: jaredfox at ucla.edu (Jared Fox) Date: Tue Dec 7 07:44:02 2004 Subject: [Bioperl-l] Fw: bad entries in interpro again (fwd) Message-ID: <00f201c4dc11$2fd1a640$6a65a8c0@VOTE4RENEW> The problem with Interpro XML is that there are entries like: or The double quotes are supposed to mark the beginning and end of the name attribute, but the xml is not valid so it has double quotes inside the attribute itself. I believe this also happens with other illegal xml characters. 
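To illustrate the point about quoting, here is a minimal sketch of how a raw name could be escaped before being written into a double-quoted XML attribute. The helper `escape_xml_attr`, the element name `match`, and the sample name are assumptions made for illustration; only `dbname="SUPERFAMILY"` comes from the message. This is not InterPro's or BioPerl's code.

```perl
use strict;
use warnings;

# Escape the five characters XML reserves so that a raw string can be
# placed safely inside a quoted attribute value. A single pass over the
# string means an '&' produced by one replacement is never escaped twice.
sub escape_xml_attr {
    my ($raw) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&apos;',
    );
    (my $escaped = $raw) =~ s/([&<>"'])/$entity{$1}/g;
    return $escaped;
}

# Hypothetical entry name containing double quotes.
my $name = 'Winged helix "DNA-binding" domain';
printf qq{<match name="%s" dbname="SUPERFAMILY">\n}, escape_xml_attr($name);
```

With the inner quotes written as `&quot;`, the attribute has exactly one opening and one closing delimiter, which is what an expat-based parser expects.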
If Interpro were to start producing valid XML, everything should work happily. > ---------- Forwarded message ---------- > Date: Wed, 01 Dec 2004 16:16:46 +0000 > From: Mikko Arvas > To: bioperl-l@portal.open-bio.org, Hilmar Lapp , > Allen Day > Subject: bad entries in interpro again > > Hi, > > we've been discussing the problems of interpro parsing. I have a friend > who > is going to interpro consortium meeting next week and I could send some > regards through him. After reading your e-mails, I am (being quite a > newbie) a little bit confused of what kind of regards would you like to > send if any? > > Is the &apos the source of the problem? Is it really a problem in BioPerl > or in expat? Is somebody trying to solve the problem for Bioperl now > and is there any sensible thing that the interpro team could do to help? > > Cheers, > mikko > > Mikko Arvas > VTT Biotechnology > > e-mail: mikko.arvas@vtt.fi > tel: +358-(0)9-456 5827 > mobile: +358-(0)44-381 0502 > fax: +358-(0)9-455 2103 > mail: Tietotie 2, Espoo > P.O. Box 1500 > FIN-02044 VTT, Finland > From heikki at ebi.ac.uk Tue Dec 7 08:04:09 2004 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue Dec 7 08:14:29 2004 Subject: [Bioperl-l] Fw: bad entries in interpro again (fwd) In-Reply-To: <00f201c4dc11$2fd1a640$6a65a8c0@VOTE4RENEW> References: <00f201c4dc11$2fd1a640$6a65a8c0@VOTE4RENEW> Message-ID: <200412071304.10125.heikki@ebi.ac.uk> This came up last week and it turned out that the new release made available on Monday already had all double quotes escaped. Download the new interpro. -Heikki On Tuesday 07 December 2004 03:59, Jared Fox wrote: > The problem with Interpro XML is that there are entries like: > > dbname="SUPERFAMILY"> > > or > > dbname="SUPERFAMILY"> > > The double quotes are supposed to mark the beginning and end of the name > attribute, but the xml is not valid so it has double quotes inside the > attribute itself. I believe this also happens with other illegal xml > characters. 
> > If Interpro were to start producing valid XML, everything should work > happily. > > > ---------- Forwarded message ---------- > > Date: Wed, 01 Dec 2004 16:16:46 +0000 > > From: Mikko Arvas > > To: bioperl-l@portal.open-bio.org, Hilmar Lapp , > > Allen Day > > Subject: bad entries in interpro again > > > > Hi, > > > > we've been discussing the problems of interpro parsing. I have a friend > > who > > is going to interpro consortium meeting next week and I could send some > > regards through him. After reading your e-mails, I am (being quite a > > newbie) a little bit confused of what kind of regards would you like to > > send if any? > > > > Is the &apos the source of the problem? Is it really a problem in BioPerl > > or in expat? Is somebody trying to solve the problem for Bioperl now > > and is there any sensible thing that the interpro team could do to help? > > > > Cheers, > > mikko > > > > Mikko Arvas > > VTT Biotechnology > > > > e-mail: mikko.arvas@vtt.fi > > tel: +358-(0)9-456 5827 > > mobile: +358-(0)44-381 0502 > > fax: +358-(0)9-455 2103 > > mail: Tietotie 2, Espoo > > P.O. 
Box 1500 > > FIN-02044 VTT, Finland > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason.stajich at duke.edu Tue Dec 7 20:58:14 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Dec 7 20:58:01 2004 Subject: [Bioperl-l] Fwd: [Root-l] install error of bioperl Message-ID: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu> Begin forwarded message: > From: "Pan DU" > Date: December 7, 2004 3:43:17 PM EST > To: > Subject: [Root-l] install error of bioperl > > Hi, > > When I tried to install "Bioperl-1.4 [1.4] Bioperl 1.4 PPM3 > Archive", > it reported error as shown below. > > > > Error: Failed to download URL http://bioperl.org/DIST/GD.ppd: 404 Not > Found > > > > Could you fix this problem or tell me how to deal with that? Thanks. > > > > Pan > > > > > > ============================= > Pan DU > Bioinformatics & Computational Biology > Dept. Electrical & Computer Engineering > 2624 Howe Hall, > Iowa State University, > Ames, IA 50011, USA > Email: dupan@iastate.edu > Tel: 515-294-4935 (O) > ============================= > > > > _______________________________________________ > Root-l mailing list > Root-l@open-bio.org > http://open-bio.org/mailman/listinfo/root-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/
From cain at cshl.org Tue Dec 7 22:42:04 2004 From: cain at cshl.org (Scott Cain) Date: Tue Dec 7 22:39:14 2004 Subject: [Bioperl-l] New item for Briefings in bioinformatics? In-Reply-To: <200412080205.iB824fKs026125@portal.open-bio.org> References: <200412080205.iB824fKs026125@portal.open-bio.org> Message-ID: <1102477324.3272.4.camel@localhost.localdomain> Hi Aaron, I'll chip in something for GMOD--how soon do we need it, and what would you like from me? Scott PS: Sorry for the delay--I read the bioperl list as a digest, so there is frequently a few day lag. > Message: 2 > Date: Sat, 04 Dec 2004 18:18:57 -0500 > From: "Aaron J. Mackey" > Subject: Re: [Bioperl-l] New item for Briefings in bioinformatics? > To: Eitan Rubin > Cc: "'bioperl-l@bioperl.org'" > Message-ID: <41B245E1.8000402@pcbi.upenn.edu> > Content-Type: text/plain; charset=us-ascii; format=flowed > > I'm happy to write this up as a companion to the upcoming 1.5 release > ... but are there any other takers? Perhaps a GMOD companion piece > could be arranged? > > -Aaron > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From allenday at ucla.edu Wed Dec 8 03:21:18 2004 From: allenday at ucla.edu (Allen Day) Date: Wed Dec 8 02:19:56 2004 Subject: [Bioperl-l] Fw: bad entries in interpro again (fwd) In-Reply-To: <200412071304.10125.heikki@ebi.ac.uk> References: <00f201c4dc11$2fd1a640$6a65a8c0@VOTE4RENEW> <200412071304.10125.heikki@ebi.ac.uk> Message-ID: what release are you referring to, with URL please?
i am looking here, and it is the same old iprscan with the bugs: ftp://ftp.ebi.ac.uk/pub/databases/interpro/iprscan/RELEASE/3.3/ -allen On Tue, 7 Dec 2004, Heikki Lehvaslaiho wrote: > This came up last week and it turned out that the new release made available > on Monday already had all double quotes escaped. > > Download the new interpro. > > -Heikki > > > On Tuesday 07 December 2004 03:59, Jared Fox wrote: > > The problem with Interpro XML is that there are entries like: > > > > > dbname="SUPERFAMILY"> > > > > or > > > > > dbname="SUPERFAMILY"> > > > > The double quotes are supposed to mark the beginning and end of the name > > attribute, but the xml is not valid so it has double quotes inside the > > attribute itself. I believe this also happens with other illegal xml > > characters. > > > > If Interpro were to start producing valid XML, everything should work > > happily. > > > > > ---------- Forwarded message ---------- > > > Date: Wed, 01 Dec 2004 16:16:46 +0000 > > > From: Mikko Arvas > > > To: bioperl-l@portal.open-bio.org, Hilmar Lapp , > > > Allen Day > > > Subject: bad entries in interpro again > > > > > > Hi, > > > > > > we've been discussing the problems of interpro parsing. I have a friend > > > who > > > is going to interpro consortium meeting next week and I could send some > > > regards through him. After reading your e-mails, I am (being quite a > > > newbie) a little bit confused of what kind of regards would you like to > > > send if any? > > > > > > Is the &apos the source of the problem? Is it really a problem in BioPerl > > > or in expat? Is somebody trying to solve the problem for Bioperl now > > > and is there any sensible thing that the interpro team could do to help? > > > > > > Cheers, > > > mikko > > > > > > Mikko Arvas > > > VTT Biotechnology > > > > > > e-mail: mikko.arvas@vtt.fi > > > tel: +358-(0)9-456 5827 > > > mobile: +358-(0)44-381 0502 > > > fax: +358-(0)9-455 2103 > > > mail: Tietotie 2, Espoo > > > P.O. 
Box 1500 > > > FIN-02044 VTT, Finland > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From nathanhaigh at ukonline.co.uk Wed Dec 8 04:43:56 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Dec 8 04:41:30 2004 Subject: [Bioperl-l] Fwd: [Root-l] install error of bioperl In-Reply-To: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu> Message-ID: The reason for this is that GD.ppd doesn't exist in any of the repositories being searched by ppm. This will be because you either a) do not have the repositories in the active search list, or more likely b) you do not have repositories that contain this module. If I remember rightly, there are several modules (including GD) that BioPerl requires which are not in the default ppm repositories (or the BioPerl repository). I assume that you have already added the BioPerl repository by following the instructions in the INSTALL.WIN file as you are able to find the BioPerl module via ppm. You will also need to add the following repositories by typing the following commands at the ppm prompt PPM> repository add theoryx http://theoryx5.uwinnipeg.ca/ppms PPM> repository add bribes http://www.Bribes.org/perl/ppm Now when you try to install BioPerl again, the GD.ppd file should be found as it is present in both the above repositories (the latest version should be installed automatically - 2.19 in the theoryx repository at this time). NOTE: These repositories have been added to the CVS version of INSTALL.WIN under section 1.3.2 ActiveState PPM3. These repositories also work for people using PPM2, add these repositories for PPM2 by typing the following commands at the PPM2 prompt (I have just tested this and it appears to work): PPM> set repository theoryx http://theoryx5.uwinnipeg.ca/ppms/ PPM> set repository bribes http://www.Bribes.org/perl/ppm/ Hope this helps Nathan
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich Sent: 08 December 2004 01:58 To: Bioperl List Cc: dupan@iastate.edu Subject: [Bioperl-l] Fwd: [Root-l] install error of bioperl Begin forwarded message: From: "Pan DU" Date: December 7, 2004 3:43:17 PM EST To: Subject: [Root-l] install error of bioperl Hi, When I tried to install "Bioperl-1.4 [1.4] Bioperl 1.4 PPM3 Archive", it reported error as shown below. Error: Failed to download URL http://bioperl.org/DIST/GD.ppd: 404 Not Found Could you fix this problem or tell me how to deal with that? Thanks. Pan ============================= Pan DU Bioinformatics & Computational Biology Dept. Electrical & Computer Engineering 2624 Howe Hall, Iowa State University, Ames, IA 50011, USA Email: dupan@iastate.edu Tel: 515-294-4935 (O) ============================= _______________________________________________ Root-l mailing list Root-l@open-bio.org http://open-bio.org/mailman/listinfo/root-l -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From Mikko.Arvas at vtt.fi Wed Dec 8 07:01:40 2004 From: Mikko.Arvas at vtt.fi (Mikko Arvas) Date: Wed Dec 8 06:58:59 2004 Subject: [Bioperl-l] (no subject) Message-ID: <4.3.2.7.2.20041208114739.00cb7600@vttmail.vtt.fi> Hi, thank you so much for everybody for your help! But still no progress.
I have Suse8.1, bioperl 1.4, XML::Parser.pm is 2.34 and the latest
match.xml from:
ftp://ftp.ebi.ac.uk/pub/databases/interpro
match.xml.gz 2004-11-29

Like Dave suggested, just parsing with XML::Parser works fine with:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
my $pl = new XML::Parser;
$pl->parsefile('match.xml');

But if I do this:

my $infeat = Bio::SeqIO->new('-file'   => "<$opt_i",
                             '-format' => 'interpro' );
while (my $feat = $infeat->next_seq) {
    print $feat->accession_number()."\n";
}

I still get:

not well-formed (invalid token) at line 2, column 53, byte 131 at
/usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
line 187

from protein id o00408. And I can still remove this problem by taking
the 2nd & out from line

I can see no difference in the quoting of this entry and the new and
old version of match.xml. There are about 2286 lines in match.xml with
two & and if I simply:

tr "&" "_" < match.xml > match_user_friendly.xml

I can parse match_user_friendly.xml until the script above happily
fills all the available memory and crashes (but that is another story).
So is this my system only or does somebody else have the same problem
too? If it is, I'll just be lazy and use tr, enough time spent already.

Cheers,
mikko

PS. Here is still the whole entry just in case:

Mikko Arvas
VTT Biotechnology
e-mail: mikko.arvas@vtt.fi
tel: +358-(0)9-456 5827
mobile: +358-(0)44-381 0502
fax: +358-(0)9-455 2103
mail: Tietotie 2, Espoo
P.O. Box 1500
FIN-02044 VTT, Finland

From dhoworth at mrc-lmb.cam.ac.uk Wed Dec 8 07:32:17 2004
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed Dec 8 07:29:42 2004
Subject: [Bioperl-l] Re: bad interpro entries
In-Reply-To: <4.3.2.7.2.20041208114739.00cb7600@vttmail.vtt.fi>
References: <4.3.2.7.2.20041208114739.00cb7600@vttmail.vtt.fi>
Message-ID: <41B6F451.7020309@mrc-lmb.cam.ac.uk>

Mikko Arvas wrote:
> thank you so much for everybody for your help! But still no progress.
> I have Suse8.1, bioperl 1.4., XML::Parser.pm is 2.34 and latest
> match.xml from:
> ftp://ftp.ebi.ac.uk/pub/databases/interpro
> match.xml.gz 2004-11-29
>
> Like Dave suggested just parsing with XML::Parser works fine with:
> But if do this:
> my $infeat = Bio::SeqIO->new('-file' => "<$opt_i",
> '-format' => 'interpro' );
> while (my $feat = $infeat->next_seq) {print
> $feat->accession_number()."\n";}
>
> I still get:
> not well-formed (invalid token) at line 2, column 53, byte 131 at
> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
> line 187
>
> from protein id o00408.

>PS. Here is still the whole entry just in case:

Well, I tested this entry with the validator and with that little test
script and it appears to be good data. How did you obtain it? Was it as
Hilmar suggested?:

> There is no other editing of the chunks going on though except for a
> haphazard substitution of certain double-quotes. In order to see the
> chunk before it gets sent to the parser instance edit
> Bio/SeqIO/interpro.pm and before the line
>
> $self->parse_xml($xml_fragment);
>
> put a print statement that prints out the content of $xml_fragment.
> That should also give the exact source XML that trips up the parser.

If you printed it another way, I'd suggest trying what Hilmar suggested
next. If you did print it that way, call in the wizards!

Cheers, Dave
--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960

From nathanhaigh at ukonline.co.uk Wed Dec 8 04:38:02 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Dec 8 07:44:52 2004
Subject: [Bioperl-l] Fwd: [Root-l] install error of bioperl
In-Reply-To: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu>
Message-ID:

The reason for this is that GD.ppd doesn't exist in any of the
repositories being searched by ppm.
This will be because you either a) do not have the repositories in the
active search list, or more likely b) you do not have repositories that
contain this module.

If I remember rightly, there are several modules (including GD) that
BioPerl requires which are not in the default ppm repositories (or the
BioPerl repository). I assume that you have already added the BioPerl
repository by following the instructions in the INSTALL.WIN file as you
are able to find the BioPerl module via ppm. You will also need to add
the following repositories by typing the following commands at the ppm
prompt

PPM> repository add theoryx http://theoryx5.uwinnipeg.ca/ppms
PPM> repository add bribes http://www.Bribes.org/perl/ppm

Now when you try to install BioPerl again, the GD.ppd file should be
found as it is present in both the above repositories (the latest
version should be installed automatically - 2.19 in the theoryx
repository at this time).

NOTE: These repositories have been added to the CVS version of
INSTALL.WIN under section 1.3.2 ActiveState PPM3. These repositories
also work for people using PPM2, add these repositories for PPM2 by
typing the following commands at the PPM2 prompt (I have just tested
this and it appears to work):

PPM> set repository theoryx http://theoryx5.uwinnipeg.ca/ppms/
PPM> set repository bribes http://www.Bribes.org/perl/ppm/

Hope this helps
Nathan

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
Sent: 08 December 2004 01:58
To: Bioperl List
Cc: dupan@iastate.edu
Subject: [Bioperl-l] Fwd: [Root-l] install error of bioperl

Begin forwarded message:

From: "Pan DU"
Date: December 7, 2004 3:43:17 PM EST
To:
Subject: [Root-l] install error of bioperl

Hi,

When I tried to install "Bioperl-1.4 [1.4] Bioperl 1.4 PPM3 Archive",
it reported error as shown below.

Error: Failed to download URL http://bioperl.org/DIST/GD.ppd: 404 Not Found

Could you fix this problem or tell me how to deal with that? Thanks.

Pan

=============================
Pan DU
Bioinformatics & Computational Biology
Dept. Electrical & Computer Engineering
2624 Howe Hall,
Iowa State University,
Ames, IA 50011, USA
Email: dupan@iastate.edu
Tel: 515-294-4935 (O)
=============================

_______________________________________________
Root-l mailing list
Root-l@open-bio.org
http://open-bio.org/mailman/listinfo/root-l

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From tuco at ebi.ac.uk Wed Dec 8 05:45:29 2004
From: tuco at ebi.ac.uk (Emmanuel Quevillon)
Date: Wed Dec 8 07:44:54 2004
Subject: [Bioperl-l] [Fwd: Re: bad entries in interpro again]
Message-ID: <41B6DB49.2040207@ebi.ac.uk>

-------------- next part --------------
An embedded message was scrubbed...
From: Emmanuel Quevillon
Subject: Re: bad entries in interpro again
Date: Wed, 08 Dec 2004 10:16:06 +0000
Size: 5213
Url: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041208/d0394be0/badentriesininterproagain.eml

From heikki at ebi.ac.uk Wed Dec 8 07:47:01 2004
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Dec 8 07:45:17 2004
Subject: [Bioperl-l] empty POD documentation sections
Message-ID: <200412081247.01957.heikki@ebi.ac.uk>

I've been cleaning the POD docs and the following files have one or more
empty sections. They are usually SYNOPSIS or DESCRIPTION.
Could the authors please write something in them (while keeping in mind
that synopsis should be runnable for easy copy and paste).

Thanks,

        -Heikki

core:
Bio/FeatureIO.pm
Bio/Graph/SimpleGraph/Traversal.pm
Bio/Graph/SimpleGraph/Traversal.pm
Bio/SeqFeature/Tools/FeatureNamer.pm
Bio/Tree/Node.pm
Bio/Tree/NodeI.pm

run:
Bio/Tools/Run/AbstractRunner.pm
Bio/Tools/Run/JavaRunner.pm
Bio/Tools/Run/JavaRunner.pm

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From m.claesson at student.ucc.ie Wed Dec 8 09:07:59 2004
From: m.claesson at student.ucc.ie (Marcus Claesson)
Date: Wed Dec 8 09:06:12 2004
Subject: [Bioperl-l] Can I get different Graphics::Panel colours for different HSP frames within the same blast hit?
Message-ID: <1102514879.17814.47.camel@morpheus.ucc.ie>

Hi!

In my Graphics::Panel overview of blastx results I would like to have
different colours for hits in different frames. It works fine among
hits but not for HSPs within the same hit. It then uses the frame value
for the first instance, and I only get one colour. Has anyone managed
to side step that? Below is the code I've used so far.

Many thanks!
Marcus

#!/usr/bin/perl -w
use Bio::Graphics;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;

my $searchio = Bio::SearchIO->new(-file   => 'blastx_results.out',
                                  -format => 'blast');
my $result = $searchio->next_result();
my $panel = Bio::Graphics::Panel->new(-length => $result->query_length,
                                      -width  => 800);
my $track = $panel->add_track(-glyph     => 'graded_segments',
                              -label     => 1,
                              -connector => 'dashed',
                              # colour is chosen from the frame of the feature
                              -bgcolor   => sub {
                                  my $feature = shift;
                                  my ($frame) = $feature->frame();
                                  return "red"   if ($frame =~ /0/);
                                  return "green" if ($frame =~ /1/);
                                  return "blue"  if ($frame =~ /2/)
                              },
                              -strand_arrow => 1);
while ( my $hit = $result->next_hit ) {
    # one top-level feature per hit; its frame comes from the hit itself
    my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
                                                -frame => $hit->frame);
    # every HSP of the hit becomes a sub-feature
    while ( my $hsp = $hit->next_hsp ) {
        $feature->add_sub_SeqFeature($hsp, 'EXPAND');
    }
    $track->add_feature($feature);
}
print $panel->png;

From barry.moore at genetics.utah.edu Wed Dec 8 12:29:17 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed Dec 8 15:26:32 2004
Subject: [Bioperl-l] Fwd: [Root-l] install error of bioperl
In-Reply-To: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu>
References: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu>
Message-ID: <41B739ED.5090506@genetics.utah.edu>

Pan-

For some reason the bioperl ppm archive doesn't keep a copy of GD.ppd.
Fortunately however another repository does. Do this:

rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Barry

Jason Stajich wrote:

>
>
> Begin forwarded message:
>
> From: "Pan DU"
> Date: December 7, 2004 3:43:17 PM EST
> To:
> Subject: [Root-l] install error of bioperl
>
> Hi,
>
> When I tried to install "Bioperl-1.4 [1.4] Bioperl 1.4 PPM3 Archive",
> it reported error as shown below.
>
>
>
> Error: Failed to download URL http://bioperl.org/DIST/GD.ppd: 404
> Not Found
>
>
>
> Could you fix this problem or tell me how to deal with that? Thanks.
>
>
>
> Pan
>
>
>
>
>
> =============================
> Pan DU
> Bioinformatics & Computational Biology
> Dept.
Electrical & Computer Engineering > 2624 Howe Hall, > Iowa State University, > Ames, IA 50011, USA > Email: dupan@iastate.edu > Tel: 515-294-4935 (O) > ============================= > > > > _______________________________________________ > Root-l mailing list > Root-l@open-bio.org > http://open-bio.org/mailman/listinfo/root-l > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > >------------------------------------------------------------------------ > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From baixd_99 at yahoo.com Wed Dec 8 18:19:07 2004 From: baixd_99 at yahoo.com (X) Date: Wed Dec 8 18:16:30 2004 Subject: [Bioperl-l] installing HTML::Parser Message-ID: <20041208231908.43107.qmail@web50806.mail.yahoo.com> Hello there, I am new to BioPerl. As I was trying to install the module of HTML::Parser from CPAN. I got the following error messages when testing the package. It seemed that my system was not correctly configured or something. Could anybody give an explanation of the error messages and how to fix the problem? Really appreciate it. ...... (tests ok) t/entities ...........Malformed UTF-8 character (unexpected non-continuation byte 0x72, immediately after start byte oxe5) in substitution iterator at /root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Entities.pm line 458. 
t/entities ...........ok 2/11Confused test output: test 2 answered after test 4
t/entities............ok 3/11Confused test output: test 3 answered after test 5
t/entities............NOK 4Confused test output: test 4 answered after test 6
t/entities............NOK 5Confused test output: test 5 answered after test 7
t/entities............NOK 6Confused test output: test 6 answered after test 8
t/entities............ok 7/11Confused test output: test 7 answered after test 9
t/entities............ok 8/11Confused test output: test 8 answered after test 10
t/entities............FAILED tests 1-3, 7-9
        Failed 6/11 tests, 45.45% okay
...... (tests ok)
t/headparser..........Parsing of undecoded UTF-8 will give garbage when decoding entities at /root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Parser.pm line 104.
# Test 3 got: 'Å være eller å ikke være' (t/headparser.t at line 137)
# Expected: 'Å være eller å ikke være'
# t/headparser.t line 137 is: ok($p->header("Title"), "Å være eller å ikke være");
t/headparser.........FAILED test 3
        Failed 1/6 tests, 83.33% okay
...... (tests ok)
t/uentities..........FAILED tests 2, 8
        Failed 2/14 tests, 85.71% okay
...... (tests ok)

Failed 3/44 test scripts, 93.18% okay. 9/355 subtests failed, 97.46% okay.
make: *** [test_dynamic] Error 29
  /usr/bin/make test -- NOT OK

Xiaodong

__________________________________
Do you Yahoo!?
Yahoo! Mail - now with 250MB free storage. Learn more.
http://info.mail.yahoo.com/mail_250

From hlapp at gnf.org Wed Dec 8 19:18:34 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Dec 8 19:16:43 2004
Subject: [Bioperl-l] problems parsing EBI interposscan.xml
In-Reply-To: <1102097826.5668.10.camel@peach4>
References: <1102097826.5668.10.camel@peach4>
Message-ID:

It looks like you're trying to parse an interpro scan match file with
the InterPro ontology file parser (Bio::OntologyIO). Maybe what you
need is the interpro parser in Bio::SeqIO?
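
A minimal sketch of that second route, assuming Bioperl is installed and
that the Bio::SeqIO 'interpro' format can actually read the report in
question (not verified here). The helper name print_accessions and the
choice to pass the file on the command line are made up for
illustration; the calls themselves are the ones already used elsewhere
in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print the accession number of every entry that the Bio::SeqIO
# 'interpro' parser returns for $file, and return how many were read.
# print_accessions is a hypothetical helper, not part of Bioperl.
sub print_accessions {
    my ($file) = @_;

    # Loaded at run time so the script still compiles on a machine
    # where Bioperl is not installed.
    require Bio::SeqIO;

    my $in = Bio::SeqIO->new(-file => $file, -format => 'interpro');
    my $count = 0;
    while (my $seq = $in->next_seq) {
        print $seq->accession_number, "\n";
        $count++;
    }
    return $count;
}

# Only runs when a file name is given, e.g.
#   perl print_accessions.pl interproscan.xml
if (@ARGV) {
    my $n = print_accessions($ARGV[0]);
    print "parsed $n entries\n";
}
```

If the parser still dies on the file, the error should at least now come
from Bio::SeqIO rather than from the ontology engine.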
-hilmar On Dec 3, 2004, at 10:17 AM, Mariano Latorre A wrote: > I installed Bioperl 1.4 (also installed dependencies Heap and Graph). I > need to parser interproscan xml reports. > > When I run "make test" it passed the Interproscan_parser test ok. But > when I perform a Interproscan at EBI I get a XML that can not be > parsed. > Bioperl says: > > Can't call method "identifier" on an undefined value at > /usr/lib/perl5/site_perl/5.8.3/Bio/Ontology/SimpleOntologyEngine.pm > line > 410. > > > So I go to the test directory inside the bioperl installation and check > the differences and the xml generated by EBI and the one provided by > bioperl installation package and notice that they are totally > different!!! > > I paste both file beginings (as you'll see they uses different > tags...): > > Thanks! > Mariano > > > 1.- the one provided for bioperl testing: > > > > > > > file_date="12-JUL-2002 00:00:00"/> > file_date="24-JUN-2002 00:00:00"/> > file_date="05-JUL-2002 00:00:00"/> > file_date="24-JAN-2002 00:00:00"/> > file_date="18-JUL-2001 00:00:00"/> > file_date="21-JUN-2002 00:00:00"/> > file_date="17-MAY-2002 00:00:00"/> > file_date="28-JAN-2002 00:00:00"/> > file_date="16-NOV-2000 00:00:00"/> > file_date="03-AUG-2001 00:00:00"/> > > protein_count="129"> > Kringle > > Kringles are autonomous structural domains, found throughout the blood > clotting and fibrinolytic proteins. > Kringle domains are believed to play a role in binding mediators (e.g., > membranes, > other proteins or phospholipids), and in the regulation of proteolytic > activity > , , idref="PUB00003257"/>. > Kringle domains , idref="PUB00000803"/>, are characterised by > a triple loop, 3-disulphide bridge structure, whose conformation is > defined by a number of hydrogen bonds and small pieces of > anti-parallel > beta-sheet. They are found in a varying number of copies, in some > serine proteases and > plasma proteins. 
> > Blood coagulation > factor XII (Hageman factor) (1 copy) > Urokinase-type > plasminogen activator (1 copy) > Hepatocyte growth > factor (HGF) (4 copies) > Hepatocyte growth > factor activator (1 copy) idref="PUB00002776"/> > > > Plasminogen (5 copies) > > > > Hepatocyte growth factor like protein (4 copies) idref="PUB00000355"/> > > > > > > 2.- The ouptput from EBI INTERPRO: > > > >
> citation="PMID:11590104" /> > > > > type="sequences" /> > type="matrix" /> > type="model" /> > type="model" /> > type="model" /> > type="model" /> > type="strings" /> > type="strings" /> > type="model" /> > type="model" /> > type="model" /> > > >
> > > > type="Family"> > > Molecular Function > methionine adenosyltransferase > activity > > > Molecular Function > ATP binding > > > Biological Process > one-carbon compound metabolism > > dbname="PIR"> > evidence="HMMPIR" /> > > > evidence="HMMPfam" /> > > > evidence="HMMPfam" /> > > > evidence="HMMPfam" /> > > > evidence="HMMTigr" /> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From paulo.david at netvisao.pt Wed Dec 8 19:14:40 2004 From: paulo.david at netvisao.pt (Paulo Almeida) Date: Wed Dec 8 19:26:25 2004 Subject: [Bioperl-l] installing HTML::Parser In-Reply-To: <20041208231908.43107.qmail@web50806.mail.yahoo.com> References: <20041208231908.43107.qmail@web50806.mail.yahoo.com> Message-ID: <41B798F0.4010707@netvisao.pt> Hi, I'm not sure it's the same thing, but this might help you: http://forums.devshed.com/t77648/s.html The part that interests you is: I made the change to /etc/sysconfig/i18n The default file reads LANG="en_US.UTF-8" SUPPORTED="en_US.UTF-8:en_US:en" SYSFONT="latarcyrheb-sun16" I change my file to read LANG="en_US" SUPPORTED="en_US" SYSFONT="latarcyrheb-sun16" If that doesn't help, you can google for "Malformed UTF-8 character (unexpected" and see what else comes up. -Paulo Almeida X wrote: >Hello there, > >I am new to BioPerl. As I was trying to install the module of >HTML::Parser from CPAN. I got the following error messages when testing >the package. It seemed that my system was not correctly configured or >something. Could anybody give an explanation of the error messages and >how to fix the problem? Really appreciate it. > > >...... 
(tests ok) >t/entities ...........Malformed UTF-8 character (unexpected >non-continuation byte 0x72, immediately after start byte oxe5) in >substitution iterator at >/root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Entities.pm line 458. >t/entities ...........ok 2/11Confused test output: test 2 answered >after test 4 >t/entities............ok 3/11Confused test output: test 3 answered >after test 5 >t/entities............NOK 4Confused test output: test 4 answered after >test 6 >t/entities............NOK 5Confused test output: test 5 answered after >test 7 >t/entities............NOK 6Confused test output: test 6 answered after >test 8 >t/entities............ok 7/11Confused test output: test 7 answered >after test 9 >t/entities............ok 8/11Confused test output: test 8 answered >after test 10 >t/entities............FAILED tests 1-3, 7-9 > Failed 6/11 tests, 45.45% okay >...... (tests ok) >t/headparser..........Parsing of undecoded UTF-8 will give garbage when >decoding entities at >/root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Parser.pm line 104. ># Test 3 got: '? v??re eller ? ikke v??re' (t/headparser.t at line 137) ># Expected: '? v?re eller ? ikke v?re' ># t/headparser.t line 137 is: ok($p->header("Title"), "? v?re eller ? >ikke v?re"); >t/headparser.........FAILED test 3 > Failed 1/6 tests, 83.33% okay >...... (tests ok) >t/uentities..........FAILED tests 2, 8 > Failed 2/14 tests, 85.71% okay >...... (tests ok) > >Failed 3/44 test scripts, 93.18% okay. 9/355 subtests failed, 97.46% >okay. >make: *** [test_dynamic] Error 29 > /usr/bin/make test -- NOT OK > > >Xiaodong > From nathanhaigh at ukonline.co.uk Wed Dec 8 19:36:01 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Dec 8 19:33:38 2004 Subject: [Bioperl-l] installing HTML::Parser In-Reply-To: <20041208231908.43107.qmail@web50806.mail.yahoo.com> Message-ID: What OS are you running? 
Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of X > Sent: 08 December 2004 23:19 > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] installing HTML::Parser > > Hello there, > > I am new to BioPerl. As I was trying to install the module of > HTML::Parser from CPAN. I got the following error messages when testing > the package. It seemed that my system was not correctly configured or > something. Could anybody give an explanation of the error messages and > how to fix the problem? Really appreciate it. > > > ...... (tests ok) > t/entities ...........Malformed UTF-8 character (unexpected > non-continuation byte 0x72, immediately after start byte oxe5) in > substitution iterator at > /root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Entities.pm line 458. > t/entities ...........ok 2/11Confused test output: test 2 answered > after test 4 > t/entities............ok 3/11Confused test output: test 3 answered > after test 5 > t/entities............NOK 4Confused test output: test 4 answered after > test 6 > t/entities............NOK 5Confused test output: test 5 answered after > test 7 > t/entities............NOK 6Confused test output: test 6 answered after > test 8 > t/entities............ok 7/11Confused test output: test 7 answered > after test 9 > t/entities............ok 8/11Confused test output: test 8 answered > after test 10 > t/entities............FAILED tests 1-3, 7-9 > Failed 6/11 tests, 45.45% okay > ...... (tests ok) > t/headparser..........Parsing of undecoded UTF-8 will give garbage when > decoding entities at > /root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Parser.pm line 104. > # Test 3 got: '? v??re eller ? ikke v??re' (t/headparser.t at line 137) > # Expected: '? v?re eller ? ikke v?re' > # t/headparser.t line 137 is: ok($p->header("Title"), "? v?re eller ? > ikke v?re"); > t/headparser.........FAILED test 3 > Failed 1/6 tests, 83.33% okay > ...... 
(tests ok) > t/uentities..........FAILED tests 2, 8 > Failed 2/14 tests, 85.71% okay > ...... (tests ok) > > Failed 3/44 test scripts, 93.18% okay. 9/355 subtests failed, 97.46% > okay. > make: *** [test_dynamic] Error 29 > /usr/bin/make test -- NOT OK > > > Xiaodong > > > > __________________________________ > Do you Yahoo!? > Yahoo! Mail - now with 250MB free storage. Learn more. > http://info.mail.yahoo.com/mail_250 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > --- > avast! Antivirus: Inbound message clean. > Virus Database (VPS): 0450-0, 06/12/2004 > Tested on: 09/12/2004 00:29:19 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0450-0, 06/12/2004 Tested on: 09/12/2004 00:35:45 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From nathanhaigh at ukonline.co.uk Wed Dec 8 19:48:39 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Dec 8 19:46:16 2004 Subject: [Bioperl-l] installing HTML::Parser In-Reply-To: <20041208231908.43107.qmail@web50806.mail.yahoo.com> Message-ID: The Reason I asked about what OS you are using, is that I've previously read something about Perl and UTF-8 on RedHat 8 OS, I can't remember the details off hand so, just wanted to check your OS before trying to dig the details out! Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of X > Sent: 08 December 2004 23:19 > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] installing HTML::Parser > > Hello there, > > I am new to BioPerl. As I was trying to install the module of > HTML::Parser from CPAN. I got the following error messages when testing > the package. It seemed that my system was not correctly configured or > something. 
Could anybody give an explanation of the error messages and
> how to fix the problem? Really appreciate it.
>
>
> ...... (tests ok)
> t/entities ...........Malformed UTF-8 character (unexpected
> non-continuation byte 0x72, immediately after start byte oxe5) in
> substitution iterator at
> /root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Entities.pm line 458.
> t/entities ...........ok 2/11Confused test output: test 2 answered
> after test 4
> t/entities............ok 3/11Confused test output: test 3 answered
> after test 5
> t/entities............NOK 4Confused test output: test 4 answered after
> test 6
> t/entities............NOK 5Confused test output: test 5 answered after
> test 7
> t/entities............NOK 6Confused test output: test 6 answered after
> test 8
> t/entities............ok 7/11Confused test output: test 7 answered
> after test 9
> t/entities............ok 8/11Confused test output: test 8 answered
> after test 10
> t/entities............FAILED tests 1-3, 7-9
> Failed 6/11 tests, 45.45% okay
> ...... (tests ok)
> t/headparser..........Parsing of undecoded UTF-8 will give garbage when
> decoding entities at
> /root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Parser.pm line 104.
> # Test 3 got: '? v??re eller ? ikke v??re' (t/headparser.t at line 137)
> # Expected: 'Å være eller å ikke være'
> # t/headparser.t line 137 is: ok($p->header("Title"), "Å være eller å
> ikke være");
> t/headparser.........FAILED test 3
> Failed 1/6 tests, 83.33% okay
> ...... (tests ok)
> t/uentities..........FAILED tests 2, 8
> Failed 2/14 tests, 85.71% okay
> ...... (tests ok)
>
> Failed 3/44 test scripts, 93.18% okay. 9/355 subtests failed, 97.46%
> okay.
> make: *** [test_dynamic] Error 29
> /usr/bin/make test -- NOT OK
>
>
> Xiaodong
>
>
>
> __________________________________
> Do you Yahoo!?
> Yahoo! Mail - now with 250MB free storage. Learn more.
> http://info.mail.yahoo.com/mail_250
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From baixd_99 at yahoo.com Wed Dec 8 20:45:51 2004
From: baixd_99 at yahoo.com (Xiaodong)
Date: Wed Dec 8 20:43:12 2004
Subject: [Bioperl-l] installing HTML::Parser
In-Reply-To: <41B798F0.4010707@netvisao.pt>
Message-ID: <20041209014551.88918.qmail@web50806.mail.yahoo.com>

Thanks Paulo. Your way actually worked. Now I have it installed. But, I
am still confused about why it happened in the first place. Any ideas?

Xiaodong

--- Paulo Almeida wrote:

> Hi,
>
> I'm not sure it's the same thing, but this might help you:
> http://forums.devshed.com/t77648/s.html
>
> The part that interests you is:
>
> I made the change to /etc/sysconfig/i18n
>
> The default file reads
>
> LANG="en_US.UTF-8"
> SUPPORTED="en_US.UTF-8:en_US:en"
> SYSFONT="latarcyrheb-sun16"
>
> I change my file to read
>
> LANG="en_US"
> SUPPORTED="en_US"
> SYSFONT="latarcyrheb-sun16"
>
> If that doesn't help, you can google for "Malformed UTF-8 character
> (unexpected" and see what else comes up.
>
> -Paulo Almeida
>
>
> X wrote:
>
> >Hello there,
> >
> >I am new to BioPerl. As I was trying to install the module of
> >HTML::Parser from CPAN. I got the following error messages when
> testing
> >the package. It seemed that my system was not correctly configured
> or
> >something. Could anybody give an explanation of the error messages
> and
> >how to fix the problem? Really appreciate it.
> >
> >
> >......
(tests ok) > >t/entities ...........Malformed UTF-8 character (unexpected > >non-continuation byte 0x72, immediately after start byte oxe5) in > >substitution iterator at > >/root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Entities.pm line > 458. > >t/entities ...........ok 2/11Confused test output: test 2 answered > >after test 4 > >t/entities............ok 3/11Confused test output: test 3 answered > >after test 5 > >t/entities............NOK 4Confused test output: test 4 answered > after > >test 6 > >t/entities............NOK 5Confused test output: test 5 answered > after > >test 7 > >t/entities............NOK 6Confused test output: test 6 answered > after > >test 8 > >t/entities............ok 7/11Confused test output: test 7 answered > >after test 9 > >t/entities............ok 8/11Confused test output: test 8 answered > >after test 10 > >t/entities............FAILED tests 1-3, 7-9 > > Failed 6/11 tests, 45.45% okay > >...... (tests ok) > >t/headparser..........Parsing of undecoded UTF-8 will give garbage > when > >decoding entities at > >/root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Parser.pm line 104. > ># Test 3 got: 'Å være eller å ikke være' (t/headparser.t at line > 137) > ># Expected: 'Å være eller å ikke være' > ># t/headparser.t line 137 is: ok($p->header("Title"), "Å være eller > å > >ikke være"); > >t/headparser.........FAILED test 3 > > Failed 1/6 tests, 83.33% okay > >...... (tests ok) > >t/uentities..........FAILED tests 2, 8 > > Failed 2/14 tests, 85.71% okay > >...... (tests ok) > > > >Failed 3/44 test scripts, 93.18% okay. 9/355 subtests failed, 97.46% > >okay. > >make: *** [test_dynamic] Error 29 > > /usr/bin/make test -- NOT OK > > > > > >Xiaodong > > > > __________________________________ Do you Yahoo!? Yahoo! Mail - Easier than ever with enhanced search. Learn more. 
http://info.mail.yahoo.com/mail_250

From barry.moore at genetics.utah.edu Wed Dec 8 16:15:12 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed Dec 8 21:21:30 2004
Subject: [Bioperl-l] Installing BioPerl on Windows
In-Reply-To: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu>
References: <9F0143AE-48BC-11D9-88B1-000393C44276@duke.edu>
Message-ID: <41B76EE0.6080800@genetics.utah.edu>

Jason, Brian, Others-

A recent message to the bioperl list suggests that new Windows users
are still having problems installing Bioperl on Windows. This is not
necessary because it's actually quite easy to install Bioperl 1.4. I
had a look at the INSTALL.WIN document and I think that while it has
been updated a bit, it is starting to suffer from fragmented editing
over a long period of time. All the information that you need is there,
but it doesn't really fit together too well anymore, and there is still
some outdated and conflicting information present. Since new Windows
users are often the least likely to be experienced programmers and also
likely to have little Unix experience, it may also need to be written
with that in mind, providing more explanation for how things are done.

I've taken a crack at this and rewritten INSTALL.WIN with a longer
(perhaps too long) introduction to Bioperl, and updated installation
instructions for Bioperl 1.4. In fact I think that the file name
INSTALL.WIN should probably be changed, as that is a filename that is
intuitive only to someone who has done a lot of installing from source.
Installing_Bioperl_on_Windows.txt may be a more obvious filename to new
Windows users. If you think it looks useful please feel free to post it
on the Bioperl web site as a replacement for or in addition to the
current INSTALL.WIN. I'll be happy to try to keep this document up to
date, but I'll need one of the developers to put it on the site for me.
Finally, I didn't touch the Cygwin sections of the previous INSTALL.WIN
document because I have no experience with Cygwin, so I'll have to
assume that they are accurate and let others contribute any fixes
necessary there. Let me know if I've made any errors or omissions that
need to be corrected.

Barry

==================================================================================

Installing Bioperl on Windows
=============================

1) Quick Instructions for the impatient
2) Bioperl on Windows
3) Perl on Windows
4) BioPerl on Windows
5) Beyond the Core
6) BioPerl in Cygwin
7) Cygwin tips

This installation guide was written by Barry Moore and other Bioperl
authors based on the original work of Paul Boutros. Please report
problems and/or fixes to the bioperl mailing list,
bioperl-l@bioperl.org

1) Quick instructions for the impatient, lucky, or experienced user.
=====================================================================

Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/

Run the ActivePerl Installer (accepting all defaults is fine).

Open a command prompt (Menus Start->Run and type cmd) and run the ppm
shell (C:\>ppm).

Add two new ppm repositories with the following commands:

ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Install Bioperl-1.4.

Go to http://www.bioperl.org and start reading documentation or try the
example script at the end of this file.

2) Bioperl on Windows
======================

Bioperl is a large collection of Perl modules (extensions to the Perl
language) that aid in the task of writing Perl code to deal with
sequence data in a myriad of ways. Bioperl provides objects for various
types of sequence data and their associated features and annotations.
It provides interfaces for analysis of these sequences with a wide
variety of external programs (BLAST, fasta, clustalw and EMBOSS to name
just a few).
It provides interfaces to various types of databases, both remote
(GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.), for
storage and retrieval of sequences. And finally, with its associated
documentation and mailing list, Bioperl represents a community of
bioinformatics professionals working in Perl who are committed to
supporting both development of Bioperl and the new users who are drawn
to the project.

While most bioinformatics and computational biology applications are
developed in Unix/Linux environments, more and more programs are being
ported to other operating systems like Windows, and many users (often
biologists with little background in programming) are looking for ways
to automate bioinformatics analyses in the Windows environment.

Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most
of the functionality of Bioperl is available with this type of install.
Much of the heavy lifting in bioinformatics is done by programs
originally developed in lower-level languages like C and Pascal (e.g.
BLAST, clustalw, Staden etc.). Bioperl simply acts as a wrapper for
running these external programs and parsing their output. Some of those
programs (BLAST for example) are ported to Windows. These can be
installed and work quite happily with BioPerl in the native Windows
environment. Others, such as clustalw, have Windows ports; however, the
BioPerl developer who wrote the interface used Unix-specific system
calls to interact with these programs, and so these wrappers will not
work in the Windows environment. And finally, some external programs
such as Staden and the EMBOSS suite of programs cannot be installed on
Windows at all, and therefore any part of Bioperl that interacts with
these packages either won't work or can't be installed at all.
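
Since a wrapper is only useful when the program it drives is actually
installed, it can help to check for the executables before running a
script. The following is a minimal sketch, assuming a POSIX shell (for
example the Cygwin shell discussed later in this guide). The function
name check_programs is made up for this example, and blastall and
clustalw are only example executable names; yours may differ.

```shell
#!/bin/sh
# Report which of the named programs are visible on PATH.
# check_programs is a hypothetical helper, not part of Bioperl.
check_programs() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every program was found.
    [ "$missing" -eq 0 ]
}

# Example call (program names are illustrative):
#   check_programs blastall clustalw
```

The function prints one line per program and returns a non-zero status
if any of them is missing, so it can also be used in a test such as
"check_programs blastall || echo 'install BLAST first'".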

If you have a fairly simple project in mind, want to start using
Bioperl quickly, only have access to a computer running Windows, and/or
don't mind bumping up against some limitations, then Bioperl on Windows
may be a good place for you to start. For example, downloading a bunch
of sequences from GenBank and sorting out the ones that have a
particular annotation or feature works great. Running a bunch of your
sequences against remote or local BLAST, parsing the output and storing
it in a MySQL database would be fine also.

Be aware that most if not all of the Bioperl developers are working in
some type of a Unix environment (Linux, OSX, Cygwin). If you have
problems with Bioperl that are specific to the Windows environment, you
may be blazing new ground, and your pleas for help on the Bioperl
mailing list may get few responses, simply because no one knows the
answer to your Windows-specific problem. If this is or becomes a
problem for you then you are better off working in some type of
Unix-like environment. One solution to this problem that will keep you
working on a Windows machine is to install Cygwin, a Unix emulation
environment for Windows. A number of Bioperl users are using this
approach successfully and it is discussed more below.

3) Perl on Windows
===================

There are a couple of ways of installing Perl on a Windows machine. The
most common and easiest is to get the most recent build from
ActiveState. ActiveState is a software company
(http://www.activestate.com) that provides free builds of Perl for
Windows users. The current (December 2004) build is ActivePerl
5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just
fine).

To install ActivePerl on Windows:

Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/

Run the ActivePerl Installer (accepting all defaults is fine).

You can also build Perl yourself (which requires a C compiler) or
download one of the other binary distributions.
The Perl source for building it yourself is available from CPAN
(http://www.cpan.org), as are a few other binary distributions that are
alternatives to ActiveState. This approach is not recommended unless
you have specific reasons for doing so and know what you're doing. If
that's the case you probably don't need to be reading this guide.

Cygwin is a Unix emulation environment for Windows and comes with its
own copy of Perl. Information on Cygwin and Bioperl is found below.

4) BioPerl on Windows
======================

Perl is a programming language that has been extended a lot by the
addition of external modules. These modules work with the core language
to extend the functionality of Perl. Bioperl is one such extension to
Perl. These modular extensions to Perl sometimes depend on the
functionality of other Perl modules, and this creates a dependency: you
can't install module X unless you have already installed module Y. Some
Perl modules are so fundamentally useful that the Perl developers have
included them in the core distribution of Perl; if you've installed
Perl then these modules are already installed. Other modules are freely
available from CPAN, but you'll have to install them yourself if you
want to use them. BioPerl has such dependencies.

Bioperl is actually a large collection of Perl modules (over 1000
currently) and these modules are split into six groups. These six
groups are:

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions and
                     the Staden package.
bioperl-db           Using bioperl with BioSQL and local relational
                     databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user
                     interface to some Bioperl functions.

The Bioperl core is what most new users will want to start with.
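
A quick way to see whether a module (a dependency, or Bioperl itself)
is already present is to ask Perl to load it. Here is a small sketch,
assuming perl is on your PATH and you are in a POSIX shell such as
Cygwin's; have_perl_module is a hypothetical helper name, not part of
Perl or Bioperl.

```shell
#!/bin/sh
# Succeed if the perl found on PATH can load the named module.
# have_perl_module is a made-up helper for illustration only.
have_perl_module() {
    # -M<module> asks perl to load the module before running the
    # (empty) program given with -e; perl exits non-zero if it cannot.
    perl "-M$1" -e 1 >/dev/null 2>&1
}

# Example call (Bio::Perl is one of the Bioperl core modules):
#   have_perl_module Bio::Perl && echo "Bioperl core is installed"
```

The underlying one-liner, perl -MBio::Perl -e 1, can also be typed
directly at a Windows command prompt; if it prints nothing the module
was found.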

Bioperl 1.4 (the core) and the Perl modules that it depends on can be
easily installed with ppm. PPM (Programmer's Package Manager) is an
ActivePerl utility for installing Perl modules on systems using
ActivePerl. PPM will look online (you have to be connected to the
internet of course) for files (these files end with .ppd) that tell it
how to install the modules you want and what other modules your new
modules depend on. It will then download and install your modules and
all dependent modules for you. These .ppd files are stored online in
ppm repositories. ActiveState maintains the largest ppm repository, and
when you installed ActivePerl, ppm was installed with directions for
using the ActiveState repositories. Unfortunately the ActiveState
repositories are far from complete, and other ActivePerl users maintain
their own ppm repositories to fill in the gaps. Installing Bioperl will
require you to direct ppm to look in two new repositories. You do this
by opening a Windows command prompt, typing ppm to start the ppm shell
and then typing the following two commands:

    ppm> rep add Bioperl http://bioperl.org/DIST
    ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Once ppm knows where to look for Bioperl and its dependencies you
simply tell ppm to install it. This is done with the command:

    ppm> install Bioperl-1.4

5) Beyond the Core
===================

You may find that you want some of the features of other Bioperl groups
like bioperl-run or bioperl-db. There are currently no ppm packages for
installing these parts of Bioperl. You will have to install these
manually from source. For this you will need a Windows version of the
program make called nmake
(http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).
You will also want to have a willingness to experiment. You'll have to
read the installation documents for each component that you want to
install, and use nmake where the instructions call for make.
You will have to determine from the installation documents what
dependencies are required, and you will have to get them, read their
documentation and install them first. The details of this are beyond
the scope of this guide. Read the documentation. Search Google. Try
your best, and if you get stuck consult with others on the bioperl
mailing list.

6) BioPerl in Cygwin
=====================

Cygwin is a Unix emulator and shell environment available free at
www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that
installation of Bioperl is easier within Cygwin than within Windows,
but these may be users with Unix backgrounds. One advantage of using
Bioperl in Cygwin is that all the external modules are available
through CPAN and most if not all external programs can be installed and
run, so many of the limitations of Bioperl on Windows are circumvented.

To get Bioperl running, first install the basic Cygwin package as well
as the Cygwin Perl, make, and gcc packages. Clicking the "View" button
in the upper right of the installer enables you to see details on the
various packages. Then follow the BioPerl installation instructions for
Unix in BioPerl's INSTALL file. Note that expat comes with Cygwin (it's
used by the module XML::Parser).

One known issue is that DBD::mysql can be tricky to install in Cygwin,
and this module is required for the bioperl-db, Biosql, and
bioperl-pipeline external packages. Fortunately there are some good
instructions online:

http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin

Also, set the environment variable TMPDIR; programs like BLAST and
clustalw need a place to create temporary files. For example:

    setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
    export TMPDIR=e:/cygwin/tmp     # sh, bash

Note that this is not the Cygwin path syntax, which would be something
like "/cygdrive/e/cygwin/tmp". It is the syntax that a Perl module
expects on Windows.
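
The relationship between the two path syntaxes can be sketched in a few
lines of shell. This is only an illustration, assuming an sh/bash shell
inside Cygwin; to_mixed_path is a name invented for this example, and it
handles only paths of the form /cygdrive/<letter>/... (Cygwin also ships
a cygpath utility whose -m option does this conversion more generally).

```shell
#!/bin/sh
# Convert /cygdrive/<letter>/rest/of/path to <letter>:/rest/of/path.
# Anything that does not match that pattern is printed unchanged.
# to_mixed_path is a hypothetical helper, not a Cygwin command.
to_mixed_path() {
    case "$1" in
        /cygdrive/[A-Za-z]/*)
            rest=${1#/cygdrive/}     # e.g. "e/cygwin/tmp"
            drive=${rest%%/*}        # e.g. "e"
            printf '%s:/%s\n' "$drive" "${rest#*/}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

With this in place, a .bashrc could contain
export TMPDIR=$(to_mixed_path /cygdrive/e/cygwin/tmp), which sets
TMPDIR to e:/cygwin/tmp as in the example above.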

If TMPDIR is not set correctly you'll see errors like this when you run
Bio::Tools::Run::StandAloneBlast:

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
    STACK: Error::throw
    ..........

7) Cygwin tips
===============

The easiest way to install MySQL is to use the Windows binaries
available at www.mysql.com. Note that Windows does not have Unix domain
sockets, so you need to force the MySQL connections to use TCP/IP
instead. Do this by using the "-h" option from the command line:

    >mysql -h 127.0.0.1 -u blip -pblop biosql

Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it
uses a host. For example, if your databases are installed locally:

    alias mysql 'mysql -h 127.0.0.1'     # csh, tcsh
    alias mysql='mysql -h 127.0.0.1'     # sh, bash

If you're trying to use some application or resource "outside" of
Cygwin and you're having a problem, remember that Cygwin's path syntax
may not be the correct one. Cygwin understands '/home/jacky' or
'/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but
the external resource may want 'E:/cygwin/home/jacky'. So your *rc
files may end up with paths written in these different syntaxes,
depending on which program reads them.

If you can, install Cygwin on a drive or partition that's
NTFS-formatted, not FAT32-formatted. When you install Cygwin on a FAT32
partition you will not be able to set permissions and ownership
correctly. In most situations this probably won't make any difference
but there may be occasions where this is a problem.

If you want to use BLAST we recommend that the Windows binary be
obtained from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST
- the file will be named something like blast-2.2.6-ia32-win32.exe).
Then follow the Windows instructions in README.bls.

Although we've recommended using the BLAST and MySQL binaries, you
should be able to compile just about everything else from source code
using Cygwin's gcc.
You'll notice when you're installing Cygwin that many different
libraries are also available (gd, jpeg, etc.).

From barry.moore at genetics.utah.edu Wed Dec 8 16:30:01 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed Dec 8 21:21:33 2004
Subject: [Bioperl-l] Installing Bioperl on Windows
Message-ID: <41B77259.3030606@genetics.utah.edu>

Of course as soon as I sent my last e-mail I found an error in the file
I attached. It didn't include the example script that I referred to.

Barry

==========================================================

Installing Bioperl on Windows
=============================

1) Quick Instructions for the Impatient
2) Bioperl on Windows
3) Perl on Windows
4) BioPerl on Windows
5) Beyond the Core
6) BioPerl and Cygwin
7) Cygwin Tips
8) Example Script

This installation guide was written by Barry Moore and other Bioperl
authors based on the original work of Paul Boutros. Please report
problems and/or fixes to the bioperl mailing list,
bioperl-l@bioperl.org

1) Quick instructions for the impatient, lucky, or experienced user.
=====================================================================

Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/

Run the ActivePerl Installer (accepting all defaults is fine).

Open a command prompt (Menus Start->Run and type cmd) and run the ppm
shell (C:\>ppm).

Add two new ppm repositories with the following commands:

    ppm> rep add Bioperl http://bioperl.org/DIST
    ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Install Bioperl-1.4 (ppm> install Bioperl-1.4).

Go to http://www.bioperl.org and start reading documentation or try
the example script at the end of this file.

2) Bioperl on Windows
======================

Bioperl is a large collection of Perl modules (extensions to the Perl
language) that aid in the task of writing Perl code to deal with
sequence data in a myriad of ways. Bioperl provides objects for various
types of sequence data and their associated features and annotations.
It provides interfaces for analysis of these sequences with a wide
variety of external programs (BLAST, fasta, clustalw and EMBOSS to name
just a few). It provides interfaces to various types of databases, both
remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.),
for storage and retrieval of sequences. And finally, with its
associated documentation and mailing list, Bioperl represents a
community of bioinformatics professionals working in Perl who are
committed to supporting both development of Bioperl and the new users
who are drawn to the project.

While most bioinformatics and computational biology applications are
developed in Unix/Linux environments, more and more programs are being
ported to other operating systems like Windows, and many users (often
biologists with little background in programming) are looking for ways
to automate bioinformatics analyses in the Windows environment.

Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most
of the functionality of Bioperl is available with this type of install.
Much of the heavy lifting in bioinformatics is done by programs
originally developed in lower-level languages like C and Pascal (e.g.
BLAST, clustalw, Staden etc.). Bioperl simply acts as a wrapper for
running these external programs and parsing their output. Some of those
programs (BLAST for example) are ported to Windows. These can be
installed and work quite happily with BioPerl in the native Windows
environment. Others, such as clustalw, have Windows ports; however, the
BioPerl developer who wrote the interface used Unix-specific system
calls to interact with these programs, and so these wrappers will not
work in the Windows environment. And finally, some external programs
such as Staden and the EMBOSS suite of programs cannot be installed on
Windows at all, and therefore any part of Bioperl that interacts with
these packages either won't work or can't be installed at all.

If you have a fairly simple project in mind, want to start using
Bioperl quickly, only have access to a computer running Windows, and/or
don't mind bumping up against some limitations, then Bioperl on Windows
may be a good place for you to start. For example, downloading a bunch
of sequences from GenBank and sorting out the ones that have a
particular annotation or feature works great. Running a bunch of your
sequences against remote or local BLAST, parsing the output and storing
it in a MySQL database would be fine also.

Be aware that most if not all of the Bioperl developers are working in
some type of a Unix environment (Linux, OSX, Cygwin). If you have
problems with Bioperl that are specific to the Windows environment, you
may be blazing new ground, and your pleas for help on the Bioperl
mailing list may get few responses, simply because no one knows the
answer to your Windows-specific problem. If this is or becomes a
problem for you then you are better off working in some type of
Unix-like environment. One solution to this problem that will keep you
working on a Windows machine is to install Cygwin, a Unix emulation
environment for Windows. A number of Bioperl users are using this
approach successfully and it is discussed more below.

3) Perl on Windows
===================

There are a couple of ways of installing Perl on a Windows machine. The
most common and easiest is to get the most recent build from
ActiveState. ActiveState is a software company
(http://www.activestate.com) that provides free builds of Perl for
Windows users. The current (December 2004) build is ActivePerl
5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just
fine).

To install ActivePerl on Windows:

Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/

Run the ActivePerl Installer (accepting all defaults is fine).

You can also build Perl yourself (which requires a C compiler) or
download one of the other binary distributions.
The Perl source for building it yourself is available from CPAN
(http://www.cpan.org), as are a few other binary distributions that are
alternatives to ActiveState. This approach is not recommended unless
you have specific reasons for doing so and know what you're doing. If
that's the case you probably don't need to be reading this guide.

Cygwin is a Unix emulation environment for Windows and comes with its
own copy of Perl. Information on Cygwin and Bioperl is found below.

4) BioPerl on Windows
======================

Perl is a programming language that has been extended a lot by the
addition of external modules. These modules work with the core language
to extend the functionality of Perl. Bioperl is one such extension to
Perl. These modular extensions to Perl sometimes depend on the
functionality of other Perl modules, and this creates a dependency: you
can't install module X unless you have already installed module Y. Some
Perl modules are so fundamentally useful that the Perl developers have
included them in the core distribution of Perl; if you've installed
Perl then these modules are already installed. Other modules are freely
available from CPAN, but you'll have to install them yourself if you
want to use them. BioPerl has such dependencies.

Bioperl is actually a large collection of Perl modules (over 1000
currently) and these modules are split into six groups. These six
groups are:

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions and
                     the Staden package.
bioperl-db           Using bioperl with BioSQL and local relational
                     databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user
                     interface to some Bioperl functions.

The Bioperl core is what most new users will want to start with.
Bioperl 1.4 (the core) and the Perl modules that it depends on can be easily installed with ppm. PPM (Programmer's Package Manager) is an ActivePerl utility for installing Perl modules on systems using ActivePerl. PPM looks online (you have to be connected to the internet, of course) for files ending in .ppd that tell it how to install the modules you want and which other modules they depend on. It then downloads and installs your modules and all of their dependencies for you.

These .ppd files are stored online in ppm repositories. ActiveState maintains the largest ppm repository, and when you installed ActivePerl, ppm was set up to use the ActiveState repositories. Unfortunately the ActiveState repositories are far from complete, and other ActivePerl users maintain their own ppm repositories to fill in the gaps. Installing Bioperl requires you to direct ppm to look in two new repositories. You do this by opening a Windows command prompt, typing ppm to start the ppm shell, and then typing the following two commands:

ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Once ppm knows where to look for Bioperl and its dependencies, you simply tell ppm to install it. This is done with the command:

ppm> install Bioperl-1.4

5) Beyond the Core
===================

You may find that you want some of the features of other Bioperl groups like bioperl-run or bioperl-db. There are currently no ppm packages for installing these parts of Bioperl, so you will have to install them manually from source. For this you will need a Windows version of the program make called nmake (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe). You will also need a willingness to experiment. You'll have to read the installation documents for each component that you want to install, and use nmake where the instructions call for make.
You will have to work out from the installation documents which dependencies are required, then get them, read their documentation, and install them first. The details are beyond the scope of this guide. Read the documentation. Search Google. Try your best, and if you get stuck, consult with others on the bioperl mailing list.

6) BioPerl and Cygwin
=====================

Cygwin is a Unix emulator and shell environment available free at www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that installation of Bioperl is easier within Cygwin than within Windows, but these may be users with Unix backgrounds. One advantage of using Bioperl in Cygwin is that all the external modules are available through CPAN and most, if not all, external programs can be installed and run, so many of the limitations of Bioperl on Windows are circumvented.

To get Bioperl running, first install the basic Cygwin package as well as the Cygwin Perl, make, and gcc packages. Clicking the "View" button in the upper right of the installer lets you see details on the various packages. Then follow the BioPerl installation instructions for Unix in BioPerl's INSTALL file. Note that expat comes with Cygwin (it's used by the module XML::Parser).

One known issue is that DBD::mysql can be tricky to install in Cygwin, and this module is required for the bioperl-db, Biosql, and bioperl-pipeline external packages. Fortunately there are some good instructions online: http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin.

Also, set the environment variable TMPDIR; programs like BLAST and clustalw need a place to create temporary files. For example:

setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
export TMPDIR=e:/cygwin/tmp     # sh, bash

Note that this is not Cygwin's own path syntax, which would be something like "/cygdrive/e/cygwin/tmp". It is the syntax that a Perl module expects on Windows.
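To make the difference between the two path syntaxes concrete, here is a small sketch that converts between them with plain string substitution. It is not part of the original guide, and the helper names cygdrive_to_drive and drive_to_cygdrive are made up for illustration. It only handles the simple /cygdrive/<letter>/... case shown above and does not consult Cygwin's mount table.

```perl
#!/usr/bin/perl
# Sketch: convert between Cygwin-style and drive-letter path syntax.
use strict;
use warnings;

# /cygdrive/e/cygwin/tmp -> e:/cygwin/tmp
# Paths that do not start with /cygdrive/<letter> are returned unchanged.
sub cygdrive_to_drive {
    my ($path) = @_;
    $path =~ s{^/cygdrive/([A-Za-z])(?=/|\z)}{$1:};
    return $path;
}

# E:/cygwin/home/jacky -> /cygdrive/e/cygwin/home/jacky
# Backslashes are accepted and turned into forward slashes first.
sub drive_to_cygdrive {
    my ($path) = @_;
    $path =~ s{\\}{/}g;
    $path =~ s{^([A-Za-z]):}{'/cygdrive/' . lc $1}e;
    return $path;
}

# Show what TMPDIR is set to right now, if anything.
my $tmpdir = defined $ENV{TMPDIR} ? $ENV{TMPDIR} : '(not set)';
print "TMPDIR is currently: $tmpdir\n";

print cygdrive_to_drive('/cygdrive/e/cygwin/tmp'), "\n";  # e:/cygwin/tmp
print drive_to_cygdrive('E:/cygwin/home/jacky'), "\n";    # /cygdrive/e/cygwin/home/jacky
```

A helper like the first one could be used to derive the drive-letter value for TMPDIR from a path you already know in Cygwin form.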
If this variable is not set correctly you'll see errors like this when you run Bio::Tools::Run::StandAloneBlast:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
STACK: Error::throw
..........

7) Cygwin Tips
===============

The easiest way to install MySQL is to use the Windows binaries available at www.mysql.com. Note that Windows does not have Unix-style sockets, so you need to force the MySQL connections to use TCP/IP instead. Do this by using the "-h" option from the command line:

>mysql -h 127.0.0.1 -u blip -pblop biosql

Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses a host. For example, if your databases are installed locally:

alias mysql 'mysql -h 127.0.0.1'

If you're trying to use some application or resource "outside" of Cygwin and you're having a problem, remember that Cygwin's path syntax may not be the correct one. Cygwin understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but the external resource may want 'E:/cygwin/home/jacky'. So your *rc files may end up with paths written in both syntaxes, depending on what reads them.

If you can, install Cygwin on a drive or partition that's NTFS-formatted, not FAT32-formatted. When you install Cygwin on a FAT32 partition you will not be able to set permissions and ownership correctly. In most situations this probably won't make any difference, but there may be occasions where this is a problem.

If you want to use BLAST we recommend that the Windows binary be obtained from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will be named something like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions in README.bls.

Although we've recommended using the BLAST and MySQL binaries, you should be able to compile just about everything else from source code using Cygwin's gcc.
You'll notice when you're installing Cygwin that many different libraries are also available (gd, jpeg, etc.).

8) Example Script
=================

#!/usr/bin/perl

#A short script to demonstrate how to download sequences from GenBank and access
#the sequence and some associated annotations using Bioperl.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed

#Get some sequence IDs either like below, or read in from a file. Note that
#this sample script works with the accession numbers below (at least at the time
#it was written). If you add different accession numbers, and you get errors,
#you may be calling for something that the sequence doesn't have. You'll have
#to add your own error trapping code to handle that.
my @ids = ('K03160', 'AB039327', 'BC035972');

#Create the GenBank database object to read from the database.
my $gb = Bio::DB::GenBank->new();

#Create a sequence stream to pass the sequences from the database to the program.
my $seqio = $gb->get_Stream_by_id(\@ids);

#Loop over all of the sequences that you requested.
while (my $seq = $seqio->next_seq) {

    #Here is how you get methods directly from the RichSeq object. Replace
    #'display_name' with any other method in Table 2. that can be called on
    #either the RichSeq object directly, or the PrimarySeq object which it has
    #inherited.
    print "Display Name: ", $seq->display_name, "\n";
    print "Sequence Date: ", $seq->get_dates, "\n";

    #Here is how to access the classification data from the species object.
    my $species = $seq->species;
    print "Species: ", $species->common_name, "\n";
    my @class = $species->classification;
    print "Classification: @class\n";

    #Here is a general way to call things that are stored as a Bio::SeqFeature::
    #Generic object. Replace 'source' with any other of the "major" headings in
    #the feature table (e.g. gene, CDS, etc.) and replace 'mol_type' with any of
    #the tag values found under that heading (organism, locus_tag, gene, etc.)
    my @source_feats = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures();
    my $source_feat = shift @source_feats;
    #Only ask for the tag if the feature exists and actually carries it;
    #get_tag_values throws an exception for a tag that is not there.
    if (defined $source_feat && $source_feat->has_tag('mol_type')) {
        my @mol_type = $source_feat->get_tag_values('mol_type');
        print "Molecule Type: @mol_type\n";
    }

    #Here is a general way to call things that are stored as some type of a
    #Bio::Annotation object. This includes reference information, and comments.
    #Replace 'reference' with 'comment' to get the comment, and replace
    #$ref->authors with $ref->title (or location, medline, etc.) to get other
    #reference categories.
    my $ann = $seq->annotation();
    my @references = $ann->get_Annotations('reference');
    my $ref = shift @references;
    if (defined $ref) {
        my $authors = $ref->authors;
        print "Authors: $authors\n";
    }

    print "Sequence: \n", $seq->seq, "\n\n";
}

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From rfsouza at cecm.usp.br Wed Dec 8 19:35:30 2004
From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S})
Date: Wed Dec 8 21:21:40 2004
Subject: [Bioperl-l] problems parsing EBI interposscan.xml
In-Reply-To:
References: <1102097826.5668.10.camel@peach4>
Message-ID: <20041209003530.GA22259@cecm.usp.br>

Hello,

On Wed, Dec 08, 2004 at 04:18:34PM -0800, Hilmar Lapp wrote:
> It looks you're trying to parse an interpro scan match file with the
> InterPro ontology file parser (Bio::OntologyIO). Maybe what you need is
> the interpro parser in Bio::SeqIO?

By the way, wouldn't the SeqIO/interpro.pm be better placed under the
FeatureIO hierarchy? It only collects data related to similarity and
pattern matches on protein sequences...
Just a thought...
Cheers,
Robson

From hlapp at gnf.org Wed Dec 8 22:13:52 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Dec 8 22:12:27 2004
Subject: [Bioperl-l] problems parsing EBI interposscan.xml
In-Reply-To: <20041209003530.GA22259@cecm.usp.br>
References: <1102097826.5668.10.camel@peach4> <20041209003530.GA22259@cecm.usp.br>
Message-ID: <5A602232-4990-11D9-B836-000A95AE92B0@gnf.org>

That was my thought too when I saw it ... didn't want to be the jerk
again so I left it where it was ... any comment from the authors?

-hilmar

On Dec 8, 2004, at 4:35 PM, Robson Francisco de Souza {S} wrote:

> Hello,
>
>
> On Wed, Dec 08, 2004 at 04:18:34PM -0800, Hilmar Lapp wrote:
>> It looks you're trying to parse an interpro scan match file with the
>> InterPro ontology file parser (Bio::OntologyIO). Maybe what you need
>> is
>> the interpro parser in Bio::SeqIO?
>
> By the way, wouldn't the SeqIO/interpro.pm be better placed under the
> FeatureIO hierarchy? It only collects data related to similarity and
> pattern matches on protein sequences...
> Just a thought...
> Cheers,
> Robson
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu Thu Dec 9 00:08:42 2004
From: allenday at ucla.edu (Allen Day)
Date: Wed Dec 8 23:06:33 2004
Subject: [Bioperl-l] problems parsing EBI interposscan.xml
In-Reply-To: <5A602232-4990-11D9-B836-000A95AE92B0@gnf.org>
References: <1102097826.5668.10.camel@peach4> <20041209003530.GA22259@cecm.usp.br> <5A602232-4990-11D9-B836-000A95AE92B0@gnf.org>
Message-ID:

yeah, it should be in there. i don't have time to do it right now --
feel free. brian, you interested in this?

btw, update on the parse: i am in the process of testing iprscan 4.0,
which supposedly fixes the invalid xml bug. there are other issues with
it though, such as it is missing index files, and files necessary to run
on a SGE cluster... when i can actually get some output i will post here.

-allen

On Wed, 8 Dec 2004, Hilmar Lapp wrote:

> That was my thought too when I saw it ... didn't want to be the jerk
> again so I left it where it was ... any comment from the authors?
>
> -hilmar
>
> On Dec 8, 2004, at 4:35 PM, Robson Francisco de Souza {S} wrote:
>
> > Hello,
> >
> >
> > On Wed, Dec 08, 2004 at 04:18:34PM -0800, Hilmar Lapp wrote:
> >> It looks you're trying to parse an interpro scan match file with the
> >> InterPro ontology file parser (Bio::OntologyIO). Maybe what you need
> >> is
> >> the interpro parser in Bio::SeqIO?
> >
> > By the way, wouldn't the SeqIO/interpro.pm be better placed under the
> > FeatureIO hierarchy? It only collects data related to similarity and
> > pattern matches on protein sequences...
> > Just a thought...
> > Cheers,
> > Robson
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>

From nathanhaigh at ukonline.co.uk Thu Dec 9 04:42:17 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Dec 9 07:56:30 2004
Subject: [Bioperl-l] Installing Bioperl on Windows
In-Reply-To: <41B77259.3030606@genetics.utah.edu>
Message-ID:

Being a Windows user (primarily), I have the following comments about the Windows install instructions:

I wasn't sure which wrappers you were referring to that will not work on Windows when you said:
"Others, such as clustalw, have Windows ports, however the BioPerl developer who wrote the interface used Unix specific system calls to interact with these programs and so these wrappers will not work in the Windows environment"

Are you sure that the http://www.Bribes.org/perl/ppm repository isn't required in addition to theoryx http://theoryx5.uwinnipeg.ca/ppms for some modules (just wondering why I might have it installed unless I needed it for a bioperl feature)?

I have amended the section regarding ppd files for bioperl-run etc., suggesting the user try searching for them before jumping into source! I might see about getting a ppd file for the Bioperl-run package made up, as this is often something that beginner/intermediate bioperlers would like to use, i.e. to have batch runs and parse the output etc.

I've attached my modified version of the file with changes.
Also, with regard to naming packages in .ppd files:

Short version:
------------------
Change the two references to Bioperl-1.4 in the PPM install steps to read:
Install Bioperl
Also, I think Bioperl 1.4 references should be made more general for future releases, i.e. just Bioperl

Reasoning:
---------------
ppd files have both a NAME and a VERSION field, and when installing via PPM you would type

PPM> install NAME

NAME should not contain any reference to the version number and should simply be set to Bioperl (not Bioperl-1.4), leaving the version numbering to the VERSION field. This means that when a Bioperl v1.5 is released and you do a search for bioperl:

PPM> search bioperl

A list of modules is returned, e.g.:

Searching in Active Repositories
1. Bioperl       [1.5]   Bioinformatics Toolkit
2. Bioperl-1.2   [1.2]   Bioperl 1.2 PPM3 Archive
3. Bioperl-1.2.1 [1.2.1] Bioperl 1.2.1 PPM3 Archive
4. Bioperl-1.2.3 [1.2.3] Bioperl 1.2.3 PPM3 Archive
5. Bioperl-1.4   [1.4]   Bioperl 1.4 PPM3 Archive

Thus, when the user issues the command:

PPM> install bioperl

PPM's internals will automatically install the latest version of Bioperl. If the user needs to install an older version, they should issue a command such as:

PPM> install 4

This would install the Bioperl-1.2.3 package from the above list.

This would also allow a user of BioPerl v1.4 to upgrade to 1.5 by issuing the following command:

PPM> upgrade Bioperl

And PPM's internals would upgrade BioPerl to the latest version (however, I don't know how/if this would work for people who have installed Bioperl-1.4 (package 5 shown above), as PPM would probably think this a totally different module because of the different NAME).
Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Barry Moore > Sent: 08 December 2004 21:30 > To: Jason Stajich; Brian Osborne; bioperl > Subject: [Bioperl-l] Installing Bioperl on Windows > > Of course as soon as I sent my last e-mail I found an error in the file > I attached. It didn't include the example script that I reffered to. > > Barry > > ========================================================== > > Installing Bioperl on Windows > ============================= > > 1) Quick Instructions for the Impatient > 2) Bioperl on Windows > 3) Perl on Windows > 4) BioPerl on Windows > 5) Beyond the Core > 6) BioPerl and Cygwin > 7) Cygwin Tips > 8) Example Script > > This installation guide was written by Barry Moore and other Bioperl > authors based on the > original work of Paul Boutros. Please report problems and/or fixes to > the bioperl mailing > list, bioperl-l@bioperl.org > > 1) Quick instructions for the impatient, lucky, or experienced user. > ===================================================================== > > Download the ActivePerl MSI from > http://www.activestate.com/Products/ActivePerl/ > Run the ActivePerl Installer (accepting all defaults is fine). > Open a command prompt (Menus Start->Run and type cmd) and run the ppm > shell (C:\>ppm). > Add two new ppm repositories with the following commands: > ppm> rep add Bioperl http://bioperl.org/DIST > ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > Install Bioperl-1.4. > Go to http://www.bioperl.org and start reading documentation or try the > example script at > the end of this file. > > > 2) Bioperl on Windows > ====================== > > Bioperl is a large collection of Perl modules (extensions to the Perl > language) that aid > in the task of writing Perl code to deal with sequence data in a myriad > of ways. 
Bioperl > provides objects for various types of sequence data and their associated > features and > annotations. It provides interfaces for analysis of these sequences with > a wide variety > of external programs (BLAST, fasta, clustalw and EMBOSS to name just a > few). It provides > interfaces to various types of databases both remote (GenBank, EMBL etc) > and local > (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. > And finally with > it's associated documentation and mailing list Bioperl represents a > community of > bioinformatics professionals working in Perl who are committed to > supporting both > development of Bioperl and the new users who are drawn to the project. > > While most bioinformatics and computational biology applications are > developed in > Unix/Linux environments, more and more programs are being ported to > other operating > systems like Windows, and many users (often biologists with little > background in > programming) are looking for ways to automate bioinformatics analyses in > the Windows > environment. Perl and Bioperl can be installed natively on Windows > NT/2000/XP. Most of > the functionality of Bioperl is available with this type of install. > Much of the heavy > lifting in bioinformatics is done by programs originally developed in > lower level > languages like C and Pascal (e.g. BLAST, clustalw, Staden etc). Bioperl > simply acts as a > wrapper for running and parsing output from these external programs. > Some of those > programs (BLAST for example) are ported to Windows. These can be > installed and work > quite happily with BioPerl in the native Windows environment. Others, > such as clustalw, > have Windows ports, however the BioPerl developer who wrote the > interface used Unix > specific system calls to interact with these programs and so these > wrappers will not work > in the Windows environment. 
And finally some external programs such as > Staden and the > EMBOSS suite of programs can not be installed on Windows at all, and > therefore any part > of Bioperl that interacts with these packages either won't work or can't > be installed at > all. > > If you have a fairly simple project in mind, want to start using Bioperl > quickly, only > have access to a computer running Windows, and/or don't mind bumping up > against some > limitations then Bioperl on Windows may be a good place for you to > start. For example, > downloading a bunch of sequences from GenBank and sorting out the ones > that have a > particular annotation or feature works great. Running a bunch of your > sequences against > remote or local BLAST, parsing the output and storing it in a MySQL > database would be > fine also. Be aware that most if not all of the Bioperl developers are > working in some > type of a Unix environment (Linux, OSX, Cygwin). If you have problems > with Bioperl that > are specific to the Windows environment, you may be blazing new ground > and your pleas for > help on the Bioperl mailing list may get few responses - simply because > no one knows the > answer to your Windows specific problem. If this is or becomes a problem > for you then > you are better off working in some type of Unix like environment. One > solution to this > problem that will keep you working on a Windows machine it to install > Cygwin, a Unix > emulation environment for Windows. A number of Bioperl users are using > this approach > successfully and it is discussed more below. > > 3) Perl on Windows > =================== > > There are a couple of ways of installing Perl on a Windows machine. The > most common and > easiest is to get the most recent build from ActiveState. ActiveState is > a software > company (http://www.activestate.com) that provides free builds of Perl > for Windows > users. 
The current (December 2004) build is ActivePerl 5.8.4.810 > (ActivePerl 5.6.1.638 > is also available and should work just fine). To install ActivePerl on > Windows: > Download the ActivePerl MSI from > http://www.activestate.com/Products/ActivePerl/ > Run the ActivePerl Installer (accepting all defaults is fine). > > You can also build Perl yourself (which requires a C compiler) or > download one of the > other binary distributions. The Perl source for building it yourself is > available from > CPAN (http://www.cpan.org), as are a few other binary distributions that > are alternatives > to ActiveState. This approach is not recommended unless you have > specific reasons for > doing so and know what you're doing. It that's the case you probably > don't need to be > reading this guide. > > Cygwin is a Unix emulation environment for Windows and comes with its > own copy of Perl. > Information on Cygwin and Bioperl is found below. > > 4) BioPerl on Windows > ====================== > > Perl is a programming language that has been extended a lot by the > addition of external > modules. These modules work with the core language to extend the > functionality of Perl. > Bioperl is one such extension to Perl. These modular extensions to Perl > sometimes depend > on the functionality of other Perl modules and this creates a > dependency. You can't > install module X unless you have already installed module Y. Some Perl > modules are so > fundamentally useful that the Perl developers have included them in the > core distribution > of Perl - if you've installed Perl then these modules are already > installed. Other > modules are freely available from CPAN, but you'll have to install them > yourself if you > want to use them. BioPerl has such dependencies. > > Bioperl is actually a large collection of Perl modules (over 1000 > currently) and these > modules are split into six groups. 
> These six groups are:
>
> Bioperl Group          Functions
> -----------------------------------------------------------------
> bioperl (the core)     Most of the main functionality of Bioperl.
> bioperl-run            Wrappers to a lot of external programs.
> bioperl-ext            Interaction with some alignment functions
>                        and the Staden package.
> bioperl-db             Using bioperl with BioSQL and local
>                        relational databases.
> bioperl-microarray     Microarray specific functions.
> bioperl-gui            Some preliminary work on a graphical user
>                        interface to some Bioperl functions.
>
> The Bioperl core is what most new users will want to start with. Bioperl
> 1.4 (the core) and the Perl modules that it depends on can be easily
> installed with ppm. PPM (Programming Package Manager) is an ActivePerl
> utility for installing Perl modules on systems using ActivePerl. PPM will
> look online (you have to be connected to the internet of course) for
> files (these files end with .ppd) that tell it how to install the modules
> you want and what other modules your new modules depend on. It will then
> download and install your modules and all dependent modules for you.
> These .ppd files are stored online in ppm repositories. ActiveState
> maintains the largest ppm repository, and when you installed ActivePerl,
> ppm was installed with directions for using the ActiveState repositories.
> Unfortunately the ActiveState repositories are far from complete, and
> other ActivePerl users maintain their own ppm repositories to fill in the
> gaps. Installing Bioperl will require you to direct ppm to look in two
> new repositories. You do this by opening a Windows command prompt, typing
> ppm to start the ppm shell and then typing the following two commands:
> ppm> rep add Bioperl http://bioperl.org/DIST
> ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>
> Once ppm knows where to look for Bioperl and its dependencies you
> simply tell ppm to install it. This is done with the command:
> ppm> install Bioperl-1.4
>
> 5) Beyond the Core
> ===================
>
> You may find that you want some of the features of other Bioperl groups
> like bioperl-run or bioperl-db. There are currently no ppm packages for
> installing these parts of Bioperl. You will have to install these
> manually from source. For this you will need a Windows version of the
> program make called nmake
> (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).
> You will also want to have a willingness to experiment. You'll have to
> read the installation documents for each component that you want to
> install, and use nmake where the instructions call for make. You will
> have to determine from the installation documents what dependencies are
> required and you will have to get them, read their documentation and
> install them first. The details of this are beyond the scope of this
> guide. Read the documentation. Search Google. Try your best, and if you
> get stuck consult with others on the bioperl mailing list.
>
> 6) BioPerl and Cygwin
> =====================
>
> Cygwin is a Unix emulator and shell environment available free at
> www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that
> installation of Bioperl is easier within Cygwin than within Windows, but
> these may be users with Unix backgrounds.
>
> One advantage of using Bioperl in Cygwin is that all the external
> modules are available through CPAN, and most if not all external programs
> can be installed and run, so many of the limitations of Bioperl on
> Windows are circumvented.
>
> To get Bioperl running, first install the basic Cygwin package as well as
> the Cygwin Perl, make, and gcc packages. Clicking the "View" button in
> the upper right of the installer enables you to see details on the
> various packages. Then follow the BioPerl installation instructions for
> Unix in BioPerl's INSTALL file.
> > Note that expat comes with Cygwin (it's used by the module XML::Parser). > > One known issue is that DBD::mysql can be tricky to install in > Cygwin and this module is required for the bioperl-db, Biosql, and > bioperl-pipeline > external packages. Fortunately there's some good instructions online: > http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin. > > Also, set the environmental variable TMPDIR, programs like BLAST and > clustalw need a > place to create temporary files. e.g.: > > setenv TMPDIR e:/cygwin/tmp # csh, tcsh > export TMPDIR=e:/cygwin/tmp # sh, bash > > Note that this is not a syntax that Cygwin understands, which would be > something like > "/cygdrive/e/cygwin/tmp". This is the syntax that a Perl module expects > on Windows. > > If this variable is not set correctly you'll see errors like this when > you run > Bio::Tools::Run::StandAloneBlast: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory > STACK: Error::throw > .......... > > 7) Cygwin Tips > =============== > > The easiest way to install MySQL is to use the Windows binaries > available at > www.mysql.com. Note that Windows does not have sockets, so you need to > force the MySQL > connections to use TCP/IP instead. Do this by using the "-h" option from > the command- > line: > > >mysql -h 127.0.0.1 -u blip -pblop biosql > > Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it > uses a host. For > example, if your databases are installed locally: > > alias mysql 'mysql -h 127.0.0.1' > > If you're trying to use some application or resource "outside" of Cygwin > and you're > having a problem remember that Cygwin's path syntax may not be the > correct one. Cygwin > understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when > referring to the E: > drive) but the external resource may want 'E:/cygwin/home/jacky'. 
So > your *rc files may > end up with paths written in these different syntaxes, depending. > > If you can, install Cygwin on a drive or partition that's > NTFS-formatted, not FAT32- > formatted. When you install Cygwin on a FAT32 partition you will not be > able to set > permissions and ownership correctly. In most situations this probably > won't make any > difference but there may be occasions where this is a problem. > > If you want to use BLAST we recommend that the Windows binary be > obtained from NCBI > (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will > be named something > like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions > in README.bls. > > Although we've recommended using the BLAST and MySQL binaries you should > be able to > compile just about everything else from source code using Cygwin's gcc. > You'll notice > when you're installing Cygwin that many different libraries are also > available (gd, jpeg, > etc.). > > 8) Example Script > ================= > > #!/usr/bin/perl > > #A short script to demonstrate how to download sequences from GenBank > and access > #the sequence and some associated annotations using Bioperl. > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed > > #Get some sequence IDs either like below, or read in from a file. Note that > #this sample script works with the accession numbers below (at least at > the time > #it was written). If you add different accession numbers, and you get > errors, > #you may be calling for something that the sequence doesn't have. You'll > have > #to add your own error trapping code to handle that. > my @ids = ('K03160', 'AB039327', 'BC035972'); > > #Create the GenBank database object to read from the database. > my $gb = new Bio::DB::GenBank(); > > #Create a sequence stream to pass the sequences from the database to the > program. 
> my $seqio = $gb->get_Stream_by_id(\@ids); > > #Loop over all of the sequences that you requested. > while (my $seq = $seqio->next_seq) { > > #Here is how you get methods directly from the RichSeq object. Replace > #'display_name' with any other method in Table 2. that can be called on > #either the RichSeq object directly, or the PrimarySeq object which it has > #inherited. > print "Display Name: ", $seq->display_name,"\n"; > print "Sequence Date: ",$seq->get_dates,"\n"; > > #Here is how to access the classification data from the species object. > my $species = $seq->species; > print "Species :", $species->common_name,"\n"; > my @class = $species->classification; > print "Classification: @class\n"; > > #Here is a general way to call things that are stored as a Bio::SeqFeature:: > #Generic object. Replace 'source' with any other of the "major" headings in > #the feature table (e.g gene, CDS, etc.) and replace 'organism' with any of > #the tag values found under that heading (mol_type, locus_tag, gene, etc.) > my @source_feats = grep { $_->primary_tag eq 'source' } > $seq->get_SeqFeatures(); > my $source_feat = shift @source_feats; > my @mol_type = $source_feat->get_tag_values('mol_type'); > print "Molecule Type: @mol_type\n"; > > #Here is a general way to call things that are stored as some type of a > #Bio::Annotation oject. This includes reference information, and comments. > #Replace reference with 'comment' to get the comment, and replace > #$ref->authors with $ref->title (or location, medline, etc.) to get other > #reference categories > my $ann = $seq->annotation(); > my @references = ($ann->get_Annotations('reference')); > my $ref = shift @references; > my ($title, $authors, $location, $pubmed, $reference); > if (defined $ref) { > $authors = $ref->authors; > print "Authors: $authors\n"; > } > print "Sequence: \n", $seq->seq, "\n\n"; > } > > -- > Barry Moore > Dept. of Human Genetics > University of Utah > Salt Lake City, UT > > --- > avast! 
Antivirus: Inbound message clean.
> Virus Database (VPS): 0450-0, 06/12/2004
> Tested on: 09/12/2004 07:31:40
> avast! is copyright (c) 2000-2003 ALWIL Software.
> http://www.avast.com
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 40823 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041209/081ad70f/winmail-0001.bin

From crabtree at tigr.org Thu Dec 9 09:46:46 2004
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Thu Dec 9 10:54:46 2004
Subject: [Bioperl-l] Can I get different Graphics::Panel colours fordifferent HSP frames within the same blast hit?
Message-ID:

Hi Marcus-

Looking at the source for Bio/Graphics/Glyph/graded_segments.pm, I'd say that you're stuck with this behavior unless you either modify the glyph or choose a different approach. In terms of modifying the glyph, a good long-term solution would be to add a "use_part_color" option to graded_segments.pm; this option would conditionally enable the code change I describe below. If you're looking for a quick hack, however, just make yourself a new glyph based on a copy of the old one, something like this:

1. copy graded_segments.pm to graded_segments2.pm (ensuring that graded_segments2.pm remains somewhere in your Perl library path (@INC) under a similar directory structure, namely Bio/Graphics/Glyph/)

2. make the following changes to graded_segments2.pm:
  -globally replace "graded_segments" with "graded_segments2"
  -find the section labeled "allocate colors", which looks like this:

    # allocate colors
    my $fill = $self->bgcolor;
    my ($red,$green,$blue) = $self->panel->rgb($fill);

    foreach my $part (@parts) {

  -change it to look like this (i.e. get $fill from the child feature, not the parent):

    # allocate colors

    foreach my $part (@parts) {
      my $fill = $part->bgcolor;
      my ($red,$green,$blue) = $self->panel->rgb($fill);

Finally, change your original script to use glyph => 'graded_segments2'.
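
To see why moving the lookup inside the loop matters, here is a small stand-alone sketch in plain Perl. It does not use Bio::Graphics at all: MockFeature and frame_colour() are made-up names for this illustration, and the 'black' fallback for an unknown frame is an assumption that is not in the original callback.

```perl
#!/usr/bin/perl
# Stand-alone illustration; no Bio::Graphics involved.
use strict;
use warnings;

# Hypothetical stand-in for a feature with a frame and sub-parts.
package MockFeature;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub frame { return $_[0]{frame} }
sub parts { return @{ $_[0]{parts} || [] } }

package main;

# Same frame-to-colour mapping as the -bgcolor callback in the script.
sub frame_colour {
    my ($feature) = @_;
    my %colour_for = (0 => 'red', 1 => 'green', 2 => 'blue');
    my $frame = $feature->frame;
    # Assumption: fall back to black when the frame is missing or unknown.
    return 'black' unless defined $frame && exists $colour_for{$frame};
    return $colour_for{$frame};
}

# A "hit" in frame 0 whose three "HSPs" are in frames 0, 2 and 1.
my $hit = MockFeature->new(
    frame => 0,
    parts => [ map { MockFeature->new(frame => $_) } 0, 2, 1 ],
);

# Lookup outside the loop: every part inherits the parent's colour.
my $fill    = frame_colour($hit);
my @outside = map { $fill } $hit->parts;

# Lookup inside the loop: each part is coloured from its own frame.
my @inside = map { frame_colour($_) } $hit->parts;

print "outside loop: @outside\n";   # red red red
print "inside loop:  @inside\n";    # red blue green
```

The first line of output corresponds to the stock glyph as described above (one colour per hit), the second to the copied glyph with the assignment moved into the loop.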
Jonathan


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Marcus Claesson
Sent: Wed 12/8/2004 9:07 AM
To: Bioperl list
Subject: [Bioperl-l] Can I get different Graphics::Panel colours fordifferent HSP frames within the same blast hit?

Hi!

In my Graphics::Panel overview of blastx results I would like to have different colours for hits in different frames. It works fine among hits but not for HSPs within the same hit. It then uses the frame value for the first instance, and I only get one colour. Has anyone managed to side step that? Below is the code I've used so far.

Many thanks!
Marcus


#!/usr/bin/perl -w
use strict;
use Bio::Graphics;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;

# Open the BLAST report (the file name must be a quoted string).
my $searchio = Bio::SearchIO->new(-file   => 'blastx_results.out',
                                  -format => 'blast');
my $result = $searchio->next_result();
my $panel = Bio::Graphics::Panel->new(-length => $result->query_length,
                                      -width  => 800);
my $track = $panel->add_track(-glyph     => 'graded_segments',
                              -label     => 1,
                              -connector => 'dashed',
                              # Choose the colour from the feature's frame.
                              -bgcolor   => sub {
                                  my $feature = shift;
                                  my ($frame) = $feature->frame();
                                  return "red"   if ($frame =~ /0/);
                                  return "green" if ($frame =~ /1/);
                                  return "blue"  if ($frame =~ /2/);
                              },
                              -strand_arrow => 1);
# Build one feature per hit and attach its HSPs as sub-features.
while( my $hit = $result->next_hit ) {
    my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
                                                -frame => $hit->frame);
    while( my $hsp = $hit->next_hsp ) {
        $feature->add_sub_SeqFeature($hsp,'EXPAND');
    }
    $track->add_feature($feature);
}
print $panel->png;


_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From amackey at pcbi.upenn.edu Thu Dec 9 11:20:23 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Dec 9 11:18:45 2004
Subject: [Bioperl-l] Can I get different Graphics::Panel colours fordifferent HSP frames within the same blast hit?
In-Reply-To: References: Message-ID: <3A3AED42-49FE-11D9-B511-000D93392082@pcbi.upenn.edu> There's another solution that shouldn't involve code patches: see the documentation in Panel (short answer: all_callbacks: 1) When you install a callback for a feature that contains subparts, the callback will be invoked first for the top-level feature, and then for each of its subparts (recursively). You should make sure to examine the feature's type to determine whether the option is appropriate. Some glyphs deliberately disable this recursive feature. The "track", "group", "transcript", "transcript2" and "segments" glyphs selectively disable the -bump, -label and -description options. This is to avoid, for example, a label being attached to each exon in a transcript, or the various segments of a gapped alignment bumping each other. You can override this behavior and force your callback to be invoked by providing add_track() with a true -all_callbacks argument. In this case, you must be prepared to handle configuring options for the "group" and "track" glyphs. In particular, this means that in order to control the -bump option with a callback, you should specify -all_callbacks=>1, and turn on bumping when the callback is in the track or group glyphs. On Dec 9, 2004, at 9:46 AM, Crabtree, Jonathan wrote: > > Hi Marcus- > > Looking at the source for Bio/Graphics/Glyph/graded_segments.pm, I'd > say that you're stuck with this behavior unless you either modify the > glyph or choose a different approach. In terms of modifying the > glyph, a good long-term solution would be to add a "use_part_color" > option to graded_segments.pm; this option would conditionally enable > the code change I describe below. If you're looking for a quick hack, > however, just make yourself a new glyph based on a copy of the old > one, something like this: > > 1. 
copy graded_segments.pm to graded_segments2.pm (ensuring that > graded_segments2.pm remains somewhere in your classpath under a > similar directory structure, namely Bio/Graphics/Glyph/) > > 2. make the following changes to graded_segments2.pm: > -globally replace "graded_segments" with "graded_segments2" > -find the section labeled "allocate colors", which looks like this: > > # allocate colors > my $fill = $self->bgcolor; > my ($red,$green,$blue) = $self->panel->rgb($fill); > > foreach my $part (@parts) { > > -change it to look like this (i.e. get $fill from the child feature, > not the parent): > > # allocate colors > > foreach my $part (@parts) { > my $fill = $part->bgcolor; > my ($red,$green,$blue) = $self->panel->rgb($fill); > > Finally, change your original script to use glyph => > 'graded_segments2'. > > Jonathan > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org on behalf of Marcus > Claesson > Sent: Wed 12/8/2004 9:07 AM > To: Bioperl list > Subject: [Bioperl-l] Can I get different Graphics::Panel colours > fordifferent HSP frames within the same blast hit? > > Hi! > > In my Graphics::Panel overview of blastx results I would like to have > different colours for hits in different frames. It works fine among > hits > but not for HSPs within the same hit. It then uses the frame value for > the first instance, and I only get one colour. Has anyone managed to > side step that? Below is the code I've used so far. > > Many thanks! 
> Marcus > > > #!/usr/bin/perl -w > use Bio::Graphics; > use Bio::SearchIO; > my $searchio = Bio::SearchIO->new(-file=>blastx_results.out > -format => 'blast'); > my $result = $searchio->next_result(); > my $panel = Bio::Graphics::Panel->new(-length=> $result->query_length, > -width=> 800); > my $track = $panel->add_track(-glyph => 'graded_segments', > -label => 1, > -connector => 'dashed', > -bgcolor => sub { > my $feature = shift; > my ($frame) = $feature->frame(); > return "red" if ($frame =~ /0/); > return "green" if ($frame =~ /1/); > return "blue" if ($frame =~ /2/)}, > -strand_arrow => 'tue'); > while( my $hit = $result->next_hit ) { > my $feature = > Bio::SeqFeature::Generic->new(-score=>$hit->raw_score, > -frame=> $hit->frame); > while( my $hsp = $hit->next_hsp ) { > $feature->add_sub_SeqFeature($hsp,'EXPAND'); > } > $track->add_feature($feature); > } > print $panel->png; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From crabtree at tigr.org Thu Dec 9 11:51:25 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Thu Dec 9 11:49:15 2004 Subject: [Bioperl-l] Can I get different Graphics::Panel colours fordifferent HSP frames within the same blast hit? Message-ID: Hi Aaron- > There's another solution that shouldn't involve code patches: see the > documentation in Panel (short answer: all_callbacks: 1) Perhaps I'm being obtuse, but I don't see how turning all_callbacks on helps you with the -bgcolor issue. 
It seems like Bio::Graphics::Glyph::graded_segments::draw needs to be called on the parent feature in order to get the correct min/max score range. To wit:

  my ($min_score,$max_score) = $self->minmax(\@parts);

And since the call that sets $fill = $self->bgcolor is outside the loop that sets the colors of the individual @parts (the HSPs in this case), there's no way you can then get a different (base) color for each part unless you do what I did, and move the assignment to $fill inside the loop.

Now I'm not saying there isn't some other way to assign the HSPs their own colors, just that if you want to use the graded_segments glyph to do so then you need a slightly different implementation of its draw method. Besides, all solutions involve code patches; it's just a question of which code you feel like patching ;)

Jonathan

From amackey at pcbi.upenn.edu Thu Dec 9 13:19:14 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Dec 9 13:16:36 2004
Subject: [Bioperl-l] Can I get different Graphics::Panel colours fordifferent HSP frames within the same blast hit?
In-Reply-To:
References:
Message-ID:

Ahh, right, my apologies. I had run into all_callbacks as the solution to a slightly different problem: wanting to provide different labels/hyperlinks for each segment. I was thinking that bgcolor attributes would behave the same way as with segments.pm, but you're right, graded_segments.pm already hijacks the bgcolor calculation.

-Aaron

On Dec 9, 2004, at 11:51 AM, Crabtree, Jonathan wrote:

>
> Hi Aaron-
>
>> There's another solution that shouldn't involve code patches: see the
>> documentation in Panel (short answer: all_callbacks: 1)
>
> Perhaps I'm being obtuse, but I don't see how turning all_callbacks on
> helps you with the -bgcolor issue. It seems like
> Bio::Graphics::Glyph::graded_segments::draw need to be called on the
> parent feature in order to get the correct min/max score range.
To > wit: > > my ($min_score,$max_score) = $self->minmax(\@parts); > > And since the call that sets $fill = $self->bgcolor is outside the loop > that sets the colors of the individual @parts (the HSPs in this case), > there's no way you can then get a different (base) color for each part > unless you do what I did, and move the assignment to $fill inside the > loop. > > Now I'm not saying there isn't some other way to assign the HSPs their > own colors, just that if you want to use the graded_segments glyph to > do > so then you need a slightly different implementation of its draw > method. > Besides, all solutions involve code patches; it's just a question of > which code you feel like patching ;) > > Jonathan > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From m.claesson at student.ucc.ie Thu Dec 9 13:26:52 2004 From: m.claesson at student.ucc.ie (Marcus Claesson) Date: Thu Dec 9 13:24:17 2004 Subject: [Bioperl-l] Can I get different Graphics::Panel colours fordifferent HSP frames within the same blast hit? In-Reply-To: References: Message-ID: <1102616812.17814.152.camel@morpheus.ucc.ie> Thanks for your answers guys! That hack seems to do it. However, my program will be used by people installing it themselves so I have to stick with the standard non-hacked version of bioperl. Cheers though, Marcus On Thu, 2004-12-09 at 16:51, Crabtree, Jonathan wrote: > Hi Aaron- > > > There's another solution that shouldn't involve code patches: see the > > documentation in Panel (short answer: all_callbacks: 1) > > Perhaps I'm being obtuse, but I don't see how turning all_callbacks on > helps you with the -bgcolor issue. It seems like > Bio::Graphics::Glyph::graded_segments::draw need to be called on the > parent feature in order to get the correct min/max score range. 
> To wit:
>
>   my ($min_score,$max_score) = $self->minmax(\@parts);
>
> And since the call that sets $fill = $self->bgcolor is outside the loop
> that sets the colors of the individual @parts (the HSPs in this case),
> there's no way you can then get a different (base) color for each part
> unless you do what I did, and move the assignment to $fill inside the
> loop.
>
> Now I'm not saying there isn't some other way to assign the HSPs their
> own colors, just that if you want to use the graded_segments glyph to do
> so then you need a slightly different implementation of its draw method.
> Besides, all solutions involve code patches; it's just a question of
> which code you feel like patching ;)
>
> Jonathan

From barry.moore at genetics.utah.edu Thu Dec 9 11:18:52 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Dec 9 13:48:59 2004
Subject: [Bioperl-l] Installing Bioperl on Windows
In-Reply-To:
References:
Message-ID: <41B87AEC.1050607@genetics.utah.edu>

Very helpful comments Nathan - Thank you.

I was referring to Bio::Tools::Run::Alignment::Clustalw in bioperl-run. While clustalw has a Windows port and runs just fine on Windows, this Bioperl module doesn't. In its documentation it says, "However, since the module is currently implemented using (unix) system calls, extensive modification may be necessary before Clustalw.pm would work under non-Unix operating systems (eg Windows, MacOS)." I'm not sure that the modifications would need to be that extensive (i.e. changing a system call to backticks) and maybe I should try that out. My comments seemed to suggest that there were other modules that had the same issues, and that may not be true.

I have been able to install the Bioperl 1.4 core successfully with just the ActiveState, Bioperl, and Winnipeg repositories.

You are right, I should have given more detail for installing bioperl-run. It contains a lot of stuff beginners (and others) might want.
Access to the Pise packages alone would be very useful to Windows users wanting to run Unix-only software.

Thanks for your help on the ppm info. I'm not too savvy with ppm and wasn't sure that it would automatically install the latest version. I think a ppd for bioperl-run is an excellent idea.

Barry

Nathan Haigh wrote:

> Being a windows user (primarily), I have the following comments about
> the windows install instructions:
>
> I wasn't sure which wrappers you were referring to that will not work
> in on Windows OS, when you said:
>
> "Others, such as clustalw, have Windows ports, however the BioPerl
> developer who wrote the interface used Unix specific system calls to
> interact with these programs and so these wrappers will not work in
> the Windows environment"
>
> Are you sure that the http://www.Bribes.org/perl/ppm repository isn't
> require in addition to theoryx http://theoryx5.uwinnipeg.ca/ppms for
> some modules (just wondering why I might have it installed unless I
> needed it for a bioperl feature)?:
>
> I have amended the section regarding ppd files for bioperl-run etc.
> suggesting the user try's searching for them before jumping into
> source! I might see about getting a ppd file for the Bioperl-run
> package made up as this is often something that beginners/intermediate
> bioperlers would like to use i.e. have batch runs and parse the output
> etc.
>
> I've attached my modified version of the file with changes.
>
> Also, with regards to naming packages in .ppd files:
>
> Short version:
>
> ------------------
>
> Change the two references to Bioperl-1.4 in the PPM install steps to read:
>
> Install Bioperl
>
> Also, I think Bioperl 1.4 references should be made more general for
> future releases i.e.
just Bioperl
>
> Reasoning:
>
> ---------------
>
> ppd files have both a NAME and a VERSION field, and when installing
> via PPM you would type
>
> PPM> install
>
> NAME should not contain any reference to the version number and should
> simply be set to Bioperl (not Bioperl-1.4), leaving the version
> numbering to the VERSION field. This means that when a Bioperl v1.5 is
> released and you do a search for bioperl:
>
> PPM> search bioperl
>
> A list of modules is returned, e.g.:
>
> Searching in Active Repositories
>
> 1. Bioperl [1.5] Bioinformatics Toolkit
>
> 2. Bioperl-1.2 [1.2] Bioperl 1.2 PPM3 Archive
>
> 3. Bioperl-1.2.1 [1.2.1] Bioperl 1.2.1 PPM3 Archive
>
> 4. Bioperl-1.2.3 [1.2.3] Bioperl 1.2.3 PPM3 Archive
>
> 5. Bioperl-1.4 [1.4] Bioperl 1.4 PPM3 Archive
>
> Thus, when the user issues the command:
>
> PPM> install bioperl
>
> PPM's internals will automatically install the latest version of
> Bioperl. If the user needs to install an older version, they should
> issue a command such as:
>
> PPM> install 4
>
> This would install the Bioperl-1.2.3 package from the above list.
>
> This would also allow a user of BioPerl v1.4 to upgrade to 1.5 by
> issuing the following command:
>
> PPM> upgrade Bioperl
>
> And PPM's internals would upgrade BioPerl to the latest version
> (however, I don't know how/if this would work for people who have
> installed Bioperl-1.4 (package 5 shown above), as PPM would probably
> think this a totally different module because of the different NAME).
>
> Nathan
>
> >> -----Original Message-----
> >> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Barry Moore
> >> Sent: 08 December 2004 21:30
> >> To: Jason Stajich; Brian Osborne; bioperl
> >> Subject: [Bioperl-l] Installing Bioperl on Windows
> >>
> >> Of course as soon as I sent my last e-mail I found an error in the file
> >> I attached. It didn't include the example script that I referred to.
> >> > >> Barry > >> > >> ========================================================== > >> > >> Installing Bioperl on Windows > >> ============================= > >> > >> 1) Quick Instructions for the Impatient > >> 2) Bioperl on Windows > >> 3) Perl on Windows > >> 4) BioPerl on Windows > >> 5) Beyond the Core > >> 6) BioPerl and Cygwin > >> 7) Cygwin Tips > >> 8) Example Script > >> > >> This installation guide was written by Barry Moore and other Bioperl > >> authors based on the > >> original work of Paul Boutros. Please report problems and/or fixes to > >> the bioperl mailing > >> list, bioperl-l@bioperl.org > >> > >> 1) Quick instructions for the impatient, lucky, or experienced user. > >> ===================================================================== > >> > >> Download the ActivePerl MSI from > >> http://www.activestate.com/Products/ActivePerl/ > >> Run the ActivePerl Installer (accepting all defaults is fine). > >> Open a command prompt (Menus Start->Run and type cmd) and run the ppm > >> shell (C:\>ppm). > >> Add two new ppm repositories with the following commands: > >> ppm> rep add Bioperl http://bioperl.org/DIST > >> ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > >> Install Bioperl-1.4. > >> Go to http://www.bioperl.org and start reading documentation or try the > >> example script at > >> the end of this file. > >> > >> > >> 2) Bioperl on Windows > >> ====================== > >> > >> Bioperl is a large collection of Perl modules (extensions to the Perl > >> language) that aid > >> in the task of writing Perl code to deal with sequence data in a myriad > >> of ways. Bioperl > >> provides objects for various types of sequence data and their associated > >> features and > >> annotations. It provides interfaces for analysis of these sequences with > >> a wide variety > >> of external programs (BLAST, fasta, clustalw and EMBOSS to name just a > >> few). 
It provides > >> interfaces to various types of databases both remote (GenBank, EMBL etc) > >> and local > >> (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. > >> And finally with > >> it's associated documentation and mailing list Bioperl represents a > >> community of > >> bioinformatics professionals working in Perl who are committed to > >> supporting both > >> development of Bioperl and the new users who are drawn to the project. > >> > >> While most bioinformatics and computational biology applications are > >> developed in > >> Unix/Linux environments, more and more programs are being ported to > >> other operating > >> systems like Windows, and many users (often biologists with little > >> background in > >> programming) are looking for ways to automate bioinformatics analyses in > >> the Windows > >> environment. Perl and Bioperl can be installed natively on Windows > >> NT/2000/XP. Most of > >> the functionality of Bioperl is available with this type of install. > >> Much of the heavy > >> lifting in bioinformatics is done by programs originally developed in > >> lower level > >> languages like C and Pascal (e.g. BLAST, clustalw, Staden etc). Bioperl > >> simply acts as a > >> wrapper for running and parsing output from these external programs. > >> Some of those > >> programs (BLAST for example) are ported to Windows. These can be > >> installed and work > >> quite happily with BioPerl in the native Windows environment. Others, > >> such as clustalw, > >> have Windows ports, however the BioPerl developer who wrote the > >> interface used Unix > >> specific system calls to interact with these programs and so these > >> wrappers will not work > >> in the Windows environment. 
And finally some external programs such as > >> Staden and the > >> EMBOSS suite of programs can not be installed on Windows at all, and > >> therefore any part > >> of Bioperl that interacts with these packages either won't work or can't > >> be installed at > >> all. > >> > >> If you have a fairly simple project in mind, want to start using Bioperl > >> quickly, only > >> have access to a computer running Windows, and/or don't mind bumping up > >> against some > >> limitations then Bioperl on Windows may be a good place for you to > >> start. For example, > >> downloading a bunch of sequences from GenBank and sorting out the ones > >> that have a > >> particular annotation or feature works great. Running a bunch of your > >> sequences against > >> remote or local BLAST, parsing the output and storing it in a MySQL > >> database would be > >> fine also. Be aware that most if not all of the Bioperl developers are > >> working in some > >> type of a Unix environment (Linux, OSX, Cygwin). If you have problems > >> with Bioperl that > >> are specific to the Windows environment, you may be blazing new ground > >> and your pleas for > >> help on the Bioperl mailing list may get few responses - simply because > >> no one knows the > >> answer to your Windows specific problem. If this is or becomes a problem > >> for you then > >> you are better off working in some type of Unix like environment. One > >> solution to this > >> problem that will keep you working on a Windows machine it to install > >> Cygwin, a Unix > >> emulation environment for Windows. A number of Bioperl users are using > >> this approach > >> successfully and it is discussed more below. > >> > >> 3) Perl on Windows > >> =================== > >> > >> There are a couple of ways of installing Perl on a Windows machine. The > >> most common and > >> easiest is to get the most recent build from ActiveState. 
ActiveState is > >> a software > >> company (http://www.activestate.com) that provides free builds of Perl > >> for Windows > >> users. The current (December 2004) build is ActivePerl 5.8.4.810 > >> (ActivePerl 5.6.1.638 > >> is also available and should work just fine). To install ActivePerl on > >> Windows: > >> Download the ActivePerl MSI from > >> http://www.activestate.com/Products/ActivePerl/ > >> Run the ActivePerl Installer (accepting all defaults is fine). > >> > >> You can also build Perl yourself (which requires a C compiler) or > >> download one of the > >> other binary distributions. The Perl source for building it yourself is > >> available from > >> CPAN (http://www.cpan.org), as are a few other binary distributions that > >> are alternatives > >> to ActiveState. This approach is not recommended unless you have > >> specific reasons for > >> doing so and know what you're doing. It that's the case you probably > >> don't need to be > >> reading this guide. > >> > >> Cygwin is a Unix emulation environment for Windows and comes with its > >> own copy of Perl. > >> Information on Cygwin and Bioperl is found below. > >> > >> 4) BioPerl on Windows > >> ====================== > >> > >> Perl is a programming language that has been extended a lot by the > >> addition of external > >> modules. These modules work with the core language to extend the > >> functionality of Perl. > >> Bioperl is one such extension to Perl. These modular extensions to Perl > >> sometimes depend > >> on the functionality of other Perl modules and this creates a > >> dependency. You can't > >> install module X unless you have already installed module Y. Some Perl > >> modules are so > >> fundamentally useful that the Perl developers have included them in the > >> core distribution > >> of Perl - if you've installed Perl then these modules are already > >> installed. 
Other > >> modules are freely available from CPAN, but you'll have to install them > >> yourself if you > >> want to use them. BioPerl has such dependencies. > >> > >> Bioperl is actually a large collection of Perl modules (over 1000 > >> currently) and these > >> modules are split into six groups. These six groups are: > >> > >> Bioperl Group Functions > >> ----------------------------------------------------------------- > >> bioperl (the core) Most of the main functionality of Bioperl. > >> bioperl-run Wrappers to a lot of external programs. > >> bioperl-ext Interaction with some alignment functions > >> and the Staden package. > >> bioperl-db Using bioperl with BioSQL and local > >> relational databases. > >> bioperl-microarray Microarray specific functions. > >> biperl-gui Some preliminary work on a graphical user > >> interface to some Bioperl functions. > >> > >> The Bioperl core is what most new users will want to start with. Bioperl > >> 1.4 (the core) > >> and the Perl modules that it depends on can be easily installed with > >> ppm. PPM > >> (Programming Package Manager) is an ActivePerl utility for installing > >> Perl modules on > >> systems using ActivePerl. PPM will look online (you have to be connected > >> to the internet > >> of course) for files (these files end with .ppd) that tell it how to > >> install the modules > >> you want and what other modules your new modules depends on. It will > >> then download and > >> install your modules and all dependent modules for you. These .ppd files > >> are stored > >> online in ppm repositories. ActiveState maintains the largest ppm > >> repository and when > >> you installed ActivePerl ppm was installed with directions for using the > >> ActiveState > >> repositories. Unfortunately the ActiveState repositories are far from > >> complete and other > >> ActivePerl users maintain their own ppm repositories to fill in the > >> gaps. 
Installing > >> will require you to direct ppm to look in two new repositories. You do > >> this by opening a > >> Windows command prompt, typing ppm to start the ppm shell and then > >> typing the following > >> two commands: > >> ppm> rep add Bioperl http://bioperl.org/DIST > >> ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > >> > >> Once ppm knows where to look for Bioperl and it's dependencies you > >> simply tell ppm to > >> install it. This is done with the command: > >> ppm> install Bioperl-1.4 > >> > >> 5) Beyond the Core > >> =================== > >> > >> You may find that you want some of the features of other Bioperl groups > >> like bioperl-run > >> or bioperl-db. There are currently no ppm packages for installing these > >> parts of > >> Bioperl. You will have to install these manually from source. For this > >> you will need a > >> Windows version of the program make called nmake > >> (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe). > >> You will > >> also want to have a willingness to experiment. You'll have to read the > >> installation > >> documents for each component that you want to install, and use nmake > >> where the > >> instructions call for make. You will have to determine from the > >> installation documents > >> what dependencies are required and you will have to get them, read there > >> documentation > >> and install them first. The details of this are beyond the scope of this > >> guide. Read > >> the documentation. Search Google. Try your best, and if you get stuck > >> consult with > >> other on the bioperl mailing list. > >> > >> 6) BioPerl and Cygwin > >> ===================== > >> > >> Cygwin is a Unix emulator and shell environment available free at > >> www.cygwin.com. BioPerl > >> runs well within Cygwin. Some users claim that installation of Bioperl > >> is easier within > >> Cygwin than within Windows, but these may be users with Unix backgrounds. 
> >> > >> One advantage of using Bioperl in Cygwin is that all the external > >> modules are available > >> through CPAN, most if not all external programs can be installed and run > >> so many of the > >> limitation of Bioperl on Windows are circumvented. > >> > >> To get Bioperl running first install the basic Cygwin package as well as > >> the Cygwin Perl, > >> make, and gcc packages. Clicking the "View" button in the upper right of > >> the installer > >> enables you to see details on the various packages. Then follow the > >> BioPerl installation > >> instructions for Unix in BioPerl's INSTALL file. > >> > >> Note that expat comes with Cygwin (it's used by the module XML::Parser). > >> > >> One known issue is that DBD::mysql can be tricky to install in > >> Cygwin and this module is required for the bioperl-db, Biosql, and > >> bioperl-pipeline > >> external packages. Fortunately there's some good instructions online: > >> http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin. > >> > >> Also, set the environmental variable TMPDIR, programs like BLAST and > >> clustalw need a > >> place to create temporary files. e.g.: > >> > >> setenv TMPDIR e:/cygwin/tmp # csh, tcsh > >> export TMPDIR=e:/cygwin/tmp # sh, bash > >> > >> Note that this is not a syntax that Cygwin understands, which would be > >> something like > >> "/cygdrive/e/cygwin/tmp". This is the syntax that a Perl module expects > >> on Windows. > >> > >> If this variable is not set correctly you'll see errors like this when > >> you run > >> Bio::Tools::Run::StandAloneBlast: > >> > >> ------------- EXCEPTION: Bio::Root::Exception ------------- > >> MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory > >> STACK: Error::throw > >> .......... > >> > >> 7) Cygwin Tips > >> =============== > >> > >> The easiest way to install MySQL is to use the Windows binaries > >> available at > >> www.mysql.com. 
Note that Windows does not have sockets, so you need to > >> force the MySQL > >> connections to use TCP/IP instead. Do this by using the "-h" option from > >> the command- > >> line: > >> > >> >mysql -h 127.0.0.1 -u blip -pblop biosql > >> > >> Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it > >> uses a host. For > >> example, if your databases are installed locally: > >> > >> alias mysql 'mysql -h 127.0.0.1' > >> > >> If you're trying to use some application or resource "outside" of Cygwin > >> and you're > >> having a problem remember that Cygwin's path syntax may not be the > >> correct one. Cygwin > >> understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when > >> referring to the E: > >> drive) but the external resource may want 'E:/cygwin/home/jacky'. So > >> your *rc files may > >> end up with paths written in these different syntaxes, depending. > >> > >> If you can, install Cygwin on a drive or partition that's > >> NTFS-formatted, not FAT32- > >> formatted. When you install Cygwin on a FAT32 partition you will not be > >> able to set > >> permissions and ownership correctly. In most situations this probably > >> won't make any > >> difference but there may be occasions where this is a problem. > >> > >> If you want to use BLAST we recommend that the Windows binary be > >> obtained from NCBI > >> (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will > >> be named something > >> like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions > >> in README.bls. > >> > >> Although we've recommended using the BLAST and MySQL binaries you should > >> be able to > >> compile just about everything else from source code using Cygwin's gcc. > >> You'll notice > >> when you're installing Cygwin that many different libraries are also > >> available (gd, jpeg, > >> etc.). 
> >> > >> 8) Example Script > >> ================= > >> > >> #!/usr/bin/perl > >> > >> #A short script to demonstrate how to download sequences from GenBank > >> and access > >> #the sequence and some associated annotations using Bioperl. > >> > >> use strict; > >> use warnings; > >> use Bio::SeqIO; > >> use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed > >> > >> #Get some sequence IDs either like below, or read in from a file. > Note that > >> #this sample script works with the accession numbers below (at least at > >> the time > >> #it was written). If you add different accession numbers, and you get > >> errors, > >> #you may be calling for something that the sequence doesn't have. You'll > >> have > >> #to add your own error trapping code to handle that. > >> my @ids = ('K03160', 'AB039327', 'BC035972'); > >> > >> #Create the GenBank database object to read from the database. > >> my $gb = new Bio::DB::GenBank(); > >> > >> #Create a sequence stream to pass the sequences from the database to the > >> program. > >> my $seqio = $gb->get_Stream_by_id(\@ids); > >> > >> #Loop over all of the sequences that you requested. > >> while (my $seq = $seqio->next_seq) { > >> > >> #Here is how you get methods directly from the RichSeq object. Replace > >> #'display_name' with any other method in Table 2. that can be called on > >> #either the RichSeq object directly, or the PrimarySeq object which > it has > >> #inherited. > >> print "Display Name: ", $seq->display_name,"\n"; > >> print "Sequence Date: ",$seq->get_dates,"\n"; > >> > >> #Here is how to access the classification data from the species object. > >> my $species = $seq->species; > >> print "Species :", $species->common_name,"\n"; > >> my @class = $species->classification; > >> print "Classification: @class\n"; > >> > >> #Here is a general way to call things that are stored as a > Bio::SeqFeature:: > >> #Generic object. 
Replace 'source' with any other of the "major" headings in
> >>     #the feature table (e.g. gene, CDS, etc.) and replace 'mol_type' with any of
> >>     #the tag values found under that heading (organism, locus_tag, gene, etc.)
> >>     my @source_feats = grep { $_->primary_tag eq 'source' }
> >>         $seq->get_SeqFeatures();
> >>     my $source_feat = shift @source_feats;
> >>     my @mol_type = $source_feat->get_tag_values('mol_type');
> >>     print "Molecule Type: @mol_type\n";
> >>
> >>     #Here is a general way to call things that are stored as some type of a
> >>     #Bio::Annotation object. This includes reference information, and comments.
> >>     #Replace reference with 'comment' to get the comment, and replace
> >>     #$ref->authors with $ref->title (or location, medline, etc.) to get other
> >>     #reference categories
> >>     my $ann = $seq->annotation();
> >>     my @references = ($ann->get_Annotations('reference'));
> >>     my $ref = shift @references;
> >>     my ($title, $authors, $location, $pubmed, $reference);
> >>     if (defined $ref) {
> >>         $authors = $ref->authors;
> >>         print "Authors: $authors\n";
> >>     }
> >>     print "Sequence: \n", $seq->seq, "\n\n";
> >> }
> >>
> >> --
> >> Barry Moore
> >> Dept. of Human Genetics
> >> University of Utah
> >> Salt Lake City, UT
> >>
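The example script says you will have to add your own error trapping when a sequence lacks a feature or tag. Here is one possible shape for that, as a sketch only. So that it runs without Bioperl or a network connection, it uses a hypothetical stand-in class (My::FakeFeature) that mimics just three SeqFeature method names used in this thread's context (primary_tag, has_tag, get_tag_values); safe_tag_values is likewise a made-up helper name. The assumption that get_tag_values throws when a tag is missing reflects my understanding of Bio::SeqFeature::Generic and should be checked against your Bioperl version.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bioperl feature object. It is NOT Bioperl;
# only the method names mirror the SeqFeature interface.
package My::FakeFeature;

sub new {
    my ($class, %args) = @_;
    return bless { primary => $args{primary}, tags => $args{tags} || {} }, $class;
}
sub primary_tag { return $_[0]->{primary} }
sub has_tag     { return exists $_[0]->{tags}{ $_[1] } }
sub get_tag_values {
    my ($self, $tag) = @_;
    # Assumed behaviour: die when the tag does not exist.
    die "asking for tag value that does not exist: $tag\n"
        unless $self->has_tag($tag);
    return @{ $self->{tags}{$tag} };
}

package main;

# Return the values of $tag on the first feature whose primary tag is
# $primary, or an empty list if the feature or the tag is missing.
sub safe_tag_values {
    my ($features, $primary, $tag) = @_;
    my ($feat) = grep { $_->primary_tag eq $primary } @$features;
    return () unless defined $feat;           # no such feature
    return () unless $feat->has_tag($tag);    # feature lacks the tag
    my @values = eval { $feat->get_tag_values($tag) };
    warn "could not read '$tag': $@" if $@;   # trap anything unexpected
    return @values;
}

my @features = (
    My::FakeFeature->new(primary => 'source', tags => { mol_type => ['mRNA'] }),
    My::FakeFeature->new(primary => 'CDS'),
);

my @mol_type = safe_tag_values(\@features, 'source', 'mol_type');
my @missing  = safe_tag_values(\@features, 'CDS', 'locus_tag');

print "Molecule Type: @mol_type\n";
print "locus_tag values found: ", scalar(@missing), "\n";
```

In the real script the same pattern would wrap the calls on the objects returned by $seq->get_SeqFeatures(), so that a record without a 'source' feature or a 'mol_type' tag is skipped instead of stopping the loop.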
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>------------------------------------------------------------------------
>
>Installing Bioperl on Windows
>=============================
>
>1) Quick Instructions for the Impatient
>2) Bioperl on Windows
>3) Perl on Windows
>4) BioPerl on Windows
>5) Beyond the Core
>6) BioPerl and Cygwin
>7) Cygwin Tips
>8) Example Script
>
>This installation guide was written by Barry Moore and other Bioperl authors based on the
>original work of Paul Boutros. Please report problems and/or fixes to the bioperl mailing
>list, bioperl-l@bioperl.org
>
>1) Quick instructions for the impatient, lucky, or experienced user.
>=====================================================================
>
>Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
>Run the ActivePerl Installer (accepting all defaults is fine).
>Open a command prompt (Menus Start->Run and type cmd) and run the PPM shell (C:\>ppm).
>Add two new PPM repositories with the following commands:
>    PPM> rep add Bioperl http://bioperl.org/DIST
>    PPM> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>Install Bioperl with the following command:
>    PPM> install Bioperl
>Go to http://www.bioperl.org and start reading documentation or try the example script at
>the end of this file.
>
>
>2) Bioperl on Windows
>======================
>
>Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid
>in the task of writing Perl code to deal with sequence data in a myriad of ways. Bioperl
>provides objects for various types of sequence data and their associated features and
>annotations.
It provides interfaces for analysis of these sequences with a wide variety
>of external programs (BLAST, fasta, clustalw and EMBOSS to name just a few). It provides
>interfaces to various types of databases both remote (GenBank, EMBL etc.) and local
>(MySQL, flat files, GFF etc.) for storage and retrieval of sequences. And finally, with
>its associated documentation and mailing list, Bioperl represents a community of
>bioinformatics professionals working in Perl who are committed to supporting both
>development of Bioperl and the new users who are drawn to the project.
>
>While most bioinformatics and computational biology applications are developed in
>Unix/Linux environments, more and more programs are being ported to other operating
>systems like Windows, and many users (often biologists with little background in
>programming) are looking for ways to automate bioinformatics analyses in the Windows
>environment. Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of
>the functionality of Bioperl is available with this type of install. Much of the heavy
>lifting in bioinformatics is done by programs originally developed in lower level
>languages like C and Pascal (e.g. BLAST, clustalw, Staden etc.). Bioperl simply acts as a
>wrapper for running and parsing output from these external programs. Some of those
>programs (BLAST for example) are ported to Windows. These can be installed and work
>quite happily with BioPerl in the native Windows environment. Others, such as clustalw,
>have Windows ports; however, the BioPerl developer who wrote the interface used
>Unix-specific system calls to interact with these programs and so these wrappers will not
>work in the Windows environment. And finally, some external programs such as Staden and the
>EMBOSS suite of programs cannot be installed on Windows at all, and therefore any part
>of Bioperl that interacts with these packages either won't work or can't be installed at
>all.
>
>If you have a fairly simple project in mind, want to start using Bioperl quickly, only
>have access to a computer running Windows, and/or don't mind bumping up against some
>limitations, then Bioperl on Windows may be a good place for you to start. For example,
>downloading a bunch of sequences from GenBank and sorting out the ones that have a
>particular annotation or feature works great. Running a bunch of your sequences against
>remote or local BLAST, parsing the output and storing it in a MySQL database would be
>fine also. Be aware that most if not all of the Bioperl developers are working in some
>type of a Unix environment (Linux, OSX, Cygwin). If you have problems with Bioperl that
>are specific to the Windows environment, you may be blazing new ground and your pleas for
>help on the Bioperl mailing list may get few responses - simply because no one knows the
>answer to your Windows-specific problem. If this is or becomes a problem for you then
>you are better off working in some type of Unix-like environment. One solution to this
>problem that will keep you working on a Windows machine is to install Cygwin, a Unix
>emulation environment for Windows. A number of Bioperl users are using this approach
>successfully and it is discussed more below.
>
>3) Perl on Windows
>===================
>
>There are a couple of ways of installing Perl on a Windows machine. The most common and
>easiest is to get the most recent build from ActiveState. ActiveState is a software
>company (http://www.activestate.com) that provides free builds of Perl for Windows
>users. The current (December 2004) build is ActivePerl 5.8.4.810 (ActivePerl 5.6.1.638
>is also available and should work just fine). To install ActivePerl on Windows:
>    Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
>    Run the ActivePerl Installer (accepting all defaults is fine).
>
>You can also build Perl yourself (which requires a C compiler) or download one of the
>other binary distributions. The Perl source for building it yourself is available from
>CPAN (http://www.cpan.org), as are a few other binary distributions that are alternatives
>to ActiveState. This approach is not recommended unless you have specific reasons for
>doing so and know what you're doing. If that's the case you probably don't need to be
>reading this guide.
>
>Cygwin is a Unix emulation environment for Windows and comes with its own copy of Perl.
>Information on Cygwin and Bioperl is found below.
>
>4) BioPerl on Windows
>======================
>
>Perl is a programming language that has been extended a lot by the addition of external
>modules. These modules work with the core language to extend the functionality of Perl.
>Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend
>on the functionality of other Perl modules and this creates a dependency. You can't
>install module X unless you have already installed module Y. Some Perl modules are so
>fundamentally useful that the Perl developers have included them in the core distribution
>of Perl - if you've installed Perl then these modules are already installed. Other
>modules are freely available from CPAN, but you'll have to install them yourself if you
>want to use them. BioPerl has such dependencies.
>
>Bioperl is actually a large collection of Perl modules (over 1000 currently) and these
>modules are split into six groups. These six groups are:
>
>    Bioperl Group        Functions
>    -----------------------------------------------------------------
>    bioperl (the core)   Most of the main functionality of Bioperl.
>    bioperl-run          Wrappers to a lot of external programs.
>    bioperl-ext          Interaction with some alignment functions
>                         and the Staden package.
>    bioperl-db           Using bioperl with BioSQL and local
>                         relational databases.
>    bioperl-microarray   Microarray specific functions.
>    bioperl-gui          Some preliminary work on a graphical user
>                         interface to some Bioperl functions.
>
>The Bioperl core is what most new users will want to start with. Bioperl (the core)
>and the Perl modules that it depends on can be easily installed with PPM. PPM
>(Programmer's Package Manager, formerly known as the Perl Package Manager) is an ActivePerl
>utility for installing Perl modules on systems using ActivePerl. PPM will look online
>(you have to be connected to the internet of course) for files (these files end with .ppd)
>that tell it how to install the modules you want and what other modules your new modules
>depend on. It will then download and install your modules and all dependent modules for
>you. These .ppd files are stored online in PPM repositories. ActiveState maintains the
>largest PPM repository and when you installed ActivePerl, PPM was installed with directions
>for using the ActiveState repositories. Unfortunately the ActiveState repositories are
>far from complete and other ActivePerl users maintain their own PPM repositories to fill
>in the gaps. Installing Bioperl will require you to direct PPM to look in two new
>repositories. You do this by opening a Windows command prompt, typing ppm to start the
>PPM shell and then typing the following two commands:
>    PPM> rep add Bioperl http://bioperl.org/DIST
>    PPM> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>
>Once PPM knows where to look for Bioperl and its dependencies you simply tell PPM to
>install it. This is done with the command:
>    PPM> install Bioperl
>
>5) Beyond the Core
>===================
>
>You may find that you want some of the features of other Bioperl groups like bioperl-run
>or bioperl-db. There are currently no PPM packages for installing these parts of
>Bioperl (but check this by doing a Bioperl search at the PPM shell):
>    PPM> search bioperl
>
>If they are not present, you will have to install these manually from source.
For this
>you will need a Windows version of the program make called nmake
>(http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe). You will
>also want to have a willingness to experiment. You'll have to read the installation
>documents for each component that you want to install, and use nmake where the
>instructions call for make. You will have to determine from the installation documents
>what dependencies are required and you will have to get them, read their documentation
>and install them first. The details of this are beyond the scope of this guide. Read
>the documentation. Search Google. Try your best, and if you get stuck consult with
>others on the bioperl mailing list.
>
>6) BioPerl and Cygwin
>=====================
>
>Cygwin is a Unix emulator and shell environment available free at www.cygwin.com. BioPerl
>runs well within Cygwin. Some users claim that installation of Bioperl is easier within
>Cygwin than within Windows, but these may be users with Unix backgrounds.
>
>One advantage of using Bioperl in Cygwin is that all the external modules are available
>through CPAN and most if not all external programs can be installed and run, so many of
>the limitations of Bioperl on Windows are circumvented.
>
>To get Bioperl running, first install the basic Cygwin package as well as the Cygwin Perl,
>make, and gcc packages. Clicking the "View" button in the upper right of the installer
>enables you to see details on the various packages. Then follow the BioPerl installation
>instructions for Unix in BioPerl's INSTALL file.
>
>Note that expat comes with Cygwin (it's used by the module XML::Parser).
>
>One known issue is that DBD::mysql can be tricky to install in
>Cygwin and this module is required for the bioperl-db, Biosql, and bioperl-pipeline
>external packages. Fortunately there are some good instructions online:
>http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin.
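Before building anything from source it can help to see which of the modules mentioned above are already visible to your perl. The following is a minimal sketch using only core Perl; module_available is a hypothetical helper name, and the module list is just the names that come up in this guide, not a complete dependency list.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether this perl can load the named module.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# File::Spec ships with Perl; the others may or may not be installed.
for my $module (qw(File::Spec XML::Parser DBD::mysql Bio::SeqIO)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'installed' : 'NOT installed';
}
```

A module reported as missing here is one you would have to fetch (via PPM on ActivePerl, or CPAN under Cygwin) before the parts of Bioperl that need it will work.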
> >Also, set the environmental variable TMPDIR, programs like BLAST and clustalw need a >place to create temporary files. e.g.: > >setenv TMPDIR e:/cygwin/tmp # csh, tcsh >export TMPDIR=e:/cygwin/tmp # sh, bash > >Note that this is not a syntax that Cygwin understands, which would be something like >"/cygdrive/e/cygwin/tmp". This is the syntax that a Perl module expects on Windows. > >If this variable is not set correctly you'll see errors like this when you run >Bio::Tools::Run::StandAloneBlast: > >------------- EXCEPTION: Bio::Root::Exception ------------- >MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory >STACK: Error::throw >.......... > >7) Cygwin Tips >=============== > >The easiest way to install MySQL is to use the Windows binaries available at >www.mysql.com. Note that Windows does not have sockets, so you need to force the MySQL >connections to use TCP/IP instead. Do this by using the "-h" option from the command- >line: > > > >>mysql -h 127.0.0.1 -u blip -pblop biosql >> >> > >Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses a host. For >example, if your databases are installed locally: > >alias mysql 'mysql -h 127.0.0.1' > >If you're trying to use some application or resource "outside" of Cygwin and you're >having a problem remember that Cygwin's path syntax may not be the correct one. Cygwin >understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when referring to the E: >drive) but the external resource may want 'E:/cygwin/home/jacky'. So your *rc files may >end up with paths written in these different syntaxes, depending. > >If you can, install Cygwin on a drive or partition that's NTFS-formatted, not FAT32- >formatted. When you install Cygwin on a FAT32 partition you will not be able to set >permissions and ownership correctly. In most situations this probably won't make any >difference but there may be occasions where this is a problem. 
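
The difference between the two path syntaxes mentioned above ('/cygdrive/e/cygwin/tmp' as Cygwin sees it, 'E:/cygwin/tmp' as a Windows program or Perl module expects it) can be sketched in a few lines of Perl. The function name cygdrive_to_windows is hypothetical and the code only handles the /cygdrive/<letter>/ form; paths such as /home/jacky depend on where Cygwin is installed and are returned unchanged.

```perl
#!/usr/bin/perl
# Minimal sketch: convert a Cygwin /cygdrive path to drive-letter form.
use strict;
use warnings;

# Hypothetical helper: '/cygdrive/e/cygwin/tmp' -> 'E:/cygwin/tmp'.
# Anything that does not start with /cygdrive/<letter> is left alone.
sub cygdrive_to_windows {
    my ($path) = @_;
    if ($path =~ m{^/cygdrive/([A-Za-z])(/.*)?$}) {
        my $drive = uc $1;
        my $rest  = defined $2 ? $2 : '/';
        return "$drive:$rest";
    }
    return $path;
}

print cygdrive_to_windows('/cygdrive/e/cygwin/tmp'), "\n";   # prints E:/cygwin/tmp
```

In practice Cygwin's own cygpath utility is probably the better tool for this, since it also knows where the Cygwin root lives; the sketch is only meant to show what the conversion amounts to.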
> >If you want to use BLAST we recommend that the Windows binary be obtained from NCBI >(ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will be named something >like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions in README.bls. > >Although we've recommended using the BLAST and MySQL binaries you should be able to >compile just about everything else from source code using Cygwin's gcc. You'll notice >when you're installing Cygwin that many different libraries are also available (gd, jpeg, >etc.). > >8) Example Script >================= > >#!/usr/bin/perl > >#A short script to demonstrate how to download sequences from GenBank and access >#the sequence and some associated annotations using Bioperl. > >use strict; >use warnings; >use Bio::SeqIO; >use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed > >#Get some sequence IDs either like below, or read in from a file. Note that >#this sample script works with the accession numbers below (at least at the time >#it was written). If you add different accession numbers, and you get errors, >#you may be calling for something that the sequence doesn't have. You'll have >#to add your own error trapping code to handle that. >my @ids = ('K03160', 'AB039327', 'BC035972'); > >#Create the GenBank database object to read from the database. >my $gb = new Bio::DB::GenBank(); > >#Create a sequence stream to pass the sequences from the database to the program. >my $seqio = $gb->get_Stream_by_id(\@ids); > >#Loop over all of the sequences that you requested. >while (my $seq = $seqio->next_seq) { > > #Here is how you get methods directly from the RichSeq object. Replace > #'display_name' with any other method in Table 2. that can be called on > #either the RichSeq object directly, or the PrimarySeq object which it has > #inherited. 
> print "Display Name: ", $seq->display_name,"\n"; > print "Sequence Date: ",$seq->get_dates,"\n"; > > #Here is how to access the classification data from the species object. > my $species = $seq->species; > print "Species :", $species->common_name,"\n"; > my @class = $species->classification; > print "Classification: @class\n"; > > #Here is a general way to call things that are stored as a Bio::SeqFeature:: > #Generic object. Replace 'source' with any other of the "major" headings in > #the feature table (e.g gene, CDS, etc.) and replace 'organism' with any of > #the tag values found under that heading (mol_type, locus_tag, gene, etc.) > my @source_feats = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures(); > my $source_feat = shift @source_feats; > my @mol_type = $source_feat->get_tag_values('mol_type'); > print "Molecule Type: @mol_type\n"; > > #Here is a general way to call things that are stored as some type of a > #Bio::Annotation oject. This includes reference information, and comments. > #Replace reference with 'comment' to get the comment, and replace > #$ref->authors with $ref->title (or location, medline, etc.) to get other > #reference categories > my $ann = $seq->annotation(); > my @references = ($ann->get_Annotations('reference')); > my $ref = shift @references; > my ($title, $authors, $location, $pubmed, $reference); > if (defined $ref) { > $authors = $ref->authors; > print "Authors: $authors\n"; > } > print "Sequence: \n", $seq->seq, "\n\n"; >} > > -- Barry Moore Dept. 
of Human Genetics University of Utah Salt Lake City, UT

From barry.moore at genetics.utah.edu Thu Dec 9 12:55:28 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Dec 9 13:49:01 2004
Subject: [Bioperl-l] installing HTML::Parser
In-Reply-To: <20041209014551.88918.qmail@web50806.mail.yahoo.com>
References: <20041209014551.88918.qmail@web50806.mail.yahoo.com>
Message-ID: <41B89190.5090601@genetics.utah.edu>

Xiaodong-

I had similar problems installing other perl modules on Red Hat 9. In the beginning there was ASCII and Unix was happy. But the world spoke more than just Latin-based languages and we got ISO-10646 and Unicode and Unix was confused. ASCII is of course the old character set used by early computer systems. ISO-10646 and Unicode are huge character sets that support just about every language known (they're even working on support for Tolkien's elvish Tengwar). UTF-8 is a character encoding that lets ASCII-based Unix operating systems stay backward compatible with ASCII while remaining forward compatible with Unicode. Red Hat 8 was the first major Linux distribution to use UTF-8 as the default encoding for all locales, but unfortunately there was a major problem with UTF-8 support in the perl that shipped with Red Hat. I've never read anything that said what that problem was, but the solution Paulo gave you seems to always solve all the problems. Try Googling "UTF-8 Redhat Perl Makefile" if you want to read about this problem ad nauseam.

Barry

Xiaodong wrote:

>Thanks Paulo. Your way actually worked. Now I have it installed. But, I
>am still confused about why it happened in the first place. Any ideas?
> >Xiaodong > >--- Paulo Almeida wrote: > > > >>Hi, >> >>I'm not sure it's the same thing, but this might help you: >>http://forums.devshed.com/t77648/s.html >> >>The part that interests you is: >> >>I made the change to /etc/sysconfig/i18n >> >>The default file reads >> >>LANG="en_US.UTF-8" >>SUPPORTED="en_US.UTF-8:en_US:en" >>SYSFONT="latarcyrheb-sun16" >> >>I change my file to read >> >>LANG="en_US" >>SUPPORTED="en_US" >>SYSFONT="latarcyrheb-sun16" >> >>If that doesn't help, you can google for "Malformed UTF-8 character >>(unexpected" and see what else comes up. >> >>-Paulo Almeida >> >> >>X wrote: >> >> >> >>>Hello there, >>> >>>I am new to BioPerl. As I was trying to install the module of >>>HTML::Parser from CPAN. I got the following error messages when >>> >>> >>testing >> >> >>>the package. It seemed that my system was not correctly configured >>> >>> >>or >> >> >>>something. Could anybody give an explanation of the error messages >>> >>> >>and >> >> >>>how to fix the problem? Really appreciate it. >>> >>> >>>...... (tests ok) >>>t/entities ...........Malformed UTF-8 character (unexpected >>>non-continuation byte 0x72, immediately after start byte oxe5) in >>>substitution iterator at >>>/root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Entities.pm line >>> >>> >>458. 
>> >> >>>t/entities ...........ok 2/11Confused test output: test 2 answered >>>after test 4 >>>t/entities............ok 3/11Confused test output: test 3 answered >>>after test 5 >>>t/entities............NOK 4Confused test output: test 4 answered >>> >>> >>after >> >> >>>test 6 >>>t/entities............NOK 5Confused test output: test 5 answered >>> >>> >>after >> >> >>>test 7 >>>t/entities............NOK 6Confused test output: test 6 answered >>> >>> >>after >> >> >>>test 8 >>>t/entities............ok 7/11Confused test output: test 7 answered >>>after test 9 >>>t/entities............ok 8/11Confused test output: test 8 answered >>>after test 10 >>>t/entities............FAILED tests 1-3, 7-9 >>> Failed 6/11 tests, 45.45% okay >>>...... (tests ok) >>>t/headparser..........Parsing of undecoded UTF-8 will give garbage >>> >>> >>when >> >> >>>decoding entities at >>>/root/.cpan/build/HTML-Parser-3.43/blib/lib/HTML/Parser.pm line 104. >>># Test 3 got: '? v??re eller ? ikke v??re' (t/headparser.t at line >>> >>> >>137) >> >> >>># Expected: '? v?re eller ? ikke v?re' >>># t/headparser.t line 137 is: ok($p->header("Title"), "? v?re eller >>> >>> >>? >> >> >>>ikke v?re"); >>>t/headparser.........FAILED test 3 >>> Failed 1/6 tests, 83.33% okay >>>...... (tests ok) >>>t/uentities..........FAILED tests 2, 8 >>> Failed 2/14 tests, 85.71% okay >>>...... (tests ok) >>> >>>Failed 3/44 test scripts, 93.18% okay. 9/355 subtests failed, 97.46% >>>okay. >>>make: *** [test_dynamic] Error 29 >>> /usr/bin/make test -- NOT OK >>> >>> >>>Xiaodong >>> >>> >>> >> >> > > > > >__________________________________ >Do you Yahoo!? >Yahoo! Mail - Easier than ever with enhanced search. Learn more. >http://info.mail.yahoo.com/mail_250 >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Barry Moore Dept. 
of Human Genetics University of Utah Salt Lake City, UT From crabtree at tigr.org Thu Dec 9 13:56:28 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Thu Dec 9 14:06:31 2004 Subject: [Bioperl-l] Can I get different Graphics::Panel coloursfordifferent HSP frames within the same blast hit? Message-ID: Marcus- >That hack seems to do it. However, my program will be used by people >installing it themselves so I have to stick with the standard non-hacked >version of bioperl. OK, in that case here's an even less elegant solution for you to consider; this one requires you to distribute only a single file. Just replace 'blastx.out' with the name of your blastx output file in the script below. Jonathan #!/usr/bin/perl # BEGIN HACK use Bio::Graphics::Glyph::graded_segments; package Bio::Graphics::Glyph::graded_segments; # redefine draw method from Bioperl graded_segments package; # perl will warn you (and for good reason...) that you're doing this if you run it with the -w flag # sub draw { my $self = shift; # bail out if this isn't the right kind of feature # handle both das-style and Bio::SeqFeatureI style, # which use different names for subparts. 
my @parts = $self->parts; @parts = $self if !@parts && $self->level == 0; return $self->SUPER::draw(@_) unless @parts; my ($min_score,$max_score) = $self->minmax(\@parts); return $self->SUPER::draw(@_) unless defined($max_score) && defined($min_score) && $min_score < $max_score; my $span = $max_score - $min_score; foreach my $part (@parts) { # use part's bgcolor as base color (to be adjusted by score) my $fill = $part->bgcolor; my ($red,$green,$blue) = $self->panel->rgb($fill); my $s = eval { $part->feature->score }; unless (defined $s) { $part->{partcolor} = $fill; next; } my ($r,$g,$b) = $self->calculate_color($s,[$red,$green,$blue],$min_score,$span); my $idx = $self->panel->translate_color($r,$g,$b); $part->{partcolor} = $idx; } $self->SUPER::draw(@_); } package MAIN; # END HACK use Bio::Graphics; use Bio::SearchIO; my $searchio = Bio::SearchIO->new(-file=> 'blastx.out', -format => 'blast'); my $result = $searchio->next_result(); my $panel = Bio::Graphics::Panel->new(-length=> $result->query_length, -width=> 800); my $track = $panel->add_track(-glyph => 'graded_segments', -label => 1, -connector => 'dashed', -bgcolor => sub { my $feature = shift; my ($frame) = $feature->frame(); return "red" if ($frame =~ /0/); return "green" if ($frame =~ /1/); return "blue" if ($frame =~ /2/)}, -strand_arrow => 'tue'); while( my $hit = $result->next_hit ) { my $feature = Bio::SeqFeature::Generic->new(-score=>$hit->raw_score, -frame=> $hit->frame); while( my $hsp = $hit->next_hsp ) { $feature->add_sub_SeqFeature($hsp,'EXPAND'); } $track->add_feature($feature); } print $panel->png; From jason.stajich at duke.edu Thu Dec 9 14:24:36 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Dec 9 14:22:56 2004 Subject: [Bioperl-l] Can I get different Graphics::Panel coloursfordifferent HSP frames within the same blast hit? In-Reply-To: References: Message-ID: On Dec 9, 2004, at 1:56 PM, Crabtree, Jonathan wrote: > > Marcus- > >> That hack seems to do it. 
However, my program will be used by people >> installing it themselves so I have to stick with the standard >> non-hacked >> version of bioperl. > > OK, in that case here's an even less elegant solution for you to > consider; this one requires you to distribute only a single file. > Just replace 'blastx.out' with the name of your blastx output file in > the script below. > > Jonathan > > > #!/usr/bin/perl > > # BEGIN HACK > # You can do this even more succinctly and without the warnings use Bio::Graphics::Glyph::graded_segments; # package Bio::Graphics::Glyph::graded_segments; # redefine draw method from Bioperl graded_segments package; # perl will warn you (and for good reason...) that you're doing this if you run it with the -w flag # # sub draw { sub Bio::Graphics::Glyph::graded_segments::draw { my $self = shift; # bail out if this isn't the right kind of feature # handle both das-style and Bio::SeqFeatureI style, # which use different names for subparts. my @parts = $self->parts; @parts = $self if !@parts && $self->level == 0; return $self->SUPER::draw(@_) unless @parts; my ($min_score,$max_score) = $self->minmax(\@parts); return $self->SUPER::draw(@_) unless defined($max_score) && defined($min_score) && $min_score < $max_score; my $span = $max_score - $min_score; foreach my $part (@parts) { # use part's bgcolor as base color (to be adjusted by score) my $fill = $part->bgcolor; my ($red,$green,$blue) = $self->panel->rgb($fill); my $s = eval { $part->feature->score }; unless (defined $s) { $part->{partcolor} = $fill; next; } my ($r,$g,$b) = $self->calculate_color($s,[$red,$green,$blue],$min_score,$span); my $idx = $self->panel->translate_color($r,$g,$b); $part->{partcolor} = $idx; } $self->SUPER::draw(@_); } # package MAIN; > > # END HACK > > use Bio::Graphics; > use Bio::SearchIO; > > my $searchio = Bio::SearchIO->new(-file=> 'blastx.out', -format => > 'blast'); > my $result = $searchio->next_result(); > my $panel = Bio::Graphics::Panel->new(-length=> 
$result->query_length, > -width=> 800); > my $track = $panel->add_track(-glyph => 'graded_segments', > -label => 1, > -connector => 'dashed', > -bgcolor => sub { > my $feature = shift; > my ($frame) = $feature->frame(); > return "red" if ($frame =~ /0/); > return "green" if ($frame =~ /1/); > return "blue" if ($frame =~ /2/)}, > -strand_arrow => 'tue'); > while( my $hit = $result->next_hit ) { > my $feature = > Bio::SeqFeature::Generic->new(-score=>$hit->raw_score, > -frame=> $hit->frame); > while( my $hsp = $hit->next_hsp ) { > $feature->add_sub_SeqFeature($hsp,'EXPAND'); > } > $track->add_feature($feature); > } > print $panel->png; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From crabtree at tigr.org Thu Dec 9 14:44:17 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Thu Dec 9 14:42:13 2004 Subject: [Bioperl-l] Can I get different Graphics::Panel coloursfordifferent HSP frames within the same blast hit? Message-ID: Jason- Perhaps a data entry error on my part is to blame, but when I try your version I still get the warning, and I also get the following runtime error because Perl can't resolve the reference to $self->SUPER::draw: Can't locate object method "draw" via package "main" at ./test2.pl line 48, line 191. I agree that the "package MAIN;" is superfluous, but I think you need the other one (unless you replace SUPER::draw with something more specific, at which point I think your already-marginal succinctness advantage goes out the window...) Does this version work for you, Marcus? 
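
For anyone following along, the SUPER:: behaviour described above can be reproduced without Bio::Graphics at all. The sketch below uses two made-up classes (Base and Derived, hypothetical names, nothing to do with the glyph code) and shows that SUPER:: is resolved against the package the sub was compiled in, not against the name the sub is installed under or the class of the object.

```perl
#!/usr/bin/perl
# Minimal sketch: SUPER:: follows the compile-time package.
use strict;
use warnings;

# Hypothetical classes, for illustration only.
package Base;
sub new  { return bless {}, shift }
sub draw { return 'Base::draw' }

package Derived;
our @ISA = ('Base');

package main;

# Fully qualified name, but compiled in package main:
# SUPER:: searches @main::ISA, which is empty, so the call dies.
sub Derived::draw_from_main {
    my $self = shift;
    return $self->SUPER::draw(@_);
}

# Same body compiled inside package Derived:
# SUPER:: searches @Derived::ISA and finds Base::draw.
{
    package Derived;
    sub draw_in_package {
        my $self = shift;
        return $self->SUPER::draw(@_);
    }
}

my $obj = Derived->new;
my $ok  = $obj->draw_in_package;
my $err = eval { $obj->draw_from_main; 1 } ? '' : $@;

print "compiled in Derived: $ok\n";
print "compiled in main:    $err";
```

The second call fails with a "Can't locate object method" error like the one quoted above, which is why the package statement has to stay in the hack.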
Jonathan > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Thursday, December 09, 2004 2:25 PM > To: Crabtree, Jonathan > Cc: Marcus Claesson; Bioperl list > Subject: Re: [Bioperl-l] Can I get different Graphics::Panel > coloursfordifferent HSP frames within the same blast hit? > > > > On Dec 9, 2004, at 1:56 PM, Crabtree, Jonathan wrote: > > > > > Marcus- > > > >> That hack seems to do it. However, my program will be used > by people > >> installing it themselves so I have to stick with the standard > >> non-hacked version of bioperl. > > > > OK, in that case here's an even less elegant solution for you to > > consider; this one requires you to distribute only a single file. > > Just replace 'blastx.out' with the name of your blastx > output file in > > the script below. > > > > Jonathan > > > > > > #!/usr/bin/perl > > > > # BEGIN HACK > > > # You can do this even more succinctly and without the warnings > > use Bio::Graphics::Glyph::graded_segments; > # package Bio::Graphics::Glyph::graded_segments; > > # redefine draw method from Bioperl graded_segments package; > # perl will warn you (and for good reason...) that you're > doing this if > you run it with the -w flag > # > # sub draw { > sub Bio::Graphics::Glyph::graded_segments::draw { > my $self = shift; > > # bail out if this isn't the right kind of feature > # handle both das-style and Bio::SeqFeatureI style, > # which use different names for subparts. 
> my @parts = $self->parts; > @parts = $self if !@parts && $self->level == 0; > return $self->SUPER::draw(@_) unless @parts; > > my ($min_score,$max_score) = $self->minmax(\@parts); > > return $self->SUPER::draw(@_) > unless defined($max_score) && defined($min_score) > && $min_score < $max_score; > > my $span = $max_score - $min_score; > > foreach my $part (@parts) { > # use part's bgcolor as base color (to be adjusted by score) > my $fill = $part->bgcolor; > my ($red,$green,$blue) = $self->panel->rgb($fill); > > my $s = eval { $part->feature->score }; > unless (defined $s) { > $part->{partcolor} = $fill; > next; > } > my ($r,$g,$b) = > $self->calculate_color($s,[$red,$green,$blue],$min_score,$span); > my $idx = $self->panel->translate_color($r,$g,$b); > $part->{partcolor} = $idx; > } > $self->SUPER::draw(@_); > } > > # package MAIN; > > > > # END HACK > > > > use Bio::Graphics; > > use Bio::SearchIO; > > > > my $searchio = Bio::SearchIO->new(-file=> 'blastx.out', -format => > > 'blast'); > > my $result = $searchio->next_result(); > > my $panel = Bio::Graphics::Panel->new(-length=> > $result->query_length, > > -width=> 800); > > my $track = $panel->add_track(-glyph => 'graded_segments', > > -label => 1, > > -connector => 'dashed', > > -bgcolor => sub { > > my $feature = shift; > > my ($frame) = $feature->frame(); > > return "red" if ($frame =~ /0/); > > return "green" if ($frame =~ /1/); > > return "blue" if ($frame =~ /2/)}, > > -strand_arrow => 'tue'); > > while( my $hit = $result->next_hit ) { > > my $feature = > > Bio::SeqFeature::Generic->new(-score=>$hit->raw_score, > > -frame=> > $hit->frame); > > while( my $hsp = $hit->next_hsp ) { > > $feature->add_sub_SeqFeature($hsp,'EXPAND'); > > } > > $track->add_feature($feature); > > } > > print $panel->png; > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- > Jason Stajich > jason.stajich at 
duke.edu > http://www.duke.edu/~jes12/ > > From jason.stajich at duke.edu Thu Dec 9 14:50:41 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Dec 9 14:47:59 2004 Subject: [Bioperl-l] Can I get different Graphics::Panel coloursfordifferent HSP frames within the same blast hit? In-Reply-To: References: Message-ID: <9B4F1F5E-4A1B-11D9-9D3C-000393C44276@duke.edu> Ah right - wasn't really looking at the code in the method - if you want to call SUPER then you would need to declare the package as you did (and then later reset the package to MAIN) I guess. So ignore me... Sorry. -jason On Dec 9, 2004, at 2:44 PM, Crabtree, Jonathan wrote: > > Jason- > > Perhaps a data entry error on my part is to blame, but when I try your > version I still get the warning, and I also get the following runtime > error because Perl can't resolve the reference to $self->SUPER::draw: > > Can't locate object method "draw" via package "main" at ./test2.pl line > 48, line 191. > > I agree that the "package MAIN;" is superfluous, but I think you need > the other one (unless you replace SUPER::draw with something more > specific, at which point I think your already-marginal succinctness > advantage goes out the window...) Does this version work for you, > Marcus? > > Jonathan > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich@duke.edu] >> Sent: Thursday, December 09, 2004 2:25 PM >> To: Crabtree, Jonathan >> Cc: Marcus Claesson; Bioperl list >> Subject: Re: [Bioperl-l] Can I get different Graphics::Panel >> coloursfordifferent HSP frames within the same blast hit? >> >> >> >> On Dec 9, 2004, at 1:56 PM, Crabtree, Jonathan wrote: >> >>> >>> Marcus- >>> >>>> That hack seems to do it. However, my program will be used >> by people >>>> installing it themselves so I have to stick with the standard >>>> non-hacked version of bioperl. 
>>> >>> OK, in that case here's an even less elegant solution for you to >>> consider; this one requires you to distribute only a single file. >>> Just replace 'blastx.out' with the name of your blastx >> output file in >>> the script below. >>> >>> Jonathan >>> >>> >>> #!/usr/bin/perl >>> >>> # BEGIN HACK >>> >> # You can do this even more succinctly and without the warnings >> >> use Bio::Graphics::Glyph::graded_segments; >> # package Bio::Graphics::Glyph::graded_segments; >> >> # redefine draw method from Bioperl graded_segments package; >> # perl will warn you (and for good reason...) that you're >> doing this if >> you run it with the -w flag >> # >> # sub draw { >> sub Bio::Graphics::Glyph::graded_segments::draw { >> my $self = shift; >> >> # bail out if this isn't the right kind of feature >> # handle both das-style and Bio::SeqFeatureI style, >> # which use different names for subparts. >> my @parts = $self->parts; >> @parts = $self if !@parts && $self->level == 0; >> return $self->SUPER::draw(@_) unless @parts; >> >> my ($min_score,$max_score) = $self->minmax(\@parts); >> >> return $self->SUPER::draw(@_) >> unless defined($max_score) && defined($min_score) >> && $min_score < $max_score; >> >> my $span = $max_score - $min_score; >> >> foreach my $part (@parts) { >> # use part's bgcolor as base color (to be adjusted by score) >> my $fill = $part->bgcolor; >> my ($red,$green,$blue) = $self->panel->rgb($fill); >> >> my $s = eval { $part->feature->score }; >> unless (defined $s) { >> $part->{partcolor} = $fill; >> next; >> } >> my ($r,$g,$b) = >> $self->calculate_color($s,[$red,$green,$blue],$min_score,$span); >> my $idx = $self->panel->translate_color($r,$g,$b); >> $part->{partcolor} = $idx; >> } >> $self->SUPER::draw(@_); >> } >> >> # package MAIN; >>> >>> # END HACK >>> >>> use Bio::Graphics; >>> use Bio::SearchIO; >>> >>> my $searchio = Bio::SearchIO->new(-file=> 'blastx.out', -format => >>> 'blast'); >>> my $result = $searchio->next_result(); >>> my $panel 
= Bio::Graphics::Panel->new(-length=> >> $result->query_length, >>> -width=> 800); >>> my $track = $panel->add_track(-glyph => 'graded_segments', >>> -label => 1, >>> -connector => 'dashed', >>> -bgcolor => sub { >>> my $feature = shift; >>> my ($frame) = $feature->frame(); >>> return "red" if ($frame =~ /0/); >>> return "green" if ($frame =~ /1/); >>> return "blue" if ($frame =~ /2/)}, >>> -strand_arrow => 'tue'); >>> while( my $hit = $result->next_hit ) { >>> my $feature = >>> Bio::SeqFeature::Generic->new(-score=>$hit->raw_score, >>> -frame=> >> $hit->frame); >>> while( my $hsp = $hit->next_hsp ) { >>> $feature->add_sub_SeqFeature($hsp,'EXPAND'); >>> } >>> $track->add_feature($feature); >>> } >>> print $panel->png; >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From brian_osborne at cognia.com Thu Dec 9 15:06:44 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Dec 9 15:05:01 2004 Subject: [Bioperl-l] Installing BioPerl on Windows In-Reply-To: <41B76EE0.6080800@genetics.utah.edu> Message-ID: Barry, Your proposed revisions are the third time someone has attempted to redo the Windows installation file. Or perhaps the fourth? The other well-intentioned authors made their attempts for the same reasons you did: 1) Windows users are a sizable fraction of the users with installation problems. 2) Windows users with problems have the same questions, again and again ("where or what is GD?", etc). 3) These users have not read the INSTALL.WIN file, or have not paid attention. 
So, I'm fairly certain that your proposed changes will make no difference, no matter how well-reasoned they are. If people don't read this file, changing it makes no difference. So, where would you put a "Windows tips" file? Again, I don't think Windows users pay attention to the files in the top directory. Check out the first section of the README file, it directs them immediately to INSTALL.WIN, very obvious, so these users aren't reading the README either. I'm not being snide here, I just think the mode of Windows installation doesn't naturally lead to reading these top-level documents. Different from Unix. Question: when the Windows user downloads the package what do they do with it? Given a typical approach, what's the best place to put information on Windows installation? On the Web download page perhaps? Another effective way to do these kinds of documents is to get all the frequently asked questions/problems and address them specifically. So, you'd have a "quick start" section first, as you did, then follow it immediately with a list of questions/problems and answers. Yes, you might consider putting these into the existing FAQ but then each time the user writes "where is ...?" you'd have to answer "please check the FAQ 4.2 ...". Less than ideal, since the idea is to set things up so that the users don't have to write bioperl-l. Thank you for efforts. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Barry Moore Sent: Wednesday, December 08, 2004 4:15 PM To: Jason Stajich Cc: Brian Osborne; Bioperl List Subject: [Bioperl-l] Installing BioPerl on Windows Jason, Brian, Others- A recent message to the bioperl list suggests that new Windows users are still having problems installing Bioperl on Windows. This is not necessary because it's actually quite easy to install Bioperl 1.4. 
I had a look at the INSTALL.WIN document and I think that while it has been updated a bit, it is starting to suffer from fragmented editing over a long period of time. All the information that you need is there, but it doesn't really fit together too well anymore, and there is still some outdated and conflicting information present. Since new Windows users are often the least likely to be experienced programmers and also likely to have little Unix experience, it may also need to be written with that in mind, providing more explanation for how things are done. I've taken a crack at this and rewritten INSTALL.WIN with a longer (perhaps too long) introduction to Bioperl, and updated installation instructions for Bioperl 1.4. In fact I think that the file name INSTALL.WIN should probably be changed, as that is a filename that is intuitive mainly to someone who has done a lot of installing from source. Installing_Bioperl_on_Windows.txt may be a more obvious filename to new Windows users.

If you think it looks useful please feel free to post it on the Bioperl web site as a replacement for or in addition to the current INSTALL.WIN. I'll be happy to try to keep this document up to date, but I'll need one of the developers to put it on the site for me. Finally, I didn't touch the Cygwin sections of the previous INSTALL.WIN document because I have no experience with Cygwin, so I'll have to assume that they are accurate and let others contribute any fixes necessary there. Let me know if I've made any errors or omissions that need to be corrected.

Barry

==================================================================================

Installing Bioperl on Windows
=============================

1) Quick Instructions for the impatient
2) Bioperl on Windows
3) Perl on Windows
4) BioPerl on Windows
5) Beyond the Core
6) BioPerl in Cygwin
7) Cygwin tips

This installation guide was written by Barry Moore and other Bioperl authors based on the original work of Paul Boutros.
Please report problems and/or fixes to the bioperl mailing list, bioperl-l@bioperl.org

1) Quick instructions for the impatient, lucky, or experienced user.
=====================================================================

Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
Run the ActivePerl Installer (accepting all defaults is fine).
Open a command prompt (Menus Start->Run and type cmd) and run the ppm shell (C:\>ppm).
Add two new ppm repositories with the following commands:
    ppm> rep add Bioperl http://bioperl.org/DIST
    ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
Install Bioperl-1.4.
Go to http://www.bioperl.org and start reading documentation or try the example script at the end of this file.

2) Bioperl on Windows
======================

Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid in the task of writing perl code to deal with sequence data in a myriad of ways. Bioperl provides objects for various types of sequence data and their associated features and annotations. It provides interfaces for analysis of these sequences with a wide variety of external programs (BLAST, fasta, clustalw and EMBOSS to name just a few). It provides interfaces to various types of databases both remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. And finally, with its associated documentation and mailing list, Bioperl represents a community of bioinformatics professionals working in perl who are committed to supporting both development of Bioperl and the new users who are drawn to the project.

While most bioinformatics and computational biology applications are developed in Unix/Linux environments, more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment.
Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of the functionality of Bioperl is available with this type of install. Much of the heavy lifting in bioinformatics is done by programs originally developed in lower level languages like C and Pascal (e.g. BLAST, clustalw, Staden etc.). Bioperl simply acts as a wrapper for running these external programs and parsing their output. Some of those programs (BLAST for example) are ported to Windows. These can be installed and work quite happily with BioPerl in the native Windows environment. Others, such as clustalw, have Windows ports; however, the BioPerl developer who wrote the interface used Unix specific system calls to interact with these programs and so these wrappers will not work in the Windows environment. And finally some external programs such as Staden and the EMBOSS suite of programs cannot be installed on Windows at all, and therefore any part of Bioperl that interacts with these packages either won't work or can't be installed at all.

If you have a fairly simple project in mind, want to start using Bioperl quickly, only have access to a computer running Windows, and/or don't mind bumping up against some limitations then Bioperl on Windows may be a good place for you to start. For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great. Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also. Be aware that most if not all of the Bioperl developers are working in some type of a Unix environment (Linux, OSX, Cygwin). If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses, simply because no one knows the answer to your Windows specific problem.
If this is or becomes a problem for you then you are better off working in some type of Unix-like environment. One solution to this problem that will keep you working on a Windows machine is to install Cygwin, a Unix emulation environment for Windows. A number of Bioperl users are using this approach successfully and it is discussed more below.

3) Perl on Windows
===================

There are a couple of ways of installing Perl on a Windows machine. The most common and easiest is to get the most recent build from ActiveState. ActiveState is a software company (http://www.activestate.com) that provides free builds of Perl for Windows users. The current (December 2004) build is ActivePerl 5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just fine). To install ActivePerl on Windows:
    Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
    Run the ActivePerl Installer (accepting all defaults is fine).

You can also build Perl yourself (which requires a C compiler) or download one of the other binary distributions. The Perl source for building it yourself is available from CPAN (http://www.cpan.org), as are a few other binary distributions that are alternatives to ActiveState. This approach is not recommended unless you have specific reasons for doing so and know what you're doing. If that's the case you probably don't need to be reading this guide.

Cygwin is a Unix emulation environment for Windows and comes with its own copy of Perl. Information on Cygwin and Bioperl is found below.

4) BioPerl on Windows
======================

Perl is a programming language that has been extended a lot by the addition of external modules. These modules work with the core language to extend the functionality of Perl. Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency. You can't install module X unless you have already installed module Y.
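
As a rough sketch (not part of the original instructions), you can ask Perl whether a given module is already present by trying to load it: "perl -MModule::Name -e 1" exits quietly when the module loads and complains when it does not. The same one-liner can be typed at the Windows command prompt. The helper name has_perl_module and the two module names checked are only illustrations, and the loop assumes a Bourne-type shell such as the one Cygwin provides.

```shell
# Hypothetical helper (an assumption for illustration, not a Bioperl tool):
# succeeds if the named Perl module can be loaded by the perl on your PATH.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on a couple of example modules; substitute the ones you care about.
for module in XML::Parser Bio::Seq; do
    if has_perl_module "$module"; then
        echo "$module: installed"
    else
        echo "$module: missing"
    fi
done
```

A "missing" answer only says that this particular perl cannot load the module; it does not say which repository or archive provides it.
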
Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl; if you've installed Perl then these modules are already installed. Other modules are freely available from CPAN, but you'll have to install them yourself if you want to use them. BioPerl has such dependencies.

Bioperl is actually a large collection of Perl modules (over 1000 currently) and these modules are split into six groups. These six groups are:

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions
                     and the Staden package.
bioperl-db           Using bioperl with BioSQL and local
                     relational databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user
                     interface to some Bioperl functions.

The Bioperl core is what most new users will want to start with. Bioperl 1.4 (the core) and the Perl modules that it depends on can be easily installed with ppm. PPM (Programmer's Package Manager) is an ActivePerl utility for installing Perl modules on systems using ActivePerl. PPM will look online (you have to be connected to the internet of course) for files (these files end with .ppd) that tell it how to install the modules you want and what other modules your new modules depend on. It will then download and install your modules and all dependent modules for you. These .ppd files are stored online in ppm repositories. ActiveState maintains the largest ppm repository, and when you installed ActivePerl, ppm was installed with directions for using the ActiveState repositories. Unfortunately the ActiveState repositories are far from complete and other ActivePerl users maintain their own ppm repositories to fill in the gaps. Installing Bioperl will require you to direct ppm to look in two new repositories.
You do this by opening a Windows command prompt, typing ppm to start the ppm shell and then typing the following two commands:
    ppm> rep add Bioperl http://bioperl.org/DIST
    ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Once ppm knows where to look for Bioperl and its dependencies you simply tell ppm to install it. This is done with the command:
    ppm> install Bioperl-1.4

5) Beyond the Core
===================

You may find that you want some of the features of other Bioperl groups like bioperl-run or bioperl-db. There are currently no ppm packages for installing these parts of Bioperl. You will have to install these manually from source. For this you will need nmake, a Windows version of the program make (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe). You will also want to have a willingness to experiment. You'll have to read the installation documents for each component that you want to install, and use nmake where the instructions call for make. You will have to determine from the installation documents what dependencies are required, and you will have to get them, read their documentation and install them first. The details of this are beyond the scope of this guide. Read the documentation. Search Google. Try your best, and if you get stuck consult with others on the bioperl mailing list.

6) BioPerl in Cygwin
=====================

Cygwin is a Unix emulator and shell environment available free at www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that installation of Bioperl is easier within Cygwin than within Windows, but these may be users with Unix backgrounds.

One advantage of using Bioperl in Cygwin is that all the external modules are available through CPAN, and most if not all external programs can be installed and run, so many of the limitations of Bioperl on Windows are circumvented.

To get Bioperl running first install the basic Cygwin package as well as the Cygwin Perl, make, and gcc packages.
Clicking the "View" button in the upper right of the installer enables you to see details on the various packages. Then follow the BioPerl installation instructions for Unix in BioPerl's INSTALL file.

Note that expat comes with Cygwin (it's used by the module XML::Parser).

One known issue is that DBD::mysql can be tricky to install in Cygwin and this module is required for the bioperl-db, Biosql, and bioperl-pipeline external packages. Fortunately there are some good instructions online: http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin

Also, set the environment variable TMPDIR; programs like BLAST and clustalw need a place to create temporary files. For example:

    setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
    export TMPDIR=e:/cygwin/tmp     # sh, bash

Note that this is not Cygwin's own path syntax, which would be something like "/cygdrive/e/cygwin/tmp". It is the syntax that a Perl module expects on Windows.

If this variable is not set correctly you'll see errors like this when you run Bio::Tools::Run::StandAloneBlast:

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
    STACK: Error::throw
    ..........

7) Cygwin tips
===============

The easiest way to install Mysql is to use the Windows binaries available at www.mysql.com. Note that Windows does not have Unix domain sockets, so you need to force the Mysql connections to use TCP/IP instead. Do this by using the "-h" option from the command-line:

    >mysql -h 127.0.0.1 -u blip -pblop biosql

Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses a host. For example, if your databases are installed locally:

    alias mysql 'mysql -h 127.0.0.1'     # csh, tcsh
    alias mysql='mysql -h 127.0.0.1'     # sh, bash

If you're trying to use some application or resource "outside" of Cygwin and you're having a problem remember that Cygwin's path syntax may not be the correct one.
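
To make the two spellings concrete, here is a minimal sketch of the rewrite from the /cygdrive form to the drive-letter form used in the TMPDIR example above. Cygwin ships a cygpath utility for this job (cygpath -m prints the drive-letter form with forward slashes), and that is the tool to reach for in practice; the function below, whose name to_windows_path is made up for this illustration, only shows what the conversion amounts to. It handles /cygdrive/<letter>/... paths and leaves everything else, such as /home/jacky, untouched, because resolving those needs knowledge of where Cygwin itself is installed.

```shell
# Hypothetical helper (not part of Cygwin or Bioperl): rewrite
# /cygdrive/<letter>/rest as <letter>:/rest; pass other paths through as-is.
to_windows_path() {
    case "$1" in
        /cygdrive/?/*)
            rest=${1#/cygdrive/}     # e.g. e/cygwin/tmp
            drive=${rest%%/*}        # e.g. e
            printf '%s:/%s\n' "$drive" "${rest#*/}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

to_windows_path /cygdrive/e/cygwin/tmp    # prints e:/cygwin/tmp
```

In an rc file this could be used as TMPDIR=$(to_windows_path /cygdrive/e/cygwin/tmp) followed by export TMPDIR, under the same assumption that the program reading TMPDIR wants the drive-letter form.
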
Cygwin understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but the external resource may want 'E:/cygwin/home/jacky'. So your *rc files may end up with paths written in both syntaxes, depending on which program reads them.

If you can, install Cygwin on a drive or partition that's NTFS-formatted, not FAT32-formatted. When you install Cygwin on a FAT32 partition you will not be able to set permissions and ownership correctly. In most situations this probably won't make any difference but there may be occasions where this is a problem.

If you want to use BLAST we recommend that the Windows binary be obtained from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will be named something like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions in README.bls.

Although we've recommended using the BLAST and MySQL binaries you should be able to compile just about everything else from source code using Cygwin's gcc. You'll notice when you're installing Cygwin that many different libraries are also available (gd, jpeg, etc.).

From jason.stajich at duke.edu Thu Dec 9 16:04:06 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Dec 9 16:01:39 2004
Subject: [Bioperl-l] Re: Extracting Raw Score Value from Blast Output
In-Reply-To: <41B8B57E.9080707@bioanalysis.org>
References: <41B8B57E.9080707@bioanalysis.org>
Message-ID:

In the future it is helpful to post the code that you are using. I suspect you are calling $hit->raw_score, which is the overall value for the HIT, not the HSP. If you want the score for the HSP you should call $hsp->score.

http://bioperl.org/HOWTOs/SearchIO/use.html is a good place to start seeing where values get stored.

-jason

On Dec 9, 2004, at 3:28 PM, Waibhav Tembe wrote:

> Hello,
>
> I am relatively new to BLAST and BioPerl. Apologies if this
> question/observation is trivial or I have made any basic mistake.
>
> I am parsing BLAST output using *bioperl-1.4::Bio::Search::Hit*.
> For a given hit, I would like to extract raw score, bit score and
> other information. Using
> ->raw_score
> ->Bits
> for a hit. Here is what I observed. (Just pasting relevant info from
> BLAST output)
> ================================================
> Query= PA008
>          (35 letters)
> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>            2,718,617 sequences; 12,254,801,043 total letters
> Searching..................................................done
>                                                          Score    E
> Sequences producing significant alignments:              (bits) Value
> gb|CP000001.1| Bacillus cereus ZK, complete genome          70   2e-10
> omitted all other records ..........
> >gb|CP000001.1| Bacillus cereus ZK, complete genome
>           Length = 5300915
>  Score = 69.9 bits (35), Expect = 2e-10
>  Identities = 35/35 (100%)
>  Strand = Plus / Plus
> Query: 1       ttaacgaagcatcgcgaagagcacgttcaattgga 35
>                |||||||||||||||||||||||||||||||||||
> Sbjct: 3032643 ttaacgaagcatcgcgaagagcacgttcaattgga 3032677
> ---------------------------------------------
> For the above BLAST section, I generated the following statistics
> using BioPerl.
> Query Name = PA008
> Lambda=1.37, Kappa=0.711, Base Match Reward=1
> Checking Hit [1]Raw Score= 70 BitScore=69.9 EValue=2e-10
> Bacillus cereus ZK, complete genome
>
> I was expecting Raw Score = 35 and NOT 70. Is raw_score output by
> BioPerl's implementation calculated differently? Am I reading BLAST
> output incorrectly?
>
> Thanks!
>
> -waibhav
>
> --
> Waibhav Tembe.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From smarkel at scitegic.com Thu Dec 9 19:22:37 2004
From: smarkel at scitegic.com (Scott Markel)
Date: Thu Dec 9 19:20:29 2004
Subject: [Bioperl-l] Installing BioPerl on Windows
In-Reply-To:
References:
Message-ID: <41B8EC4D.904@scitegic.com>

Just to provide one commercial user's experience - I invoke BLAST, ClustalW, and EMBOSS programs on Windows and Linux by using BioPerl.
I've found that using output files works best. This gets around the backticks vs system() issue. I use the same code on both operating systems.

Scott

Brian Osborne wrote:
> Barry,
>
> Your proposed revisions are the third time someone has attempted to redo the
> Windows installation file. Or perhaps the fourth? The other well-intentioned
> authors made their attempts for the same reasons you did: 1) Windows users
> are a sizable fraction of the users with installation problems. 2) Windows
> users with problems have the same questions, again and again ("where or what
> is GD?", etc). 3) These users have not read the INSTALL.WIN file, or have
> not paid attention.
>
> So, I'm fairly certain that your proposed changes will make no difference,
> no matter how well-reasoned they are. If people don't read this file,
> changing it makes no difference. So, where would you put a "Windows tips"
> file? Again, I don't think Windows users pay attention to the files in the
> top directory. Check out the first section of the README file, it directs
> them immediately to INSTALL.WIN, very obvious, so these users aren't reading
> the README either. I'm not being snide here, I just think the mode of
> Windows installation doesn't naturally lead to reading these top-level
> documents. Different from Unix.
>
> Question: when the Windows user downloads the package what do they do with
> it? Given a typical approach, what's the best place to put information on
> Windows installation? On the Web download page perhaps?
>
> Another effective way to do these kinds of documents is to get all the
> frequently asked questions/problems and address them specifically. So, you'd
> have a "quick start" section first, as you did, then follow it immediately
> with a list of questions/problems and answers. Yes, you might consider
> putting these into the existing FAQ but then each time the user writes
> "where is ...?" you'd have to answer "please check the FAQ 4.2 ...".
> Less than ideal, since the idea is to set things up so that the users don't
> have to write bioperl-l.
>
> Thank you for your efforts.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Barry Moore
> Sent: Wednesday, December 08, 2004 4:15 PM
> To: Jason Stajich
> Cc: Brian Osborne; Bioperl List
> Subject: [Bioperl-l] Installing BioPerl on Windows
>
> Jason, Brian, Others-
>
> A recent message to the bioperl list suggests that new Windows users are
> still having problems installing Bioperl on Windows. This is not
> necessary because it's actually quite easy to install Bioperl 1.4. I had
> a look at the INSTALL.WIN document and I think that while it has been
> updated a bit, it is starting to suffer from fragmented editing over a
> long period of time. All the information that you need is there, but it
> doesn't really fit together too well anymore, and there is still some
> outdated and conflicting information present. Since new Windows users
> are often the least likely to be experienced programmers and also likely
> to have little Unix experience it may also need to be written with that
> in mind, providing more explanation for how things are done. I've taken
> a crack at this and rewritten INSTALL.WIN with a longer (perhaps too
> long) introduction to Bioperl, and updated installation instructions for
> Bioperl 1.4. In fact I think that the file name INSTALL.WIN should
> probably be changed as that is a filename that is intuitive to someone
> who has done a lot of installing from source.
> Installing_Bioperl_on_Windows.txt may be a more obvious filename to new
> Windows users. If you think it looks useful please feel free to post it
> on the Bioperl web site as a replacement for or in addition to the
> current INSTALL.WIN. I'll be happy to try to keep this document up to
> date, but I'll need one of the developers to put it on the site for me.
> Finally, I didn't touch the Cygwin sections of the previous INSTALL.WIN
> document because I have no experience with it, so I'll have to assume
> that it is accurate and let others contribute any fixes necessary there.
> Let me know if I've made any errors or omissions that need to be corrected.
>
> Barry

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From nathanhaigh at ukonline.co.uk Thu Dec 9 19:52:28 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Dec 9 19:50:14 2004
Subject: [Bioperl-l] Installing BioPerl on Windows
In-Reply-To:
Message-ID:

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne
> Sent: 09 December 2004 20:07
> To: Barry Moore; Jason Stajich
> Cc: Bioperl List
> Subject: RE: [Bioperl-l] Installing BioPerl on Windows
>
> Barry,
>
> Your proposed revisions are the third time someone has attempted to redo the
> Windows installation file. Or perhaps the fourth? The other well-intentioned
> authors made their attempts for the same reasons you did: 1) Windows users
> are a sizable fraction of the users with installation problems. 2) Windows
> users with problems have the same questions, again and again ("where or what
> is GD?", etc). 3) These users have not read the INSTALL.WIN file, or have
> not paid attention.
>
> So, I'm fairly certain that your proposed changes will make no difference,
> no matter how well-reasoned they are. If people don't read this file,
> changing it makes no difference. So, where would you put a "Windows tips"
> file? Again, I don't think Windows users pay attention to the files in the
> top directory. Check out the first section of the README file, it directs
> them immediately to INSTALL.WIN, very obvious, so these users aren't reading
> the README either. I'm not being snide here, I just think the mode of
> Windows installation doesn't naturally lead to reading these top-level
> documents. Different from Unix.

I believe that if a user does make it to this file, the modifications will make more sense to a Windows user.

>
> Question: when the Windows user downloads the package what do they do with it?
> Given a typical approach, what's the best place to put information on
> Windows installation? On the Web download page perhaps?

The Windows user doesn't actually download the package themselves; this is
what PPM is for. If the user encounters problems with PPM (assuming they know
how to use it and have added the Bioperl repository), they may then manually
download the package to install using nmake. However, this all means that
they don't have the package to unpack, don't see the top level, and don't
see the READMEs.

The first thing a would-be Windows BioPerl user will do is look at the
homepage and try to find out exactly what Bioperl can do. This isn't obvious
to a newbie; maybe have a short synopsis at the top, followed by a link to a
more thorough explanation. Then they'll want to find out how to install it,
so an Installation link on the menu would be good. If they go to the download
page first, they are likely to get confused: if they see the packages to
download at the top of the download page, they may never get to the
installation instructions towards the bottom.

I found that the Bioperl website is geared towards people who are familiar
with what is there and where it is; the files relevant to them are
accessible first (at the tops of pages). Someone who's newish to the site
can waste a fair amount of time trying to find the info they want, and may
resort to asking the mailing list before thoroughly looking around the site.

May I suggest that the left-hand menu on the homepage is modified to include
direct links to important pages such as:

Installation (or Windows Install) - Windows users shouldn't miss this being
on the homepage menu near the top! This should get the user to the Windows
install file with as few clicks as possible!
Documentation - with links to all the module documentation, including
Bioperl-run etc. On my first visit to the website, I couldn't find the
Bioperl-run docs for the life of me, although it's obvious now I'm familiar
with its layout.

The GD install problem should be addressed with the additional repositories
in the CVS INSTALL.WIN and Barry's install file.

Anyway, it's late and I hope I haven't offended any Windows users! I am one
myself!

Nathan

>
> Another effective way to do these kinds of documents is to get all the
> frequently asked questions/problems and address them specifically. So, you'd
> have a "quick start" section first, as you did, then follow it immediately
> with a list of questions/problems and answers. Yes, you might consider
> putting these into the existing FAQ but then each time the user writes
> "where is ...?" you'd have to answer "please check the FAQ 4.2 ...". Less
> than ideal, since the idea is to set things up so that the users don't have
> to write to bioperl-l.
>
> Thank you for your efforts.
>
> Brian O.
>
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Barry Moore
> Sent: Wednesday, December 08, 2004 4:15 PM
> To: Jason Stajich
> Cc: Brian Osborne; Bioperl List
> Subject: [Bioperl-l] Installing BioPerl on Windows
>
>
> Jason, Brian, Others-
>
> A recent message to the bioperl list suggests that new Windows users are
> still having problems installing Bioperl on Windows. This is not
> necessary because it's actually quite easy to install Bioperl 1.4. I had
> a look at the INSTALL.WIN document and I think that while it has been
> updated a bit, it is starting to suffer from fragmented editing over a
> long period of time. All the information that you need is there, but it
> doesn't really fit together too well anymore, and there is still some
> outdated and conflicting information present.
> Since new Windows users
> are often the least likely to be experienced programmers and also likely
> to have little Unix experience, it may also need to be written with that
> in mind, providing more explanation for how things are done. I've taken
> a crack at this and rewritten INSTALL.WIN with a longer (perhaps too
> long) introduction to Bioperl, and updated installation instructions for
> Bioperl 1.4. In fact I think that the file name INSTALL.WIN should
> probably be changed, as that is a filename that is intuitive only to someone
> who has done a lot of installing from source.
> Installing_Bioperl_on_Windows.txt may be a more obvious filename to new
> Windows users. If you think it looks useful please feel free to post it
> on the Bioperl web site as a replacement for or in addition to the
> current INSTALL.WIN. I'll be happy to try to keep this document up to
> date, but I'll need one of the developers to put it on the site for me.
> Finally, I didn't touch the Cygwin sections of the previous INSTALL.WIN
> document because I have no experience with it, so I'll have to assume
> that it is accurate and let others contribute any fixes necessary there.
> Let me know if I've made any errors or omissions that need to be corrected.
>
> Barry
>
> ==================================================================================
>
> Installing Bioperl on Windows
> =============================
>
> 1) Quick Instructions for the impatient
> 2) Bioperl on Windows
> 3) Perl on Windows
> 4) BioPerl on Windows
> 5) Beyond the Core
> 6) BioPerl in Cygwin
> 7) Cygwin tips
>
> This installation guide was written by Barry Moore and other Bioperl
> authors based on the original work of Paul Boutros. Please report problems
> and/or fixes to the bioperl mailing list, bioperl-l@bioperl.org
>
> 1) Quick instructions for the impatient, lucky, or experienced user.
> ===================================================================== > > Download the ActivePerl MSI from > http://www.activestate.com/Products/ActivePerl/ > Run the ActivePerl Installer (accepting all defaults is fine). > Open a command prompt (Menus Start->Run and type cmd) and run the ppm > shell (C:\>ppm). > Add two new ppm repositories with the following commands: > ppm> rep add Bioperl http://bioperl.org/DIST > ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > Install Bioperl-1.4. > Go to http://www.bioperl.org and start reading documentation or try the > example script at > the end of this file. > > > 2) Bioperl on Windows > ====================== > > Bioperl is a large collection of Perl modules (extensions to the Perl > language) that aid > in the task of writing perl code to deal with sequence data in a myriad > of ways. Bioperl > provides objects for various types of sequence data and their associated > features and > annotations. It provides interfaces for analysis of these sequences with > a wide variety > of external programs (BLAST, fasta, clustalw and EMBOSS to name just a > few). It provides > interfaces to various types of databases both remote (GenBank, EMBL > etc.) and local > (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. > And finally with > its associated documentation and mailing list Bioperl represents a > community of > bioinformatics professionals working in perl who are committed to > supporting both > development of Bioperl and the new users who are drawn to the project. > > While most bioinformatics and computational biology applications are > developed in > Unix/Linux environments, more and more programs are being ported to > other operating > systems like Windows, and many users (often biologists with little > background in > programming) are looking for ways to automate bioinformatics analyses in > the Windows > environment. Perl and Bioperl can be installed natively on Windows > NT/2000/XP. 
Most of > the functionality of Bioperl is available with this type of install. > Much of the heavy > lifting in bioinformatics is done by programs originally developed in > lower level > languages like C and Pascal (e.g. BLAST, clustalw, Staden etc.). Bioperl > simply acts as a > wrapper for running and parsing output from these external programs. > Some of those > programs (BLAST for example) are ported to Windows. These can be > installed and work > quite happily with BioPerl in the native Windows environment. Others, > such as clustalw, > have Windows ports, however the BioPerl developer who wrote the > interface used Unix > specific system calls to interact with these programs and so these > wrappers will not work > in the Windows environment. And finally some external programs such as > Staden and the > EMBOSS suite of programs can not be installed on Windows at all, and > therefore any part > of Bioperl that interacts with these packages either won't work or can't > be installed at > all. > > If you have a fairly simple project in mind, want to start using Bioperl > quickly, only > have access to a computer running Windows, and/or don't mind bumping up > against some > limitations then Bioperl on Windows may be a good place for you to > start. For example, > downloading a bunch of sequences from GenBank and sorting out the ones > that have a > particular annotation or feature works great. Running a bunch of your > sequences against > remote or local BLAST, parsing the output and storing it in a MySQL > database would be > fine also. Be aware that most if not all of the Bioperl developers are > working in some > type of a Unix environment (Linux, OSX, Cygwin). If you have problems > with Bioperl that > are specific to the Windows environment, you may be blazing new ground > and your pleas for > help on the Bioperl mailing list may get few responses - simply because > no one knows the > answer to your Windows specific problem. 
> If this is or becomes a problem for you then you are better off working in
> some type of Unix-like environment. One solution to this problem that will
> keep you working on a Windows machine is to install Cygwin, a Unix
> emulation environment for Windows. A number of Bioperl users are using this
> approach successfully and it is discussed more below.
>
> 3) Perl on Windows
> ===================
>
> There are a couple of ways of installing Perl on a Windows machine. The
> most common and easiest is to get the most recent build from ActiveState.
> ActiveState is a software company (http://www.activestate.com) that
> provides free builds of Perl for Windows users. The current (December 2004)
> build is ActivePerl 5.8.4.810 (ActivePerl 5.6.1.638 is also available and
> should work just fine). To install ActivePerl on Windows:
>   Download the ActivePerl MSI from
>   http://www.activestate.com/Products/ActivePerl/
>   Run the ActivePerl Installer (accepting all defaults is fine).
>
> You can also build Perl yourself (which requires a C compiler) or download
> one of the other binary distributions. The Perl source for building it
> yourself is available from CPAN (http://www.cpan.org), as are a few other
> binary distributions that are alternatives to ActiveState. This approach is
> not recommended unless you have specific reasons for doing so and know what
> you're doing. If that's the case you probably don't need to be reading this
> guide.
>
> Cygwin is a Unix emulation environment for Windows and comes with its own
> copy of Perl. Information on Cygwin and Bioperl is found below.
>
> 4) BioPerl on Windows
> ======================
>
> Perl is a programming language that has been extended a lot by the addition
> of external modules. These modules work with the core language to extend
> the functionality of Perl. Bioperl is one such extension to Perl.
> These modular extensions to Perl sometimes depend on the functionality of
> other Perl modules and this creates a dependency. You can't install module
> X unless you have already installed module Y. Some Perl modules are so
> fundamentally useful that the Perl developers have included them in the
> core distribution of Perl - if you've installed Perl then these modules are
> already installed. Other modules are freely available from CPAN, but you'll
> have to install them yourself if you want to use them. BioPerl has such
> dependencies.
>
> Bioperl is actually a large collection of Perl modules (over 1000
> currently) and these modules are split into six groups. These six groups
> are:
>
> Bioperl Group        Functions
> -----------------------------------------------------------------
> bioperl (the core)   Most of the main functionality of Bioperl.
> bioperl-run          Wrappers to a lot of external programs.
> bioperl-ext          Interaction with some alignment functions
>                      and the Staden package.
> bioperl-db           Using bioperl with BioSQL and local
>                      relational databases.
> bioperl-microarray   Microarray specific functions.
> bioperl-gui          Some preliminary work on a graphical user
>                      interface to some Bioperl functions.
>
> The Bioperl core is what most new users will want to start with. Bioperl
> 1.4 (the core) and the Perl modules that it depends on can be easily
> installed with ppm. PPM (Programmer's Package Manager) is an ActivePerl
> utility for installing Perl modules on systems using ActivePerl. PPM will
> look online (you have to be connected to the internet of course) for files
> (these files end with .ppd) that tell it how to install the modules you
> want and what other modules your new modules depend on. It will then
> download and install your modules and all dependent modules for you. These
> .ppd files are stored online in ppm repositories.
> ActiveState maintains the largest ppm repository and when you installed
> ActivePerl, ppm was installed with directions for using the ActiveState
> repositories. Unfortunately the ActiveState repositories are far from
> complete and other ActivePerl users maintain their own ppm repositories to
> fill in the gaps. Installing Bioperl will require you to direct ppm to look
> in two new repositories. You do this by opening a Windows command prompt,
> typing ppm to start the ppm shell and then typing the following two
> commands:
>   ppm> rep add Bioperl http://bioperl.org/DIST
>   ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>
> Once ppm knows where to look for Bioperl and its dependencies you simply
> tell ppm to install it. This is done with the command:
>   ppm> install Bioperl-1.4
>
> 5) Beyond the Core
> ===================
>
> You may find that you want some of the features of other Bioperl groups
> like bioperl-run or bioperl-db. There are currently no ppm packages for
> installing these parts of Bioperl. You will have to install these manually
> from source. For this you will need a Windows version of the program make
> called nmake
> (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).
> You will also want to have a willingness to experiment. You'll have to read
> the installation documents for each component that you want to install, and
> use nmake where the instructions call for make. You will have to determine
> from the installation documents what dependencies are required and you will
> have to get them, read their documentation and install them first. The
> details of this are beyond the scope of this guide. Read the documentation.
> Search Google. Try your best, and if you get stuck consult with others on
> the bioperl mailing list.
>
> 6) BioPerl in Cygwin
> =====================
>
> Cygwin is a Unix emulator and shell environment available free at
> www.cygwin.com. BioPerl runs well within Cygwin.
> Some users claim that installation of Bioperl is easier within Cygwin than
> within Windows, but these may be users with Unix backgrounds.
>
> One advantage of using Bioperl in Cygwin is that all the external modules
> are available through CPAN, and most if not all external programs can be
> installed and run, so many of the limitations of Bioperl on Windows are
> circumvented.
>
> To get Bioperl running first install the basic Cygwin package as well as
> the Cygwin Perl, make, and gcc packages. Clicking the "View" button in the
> upper right of the installer enables you to see details on the various
> packages. Then follow the BioPerl installation instructions for Unix in
> BioPerl's INSTALL file.
>
> Note that expat comes with Cygwin (it's used by the module XML::Parser).
>
> One known issue is that DBD::mysql can be tricky to install in Cygwin, and
> this module is required for the bioperl-db, Biosql, and bioperl-pipeline
> external packages. Fortunately there are some good instructions online:
> http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin.
>
> Also, set the environment variable TMPDIR, since programs like BLAST and
> clustalw need a place to create temporary files, e.g.:
>
> setenv TMPDIR e:/cygwin/tmp # csh, tcsh
> export TMPDIR=e:/cygwin/tmp # sh, bash
>
> Note that this is not a syntax that Cygwin understands, which would be
> something like "/cygdrive/e/cygwin/tmp". This is the syntax that a Perl
> module expects on Windows.
>
> If this variable is not set correctly you'll see errors like this when you
> run Bio::Tools::Run::StandAloneBlast:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
> STACK: Error::throw
> ..........
>
> 7) Cygwin tips
> ===============
>
> The easiest way to install Mysql is to use the Windows binaries available
> at www.mysql.com.
> Note that Windows does not have Unix domain sockets, so you need to force
> the Mysql connections to use TCP/IP instead. Do this by using the "-h"
> option from the command-line:
>
> >mysql -h 127.0.0.1 -u blip -pblop biosql
>
> Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses
> a host. For example, if your databases are installed locally:
>
> alias mysql 'mysql -h 127.0.0.1'   # csh, tcsh
> alias mysql='mysql -h 127.0.0.1'   # sh, bash
>
> If you're trying to use some application or resource "outside" of Cygwin
> and you're having a problem, remember that Cygwin's path syntax may not be
> the correct one. Cygwin understands '/home/jacky' or
> '/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but the
> external resource may want 'E:/cygwin/home/jacky'. So your *rc files may
> end up with paths written in these different syntaxes, depending on which
> program reads them.
>
> If you can, install Cygwin on a drive or partition that's NTFS-formatted,
> not FAT32-formatted. When you install Cygwin on a FAT32 partition you will
> not be able to set permissions and ownership correctly. In most situations
> this probably won't make any difference but there may be occasions where
> this is a problem.
>
> If you want to use BLAST we recommend that the Windows binary be obtained
> from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the
> file will be named something like blast-2.2.6-ia32-win32.exe). Then follow
> the Windows instructions in README.bls.
>
> Although we've recommended using the BLAST and MySQL binaries you should be
> able to compile just about everything else from source code using Cygwin's
> gcc. You'll notice when you're installing Cygwin that many different
> libraries are also available (gd, jpeg, etc.).
>
>

From m.claesson at student.ucc.ie Fri Dec 10 06:50:43 2004
From: m.claesson at student.ucc.ie (Marcus Claesson)
Date: Fri Dec 10 06:48:28 2004
Subject: [Bioperl-l] Can I get different Graphics::Panel colours for different HSP frames within the same blast hit?
In-Reply-To:
References:
Message-ID: <1102679442.17814.226.camel@morpheus.ucc.ie>

Yes, it now works very well, thanks. Excellent!

I noticed the error message with the -w flag, but it disappeared when I
removed the flag. I guess I should do without it then.

Thanks!
Marcus

On Thu, 2004-12-09 at 19:44, Crabtree, Jonathan wrote:
> Jason-
>
> Perhaps a data entry error on my part is to blame, but when I try your
> version I still get the warning, and I also get the following runtime
> error because Perl can't resolve the reference to $self->SUPER::draw:
>
> Can't locate object method "draw" via package "main" at ./test2.pl line
> 48, line 191.
>
> I agree that the "package MAIN;" is superfluous, but I think you need
> the other one (unless you replace SUPER::draw with something more
> specific, at which point I think your already-marginal succinctness
> advantage goes out the window...) Does this version work for you,
> Marcus?
>
> Jonathan
>
>
> > -----Original Message-----
> > From: Jason Stajich [mailto:jason.stajich@duke.edu]
> > Sent: Thursday, December 09, 2004 2:25 PM
> > To: Crabtree, Jonathan
> > Cc: Marcus Claesson; Bioperl list
> > Subject: Re: [Bioperl-l] Can I get different Graphics::Panel
> > colours for different HSP frames within the same blast hit?
> >
> >
> >
> > On Dec 9, 2004, at 1:56 PM, Crabtree, Jonathan wrote:
> >
> > >
> > > Marcus-
> > >
> > >> That hack seems to do it. However, my program will be used by people
> > >> installing it themselves so I have to stick with the standard
> > >> non-hacked version of bioperl.
> > >
> > > OK, in that case here's an even less elegant solution for you to
> > > consider; this one requires you to distribute only a single file.
> > > Just replace 'blastx.out' with the name of your blastx output file in
> > > the script below.
> > >
> > > Jonathan
> > >
> > >
> > > #!/usr/bin/perl
> > >
> > > # BEGIN HACK
> > >
> >
> > # You can do this even more succinctly and without the warnings
> >
> > use Bio::Graphics::Glyph::graded_segments;
> > # package Bio::Graphics::Glyph::graded_segments;
> >
> > # redefine draw method from Bioperl graded_segments package;
> > # perl will warn you (and for good reason...) that you're doing this if
> > # you run it with the -w flag
> > #
> > # sub draw {
> > sub Bio::Graphics::Glyph::graded_segments::draw {
> >   my $self = shift;
> >
> >   # bail out if this isn't the right kind of feature
> >   # handle both das-style and Bio::SeqFeatureI style,
> >   # which use different names for subparts.
> >   my @parts = $self->parts;
> >   @parts = $self if !@parts && $self->level == 0;
> >   return $self->SUPER::draw(@_) unless @parts;
> >
> >   my ($min_score,$max_score) = $self->minmax(\@parts);
> >
> >   return $self->SUPER::draw(@_)
> >     unless defined($max_score) && defined($min_score)
> >       && $min_score < $max_score;
> >
> >   my $span = $max_score - $min_score;
> >
> >   foreach my $part (@parts) {
> >     # use part's bgcolor as base color (to be adjusted by score)
> >     my $fill = $part->bgcolor;
> >     my ($red,$green,$blue) = $self->panel->rgb($fill);
> >
> >     my $s = eval { $part->feature->score };
> >     unless (defined $s) {
> >       $part->{partcolor} = $fill;
> >       next;
> >     }
> >     my ($r,$g,$b) =
> >       $self->calculate_color($s,[$red,$green,$blue],$min_score,$span);
> >     my $idx = $self->panel->translate_color($r,$g,$b);
> >     $part->{partcolor} = $idx;
> >   }
> >   $self->SUPER::draw(@_);
> > }
> >
> > # package MAIN;
> >
> > >
> > > # END HACK
> > >
> > > use Bio::Graphics;
> > > use Bio::SearchIO;
> > >
> > > my $searchio = Bio::SearchIO->new(-file   => 'blastx.out',
> > >                                   -format => 'blast');
> > > my $result = $searchio->next_result();
> > > my $panel = Bio::Graphics::Panel->new(-length => $result->query_length,
> > >                                       -width  => 800);
> > > my $track = $panel->add_track(-glyph => 'graded_segments',
> > >                               -label => 1,
> > >                               -connector => 'dashed',
> > >                               -bgcolor => sub {
> > >                                 my $feature = shift;
> > >                                 my ($frame) = $feature->frame();
> > >                                 return "red" if ($frame =~ /0/);
> > >                                 return "green" if ($frame =~ /1/);
> > >                                 return "blue" if ($frame =~ /2/)},
> > >                               -strand_arrow => 'true');
> > > while( my $hit = $result->next_hit ) {
> > >   my $feature =
> > >     Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
> > >                                   -frame => $hit->frame);
> > >   while( my $hsp = $hit->next_hsp ) {
> > >     $feature->add_sub_SeqFeature($hsp,'EXPAND');
> > >   }
> > >   $track->add_feature($feature);
> > > }
> > > print $panel->png;
> > --
> > Jason Stajich
> > jason.stajich at duke.edu
> > http://www.duke.edu/~jes12/
> >
>

From amackey at pcbi.upenn.edu Fri Dec 10 07:47:28 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Fri Dec 10 07:44:48 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1720] New: Bug in Bio::SeqFeature::Generic.pm DESTROY causes object corruption.. fix included!
In-Reply-To: <200412100310.iBA3AWLa028953@portal.open-bio.org>
References: <200412100310.iBA3AWLa028953@portal.open-bio.org>
Message-ID:

If you do this:

{
  my $obj = Class->new();
}

As that $obj goes out of scope, $obj->DESTROY is called.
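
The scope-exit behaviour described here can be observed with a small standalone script. This is only a minimal sketch: the class Demo::Tracked and its @LOG array are hypothetical names invented for illustration and are not part of BioPerl or of the bug report under discussion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class used only for illustration; not part of BioPerl.
package Demo::Tracked;

our @LOG;    # records constructor and destructor calls in order

sub new {
    my ($class, $name) = @_;
    push @LOG, "new $name";
    return bless { name => $name }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY {
    my $self = shift;
    push @LOG, "DESTROY $self->{name}";
}

package main;

{
    my $obj = Demo::Tracked->new('inner');
    push @Demo::Tracked::LOG, 'end of block';
}   # $obj was the only reference, so DESTROY runs as the block exits
push @Demo::Tracked::LOG, 'after block';

print "$_\n" for @Demo::Tracked::LOG;
```

Run as a script, this prints "new inner", "end of block", "DESTROY inner" and "after block" in that order, showing that the destructor fires when the block is left rather than at program exit. With a single object the timing is deterministic; the ordering problem raised in this thread only appears when several objects that refer to each other are destroyed together.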
After the scope fully ends (all possible DESTROY methods have been called),
$obj is completely gone (its memory may be reused, but the variable itself
is gone).

Without looking at your code, what's likely happening is that something else
($obj2) is also going out of scope at the same time, and also being
DESTROY'ed; during $obj2's destruction, it's expecting $obj to still be
around (since the order of destruction isn't guaranteed, nor even
predictable), and not yet DESTROY'ed.

Faith is only useful when you don't yet know the science. There is no
object resurrection, nor will there be a second coming.

-Aaron

On Dec 9, 2004, at 10:10 PM, bugzilla-daemon@portal.open-bio.org wrote:

> Ah, so perl is calling destroy every time through the block on all the
> objects that were created during that block's execution, and then somehow
> reusing the objects? Does anyone understand how this works? I'll append my
> script in its current form... it probably won't be useful without a very
> large input file, email me directly and I'll send some over.
>
> Object resurrection... I've been using perl for nearly six years now....
> and now this is seriously shaking my faith. This must have been what
> Cardinal John Henry Newman felt like before his conversion...
>

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email: amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA 19104-6017      fax: 215-746-6697

From sdavis2 at mail.nih.gov Fri Dec 10 08:02:22 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri Dec 10 07:59:59 2004
Subject: [Bioperl-l] Bio::tools::primer3 and bio::seqfeature::primer
Message-ID:

I am running primer3 using Bio::Tools::Run::Primer3. I would like to add the
resulting primers to my original sequence as sequence features.
I can extract all the tags and build my own sequence feature
(Bio::SeqFeature::Primer), but the perldoc for Bio::SeqFeature::Primer
implies that it is designed to be integrated with Bio::Tools::Primer3. I
just want to make sure that there isn't an inheritance that I don't see
before constructing my own Bio::SeqFeature::Primer from the
Bio::Tools::Primer3 objects.

Thanks,
Sean

From ewijaya at singnet.com.sg Fri Dec 10 10:17:53 2004
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Fri Dec 10 10:16:14 2004
Subject: [Bioperl-l] Mostly used Bioperl modules
In-Reply-To: <41B77259.3030606@genetics.utah.edu>
References: <41B77259.3030606@genetics.utah.edu>
Message-ID:

Dear Sirs,

I am new to Bioperl and Perl in general. I have found that Bioperl is very
useful in the area I am working in, motif discovery.

I was wondering which Bioperl modules are the most widely used in
bioinformatics in general, namely modules that are used repeatedly in any
kind of bioinformatics task.

Thanks so much for your time. Hope to hear from you again.

Regards,
Edward WIJAYA
SINGAPORE

From rob at salmonella.org Fri Dec 10 11:33:34 2004
From: rob at salmonella.org (Rob Edwards)
Date: Fri Dec 10 11:30:57 2004
Subject: [Bioperl-l] Bio::tools::primer3 and bio::seqfeature::primer
In-Reply-To:
References:
Message-ID: <3C8671CD-4AC9-11D9-AAE0-000A959E1622@salmonella.org>

This is already available: Bio::Tools::Primer3->next_primer() returns a
Bio::Seq::PrimedSeq object. Take a look at the Bio::Seq::PrimedSeq docs to
see how to write out a sequence with its primers attached.

Rob

On Dec 10, 2004, at 5:02 AM, Sean Davis wrote:

> I am running primer3 using Bio::Tools::Run::Primer3. I would like to
> add the resulting primers to my original sequence as sequence
> features.
I can extract all the tags and build my own sequence > feature (bio::seqfeature::primer), but the perldoc for > bio::seqfeature::primer implies that it is designed to be integrated > with bio::tools::primer3. I just want to make sure that there isn't > an inheritance that I don't see before constructing my own > Bio::Seqfeature::Primer from the Bio::Tools::Primer3 objects. > > Thanks, > Sean > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From amackey at pcbi.upenn.edu Fri Dec 10 11:59:27 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Fri Dec 10 11:56:43 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1720] New: Bug in Bio::SeqFeature::Generic.pm DESTROY causes object corruption.. fix included! In-Reply-To: <49F5251D-4ACC-11D9-8F75-000D932893EC@caltech.edu> References: <200412100310.iBA3AWLa028953@portal.open-bio.org> <49F5251D-4ACC-11D9-8F75-000D932893EC@caltech.edu> Message-ID: Probably because the offending code in get_tag_values doesn't check for defined($hash{$key}) but just exists($hash{$key}) (and then merrily tries to dereference it as an array). It's a bug in BioPerl, to be sure, but not as mystical as it seems. -Aaron On Dec 10, 2004, at 11:55 AM, Alok Saldanha wrote: > I still don't understand why adding delete() to DESTROY causes the > problem to go away -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From hlapp at gnf.org Fri Dec 10 12:34:54 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Dec 10 12:32:20 2004 Subject: [Bioperl-l] Re: [BioSQL-l] tigr.pm In-Reply-To: References: Message-ID: Strand is NOT NULLable in all three (MySQL, Pg, Oracle) versions of the schema. This constraint could be relaxed, as Strand is not part of any unique key constraint. 

OTOH the designated value for an unknown strand is 0 in bioperl, not NULL or undef.

Furthermore, the schema defines 0 as the default value for the Strand column. Unfortunately, this doesn't help if the column is explicitly inserted as NULL.

I could add a fix to the LocationAdaptor to default a null strand to 0.

Any opinions? I'm leaning towards adding a fix to LocationAdaptor to fix this problem immediately and forever.

-hilmar

On Dec 10, 2004, at 2:49 AM, matthieu CONTE wrote:

> I am trying to load tigr rice data in my biosql db with the
> load_seqdatabase.pl using the latest tigr parser on the CVS.
> I still have the same problem with null field
>
> Thanks
>
> perl load_seqdatabase.pl --dbuser biosql --dbpass biosql --namespace tigr_arath --format tigr /home/conte/pipeline_orthologues/data/arath_tigr/CHR1.R5v01212004.xml
> Loading /home/conte/pipeline_orthologues/data/arath_tigr/CHR1.R5v01212004.xml ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed,
> values were ("130","2000","","1") FKs (16962,)
> Column 'strand' cannot be null
> ---------------------------------------------------
>
> **************************************
> m_conte@hotmail.com
> CIRAD
> **************************************
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From escobarebio at yahoo.com Fri Dec 10 15:59:37 2004
From: escobarebio at yahoo.com (D.Enrique ESCOBAR ESPINOZA)
Date: Fri Dec 10 15:57:32 2004
Subject: [Bioperl-l] HG-U133a annotation csv (HG-U133A_annot.csv)
Message-ID: <20041210205938.77559.qmail@web11505.mail.yahoo.com>

I am having a hell of a time trying to parse the annotation file with a regular expression.
The problem is that the file contains fields separated by commas; each field starts and ends with a double quote, and some fields also contain ';' and ','.
An example of the file is at the end of this mail. Can someone help and give me a trick for parsing the lines of this file?
It has 38 fields, and Excel does not even open it correctly; if I try to save it back to a CSV file, it makes a complete mess.
Thanks in advance.

"Probe Set ID","GeneChip Array","Species Scientific Name","Annotation Date","Sequence Type","Sequence Source","Transcript ID","Target Description","Representative Public ID","UniGene ID","Genome Version","Alignments","Gene Title","Gene Symbol","Chromosomal Location","Unigene Cluster Type","Ensembl","LocusLink","SwissProt","EC","OMIM","RefSeq Protein ID","RefSeq Transcript ID","FlyBase","AGI","WormBase","MGI Name","RGD Name","SGD accession number","Gene Ontology Biological Process","Gene Ontology Cellular Component","Gene Ontology Molecular Function","Pathway","Protein Families","Protein Domains","InterPro","Trans Membrane","QTL","Annotation Description","Annotation Transcript Cluster","Transcript Assignments","Annotation Notes"
"1007_s_at","Human Genome U133A Array","Homo sapiens","Oct 11, 2004","Exemplar sequence","Affymetrix Proprietary Database","U48705mRNA"," U48705 /FEATURE=mRNA /DEFINITION=HSU48705 Human receptor tyrosine kinase DDR gene, complete cds ","U48705","Hs.423573","May 2004 (NCBI 35)","chr6:30964144-30975910 (+) // 95.63 // p21.33","discoidin domain
receptor family, member 1","DDR1","chr6p21.3","full length","ENSG00000137332","780","BAC85426 /// Q08345 /// Q96T61 /// Q96T62","EC:2.7.1.112","600408","NP_001945 /// NP_054699 /// NP_054700","NM_001954 /// NM_013993 /// NM_013994","---","---","---","---","---","---","6468 // protein amino acid phosphorylation // inferred from electronic annotation /// 7155 // cell adhesion // traceable author statement /// 7169 // transmembrane receptor protein tyrosine kinase signaling pathway // inferred from electronic annotation","5887 // integral to plasma membrane // traceable author statement /// 16020 // membrane // inferred from electronic annotation","4674 // protein serine/threonine kinase activity // inferred from electronic annotation /// 4714 // transmembrane receptor protein tyrosine kinase activity // traceable author statement /// 4872 // receptor activity // inferred from electronic annotation /// 5524 // ATP binding // inferred from electronic annotation /// 16740 // transferase activity // inferred from electronic annotation","---","ec // ZA70_HUMAN // ZA70_HUMAN EC:2.7.1.112:TYROSINE-PROTEIN KINASE ZAP-70 (EC 2.7.1.112) (70 KDA ZETA-ASSOCIATED PROTEIN) (SYK-RELATED TYROSINE KINASE). // 2.0E-65 /// Hanks // DDR // HUMRTK_1 (DDR) KINASES:5.11.1 | PTK Group B membrane spanning protein tyrosine kinases.PTK XX DDR/TKT family .DDR // 1.0E-156","scop // d1kexa_ // d1kexa_ SCOP:b.18.1.2:| B1 domain of neuropilin-1 // 5.0E-42","IPR000421 // Coagulation factor 5/8 type C domain (FA58C) /// IPR000719 // Protein kinase","NP_054700.1 // span:417-439 // numtm:1","---","This probe set was annotated using the Matching Probes based pipeline to a Locus Link identifier using 1 transcripts. // false // Matching Probes // A","NM_013994(16)","ENST00000259875 // cdna:known chromosome:NCBI34:6:30958112:30974184:1 // ensembl // 16 // --- /// NM_013994 // Homo sapiens discoidin domain receptor family, member 1 (DDR1), transcript variant 3, mRNA. 
// refseq // 16 // ---","ENST00000325423 // ensembl // 1 // Negative Strand Matching Probes /// ENST00000340208 // ensembl // 1 // Negative Strand Matching Probes /// GENSCAN00000025013 // ensembl // 1 // Negative Strand Matching Probes /// BC026341 // gb // 1 // Negative Strand Matching Probes /// S57212 // gb // 1 // Negative Strand Matching Probes" "1053_at","Human Genome U133A Array","Homo sapiens","Oct 11, 2004","Exemplar sequence","GenBank","M87338"," M87338 /FEATURE= /DEFINITION=HUMA1SBU Human replication factor C, 40-kDa subunit (A1) mRNA, complete cds ","M87338","Hs.139226","May 2004 (NCBI 35)","chr7:73090653-73113383 (-) // 70.86 // q11.23","replication factor C (activator 1) 2, 40kDa","RFC2","chr7q11.23","full length","ENSG00000049541","5982","AAP35707 /// P35250","---","600404","NP_002905 /// NP_852136","NM_002914 /// NM_181471","---","---","---","---","---","---","6260 // DNA replication // inferred from electronic annotation","5634 // nucleus // inferred from electronic annotation /// 5663 // DNA replication factor C complex // traceable author statement","166 // nucleotide binding // inferred from electronic annotation /// 3677 // DNA binding // inferred from electronic annotation /// 5524 // ATP binding // traceable author statement","DNA_replication // GenMAPP","ec // KAD2_HUMAN // KAD2_HUMAN EC:2.7.4.3:ADENYLATE KINASE ISOENZYME 2, MITOCHONDRIAL (EC 2.7.4.3) (ATP-AMP TRANSPHOSPHORYLASE). // 8.2","scop // d1nrjb_ // d1nrjb_ SCOP:c.37.1.8:| Signal recognition particle receptor beta-subunit // 0.024","---","---","---","This probe set was annotated using the Matching Probes based pipeline to a Locus Link identifier using 2 transcripts. 
// false // Matching Probes // A","M87338(15),NM_181471(12)","ENST00000055077 // cdna:known chromosome:NCBI34:7:73057931:73080835:-1 // ensembl // 12 // --- /// ENST00000275627 // cdna:known chromosome:NCBI34:7:73057931:73080835:-1 // ensembl // 12 // --- /// M87338 // Human replication factor C, 40-kDa subunit (A1) mRNA, complete cds. // gb // 15 // --- /// NM_181471 // Homo sapiens replication factor C (activator 1) 2, 40kDa (RFC2), transcript variant 1, mRNA. // refseq // 12 // ---","GENSCAN00000014431 // ensembl // 8 // Cross Hyb Matching Probes"

=====
--------------------------------------------------
D.Enrique ESCOBAR ESPINOZA (B.Sc.)
http://www.iro.umontreal.ca/~escobard/
http://adn.bioinfo.uqam.ca/~escd07097301/
ICQ#: 201778618
-------------------------------------------------
1487, Boul. St-Joseph Est Apt4
Tel: (514) 523-8398
Montreal QC Canada
H2J 1M6

From laurichj at bioinfo.ucr.edu Fri Dec 10 16:09:19 2004
From: laurichj at bioinfo.ucr.edu (Josh Lauricha Hi)
Date: Fri Dec 10 16:07:13 2004
Subject: [Bioperl-l] Re: [BioSQL-l] tigr.pm
In-Reply-To:
References:
Message-ID: <20041210210919.GB17108@bioinfo.ucr.edu>

The problem with setting strand in tigr.pm is that, for simplicity, I've already reversed the strands to the positive orientation. So while they're not "unknown", they are neither positive nor negative. Marking a positive strand as positive is fine, but marking a negative strand as negative is not, since it has been reversed already. It might be misinterpreted (by code or people) to mean that a negative strand must be reversed again.

On Fri 12/10/04 09:34, Hilmar Lapp wrote:
> Strand is NOT NULLable in all three (MySQL, Pg, Oracle) versions of the
> schema.
>
> This constraint could be relaxed, as Strand is not part of any unique
> key constraint.
>
> OTOH the designated value for an unknown strand is 0 in bioperl, not
> NULL or undef.
>
> Furthermore, the schema defines 0 as the default value for the Strand
> column. Unfortunately, this doesn't help if the column is explicitly
> inserted as NULL.
>
> I could add a fix to the LocationAdaptor to default a null strand to 0.
>
> Any opinions? I'm leaning towards adding a fix to LocationAdaptor to
> fix this problem immediately and forever.
>
> -hilmar

--
------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
|----------------------------------------------------|

From demerphq at gmail.com Fri Dec 10 06:04:24 2004
From: demerphq at gmail.com (demerphq)
Date: Fri Dec 10 16:25:15 2004
Subject: [Bioperl-l] Fuzzy Pattern Matching Algorithm
Message-ID: <9b18b3110412100304534a9176@mail.gmail.com>

Hi,

Recently some questions appeared on Perlmonks asking about algorithms to use for fuzzy pattern matching of gene strings (in the form of strings of "ACGT"). After some debate the following thread was published with a summary of our results:

http://perlmonks.org/?node_id=413697

To give an example without having to read the linked document: one algorithm is capable of searching 1 million chars for any fuzzy matches of 500_000 25-char sequences in about 150 seconds, and 10 MB in 1000 seconds.

Is this performance good? I suspect that one of the few places this code may actually prove useful is in the context of BioPerl. I'm curious as to what solutions and scenarios such algorithms would be used in. Most of those who participated were, I believe, interested purely in the Comp Sci problem and not in the utility of the solution itself, so we really have no idea.

It would be very interesting to hear the thoughts of an experienced BioPerl developer on our efforts. Especially if it might mean that somebody would do something useful with them :-)

Apologies if this is a waste of your time and bandwidth.

Cheers,
yves
--
First they ignore you, then they laugh at you, then they fight you, then you win.
  +Gandhi

From tembe at bioanalysis.org Thu Dec 9 15:28:46 2004
From: tembe at bioanalysis.org (Waibhav Tembe)
Date: Fri Dec 10 16:26:17 2004
Subject: [Bioperl-l] Extracting Raw Score Value from Blast Output
Message-ID: <41B8B57E.9080707@bioanalysis.org>

Hello,

I am relatively new to BLAST and BioPerl. Apologies if this question/observation is trivial or if I have made a basic mistake.

I am parsing BLAST output using bioperl-1.4 Bio::Search::Hit. For a given hit, I would like to extract the raw score, the bit score and other information, using ->raw_score and ->bits on the hit. Here is what I observed. (Just pasting the relevant info from the BLAST output.)

================================================
Query= PA008
         (35 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           2,718,617 sequences; 12,254,801,043 total letters

Searching..................................................done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gb|CP000001.1|  Bacillus cereus ZK, complete genome                  70   2e-10

omitted all other records ..........

>gb|CP000001.1| Bacillus cereus ZK, complete genome
          Length = 5300915

 Score = 69.9 bits (35), Expect = 2e-10
 Identities = 35/35 (100%)
 Strand = Plus / Plus

Query: 1       ttaacgaagcatcgcgaagagcacgttcaattgga 35
               |||||||||||||||||||||||||||||||||||
Sbjct: 3032643 ttaacgaagcatcgcgaagagcacgttcaattgga 3032677
---------------------------------------------

For the above BLAST section, I generated the following statistics using BioPerl.

Query Name = PA008
Lambda=1.37, Kappa=0.711, Base Match Reward=1
Checking Hit [1]
Raw Score= 70 BitScore=69.9 EValue=2e-10
Bacillus cereus ZK, complete genome

I was expecting Raw Score = 35 and NOT 70. Is the raw_score output by BioPerl's implementation calculated differently? Am I reading the BLAST output incorrectly?

Thanks!
-waibhav

--
Waibhav Tembe.
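
For reference, the two score columns in a BLAST report are related by the standard Karlin-Altschul conversion from a raw score S to a bit score S'. The following is a worked check using only the values printed in the message above (lambda = 1.37, K = 0.711, S = 35); it is a sketch of the arithmetic, not a statement about BioPerl internals.

```latex
\begin{align*}
S' &= \frac{\lambda S - \ln K}{\ln 2} \\
   &= \frac{1.37 \times 35 - \ln 0.711}{\ln 2} \\
   &\approx \frac{47.95 + 0.34}{0.693} \\
   &\approx 69.7 \ \text{bits}
\end{align*}
```

This agrees with the reported 69.9 bits to within the rounding of lambda, so in "Score = 69.9 bits (35)" the raw score is the 35 in parentheses, and the 70 in the one-line summary is the same bit score rounded. A parser that reads the summary line and stores its score column as the hit-level raw score would produce the 70 seen here; whether bioperl-1.4 does exactly that is an assumption, not something this thread confirms.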

From m_conte at hotmail.com Fri Dec 10 05:49:43 2004
From: m_conte at hotmail.com (matthieu CONTE)
Date: Fri Dec 10 16:26:45 2004
Subject: [Bioperl-l] tigr.pm
In-Reply-To: <200412101145.03569.manuel.ruiz@cirad.fr>
Message-ID:

I am trying to load tigr rice data into my biosql db with load_seqdatabase.pl, using the latest tigr parser from CVS.
I still have the same problem with a null field.

Thanks

perl load_seqdatabase.pl --dbuser biosql --dbpass biosql --namespace tigr_arath --format tigr /home/conte/pipeline_orthologues/data/arath_tigr/CHR1.R5v01212004.xml
Loading /home/conte/pipeline_orthologues/data/arath_tigr/CHR1.R5v01212004.xml ...

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed, values were ("130","2000","","1") FKs (16962,)
Column 'strand' cannot be null
---------------------------------------------------

**************************************
m_conte@hotmail.com
CIRAD
**************************************

From Peter.Robinson at t-online.de Fri Dec 10 16:34:47 2004
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Fri Dec 10 16:32:01 2004
Subject: [Bioperl-l] HG-U133a annotation csv (HG-U133A_annot.csv)
In-Reply-To: <20041210205938.77559.qmail@web11505.mail.yahoo.com>
References: <20041210205938.77559.qmail@web11505.mail.yahoo.com>
Message-ID: <1102714487.4639.9.camel@localhost.localdomain>

    while ($line =~ m/"(.*?)"/g) {
        print $1;
    }

The "?" keeps * from being greedy, so we match only what is in between each pair of quotes. This regex basically just ignores the commas in between the entries.

HTH

Peter

On Fri, 2004-12-10 at 21:59, D.Enrique ESCOBAR ESPINOZA wrote:
> I am having a hell of a time trying to parse the annotation file with a
> regular expression.
> The problem is that the file contains fields separated by commas;
> each field starts and ends with a double quote,
> and some fields also contain ';' and ','.
> An example of the file is at the end of this mail.
> Can someone help and give me a trick for parsing the lines of this
> file?
> It has 38 fields, and Excel does not even open it correctly;
> if I try to save it back to a CSV file,
> it makes a complete mess.
> Thanks in advance.

--
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From jason.stajich at duke.edu Fri Dec 10 16:49:31 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Dec 10 16:46:48 2004
Subject: [Bioperl-l] HG-U133a annotation csv (HG-U133A_annot.csv)
In-Reply-To: <1102714487.4639.9.camel@localhost.localdomain>
References: <20041210205938.77559.qmail@web11505.mail.yahoo.com> <1102714487.4639.9.camel@localhost.localdomain>
Message-ID: <5F622E78-4AF5-11D9-B8AD-000393C44276@duke.edu>

The module Text::CSV_XS could be used as well - it does a pretty good job with mixed quoted and non-quoted fields.
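
A minimal sketch of the Text::CSV_XS route, assuming the module is installed from CPAN. The sample row is a shortened, made-up line in the style of HG-U133A_annot.csv, and parse_annot_line is a hypothetical helper name, not part of any library. It shows that a comma inside a quoted field stays inside that field, and that the " /// " separators used inside multi-valued fields can be split afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: split one CSV line into its fields.
# Dies if the line is not well-formed CSV.
sub parse_annot_line {
    my ($csv, $line) = @_;
    $csv->parse($line)
        or die "Cannot parse line: $line\n";
    return $csv->fields;
}

# binary => 1 allows fields that contain non-ASCII bytes.
my $csv = Text::CSV_XS->new({ binary => 1 });

# Shortened, made-up row in the style of the annotation file.
my $line = '"1053_at","Oct 11, 2004","RFC2","NP_002905 /// NP_852136"';

my @fields = parse_annot_line($csv, $line);

# Multi-valued fields use " /// " between entries.
my @proteins = split m{\s*///\s*}, $fields[3];

print scalar(@fields), " fields\n";
print "date: $fields[1]\n";
print "proteins: @proteins\n";
```

For a whole file, reading with $csv->getline($fh) instead of calling parse() on each line would also cope with fields that contain embedded newlines.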

-jason

On Dec 10, 2004, at 4:34 PM, Peter Robinson wrote:

> while ($line =~ m/"(.*?)"/g) {
>     print $1;
> }
> The "?" keeps * from being greedy, so we match only what is in between
> each of the quotes. This regex just basically ignores the commas in
> between the entries.
>
> HTH
>
> Peter
>
> On Fri, 2004-12-10 at 21:59, D.Enrique ESCOBAR ESPINOZA wrote:
>> I am having a hell of a time trying to parse the annotation file with a
>> regular expression.
>> The problem is that the file contains fields separated by commas;
>> each field starts and ends with a double quote,
>> and some fields also contain ';' and ','.
>> An example of the file is at the end of this mail.
>> Can someone help and give me a trick for parsing the lines of this
>> file?
>> It has 38 fields, and Excel does not even open it correctly;
>> if I try to save it back to a CSV file,
>> it makes a complete mess.
>> Thanks in advance.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From Steve_Chervitz at affymetrix.com Sun Dec 12 02:11:12 2004
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Sun Dec 12 02:09:17 2004
Subject: [Bioperl-l] HG-U133a annotation csv (HG-U133A_annot.csv)
In-Reply-To: <5F622E78-4AF5-11D9-B8AD-000393C44276@duke.edu>
Message-ID:

Another good route is the standard module Text::ParseWords, as described in the Perl Cookbook. Here's the basic approach:

    use Text::ParseWords;

    while (<>) {
        chomp;
        # Split on commas; 0 means the enclosing quotes are not kept.
        my @data = parse_line(',', 0, $_);
        # do_something() stands for your own per-row handler.
        do_something(@data);
    }

Works like a dream.

BTW, Excel should handle these files with no problem. What version/platform of Excel are you using? Some suggestions:

Make sure the file has a .txt extension. Opening it from within Excel should then activate the Text Import Wizard. Once there, be sure that the data type is "Delimited", the delimiter is "Comma", the text qualifier is ", and the column data format is "Text" (General should work too, since all fields are enclosed in double quotes and should therefore be interpreted as text).

If you still have problems, let me know.

Steve

> From: Jason Stajich
> Date: Fri, 10 Dec 2004 16:49:31 -0500
> To: Peter Robinson
> Cc: Bioperl List
> Subject: Re: [Bioperl-l] HG-U133a annotation csv (HG-U133A_annot.csv)
>
> The module Text::CSV_XS could be used as well - it does a pretty good
> job with mixed quoted and non-quoted fields.
>
> -jason
>
> On Dec 10, 2004, at 4:34 PM, Peter Robinson wrote:
>
>> while ($line =~ m/"(.*?)"/g) {
>>     print $1;
>> }
>> The "?" keeps * from being greedy, so we match only what is in between
>> each of the quotes.
This regex just basically ignores the commas in >> between the entries. >> >> HTH >> >> Peter >> >> >> On Fri, 2004-12-10 at 21:59, D.Enrique ESCOBAR ESPINOZA wrote: >>> I m have a hell of time trying to parse the annotation file with a >>> regular expression. >>> The problem is that the file contains fileds separated by a coma, >>> each field starts with a double quote and it ends in a double quote, >>> and also it contains in each field some ';' and ','. >>> an exemple of that file is at the end of this mail, >>> can someone help and give me a trick for parsing the lines of this >>> file? >>> It has 38 fields, and excel is not even opening it correctly, >>> and if i try to save it back to a csv file, >>> it does a complet mess. >>> Thanks in advance. >>> "Probe Set ID","GeneChip Array","Species Scientific Name","Annotation >>> Date","Sequence Type","Sequence Source","Transcript ID","Target >>> Description","Representative Public ID","UniGene ID","Genome >>> Version","Alignments","Gene Title","Gene Symbol","Chromosomal >>> Location","Unigene Cluster >>> Type","Ensembl","LocusLink","SwissProt","EC","OMIM","RefSeq Protein >>> ID","RefSeq Transcript ID","FlyBase","AGI","WormBase","MGI Name","RGD >>> Name","SGD accession number","Gene Ontology Biological Process","Gene >>> Ontology Cellular Component","Gene Ontology Molecular >>> Function","Pathway","Protein Families","Protein >>> Domains","InterPro","Trans Membrane","QTL","Annotation >>> Description","Annotation Transcript Cluster","Transcript >>> Assignments","Annotation Notes" >>> "1007_s_at","Human Genome U133A Array","Homo sapiens","Oct 11, >>> 2004","Exemplar sequence","Affymetrix Proprietary >>> Database","U48705mRNA"," U48705 /FEATURE=mRNA /DEFINITION=HSU48705 >>> Human receptor tyrosine kinase DDR gene, complete cds >>> ","U48705","Hs.423573","May 2004 (NCBI 35)","chr6:30964144-30975910 >>> (+) // 95.63 // p21.33","discoidin domain receptor family, member >>> 1","DDR1","chr6p21.3","full 
length","ENSG00000137332","780","BAC85426 >>> /// Q08345 /// Q96T61 /// Q96T62","EC:2.7.1.112","600408","NP_001945 >>> /// NP_054699 /// NP_054700","NM_001954 /// NM_013993 /// >>> NM_013994","---","---","---","---","---","---","6468 // protein amino >>> acid phosphorylation // inferred from electronic annotation /// 7155 >>> // cell adhesion // traceable author statement /// 7169 // >>> transmembrane receptor protein tyrosine kinase signaling pathway // >>> inferred from electronic annotation","5887 // integral to plasma >>> membrane // traceable author statement /// 16020 // membrane // >>> inferred from electronic annotation","4674 // protein >>> serine/threonine kinase activity // inferred from electronic >>> annotation /// 4714 // transmembrane receptor protein tyrosine kinase >>> activity // traceable author statement /// 4872 // receptor activity >>> // inferred from electronic annotation /// 5524 // ATP binding // >>> inferred from electronic annotation /// 16740 // transferase activity >>> // inferred from electronic annotation","---","ec // ZA70_HUMAN // >>> ZA70_HUMAN EC:2.7.1.112:TYROSINE-PROTEIN KINASE ZAP-70 (EC 2.7.1.112) >>> (70 KDA ZETA-ASSOCIATED PROTEIN) (SYK-RELATED TYROSINE KINASE). // >>> 2.0E-65 /// Hanks // DDR // HUMRTK_1 (DDR) KINASES:5.11.1 | PTK Group >>> B membrane spanning protein tyrosine kinases.PTK XX DDR/TKT family >>> .DDR // 1.0E-156","scop // d1kexa_ // d1kexa_ SCOP:b.18.1.2:| B1 >>> domain of neuropilin-1 // 5.0E-42","IPR000421 // Coagulation factor >>> 5/8 type C domain (FA58C) /// IPR000719 // Protein >>> kinase","NP_054700.1 // span:417-439 // numtm:1","---","This probe >>> set was annotated using the Matching Probes based pipeline to a Locus >>> Link identifier using 1 transcripts. 
// false // Matching Probes // >>> A","NM_013994(16)","ENST00000259875 // cdna:known >>> chromosome:NCBI34:6:30958112:30974184:1 // ensembl // 16 // --- /// >>> NM_013994 // Homo sapiens discoidin domain receptor family, member 1 >>> (DDR1), transcript variant 3, mRNA. // refseq // 16 // >>> ---","ENST00000325423 // ensembl // 1 // Negative Strand Matching >>> Probes /// ENST00000340208 // ensembl // 1 // Negative Strand >>> Matching Probes /// GENSCAN00000025013 // ensembl // 1 // Negative >>> Strand Matching Probes /// BC026341 // gb // 1 // Negative Strand >>> Matching Probes /// S57212 // gb // 1 // Negative Strand Matching >>> Probes" >>> "1053_at","Human Genome U133A Array","Homo sapiens","Oct 11, >>> 2004","Exemplar sequence","GenBank","M87338"," M87338 /FEATURE= >>> /DEFINITION=HUMA1SBU Human replication factor C, 40-kDa subunit (A1) >>> mRNA, complete cds ","M87338","Hs.139226","May 2004 (NCBI >>> 35)","chr7:73090653-73113383 (-) // 70.86 // q11.23","replication >>> factor C (activator 1) 2, 40kDa","RFC2","chr7q11.23","full >>> length","ENSG00000049541","5982","AAP35707 /// >>> P35250","---","600404","NP_002905 /// NP_852136","NM_002914 /// >>> NM_181471","---","---","---","---","---","---","6260 // DNA >>> replication // inferred from electronic annotation","5634 // nucleus >>> // inferred from electronic annotation /// 5663 // DNA replication >>> factor C complex // traceable author statement","166 // nucleotide >>> binding // inferred from electronic annotation /// 3677 // DNA >>> binding // inferred from electronic annotation /// 5524 // ATP >>> binding // traceable author statement","DNA_replication // >>> GenMAPP","ec // KAD2_HUMAN // KAD2_HUMAN EC:2.7.4.3:ADENYLATE KINASE >>> ISOENZYME 2, MITOCHONDRIAL (EC 2.7.4.3) (ATP-AMP TRANSPHOSPHORYLASE). 
>>> // 8.2","scop // d1nrjb_ // d1nrjb_ SCOP:c.37.1.8:| Signal >>> recognition particle receptor beta-subunit // >>> 0.024","---","---","---","This probe set was annotated using the >>> Matching Probes based pipeline to a Locus Link identifier using 2 >>> transcripts. // false // Matching Probes // >>> A","M87338(15),NM_181471(12)","ENST00000055077 // cdna:known >>> chromosome:NCBI34:7:73057931:73080835:-1 // ensembl // 12 // --- /// >>> ENST00000275627 // cdna:known >>> chromosome:NCBI34:7:73057931:73080835:-1 // ensembl // 12 // --- /// >>> M87338 // Human replication factor C, 40-kDa subunit (A1) mRNA, >>> complete cds. // gb // 15 // --- /// NM_181471 // Homo sapiens >>> replication factor C (activator 1) 2, 40kDa (RFC2), transcript >>> variant 1, mRNA. // refseq // 12 // ---","GENSCAN00000014431 // >>> ensembl // 8 // Cross Hyb Matching Probes" >>> >>> >>> ===== >>> -------------------------------------------------- >>> D.Enrique ESCOBAR ESPINOZA (B.Sc.) >>> http://www.iro.umontreal.ca/~escobard/ >>> http://adn.bioinfo.uqam.ca/~escd07097301/ >>> ICQ#: 201778618 >>> ------------------------------------------------- >>> 1487, Boul. St-Joseph Est Apt4 >>> Tel: (514) 523-8398 >>> Montreal QC Canada >>> H2J 1M6 >>> >>> >>> >>> __________________________________ >>> Do you Yahoo!? >>> Yahoo! Mail - Easier than ever with enhanced search. Learn more. >>> http://info.mail.yahoo.com/mail_250 >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> -- >> Peter N. 
Robinson >> peter.robinson@t-online.de >> peter.robinson@charite.de >> http://www.charite.de/ch/medgen/robinson/ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From bcur001 at ec.auckland.ac.nz Sun Dec 12 23:13:05 2004 From: bcur001 at ec.auckland.ac.nz (bcur001@ec.auckland.ac.nz) Date: Sun Dec 12 23:12:31 2004 Subject: [Bioperl-l] Installing bioperl-ext-1.4 In-Reply-To: <200412102127.iBALPiKu021926@portal.open-bio.org> References: <200412102127.iBALPiKu021926@portal.open-bio.org> Message-ID: <1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz> I am wanting to run code to do smith-waterman alignment. From what I can see, I need the EMBOSS suite, which appears to come as part of bioperl-ext-1.4. I have installed bioperl-1.4 fine. when I attempt to install bioperl-ext-1.4 however, I encounter problems. I've worked my way through a few initial errors, finding and installing the staden library and the Inline pm (both of which appear to ahve installed fine), I have, however, finally been stumped. Upon attempting to run `perl Makefile.PL` from the bioperl-ext-1.4/ directory, I get the following: Writing Makefile for Bio::Ext::Align Found Staden io_lib "libread" in /usr/local/lib ... Automatically using the Read.h found in /usr/local/include/io_lib ... Writing Makefile for Bio::SeqIO::staden::read Writing Makefile for Bio One or more DATA sections were not processed by Inline. And there it ends. I can't find any similar problems on the mailing list archives, so I'm wondering if anyone has any suggestions as to where I could be looking to find out what the problem is. 
Someway I could elicit some more informative errors messages would be nice. If it's of any use, I'm working on a fairly standard Mandrake 10.0 installation. Thanks. Ben. From michael.watson at bbsrc.ac.uk Mon Dec 13 05:25:39 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon Dec 13 05:23:07 2004 Subject: [Bioperl-l] Tabular BLAST output and SearchIO Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E899DC@iahce2knas1.iah.bbsrc.reserved> Hi Do I have to do anything special with SearchIO objects to get them to read NCBI blast output produced with the "-m 8" (tabular without comment lines) option? Because I'm using it as I would with a normal BLAST output format and getting an undefined value when calling next_result() (using bioperl 1.4 on Suse Linux) Thanks Mick From michael.watson at bbsrc.ac.uk Mon Dec 13 05:44:58 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon Dec 13 05:43:23 2004 Subject: [Bioperl-l] Tabular BLAST output and SearchIO Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E899DD@iahce2knas1.iah.bbsrc.reserved> Sorry, ignore me, I need to use the "blasttable" format. Mick -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of michael watson (IAH-C) Sent: 13 December 2004 10:26 To: Bioperl Subject: [Bioperl-l] Tabular BLAST output and SearchIO Hi Do I have to do anything special with SearchIO objects to get them to read NCBI blast output produced with the "-m 8" (tabular without comment lines) option? 
Because I'm using it as I would with a normal BLAST output format and
getting an undefined value when calling next_result()

(using bioperl 1.4 on Suse Linux)

Thanks
Mick

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From pawel.szczesny at tuebingen.mpg.de Mon Dec 13 05:56:43 2004
From: pawel.szczesny at tuebingen.mpg.de (Pawel Szczesny)
Date: Mon Dec 13 05:54:55 2004
Subject: [Bioperl-l] Getting score from Emboss-type alignment (water)
Message-ID: <41BD756B.1010305@tuebingen.mpg.de>

Hello,

I have a problem in extracting score from alignment in EMBOSS format
(water in this case).

The code:

    my $in = new Bio::AlignIO(-format => 'emboss',-file => 'test.water');
    while( my $aln = $in->next_aln ) {
        print $aln->score;
    }

$aln->score seems to be undefined. File emboss.pm from BioPerl Live has
additional lines compared to stable distribution, but I have no idea why
even with patching emboss.pm it doesn't work at all.

Any ideas? I was searching documentation, but reporting score seem to be
implemented only in Blast parser.

Regards
Pawel Szczesny

From michael.watson at bbsrc.ac.uk Mon Dec 13 08:07:18 2004
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon Dec 13 08:06:14 2004
Subject: [Bioperl-l] Getting query start and end from blast table report
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E899E1@iahce2knas1.iah.bbsrc.reserved>

Hi

How do you access the start and end of hits parsed using SearchIO and the
"blasttable" format option?

From the docs, as far as I can see, hit objects parsed this way do not
have HSP objects, but it is only hsp objects that have start and end
methods...

Anyone help?
Mick

From jason.stajich at duke.edu Mon Dec 13 08:33:22 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Dec 13 08:30:48 2004
Subject: [Bioperl-l] Getting score from Emboss-type alignment (water)
In-Reply-To: <41BD756B.1010305@tuebingen.mpg.de>
References: <41BD756B.1010305@tuebingen.mpg.de>
Message-ID: <8F6482D4-4D0B-11D9-B718-000393C44276@duke.edu>

Scores are set by the Alignment parser - we separate the 'running' from
the 'parsing'. Bio::AlignIO::emboss had to be updated.

-jason

On Dec 13, 2004, at 5:56 AM, Pawel Szczesny wrote:

> Hello,
>
> I have a problem in extracting score from alignment in EMBOSS format
> (water in this case).
>
> The code:
>
> my $in = new Bio::AlignIO(-format => 'emboss',-file => 'test.water');
> while( my $aln = $in->next_aln ) {
>   print $aln->score;
> }
>
> $aln->score seems to be undefined. File emboss.pm from BioPerl Live
> has additional lines compared to stable distribution, but I have no
> idea why even with patching emboss.pm it doesn't work at all.
>
> Any ideas? I was searching documentation, but reporting score seem to
> be implemented only in Blast parser.
>
> Regards
> Pawel Szczesny
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Mon Dec 13 08:35:00 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Dec 13 08:32:08 2004
Subject: [Bioperl-l] Tabular BLAST output and SearchIO
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E899DC@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E95E899DC@iahce2knas1.iah.bbsrc.reserved>
Message-ID:

I think you've already figured it out based on your other question, but
the format to use is 'blasttable'. Since it is hard to auto-detect this
format, we made it a separate SearchIO parser module.
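
A minimal sketch of what that looks like in practice, assuming BioPerl is
installed and the "-m 8" output was saved to a file whose name is passed on
the command line. The helper names hsp_row and print_hsp_coords are made up
for illustration; the SearchIO calls follow the usual Result -> Hit -> HSP
pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one HSP as a tab-separated row: query, hit, query range, hit range.
# Pure Perl; hypothetical helper, not part of BioPerl.
sub hsp_row {
    my ($query, $hit, $qstart, $qend, $hstart, $hend) = @_;
    return join("\t", $query, $hit, "$qstart-$qend", "$hstart-$hend");
}

# Walk a tabular ("-m 8") BLAST report and print coordinates for every HSP.
sub print_hsp_coords {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded on demand; needs BioPerl installed

    # 'blasttable' must be named explicitly; it is not auto-detected.
    my $in = Bio::SearchIO->new(-format => 'blasttable', -file => $file);

    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                print hsp_row(
                    $result->query_name, $hit->name,
                    $hsp->start('query'), $hsp->end('query'),
                    $hsp->start('hit'),   $hsp->end('hit'),
                ), "\n";
            }
        }
    }
}

# Only run when a report file is given on the command line.
print_hsp_coords($ARGV[0]) if @ARGV;
```

Run it as "perl script.pl report.m8", where both file names are
placeholders. If next_result() comes back undefined, the first thing to
check is that the -format argument really says 'blasttable' and not 'blast'.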
-jason On Dec 13, 2004, at 5:25 AM, michael watson ((IAH-C)) wrote: > Hi > > Do I have to do anything special with SearchIO objects to get them to > read NCBI blast output produced with the "-m 8" (tabular without > comment > lines) option? Because I'm using it as I would with a normal BLAST > output format and getting an undefined value when calling next_result() > > (using bioperl 1.4 on Suse Linux) > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From Annie.Law at nrc-cnrc.gc.ca Mon Dec 13 11:03:47 2004 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Mon Dec 13 11:01:22 2004 Subject: [Bioperl-l] Entrez Gene and bioperl-db Message-ID: <10C94843061E094A98C02EB77CFC328722FEDF@nrcmrdex1d.imsb.nrc.ca> Hi, I was wondering with regards to bioperl-db the scripts and schema and load_seqdatabase.pl has there been preparation for integration of Entrez gene information when locuslink is phased out? Or if it has already been changed could somebody point me to the documentation or changed code? Thanks, Annie. From amackey at pcbi.upenn.edu Mon Dec 13 11:39:53 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Dec 13 11:37:24 2004 Subject: [Bioperl-l] Installing bioperl-ext-1.4 In-Reply-To: <1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz> References: <200412102127.iBALPiKu021926@portal.open-bio.org> <1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz> Message-ID: <9DA10456-4D25-11D9-9B7B-000D93392082@pcbi.upenn.edu> On Dec 12, 2004, at 11:13 PM, bcur001@ec.auckland.ac.nz wrote: > I am wanting to run code to do smith-waterman alignment. From what I > can see, I > need the EMBOSS suite, which appears to come as part of > bioperl-ext-1.4. Where does that appear? 
EMBOSS is a separate suite of utilities, and does not come with bioperl-ext > Upon attempting to run `perl Makefile.PL` from the bioperl-ext-1.4/ > directory, I get > the following: > > Writing Makefile for Bio::Ext::Align > Found Staden io_lib "libread" in /usr/local/lib ... > Automatically using the Read.h found in /usr/local/include/io_lib ... > Writing Makefile for Bio::SeqIO::staden::read > Writing Makefile for Bio > One or more DATA sections were not processed by Inline. > > > And there it ends. As it is supposed to; you've made the necessary Makefiles, all that remains is for you to type "make", "make test" and "make install" when satisfied. The Inline warning should be ignored. But I think what you want is to install EMBOSS, and then bioperl-run to get Bio::Tools::Emboss functionality. -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From amackey at pcbi.upenn.edu Mon Dec 13 11:53:55 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Dec 13 12:27:41 2004 Subject: [Bioperl-l] Fuzzy Pattern Matching Algorithm In-Reply-To: <9b18b3110412100304534a9176@mail.gmail.com> References: <9b18b3110412100304534a9176@mail.gmail.com> Message-ID: <9327BCA8-4D27-11D9-9B7B-000D93392082@pcbi.upenn.edu> On Dec 10, 2004, at 6:04 AM, demerphq wrote: > I suspect that one of the few places this code may actually prove > useful is in the context of Bioperl. Im curious as to what solutions > and scenarios such algorithms would be used in. For those who > participated I believe most were purely interested as a Comp Sci > problem and not actually for the utility of the solution itself so we > really have no idea. Typically, we need to do "fuzzy" matching by sequence motif or profile (i.e. 
position-specific, weighted matching), not this kind of k-mismatch, which are sometimes used as fast "seed"-finding steps in larger sequence search/alignment algorithms (you guys should check out the BLAT algorithm for kicks and giggles in approximate string matching). There is also an extensive literature on these subjects, including the use of indexed trie's. -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From harry at liverpool.ac.uk Mon Dec 13 07:13:27 2004 From: harry at liverpool.ac.uk (Harry Noyes) Date: Mon Dec 13 19:28:20 2004 Subject: [Bioperl-l] Bio::Tools::BPbl2seq Parsing bl2seq Message-ID: <6.1.2.0.0.20041213121140.0206b150@pop1.liv.ac.uk> I am trying to parse a bl2seq output file using Bio::Tools::BPbl2seq and I get the followoing error message: Can't call method "nextHSP" on unblessed reference at /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/BPbl2seq.pm line 243, line 7. The error is generated when I run this script use Bio::Tools::BPbl2seq; my $report = Bio::Tools::BPbl2seq->new( -file => 'temp.txt'); $report->sbjctName; $report->sbjctLength; my $hsp = $report->next_feature; my $S_start = $hsp->sbjct->start; my $S_end = $hsp->sbjct->end; if ($S_start) {print "Start $S_start \n";} Since I am a beginner at this,I am probably doing something stupid. The file I am parsing is below and was generated with the statement: system("/usr/local/genome/blast/bl2seq -i $probe_file -j target_file.txt -p blastn -o temp.txt"); . I have also tried inserting a statement " -format => 'blastn', " before the " -file => " statement but I still get the Warning message and the message about the unblessed reference. Any help would be much appreciated. 
Thanks Harry ###################################################################### #COMMAND LINE OUTPUT perl blast_parser_test.pl -------------------- WARNING --------------------- MSG: Must provide which type of BLAST was run (blastp,blastn, tblastn, tblastx, blastx) if you want strand information to get set properly for DNA query or subjects --------------------------------------------------- Can't call method "nextHSP" on unblessed reference at /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/BPbl2seq.pm line 243, line 7. ###################################################################### #INPUT FILE (I HAVE OMMITTED MINOR ALIGNMENTS) Query= (250 letters) > Length = 105880 Score = 496 bits (250), Expect = e-142 Identities = 250/250 (100%) Strand = Plus / Minus Query: 1 acctgctagagccttgatctgggaatctaagttttcataattatgaacaataaatttatg 60 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 3962 acctgctagagccttgatctgggaatctaagttttcataattatgaacaataaatttatg 3903 Query: 61 ttatttataaactacccgatataagatattttattacagcagcaagaatggactaagatg 120 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 3902 ttatttataaactacccgatataagatattttattacagcagcaagaatggactaagatg 3843 Query: 121 agtgcaaaatctgagaaggaaaccacaggtacctgcaagtactggaatattccataattg 180 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 3842 agtgcaaaatctgagaaggaaaccacaggtacctgcaagtactggaatattccataattg 3783 Query: 181 attaggtgggagtttaaatgtaagacagtaagttatattgctaaatatgaatgctgaggt 240 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 3782 attaggtgggagtttaaatgtaagacagtaagttatattgctaaatatgaatgctgaggt 3723 Query: 241 cctccctaaa 250 |||||||||| Sbjct: 3722 cctccctaaa 3713 Score = 28.2 bits (14), Expect = 0.079 Identities = 14/14 (100%) Strand = Plus / Plus Query: 34 ttcataattatgaa 47 |||||||||||||| Sbjct: 3916 ttcataattatgaa 3929 Lambda K H 1.37 0.711 1.31 Gapped Lambda K H 1.37 0.711 1.31 Matrix: blastn matrix:1 -3 Gap Penalties: Existence: 5, Extension: 2 Number of Hits to 
DB: 15
Number of Sequences: 0
Number of extensions: 15
Number of successful extensions: 15
Number of sequences better than 10.0: 1
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 15
length of query: 250
length of database: 105,880
effective HSP length: 12
effective length of query: 238
effective length of database: 105,868
effective search space: 25196584
effective search space used: 25196584
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 11 (22.3 bits)

*******************************NOTE NEW ADDRESS AND PHONE NUMBERS*****************
Harry Noyes
Room 231
Biosciences Building
School of Biological Sciences,
University of Liverpool,
Crown St.
Liverpool L69 7ZB
Internal 7334
Tel 0151-794-7334
Fax 0151-795-4408
email harry@liv.ac.uk
http://www.genomics.liv.ac.uk/

From openmind13 at biosys.kaist.ac.kr Mon Dec 13 01:02:38 2004
From: openmind13 at biosys.kaist.ac.kr (=?ks_c_5601-1987?B?wMzAusGk?=)
Date: Mon Dec 13 19:28:26 2004
Subject: [Bioperl-l] How to mirror Bioperl?
Message-ID: <200412130605.iBD65sKN008809@biosys.kaist.ac.kr>

Hello,

I'd like to mirror "bioperl" program, could you let me know the way? I am
a student in South Korea. I will appreciate if you give me a description
about that.

Thanks,
EunJung Li.

From jason.stajich at duke.edu Mon Dec 13 20:18:12 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Dec 13 20:17:19 2004
Subject: [Bioperl-l] Re: How can I extract the orientation from a blastreport using BPlite
In-Reply-To: <000601c4e16e$9220ece0$6fc1148e@omid>
References: <000601c4e16e$9220ece0$6fc1148e@omid>
Message-ID: <05EB7836-4D6E-11D9-9ED6-000393C44276@duke.edu>

Please ask these questions on the bioperl list. Try out the help in the
SearchIO HOWTO first to answer your question. See the bioperl.org website
for the FAQ pointer.
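
For the orientation question specifically, a rough sketch along the lines
of the SearchIO approach, assuming BioPerl is installed and the input is a
standard BLAST text report. strand_label and print_orientation are
hypothetical helpers, and the numeric strand convention (1, -1, 0) is an
assumption worth checking against your BioPerl version:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a numeric strand into a readable label (assumed convention:
# 1 = plus, -1 = minus, 0 or undef = no strand, e.g. protein).
sub strand_label {
    my ($strand) = @_;
    $strand = 0 unless defined $strand;
    return 'Plus'  if $strand > 0;
    return 'Minus' if $strand < 0;
    return 'None';
}

# Print query and hit orientation for every HSP in a BLAST report.
sub print_orientation {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded on demand; needs BioPerl installed

    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                printf "%s vs %s: query %s / hit %s\n",
                    $result->query_name, $hit->name,
                    strand_label( $hsp->strand('query') ),
                    strand_label( $hsp->strand('hit') );
            }
        }
    }
}

# Only run when a report file is given on the command line.
print_orientation($ARGV[0]) if @ARGV;
```

The while loops also mean a report with no alignments simply prints
nothing instead of failing on an undefined HSP.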
On Dec 13, 2004, at 6:50 PM, omid gulban wrote: > Hi Jason, > > I am using BPliste to parse some blast reports I could not find a > method that allows you to parse orientaion information about the > alignemnt. > Is it possible to do so or there is other bioperl modules that allow > you to extract such information. > > many thanks > Omid > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From hlapp at gmx.net Tue Dec 14 03:00:48 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Dec 14 02:58:04 2004 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: <10C94843061E094A98C02EB77CFC328722FEDF@nrcmrdex1d.imsb.nrc.ca> Message-ID: <4445633C-4DA6-11D9-89A3-000A959EB4C4@gmx.net> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing any input file, what you're asking is whether or not there is a SeqIO parser for NCBI Gene. The answer to that question is no, not yet. Anybody who feels motivated is welcome to give it a try ... Since I'll need it, I'll write the parser if nobody else does within the next 3 months, but I'm not going to promise when exactly this will happen. -hilmar On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote: > Hi, > > I was wondering with regards to bioperl-db the scripts and schema and > load_seqdatabase.pl has there been preparation for integration of > Entrez > gene information when locuslink is phased out? Or if it has already > been > changed could somebody point > me to the documentation or changed code? > > Thanks, > Annie. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From sdavis2 at mail.nih.gov Tue Dec 14 05:54:45 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Dec 14 05:52:14 2004 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: <10C94843061E094A98C02EB77CFC328722FEDF@nrcmrdex1d.imsb.nrc.ca> References: <10C94843061E094A98C02EB77CFC328722FEDF@nrcmrdex1d.imsb.nrc.ca> Message-ID: <9111AE41-4DBE-11D9-A4BA-000D933565E8@mail.nih.gov> The information from Entrez Gene is in the form of tab-delimited text files, so parsing and loading should be pretty straightforward? Of course, there is no sequence information loaded. In fact, given what is available in Gene right now, is it sufficient to load RefSeq? The Gene Reference Into Function is not included with refseq, nor is the omim (for human) mapping, but the gene ontology, function, and summary function, as well as mapping to Gene ID are included, so will that do it? Sean On Dec 13, 2004, at 11:03 AM, Law, Annie wrote: > Hi, > > I was wondering with regards to bioperl-db the scripts and schema and > load_seqdatabase.pl has there been preparation for integration of > Entrez > gene information when locuslink is phased out? Or if it has already > been > changed could somebody point > me to the documentation or changed code? > > Thanks, > Annie. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jforment at ibmcp.upv.es Tue Dec 14 06:08:20 2004 From: jforment at ibmcp.upv.es (Javier Forment Millet) Date: Tue Dec 14 06:07:02 2004 Subject: [Bioperl-l] Parsing TGICL-CAP3 ACE result files Message-ID: <1103022500.41bec9a474614@webmail2.upv.es> Hi,... I have to parse CAP3 ACE result files obtained with the TGICL pipeline (TIGR Gene Indices clustering tools). 
I know about the bioperl-1.4::Bio::Assembly module for working with contig
assemblies, but it seems to be Phrap-orientated. Has anybody parsed CAP3
ACE result files with this module or a modification of it? Is there
another module available for parsing these files?

Thanks a lot,
Javier.

--
Javier Forment Millet
Instituto de Biologia Molecular y Celular de Plantas (IBMCP) UPV-CSIC
Avenida de los Naranjos, s/n
46022 Valencia (SPAIN)
jforment@ibmcp.upv.es
Tlf.:+34-96-3877885

From jrm at compbio.dundee.ac.uk Tue Dec 14 06:55:12 2004
From: jrm at compbio.dundee.ac.uk (Jon manning)
Date: Tue Dec 14 06:54:55 2004
Subject: [Bioperl-l] copying Bio::Tree::Tree objects
Message-ID: <1103025312.10587.5.camel@localhost.localdomain>

Hi to all,

Is there a simple way of copying tree objects, i.e. not pass by reference
like:

    $tree2 = $tree;

keeping node identifiers etc the same? I want to change one copy while
keeping the other to refer back to. I tried using TreeIO to write and then
read in the tree, but this seems to preclude the use of the same node
identifiers, making the comparison tricky.

Thanks,

Jon

From jason.stajich at duke.edu Tue Dec 14 08:38:08 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Dec 14 08:35:21 2004
Subject: [Bioperl-l] copying Bio::Tree::Tree objects
In-Reply-To: <1103025312.10587.5.camel@localhost.localdomain>
References: <1103025312.10587.5.camel@localhost.localdomain>
Message-ID: <63E81420-4DD5-11D9-9ED6-000393C44276@duke.edu>

There is no deep copy (yet... feel free to write one).

Node identifiers (internal_id) are intended to be unique for all nodes in
memory. Presumably you want to be able to refer to the same internal node
in two different trees? Why not give them all labels and then you can use
the $node->id method.

Just give all non-tip nodes a label and then write out the tree.

    use Bio::TreeIO;

    # label every internal (non-leaf) node so it can be matched by id later
    my $id = 1;
    for my $node ( grep { ! $_->is_Leaf } $tree->get_nodes ) {
        $node->id("internal_$id");
        $id++;
    }

    # write it twice so you'll have two copies
    my $out = Bio::TreeIO->new(-format => 'newick',
                               -file   => ">treecopy.tre");
    $out->write_tree($tree);
    $out->write_tree($tree);
    undef $out;    # close the output file before reading it back

    my $in = Bio::TreeIO->new(-format => 'newick',
                              -file   => 'treecopy.tre');
    my $tree1 = $in->next_tree;
    my $tree2 = $in->next_tree;

Now you should have two copies of the tree.

-jason

On Dec 14, 2004, at 6:55 AM, Jon manning wrote:

> Hi to all,
>
> Is there a simple way of copying tree objects, i.e. not pass by
> reference like:
>
> $tree2 = $tree;
>
> keeping node identifiers etc the same? I want to change one copy while
> keeping the other to refer back to. I tried using TreeIO to write and
> then read in the tree, but this seems to preclude the use of the same
> node identifiers, making the comparison tricky.
>
> Thanks,
>
> Jon
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Tue Dec 14 08:40:32 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Dec 14 08:37:42 2004
Subject: [Bioperl-l] Getting query start and end from blast table report
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E899E1@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E95E899E1@iahce2knas1.iah.bbsrc.reserved>
Message-ID:

Should work - that's the whole point of the parser - you need to provide
some example code + version of Bioperl so we can follow along at home.

The standard Result -> Hit -> HSP methodology is what you should be using.

-jason

On Dec 13, 2004, at 8:07 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> How do you access the start and end of hits parsed using SearchIO and
> the "blasttable" format option?
> >> From the docs, as far as I can see, hit objects parsed this way do not > have HSP objects, but it is only hsp objects that have start and end > methods... > > Anyone help? > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Tue Dec 14 08:52:01 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Dec 14 08:49:19 2004 Subject: [Bioperl-l] Bio::Tools::BPbl2seq Parsing bl2seq In-Reply-To: <6.1.2.0.0.20041213121140.0206b150@pop1.liv.ac.uk> References: <6.1.2.0.0.20041213121140.0206b150@pop1.liv.ac.uk> Message-ID: <546FF666-4DD7-11D9-9ED6-000393C44276@duke.edu> I'd use Bio::SearchIO, format 'blast' instead of BPbl2seq. But anyways... You want to protect the calls with a while loop: while( my $hsp = $report->next_feature ) { } Because if there are no alignments you'll get an error like you just saw. Passing option " -report_type => 'blastn' " is necessary for guessing how to do the strandedness correctly. -jason On Dec 13, 2004, at 7:13 AM, Harry Noyes wrote: > I am trying to parse a bl2seq output file using Bio::Tools::BPbl2seq > and I get the followoing error message: > Can't call method "nextHSP" on unblessed reference at > /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/BPbl2seq.pm line 243, > line 7. > > The error is generated when I run this script > > > use Bio::Tools::BPbl2seq; > my $report = Bio::Tools::BPbl2seq->new( > -file => 'temp.txt'); > $report->sbjctName; > $report->sbjctLength; > my $hsp = $report->next_feature; > my $S_start = $hsp->sbjct->start; > my $S_end = $hsp->sbjct->end; > > > if ($S_start) {print "Start $S_start \n";} > > > Since I am a beginner at this,I am probably doing something stupid. 
> The file I am parsing is below and was generated with the statement: > system("/usr/local/genome/blast/bl2seq -i $probe_file -j > target_file.txt -p blastn -o temp.txt"); > > . > I have also tried inserting a statement " -format => 'blastn', > " before the " -file => " statement but I still get > the Warning message and the message about the unblessed reference. > Any help would be much appreciated. > Thanks > > Harry > > ###################################################################### > #COMMAND LINE OUTPUT > > perl blast_parser_test.pl > > -------------------- WARNING --------------------- > MSG: Must provide which type of BLAST was run (blastp,blastn, tblastn, > tblastx, blastx) if you want strand information to get set properly > for DNA query or subjects > --------------------------------------------------- > Can't call method "nextHSP" on unblessed reference at > /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/BPbl2seq.pm line 243, > line 7. > > ###################################################################### > #INPUT FILE (I HAVE OMMITTED MINOR ALIGNMENTS) > Query= > (250 letters) > > > > Length = 105880 > > Score = 496 bits (250), Expect = e-142 > Identities = 250/250 (100%) > Strand = Plus / Minus > > > Query: 1 > acctgctagagccttgatctgggaatctaagttttcataattatgaacaataaatttatg 60 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 3962 > acctgctagagccttgatctgggaatctaagttttcataattatgaacaataaatttatg 3903 > > > Query: 61 > ttatttataaactacccgatataagatattttattacagcagcaagaatggactaagatg 120 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 3902 > ttatttataaactacccgatataagatattttattacagcagcaagaatggactaagatg 3843 > > > Query: 121 > agtgcaaaatctgagaaggaaaccacaggtacctgcaagtactggaatattccataattg 180 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 3842 > agtgcaaaatctgagaaggaaaccacaggtacctgcaagtactggaatattccataattg 3783 > > > Query: 181 > 
attaggtgggagtttaaatgtaagacagtaagttatattgctaaatatgaatgctgaggt 240 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 3782 > attaggtgggagtttaaatgtaagacagtaagttatattgctaaatatgaatgctgaggt 3723 > > > Query: 241 cctccctaaa 250 > |||||||||| > Sbjct: 3722 cctccctaaa 3713 > > > > Score = 28.2 bits (14), Expect = 0.079 > Identities = 14/14 (100%) > Strand = Plus / Plus > > > Query: 34 ttcataattatgaa 47 > |||||||||||||| > Sbjct: 3916 ttcataattatgaa 3929 > > Lambda K H > 1.37 0.711 1.31 > > Gapped > Lambda K H > 1.37 0.711 1.31 > > > Matrix: blastn matrix:1 -3 > Gap Penalties: Existence: 5, Extension: 2 > Number of Hits to DB: 15 > Number of Sequences: 0 > Number of extensions: 15 > Number of successful extensions: 15 > Number of sequences better than 10.0: 1 > Number of HSP's better than 10.0 without gapping: 1 > Number of HSP's successfully gapped in prelim test: 0 > Number of HSP's that attempted gapping in prelim test: 0 > Number of HSP's gapped (non-prelim): 15 > length of query: 250 > length of database: 105,880 > effective HSP length: 12 > effective length of query: 238 > effective length of database: 105,868 > effective search space: 25196584 > effective search space used: 25196584 > T: 0 > A: 0 > X1: 6 (11.9 bits) > X2: 15 (29.7 bits) > S1: 12 (24.3 bits) > S2: 11 (22.3 bits) > > > > *******************************NOTE NEW ADDRESS AND PHONE > NUMBERS***************** > Harry Noyes > Room 231 > Biosciences Building > School of Biological Sciences, > University of Liverpool, > Crown St. 
> Liverpool
> L69 7ZB
> Internal 7334
> Tel 0151-794-7334
> Fax 0151-795-4408
> email harry@liv.ac.uk
> http://www.genomics.liv.ac.uk/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jforment at ibmcp.upv.es Tue Dec 14 10:59:46 2004
From: jforment at ibmcp.upv.es (Javier Forment Millet)
Date: Tue Dec 14 10:56:59 2004
Subject: [Bioperl-l] Parsing TGICL-CAP3 ACE result files
In-Reply-To:
References:
Message-ID: <1103039986.41bf0df2e084a@webmail2.upv.es>

There is already a Bio::Assembly::IO::ace file in the Bioperl-1.4
distribution, but it is not the same as the one you have sent. Actually,
your ace.pm seems not to be as phrap-oriented as the other one. For
example, when searching for the contig name, your file states

    (/^CO (.*Contig\d+) (\d+) (\d+) (\d+) (\w+)/) && do {
        my $contigID = $1;

(so $contigID becomes, for example, 'CL2Contig14', for it is the way TGICL
names contigs) whereas the one with the distribution does

    (/^CO Contig(\d+) (\d+) (\d+) (\d+) (\w+)/) && do {
        my $contigID = $1;

(so $contigID becomes, for example, '857', which is the way phrap names
contigs). I don't know if there are other crucial changes, but I will try
it.

Thanks a looooot,

Javier.

Quoting Marc Logghe:

> Hi Javier,
> It is already quite old now, don't know what changes have been done in the
> mean time.
> Have a try with the Bio::Assembly::IO::ace file attached.
> This package I used for exactly the same purpose as you.
> Good luck,
> Marc
>
> > -----Original Message-----
> > From: Javier Forment Millet [mailto:jforment@ibmcp.upv.es]
> > Sent: Tuesday, December 14, 2004 12:08 PM
> > To: bioperl-l@portal.open-bio.org
> > Subject: [Bioperl-l] Parsing TGICL-CAP3 ACE result files
> >
> >
> > Hi,...
> > > > I have to parse CAP3 ACE result files obtained with the TGICL > > pipeline (TIGR > > Gene Indices clustering tools). I know about the > > bioperl-1.4::Bio::Assembly > > module for working with contig assemblies, but it seems to be > > Phrap-orientated. > > Has anybody parsed CAP3 ACE result files with this module or > > a modification of > > it? Is there another module available for parsing these files? > > > > Thanks a lot, > > > > Javier. > > > > -- > > Javier Forment Millet > > Instituto de Biologia Molecular y Celular de Plantas (IBMCP) > > UPV-CSIC > > Avenida de los Naranjos, s/n > > 46022 Valencia (SPAIN) > > jforment@ibmcp.upv.es > > Tlf.:+34-96-3877885 > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- Javier Forment Millet Instituto de Biologia Molecular y Celular de Plantas (IBMCP) UPV-CSIC Avenida de los Naranjos, s/n 46022 Valencia (SPAIN) jforment@ibmcp.upv.es Tlf.:+34-96-3877885 From Marc.Logghe at devgen.com Tue Dec 14 11:18:58 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Tue Dec 14 11:18:04 2004 Subject: [Bioperl-l] Parsing TGICL-CAP3 ACE result files Message-ID: > There is already a Bio::Assembly::IO::ace file in the > Bioperl-1.4 distribution, > but it is not the same as the one you have sent. Actually, > your ace.pm seems > not to be as phrap-oriented as the other one. For example, > when searching for No it is not the same, indeed. It is an old version based on ace.pm from the bioperl release almost 2 years ago. I made few adaptations so that the TGICL generated ace files could be parsed. I did not want to commit the code because I did not have phrap data (or other data) to check whether I broke anything or not. Hope it still works within the environment of new bioperl. 
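
To make the difference between the two CO patterns quoted in this thread concrete, here is a minimal sketch. The two regular expressions are copied from the messages above; the sample CO lines (including their numeric fields) and the contig_id helper are made up for illustration and are not part of either ace.pm.

```perl
use strict;
use warnings;

# Pattern quoted from the ace.pm in the Bioperl 1.4 distribution (phrap-style names).
my $phrap_re = qr/^CO Contig(\d+) (\d+) (\d+) (\d+) (\w+)/;

# Pattern quoted from the adapted ace.pm (also accepts TGICL-style names).
my $tgicl_re = qr/^CO (.*Contig\d+) (\d+) (\d+) (\d+) (\w+)/;

# Return the contig ID captured by the given pattern, or undef if the line does not match.
sub contig_id {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Hypothetical CO lines: only the contig names follow the thread, the rest is invented.
my @lines = ('CO Contig857 1200 15 3 U', 'CO CL2Contig14 950 8 2 U');

for my $line (@lines) {
    for my $case (['1.4', $phrap_re], ['adapted', $tgicl_re]) {
        my ($label, $re) = @$case;
        my $id = contig_id($re, $line);
        printf "%-8s %-26s => %s\n", $label, $line, defined $id ? $id : '(no match)';
    }
}
```

Note that for a phrap-style name the adapted pattern captures 'Contig857' rather than '857', so code that expects a bare number as the contig ID may need to be checked against it.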
Cheers, Marc From sdavis2 at mail.nih.gov Tue Dec 14 11:41:04 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Dec 14 11:38:42 2004 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: <10C94843061E094A98C02EB77CFC328722FEE5@nrcmrdex1d.imsb.nrc.ca> References: <10C94843061E094A98C02EB77CFC328722FEE5@nrcmrdex1d.imsb.nrc.ca> Message-ID: Annie, I have heard rumors that the gene ontology and cdd stuff are going to be in the refseq entries rather than a "gene2go" file, but I haven't seen direct confirmation of this. If you look in the CDS section of some of the refseq entries (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? db=nucleotide&val=56550106 as an example), you will see the gene ontology information there. I honestly don't know how this is handled by bioperl-db.... Also, I wish I could say I KNEW what NCBI was doing here so that we could all plan, but I don't. Sean On Dec 14, 2004, at 10:39 AM, Law, Annie wrote: > Hey Sean, > > As always good to hear from you. Which particular file are you > referring to > in the > Ftp folders? Basically, there is the gene2unigene file but no > equivalent of > loc2go. > I wrote to NCBI and they said there will be more files to come but no > mention specifically about an equivalent of > Loc2go. > > Have a wonderful Christmas and a happy new year, > Annie. > > -----Original Message----- > From: Sean Davis [mailto:sdavis2@mail.nih.gov] > Sent: Tuesday, December 14, 2004 5:55 AM > To: Law, Annie > Cc: 'bioperl-l@portal.open-bio.org' > Subject: Re: [Bioperl-l] Entrez Gene and bioperl-db > > > The information from Entrez Gene is in the form of tab-delimited text > files, so parsing and loading should be pretty straightforward? Of > course, there is no sequence information loaded. In fact, given what > is available in Gene right now, is it sufficient to load RefSeq? 
The > Gene Reference Into Function is not included with refseq, nor is the > omim (for human) mapping, but the gene ontology, function, and summary > function, as well as mapping to Gene ID are included, so will that do > it? > > Sean > > On Dec 13, 2004, at 11:03 AM, Law, Annie wrote: > >> Hi, >> >> I was wondering with regards to bioperl-db the scripts and schema and >> load_seqdatabase.pl has there been preparation for integration of >> Entrez gene information when locuslink is phased out? Or if it has >> already been >> changed could somebody point >> me to the documentation or changed code? >> >> Thanks, >> Annie. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gnf.org Tue Dec 14 23:24:18 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Dec 14 23:21:57 2004 Subject: [Bioperl-l] references in swissprot Message-ID: <2FA33D2A-4E51-11D9-8D9B-000A95AE92B0@gnf.org> This is mainly an FYI for those who may be affected. If you are using bioperl 1.4 you are not affected. The SeqIO swissprot parser had a bug in parsing references. Specifically, for all except the first reference entry in a sequence, authors, location, and title will be wrong in that they concatenated the value of the preceding reference with that of the current. If there is no value for the current (e.g., no title), then that attribute will erroneously have the same value as the preceding reference. This will also be the case for start and end position (these won't be concatenated though). The bug was introduced in a version (1.80) after the 1.4 branch. I fixed it today, so far only on the main trunk. (I also added tests to detect this problem.) Aaron, do you want me to merge this into the 1.5 rc1 branch, or will you still branch the official release off of the main trunk? 
-hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From Marc.Logghe at devgen.com Tue Dec 14 10:28:50 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Dec 15 01:03:27 2004 Subject: [Bioperl-l] Parsing TGICL-CAP3 ACE result files Message-ID: Hi Javier, It is already quite old now, don't know what changes have been done in the mean time. Have a try with the Bio::Assembly::IO::ace file attached. This package I used for exactly the same purpose as you. Good luck, Marc > -----Original Message----- > From: Javier Forment Millet [mailto:jforment@ibmcp.upv.es] > Sent: Tuesday, December 14, 2004 12:08 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Parsing TGICL-CAP3 ACE result files > > > Hi,... > > I have to parse CAP3 ACE result files obtained with the TGICL > pipeline (TIGR > Gene Indices clustering tools). I know about the > bioperl-1.4::Bio::Assembly > module for working with contig assemblies, but it seems to be > Phrap-orientated. > Has anybody parsed CAP3 ACE result files with this module or > a modification of > it? Is there another module available for parsing these files? > > Thanks a lot, > > Javier. > > -- > Javier Forment Millet > Instituto de Biologia Molecular y Celular de Plantas (IBMCP) > UPV-CSIC > Avenida de los Naranjos, s/n > 46022 Valencia (SPAIN) > jforment@ibmcp.upv.es > Tlf.:+34-96-3877885 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ace.pm
Type: application/octet-stream
Size: 13099 bytes
Desc: ace.pm
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041214/10e14456/ace-0001.obj

From wes.barris at csiro.au Wed Dec 15 01:17:54 2004
From: wes.barris at csiro.au (Wes Barris)
Date: Wed Dec 15 01:17:04 2004
Subject: [Bioperl-l] SeqIO fails on masked sequences
Message-ID: <41BFD712.4050403@csiro.au>

Hi,

I am using a simple bioperl script to process sequences. When I encounter
a completely masked sequence, bioperl issues this error message and my
script dies:

------------- EXCEPTION -------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
STACK Bio::PrimarySeq::_guess_alphabet /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:837
STACK Bio::Seq::SeqFastaSpeedFactory::create /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/SeqFastaSpeedFactory.pm:134
STACK Bio::SeqIO::fasta::next_seq /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/fasta.pm:141
STACK toplevel /home/wes/soft/fasta/extractSeqs.pl:28

Line 28 of my script is the "while" statement:

my $seq_in = Bio::SeqIO->new(-format=>$format, -file => $infile);
my $seq_out = Bio::SeqIO->new(-format=>$format, -file => ">$outfile");
while (my $seq = $seq_in->next_seq()) {

An example sequence that would generate this error is:

>test
XXXXXXXXXXXXXXXXXXXXXXXXX

Certainly, there must be a way to process sequences that are completely
masked. Is there a way to explicitly specify the "alphabet" so that
SeqIO does not have to guess it? I could not find a way.

I am using bioperl live (as of about one month ago) on a Redhat 8
workstation.
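
As a stopgap while the alphabet question is open, fully masked records can be spotted before they reach SeqIO. The sketch below is plain Perl and does not use any Bioperl API; the helper names and the choice of X and N as mask characters are assumptions made for illustration, not something confirmed in this thread.

```perl
use strict;
use warnings;

# True if the residue string contains nothing but mask characters.
# Assumption: X and N (either case) are the mask characters; an empty
# string is also reported as masked because it has no usable residues.
sub is_fully_masked {
    my ($residues) = @_;
    $residues =~ s/\s+//g;
    return $residues =~ /^[XxNn]*$/ ? 1 : 0;
}

# Split FASTA text into [header, residues] pairs without creating sequence objects.
sub read_fasta_records {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";                 # read one record at a time
    while (my $chunk = <$fh>) {
        chomp $chunk;                 # drop the trailing "\n>" separator, if any
        $chunk =~ s/^>//;             # the first record still carries its ">"
        my ($header, @seq_lines) = split /\n/, $chunk;
        next unless defined $header;
        push @records, [$header, join('', @seq_lines)];
    }
    return @records;
}

# Demo on an in-memory file holding the example record plus an ordinary one.
my $fasta = ">test\nXXXXXXXXXXXXXXXXXXXXXXXXX\n>ok\nACGTNNNNACGT\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
for my $record (read_fasta_records($fh)) {
    my ($header, $residues) = @$record;
    printf "%s\t%s\n", $header, is_fully_masked($residues) ? 'masked' : 'keep';
}
close $fh;
```

Records flagged as masked could then be written out directly or skipped, and only the remaining ones passed through the usual next_seq loop.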
-- Wes Barris E-Mail: Wes.Barris@csiro.au From hlapp at gmx.net Wed Dec 15 03:24:35 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 15 03:22:05 2004 Subject: [Bioperl-l] Entrez Gene and bioperl-db In-Reply-To: Message-ID: On Tuesday, December 14, 2004, at 08:41 AM, Sean Davis wrote: > If you look in the CDS section of some of the refseq entries > (http://www.ncbi.nlm.nih.gov/entrez/ > viewer.fcgi?db=nucleotide&val=56550106 as an example), you will see > the gene ontology information there. I honestly don't know how this > is handled by bioperl-db... Well if you don't do anything about it then it will sit there in seqfeature_qualifier_value rows, where it is relatively useless (but, hey, it's in the feature table in semi-mangled form, and hence comes in a more or less useless format already ...). So what I did is write a custom SequenceProcessorI (by deriving from Bio::Seq::BaseSeqProcessor) that for every sequence parses this out of the annotation (tags) of the CDS feature, creates Bio::Ontology::Term instances with name and identifier set, and re-attaches the term objects to the sequence object's annotation using Bio::Annotation::OntologyTerm as the adaptor (so that the term is-a Bio::AnnotationI). When the result of this gets serialized to the database through bioperl-db, you get rows in the term table for the terms, and the association with the sequence (bioentry) will be in bioentry_qualifier_value. You hook your SequenceProcessorI into the system by using the --pipeline argument to load_seqdatabase.pl. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From Sebastien.Moretti at igs.cnrs-mrs.fr Wed Dec 15 03:36:18 2004 From: Sebastien.Moretti at igs.cnrs-mrs.fr (Sebastien Moretti) Date: Wed Dec 15 03:33:30 2004 Subject: [Bioperl-l] [EMBL reformatter problem with large acc_number] Message-ID: <200412150936.18849.Sebastien.Moretti@igs.cnrs-mrs.fr> Hello, I use bioperl 1.4 on linux OS. I try to enhance some of my EMBL flat files with the standard EMBL reformatter of BioPerl: my $in=Bio::SeqIO->new(-file=>"$rep/struct.emb",-format=>'EMBL'); my $out=Bio::SeqIO->new(-file=>">$rep/struct.embl",-format=>'EMBL'); while (my $seq=$in->next_seq()){ $out->write_seq($seq); } But some of my files have large accession number, like ensEMBL accession number, and my ID lines look like that: ID ENSG00000156345standard; genomic DNA; UNK; 17425 BP. So, I would like to change this and to add blank spaces before 'standard;'. I tried to add $seq->id()=$seq->id()."\s\s\s"; in the while instruction but I get this error message: Can't modify non-lvalue subroutine call What can I do to solve this problem ? I tried to change the id value with references but I am not a master in it. Thanks -- Sebastien MORETTI Linux User - #327894 CNRS - IGS 31 chemin Joseph Aiguier 13402 Marseille cedex 20, FRANCE tel. +33 (0)4 91 16 44 55 From simon.andrews at bbsrc.ac.uk Wed Dec 15 04:29:58 2004 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Wed Dec 15 04:29:58 2004 Subject: [Bioperl-l] [EMBL reformatter problem with large acc_number] In-Reply-To: <200412150936.18849.Sebastien.Moretti@igs.cnrs-mrs.fr> References: <200412150936.18849.Sebastien.Moretti@igs.cnrs-mrs.fr> Message-ID: On 15 Dec 2004, at 08:36, Sebastien Moretti wrote: > Hello, > I use bioperl 1.4 on linux OS. 
> I try to enhance some of my EMBL flat files with the standard EMBL
> reformatter of BioPerl:
>
> But some of my files have large accession number, like ensEMBL
> accession number, and my ID lines look like that:
> ID ENSG00000156345standard; genomic DNA; UNK; 17425 BP.
>
> So, I would like to change this and to add blank spaces before
> 'standard;'.
>

This has been discussed on the list previously and there is a patch
waiting in bugzilla to sort out this behaviour

http://bugzilla.bioperl.org/show_bug.cgi?id=1618

A small change to EMBL.pm should have you on your way without having to
change any of your code.

Simon.

--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews@bbsrc.ac.uk
+44 (0) 1223 496463

From Sebastien.Moretti at igs.cnrs-mrs.fr Wed Dec 15 05:07:32 2004
From: Sebastien.Moretti at igs.cnrs-mrs.fr (Sebastien Moretti)
Date: Wed Dec 15 05:05:55 2004
Subject: [Bioperl-l] [EMBL reformatter problem with large acc_number]
In-Reply-To:
References: <200412150936.18849.Sebastien.Moretti@igs.cnrs-mrs.fr>
Message-ID: <200412151107.32729.Sebastien.Moretti@igs.cnrs-mrs.fr>

> > Hello,
> > I use bioperl 1.4 on linux OS.
> > I try to enhance some of my EMBL flat files with the standard EMBL
> > reformatter of BioPerl:
> >
> > But some of my files have large accession number, like ensEMBL
> > accession number, and my ID lines look like that:
> > ID ENSG00000156345standard; genomic DNA; UNK; 17425 BP.
> >
> > So, I would like to change this and to add blank spaces before
> > 'standard;'.
>
> This has been discussed on the list previously and there is a patch
> waiting in bugzilla to sort out this behaviour
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1618
>
> A small change to EMBL.pm should have you on your way without having to
> change any of your code.
>
> Simon.

Thanks, it works with the patch, but it cuts my accession number.
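
The truncation comes from the precision in the %s conversion: "%-11.10s" prints at most 10 characters of the ID, while "%-10s " only pads short IDs and leaves long ones whole. A minimal sketch of the two format strings from the bug report follows; the helper names are made up for illustration, and the molecule and division are passed through %s here instead of being interpolated into the format as in the EMBL.pm line.

```perl
use strict;
use warnings;

# Format from the first patch: width 11, precision 10.
# The ".10" precision cuts the ID to at most 10 characters.
sub id_line_with_precision {
    my ($id, $mol, $div, $len) = @_;
    return sprintf("%-11.10sstandard; %s; %s; %d BP.", $id, $mol, $div, $len);
}

# Format from the revised patch: minimum width 10, no precision,
# followed by a literal space. Long IDs are kept whole.
sub id_line_without_precision {
    my ($id, $mol, $div, $len) = @_;
    return sprintf("%-10s standard; %s; %s; %d BP.", $id, $mol, $div, $len);
}

for my $id ('J00522', 'ENSG00000156345') {
    print id_line_with_precision($id, 'genomic DNA', 'UNK', 17425), "\n";
    print id_line_without_precision($id, 'genomic DNA', 'UNK', 17425), "\n";
}
```

For a short ID such as J00522 both formats give the same line; only IDs longer than 10 characters differ.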
So it would be better to replace

$temp_line = sprintf("%-11.10sstandard; $mol; $div; %d BP.", $seq->id(), $len);

with

$temp_line = sprintf("%-10s standard; $mol; $div; %d BP.", $seq->id(), $len);

as reported at the end of the bug report.

I hope it will be fixed in the 1.5 BioPerl release!

--
Sebastien MORETTI
Linux User - #327894
CNRS - IGS
31 chemin Joseph Aiguier
13402 Marseille cedex 20, FRANCE
tel. +33 (0)4 91 16 44 55

From Marc.Logghe at devgen.com Wed Dec 15 05:27:20 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Dec 15 05:26:06 2004
Subject: [Bioperl-l] SeqIO fails on masked sequences
Message-ID:

Hi,

> my $seq_in = Bio::SeqIO->new(-format=>$format, -file => $infile);
> my $seq_out = Bio::SeqIO->new(-format=>$format, -file => ">$outfile");
> while (my $seq = $seq_in->next_seq()) {

Guess you can do it by setting the alphabet explicitly:
$seq_in->alphabet('dna'); # or 'rna' or 'protein'

You probably also can pass it directly with the constructor, but did not
find that immediately.
Indirectly, you can do it also by setting the alphabet for the factory
object and passing the factory object with the Bio::SeqIO constructor.

HTH,
Marc

From simon.andrews at bbsrc.ac.uk Wed Dec 15 05:35:20 2004
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed Dec 15 05:33:48 2004
Subject: [Bioperl-l] [EMBL reformatter problem with large acc_number]
In-Reply-To: <200412151107.32729.Sebastien.Moretti@igs.cnrs-mrs.fr>
References: <200412150936.18849.Sebastien.Moretti@igs.cnrs-mrs.fr> <200412151107.32729.Sebastien.Moretti@igs.cnrs-mrs.fr>
Message-ID: <04B2E868-4E85-11D9-8DF3-000D933B785C@bbsrc.ac.uk>

On 15 Dec 2004, at 10:07, Sebastien Moretti wrote:

>>> Hello,
>>> I use bioperl 1.4 on linux OS.
>>> I try to enhance some of my EMBL flat files with the standard EMBL >>> reformatter >>> of BioPerl: >>> >>> But some of my files have large accession number, like ensEMBL >>> accession >>> number, and my ID lines look like that: >>> ID ENSG00000156345standard; genomic DNA; UNK; 17425 BP. >>> >>> So, I would like to change this and to add blank spaces before >>> 'standard;'. >> >> This has been discussed on the list previously and there is a patch >> waiting in bugzilla to sort out this behaviour >> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1618 >> >> A small change to EMBL.pm should have you on your way without having >> to >> change any of your code. >> >> Simon. > > Thanks, I works with the patch but it cuts my accession number. See the modified patch at the bottom of the report which does exactly the same thing you did! > I hope it will be fixed automatically in the 1.5 BioPerl release ! Hopefully it will be... TTFN Simon. -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews@bbsrc.ac.uk +44 (0) 1223 496463 From nathanhaigh at ukonline.co.uk Wed Dec 15 06:40:14 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Dec 15 06:37:59 2004 Subject: [Bioperl-l] nmake test failures Message-ID: I am getting the following failures when I run the bioperl tests: Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------------- t\BioFetch_DB.t 27 2 7.41% 20-21 t\DB.t 77 2 2.60% 39 41 t\EMBL_DB.t 15 2 13.33% 13-14 They appear to be getting the seqs back in a different order to the input order.. 
From NCBI the seqs should be the following lengths:

From BioFetch_DB.t and EMBL_DB.t -
J00522: 408
J02231: 200
AF303112: 1611

From DB.t -
NDP_MOUSE : 131
NDP_HUMAN: 133

Nathan

From j.abbott at imperial.ac.uk Wed Dec 15 08:50:07 2004
From: j.abbott at imperial.ac.uk (James Abbott)
Date: Wed Dec 15 08:49:40 2004
Subject: [Bioperl-l] copying Bio::Tree::Tree objects
In-Reply-To: <63E81420-4DD5-11D9-9ED6-000393C44276@duke.edu>
References: <1103025312.10587.5.camel@localhost.localdomain> <63E81420-4DD5-11D9-9ED6-000393C44276@duke.edu>
Message-ID: <41C0410F.1060202@imperial.ac.uk>

Jason Stajich wrote:

> There is no deep copy (yet... feel free to write one).
>

I've had considerable success making deep copies of various bioperl
objects using the Clone module (from a CPAN near you...). I have not
tried Bio::Tree::Tree, but it works fine with Bio::Seq::RichSeq etc. May
be worth a try.

James

--
Dr. James Abbott
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London

From razi at genet.sickkids.on.ca Wed Dec 15 15:19:06 2004
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Wed Dec 15 15:16:31 2004
Subject: [Bioperl-l] bitscore and rawscore for blast reports
Message-ID: <20041215201906.41276.qmail@web51606.mail.yahoo.com>

I am trying to parse blast reports using the Bio::SearchIO system.

#!/usr/bin/perl
use strict;
use Bio::SearchIO;
my( $blastFile ) = @ARGV;
my $searchio = new Bio::SearchIO( -file=>$blastFile, -format=>'blast' );
while( my $result = $searchio->next_result() ){
  while( my $hit = $result->next_hit() ){
    while( my $hsp = $hit->next_hsp() ) {

From Marc.Logghe at devgen.com Wed Dec 15 15:31:30 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Dec 15 15:30:02 2004
Subject: [Bioperl-l] bitscore and rawscore for blast reports
Message-ID:

Hi,

> I am trying to parse blast reports using the Bio::SearchIO system.
> > #!/usr/bin/perl > use strict; > use Bio::SearchIO; > my( $blastFile ) = @ARGV; > my $searchio = new Bio::SearchIO( -file=>$blastFile, > -format=>'blast' ); > while( my $result = $searchio->next_result() ){ > while( my $hit = $result->next_hit() ){ > while( my $hsp = $hit->next_hsp() ) { > Your hit object is a Bio::Search::Hit::BlastHit object. The methods related to score are inherited from Bio::Search::Hit::GenericHit. So have a look there for info about the methods score() and raw_score(). The score for an individual hsp is also inherited: Bio::SeqFeature::FeaturePair::hscore Bio::SeqFeature::SimilarityPair::score You might also have a look at the tutorial about Bio::SearchIO, especially the overview table showing all available methods (http://bioperl.org/HOWTOs/SearchIO/use.html) HTH, Marc From razi at genet.sickkids.on.ca Wed Dec 15 16:20:24 2004 From: razi at genet.sickkids.on.ca (Razi Khaja) Date: Wed Dec 15 16:18:35 2004 Subject: [Bioperl-l] bitscore and rawscore for blast reports Message-ID: <20041215212024.65345.qmail@web51607.mail.yahoo.com> Sorry for my previous incomplete posting ... Im am parsing BLASTN reports using the Bio::SearchIO system, and have run into some problems in obtaining bit score and raw score data. I need to be able to get the bit score and raw score for *each* of the hsps, but HSPI provides no methods to get this data. The HitI object provides methods raw_score() and bits() to get this data, however these only return the raw score and bit score of the first/best hsp in the blast report. I need this data not only for the first/best hsp but for each of them. Any input would be appreciated. Thanks Razi ==== Partial BLAST Report ===== >chrX Length = 153692391 Score = 682 bits (344), Expect = 0.0 Identities = 354/356 (99%), Gaps = 1/356 (0%) Strand = Plus / Minus Query: 17 aaaatga ... ||||||| Sbjct: 107395915 aaaaaga ... . . . 
 Score = 172 bits (87), Expect = 2e-40
 Identities = 120/131 (91%)
 Strand = Plus / Minus

Query: 490      ctctctctctctctct ...
                ||||||||||||||||
Sbjct: 52902264 ctctctctctctctct ...

===== Code Snippet =====
#!/usr/bin/perl
use strict;
use Bio::SearchIO;
my( $blastFile ) = @ARGV;
my $searchio = new Bio::SearchIO( -file=>$blastFile, -format=>'blast' );
while( my $result = $searchio->next_result() ){
  while( my $hit = $result->next_hit() ) {
    while( my $hsp = $hit->next_hsp() ) {
      # Want to collect bit score and raw score data here
    }
  }
}

From jason.stajich at duke.edu Wed Dec 15 16:42:14 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Dec 15 16:39:24 2004
Subject: [Bioperl-l] bitscore and rawscore for blast reports
In-Reply-To: <20041215212024.65345.qmail@web51607.mail.yahoo.com>
References: <20041215212024.65345.qmail@web51607.mail.yahoo.com>
Message-ID: <2F4FDA89-4EE2-11D9-BEE0-000393C44276@duke.edu>

On Dec 15, 2004, at 4:20 PM, Razi Khaja wrote:

> Sorry for my previous incomplete posting ...
>
> I am parsing BLASTN reports using the Bio::SearchIO system, and have
> run into some problems in obtaining bit score and raw score data.
>
> I need to be able to get the bit score and raw score for *each* of the
> hsps, but HSPI provides no methods to get this data.
>

uh, yes it does.

$hsp->bits
$hsp->score

Try the SearchIO HOWTO as Marc said.

> The HitI object provides methods raw_score() and bits() to get this
> data, however these only return the raw score and bit score of the
> first/best hsp in the blast report. I need this data not only for the
> first/best hsp but for each of them.
>

or sometimes it is a summary score, not the best score, depending on how
you have run WU-BLAST.

> Any input would be appreciated.
> Thanks
> Razi
>
>
> ==== Partial BLAST Report =====
>> chrX
> Length = 153692391
>
> Score = 682 bits (344), Expect = 0.0
> Identities = 354/356 (99%), Gaps = 1/356 (0%)
> Strand = Plus / Minus
>
>
>
> Query: 17 aaaatga ...
> ||||||| > Sbjct: 107395915 aaaaaga ... > > > . > . > . > > Score = 172 bits (87), Expect = 2e-40 > Identities = 120/131 (91%) > Strand = Plus / Minus > > > > Query: 490 ctctctctctctctct ... > |||||||||||||||| > Sbjct: 52902264 ctctctctctctctct ... > > ===== Code Snippet ===== > #!/usr/bin/perl > use strict; > use Bio::SearchIO; > my( $blastFile ) = @ARGV; > my $searchio = new Bio::SearchIO( -file=>$blastFile, -format=>'blast' > ); > while( my $result = $searchio->next_result() ){ > while( my $hit = $result->next_hit() ) { > while( my $hsp = $hit->next_hsp() ) { > # Want to collect bit score and raw score data here print $hsp->bits, " ", $hsp->score, "\n"; # for each HSP > } > } > } > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From Marc.Logghe at devgen.com Wed Dec 15 16:42:21 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Dec 15 16:40:47 2004 Subject: [Bioperl-l] bitscore and rawscore for blast reports Message-ID: > I need to be able to get the bit score and raw score for *each* of the > hsps, but HSPI provides no methods to get this data. This does the trick: print join(',',$hsp->score,$hsp->bits),"\n"; Like I already said, score() is inherited from Bio::SeqFeature::SimilarityPair. This is also the case for the bits() method. 
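
Putting the replies in this thread together, a minimal sketch of collecting both scores for every HSP might look like the following. bits() and score() are the HSP methods named above; the helper names, the use of the hit name as a label, and the tab-separated output are choices made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build "bits<TAB>raw score" for one HSP. Works with any object that
# offers bits() and score(), which is what the HSP objects provide.
sub hsp_score_line {
    my ($hsp) = @_;
    return join("\t", $hsp->bits, $hsp->score);
}

# Walk Result -> Hit -> HSP and print the scores of every HSP, not just the best one.
sub print_hsp_scores {
    my ($blast_file) = @_;
    require Bio::SearchIO;    # loaded here so hsp_score_line() has no Bioperl dependency
    my $searchio = Bio::SearchIO->new(-file => $blast_file, -format => 'blast');
    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $hit->name, hsp_score_line($hsp)), "\n";
            }
        }
    }
    return;
}

# Usage: perl hsp_scores.pl report.blast
print_hsp_scores($ARGV[0]) if @ARGV;
```

Each output line then carries the hit name, the bit score and the raw score of one HSP, so hits with several HSPs produce several lines.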
HTH, Marc From wes.barris at csiro.au Wed Dec 15 17:37:13 2004 From: wes.barris at csiro.au (Wes Barris) Date: Wed Dec 15 17:34:28 2004 Subject: [Bioperl-l] SeqIO fails on masked sequences In-Reply-To: References: Message-ID: <41C0BC99.3020506@csiro.au> Marc Logghe wrote: > Hi, > >>my $seq_in = Bio::SeqIO->new(-format=>$format, -file => $infile); >>my $seq_out = Bio::SeqIO->new(-format=>$format, -file => ">$outfile"); >>while (my $seq = $seq_in->next_seq()) { > > > Guess you can do it by setting the alphabet explicitely: > $seq_in->alphabet('dna'); # or 'rna' or 'protein' Sorry, that does not work. I tried this and got the same error: my $seq_in = Bio::SeqIO->new(-format=>$format, -file => $infile); $seq_in->alphabet('dna'); my $seq_out = Bio::SeqIO->new(-format=>$format, -file => ">$outfile"); while (my $seq = $seq_in->next_seq()) { > > You probably also can pass it directly with the constructor, but did not find that immediately. Sorry, that did not work either. Here is what I tried: my $seq_in = Bio::SeqIO->new(-format=>$format, -file => $infile, -alphabet=>'dna'); my $seq_out = Bio::SeqIO->new(-format=>$format, -file => ">$outfile"); while (my $seq = $seq_in->next_seq()) { > Indirectly, you can do it also by setting the alphabet for the factory object and passing the factory object with the Bio::SeqIO constructor. Would you provide an example? > > HTH, > Marc -- Wes Barris E-Mail: Wes.Barris@csiro.au From xuying at sibs.ac.cn Thu Dec 16 01:21:40 2004 From: xuying at sibs.ac.cn (xuying) Date: Thu Dec 16 01:23:52 2004 Subject: [Bioperl-l] Bio::SeqIO::staden::read problem Message-ID: <20041216062603.65F0E10DF65@smtp.sibsnet.org> hi all: I have problem with running demo scripts in bptutorial.pl [xuying@guyver-1 bioperl-1.4]$ perl -w bptutorial.pl 0 The extension 'Bio::SeqIO::staden::read' is not properly installed in path: '/usr/lib/perl5/site_perl/5.8.0' If this is a CPAN/distributed module, you may need to reinstall it on your system. 
To allow Inline to compile the module in a temporary cache, simply remove the
Inline config option 'VERSION=' from the Bio::SeqIO::staden::read module.

 at bptutorial.pl line 0
INIT failed--call queue aborted, line 1.

I have installed io_lib in /usr/local/lib/ and installed bioperl-ext successfully.
Is there any argument I should change?

xuying
xuying@sibs.ac.cn
2004-12-16

From Marc.Logghe at devgen.com Thu Dec 16 04:27:00 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Dec 16 04:25:35 2004
Subject: [Bioperl-l] SeqIO fails on masked sequences
Message-ID:

Hi Wes,

> > Guess you can do it by setting the alphabet explicitly:
> > $seq_in->alphabet('dna'); # or 'rna' or 'protein'
>
> Sorry, that does not work. I tried this and got the same error:

Yeah, some strange things seem to happen. You can set it this way, but Bio::SeqIO::fasta does not take it into account anyhow: when it is set and a sequence is found, it is boldly set to undef !!!
In object creation the type is guessed anyhow, and in your case it ends up as protein because of the X's. It would end up as dna if it were N's, though.

> > Indirectly, you can do it also by setting the alphabet for
> the factory object and passing the factory object with the
> Bio::SeqIO constructor.
>
> Would you provide an example?

Think that did not make sense, sorry for that.

On the other hand, I was not able to mimic your problem generating the error. I got no errors, only the fact that the alphabet was reset to 'protein'. Initially I got a similar error, but that was caused by the fact that $format was not set yet and I did not run using the strict pragma.
The script I used to find that out:

#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Data::Dumper;

my $format = 'fasta';

my $seq_in = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
$seq_in->alphabet('dna');
my $seq_out = Bio::SeqIO->new(-format=>$format, -fh => \*STDOUT);
$seq_out->alphabet('dna');
my $seq = $seq_in->next_seq;

print Data::Dumper->Dump([$seq],['seq']);
$seq_out->write_seq($seq);

__DATA__
>test
XXXXXXXXXXXXXXXXXXXXXXXXX

I am afraid I can not be of more help here.
Cheers,
Marc

From hlapp at gnf.org Thu Dec 16 05:18:43 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Dec 16 05:20:25 2004
Subject: [Bioperl-l] Re: [BioSQL-l] tigr.pm
In-Reply-To:
Message-ID:

On Friday, December 10, 2004, at 09:34 AM, Hilmar Lapp wrote:

> I could add a fix to the LocationAdapter to default a null strand to 0.
>
> Any opinions? I'm leaning towards adding a fix to LocationAdapter to
> fix this problem immediately and forever.

I'm taking this back. This is exactly what LocationAdapter does already. There shouldn't be an undefined strand that ever makes it to an INSERT attempt. Which version are you using?

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From nathanhaigh at ukonline.co.uk Thu Dec 16 05:31:15 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Dec 16 05:32:01 2004
Subject: [Bioperl-l] SeqIO fails on masked sequences
In-Reply-To:
Message-ID:

When I use the script you supplied, I get the exception shown below.

I'll try to get to the bottom of this.

In the meantime, what OS are you both using and what version of Bioperl?
Nathan ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Got a sequence with no letters in it cannot guess alphabet [] STACK: Error::throw STACK: Bio::Root::Root::throw I:/Programming/Perl/perl5.8.0/site/lib/Bio/Root/Root.pm:328 STACK: Bio::PrimarySeq::_guess_alphabet I:/Programming/Perl/perl5.8.0/site/lib/Bio/PrimarySeq.pm:837 STACK: Bio::Seq::SeqFastaSpeedFactory::create I:/Programming/Perl/perl5.8.0/site/lib/Bio/Seq/SeqFastaSpeedFactory.pm:134 STACK: Bio::SeqIO::fasta::next_seq I:/Programming/Perl/perl5.8.0/site/lib/Bio\SeqIO\fasta.pm:141 STACK: test.pl:9 ----------------------------------------------------------- > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Marc Logghe > Sent: 16 December 2004 09:27 > To: Wes Barris > Cc: Bioperl Mailing List > Subject: RE: [Bioperl-l] SeqIO fails on masked sequences > > Hi Wes, > > > > Guess you can do it by setting the alphabet explicitely: > > > $seq_in->alphabet('dna'); # or 'rna' or 'protein' > > > > Sorry, that does not work. I tried this and got the same error: > > Yeah, some strange things seem to happen. You can set it this way but it is not taken into account anyhow by Bio::SeqIO::fasta: when > it is set and there is a sequence found, it is boldly set to undef !!! > In object creation the type is guessed anyhow and in your case it ends up as protein because of the X's. It would end up as dna if it > were N's, though. > > > > > Indirectly, you can do it also by setting the alphabet for > > the factory object and passing the factory object with the > > Bio::SeqIO constructor. > > > > Would you provide an example? > Think that did not make sense, sorry for that. > > On the other hand I was not able to mimick your problem generating the error. I got no errors, only the fact that the alphabet was reset > to 'protein'. 
Initially I got a similar error but that was caused by the fact that $format was not set yet and I did not run using the strict > pragma. > > The script I used to find that out: > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Data::Dumper; > > my $format = 'fasta'; > > > my $seq_in = Bio::SeqIO->new(-format=>$format, -fh => \*DATA); > $seq_in->alphabet('dna'); > my $seq_out = Bio::SeqIO->new(-format=>$format, -fh => \*STDOUT); > $seq_out->alphabet('dna'); > my $seq = $seq_in->next_seq; > > print Data::Dumper->Dump([$seq],['seq']); > $seq_out->write_seq($seq); > > __DATA__ > >test > XXXXXXXXXXXXXXXXXXXXXXXXX > > > I am afraid I can not be of more help here. > Cheers, > Marc > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > --- > avast! Antivirus: Inbound message clean. > Virus Database (VPS): 0451-1, 14/12/2004 > Tested on: 16/12/2004 10:00:11 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0451-1, 14/12/2004 Tested on: 16/12/2004 10:30:08 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From Marc.Logghe at devgen.com Thu Dec 16 05:44:06 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Dec 16 05:42:45 2004 Subject: [Bioperl-l] SeqIO fails on masked sequences Message-ID: > When I use the script you supplied, I get the exception shown below. > > I'll try to get to the bottom of this. > > In the meantime, what OS are you both using and what version > of Bioperl? > Ah, yes that explains. Too much fiddling with PERL5LIB is not good ;-) I did not realize I was acutally using bioperl 1.4.0. There it worked. It fails indeed when using bioperl-release-1-5-0-rc1. Apologies for confusing you people. 
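
When PERL5LIB points at more than one BioPerl tree, it can help to check which copy of a module perl actually loads. The sketch below uses only core Perl (require and %INC); the helper name module_path is made up for this illustration, and Bio::SeqIO is just the default module to look up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# module_path() is a hypothetical helper: it returns the file a module
# was loaded from, or undef if the module cannot be loaded at all.
sub module_path {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line, or Bio::SeqIO by default.
my @modules = @ARGV ? @ARGV : ('Bio::SeqIO');
for my $module (@modules) {
    my $path = module_path($module);
    printf "%s => %s\n", $module, defined $path ? $path : 'not found';
}
```

Running it once with and once without PERL5LIB set shows whether a 1.4.0 tree or a release-candidate tree is the one being picked up.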
Cheers,
Marc

From nathanhaigh at ukonline.co.uk Thu Dec 16 06:21:57 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Dec 16 06:21:53 2004
Subject: [Bioperl-l] SeqIO fails on masked sequences
In-Reply-To:
Message-ID:

Ok, the "bug" seems to have been introduced in the last update to Bio::PrimarySeq.pm (v1.83) where X was added to the list of ambiguous characters in the _guess_alphabet subroutine.

Brian - do you remember why/what this was for?

Nathan

> -----Original Message-----
> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
> Sent: 16 December 2004 10:44
> To: nathanhaigh@ukonline.co.uk; Wes Barris
> Cc: Bioperl Mailing List
> Subject: RE: [Bioperl-l] SeqIO fails on masked sequences
>
> > When I use the script you supplied, I get the exception shown below.
> >
> > I'll try to get to the bottom of this.
> >
> > In the meantime, what OS are you both using and what version
> > of Bioperl?
> >
> Ah, yes that explains. Too much fiddling with PERL5LIB is not good ;-)
> I did not realize I was actually using bioperl 1.4.0. There it worked.
> It fails indeed when using bioperl-release-1-5-0-rc1.
> Apologies for confusing you people.
> Cheers,
> Marc

From amackey at pcbi.upenn.edu Thu Dec 16 07:19:23 2004
From: amackey at pcbi.upenn.edu (Aaron J.
Mackey) Date: Thu Dec 16 07:17:56 2004 Subject: [Bioperl-l] Bio::SeqIO::staden::read problem In-Reply-To: <20041216062603.65F0E10DF65@smtp.sibsnet.org> References: <20041216062603.65F0E10DF65@smtp.sibsnet.org> Message-ID: This error crops up when, in fact, bioperl-ext hasn't been fully, successfully installed (at least not in your site_perl/5.8.0). If you do not need Bio::SeqIO::staden::read functionality, I suggest removing that part of the installation tree. -Aaron On Dec 16, 2004, at 1:21 AM, xuying wrote: > hi all: > I have problem with running demo scripts in bptutorial.pl > > [xuying@guyver-1 bioperl-1.4]$ perl -w bptutorial.pl 0 > The extension 'Bio::SeqIO::staden::read' is not properly installed in > path: > '/usr/lib/perl5/site_perl/5.8.0' > > If this is a CPAN/distributed module, you may need to reinstall it on > your > system. > > To allow Inline to compile the module in a temporary cache, simply > remove the > Inline config option 'VERSION=' from the Bio::SeqIO::staden::read > module. > > at bptutorial.pl line 0 > INIT failed--call queue aborted, line 1. > > I have installed io_lib in /usr/local/lib/ and installed bioperl-ext > successful. > is there any argument i should change? > > ????????xuying > ????????xuying@sibs.ac.cn > ??????????2004-12-16 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From jason.stajich at duke.edu Thu Dec 16 13:42:37 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Dec 16 13:39:44 2004 Subject: [Bioperl-l] SeqFeature::Generic DESTROY [was New: Bug in Bio::SeqFeature::Generic.pm DESTROY causes object corruption.. fix included!] 
In-Reply-To: References: <200412100310.iBA3AWLa028953@portal.open-bio.org> <49F5251D-4ACC-11D9-8F75-000D932893EC@caltech.edu> Message-ID: <42276C0E-4F92-11D9-A502-000393C44276@duke.edu> So is this bug going to get resolved by someone? I don't have a clear idea of how to reproduce it. Has the removal of the get_tag_values is implemented removed all of this code in the first place? The reason for that code was to free memory cycles introduced when a Feature has a reference to the Sequence and then the Sequence also has a reference to the Feature. But it looks like the whole DESTROY function was removed from SeqFeature::Generic so I'm unsure if the memory cycle problem has now been re-introduced. Looks like it might be since there is no explict unlinking of $obj->{'_gsf_seq'}. -jason On Dec 10, 2004, at 11:59 AM, Aaron J. Mackey wrote: > > Probably because the offending code in get_tag_values doesn't check > for defined($hash{$key}) but just exists($hash{$key}) (and then > merrily tries to dereference it as an array). It's a bug in BioPerl, > to be sure, but not as mystical as it seems. > > -Aaron > > On Dec 10, 2004, at 11:55 AM, Alok Saldanha wrote: > >> I still don't understand why adding delete() to DESTROY causes the >> problem to go away > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. 
University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From sdavis2 at mail.nih.gov Thu Dec 16 16:44:07 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Dec 16 16:47:45 2004 Subject: [Bioperl-l] bio::graphics::panel under mod_perl Message-ID: <9D11550E-4FAB-11D9-816E-000D933565E8@mail.nih.gov> I know this is a very esoteric question, but here goes: I have two scripts, one home-brewed (using bioperl-live from recent CVS), and gbrowse (1.62), both running under mod_perl (apache 1.3). Each has the same behavior--in some requests, seemingly at random, several colors become black. I haven't been able to sort this out, but since it happens in gbrowse and my custom app using bio::graphics, I wonder if there is an issue there. It could just as likely be gd or some other supporting package. I guess I am wondering if anyone else has noticed these issues or how I might sort it out. Note that these work fine as straight CGI. Two files are attached (as I am behind a firewall, so can't show the sites....). The second has the correct color scheme. Sean -------------- next part -------------- A non-text attachment was scrubbed... Name: human_open1.png Type: image/png Size: 5163 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041216/5902d110/human_open1.png -------------- next part -------------- A non-text attachment was scrubbed... 
Name: human_open2.png Type: image/png Size: 3319 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041216/5902d110/human_open2.png From wes.barris at csiro.au Thu Dec 16 17:27:23 2004 From: wes.barris at csiro.au (Wes Barris) Date: Thu Dec 16 17:24:34 2004 Subject: [Bioperl-l] SeqIO fails on masked sequences In-Reply-To: References: Message-ID: <41C20BCB.6090808@csiro.au> Nathan Haigh wrote: > When I use the script you supplied, I get the exception shown below. > > I'll try to get to the bottom of this. > > In the meantime, what OS are you both using and what version of Bioperl? > > Nathan Hi Nathan, I am using bioperl-live (as of about a month ago). I too remember this working properly back with bioperl-1.4. Redhat 8 and Redhat 9. > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Got a sequence with no letters in it cannot guess alphabet [] > STACK: Error::throw > STACK: Bio::Root::Root::throw I:/Programming/Perl/perl5.8.0/site/lib/Bio/Root/Root.pm:328 > STACK: Bio::PrimarySeq::_guess_alphabet I:/Programming/Perl/perl5.8.0/site/lib/Bio/PrimarySeq.pm:837 > STACK: Bio::Seq::SeqFastaSpeedFactory::create I:/Programming/Perl/perl5.8.0/site/lib/Bio/Seq/SeqFastaSpeedFactory.pm:134 > > STACK: Bio::SeqIO::fasta::next_seq I:/Programming/Perl/perl5.8.0/site/lib/Bio\SeqIO\fasta.pm:141 > STACK: test.pl:9 > ----------------------------------------------------------- > > >>-----Original Message----- >>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Marc Logghe >>Sent: 16 December 2004 09:27 >>To: Wes Barris >>Cc: Bioperl Mailing List >>Subject: RE: [Bioperl-l] SeqIO fails on masked sequences >> >>Hi Wes, >> >> >>>>Guess you can do it by setting the alphabet explicitely: >>>>$seq_in->alphabet('dna'); # or 'rna' or 'protein' >>> >>>Sorry, that does not work. I tried this and got the same error: >> >>Yeah, some strange things seem to happen. 
You can set it this way but it is not taken into account anyhow by Bio::SeqIO::fasta: > > when > >>it is set and there is a sequence found, it is boldly set to undef !!! >>In object creation the type is guessed anyhow and in your case it ends up as protein because of the X's. It would end up as dna if > > it > >>were N's, though. >> >> >> >>>>Indirectly, you can do it also by setting the alphabet for >>> >>>the factory object and passing the factory object with the >>>Bio::SeqIO constructor. >>> >>>Would you provide an example? >> >>Think that did not make sense, sorry for that. >> >>On the other hand I was not able to mimick your problem generating the error. I got no errors, only the fact that the alphabet was > > reset > >>to 'protein'. Initially I got a similar error but that was caused by the fact that $format was not set yet and I did not run using > > the strict > >>pragma. >> >>The script I used to find that out: >> >>#!/usr/bin/perl >>use strict; >>use Bio::SeqIO; >>use Data::Dumper; >> >>my $format = 'fasta'; >> >> >>my $seq_in = Bio::SeqIO->new(-format=>$format, -fh => \*DATA); >>$seq_in->alphabet('dna'); >>my $seq_out = Bio::SeqIO->new(-format=>$format, -fh => \*STDOUT); >>$seq_out->alphabet('dna'); >>my $seq = $seq_in->next_seq; >> >>print Data::Dumper->Dump([$seq],['seq']); >>$seq_out->write_seq($seq); >> >>__DATA__ >> >>>test >> >>XXXXXXXXXXXXXXXXXXXXXXXXX >> >> >>I am afraid I can not be of more help here. >>Cheers, >>Marc >> >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>--- >>avast! Antivirus: Inbound message clean. >>Virus Database (VPS): 0451-1, 14/12/2004 >>Tested on: 16/12/2004 10:00:11 >>avast! is copyright (c) 2000-2003 ALWIL Software. >>http://www.avast.com >> >> > > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0451-1, 14/12/2004 > Tested on: 16/12/2004 10:30:08 > avast! 
is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > > -- Wes Barris E-Mail: Wes.Barris@csiro.au From qfdong at iastate.edu Thu Dec 16 18:00:57 2004 From: qfdong at iastate.edu (Qunfeng) Date: Thu Dec 16 17:58:05 2004 Subject: [Bioperl-l] parse long organism name Message-ID: <6.1.2.0.2.20041216165914.039e5758@qfdong.mail.iastate.edu> For example, http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=47776109 It has a LONG name: Paphiopedilum 'Dark Roller' x Paphiopedilum rothschildianum Is there anyway in Bioperl to parse out that long name from GenBank format file? Thanks! Qunfeng From cjfields at uiuc.edu Thu Dec 16 18:11:27 2004 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Dec 16 18:08:50 2004 Subject: [Bioperl-l] searching Bioperl archives Message-ID: <6.1.1.1.2.20041216170905.01a662a8@express.cites.uiuc.edu> Is there a direct way to search the Bioperl list archives? Using the openbio search form doesn't seem to work for the email archives. __________________________________ Chris Fields - Postdoctoral Researcher Lab of Dr. Robert Switzer Address: University of Illinois at Urbana-Champaign Dept. of Biochemistry - 323 RAL 600 S. Mathews Ave. Urbana, IL 61801 Phone : (217) 333-7098 Fax : (217) 244-5858 From sdavis2 at mail.nih.gov Thu Dec 16 18:13:13 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Dec 16 18:10:49 2004 Subject: [Bioperl-l] bio::graphics::panel under mod_perl In-Reply-To: <9D11550E-4FAB-11D9-816E-000D933565E8@mail.nih.gov> References: <9D11550E-4FAB-11D9-816E-000D933565E8@mail.nih.gov> Message-ID: <0F9E953C-4FB8-11D9-816E-000D933565E8@mail.nih.gov> Sorry. A reinstall of GD-2.19 over a fresh libgd-2.0.33 seems to have fixed the problem. Sean On Dec 16, 2004, at 4:44 PM, Sean Davis wrote: > I know this is a very esoteric question, but here goes: > > I have two scripts, one home-brewed (using bioperl-live from recent > CVS), and gbrowse (1.62), both running under mod_perl (apache 1.3). 
> Each has the same behavior--in some requests, seemingly at random, > several colors become black. I haven't been able to sort this out, > but since it happens in gbrowse and my custom app using bio::graphics, > I wonder if there is an issue there. It could just as likely be gd or > some other supporting package. I guess I am wondering if anyone else > has noticed these issues or how I might sort it out. Note that these > work fine as straight CGI. Two files are attached (as I am behind a > firewall, so can't show the sites....). The second has the correct > color scheme. > > > Sean > > > _____________________________________ > __________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Thu Dec 16 18:15:02 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Dec 16 18:12:48 2004 Subject: [Bioperl-l] bio::graphics::panel under mod_perl In-Reply-To: <0F9E953C-4FB8-11D9-816E-000D933565E8@mail.nih.gov> References: <9D11550E-4FAB-11D9-816E-000D933565E8@mail.nih.gov> <0F9E953C-4FB8-11D9-816E-000D933565E8@mail.nih.gov> Message-ID: <503F229F-4FB8-11D9-A634-000D933565E8@mail.nih.gov> Nope. More testing--still a problem. Sean On Dec 16, 2004, at 6:13 PM, Sean Davis wrote: > Sorry. A reinstall of GD-2.19 over a fresh libgd-2.0.33 seems to have > fixed the problem. > > Sean > > On Dec 16, 2004, at 4:44 PM, Sean Davis wrote: > >> I know this is a very esoteric question, but here goes: >> >> I have two scripts, one home-brewed (using bioperl-live from recent >> CVS), and gbrowse (1.62), both running under mod_perl (apache 1.3). >> Each has the same behavior--in some requests, seemingly at random, >> several colors become black. I haven't been able to sort this out, >> but since it happens in gbrowse and my custom app using >> bio::graphics, I wonder if there is an issue there. It could just as >> likely be gd or some other supporting package. 
I guess I am >> wondering if anyone else has noticed these issues or how I might sort >> it out. Note that these work fine as straight CGI. Two files are >> attached (as I am behind a firewall, so can't show the sites....). >> The second has the correct color scheme. >> >> >> Sean >> >> >> ____________________________________ >> ___________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Thu Dec 16 18:39:35 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu Dec 16 18:36:57 2004 Subject: [Bioperl-l] searching Bioperl archives In-Reply-To: <6.1.1.1.2.20041216170905.01a662a8@express.cites.uiuc.edu> References: <6.1.1.1.2.20041216170905.01a662a8@express.cites.uiuc.edu> Message-ID: <41C21CB7.7060006@genetics.utah.edu> Google with site:bioperl.org. Chris Fields wrote: > Is there a direct way to search the Bioperl list archives? Using the > openbio search form doesn't seem to work for the email archives. > > __________________________________ > > Chris Fields - Postdoctoral Researcher > Lab of Dr. Robert Switzer > > Address: > > University of Illinois at Urbana-Champaign > Dept. of Biochemistry - 323 RAL > 600 S. Mathews Ave. > Urbana, IL 61801 > > Phone : (217) 333-7098 > Fax : (217) 244-5858 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From wes.barris at csiro.au Thu Dec 16 19:08:40 2004 From: wes.barris at csiro.au (Wes Barris) Date: Thu Dec 16 19:05:47 2004 Subject: [Bioperl-l] searching Bioperl archives In-Reply-To: <41C21CB7.7060006@genetics.utah.edu> References: <6.1.1.1.2.20041216170905.01a662a8@express.cites.uiuc.edu> <41C21CB7.7060006@genetics.utah.edu> Message-ID: <41C22388.9060708@csiro.au> Barry Moore wrote: > Google with site:bioperl.org. 
> > Chris Fields wrote: > >> Is there a direct way to search the Bioperl list archives? Using the >> openbio search form doesn't seem to work for the email archives. I would think that there should be a search option right on the bioperl list archive pages. That is where most people would look for it. >> >> __________________________________ >> >> Chris Fields - Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> >> Address: >> >> University of Illinois at Urbana-Champaign >> Dept. of Biochemistry - 323 RAL >> 600 S. Mathews Ave. >> Urbana, IL 61801 >> >> Phone : (217) 333-7098 >> Fax : (217) 244-5858 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- Wes Barris E-Mail: Wes.Barris@csiro.au From xuying at sibs.ac.cn Thu Dec 16 21:11:37 2004 From: xuying at sibs.ac.cn (xuying) Date: Thu Dec 16 21:57:19 2004 Subject: [Bioperl-l] Bio::SeqIO::staden::read problem Message-ID: <20041217021559.B17AB10DF63@smtp.sibsnet.org> but i do need the functionality provided by bioperl-ext. here is the message for install bioperl-ext. isn't it installed successfully? ...... (i just can't run any example in the bptutorial.pl) it just gives me the error message "The extension 'Bio::SeqIO::staden::read' is not properly installed in path: '/usr/lib/perl5/site_perl/5.8.0'" again and again. test....ok 35/94# Failed test 35 in test.pl at line 107 fail #5 *TODO*: Can't write valid ctf files until we have a trace object test....ok 36/94# Failed test 36 in test.pl at line 107 fail #6 *TODO*: Can't write valid ctf files until we have a trace object test....ok 37/94# Failed test 37 in test.pl at line 107 fail #7 *TODO*: Can't write valid ctf files until we have a trace object test....ok All tests successful. 
Files=1, Tests=94, 6 wallclock secs ( 5.56 cusr + 0.09 csys = 5.65 CPU) make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden' /usr/bin/make test -- OK Running make install make[1]: Entering directory `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align' make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align' make[1]: Entering directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden' make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden' Installing /usr/share/man/man3/Bio::SeqIO::staden::read.3pm Writing /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/auto/Bio/.packlist Appending installation info to /usr/lib/perl5/5.8.0/i386-linux-thread-multi/perllocal.pod /usr/bin/make install -- OK >Xuying- > >One of the demos you are trying to run apparently requires >Bio::SeqIO::staden::read to be installed. That is part of bioperl-ext >and requires the Staden package either of which you may not have >installed. Are you just trying to run a script to see how bioperl works? >In the tutorial documentation it offers some advice... > >It may be best to start by just running one or two demos at a time. For >example, to run the basic sequence manipulation demo, do: > > > perl -w bptutorial.pl 1 > >Some of the later demos require that you have an internet connection >and/or that you have an auxiliary bioperl library and/or external >cpan module and/or external program installed. > >Try those. > >Barry > > >xuying wrote: > >>hi all: >>I have problem with running demo scripts in bptutorial.pl >> >>[xuying@guyver-1 bioperl-1.4]$ perl -w bptutorial.pl 0 >>The extension '' is not properly installed in path: >> '/usr/lib/perl5/site_perl/5.8.0' >> >>If this is a CPAN/distributed module, you may need to reinstall it on your >>system. >> >>To allow Inline to compile the module in a temporary cache, simply remove the >>Inline config option 'VERSION=' from the Bio::SeqIO::staden::read module. 
>>
>> at bptutorial.pl line 0
>>INIT failed--call queue aborted, line 1.
>>
>>I have installed io_lib in /usr/local/lib/ and installed bioperl-ext successful.
>>is there any argument i should change?
>>
>>xuying
>>xuying@sibs.ac.cn
>>2004-12-16
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>--
>Barry Moore
>Dept. of Human Genetics
>University of Utah
>Salt Lake City, UT
>
>
>
>

= = = = = = = = = = = = = = = = = = = =

Regards!

xuying
xuying@sibs.ac.cn
2004-12-17

From heikki at nildram.co.uk Fri Dec 17 04:48:04 2004
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Fri Dec 17 04:46:34 2004
Subject: [Bioperl-l] [EMBL reformatter problem with large acc_number]
In-Reply-To: <04B2E868-4E85-11D9-8DF3-000D933B785C@bbsrc.ac.uk>
References: <200412150936.18849.Sebastien.Moretti@igs.cnrs-mrs.fr> <200412151107.32729.Sebastien.Moretti@igs.cnrs-mrs.fr> <04B2E868-4E85-11D9-8DF3-000D933B785C@bbsrc.ac.uk>
Message-ID: <200412170948.05596.heikki@nildram.co.uk>

The fix suggested in bug 1618 is now in cvs and should make it into the next release.

Cheers,

	-Heikki

On Wednesday 15 December 2004 10:35, simon andrews wrote:
> On 15 Dec 2004, at 10:07, Sebastien Moretti wrote:
> >>> Hello,
> >>> I use bioperl 1.4 on linux OS.
> >>> I try to enhance some of my EMBL flat files with the standard EMBL
> >>> reformatter
> >>> of BioPerl:
> >>>
> >>> But some of my files have large accession number, like ensEMBL
> >>> accession
> >>> number, and my ID lines look like that:
> >>> ID ENSG00000156345standard; genomic DNA; UNK; 17425 BP.
> >>>
> >>> So, I would like to change this and to add blank spaces before
> >>> 'standard;'.
> >> > >> This has been discussed on the list previously and there is a patch > >> waiting in bugzilla to sort out this behaviour > >> > >> http://bugzilla.bioperl.org/show_bug.cgi?id=1618 > >> > >> A small change to EMBL.pm should have you on your way without having > >> to > >> change any of your code. > >> > >> Simon. > > > > Thanks, I works with the patch but it cuts my accession number. > > See the modified patch at the bottom of the report which does exactly > the same thing you did! > > > I hope it will be fixed automatically in the 1.5 BioPerl release ! > > Hopefully it will be... > > TTFN > > Simon. -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From cjfields at uiuc.edu Fri Dec 17 10:32:22 2004 From: cjfields at uiuc.edu (Chris Fields) Date: Fri Dec 17 10:27:31 2004 Subject: [Bioperl-l] searching Bioperl archives In-Reply-To: <41C22388.9060708@csiro.au> References: <6.1.1.1.2.20041216170905.01a662a8@express.cites.uiuc.edu> <41C21CB7.7060006@genetics.utah.edu> <41C22388.9060708@csiro.au> Message-ID: <6.1.1.1.2.20041217092527.01a11258@express.cites.uiuc.edu> At 06:08 PM 12/16/2004, Wes Barris wrote: >Barry Moore wrote: > >>Google with site:bioperl.org. >>Chris Fields wrote: >> >>>Is there a direct way to search the Bioperl list archives? Using the >>>openbio search form doesn't seem to work for the email archives. > >I would think that there should be a search option right on the >bioperl list archive pages. That is where most people would look >for it. If the page you're referring to is: http://search.open-bio.org/cgi-bin/obf-search.cgi then the answer is, strangely, no. 
I tried searching for a few previous posts of mine (as a test) and couldn't find anything. I could have sworn that I had searched the bioperl mail lists using this interface before! The Google option works well (thanks Barry; I didn't think of trying it that way), but shouldn't there be a way to search through the archives from the open-bio interface? __________________________________ Chris Fields - Postdoctoral Researcher Lab of Dr. Robert Switzer Address: University of Illinois at Urbana-Champaign Dept. of Biochemistry - 323 RAL 600 S. Mathews Ave. Urbana, IL 61801 Phone : (217) 333-7098 Fax : (217) 244-5858 From hlapp at gmx.net Fri Dec 17 11:14:01 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Dec 17 11:11:50 2004 Subject: [Bioperl-l] parse long organism name In-Reply-To: <6.1.2.0.2.20041216165914.039e5758@qfdong.mail.iastate.edu> Message-ID: What's the error that you get, if any? -hilmar On Thursday, December 16, 2004, at 03:00 PM, Qunfeng wrote: > For example, > http://www.ncbi.nlm.nih.gov/entrez/ > viewer.fcgi?db=nucleotide&val=47776109 > > It has a LONG name: > wwwtax.cgi?id=232838>Paphiopedilum 'Dark Roller' x Paphiopedilum > rothschildianum > > Is there anyway in Bioperl to parse out that long name from GenBank > format file? > > Thanks! > > Qunfeng _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jurgen.pletinckx at algonomics.com Fri Dec 17 11:36:56 2004 From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx) Date: Fri Dec 17 11:12:03 2004 Subject: [Bioperl-l] searching Bioperl archives In-Reply-To: <6.1.1.1.2.20041217092527.01a11258@express.cites.uiuc.edu> Message-ID: However, http://search.open-bio.org/cgi-bin/mail-search.cgi does indeed work exactly as one would expect. The mail below indicates we have Kyle Jensen to thank for this. (I now notice for the first tme this mail didn't actually go to bioperl, which may explain why this knowledge is not widespread). --quote-- >From dag at sonsorol.org Tue May 11 10:01:20 2004 From: dag at sonsorol.org (Chris Dagdigian) Date: Tue May 11 10:05:21 2004 Subject: [DAS] testing a new Open-Bio website and mail list archive search Message-ID: <40A0DCB0.1050207@sonsorol.org> Hello Everyone, A fantastic volunteer (Kyle Jensen) has been steathily working on a problem for us that has long been a major issue for us -- website searching and indexing. We've tried various solutions and ht://dig implementations in the past and nothing really worked well. Kyle has been setting up a search indexing system based on the code from www.swish-e.org and so far it looks very very promising. We have 2 main indexed search sites: This page is a search index for all of the open-bio hosted websites: http://search.open-bio.org/cgi-bin/obf-search.cgi This page is just for searching mailing list archives: http://search.open-bio.org/cgi-bin/mail-search.cgi Please give the pages a whirl and let me know what you think. Thanks again Kyle! -Chris open-bio.org admin team --endquote-- Cheers, -- Jurgen Pletinckx AlgoNomics NV From amackey at pcbi.upenn.edu Fri Dec 17 15:24:13 2004 From: amackey at pcbi.upenn.edu (Aaron J. 
Mackey) Date: Fri Dec 17 15:21:26 2004 Subject: [Bioperl-l] Bio::SeqIO::staden::read problem In-Reply-To: <20041217021559.B17AB10DF63@smtp.sibsnet.org> References: <20041217021559.B17AB10DF63@smtp.sibsnet.org> Message-ID: <41C3406D.1060601@pcbi.upenn.edu> xuying wrote: > Running make install > make[1]: Entering directory `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align' > make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align' > make[1]: Entering directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden' > make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden' > Installing /usr/share/man/man3/Bio::SeqIO::staden::read.3pm > Writing /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/auto/Bio/.packlist > Appending installation info to /usr/lib/perl5/5.8.0/i386-linux-thread-multi/perllocal.pod > /usr/bin/make install -- OK No, this doesn't look entirely correct; more files should be listed as being installed/written to. From within the CPAN shell, type "look Bio::SeqIO::staden" to drop into a subshell, then do a make clean, and start over with perl Makefile.PL, make, make test, make install; send me all the output. -Aaron From nathanhaigh at ukonline.co.uk Sat Dec 18 08:40:38 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Sat Dec 18 08:38:09 2004 Subject: [Bioperl-l] nmake test on windows Message-ID: Until recently I had thought that "nmake test" on windows only died with a "line too long" error if an old version of ExtUtils::MakeMaker was being used - I thought this error had been dealt with in versions >=6.06. However, it had not! In fact using nmake 1.5 to do "nmake test" resulted in the "line too long" error; whereas nmake 6.0 did not produce this error and conducted the tests correctly. 
Since most Windows users probably use nmake 1.5, I informed the authors of
ExtUtils::MakeMaker of this problem, and they have come up with a fix that
enables the tests to run with both versions of nmake. The fix is currently
implemented in ExtUtils::MakeMaker dev release 6.25_01, available from CPAN.

Therefore, if you are a Windows user and get the "line too long" error when
trying to run "nmake test", you should install ExtUtils::MakeMaker v6.25_01
and/or use nmake 6.0 to rectify the problem.

Nathan

From xuying at sibs.ac.cn Fri Dec 17 23:39:27 2004
From: xuying at sibs.ac.cn (xuying)
Date: Sat Dec 18 10:37:08 2004
Subject: [Bioperl-l] Bio::SeqIO::staden::read problem
Message-ID: <20041218044352.2681C10DF5E@smtp.sibsnet.org>

All output is in the attached file "out.make.ext".

>
>xuying wrote:
>
>> Running make install
>> make[1]: Entering directory `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align'
>> make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align'
>> make[1]: Entering directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden'
>> make[1]: Leaving directory `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden'
>> Installing /usr/share/man/man3/Bio::SeqIO::staden::read.3pm
>> Writing /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/auto/Bio/.packlist
>> Appending installation info to /usr/lib/perl5/5.8.0/i386-linux-thread-multi/perllocal.pod
>> /usr/bin/make install -- OK
>
>No, this doesn't look entirely correct; more files should be listed as
>being installed/written to. From within the CPAN shell, type "look
>Bio::SeqIO::staden" to drop into a subshell, then do a make clean, and
>start over with perl Makefile.PL, make, make test, make install; send me
>all the output.
>
>-Aaron
>

= = = = = = = = = = = = = = = = = = = =

Regards!

xuying
xuying@sibs.ac.cn
2004-12-18
-------------- next part --------------
A non-text attachment was scrubbed...
Name: out.make.ext
Type: application/octet-stream
Size: 21244 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041218/442ed83f/out.make.obj

From mbeisen at lbl.gov Sat Dec 18 21:56:12 2004
From: mbeisen at lbl.gov (Michael Eisen)
Date: Sat Dec 18 21:53:15 2004
Subject: [Bioperl-l] weird behavior with bulk_load_gff
Message-ID: <41C4EDCC.2010500@lbl.gov>

I am having a very frustrating time loading a gff database and
associated fasta files. This is something I've done thousands of times
before, but for some reason, it's choking on my latest set of files, and
I can't figure out why.

The files are contigs and scaffolds from an assembly of a fly genome.
I've got a fasta file with the scaffolds (17 of them), and an associated
gff file.

bulk_load_gff runs fine, and all of the gff records are loaded into the
fdata table, but only 6 of the 17 sequences are loaded.

When I try to run it after splitting the scaffolds into individual files,
it again loads only 6 of them, but this time a different 6. (In each case
it loads only the first 6: from a file, the first six in the file; from a
directory, the first six in lexicographic order.)

Any thoughts on what's going wrong?

--
Michael B. Eisen, Ph.D. (MBEISEN@LBL.GOV)
Genome Sciences Division, Lawrence Berkeley Natl Lab
Dept.
of Molecular and Cell Biology, UC Berkeley Support Open Access to the Scientific Literature www.plos.org From allenday at ucla.edu Sat Dec 18 23:31:20 2004 From: allenday at ucla.edu (Allen Day) Date: Sat Dec 18 22:29:08 2004 Subject: [Bioperl-l] weird behavior with bulk_load_gff In-Reply-To: <41C4EDCC.2010500@lbl.gov> References: <41C4EDCC.2010500@lbl.gov> Message-ID: are you using bioperl-live HEAD, a bioperl-live tag/branch, or a bioperl release? what is the error message, if any? -allen On Sat, 18 Dec 2004, Michael Eisen wrote: > I am having a very frustrating time loading a gff database and > associated fasta files. This is something I've done thousands of times > before, but for some reason, it's choking on my latest set of files, and > I can't figure out why. > > The files are contigs and scaffolds from an assembly of a fly genome. > I've got a fasta file with the scaffolds (17 of them), and an associated > gff file. > > bulk_load_gff runs fine, all of the gff records are loaded into the > fdata, but only loading 6 of the 17 sequences are loaded > > when I try to run it splitting the scaffolds into individual files, it > again only loads 6 of them, but this time a different 6 (in each case it > only loads the first 6 - from a file the first six in the file, from a > directory the first six in lexigraphic order) > > Any thoughts on what's going wrong? > > > From yorambu at kitp.ucsb.edu Sat Dec 18 17:31:40 2004 From: yorambu at kitp.ucsb.edu (Yoram Burak) Date: Sun Dec 19 12:28:36 2004 Subject: [Bioperl-l] problem installing bioperl on cygwin Message-ID: I am trying to isntall bioperl on cygwin. When I run perl -MCPAN -e "install Bundle::BioPerl" it works for a while but then exits with an error (see below). Help will be highly appreciated. 
Yoram CPAN: Digest::MD5 loaded OK LWP not available Fetching with Net::FTP:ftp: //ftp.perl.org/pub/CPAN/authors/id/C/CR/CRAFFI/CHECKSUMS Checksum for /home/yorambu/.cpan/sources/authors/id/C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz ok Scanning cache /home/yorambu/.cpan/build for sizes C:\cygwin\bin\perl.exe (3512): *** couldn't release memory 0xDE4000(1032192) for 'C:\cygwin\lib\perl5\5.8.5\cygwin-thread-multi-64int\auto\Cwd\Cwd.dll' alignment, Win32 error 487 4 [main] perl 1008 sync_with_child: child 3512(0x628) died before initialization with status code 0x1 38547 [main] perl 1008 sync_with_child: *** child state child loading dlls From brian_osborne at cognia.com Sun Dec 19 13:34:39 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Sun Dec 19 13:32:00 2004 Subject: [Bioperl-l] problem installing bioperl on cygwin In-Reply-To: Message-ID: Yoram, One thing to try is the rebaseall program that comes with Cygwin. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Yoram Burak Sent: Saturday, December 18, 2004 5:32 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] problem installing bioperl on cygwin I am trying to isntall bioperl on cygwin. When I run perl -MCPAN -e "install Bundle::BioPerl" it works for a while but then exits with an error (see below). Help will be highly appreciated. Yoram CPAN: Digest::MD5 loaded OK LWP not available Fetching with Net::FTP:ftp: //ftp.perl.org/pub/CPAN/authors/id/C/CR/CRAFFI/CHECKSUMS Checksum for /home/yorambu/.cpan/sources/authors/id/C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar. 
gz ok
Scanning cache /home/yorambu/.cpan/build for sizes
C:\cygwin\bin\perl.exe (3512): *** couldn't release memory
0xDE4000(1032192) for
'C:\cygwin\lib\perl5\5.8.5\cygwin-thread-multi-64int\auto\Cwd\Cwd.dll'
alignment, Win32 error 487

4 [main] perl 1008 sync_with_child: child 3512(0x628) died before
initialization with status code 0x1
38547 [main] perl 1008 sync_with_child: *** child state child loading dlls

From paulo.david at netvisao.pt Sun Dec 19 13:23:48 2004
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Sun Dec 19 17:21:14 2004
Subject: [Bioperl-l] problem installing bioperl on cygwin
In-Reply-To:
References:
Message-ID: <41C5C734.3020506@netvisao.pt>

That seems to be a problem with Cygwin. I read about a fix for memory
conflicts with Cygwin DLLs that goes like this:

a. install the Cygwin rebase package (if necessary)
b. shut down all Cygwin processes
c. start bash (do not use rxvt)
d. execute rebaseall (in the bash window)

You should try to find out exactly what that does, though, because I
don't know if your problem is the same.

-Paulo

Yoram Burak wrote:
>
> I am trying to install bioperl on cygwin.
>
> When I run perl -MCPAN -e "install Bundle::BioPerl" it works for a
> while but then exits with an error (see below). Help will be highly
> appreciated.
>
> Yoram
>
> CPAN: Digest::MD5 loaded OK
> LWP not available
> Fetching with Net::FTP:
> ftp://ftp.perl.org/pub/CPAN/authors/id/C/CR/CRAFFI/CHECKSUMS
> Checksum for
> /home/yorambu/.cpan/sources/authors/id/C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz
> ok
> Scanning cache /home/yorambu/.cpan/build for sizes
> C:\cygwin\bin\perl.exe (3512): *** couldn't release memory
> 0xDE4000(1032192) for
> 'C:\cygwin\lib\perl5\5.8.5\cygwin-thread-multi-64int\auto\Cwd\Cwd.dll'
> alignment, Win32 error 487
>
> 4 [main] perl 1008 sync_with_child: child 3512(0x628) died before
> initialization with status code 0x1
> 38547 [main] perl 1008 sync_with_child: *** child state child loading
> dlls

From amackey at pcbi.upenn.edu Mon Dec 20 07:32:44 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Dec 20 07:30:00 2004
Subject: [Bioperl-l] Bio::SeqIO::staden::read problem
In-Reply-To: <20041218044352.2681C10DF5E@smtp.sibsnet.org>
References: <20041218044352.2681C10DF5E@smtp.sibsnet.org>
Message-ID: <3F61582A-5283-11D9-B3DF-000A9577009E@pcbi.upenn.edu>

You should comment out line 316 of /usr/lib/perl5/5.8.0/Bio/SeqIO.pm so
that it reads:

  sub BEGIN {
  #    eval { require Bio::SeqIO::staden::read; };
  }

This is one of those things that we can't seem to agree on (some
installations seem to require this eval; others fail when we use it).

-Aaron

On Dec 17, 2004, at 11:39 PM, xuying wrote:

> all output is in the attached file "out.make.ext".
>>
>>
>> xuying wrote:
>>
>>> Running make install
>>> make[1]: Entering directory
>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align'
>>> make[1]: Leaving directory
>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align'
>>> make[1]: Entering directory
>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden'
>>> make[1]: Leaving directory
>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden'
>>> Installing /usr/share/man/man3/Bio::SeqIO::staden::read.3pm
>>> Writing
>>> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/auto/Bio/.packlist
>>> Appending installation info to
>>> /usr/lib/perl5/5.8.0/i386-linux-thread-multi/perllocal.pod
>>> /usr/bin/make install -- OK
>>
>> No, this doesn't look entirely correct; more files should be listed as
>> being installed/written to. From within the CPAN shell, type "look
>> Bio::SeqIO::staden" to drop into a subshell, then do a make clean, and
>> start over with perl Makefile.PL, make, make test, make install; send
>> me all the output.
>>
>> -Aaron
>>
>
> = = = = = = = = = = = = = = = = = = = =
>
> Regards!
>
> xuying
> xuying@sibs.ac.cn
> 2004-12-18

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA 19104-6017      fax:    215-746-6697

From amackey at pcbi.upenn.edu Mon Dec 20 07:45:59 2004
From: amackey at pcbi.upenn.edu (Aaron J.
Mackey)
Date: Mon Dec 20 07:43:03 2004
Subject: [Bioperl-l] Bio::SeqIO::staden::read problem
In-Reply-To: <3F61582A-5283-11D9-B3DF-000A9577009E@pcbi.upenn.edu>
References: <20041218044352.2681C10DF5E@smtp.sibsnet.org>
 <3F61582A-5283-11D9-B3DF-000A9577009E@pcbi.upenn.edu>
Message-ID: <1933E344-5285-11D9-B3DF-000A9577009E@pcbi.upenn.edu>

Mark, you were the last person to comment/uncomment this line in CVS
(the log says "to get CGI to work"). Can you clarify? Was this under
mod_perl? All of the read.pm-requiring SeqIO formats should load
staden/read.pm themselves; why do we need SeqIO to do it?

Thanks,

-Aaron

On Dec 20, 2004, at 7:32 AM, Aaron J. Mackey wrote:

>
> You should comment out line 316 of /usr/lib/perl5/5.8.0/Bio/SeqIO.pm
> to read:
>
>   sub BEGIN {
>   #    eval { require Bio::SeqIO::staden::read; };
>   }
>
> This is one of those things that we can't seem to agree on (some
> installations seem to require this eval, others fail when we use it).
>
> -Aaron
>
> On Dec 17, 2004, at 11:39 PM, xuying wrote:
>
>> all output is in the attached file "out.make.ext".
>>>
>>>
>>> xuying wrote:
>>>
>>>> Running make install
>>>> make[1]: Entering directory
>>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align'
>>>> make[1]: Leaving directory
>>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align'
>>>> make[1]: Entering directory
>>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden'
>>>> make[1]: Leaving directory
>>>> `/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden'
>>>> Installing /usr/share/man/man3/Bio::SeqIO::staden::read.3pm
>>>> Writing
>>>> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/auto/Bio/.packlist
>>>> Appending installation info to
>>>> /usr/lib/perl5/5.8.0/i386-linux-thread-multi/perllocal.pod
>>>> /usr/bin/make install -- OK
>>>
>>> No, this doesn't look entirely correct; more files should be listed
>>> as
>>> being installed/written to.
From within the CPAN shell, type "look
>>> Bio::SeqIO::staden" to drop into a subshell, then do a make clean,
>>> and
>>> start over with perl Makefile.PL, make, make test, make install;
>>> send me
>>> all the output.
>>>
>>> -Aaron
>>>
>>
>> = = = = = = = = = = = = = = = = = = = =
>>
>> xuying
>> xuying@sibs.ac.cn
>> 2004-12-18
>>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey@pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA 19104-6017      fax:    215-746-6697
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA 19104-6017      fax:    215-746-6697

From Lukasz.Huminiecki at cgb.ki.se Mon Dec 20 06:12:35 2004
From: Lukasz.Huminiecki at cgb.ki.se (Lukasz Huminiecki)
Date: Mon Dec 20 14:14:49 2004
Subject: [Bioperl-l] does RichSeq carry xrefs?
Message-ID:

An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041220/240652dc/attachment.htm

From jason.stajich at duke.edu Mon Dec 20 14:28:40 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Dec 20 14:25:44 2004
Subject: [Bioperl-l] does RichSeq carry xrefs?
In-Reply-To:
References:
Message-ID: <5A730DF6-52BD-11D9-9807-000393C44276@duke.edu>

  for my $xref ( $seq->annotation->get_Annotations('dblink') ) {
      if ( $xref->database eq 'EMBL' ) {
          print $xref->primary_id, " ", $xref->database, "\n";
      }
  }

You may also want to call $xref->optional_id if there is a secondary
accession listed.
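The loop above can be tried without a network connection by building a toy record by hand. This is only a sketch under stated assumptions: the sequence, the accessions X12345, CAA12345.1 and 1ABC, and the helper name xrefs_for_db are made up for illustration and are not part of BioPerl. With a real SwissProt entry the 'dblink' annotations would come from the parsed record instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Annotation::DBLink;

# Hypothetical helper (not a BioPerl method): return the 'dblink'
# annotations of $seq whose database name equals $db.
sub xrefs_for_db {
    my ($seq, $db) = @_;
    return grep { $_->database eq $db }
           $seq->annotation->get_Annotations('dblink');
}

# Toy protein record with two made-up cross-references.
my $seq = Bio::Seq->new(
    -id       => 'TOY_PROT',
    -seq      => 'MKVLAAGIVGLC',
    -alphabet => 'protein',
);
$seq->annotation->add_Annotation('dblink',
    Bio::Annotation::DBLink->new(
        -database    => 'EMBL',
        -primary_id  => 'X12345',
        -optional_id => 'CAA12345.1',
    ));
$seq->annotation->add_Annotation('dblink',
    Bio::Annotation::DBLink->new(
        -database   => 'PDB',
        -primary_id => '1ABC',
    ));

# Print only the EMBL links; optional_id may be undefined.
for my $xref ( xrefs_for_db($seq, 'EMBL') ) {
    my $opt = $xref->optional_id;
    print join(' ', $xref->database, $xref->primary_id,
               defined $opt ? $opt : '-'), "\n";
}
```

Filtering in a small helper keeps the database name in one place, so the same loop could be reused for other cross-referenced databases.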
-jason

On Dec 20, 2004, at 6:12 AM, Lukasz Huminiecki wrote:

> Dear BioPerl,
>
> It's a great pleasure to work with BioPerl these days. So much
> functionality and increasingly good documentation.
>
> I have a little problem which I am sure one of the experts here can
> answer with one sentence. I need to align SwissProt seqs to their
> nucleotide clones. I am using remote database access to SwissProt with
> Bio::DB modules, getting my RichSeq object. I know how to get all the
> features and annotations, but how do I get an xref to the EMBL clone?
> This info is in SwissProt entries, but is it also in the RichSeq
> object?
>
> Naturally I can get SwissProt-EMBL mapping from a flatfile, but I was
> wondering if there is an elegant "BioPerl way" of doing that.
>
> Many thanks,
>
> Lukasz
>
> =========================
> Lukasz Huminiecki, DPhil
> Experimental Bioinformatics Lab
> CGB, KI, Stockholm
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From cain at cshl.org Mon Dec 20 15:07:29 2004
From: cain at cshl.org (Scott Cain)
Date: Mon Dec 20 15:05:04 2004
Subject: [Bioperl-l] ppm of bioperl release candidate
Message-ID: <1103573249.3362.24.camel@localhost.localdomain>

Hi Lincoln and Aaron,

Before I announce the release of the ppm version of GBrowse, I really
should have a ppm version of bioperl. In the past, we've hosted
"unofficial" ppm builds of bioperl at www.gmod.org/ggb/ppm (the same
place as the gbrowse ppm), though to the best of my recollection, I've
never done the build myself. Is there any reason I wouldn't be able to
do a 'make ppd' in bioperl-live and get something useful? Or is there a
ppm build of the current release candidate already available?
Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From lstein at cshl.edu Mon Dec 20 15:21:37 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Dec 20 17:34:04 2004 Subject: [Bioperl-l] ppm of bioperl release candidate In-Reply-To: <1103573249.3362.24.camel@localhost.localdomain> References: <1103573249.3362.24.camel@localhost.localdomain> Message-ID: <200412201521.38109.lstein@cshl.edu> I've done it before. I don't think there's a ppm target, but you merely need to tar up blib and use the attached (appropriately-modified) ppd. Lincoln bioperl Bioinformatics Toolkit (1.3 prerelease) Bioperl Team (bioperl-l@bioperl.org) On Monday 20 December 2004 03:07 pm, Scott Cain wrote: > Hi Lincoln and Aaron, > > Before I announce the release of the ppm version of GBrowse, I > really should have a ppm version of bioperl. In the past, we've > hosted "unofficial" ppm builds of bioperl at www.gmod.org/ggb/ppm > (the same place as the gbrowse ppm), though to the best of my > recollection, I've never done the build myself. Is there any > reason I wouldn't be able to do a 'make ppd' in bioperl-live and > get something useful? Or is there a ppm build of the current > release candidate already available? > > Thanks, > Scott -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041220/e50f4294/attachment.bin From nathanhaigh at ukonline.co.uk Mon Dec 20 20:21:02 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Mon Dec 20 20:18:32 2004 Subject: [Bioperl-l] ppm of bioperl release candidate In-Reply-To: <1103573249.3362.24.camel@localhost.localdomain> Message-ID: I have been working on making a ppd file for the upcoming 1.5 bioperl release as well as a bioperl-run ppd and associated tar.gz files. In these ppd files I have tried to include as many of the bioperl dependencies I could find so that the user will only need to install the bioperl ppd via PPM (if the user needs to use PPm, chances are, they'll want everything installed so they don't have the hassle of wondering why things don't work at a later date); without this the user my have limited bioperl functionality due to missing dependencies (or other modules that are required for extended functionality). I can let you have the ppd file, then all you need to do is generate the tar.gz file from the blib dir as Lincoln suggested, and modify the ppm file so it points to the tar.gz file. Nathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Scott Cain > Sent: 20 December 2004 20:07 > To: Lincoln Stein; Aaron J. Mackey > Cc: Bioperl list > Subject: [Bioperl-l] ppm of bioperl release candidate > > Hi Lincoln and Aaron, > > Before I announce the release of the ppm version of GBrowse, I really > should have a ppm version of bioperl. In the past, we've hosted > "unofficial" ppm builds of bioperl at www.gmod.org/ggb/ppm (the same > place as the gbrowse ppm), though to the best of my recollection, I've > never done the build myself. Is there any reason I wouldn't be able to > do a 'make ppd' in bioperl-live and get something useful? 
Or is there a > ppm build of the current release candidate already available? > > Thanks, > Scott > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain@cshl.org > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > --- > avast! Antivirus: Inbound message clean. > Virus Database (VPS): 0451-2, 17/12/2004 > Tested on: 21/12/2004 00:30:06 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0451-2, 17/12/2004 Tested on: 21/12/2004 01:20:33 avast! is copyright (c) 2000-2003 ALWIL Software. http://www.avast.com From allenday at ucla.edu Mon Dec 20 21:34:25 2004 From: allenday at ucla.edu (Allen Day) Date: Mon Dec 20 20:32:05 2004 Subject: [Bioperl-l] weird behavior with bulk_load_gff In-Reply-To: References: <41C4EDCC.2010500@lbl.gov> Message-ID: have you tried using bioperl-live HEAD ? if so, did the problem go away? -allen On Sat, 18 Dec 2004, Allen Day wrote: > are you using bioperl-live HEAD, a bioperl-live tag/branch, or a bioperl > release? > > what is the error message, if any? > > -allen > > > On Sat, 18 Dec 2004, Michael Eisen wrote: > > > I am having a very frustrating time loading a gff database and > > associated fasta files. This is something I've done thousands of times > > before, but for some reason, it's choking on my latest set of files, and > > I can't figure out why. > > > > The files are contigs and scaffolds from an assembly of a fly genome. > > I've got a fasta file with the scaffolds (17 of them), and an associated > > gff file. 
> > > > bulk_load_gff runs fine, all of the gff records are loaded into the > > fdata, but only loading 6 of the 17 sequences are loaded > > > > when I try to run it splitting the scaffolds into individual files, it > > again only loads 6 of them, but this time a different 6 (in each case it > > only loads the first 6 - from a file the first six in the file, from a > > directory the first six in lexigraphic order) > > > > Any thoughts on what's going wrong? > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From cain at cshl.org Tue Dec 21 00:09:56 2004 From: cain at cshl.org (Scott Cain) Date: Tue Dec 21 00:07:41 2004 Subject: [Bioperl-l] ppm of bioperl release candidate In-Reply-To: References: Message-ID: <1103605796.3358.1.camel@localhost.localdomain> Thanks Nathan, I did as Lincoln suggested and just tarred up the blib directory. The ppd file I already had will serve, as for me at least, I just need the prerequisites that are needed for gbrowse--trying to do more is beyond what I want for right now. Thanks, Scott On Tue, 2004-12-21 at 01:21 +0000, Nathan Haigh wrote: > I have been working on making a ppd file for the upcoming 1.5 bioperl release as well as a bioperl-run ppd and associated tar.gz > files. In these ppd files I have tried to include as many of the bioperl dependencies I could find so that the user will only need > to install the bioperl ppd via PPM (if the user needs to use PPm, chances are, they'll want everything installed so they don't have > the hassle of wondering why things don't work at a later date); without this the user my have limited bioperl functionality due to > missing dependencies (or other modules that are required for extended functionality). 
> > I can let you have the ppd file, then all you need to do is generate the tar.gz file from the blib dir as Lincoln suggested, and > modify the ppm file so it points to the tar.gz file. > > Nathan > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Scott Cain > > Sent: 20 December 2004 20:07 > > To: Lincoln Stein; Aaron J. Mackey > > Cc: Bioperl list > > Subject: [Bioperl-l] ppm of bioperl release candidate > > > > Hi Lincoln and Aaron, > > > > Before I announce the release of the ppm version of GBrowse, I really > > should have a ppm version of bioperl. In the past, we've hosted > > "unofficial" ppm builds of bioperl at www.gmod.org/ggb/ppm (the same > > place as the gbrowse ppm), though to the best of my recollection, I've > > never done the build myself. Is there any reason I wouldn't be able to > > do a 'make ppd' in bioperl-live and get something useful? Or is there a > > ppm build of the current release candidate already available? > > > > Thanks, > > Scott > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain@cshl.org > > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > > Cold Spring Harbor Laboratory > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > --- > > avast! Antivirus: Inbound message clean. > > Virus Database (VPS): 0451-2, 17/12/2004 > > Tested on: 21/12/2004 00:30:06 > > avast! is copyright (c) 2000-2003 ALWIL Software. > > http://www.avast.com > > > > > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0451-2, 17/12/2004 > Tested on: 21/12/2004 01:20:33 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From brian_osborne at cognia.com Tue Dec 21 12:19:38 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Dec 21 12:17:47 2004
Subject: [Bioperl-l] RE: Bioperl 1.4.0(BUG)
In-Reply-To: <1103648563.41c85733c7ed7@webmail4.portugalmail.pt>
Message-ID:

Madalena,

Please provide your code so we can take a closer look.

Brian O.

-----Original Message-----
From: lenavarzim@portugalmail.pt [mailto:lenavarzim@portugalmail.pt]
Sent: Tuesday, December 21, 2004 12:03 PM
To: jason@bioperl.org; brian_osborne@cognia.com; heikki@ebi.ac.uk; bioperl-bugs@bio.perl.org
Subject: Bioperl 1.4.0(BUG)

Hi!

I'm using Bioperl 1.4.0 and I have the following problem:

I'm using Bio::DB::GenBank to query GenBank and I'm certain that the
accession number is there, but I'm seeing the error
"MSG: acc does not exist".

I await your answer.
Thanks,
Madalena Varzim (student from University of Minho - Portugal)

From jason.stajich at gmail.com Tue Dec 21 12:59:16 2004
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue Dec 21 13:01:09 2004
Subject: [Bioperl-l] Fwd: Bioperl 1.4.0(BUG)
Message-ID: <07D31C04-537A-11D9-8E89-000393C44276@gmail.com>

Begin forwarded message:

> From: lenavarzim@portugalmail.pt
> Date: December 21, 2004 12:17:49 PM EST
> To: Jason Stajich
> Subject: Re: Bioperl 1.4.0(BUG)
>
> Jason,
>
> The accession number is AE000440 (Escherichia coli K-12 MG1655
> section 330 of 400 of the complete genome).
> The program code:
> use Bio::Perl;
> get_Sequence('Genbank','AE000440');
> The problem is in this method (get_Sequence). I'm not using the
> protein, I'm using the nucleotide.
> > Madalena > > > > > > > > > > > > > > > Citando Jason Stajich : > > We would need the accession and the code you are using to adequately > determine the problem. > > Usually when people have this type of problem they are using a protein > accession against genbank. If this is the case use Bio::DB::GenPept > for protein accessions. > > -jason > On Dec 21, 2004, at 12:02 PM, lenavarzim@portugalmail.pt wrote: > >> >> Hi! >> >> I'm using Bioperl 1.4.0 and i have the next problem: >> >> >> I'm using Bio::DB::GenBank to query Genbank and I'm certain that the >> ac >> (accession number) is there but I'm seeing the error "MSG: acc does >> not exist". >> >> I wait your answer. >> thanks, >> Madalena Varzim (student from University of Minho - Portugal) >> >> >> >> __________________________________________________________ >> Sabe quanto gasta com a sua liga??o ? Internet? >> Verifique aqui: http://acesso.portugalmail.pt/contas >> >> > -- > Jason Stajich > jason.stajich-at-gmail.com or jason-at-bioperl.org > http://jason.open-bio.org > > > > > > __________________________________________________________ > Para grandes mails, grandes contas > Portugalmail: 2 000 MB de espa?o > http://www.portugalmail.pt/2000mb > > -- Jason Stajich jason.stajich-at-gmail.com or jason-at-bioperl.org http://jason.open-bio.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 2027 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041221/23e9bbab/attachment.bin From paulo.david at netvisao.pt Tue Dec 21 16:11:29 2004 From: paulo.david at netvisao.pt (Paulo Almeida) Date: Tue Dec 21 16:10:12 2004 Subject: [Bioperl-l] Fwd: Bioperl 1.4.0(BUG) In-Reply-To: <07D31C04-537A-11D9-8E89-000393C44276@gmail.com> References: <07D31C04-537A-11D9-8E89-000393C44276@gmail.com> Message-ID: <41C89181.2070002@netvisao.pt> Madalena, This code works for me: use Bio::DB::GenBank; $gb = new Bio::DB::GenBank; $seq = $gb->get_Seq_by_acc('AE000440'); print $seq->description; -Paulo I don't know if I'm missing something. In the first e-mail you said you were using Bio::DB::Genbank, but I don't think there is a get_Sequence method there? > Jason, > > the accession number is: AE000440 (Escheriachia coli k-12 MG1655 > section 330 of > 400 of the complete genome). > The program code: > use Bio::Perl; > get_Sequence('Genbank','AE000440'); > the problem is on this method (get_Sequence).I'm not using the > protrein, i'm > using the nucleotide. 
>
> Madalena
>

From brian_osborne at cognia.com Tue Dec 21 16:16:34 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Dec 21 16:14:16 2004
Subject: [Bioperl-l] FW: Bioperl 1.4.0(BUG)
Message-ID:

-----Original Message-----
From: lenavarzim@portugalmail.pt [mailto:lenavarzim@portugalmail.pt]
Sent: Tuesday, December 21, 2004 1:01 PM
To: Brian Osborne
Subject: RE: Bioperl 1.4.0(BUG)

This is the error message:

------------ EXCEPTION -------------
MSG: WebDBSeqI Request Error:
500 (Internal Server Error) Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: timeout)
Content-Type: text/plain
Client-Date: Tue, 21 Dec 2004 18:18:11 GMT
Client-Warning: Internal response

500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: timeout)

STACK Bio::DB::WebDBSeqI::_stream_request /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:728
STACK Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:460
STACK Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/NCBIHelper.pm:415
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:181
STACK Bio::DB::GenBank::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/GenBank.pm:216
STACK Bio::Perl::get_sequence /usr/lib/perl5/site_perl/5.8.0/Bio/Perl.pm:508
STACK main::procuraSeqC menu.pl:312
STACK toplevel menu.pl:69
--------------------------------------

-------------------- WARNING ---------------------
MSG: acc (gb|AE000440) does not exist
---------------------------------------------------

Quoting Brian Osborne:

Madalena,

This works for me, from the command-line:

>perl -e 'use Bio::Perl; $seq = get_sequence('Genbank','AE000440');write_sequence(">tests.fa","fasta",$seq); '

Brian O.

-----Original Message-----
From: lenavarzim@portugalmail.pt [mailto:lenavarzim@portugalmail.pt]
Sent: Tuesday, December 21, 2004 12:16 PM
To: Brian Osborne
Subject: RE: Bioperl 1.4.0(BUG)

Brian,

the accession number is: AE000440 (Escherichia coli K-12 MG1655 section 330 of 400 of the complete genome).
The program code:
use Bio::Perl;
get_Sequence('Genbank','AE000440');
The problem is in this method (get_Sequence).

Madalena

Quoting Brian Osborne:

Madalena,

Please provide your code so we can take a closer look.

Brian O.

-----Original Message-----
From: lenavarzim@portugalmail.pt [mailto:lenavarzim@portugalmail.pt]
Sent: Tuesday, December 21, 2004 12:03 PM
To: jason@bioperl.org; brian_osborne@cognia.com; heikki@ebi.ac.uk; bioperl-bugs@bio.perl.org
Subject: Bioperl 1.4.0(BUG)

Hi!

I'm using Bioperl 1.4.0 and I have the following problem:

I'm using Bio::DB::GenBank to query GenBank and I'm certain that the accession number is there, but I'm seeing the error "MSG: acc does not exist".

I await your answer.
Thanks,
Madalena Varzim (student from University of Minho - Portugal)

From Marc.Logghe at devgen.com Tue Dec 21 16:33:06 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Dec 21 16:31:45 2004
Subject: [Bioperl-l] FW: Bioperl 1.4.0(BUG)
Message-ID:

Hi all,
could it be a proxy issue? The "Can't connect ... (connect: timeout)" in the exception suggests the request never reached NCBI.
Madalena: in case it is, you can set it with the proxy() and, if necessary, authentication() methods.
Paulo's test script could be adapted like this then:

use Bio::DB::GenBank;
$gb = new Bio::DB::GenBank;
# proxy() takes the protocol(s) first, then the proxy URL
$gb->proxy(['http'], 'http://myproxy');
$gb->authentication($user,$pass);
$seq = $gb->get_Seq_by_acc('AE000440');
print $seq->description;

HTH,
Marc

From razi at genet.sickkids.on.ca Wed Dec 22 14:18:39 2004
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Wed Dec 22 14:16:19 2004
Subject: [Bioperl-l] Contributing to bioperl?
Message-ID: <20041222191839.75398.qmail@web51609.mail.yahoo.com>

Wondering how I could get involved in bioperl development? I'm interested in contributing and developing code for BLAST and other alignment report parsing, as well as GFF3 (particularly with respect to representing alignments in GFF3).

Razi

--
/**
 * Razi Khaja
 * The Hospital for Sick Children
 * Room 9107, Department of Genetics
 * 555 University Avenue
 * Toronto, Ontario, M5G 1X8, Canada
 * Email: razi@genet.sickkids.on.ca
 * Tel: 416-813-7032
 * Fax: 416-813-8319
 */

From anunberg at oriongenomics.com Wed Dec 22 15:45:37 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Wed Dec 22 15:42:41 2004
Subject: [Bioperl-l] "feature" in Bio::Restriction::Analysis
Message-ID:

Sorry if this has been posted before.
I am using Bioperl 1.4 from cvs.

Bio::Restriction::Analysis->cutters returns the entire enzyme collection if it does not find any enzymes that cut.

I tried the following code:

my $seq = Bio::PrimarySeq->new(-seq=>'ggacgaggttttcctcctgtt');
my $ra = Bio::Restriction::Analysis->new(-seq=>$seq);
print "max cutter is ",$ra->max_cuts,"\n";
my $cutters = $ra->cutters;

for my $enz ($cutters->each_enzyme){
    print join("\t",($enz->name,$enz->string,$ra->seq->seq)),"\n";
}

The result was that every enzyme in the default collection used in the analysis was printed out, even those that could not possibly be in the sequence.

Looking at the code I discovered the problem.

From the cutters subroutine in Bio::Restriction::Analysis:

By default, when no args are passed to cutters, $start=1.

However, this line of code appears after that assignment:
$start = $self->{'maximum_cuts'} if $start > $self->{'maximum_cuts'};

If there are 0 maximum_cuts then $start is set to 0, meaning that the cutters subroutine will look for enzymes that cut 0 or more times, which is every enzyme.

I inserted the marked lines right after the collection object is made but before the search is done for enzymes that cut. This apparently fixes the problem:

my $set = new Bio::Restriction::EnzymeCollection(-empty => 1);

--> #return an empty set if nothing cut
--> return $set unless $self->{'maximum_cuts'};

--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From Marc.Logghe at devgen.com Wed Dec 22 18:38:30 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Dec 22 18:36:49 2004
Subject: [Bioperl-l] scf writing issues
Message-ID:

Hi all,
it seems there is a problem (at least I have one ;-) when creating scf output starting from a Bio::Seq::SeqWithQuality object.
I used the following test script:

#!/usr/bin/perl
use Bio::SeqIO;
use Bio::Seq::SeqWithQuality;

my $scff = 'test.scf';
my $out = Bio::SeqIO->new( -file => ">$scff", -format => 'scf');
my $qs = Bio::Seq::SeqWithQuality->new (
    -qual => '10 20 30 40 50 50 20 10',
    -seq => 'ATCGATCG',
    -id => 'human_id',
    -accession_number => 'AL000012',
);
$out->write_seq(-target => $qs);

When you open the produced test.scf file in trev, for instance, the sequence is rubbish. It makes no difference whether you pass the version parameter or not.
Has anybody else noticed these kinds of issues with Bio::SeqIO::scf?
I am using bioperl-release-1-4-0 but also tried my luck with bioperl-release-1-5-0-rc1.
Regards,
Marc

From lstein at cshl.edu Wed Dec 22 11:55:05 2004
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu Dec 23 08:36:13 2004
Subject: [Bioperl-l] bio::graphics::panel under mod_perl
In-Reply-To: <9D11550E-4FAB-11D9-816E-000D933565E8@mail.nih.gov>
References: <9D11550E-4FAB-11D9-816E-000D933565E8@mail.nih.gov>
Message-ID: <200412221155.06132.lstein@cshl.edu>

Funny; this problem is not occurring for me, and I use Bio::Graphics::Panel under mod_perl all the time on multiple web sites. It sounds like a GD problem. My usual advice on these occasions (when GD is acting up) is to search and destroy all old versions of libgd, then reinstall libgd and GD from source. In your case, you might also try doing the same for libpng and libz -- particularly if you haven't updated in a while since there is a security hole in libpng 2.0.26.

Lincoln

On Thursday 16 December 2004 04:44 pm, Sean Davis wrote:
> I know this is a very esoteric question, but here goes:
>
> I have two scripts, one home-brewed (using bioperl-live from recent
> CVS), and gbrowse (1.62), both running under mod_perl (apache 1.3).
> Each has the same behavior--in some requests, seemingly at random,
> several colors become black. I haven't been able to sort this out,
> but since it happens in gbrowse and my custom app using
> bio::graphics, I wonder if there is an issue there. It could just
> as likely be gd or some other supporting package. I guess I am
> wondering if anyone else has noticed these issues or how I might
> sort it out. Note that these work fine as straight CGI. Two files
> are attached (as I am behind a firewall, so can't show the
> sites....). The second has the correct color scheme.
>
>
> Sean

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics.

From tex at biocompute.net Wed Dec 22 00:49:21 2004
From: tex at biocompute.net (James Thompson)
Date: Thu Dec 23 08:36:17 2004
Subject: [Bioperl-l] "feature" in Bio::Restriction::Analysis
In-Reply-To:
Message-ID:

Andrew,

I just fixed this in CVS; do a cvs up to get it and verify that it works as it should. The 1.5 release is right around the corner, and this will probably be included.

If you find any more bugs, feel free to report them either to the mailing list or to our bug tracking software (http://bugzilla.bioperl.org). Using bugzilla provides a good amount of documentation for problems like this. Thanks for the fix. :)

Cheers,

James Thompson

On Wed, 22 Dec 2004, Andrew Nunberg wrote:

> Sorry if this has been posted before.
> I am using Bioperl 1.4 from cvs
>
> If Bio::Restriction::Analysis->cutters returns the entire enzyme collection
> if does not find any enzymes that cut.
>
> I tried the following code:
>
> my $seq = Bio::PrimarySeq->new(-seq=>'ggacgaggttttcctcctgtt');
> my $ra = Bio::Restriction::Analysis->new(-seq=>$seq);
> print "max cutter is ",$ra->max_cuts,"\n";
> my $cutters = $ra->cutters;
>
> for my $enz ($cutters->each_enzyme){
>     print join("\t",($enz->name,$enz->string,$ra->seq->seq)),"\n";
> }
>
> The result was that every enzyme was printed out in the default collection
> used in the analysis, even those that
> Could not possibly be in the sequence.
>
> Looking at the code I discovered the problem
>
> From the cutters subroutine in Bio::Restriction::Analysis-
>
> By default, no args passed to cutters, then $start=1
>
> However this line of code appears after this assignment
> $start = $self->{'maximum_cuts'} if $start > $self->{'maximum_cuts'};
>
> If there are 0 maximum_cuts then $start is then set to 0, meaning that the
> cutters subroutine will look for enzymes that cut 0 or more times.
>
> I inserted this line right after the collection object is made but before
> The search is done for enzymes that cut. This apparently fixes the problem
>
> my $set = new Bio::Restriction::EnzymeCollection(-empty => 1);
>
> --> #return an empty set if nothing cut
> --> return $set unless $self->{'maximum_cuts'};
>
>

From brian_osborne at cognia.com Thu Dec 23 11:59:40 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Dec 23 11:56:40 2004
Subject: [Bioperl-l] Beginners HOWTO
Message-ID:

bioperl-l,

I've committed the beginnings of a HOWTO for beginners, Windows or Unix. This is not an installation HOWTO, that would be something else. This is more like "your very first scripts", for biologists.

By the way, any objections if I change those documents to XML? Essentially just changing the file suffixes and header lines; the HOWTOs we have are already valid XML. Docbook SGML is out-of-date, and everything I've read says Docbook XML is the way to go.

Brian O.

From ewijaya at singnet.com.sg Wed Dec 22 20:02:23 2004
From: ewijaya at singnet.com.sg (Edward Wijaya)
Date: Thu Dec 23 11:58:17 2004
Subject: [Bioperl-l] Looking for module that extract EPD database
Message-ID:

Hi there,

Is there any module that allows us to extract sequences from the EPD database? As found here:
http://www.epd.isb-sib.ch/seq_download.html

Thanks so much for your time.
--
Edward WIJAYA
Singapore

From razi at genet.sickkids.on.ca Thu Dec 23 15:54:40 2004
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Thu Dec 23 15:51:39 2004
Subject: [Bioperl-l] Converting GFF2 records to GFF3
Message-ID: <20041223205440.84719.qmail@web51606.mail.yahoo.com>

Sorry for cross posting, but this may be relevant to both bioperl and song-devel.

I've written a small script to convert gff2 records to gff3 using bioperl and vice versa (see gff2_to_gff3.pl and gff3_to_gff2.pl below). In doing this I have noticed some problems in conversion.

The method Bio::Tools::GFF::_gff3_string will quote attribute values if they contain characters not in [a-zA-Z0-9,;=.:%^*$@!+_?-] (i.e. $value = '"'.$value.'"';) and will output empty quotes for tags without values (i.e. $value = "\"\"";).

Currently the gff3 spec says: "Unescaped quotation marks, ... are explicitly forbidden."

This brings up 2 questions:
(1) Are quotes necessary in gff3?
(2) When a value is empty, what should be output?
    a) Tag="";
    b) Tag=.;
    c) Tag=;
    d) nothing?

(Apart from not meeting the spec, this makes it difficult to do transformations from gff2 to gff3 and back to gff2 again.)
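On question (1), one way to sidestep quoting altogether would be URL-style percent-escaping of the reserved characters, which is the direction the GFF3 spec points in. The following is only a minimal sketch of that idea in plain Perl: gff3_escape and gff3_attributes are hypothetical helpers made up for illustration, not part of Bio::Tools::GFF, and the reserved set used here (control characters, %, ;, =, &, comma and the double quote) is an assumption that should be checked against the current spec. Tags without a value are simply omitted, which is just option d) from the list above, not a statement of what the spec requires.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::GFF): percent-escape the
# characters assumed to be reserved in GFF3 column 9, plus control characters
# and the double quote.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\x00-\x1f\x7f%;=&,"])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Hypothetical helper: build a column 9 string from tag => [values] pairs.
# Tags with no usable value are left out (an assumption, see the note above).
sub gff3_attributes {
    my (%tags) = @_;
    my @parts;
    for my $tag (sort keys %tags) {
        my @values = grep { defined($_) && length($_) } @{ $tags{$tag} };
        next unless @values;
        push @parts,
            gff3_escape($tag) . '=' . join(',', map { gff3_escape($_) } @values);
    }
    return join(';', @parts);
}

# Made-up example attributes: one tag with awkward characters, one empty tag.
print gff3_attributes(
    Note  => ['kinase; putative', 'a=b'],
    Alias => [],
    ID    => ['gene "X"'],
), "\n";
```

Because nothing is wrapped in quotes, a value written this way can be unescaped and written back out as GFF2 without having to guess whether the quotes were part of the data.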
# ===== gff2_to_gff3.pl =====
#!/usr/bin/perl
use strict;
use Bio::Tools::GFF;

my( $gff2File ) = @ARGV;
my $gffio = Bio::Tools::GFF->new(-file=>"$gff2File", -gff_version=>2);
while( my $feature = $gffio->next_feature() ) {
    my $gff3string = $gffio->_gff3_string( $feature );
    print "$gff3string\n";
}
$gffio->close();

# ===== gff3_to_gff2.pl =====
#!/usr/bin/perl
use strict;
use Bio::Tools::GFF;

my( $gff3File ) = @ARGV;
my $gffio = Bio::Tools::GFF->new(-file=>"$gff3File", -gff_version=>3);
while( my $feature = $gffio->next_feature() ) {
    my $gff2string = $gffio->_gff2_string( $feature );
    print "$gff2string\n";
}
$gffio->close();

/**
 * Razi Khaja, Bioinformatics Analyst
 * The Hospital for Sick Children, Toronto
 */

From Marc.Logghe at devgen.com Sun Dec 26 19:30:51 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sun Dec 26 19:28:59 2004
Subject: [Bioperl-l] scf writing issues
Message-ID:

Hi,
Think I was able to solve this. The issue was the version after all.

>It makes no difference when you pass the version
> parameter or not.

diff -r1.30 scf.pm
611c611
< if ($writer_fodder->{comments}->{version} != 2 && $comments{version} != 3) {
---
> if ($writer_fodder->{comments}->{version} != 2 && $writer_fodder->{comments}->{version} != 3) {

I committed the fix to cvs.

> When you open the produced test.scf file in trev for
> instance, the sequence is
> rubbish.

This is only the case with version 3.00 scf format, because trev seems not to like this. When you create real version 2 scf format (after the fix) trev shows the sequences all right ... except for the last nucleotide. Meaning there is another (minor) bug around. I'll try to look into that and hope it won't need some poking around in binaries ;-)

Enjoy your holidays !!
Marc

From siawlinglo at yahoo.com Mon Dec 27 20:50:44 2004
From: siawlinglo at yahoo.com (Siaw Ling Lo)
Date: Mon Dec 27 20:47:28 2004
Subject: [Bioperl-l] improve speed in extracting Fasta sequence
Message-ID: <20041228015044.85708.qmail@web53105.mail.yahoo.com>

hi,

I am new to bioperl and I need to extract fasta sequences from Uniprot using a list of accession numbers in a file. The response time is very slow (60 sequences extracted in an hour) as the list of accession numbers runs into the thousands. Is there a way to improve the speed?

The following is the code:
=======================================
use Bio::SeqIO;

my $file = 'uniprot';
my $format = 'Fasta';

#read in accession no input file
open (ACC, "acc.txt") or die "an error occured with reading acc file: $!";

#loop thru the input file and store each accession in an array
while(<ACC>) {
    chomp; # remove newline
    $accs[$x] = $_;
    $x++ ;
}

$count = @accs;

#open write out file - Fasta sequence file
open(FILEHANDLE, ">uniprot_fasta.txt") or die "cannot open out file for writing: $!";

my $inseq = Bio::SeqIO->new('-file' => "<$file", '-format' => $format );

# get sequence
while (my $seq = $inseq->next_seq) {
    #search for the acc in the fasta file and extract it
    for ($i=0; $i<$count; $i++){
        #strip off all trailing white spaces - tabs, spaces, new lines and returns
        $accs[$i] =~ s/\s+$//;
        #if match, print out the line
        if ($seq->desc() =~ /$accs[$i]/) {
            print FILEHANDLE ">";
            print FILEHANDLE $seq->desc(),"\n";
            print FILEHANDLE $seq->seq,"\n";
            #break out of loop when found
            last;
        }
    }
}
exit;

Any advice is much appreciated.

Thank you,
Siaw Ling

From rob at salmonella.org Tue Dec 28 00:02:18 2004
From: rob at salmonella.org (Rob Edwards)
Date: Mon Dec 27 23:59:17 2004
Subject: [Bioperl-l] improve speed in extracting Fasta sequence
In-Reply-To: <20041228015044.85708.qmail@web53105.mail.yahoo.com>
References: <20041228015044.85708.qmail@web53105.mail.yahoo.com>
Message-ID:

This is very slow because you are using an array to store the data that you need and then cycling through the array every time you get a sequence. You should try using a hash to store the lookup information.

In your code, you trim off the whitespace on every iteration. Why not just do it the first time that you get the accession number? If you use a hash, you don't need to cycle through the array again each time you get a new sequence.

You can then also use Bio::SeqIO for the output, which will simplify your code a lot.

If you want to make this really zippy you should look into the database functionality in bioperl, but I suspect that this will suffice.
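To make the array-versus-hash point concrete, here is a small sketch in plain Perl with no Bioperl involved. The accession strings, the records and the names %wanted and @records are all made up for illustration. Each accession is cleaned once while loading, and every record then costs a single hash lookup instead of a scan over the whole accession list.

```perl
use strict;
use warnings;

# Made-up accession list, as it might be read from acc.txt
# (note the stray whitespace and the blank line).
my @lines = ("P12345 \n", "Q67890\t\n", "\n", "O11111\n");

# Clean each accession once and store it as a hash key.
my %wanted;
for my $line (@lines) {
    (my $acc = $line) =~ s/\s+//g;   # strip all whitespace, including the newline
    next unless length $acc;         # skip blank lines
    $wanted{$acc} = 1;
}

# Made-up records standing in for sequences: [ id, sequence ].
my @records = (
    [ 'A00001', 'MKT' ],
    [ 'Q67890', 'MSE' ],
    [ 'P12345', 'MAA' ],
);

# One hash lookup per record; no inner loop over the accession list.
my @hits = grep { $wanted{ $_->[0] } } @records;

# Print the matching records in FASTA-like form.
print ">$_->[0]\n$_->[1]\n" for @hits;
```

With thousands of accessions, the work per sequence drops from a pass over the whole list to one lookup, which is where most of the speed-up should come from.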
Rob

========
use strict;
use Bio::SeqIO;

my $file = 'uniprot';
my $format = 'Fasta';

#read in accession no input file
open (ACC, "acc.txt") or die "an error occurred with reading acc file: $!";

#loop thru the input file and store each accession as a hash key
my %acc; # declare the hash that is used below
while (<ACC>) {
    chomp;
    s/\s+//g; # strip the spaces here and then you only need to do it once
    $acc{$_}=1; # now this is a hash and not an array
}

my $inseq = Bio::SeqIO->new('-file' => "<$file", '-format' => $format );
my $outseq = Bio::SeqIO->new(-file=>">uniprot_fasta.txt", -format=>'fasta'); # use Bio::SeqIO for output too

# get sequence
while (my $seq = $inseq->next_seq) {
    $outseq->write_seq($seq) if $acc{$seq->id}; # print the sequence out if we want that one
}

From hlapp at gmx.net Tue Dec 28 01:17:42 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Dec 28 01:14:47 2004
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1103827902.3597.12.camel@localhost.localdomain>
Message-ID: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>

Great to hear that someone is giving this a shot.

Yes, at this point it appears that NCBI is only offering the ASN.1, not a conversion to XML. Their asn2xml tool will not work with this ASN.1 format either; I just checked it to be sure.

They do seem to be mulling the option of XML though on the Gene FAQ. Maybe if enough people get in their ears they will spend some effort towards that. After all, the entrez gene web interface can display XML on demand - even though it looks fairly hideous.

There is no ASN.1 support in bioperl at all. Also, ASN.1 support in perl is actually thin - there is Convert::ASN1 at version 0.18 two years ago that I could find ... doesn't make me feel warm and fuzzy.

In the absence of any XML available from NCBI, gene_info might be the best start. An option could be to check for the presence of the other tab-delimited files and use those that are present.
These are tab-delimited and hence the format itself is trivial, so you can focus entirely on setting up a Bio::Seq plus annotation that's comparable/compatible to what the current SeqIO::locuslink does.

My $0.02 (worth less and less almost every day).

-hilmar

On Thursday, December 23, 2004, at 10:51 AM, Peter Robinson wrote:

> Hi,
>
> I have been thinking about giving a BioPerl EntrezGene parser a try
> since
> I have been a heavy user of locus link to date. One issue is that the
> files that correspond to LL_tmpl (which was a flat file) are now in asn
> format
> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
> Although I saw some mention of ASN support in Bioperl by googling, I
> can't seem to find any module that does this in the present
> distribution. What is the status on that? In any case, I will be
> working
> on this in the next month or two and if anything nice comes of it I
> will
> send it to you / BioPerl.
>
> best wishes & happy holidays
>
> Peter
>
> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
>> any input file, what you're asking is whether or not there is a SeqIO
>> parser for NCBI Gene.
>>
>> The answer to that question is no, not yet. Anybody who feels
>> motivated
>> is welcome to give it a try ... Since I'll need it, I'll write the
>> parser if nobody else does within the next 3 months, but I'm not going
>> to promise when exactly this will happen.
>>
>> -hilmar
>>
>> On Monday, December 13, 2004, at 08:03 AM, Law, Annie wrote:
>>
>>> Hi,
>>>
>>> I was wondering with regards to bioperl-db the scripts and schema and
>>> load_seqdatabase.pl has there been preparation for integration of
>>> Entrez
>>> gene information when locuslink is phased out? Or if it has already
>>> been
>>> changed could somebody point
>>> me to the documentation or changed code?
>>>
>>> Thanks,
>>> Annie.
>>>
> --
> Peter N. Robinson
> peter.robinson@t-online.de
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson/
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From tbayer at smail.uni-koeln.de Tue Dec 28 04:38:37 2004
From: tbayer at smail.uni-koeln.de (Till Bayer)
Date: Tue Dec 28 04:35:13 2004
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::Muscle Problem
Message-ID: <41D1299D.7050600@smail.uni-koeln.de>

Hi!

I have a script that uses the muscle module in bioperl-run to align several groups of 4 sequences. Everything seems to run fine, but after muscle is done I get the warning

-------------------- WARNING ---------------------
MSG: Replacing one sequence [ENSRNOG00000000187/1-4680]
---------------------------------------------------

The sequence mentioned is then missing from the alignment when I print it to a file, so I always end up with only 3 aligned sequences. Muscle correctly gets all 4 seqs to align. Is this a limitation of the alignment object? What can I do about it?

Till

From davidg at lsi.upc.edu Tue Dec 28 11:59:51 2004
From: davidg at lsi.upc.edu (David García Cortés)
Date: Tue Dec 28 11:57:14 2004
Subject: [Bioperl-l] BioPerl Parsing problems
Message-ID: <001e01c4ecfe$a66d3640$30b01950@latadecervecix>

Hello.

I'm trying to parse the results of a blast query, using BPLite. The problem is: I can't access the evalue of an HSP. For accessing the score, there are no problems: I only have to access the HSP's field "score". But... what about evalue? If I do the same using "evalue", there's an error: it can't find that field on the HSP.

Here you can see the main part of the code:

$factory = Bio::Tools::Run::StandAloneBlast->new(@pars);
my $blst_rprt = $factory->blastall($seq);
my $exists_results = parse_blast($blst_rprt);

And the function parse_blast is this:

sub parse_blast {
    my $blast_report = shift;

    while (my $result = $blast_report->nextSbjct) {
        while ( my $hsp = $result->nextHSP )
        {
            my $score = $hsp->score;
            my $evalue = $hsp->evalue;
        }
    }

}

A different problem (although closely related to the one above) I'd like to ask about is this:
I thought that maybe it'd be better to use SearchIO for the parsing, but then the problem is that when creating a SearchIO variable, you have to pass the name of the file where the blast result to parse is. But as you can see in the code, my blast result is in the variable $blst_rprt (or $blast_report), not in a file. So should I write the content of the $blast_report variable into a file, and pass this file to SearchIO? I don't like this solution; I would be grateful if you could give me an alternative one.

Thank you in advance.

--
David García Cortés
Instituto Nacional de Bioinformática (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3
Modul C6-E201            Tel. : 934 011 650
E-08034 Barcelona        Fax : 934 017 014
Catalunya (Spain)        e-mail: davidg@lsi.upc.edu

From golharam at umdnj.edu Tue Dec 28 12:19:05 2004
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue Dec 28 12:13:17 2004
Subject: [Bioperl-l] Spidey Parser
Message-ID: <001301c4ed01$56c22210$9a00a8c0@GOLHARMOBILE1>

Is there a module similar to the sim4 module to parse Spidey results?

From jason.stajich at duke.edu Tue Dec 28 13:06:51 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Dec 28 13:05:26 2004
Subject: [Bioperl-l] BioPerl Parsing problems
In-Reply-To: <001e01c4ecfe$a66d3640$30b01950@latadecervecix>
References: <001e01c4ecfe$a66d3640$30b01950@latadecervecix>
Message-ID: <3FB02C9B-58FB-11D9-A116-000393C44276@duke.edu>

I think you want $hsp->significance() for BPlite - if you follow the inheritance hierarchy you'll see that BPlite::HSP is-a SeqFeature::SimilarityPair which is-a SeqFeature::Similarity which has the significance method.

On Dec 28, 2004, at 11:59 AM, David García Cortés wrote:

> Hello.
>
> I'm trying to parse the results of a blast query, using BPLite. The
> problem is: I can't access the evalue of a HSP. For accessing the
> score, there are no problems: I only have to access the HSP's field
> "score. But... what with evalue? If I do the same using "evalue",
> there's an error: it can't find that field on the HSP.
>
> Here you can see the main part of the code:
>
> $factory = Bio::Tools::Run::StandAloneBlast->new(@pars);
> my $blst_rprt = $factory->blastall($seq);
> my $exists_results = parse_blast($blst_rprt);
>
>
> And the function parse_blast is this:
>
> sub parse_blast {
>     my $blast_report = shift;
>
>     while (my $result = $blast_report->nextSbjct) {
>         while ( my $hsp = $result->nextHSP )
>         {
>             my $score = $hsp->score;
>             my $evalue = $hsp->evalue;
>         }
>     }
>
> }
>
> A different problem (although closely related with the one above) i'd
> like to ask is this:
> I thought that maybe it'd be better to use SearchIO for the parsing,
> but then the problem is that when creating a SearchIO variable, you
> have to pass the name of the file where the blast result to parse is.
> But as you can see in the code, my blast result is in the variable
> $blst_rprt (or $blast_report), not in a file. So should I write the
> content of the $blast_report variable into a file, and pass this file
> to SearchIO? I don't like this solution; i would be grateful if you
> could give me an alternative one.
>

This is something you can ask StandAloneBlast to do for you: it will return a Bio::SearchIO object if you provide the option _READMETHOD => 'Blast' in the @params array that you are using. This should be the default in recent releases of Bioperl (i.e. you can only get a BPlite object if you specify _READMETHOD => 'BPlite'), so it would depend on what version of Bioperl you are using.

You can also save the blast report to a file: provide the -o => 'filename' option in @params to write the report to a file, and then reopen it with Bio::SearchIO.

-jason

> Thank you in advance.
>
> --
> David García Cortés
> Instituto Nacional de Bioinformática (INB)
> Nodo Computacional GNHC-2 UPC-CIRI
> c/. Jordi Girona 1-3
> Modul C6-E201            Tel. : 934 011 650
> E-08034 Barcelona        Fax : 934 017 014
> Catalunya (Spain)        e-mail: davidg@lsi.upc.edu
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From cpr at geospiza.com Mon Dec 27 18:53:35 2004
From: cpr at geospiza.com (Christie Robertson)
Date: Tue Dec 28 13:20:12 2004
Subject: [Bioperl-l] problems running Bio::SearchIO on the FASTA results
Message-ID:

Hi folks,

I'm wondering if anybody here is currently parsing the results of the FASTA program with Bio::SearchIO. I'm running into a problem very early on in the process, right at the moment of trying to parse a result.
Here is a pared-down example program: >>>>>> use Bio::SearchIO; my $fastaFile = 'chWnt3_hg_Gnomon_prots_E0.001.out'; my $searchIO = new Bio::SearchIO(-format => 'fasta', -file => $fastaFile); my $result = $searchIO->next_result; <<<<<<< This program dies on the call to $searchIO->next_result() with this message: >>>>>>> 1039 cpr@napa:~/fastaTest > ./bioperlFastaParseTest.pl Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/GenericHSP.pm line 231, line 131. ------------- EXCEPTION ------------- MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm FASTP -hit_seq CRNYIEIMPSVAEGVKLGIQECQHQFRGRRWNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNKHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKHRESRGWVETLRAKYSLFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRIYDVHTCK -hit_length 297 -query_length 297 -query_frame 0 -rank 1 -hit_name hmm6623 -query_name gi|18091804|gb|AAL58093.1| -evalue 0 -score 4361.0 -hit_frame 0 -hsp_length 297 -swscore 3215 -query_seq WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK -homology_seq :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::::::::::.:::::::::::::::::::::::::::::: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::::::: -bits 815.4 (qs=' STACK Bio::Search::HSP::GenericHSP::new /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/GenericHSP.pm:231 STACK Bio::Search::HSP::FastaHSP::new /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/FastaHSP.pm:97 STACK Bio::Factory::ObjectFactory::create_object 
/usr/lib/perl5/site_perl/5.8.0/Bio/Factory/ObjectFactory.pm:150 STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/SearchResultEventBuilder.pm:275 STACK Bio::SearchIO::fasta::end_element /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/fasta.pm:872 STACK Bio::SearchIO::fasta::next_result /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/fasta.pm:403 STACK toplevel ./bioperlFastaParseTest.pl:9 -------------------------------------- 1040 cpr@napa:~/fastaTest > <<<<<<< Apparently, Bio::Search::HSP::GenericHSP.pm expects Query End and Query Begin to be set, and isn't getting them. Out of curiosity, I commented the die line (231) from GenericHSP.pm, and then the module dies on the next line, looking for Hit Begin and Hit End. Did the FASTA output format get out of sync with SearchIO? Am I missing something? I am attaching my output file. Thanks for any help! Christie ~~~~~~~~~~~~~~~~~~~~~~~~~ Christie P Robertson, PhD Research Associate Geospiza, Inc. cpr@geospiza.com (206)633-4403 ~~~~~~~~~~~~~~~~~~~~~~~~~ -------------- next part -------------- # fasta chWnt3.fasta /usr/local/data/hg_Gnomon_prots.fsa 1 -E 0.001 -Q -s P20 FASTA searches a protein or DNA sequence data bank version 3.4t24 July 21, 2004 Please cite: W.R. Pearson & D.J. 
Lipman PNAS (1988) 85:2444-2448 Query library chWnt3.fasta vs /usr/local/data/hg_Gnomon_prots.fsa library searching /usr/local/data/hg_Gnomon_prots.fsa library 1>>>gi|18091804|gb|AAL58093.1| Wnt3 [Gallus gallus] - 267 aa vs /usr/local/data/hg_Gnomon_prots.fsa library opt E() < 20 1 0:= 22 2 0:= one = represents 62 library sequences 24 8 0:= 26 43 1:* 28 94 9:*= 30 295 52:*==== 32 526 200:===*===== 34 791 543:========*==== 36 844 1115:============== * 38 1300 1843:===================== * 40 2554 2571:=========================================* 42 3522 3142:==================================================*====== 44 3683 3466:=======================================================*==== 46 3282 3530:===================================================== * 48 3115 3380:=================================================== * 50 2902 3084:=============================================== * 52 2509 2711:========================================= * 54 2338 2316:=====================================* 56 2088 1935:===============================*== 58 1837 1588:=========================*==== 60 1427 1287:====================*=== 62 998 1031:================* 64 809 820:=============* 66 546 648:========= * 68 434 510:======= * 70 370 400:======* 72 303 312:=====* 74 266 243:===*= 76 183 190:===* 78 95 147:==* 80 83 114:=* 82 62 87:=* 84 71 69:=* 86 56 54:* 88 52 41:* inset = represents 1 library sequences 90 28 32:* 92 14 25:* :============== * 94 10 19:* :========== * 96 12 15:* :============ * 98 6 12:* :====== * 100 10 9:* :========*= 102 3 7:* :=== * 104 5 5:* :====* 106 5 4:* :===*= 108 1 3:* := * 110 0 2:* : * 112 0 2:* : * 114 0 1:* :* 116 0 1:* :* 118 0 1:* :* >120 22 1:* :*===================== 17173199 residues in 37605 sequences Expectation_n fit: rho(ln(x))= 1.9310+/-0.000241; mu= 29.4800+/- 0.014 mean_var=54.1966+/-10.424, 0's: 0 Z-trim: 22 B-trim: 0 in 0/45 Lambda= 0.174216 Kolmogorov-Smirnov statistic: 0.0254 (N=29) at 34 FASTA (3.47 Mar 2004) function [optimized, 
MD20 matrix (18:-29)] ktup: 1 join: 42, opt: 30, gap-pen: -26/-4, width: 32 Scan time: 6.780 The best scores are: opt bits E(37605) hmm6623 Gene predicted by Gnomon on Homo sapiens ( 453) 3215 815.4 0 hmm10855 Gene predicted by Gnomon on Homo sapiens ( 352) 2746 697.3 4.4e-201 hmm9156 Gene predicted by Gnomon on Homo sapiens ( 351) 899 233.0 2.5e-61 hmm2415 Gene predicted by Gnomon on Homo sapiens ( 370) 814 211.7 6.8e-55 hmm19724 Gene predicted by Gnomon on Homo sapiens ( 360) 711 185.8 4.2e-47 hmm18663 Gene predicted by Gnomon on Homo sapiens ( 360) 711 185.8 4.2e-47 hmm12279 Gene predicted by Gnomon on Homo sapiens ( 865) 703 184.7 2.3e-46 hmm13855 Gene predicted by Gnomon on Homo sapiens ( 349) 652 171.0 1.2e-42 hmm9892 Gene predicted by Gnomon on Homo sapiens ( 391) 610 160.5 1.9e-39 hmm14214 Gene predicted by Gnomon on Homo sapiens ( 380) 606 159.5 3.7e-39 hmm2064 Gene predicted by Gnomon on Homo sapiens ( 411) 526 139.5 4.3e-33 hmm627 Gene predicted by Gnomon on Homo sapiens r ( 351) 518 137.3 1.7e-32 hmm2414 Gene predicted by Gnomon on Homo sapiens ( 389) 514 136.4 3.4e-32 hmm19734 Gene predicted by Gnomon on Homo sapiens ( 365) 469 125.0 8.6e-29 hmm18673 Gene predicted by Gnomon on Homo sapiens ( 365) 469 125.0 8.6e-29 hmm16428 Gene predicted by Gnomon on Homo sapiens ( 355) 446 119.2 4.7e-27 hmm13573 Gene predicted by Gnomon on Homo sapiens ( 587) 391 105.9 8e-23 hmm13572 Gene predicted by Gnomon on Homo sapiens ( 709) 378 102.8 8.2e-22 hmm1658 Gene predicted by Gnomon on Homo sapiens ( 354) 363 98.3 8.9e-21 hmm10853 Gene predicted by Gnomon on Homo sapiens ( 365) 170 49.9 3.6e-06 hmm6624 Gene predicted by Gnomon on Homo sapiens ( 357) 165 48.6 8.5e-06 >>hmm6623 Gene predicted by Gnomon on Homo sapiens refer (453 aa) initn: 3215 init1: 3215 opt: 3215 Z-score: 4361.0 bits: 815.4 E(): 0 Smith-Waterman score: 3215; 98.502% identity (99.625% similar) in 267 aa overlap (1-267:187-453) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI 
:::::::::::::::::::::::::::::: hmm662 CRNYIEIMPSVAEGVKLGIQECQHQFRGRRWNCTTIDDSLAIFGPVLDKATRESAFVHAI 160 170 180 190 200 210 40 50 60 70 80 90 gi|180 ASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADAREN :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: hmm662 ASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADAREN 220 230 240 250 260 270 100 110 120 130 140 150 gi|180 RPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKY :::::::::.::::::::::::::::::::::::::::::::::::::::::::.::::: hmm662 RPDARSAMNKHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDFLKDKY 280 290 300 310 320 330 160 170 180 190 200 210 gi|180 DSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDR ::::::::::::::::::::::::: :::::::::::::::::::::::::::::::::: hmm662 DSASEMVVEKHRESRGWVETLRAKYSLFKPPTERDLVYYENSPNFCEPNPETGSFGTRDR 340 350 360 370 380 390 220 230 240 250 260 gi|180 TCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK :::::::::::::::::::::::::::::::::::::::::::::::::.::::::: hmm662 TCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRIYDVHTCK 400 410 420 430 440 450 >>hmm10855 Gene predicted by Gnomon on Homo sapiens refe (352 aa) initn: 2746 init1: 2746 opt: 2746 Z-score: 3724.6 bits: 697.3 E(): 4.4e-201 Smith-Waterman score: 2746; 87.640% identity (90.262% similar) in 267 aa overlap (1-267:86-352) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI :::::. 
::::::::::::::::::::::: hmm108 CRNYVEIMPSVAEGIKIGIQECQHQFRGRRWNCTTVHDSLAIFGPVLDKATRESAFVHAI 60 70 80 90 100 110 40 50 60 70 80 90 gi|180 ASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADAREN ::::::::::::::::: :::: : : : :: :::::::::: .:: ::::::::::: hmm108 ASAGVAFAVTRSCAEGTAAICGCSSRHQGSPGKGWKWGGCSEDIEFGGMVSREFADAREN 120 130 140 150 160 170 100 110 120 130 140 150 gi|180 RPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKY ::::::::::::::::: : :::::::::::::::::::::: :::::::::.::::: hmm108 RPDARSAMNRHNNEAGRQAIASHMHLKCKCHGLSGSCEVKTCWWSQPDFRAIGDFLKDKY 180 190 200 210 220 230 160 170 180 190 200 210 gi|180 DSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDR :::::::::::::::::::::: .: :: :::::::::: ::::::::::::::::::: hmm108 DSASEMVVEKHRESRGWVETLRPRYTYFKVPTERDLVYYEASPNFCEPNPETGSFGTRDR 240 250 260 270 280 290 220 230 240 250 260 gi|180 TCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK :::: ::::::::::::::::: : :.:.::: :.:::::::::::: ::::::::: hmm108 TCNVSSHGIDGCDLLCCGRGHNARAERRREKCRCVFHWCCYVSCQECTRVYDVHTCK 300 310 320 330 340 350 >>hmm9156 Gene predicted by Gnomon on Homo sapiens refer (351 aa) initn: 954 init1: 393 opt: 899 Z-score: 1215.7 bits: 233.0 E(): 2.5e-61 Smith-Waterman score: 899; 53.676% identity (58.456% similar) in 272 aa overlap (1-266:87-350) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI ::: : : :: .:: : ::: :::.:: hmm915 CKRNLEVMDSVRRGAQLAIEECQYQFRNRRWNCSTLD-SLPVFGKVVTQGTREAAFVYAI 60 70 80 90 100 110 40 50 60 70 80 90 gi|180 ASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADAREN :::::::::: : : :::: : : : :::. 
.:: : : : :: hmm915 SSAGVAFAVTRACSSGELEKCGCDRTVHGVSPQGFQWSGCSDNIAYGVAFSQSFVDVRER 120 130 140 150 160 170 100 110 120 130 140 gi|180 RPDARSA---MNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLK : : :: ::::::: :: :: ::::: :::::::::: : : :: .: :: hmm915 SKGASSSRALMNLHNNEAGRKAILTHMRVECKCHGVSGSCEVKTCWRAVPPFRQVGHALK 180 190 200 210 220 230 150 160 170 180 190 200 gi|180 DKYDSASEMVVEKHRESRGWVETLRA---KYALFKPPTERDLVYYENSPNFCEPNPETGS .:.: : : :: : : :: . : ::: :. :::: : :: ::: : hmm915 EKFDGATE--VEPRR-----VGSSRALVPRNAQFKPHTDEDLVYLEPSPDFCEQDMRSGV 240 250 260 270 280 210 220 230 240 250 260 gi|180 FGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVH ::: :::: :: ::::.::::::: : :.: : :::::.: : : : . : hmm915 LGTRGRTCNKTSKAIDGCELLCCGRGFHTAQVELAERCSCKFHWCCFVKCRQCQRLVELH 290 300 310 320 330 340 gi|180 TCK :: hmm915 TCR 350 >>hmm2415 Gene predicted by Gnomon on Homo sapiens refer (370 aa) initn: 927 init1: 554 opt: 814 Z-score: 1100.1 bits: 211.7 E(): 6.8e-55 Smith-Waterman score: 814; 52.282% identity (58.091% similar) in 241 aa overlap (22-257:122-360) 10 20 30 40 50 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTIC :: ::. :: :::: : ::: :: : hmm241 ECKWQFRNRRWNCPTAPGPHLFGKIVNRGCRETAFIFAITSAGVTHSVARSCSEGSIESC 100 110 120 130 140 150 60 70 80 90 100 110 gi|180 GCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNRHNNEAGRTTIL :: .:: : : :::::. ::: : ::: : : : : :: :::::::::. 
hmm241 TCDYRRRGPGGPDWHWGGCSDNIDFGRLFGREFVDSGEKGRDLRFLMNLHNNEAGRTTVF 160 170 180 190 200 210 120 130 140 150 160 gi|180 DHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEK---HRESRGWV : ::::: :::: :.::: : ::.:: :.:..: :: : :: hmm241 SEMRQECKCHGMSGSCTVRTCWMRLPTLRAVGDVLRDRFDGASRVLYGNRGSNRASR--A 220 230 240 250 260 170 180 190 200 210 220 gi|180 ETLR--AKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLC : :: ::: ::::.: ::::: : :: : :: : :::.::: hmm241 ELLRLEPEDPAHKPPSPHDLVYFEKSPNFCTYSGRLGTAGTAGRACNSSSPALDGCELLC 270 280 290 300 310 320 230 240 250 260 gi|180 CGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK ::::: ::: . :.: : :::::.::: : hmm241 CGRGHRTRTQRVTERCNCTFHWCCHVSCRNCTHTRVLHECL 330 340 350 360 370 >>hmm19724 Gene predicted by Gnomon on Homo sapiens HSC_ (360 aa) initn: 892 init1: 418 opt: 711 Z-score: 960.3 bits: 185.8 E(): 4.2e-47 Smith-Waterman score: 711; 48.718% identity (54.945% similar) in 273 aa overlap (1-267:85-349) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI ::: : : :: :: . ::::::.:: hmm197 CHRHPDVMRAISQGVAEWTAECQHQFRQHRWNCNTLDRDHSLFGRVLLRSSRESAFVYAI 60 70 80 90 100 110 40 50 60 70 80 gi|180 ASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWK----WGGCSEDADFGVLVSREFAD :::: ::.:: : : : :: : . : :::::. :.:. : : : hmm197 SSAGVVFAITRACSQGEVKSCSCDPKKMGSAKDS-KGIFDWGGCSDNIDYGIKFARAFVD 120 130 140 150 160 170 90 100 110 120 130 140 gi|180 ARENRP-DARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDY :.: . ::: :: ::: ::: . ::::: :::: .::: : ::: ::: hmm197 AKERKGKDARALMNLHNNRAGRKAVKRFLKQECKCHGVSGSCTLRTCWLAMADFRKTGDY 180 190 200 210 220 230 150 160 170 180 190 200 gi|180 LKDKYDSASEMVVEKHRESRGW-VETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGS : :: : : . : : : :: :: ::::.:::: .: : :: hmm197 LWRKYNGAIQVVM--NQDGTGFTVANER-----FKKPTKNDLVYFENSPDYCIRDREAGS 240 250 260 270 280 210 220 230 240 250 260 gi|180 FGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVH :: : :: :: : : :. :::::. 
: : :: : ::::: : ::.: ::: hmm197 LGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKFHWCCAVRCQDCLEALDVH 290 300 310 320 330 340 gi|180 TCK ::: hmm197 TCKAPKNADWTTAT 350 360 >>hmm18663 Gene predicted by Gnomon on Homo sapiens refe (360 aa) initn: 892 init1: 418 opt: 711 Z-score: 960.3 bits: 185.8 E(): 4.2e-47 Smith-Waterman score: 711; 48.718% identity (54.945% similar) in 273 aa overlap (1-267:85-349) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI ::: : : :: :: . ::::::.:: hmm186 CHRHPDVMRAISQGVAEWTAECQHQFRQHRWNCNTLDRDHSLFGRVLLRSSRESAFVYAI 60 70 80 90 100 110 40 50 60 70 80 gi|180 ASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWK----WGGCSEDADFGVLVSREFAD :::: ::.:: : : : :: : . : :::::. :.:. : : : hmm186 SSAGVVFAITRACSQGEVKSCSCDPKKMGSAKDS-KGIFDWGGCSDNIDYGIKFARAFVD 120 130 140 150 160 170 90 100 110 120 130 140 gi|180 ARENRP-DARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDY :.: . ::: :: ::: ::: . ::::: :::: .::: : ::: ::: hmm186 AKERKGKDARALMNLHNNRAGRKAVKRFLKQECKCHGVSGSCTLRTCWLAMADFRKTGDY 180 190 200 210 220 230 150 160 170 180 190 200 gi|180 LKDKYDSASEMVVEKHRESRGW-VETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGS : :: : : . : : : :: :: ::::.:::: .: : :: hmm186 LWRKYNGAIQVVM--NQDGTGFTVANER-----FKKPTKNDLVYFENSPDYCIRDREAGS 240 250 260 270 280 210 220 230 240 250 260 gi|180 FGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVH :: : :: :: : : :. :::::. 
: : :: : ::::: : ::.: ::: hmm186 LGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKFHWCCAVRCQDCLEALDVH 290 300 310 320 330 340 gi|180 TCK ::: hmm186 TCKAPKNADWTTAT 350 360 >>hmm12279 Gene predicted by Gnomon on Homo sapiens refe (865 aa) initn: 675 init1: 544 opt: 703 Z-score: 947.1 bits: 184.7 E(): 2.3e-46 Smith-Waterman score: 703; 53.361% identity (58.403% similar) in 238 aa overlap (22-254:618-852) 10 20 30 40 50 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTIC :: :: .:: :::: ::: : : : hmm122 ECQYQFRFGRWNCSALGEKTVFGQELRVGSREAAFTYAITAAGVAHAVTAACSQGNLSNC 590 600 610 620 630 640 60 70 80 90 100 gi|180 GCDSHHKG--PPGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNRHNNEAGRTT ::: : ::::::::: : .:. :: : :::: . :: :: ::::::: hmm122 GCDREKQGYYNQAEGWKWGGCSADVRYGIDFSRRFVDAREIKKNARRLMNLHNNEAGRKV 650 660 670 680 690 700 110 120 130 140 150 160 gi|180 ILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKHRESRGWVE : :.: ::::: :::: :::: : :: .: ::.:: : :: : :: hmm122 LEDRMQLECKCHGVSGSCTTKTCWTTLPKFREVGHLLKEKYNAAVQ--VEVVRASRLRQP 710 720 730 740 750 760 170 180 190 200 210 220 gi|180 T-LRAKYAL--FKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLC : :: : : . : : :::: : :::.:: ::: :: : :: :: : :::: : hmm122 TFLRIK-QLRSYQKPMETDLVYIEKSPNYCEEDAATGSVGTQGRLCNRTSPGADGCDTMC 770 780 790 800 810 820 230 240 250 260 gi|180 CGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK ::::.:: . : : :::::.: : hmm122 CGRGYNTHQYTKVWQCNCKFHWCCFVKCNTCSERTEVFTCK 830 840 850 860 >>hmm13855 Gene predicted by Gnomon on Homo sapiens refe (349 aa) initn: 592 init1: 483 opt: 652 Z-score: 880.2 bits: 171.0 E(): 1.2e-42 Smith-Waterman score: 652; 52.101% identity (57.563% similar) in 238 aa overlap (22-254:102-336) 10 20 30 40 50 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTIC :: :: .:: :::: :.: : : : hmm138 ECQFQFRNGRWNCSALGERTVFGKELKVGSREAAFTYAIIAAGVAHAITAACTQGNLSDC 80 90 100 110 120 130 60 70 80 90 100 gi|180 GCDSHHKGP--PGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNRHNNEAGRTT ::: : ::::::::: : .:. . : :::: . 
:: :: ::::::: hmm138 GCDKEKQGQYHRDEGWKWGGCSADIRYGIGFAKVFVDAREIKQNARTLMNLHNNEAGRKI 140 150 160 170 180 190 110 120 130 140 150 160 gi|180 ILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMV-VEKHRESRGWV . : : ::::: :::: :::: : :: : ::::: : : :: : :: hmm138 LEENMKLECKCHGVSGSCTTKTCWTTLPQFRELGYVLKDKYN---EAVHVEPVRASRNKR 200 210 220 230 240 170 180 190 200 210 220 gi|180 ET-LRAKYAL-FKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLC : :. : : .. : . :::: : :::.:: : ::: :: : :: : :::: : hmm138 PTFLKIKKPLSYRKPMDTDLVYIEKSPNYCEEDPVTGSVGTQGRACNKTAPQASGCDLMC 250 260 270 280 290 300 230 240 250 260 gi|180 CGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK ::::.:: : : : ::::::: : hmm138 CGRGYNTHQYARVWQCNCKFHWCCYVKCNTCSERTEMYTCK 310 320 330 340 >>hmm9892 Gene predicted by Gnomon on Homo sapiens refer (391 aa) initn: 821 init1: 316 opt: 610 Z-score: 822.9 bits: 160.5 E(): 1.9e-39 Smith-Waterman score: 610; 45.588% identity (53.676% similar) in 272 aa overlap (1-267:116-380) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI ::::: : .:: : . :: :::.:: hmm989 CQRYPDIMRSVGEGAREWIRECQHQFRHHRWNCTTLDRDHTVFGRVMLRSSREAAFVYAI 90 100 110 120 130 140 40 50 60 70 80 gi|180 ASAGVAFAVTRSCAEGTSTICGCDSHHKG----PPGEGWKWGGCSEDADFGVLVSREFAD :::: :.:: : : .: :: . .: :. :::::. .:: . : : hmm989 SSAGVVHAITRACSQGELSVCSCDPYTRGRHHDQRGD-FDWGGCSDNIHYGVRFAKAFVD 150 160 170 180 190 200 90 100 110 120 130 140 gi|180 ARENRP-DARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDY :.: : ::: :: ::: ::: . : ::::: :::: .::: : ::: ::: hmm989 AKEKRLKDARALMNLHNNRCGRTAVRRFLKLECKCHGVSGSCTLRTCWRALSDFRRTGDY 210 220 230 240 250 260 150 160 170 180 190 200 gi|180 LKDKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSF :. .:: : : : . : ::::..::: .: :: hmm989 LRRRYDGAVQVMATQDGAN---FTAARQGY---RRATRTDLVYFDNSPDYCVLDKAAGSL 270 280 290 300 310 210 220 230 240 250 260 gi|180 GTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHT :: : : :: : :::. :::::. 
: : : : ::::: : : :: :::: hmm989 GTAGRVCSKTSKGTDGCEIMCCGRGYDTTRVTRVTQCECKFHWCCAVRCKECRNTVDVHT 320 330 340 350 360 370 gi|180 CK :: hmm989 CKAPKKAEWLDQT 380 390 >>hmm14214 Gene predicted by Gnomon on Homo sapiens refe (380 aa) initn: 753 init1: 327 opt: 606 Z-score: 817.5 bits: 159.5 E(): 3.7e-39 Smith-Waterman score: 606; 47.761% identity (54.478% similar) in 268 aa overlap (1-257:113-370) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI ::: :.: .:: : :: :: .:. hmm142 CHLYQDHMQYIGEGAKTGIKECQYQFRHRRWNCSTVDNT-SVFGRVMQIGSRETAFTYAV 90 100 110 120 130 140 40 50 60 70 80 gi|180 ASAGVAFAVTRSCAEGTSTICGC--DSHHKGPPGEGWKWGGCSEDADFGVLVSREFADAR ::: : : : :: ::: : : . : :::: . :.: .:: ::: hmm142 SAAGVVNAMSRACREGELSTCGCSRAARPKDLPRD-WLWGGCGDNIDYGYRFAKEFVDAR 150 160 170 180 190 200 90 100 110 120 130 140 gi|180 E-NRPDAR----SA---MNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFR : : :. :: :: ::::::: :. ::::: :::: :::: ::: hmm142 ERERIHAKGSYESARILMNLHNNEAGRRTVYNLADVACKCHGVSGSCSLKTCWLQLADFR 210 220 230 240 250 260 150 160 170 180 190 200 gi|180 AIGDYLKDKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNP .:: ::.::::: : ::: : : :: :::: . :: .: : hmm142 KVGDALKEKYDSAAAM----RLNSRG---KLVQVNSRFNSPTTQDLVYIDPSPDYCVRNE 270 280 290 300 310 210 220 230 240 250 gi|180 ETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNT-RTEKRKEKCHCIFHWCCYVSCQECIR ::: :: : :: :: : :::.: :::::. .: :.::: ::::::: : : hmm142 STGSLGTQGRLCNKTSEGMDGCELMCCGRGYDQFKTVQ-TERCHCKFHWCCYVKCKKCTE 320 330 340 350 360 370 260 gi|180 VYDVHTCK hmm142 IVDQFVCK 380 >>hmm2064 Gene predicted by Gnomon on Homo sapiens refer (411 aa) initn: 691 init1: 299 opt: 526 Z-score: 708.7 bits: 139.5 E(): 4.3e-33 Smith-Waterman score: 526; 44.944% identity (53.558% similar) in 267 aa overlap (1-257:144-401) 10 20 30 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAI ::: : : .:: : :: :: ::. hmm206 CQLYQEHMAYIGEGAKTGIKECQHQFRQRRWNCSTADNA-SVFGRVMQIGSRETAFTHAV 120 130 140 150 160 170 40 50 60 70 80 gi|180 ASAGVAFAVTRSCAEGTSTICGCD--SHHKGPPGEGWKWGGCSEDADFGVLVSREFADAR ::: :. : : :: ::: : : . 
: :::: . ..: .:: ::: hmm206 SAAGVVNAISRACREGELSTCGCSRTARPKDLPRD-WLWGGCGDNVEYGYRFAKEFVDAR 180 190 200 210 220 230 90 100 110 120 130 140 gi|180 E------NRPD--ARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFR : . : :: .:::::: . ::::: :::: :::: .:: hmm206 EREKNFAKGSEEQGRVLMNLQNNEAGRRAVYKMADVACKCHGVSGSCSLKTCWLQLAEFR 240 250 260 270 280 290 150 160 170 180 190 200 gi|180 AIGDYLKDKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNP .:: ::.::::: : : . .: : . : :: :::: . :: .: : hmm206 KVGDRLKEKYDSAAAMRVTR----KGRLELVNSR---FTQPTPEDLVYVDPSPDYCLRNE 300 310 320 330 340 210 220 230 240 250 260 gi|180 ETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRV ::: :: : :: :: : :::.: :::::.: :.::: :::::.: : : hmm206 STGSLGTQGRLCNKTSEGMDGCELMCCGRGYNQFKSVQVERCHCKFHWCCFVRCKKCTEI 350 360 370 380 390 400 gi|180 YDVHTCK hmm206 VDQYICK 410 >>hmm627 Gene predicted by Gnomon on Homo sapiens refere (351 aa) initn: 662 init1: 398 opt: 518 Z-score: 698.2 bits: 137.3 E(): 1.7e-32 Smith-Waterman score: 518; 53.333% identity (60.000% similar) in 135 aa overlap (20-153:81-215) 10 20 30 40 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTST : :: :::::: :::: . :: : : hmm627 IEECKYQFAWDRWNCPERALQLSSHGGLRSANRETAFVHAISSAGVMYTLTRNCSLGDFD 60 70 80 90 100 110 50 60 70 80 90 100 gi|180 ICGCDSHHKGPPG-EGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNRHNNEAGRT :::: : : :: :::::. :: .:. : :: : ::: ::: ::::::: hmm627 NCGCDDSRNGQLGGQGWLWGGCSDNVGFGEAISKQFVDALETGQDARAAMNLHNNEAGRK 120 130 140 150 160 170 110 120 130 140 150 160 gi|180 TILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKHRESRGWV . 
: ::::: :::: ::: :.:: .: .::.:: : hmm627 AVKGTMKRTCKCHGVSGSCTTQTCWLQLPEFREVGAHLKEKYHAALKVDLLQGAGNSAAG 180 190 200 210 220 230 170 180 190 200 210 220 gi|180 ETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCG hmm627 RGAIADTFRSISTRELVHLEDSPDYCLENKTLGLLGTEGRECLRRGRALGRWERRSCRRL 240 250 260 270 280 290 >>hmm2414 Gene predicted by Gnomon on Homo sapiens refer (389 aa) initn: 702 init1: 415 opt: 514 Z-score: 692.5 bits: 136.4 E(): 3.4e-32 Smith-Waterman score: 514; 49.485% identity (56.701% similar) in 194 aa overlap (65-257:194-379) 40 50 60 70 80 90 gi|180 VAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDA : :::: : ::: ::.: : :: : hmm241 LQALSRGKSFPHSLPSPGPGSSPSPGPQDTWEWGGCNHDMDFGEKFSRDFLDSREAPRDI 170 180 190 200 210 220 100 110 120 130 140 150 gi|180 RSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSAS : ::: :: . . :::::: :::: :::: : :.:::.: :... : hmm241 QARMRIHNNRVGRQVVTENLKRKCKCHGTSGSCQFKTCWRAAPEFRAVGAALRERLGRA- 230 240 250 260 270 280 160 170 180 190 200 210 gi|180 EMVVEKH-RESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCN .. : : : :: . . : :::.: :: ::: : :: ::: : :: hmm241 -IFIDTHNRNSGAFQPRLRPR----RLSGE--LVYFEKSPDFCERDPTMGSPGTRGRACN 290 300 310 320 330 220 230 240 250 260 gi|180 VTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK :: ::: :::::::: : :.::: ::::::: : :: hmm241 KTSRLLDGCGSLCCGRGHNVLRQTRVERCHCRFHWCCYVLCDECKVTEWVNVCK 340 350 360 370 380 >>hmm19734 Gene predicted by Gnomon on Homo sapiens HSC_ (365 aa) initn: 606 init1: 297 opt: 469 Z-score: 631.5 bits: 125.0 E(): 8.6e-29 Smith-Waterman score: 469; 45.136% identity (51.751% similar) in 257 aa overlap (21-267:117-365) 10 20 30 40 50 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTI :.: ::..:. :: ::::: : : hmm197 HERWNCMITAAATTAPMGASPLFGYELSSGTKETAFIYAVMAAGLVHSVTRSCSAGNMTE 90 100 110 120 130 140 60 70 80 90 100 gi|180 CGCDS--HHKGPPGEGWKWGGCSEDADFGVLVSREFAD-------ARENRPDARSAMNRH : :: . : ::: :::::.: .: :: : : .::. 
::: : hmm197 CSCDTTLQNGGSASEGWHWGGCSDDVQYGMWFSRKFLDFPIGNTTGKENK--VLLAMNLH 150 160 170 180 190 200 110 120 130 140 150 160 gi|180 NNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKH :::::: . : :.::: :::: ::::: : :: :::::. hmm197 NNEAGRQAVAKLMSVDCRCHGVSGSCAVKTCWKTMSSFEKIGHLLKDKYENSIQISDKTK 210 220 230 240 250 260 170 180 190 200 210 220 gi|180 RESRGWVETLRAKYALFKPPTERD-LVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGID : : : : : .: : : :::.: : :: : :: :: : : hmm197 RKMRRREKDQR------KIPIHKDDLLYVNKSPNYCVEDKKLGIPGTQGRECNRTSEGAD 270 280 290 300 310 230 240 250 260 gi|180 GCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK :: :::::::.:: . :.: : : ::::: : : :::::: hmm197 GCNLLCCGRGYNTHVVRHVERCECKFIWCCYVRCRRCESMTDVHTCK 320 330 340 350 360 >>hmm18673 Gene predicted by Gnomon on Homo sapiens refe (365 aa) initn: 606 init1: 297 opt: 469 Z-score: 631.5 bits: 125.0 E(): 8.6e-29 Smith-Waterman score: 469; 45.136% identity (51.751% similar) in 257 aa overlap (21-267:117-365) 10 20 30 40 50 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTI :.: ::..:. :: ::::: : : hmm186 HERWNCMITAAATTAPMGASPLFGYELSSGTKETAFIYAVMAAGLVHSVTRSCSAGNMTE 90 100 110 120 130 140 60 70 80 90 100 gi|180 CGCDS--HHKGPPGEGWKWGGCSEDADFGVLVSREFAD-------ARENRPDARSAMNRH : :: . : ::: :::::.: .: :: : : .::. ::: : hmm186 CSCDTTLQNGGSASEGWHWGGCSDDVQYGMWFSRKFLDFPIGNTTGKENK--VLLAMNLH 150 160 170 180 190 200 110 120 130 140 150 160 gi|180 NNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKH :::::: . : :.::: :::: ::::: : :: :::::. hmm186 NNEAGRQAVAKLMSVDCRCHGVSGSCAVKTCWKTMSSFEKIGHLLKDKYENSIQISDKTK 210 220 230 240 250 260 170 180 190 200 210 220 gi|180 RESRGWVETLRAKYALFKPPTERD-LVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGID : : : : : .: : : :::.: : :: : :: :: : : hmm186 RKMRRREKDQR------KIPIHKDDLLYVNKSPNYCVEDKKLGIPGTQGRECNRTSEGAD 270 280 290 300 310 230 240 250 260 gi|180 GCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK :: :::::::.:: . 
:.: : : ::::: : : :::::: hmm186 GCNLLCCGRGYNTHVVRHVERCECKFIWCCYVRCRRCESMTDVHTCK 320 330 340 350 360 >>hmm16428 Gene predicted by Gnomon on Homo sapiens refe (355 aa) initn: 549 init1: 344 opt: 446 Z-score: 600.4 bits: 119.2 E(): 4.7e-27 Smith-Waterman score: 446; 51.128% identity (59.398% similar) in 133 aa overlap (20-151:81-213) 10 20 30 40 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTST :::: :.::: :::: . .:. : : hmm164 IEECKFQFAWERWNCPENALQLSTHNRLRSATRETSFIHAISSAGVMYIITKNCSMGDFE 60 70 80 90 100 110 50 60 70 80 90 100 gi|180 ICGCDSHHKGPPGE-GWKWGGCSEDADFGVLVSREFADARENRPDARSAMNRHNNEAGRT :::: : : :: :::::. .:: .:. : : : ::: :: ::: ::: hmm164 NCGCDGSNNGKTGGHGWIWGGCSDNVEFGERISKLFVDSLEKGKDARALMNLHNNRAGRL 120 130 140 150 160 170 110 120 130 140 150 160 gi|180 TILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKHRESRGWV . : ::::: :::: . ::: .:: ::::: ::: hmm164 AVRATMKRTCKCHGISGSCSIQTCWLQLAEFREMGDYLKAKYDQALKIEMDKRQLRAGNS 180 190 200 210 220 230 170 180 190 200 210 220 gi|180 ETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCG hmm164 AEGHWVPAEAFLPSAEAELIFLEESPDYCTCNSSLGIYGTEGRECLQNSHNTSRWERRSC 240 250 260 270 280 290 >>hmm13573 Gene predicted by Gnomon on Homo sapiens refe (587 aa) initn: 726 init1: 344 opt: 391 Z-score: 524.4 bits: 105.9 E(): 8e-23 Smith-Waterman score: 391; 43.541% identity (49.761% similar) in 209 aa overlap (65-257:379-577) 40 50 60 70 80 90 gi|180 VAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDA : ::::: : :: :..: : :: : hmm135 LDALQRGKGLSHGVPEHPALPTASPGLQDSWEWGGCSPDMGFGERFSKDFLDSREPHRDI 350 360 370 380 390 400 100 110 120 130 140 150 gi|180 RSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSAS : ::: :: . . : :::::: :::: :::: :.:: .: :. .. : hmm135 HARMRLHNNRVGRQAVMENMRRKCKCHGTSGSCQLKTCWQVTPEFRTVGALLRSRFHRAT 410 420 430 440 450 460 160 170 180 190 gi|180 EMVVEKHRESRGWVET----------------LRAKYALFKPPTERDLVYYENSPNFCEP . 
: : : :: : ::::.: :: ::: hmm135 --LIRPHNRNGGQLEPGPAGAPSPAPGAPGPRRRA------SPA--DLVYFEKSPDFCER 470 480 490 500 510 200 210 220 230 240 250 gi|180 NPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECI : : :: : :: : : ::: ::::::: : :.::: :::::.: : :: hmm135 EPRLDSAGTVGRLCNKSSAGSDGCGSMCCGRGHNILRQTRSERCHCRFHWCCFVVCEECR 520 530 540 550 560 570 260 gi|180 RVYDVHTCK hmm135 ITEWVSVCK 580 >>hmm13572 Gene predicted by Gnomon on Homo sapiens refe (709 aa) initn: 641 init1: 295 opt: 378 Z-score: 506.2 bits: 102.8 E(): 8.2e-22 Smith-Waterman score: 378; 46.667% identity (51.795% similar) in 195 aa overlap (65-257:511-699) 40 50 60 70 80 90 gi|180 VAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRP-- : :::: .: ::: :: : ::: : hmm135 PRGRAPPRPSGLPGTPGPPGPAGSPEGSAAWEWGGCGDDVDFGDEKSRLFMDARHKRGRG 490 500 510 520 530 540 100 110 120 130 140 150 gi|180 DARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDS : : ::::::: . : :::::::::: .::: : :: .: : ... hmm135 DIRALVQLHNNEAGRLAVRSHTRTECKCHGLSGSCALRTCWQKLPPFREVGARLLERFHG 550 560 570 580 590 600 160 170 180 190 200 210 gi|180 ASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTC :: : . . : ::: :: : :: :: :: ::: ::: : : hmm135 ASR-VMGTN-DGKALLPAVRT----LKPPGRADLLYAADSPDFCAPNRRTGSPGTRGRAC 610 620 630 640 650 220 230 240 250 260 gi|180 NVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK : ::::::::::: : : : ::::: : :. : hmm135 NSSAPDLSGCDLLCCGRGHRQESVQLEENCLCRFHWCCVVQCHRCRVRKELSLCL 660 670 680 690 700 >>hmm1658 Gene predicted by Gnomon on Homo sapiens refer (354 aa) initn: 497 init1: 351 opt: 363 Z-score: 487.6 bits: 98.3 E(): 8.9e-21 Smith-Waterman score: 363; 42.041% identity (51.020% similar) in 245 aa overlap (17-257:104-344) 10 20 30 40 gi|180 WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEG :.. :::::::.: : . :. : : : hmm165 REVMKACRRAFADMRWNCSSIELAPNYLLDLERGTRESAFVYALSAAAISHAIARACTSG 80 90 100 110 120 130 50 60 70 80 90 100 gi|180 TSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADA----RENRPDARSAMNRHN : : ::: : .:::: . 
.: : : :: . : : :: hmm165 DLPGCSCGPVPGEPPGPGNRWGGCADNLSYGLLMGAKFSDAPMKVKKTGSQANKLMRLHN 140 150 160 170 180 190 110 120 130 140 150 160 gi|180 NEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLKDKYDSASEMVVEKHR : :: :::::: :::: ..::: . . :: .: :: : :: hmm165 SEVGRQALRASLEMKCKCHGVSGSCSIRTCWKGLQELQDVAADLKTRYLSATKVV---HR 200 210 220 230 240 250 170 180 190 200 210 220 gi|180 ESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGC : : : .: . .::: :: :: : :: :: :: :: :: : : : hmm165 -PMGTRKHLVPKDLDIRPVKDSELVYLQSSPDFCMKNEKVGSHGTQDRQCNKTSNGSDSC 260 270 280 290 300 230 240 250 260 gi|180 DLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK :: :::::.: :.. :.::: .:::::: : : hmm165 DLMCCGRGYNPYTDRVVERCHCKYHWCCYVTCRRCERTVERYVCK 310 320 330 340 350 >>hmm10853 Gene predicted by Gnomon on Homo sapiens refe (365 aa) initn: 283 init1: 170 opt: 170 Z-score: 225.4 bits: 49.9 E(): 3.6e-06 Smith-Waterman score: 170; 55.556% identity (66.667% similar) in 36 aa overlap (118-153:215-250) 90 100 110 120 130 140 gi|180 RENRPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLK ::::: :::: :.::: : .: .:: hmm108 RRSSKDLRARVDFHNNLVGVKVIKAGVETTCKCHGVSGSCTVRTCWRQLAPFHEVGKHLK 190 200 210 220 230 240 150 160 170 180 190 200 gi|180 DKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGT ::. 
: hmm108 HKYETALKVGSTTNEAAGEAGAISPPRGRASGAGGSDPLPRTPELVHLDDSPSFCLAGRF 250 260 270 280 290 300 >>hmm6624 Gene predicted by Gnomon on Homo sapiens refer (357 aa) initn: 339 init1: 165 opt: 165 Z-score: 218.7 bits: 48.6 E(): 8.5e-06 Smith-Waterman score: 165; 81.250% identity (87.500% similar) in 16 aa overlap (118-133:210-225) 90 100 110 120 130 140 gi|180 RENRPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDYLK ::::: :::: :.::: hmm662 KRGNKDLRARADAHNTHVGIKAVKSGLRTTCKCHGVSGSCAVRTCWKQLSPFRETGQVLK 180 190 200 210 220 230 150 160 170 180 190 200 gi|180 DKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGT hmm662 LRYDSAVKVSSATNEALGRLELWAPARQGSLTKGLAPRSGDLVYMEDSPSFCRPSKYSPG 240 250 260 270 280 290 267 residues in 1 query sequences 17173199 residues in 37605 library sequences Scomplib [34t24] start: Mon Dec 27 15:38:25 2004 done: Mon Dec 27 15:38:32 2004 Total Scan time: 6.780 Total Display time: 0.030 Function used was FASTA [version 3.4t24 July 21, 2004] From jason.stajich at duke.edu Tue Dec 28 14:18:01 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Dec 28 14:16:50 2004 Subject: [Bioperl-l] problems running Bio::SearchIO on the FASTA results Message-ID: <30E39C3C-5905-11D9-A116-000393C44276@duke.edu> Christie - this has to do with how the FASTA format has changed with the latest releases. The parser has been updated to handle the changed format -update Bio/SearchIO/fasta.pm file from CVS or grab it here; http://bioperl.org/SRC/ I did not put these changes on the 1.4 branch as I didn't think we'd be releasing off that branch, but I can merge the changes there as well if it will help people. -jason > Hi folks, > > I'm wondering if anybody here is currently parsing the results of > the FASTA program with Bio::SearchIO. I'm running into a problem very > early on in the process, right at the moment of trying to parse a > result. 
> Here is a pared-down example program: > > >>>>>> > > use Bio::SearchIO; > > my $fastaFile = 'chWnt3_hg_Gnomon_prots_E0.001.out'; > my $searchIO = new Bio::SearchIO(-format => 'fasta', > -file => $fastaFile); > > my $result = $searchIO->next_result; > > <<<<<<< > > This program dies on the call to $searchIO->next_result() with this > message: > > >>>>>>> > > 1039 cpr at napa:~/fastaTest > ./bioperlFastaParseTest.pl > Use of uninitialized value in concatenation (.) or string at > /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/GenericHSP.pm line 231, > line 131. > > ------------- EXCEPTION ------------- > MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm > FASTP -hit_seq > CRNYIEIMPSVAEGVKLGIQECQHQFRGRRWNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTR > SCAEGTSTICGCDSHHKGPPGEGWKWGGCSEDADFGVLVSREFADARENRPDARSAMNKHNNEAGRTTILD > HMHLKCKCHGLSGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKHRESRGWVETLRAKYSLFKPPTE > RDLVYYENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSC > QECIRIYDVHTCK > -hit_length 297 -query_length 297 -query_frame 0 -rank 1 -hit_name > hmm6623 > -query_name gi|18091804|gb|AAL58093.1| -evalue 0 -score 4361.0 > -hit_frame > 0 -hsp_length 297 -swscore 3215 -query_seq > WNCTTIDDSLAIFGPVLDKATRESAFVHAIASAGVAFAVTRSCAEGTSTICGCDSHHKGPPGEGWKWGGCS > EDADFGVLVSREFADARENRPDARSAMNRHNNEAGRTTILDHMHLKCKCHGLSGSCEVKTCWWAQPDFRAI > GDYLKDKYDSASEMVVEKHRESRGWVETLRAKYALFKPPTERDLVYYENSPNFCEPNPETGSFGTRDRTCN > VTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCIFHWCCYVSCQECIRVYDVHTCK > -homology_seq > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: > ::::::::::::::::::::::::::::.:::::::::::::::::::::::::::::::::::::::::: > ::.:::::::::::::::::::::::::::::: > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: > ::::::::::::.::::::: > -bits 815.4 (qs=' > STACK Bio::Search::HSP::GenericHSP::new > /usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/GenericHSP.pm:231 > STACK Bio::Search::HSP::FastaHSP::new > 
/usr/lib/perl5/site_perl/5.8.0/Bio/Search/HSP/FastaHSP.pm:97 > STACK Bio::Factory::ObjectFactory::create_object > /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/ObjectFactory.pm:150 > STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp > /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/ > SearchResultEventBuilder.pm:275 > STACK Bio::SearchIO::fasta::end_element > /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/fasta.pm:872 > STACK Bio::SearchIO::fasta::next_result > /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/fasta.pm:403 > STACK toplevel ./bioperlFastaParseTest.pl:9 > > -------------------------------------- > 1040 cpr at napa:~/fastaTest > > > <<<<<<< > > Apparently, Bio::Search::HSP::GenericHSP.pm expects Query End and Query > Begin to be set, and isn't getting them. Out of curiosity, I commented > the die line (231) from GenericHSP.pm, and then the module dies on the > next line, looking for Hit Begin and Hit End. Did the FASTA output > format > get out of sync with SearchIO? Am I missing something? > > I am attaching my output file. > > Thanks for any help! > > Christie > > > ~~~~~~~~~~~~~~~~~~~~~~~~~ > Christie P Robertson, PhD > Research Associate > Geospiza, Inc. > > cpr at geospiza.com > (206)633-4403 > ~~~~~~~~~~~~~~~~~~~~~~~~~ > -------------- next part -------------- > # fasta chWnt3.fasta /usr/local/data/hg_Gnomon_prots.fsa 1 -E 0.001 -Q > -s P20 > FASTA searches a protein or DNA sequence data bank > version 3.4t24 July 21, 2004 -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From cain at cshl.org Tue Dec 28 14:20:06 2004 From: cain at cshl.org (Scott Cain) Date: Tue Dec 28 14:25:58 2004 Subject: [Bioperl-l] Re: Converting GFF2 records to GFF3 In-Reply-To: <200412281822.iBSIM0Ks010650@portal.open-bio.org> References: <200412281822.iBSIM0Ks010650@portal.open-bio.org> Message-ID: <1104261607.10463.44.camel@localhost.localdomain> Hi Razi, I think the spec is pretty clear on the question of quotes--they are to be escaped. 
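A minimal sketch of what "escaped" could look like in practice, assuming the GFF3 convention of URL-style percent-encoding for reserved characters in the ninth column. The helper names (gff3_escape, gff3_unescape, gff3_attribute) are made up for illustration and are not part of Bio::Tools::GFF or the FeatureIO modules. The set of characters encoded here (control characters, '%', ';', '=', '&', ',' and '"') is an assumption that should be checked against the version of the spec being targeted:

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode characters that would be ambiguous
# inside a GFF3 attribute value. The character set is an assumption.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\x00-\x1F\x7F%;=&,"])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Hypothetical helper: reverse the encoding when reading a value back.
sub gff3_unescape {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Hypothetical helper: build one tag=value pair. Each value is escaped on
# its own, then the list is joined with literal (unescaped) commas.
sub gff3_attribute {
    my ($tag, @values) = @_;
    return $tag . '=' . join(',', map { gff3_escape($_) } @values);
}

print gff3_attribute('Note', 'kinase; "putative"'), "\n";
print gff3_attribute('Alias', 'abc-1', 'ABC,1'), "\n";
```

The two print statements emit Note=kinase%3B %22putative%22 and Alias=abc-1,ABC%2C1, so a reader that splits on ';' and ',' never meets a delimiter inside a value and no quoting is needed.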
Unfortunately, Bio::Tools::GFF is not a great module and we are moving away from using it in favor of FeatureIO modules (of which gff.pm is one). I've commented out the line that adds quotes to the values in BTGFF. Quotes are used in GFF2 to group together text that contains spaces, since space is used as a delimiter in the ninth column of GFF2. In GFF3, delimiting is much clearer: everything between the '=' and either ';' (single value) or ',' (for a list) is the value.

As for your second question, I'm not sure what the answer is without an example, but I suspect it is related to a boolean property, and if so, it should have a value of one, for example, "is_current=1;"

Finally, note that the scripts you wrote won't work generally for at least a few reasons, though they may in your case. Use extreme caution, though, because your script doesn't verify that the feature type is part of SOFA, doesn't deal with parent-child relationships, and doesn't guarantee that the rules are followed for reserved tag names. I've found that when I want to convert GFF2 to GFF3, it is a partially manual process, where I can script the easy things and then fix problems by hand.

Good luck!
Scott

On Tue, 2004-12-28 at 13:22 -0500, bioperl-l-request@portal.open-bio.org wrote:
> Date: Thu, 23 Dec 2004 15:54:40 -0500 (EST)
> From: Razi Khaja
> Subject: [Bioperl-l] Converting GFF2 records to GFF3
> To: song-devel@lists.sourceforge.org, bioperl
> Message-ID: <20041223205440.84719.qmail@web51606.mail.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
>
> Sorry for cross posting, but this may be relevant to both bioperl and song-devel.
>
> I've written a small script to convert gff2 records to gff3 using bioperl and vice versa (see gff2_to_gff3.pl and gff3_to_gff2.pl below).
>
> In doing this I have noticed some problems in conversion.
>
> The method Bio::Tools::GFF::_gff3_string will quote attribute values if they contain characters not in [a-zA-Z0-9,;=.:%^*$@!+_?-] (ie.
$value = '"'.$value.'"';) and will output empty quotes for tags without values (ie. $value = "\"\"";). > > Currently the gff3 spec says: "Unescaped quotation marks, ... are explicitly forbidden." > > This brings up 2 questions: > (1) Are quotes necessary in gff3? > (2) When a value is empty, what should be output? > a) Tag=""; > b) Tag=.; > c) Tag=; > d) nothing? > > (Apart from not meeting the spec, this makes it difficult to do transformations from gff2 to gff3 and back to gff2 again.) > > > > > # ===== gff2_to_gff3.pl ===== > #!/usr/bin/perl > use strict; > use Bio::Tools::GFF; > my( $gff2File ) = @ARGV; > my $gffio = Bio::Tools::GFF->new(-file=>"$gff2File", > -gff_version=>2); > while( my $feature = $gffio->next_feature() ) { > my $gff3string = $gffio->_gff3_string( $feature ); > print "$gff3string\n"; > } > $gffio->close(); > > > > # ===== gff3_to_gff2.pl ===== > > #!/usr/bin/perl > use strict; > use Bio::Tools::GFF; > my( $gff3File ) = @ARGV; > my $gffio = Bio::Tools::GFF->new(-file=>"$gff3File", -gff_version=>3); > while( my $feature = $gffio->next_feature() ) { > my $gff2string = $gffio->_gff2_string( $feature ); > print "$gff2string\n"; > } > $gffio->close(); > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cpr at geospiza.com Tue Dec 28 14:46:59 2004 From: cpr at geospiza.com (Christie Robertson) Date: Tue Dec 28 14:43:50 2004 Subject: [Bioperl-l] problems running Bio::SearchIO on the FASTA results In-Reply-To: <30E39C3C-5905-11D9-A116-000393C44276@duke.edu> References: <30E39C3C-5905-11D9-A116-000393C44276@duke.edu> Message-ID: Great, thanks a lot, Jason! Christie On Tue, 28 Dec 2004, Jason Stajich wrote: > Christie - this has to do with how the FASTA format has changed with > the latest releases. 
The parser has been updated to handle the changed
> format -update Bio/SearchIO/fasta.pm file from CVS or grab it here;
> http://bioperl.org/SRC/
>
> I did not put these changes on the 1.4 branch as I didn't think we'd be
> releasing off that branch, but I can merge the changes there as well if
> it will help people.
>
> -jason

From allenday at ucla.edu Wed Dec 29 02:34:47 2004
From: allenday at ucla.edu (Allen Day)
Date: Wed Dec 29 01:32:46 2004
Subject: [Bioperl-l] Re: Questions about Bio::AlignIO::maf
In-Reply-To: <000601c4ed68$c493f8b0$7347d90a@imcb.astar.edu.sg>
References: <000601c4ed68$c493f8b0$7347d90a@imcb.astar.edu.sg>
Message-ID:

Hi Alison,

I did not add strand information as I didn't need it at the time of writing. However, I believe this has come up on list recently and someone has already patched in strand support, as well as an off-by-one bug in my code. Can whoever did these patches recently pipe in? Thanks.

Alison, please keep the bioperl list CCed in your reply.

-Allen

On Wed, 29 Dec 2004, Lee Ping Alison wrote:
> Dear Mr Day,
>
> While reading the Bioperl 1.4 documentation for the "Bio::AlignIO::maf" module, I found your email address and I have some questions about how to use "maf."
> > Am I right to say that the strand information of each sequence in an "maf" file is not recorded, when the LocateableSeq object is created in the nextAln() method? I observed that $strand was not one of the arguments in the call to the constructor. > > If yes, what is the reason for not using the strand information? And subsequently, if I need to retrieve the strand information, how should I go about it? > > Thank you very much for answering my queries. > > Best Regards, > Alison > (Institute of Molecular and Cell Biology, Singapore) From g0404203 at nus.edu.sg Wed Dec 29 09:45:49 2004 From: g0404203 at nus.edu.sg (Lee Ping Alison) Date: Wed Dec 29 09:45:28 2004 Subject: [Bioperl-l] Re: Questions about Bio::AlignIO::maf References: <000601c4ed68$c493f8b0$7347d90a@imcb.astar.edu.sg> Message-ID: <001d01c4edb5$20a404a0$074b12ac@bac1745> Hi, Mr Day, thanks a lot for helping me with my queries. I've just obtained the most recent bioperl-live code via cvs with the bug fixes you've mentioned. I'm wondering why the off-by-one bug fix (end = start+size-1) was necessary. I'm thinking that "end = start+size" is correct. Because the MAF file format by UCSC states that coordinates are half-open, zero-based. And I have understood it as the coordinates in "maf" module should be (start, end] (start exclusive, end inclusive). I've also tried several coordinates that agree with UCSC Genome Browser which uses [start, end]. Hence, in my opinion the bug fix was not necessary. Will someone please enlighten me on this? Thank you very much! Alison. ----- Original Message ----- From: Allen Day To: Lee Ping Alison Cc: Bioperl Sent: 29 December, 2004 3:34 PM Subject: Re: Questions about Bio::AlignIO::maf Hi Alison, I did not add strand information as I didn't need it at the time of writing. However, I believe this has come up on list recently and someone has already patched in strand support, as well as an off-by-one bug in my code. Can whoever did these patches recently pipe in? Thanks. 
Alison, please keep the bioperl list CCed in your reply. -Allen On Wed, 29 Dec 2004, Lee Ping Alison wrote: > Dear Mr Day, > > While reading the Bioperl 1.4 documentation for the "Bio::AlignIO::maf" module, I found your email address and I have some questions about how to use "maf." > > Am I right to say that the strand information of each sequence in an "maf" file is not recorded, when the LocateableSeq object is created in the nextAln() method? I observed that $strand was not one of the arguments in the call to the constructor. > > If yes, what is the reason for not using the strand information? And subsequently, if I need to retrieve the strand information, how should I go about it? > > Thank you very much for answering my queries. > > Best Regards, > Alison > (Institute of Molecular and Cell Biology, Singapore) From stefan.weckx at vib.be Wed Dec 29 11:31:54 2004 From: stefan.weckx at vib.be (Stefan Weckx) Date: Wed Dec 29 11:27:02 2004 Subject: [Bioperl-l] error from GenBank Message-ID: <41D2DBFA.6050100@vib.be> hi all, does anybody have experience with this: I'm retrieving data from GenBank (protein/nucleotide) using Bio::DB::Query::GenBank within a loop, based on a list of taxonomy ids referring to bacterial strains. In case there is no data in GenBank for a certain strain, I get an error message (see below) and the script terminates ... which is not the behaviour I would like to see, since there are still other taxid's waiting for being queried ... more specifically, the Bio::DB::Query::GenBank query itself runs ok, only while extracting data with e.g. "my $count = $query->count" the error message appears and the script terminates. Suggestions to work around this are welcome! cheers Stefan error message: Warning(s) from GenBank: txid267364 ------------- EXCEPTION ------------- MSG: Error from Genbank: No items found. 
STACK Bio::DB::Query::GenBank::_parse_response /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/GenBank.pm:267
STACK Bio::DB::Query::WebQuery::_run_query /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/WebQuery.pm:268
STACK Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/WebQuery.pm:239
STACK Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/WebQuery.pm:205
STACK toplevel ./harvest_LAB_ORF.pl:37
--------------------------------------

From rob at salmonella.org Wed Dec 29 11:40:21 2004
From: rob at salmonella.org (Rob Edwards)
Date: Wed Dec 29 11:37:05 2004
Subject: [Bioperl-l] error from GenBank
In-Reply-To: <41D2DBFA.6050100@vib.be>
References: <41D2DBFA.6050100@vib.be>
Message-ID: <54FB8BC8-59B8-11D9-94B7-000A959E1622@salmonella.org>

Wrap line 37 of harvest_LAB_ORF.pl in an eval like this

    eval {...get sequences...};
    if ($@) {
        # this will catch the error and warn you but not stop the script
        print STDERR "There were no sequences for x\n";
        next;
    }

On Dec 29, 2004, at 8:31 AM, Stefan Weckx wrote:

> hi all,
>
> does anybody have experience with this:
>
> I'm retrieving data from GenBank (protein/nucleotide) using
> Bio::DB::Query::GenBank within a loop, based on a list of taxonomy ids
> referring to bacterial strains. In case there is no data in GenBank
> for a certain strain, I get an error message (see below) and the
> script terminates ... which is not the behaviour I would like to see,
> since there are still other taxid's waiting for being queried ...
>
> more specifically, the Bio::DB::Query::GenBank query itself runs ok,
> only while extracting data with e.g. "my $count = $query->count" the
> error message appears and the script terminates.
>
> Suggestions to work around this are welcome!
>
> cheers
> Stefan
>
> error message:
>
> Warning(s) from GenBank:
> txid267364
> ------------- EXCEPTION -------------
> MSG: Error from Genbank: No items found.
> STACK Bio::DB::Query::GenBank::_parse_response > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/GenBank.pm:267 > STACK Bio::DB::Query::WebQuery::_run_query > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/WebQuery.pm:268 > STACK Bio::DB::Query::WebQuery::_fetch_ids > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/WebQuery.pm:239 > STACK Bio::DB::Query::WebQuery::ids > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Query/WebQuery.pm:205 > STACK toplevel ./harvest_LAB_ORF.pl:37 > -------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From amackey at pcbi.upenn.edu Wed Dec 29 16:40:25 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Dec 29 16:37:16 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/Factory FTLocationFactory.pm, 1.15, 1.16 In-Reply-To: <200412292038.iBTKcjDT005217@pub.open-bio.org> References: <200412292038.iBTKcjDT005217@pub.open-bio.org> Message-ID: <40287732-59E2-11D9-8293-000A9577009E@pcbi.upenn.edu> Kudos, this is fantastic! On Dec 29, 2004, at 3:38 PM, Jason Stajich wrote: > Update of /home/repository/bioperl/bioperl-live/Bio/Factory > In directory pub.open-bio.org:/tmp/cvs-serv5201/Bio/Factory > > Modified Files: > FTLocationFactory.pm > Log Message: > yuck. lookaheads and balanced parens. But this is a problem that has > been around for a while, glad to finally fix it. bug #1674 describes > the behavior. 
Couldn't previously handle nested join(join()) properly > > > Index: FTLocationFactory.pm > =================================================================== > RCS file: > /home/repository/bioperl/bioperl-live/Bio/Factory/ > FTLocationFactory.pm,v > retrieving revision 1.15 > retrieving revision 1.16 > diff -C2 -d -r1.15 -r1.16 > *** FTLocationFactory.pm 23 Nov 2004 16:16:34 -0000 1.15 > --- FTLocationFactory.pm 29 Dec 2004 20:38:42 -0000 1.16 > *************** > *** 124,128 **** > my ($self,$locstr,$is_rec) = @_; > my $loc; > ! > # there is no place in FT-formatted location strings where > whitespace > # carries meaning, so strip it off entirely upfront > --- 124,128 ---- > my ($self,$locstr,$is_rec) = @_; > my $loc; > ! > # there is no place in FT-formatted location strings where > whitespace > # carries meaning, so strip it off entirely upfront > *************** > *** 131,136 **** > # does it contain an operator? > if($locstr =~ /^([A-Za-z]+)\((.*)\)$/) { > # yes: > ! my $op = $1; > my $oparg = $2; > if($op eq "complement") { > --- 131,137 ---- > # does it contain an operator? > if($locstr =~ /^([A-Za-z]+)\((.*)\)$/) { > + > # yes: > ! my $op = lc($1); > my $oparg = $2; > if($op eq "complement") { > *************** > *** 138,142 **** > $loc = $self->from_string($oparg, 1); > $loc->strand(-1); > ! } elsif(($op eq "join") || ($op eq "order") || ($op eq "bond")) { > # This is a split location. Split into components and parse each > # one recursively, then gather into a SplitLocationI instance. > --- 139,143 ---- > $loc = $self->from_string($oparg, 1); > $loc->strand(-1); > ! } elsif($op eq "join" || $op eq "order" || $op eq "bond" ) { > # This is a split location. Split into components and parse each > # one recursively, then gather into a SplitLocationI instance. > *************** > *** 146,152 **** > $loc = Bio::Location::Split->new(-verbose => $self->verbose, > -splittype => $op); > ! foreach my $substr (split(/,/, $oparg)) { > ! 
$loc->add_sub_Location($self->from_string($substr, 1)); > } > } else { > $self->throw("operator \"$op\" unrecognized by parser"); > --- 147,179 ---- > $loc = Bio::Location::Split->new(-verbose => $self->verbose, > -splittype => $op); > ! > ! # have to do this to capture nested joins, something like this > ! # join(11..21,join(100..300,complement(150..230))) > ! # This fixes bug #1674 > ! my $re; > ! $re = qr{ > ! \( > ! (?: > ! (?> [^()]+ ) # Non-parens without backtracking > ! | > ! (??{ $re }) # Group with matching parens > ! )* > ! \) > ! }x; > ! my @sections; > ! if( $oparg =~ s/(.*),(join|order|bond)/$2/i) { > ! push @sections, split(/,/,$1); > ! } > ! # lets capture and remove all the sections which > ! # are groups > ! while( $oparg =~ s/(join|order|bond)$re//ig ) { > ! push @sections, $&; > } > + push @sections, split(/,/,$oparg) if length($oparg); > + # end of fix for bug #1674 > + foreach my $s (@sections) { > + $loc->add_sub_Location($self->from_string($s, 1)); > + } > + > } else { > $self->throw("operator \"$op\" unrecognized by parser"); > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From hlapp at gmx.net Wed Dec 29 17:26:38 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 29 17:23:44 2004 Subject: [Bioperl-l] Re: Bio/Factory FTLocationFactory.pm, 1.15, 1.16 In-Reply-To: <40287732-59E2-11D9-8293-000A9577009E@pcbi.upenn.edu> Message-ID: On Wednesday, December 29, 2004, at 01:40 PM, Aaron J. Mackey wrote: >> # have to do this to capture nested joins, something like this >> ! 
# join(11..21,join(100..300,complement(150..230))) The sad thing is that apparently it is asking for too much to expect such locations to be written normalized (here, e.g., join(11..21,100..300,complement(150..230)), using simple algebraic equivalence rules - it's hard to imagine the situation where you would really need nested joins). Great you fixed it. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From tex at biocompute.net Wed Dec 29 02:30:17 2004 From: tex at biocompute.net (James Thompson) Date: Wed Dec 29 17:38:29 2004 Subject: [Bioperl-l] Re: Questions about Bio::AlignIO::maf In-Reply-To: <001d01c4edb5$20a404a0$074b12ac@bac1745> Message-ID: Alison (and Allen), I was the aforementioned bug fixer. :) Sorry if there's any confusion on this, but AFAIK Bioperl uses an one-based inclusive coordinate system. While maf may have its own opinions on the best way to do coordinates, maf is only one of the formats that are supported by Bio::AlignIO. The consensus in Bioperl appears to be that it makes more sense to use one consistent coordinate system within all of the modules rather than catering to the opinions and idiosyncrasies of all of the possible file formats. If we did not fix the off-by-one bug in maf.pm, then would be consistency issues with Bio::Align::AlignI objects created from different file formats. Here's a link to a message from the mailing list that seems relevant to the topic at hand: http://bioperl.org/pipermail/bioperl-l/2002-June/008309.html Cheers, James Thompson On Wed, 29 Dec 2004, Lee Ping Alison wrote: > Hi, > > Mr Day, thanks a lot for helping me with my queries. > > I've just obtained the most recent bioperl-live code via cvs with the bug > fixes you've mentioned. I'm wondering why the off-by-one bug fix (end = > start+size-1) was necessary. 
I'm thinking that "end = start+size" is correct. > Because the MAF file format by UCSC states that coordinates are half-open, > zero-based. And I have understood it as the coordinates in "maf" module > should be (start, end] (start exclusive, end inclusive). I've also tried > several coordinates that agree with UCSC Genome Browser which uses [start, > end]. Hence, in my opinion the bug fix was not necessary. > > Will someone please enlighten me on this? > > Thank you very much! > > Alison. > > ----- Original Message ----- > From: Allen Day > To: Lee Ping Alison > Cc: Bioperl > Sent: 29 December, 2004 3:34 PM > Subject: Re: Questions about Bio::AlignIO::maf > > > Hi Alison, > > I did not add strand information as I didn't need it at the time of > writing. However, I believe this has come up on list recently and someone > has already patched in strand support, as well as an off-by-one bug in my > code. Can whoever did these patches recently pipe in? Thanks. > > Alison, please keep the bioperl list CCed in your reply. > > -Allen > > On Wed, 29 Dec 2004, Lee Ping Alison wrote: > > > Dear Mr Day, > > > > While reading the Bioperl 1.4 documentation for the "Bio::AlignIO::maf" module, I found your email address and I have some questions about how to use "maf." > > > > Am I right to say that the strand information of each sequence in an "maf" file is not recorded, when the LocateableSeq object is created in the nextAln() method? I observed that $strand was not one of the arguments in the call to the constructor. > > > > If yes, what is the reason for not using the strand information? And subsequently, if I need to retrieve the strand information, how should I go about it? > > > > Thank you very much for answering my queries. 
> >
> > Best Regards,
> > Alison
> > (Institute of Molecular and Cell Biology, Singapore)

From jason.stajich at duke.edu Wed Dec 29 17:46:27 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Dec 29 17:43:12 2004
Subject: [Bioperl-l] Bioperl in 2005
Message-ID: <79464705-59EB-11D9-B264-000393C44276@duke.edu>

I just wanted to use the end of the year as a chance to reflect on what we've accomplished in 2004 and think about what 2005 holds for Bioperl.

What happened in 2004?

First of all, this year has really been productive at a level perhaps only appreciated by the folks who read the bioperl-guts-l list, which lists the CVS commits. New modules, bugfixes and code improvements have been steadily making their way into the codebase. Not only has there been lots of traffic, but more people are contributing code and fixes.

We have also seen increased contributions to the HOWTOs, which we hope will be an effective place to explain how to use sets of modules to complete a particular task. We are continually working to improve the documentation. This is a balance between a developer trying to get something accomplished for their own research and wanting other people to use their code (and not wanting to field lots of emails about a particular module). Open source software written solely by volunteers suffers from a reward system which values code over documentation and writing tutorials. We welcome ideas on changes which would help this and are currently thinking about ways to reward the productive documenters as well as coders.

We had a chance to have a 5 day Bootcamp in June thanks to Sylvain Foisy, the University of Montreal and the Quebec Bioinformatics Network (BioneQ). We hope to do another one of these in 2006.
If there is a general interest in more widespread Bioperl tutorials, please forward suggestions to me or the bioperl list and we can consider how something like this could be organized in conjunction with a conference or meeting.

How popular is Bioperl?

The 2002 paper has 60+ citations according to Web of Science and we're seeing use in a broader context than just sequence analysis. At least one published paper about modules which were already part of the codebase has appeared, suggesting software availability and collaboration can happen prior to publication. The website consistently gets around 300,000 hits per month, which isn't bad considering that the content doesn't change very much and this is just a site for one toolkit for a specific aspect of science. The bioperl-l mailing list has seen an average of 341 mails per month (not correcting for spam), with a lot of questions answered and ideas hashed out.

How can you help out?

I want to use this chance to also appeal to those who use Bioperl and have been sitting on your hands waiting to jump in. It is a collaborative project that only works if new people jump in and contribute ideas and manpower. We've had many examples of people who have just jumped on board the project, fixed some bugs, contributed a module and gone on their merry way. We've also had other people who have jumped in, contributed code, and found themselves fully engaged in the project and its internal workings almost immediately.

Not to wax poetic, but it was about 5 years ago that, fresh out of college, I started reading the mailing list, read Steve Chervitz's email plea for people to "ask not what Bioperl can do for you, ask what you can do for Bioperl" (http://bioperl.org/pipermail/bioperl-l/1999-December/003354.html) and just jumped right in. I can only hope to influence some more folks who might have wanted to contribute but were waiting for the invitation. Well come on over, we'd love to have you taking part.

As for some specifics.
- Parsing of Species information out from the ORGANISM lines in SwissProt, GenBank, and EMBL is pretty spotty and could use some work.
- Some more parsers for formats that people have asked for - a Spidey parser (NCBI's mRNA -> genomic alignment tool)
- Work on the Structure modules for dealing with protein structure data
- Integrate new applications into bioperl-run and further clean up the existing modules so they are more consistent
- Volunteer to be the next release master.

What does the future hold for Bioperl?

We expect to have a 1.5 release of bioperl in the 1st quarter of 2005 - this is the domain of Aaron Mackey, who agreed to be the release master (who has his hands full right now, but I'm sure will ask for help when he needs it). This should incorporate many new modules and bug fixes but be compatible with the 1.4 API as well. Details on the schedule for 1.5 will follow sometime after the holidays.

The future depends entirely on who steps up to work on the project next year. In 2005, I am resolving to limit my role in the front guard of mailing list question answering. This is in part to finish my PhD research and focus on building more specific tools to support my research questions, but also it is time for other people to contribute and share the spotlight and be a know-it-all. Bioperl is very much a labor of love and it is an integral part of the tools I use in my own work, so I expect to focus more directly on those things I need in the coming year and help out where I can.

My hope is that some of the new folks who have stepped up to contribute will help by continuing the course we have set to have high quality releases, a full test suite, POD documentation for every module, and overall documentation for using modules in HOWTOs and tutorials. If there are new or unexplored areas the project should consider, I hope that you will speak up and suggest them.

There is discussion afoot that a new Bioperl object model may be born.
This has been called Bioperl2 and Bioperl-NG. The idea is it would try to create a leaner and cleaner code base which does things like event-based parsing and autogenerated code for things like getters/setters, and could do things faster and more easily than we do currently. Generally there is a lot of legacy code and legacy design in Bioperl and it would be beneficial to have a project that was free of these constraints. At the same time there is an expectation that a project like this would also need to achieve something more than what the current Bioperl API can do, so it is incumbent on the new project to have goals that are higher than what Bioperl can do.

Thank you

I'd like to finally thank some people who have done a lot this year. Of course I'm not going to remember to name everyone, but I just wanted to highlight some folks who have endeavored not only to get the toolkit to do what they want, but also to help other people get started with it.

The people who have kept the project going. These are the usual suspects who have labored to do the dirty grunt work: cleaning up boring bugs, adding documentation, preparing a release, keeping the servers going, etc. They also code, but I wanted to highlight that they have really been critical to keeping the project going by doing the things that most people don't want to bother with.

Brian Osborne
Aaron Mackey
Chris Dagdigian
Kyle Jenson (mailing list and site searching at http://search.open-bio.org)

Some usual suspects who have been helping maintain their modules and generally being Bioperl knowledgeable on the list:

Scott Cain
Steve Chervitz
Allen Day
Donald Jackson
Stefan Kirov
Hilmar Lapp
Josh Lauricha
Heikki Lehvaslaiho
Chris Mungall
Jurgen Plentinckx
Lincoln Stein

There are several new people who have taken up the slack as those before them have drifted onto other commitments. (metaphoric slack of course, not trying to accuse anyone of being a 'slacker').
Thanks for jumping in, fixing bugs, running tests, giving feedback, and just getting involved. It is really encouraging when the project can be a 2-way street and not just a one-way flow of information going out from a few people who post answers to the list.

Richard Adams
Sean Davis
Rob Edwards
Nathan Haigh
Marc Logghe
Barry Moore
Remo Sanges
James Thompson
Koen van der Drift (Bioperl available via fink on OS X)

Thanks also to Peter van Heusden and Electric Genetics, who are undertaking a code audit of Bioperl and should have many helpful feedback points for us.

I've probably forgotten some people; please post a followup if I have neglected someone, as I would like you to be recognized for your work since we don't give out a whole lot else right now.

A safe and prosperous New Year to you all.

Jason Stajich
on behalf of the Bioperl core developers.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From khufaz83 at yahoo.com Thu Dec 30 02:27:49 2004
From: khufaz83 at yahoo.com (hafiz hafiz)
Date: Thu Dec 30 02:24:33 2004
Subject: [Bioperl-l] SeqIO, Write to File
Message-ID: <20041230072749.73897.qmail@web52509.mail.yahoo.com>

Hello, I have successfully changed the format with this source code in a CGI file, but it still does not work from the URL. Why?

$filegame=">/var/www/html/infoseq";
open (FILE,"$filegame") || die "Couldn't open file\n";
$io=Bio::SeqIO->new('-file'=>"$filegame",'-format'=>"game");
$io->write_seq($seq);

From paulo.david at netvisao.pt Thu Dec 30 03:50:37 2004
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Thu Dec 30 03:47:29 2004
Subject: [Bioperl-l] SeqIO, Write to File
In-Reply-To: <20041230072749.73897.qmail@web52509.mail.yahoo.com>
References: <20041230072749.73897.qmail@web52509.mail.yahoo.com>
Message-ID: <41D3C15D.30805@netvisao.pt>

Hi,

It would be good to know what the error message is, if any. Can you access the webserver's logs? Is the perl file executable, and does the webserver have permission to execute it? If you are testing the script on a different machine, make sure the path to the perl executable is correct on the server as well.

-Paulo

hafiz hafiz wrote:

> Hello, I have successfully changed the format with this source code in a CGI file, but it still does not work from the URL. Why?
>
> $filegame=">/var/www/html/infoseq";
> open (FILE,"$filegame") || die "Couldn't open file\n";
> $io=Bio::SeqIO->new('-file'=>"$filegame",'-format'=>"game");
> $io->write_seq($seq);

From khufaz83 at yahoo.com Thu Dec 30 04:26:30 2004
From: khufaz83 at yahoo.com (hafiz hafiz)
Date: Thu Dec 30 04:24:49 2004
Subject: [Bioperl-l] SeqIO, Write to File
In-Reply-To: <41D3C15D.30805@netvisao.pt>
Message-ID: <20041230092630.99557.qmail@web52507.mail.yahoo.com>

--- Paulo Almeida wrote:

> Hi,
>
> It would be good to know what the error message is, if any. Can you access the webserver's logs? Is the perl file executable, and does the webserver have permission to execute it? If you are testing the script on a different machine, make sure the path to the perl executable is correct on the server as well.
>
> -Paulo
>
> hafiz hafiz wrote:
>
> > Hello, I have successfully changed the format with this source code in a CGI file, but it still does not work from the URL. Why?
> >
> > $filegame=">/var/www/html/infoseq";
> > open (FILE,"$filegame") || die "Couldn't open file\n";
> > $io=Bio::SeqIO->new('-file'=>"$filegame",'-format'=>"game");
> > $io->write_seq($seq);

From paulo.david at netvisao.pt Thu Dec 30 06:41:24 2004
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Thu Dec 30 06:39:06 2004
Subject: [Bioperl-l] Bioperl in 2005
In-Reply-To: <79464705-59EB-11D9-B264-000393C44276@duke.edu>
References: <79464705-59EB-11D9-B264-000393C44276@duke.edu>
Message-ID:

I've been wanting to get more involved in BioPerl, but I need to learn more about Perl first (namely, the object-oriented parts). I am learning about packages and modules (just bought Mastering Perl for Bioinformatics!), so it would be great to contribute to BioPerl, as that would also help me learn more (which is also one of the good things about Open Source).

I have been working on a Phylogenetic Profiler (basically, an automatic scan of a number of species for the presence of orthologs of proteins of interest, based on Blast E-values), and on a script that computes the correlation between two protdist matrices, obtained from the protein sequences of a number of species (the optimistic goal is to infer protein co-evolution/interaction from good correlations). Could something like this be interesting for BioPerl? Either way, I wouldn't mind trying your specific suggestions too.

-Paulo

On Dec 29, 2004, at 22:46, Jason Stajich wrote:

> How can you help out?
> (...)
> I can only hope to influence some more folks who might have wanted to contribute but were waiting for the invitation. Well come on over, we'd love to have you taking part.
>
> As for some specifics.
> - Parsing of Species information out from the ORGANISM lines in SwissProt, GenBank, and EMBL is pretty spotty and could take some work.
> - Some more parsers for formats that people have asked for - a Spidey parser (NCBI's mRNA -> genomic alignment tool)
> - Work on the Structure modules for dealing with protein structure data
> - Integrate new applications into bioperl-run and further clean up the existing modules so they are more consistent
> - Volunteer to be the next release master.

From g0404203 at nus.edu.sg Wed Dec 29 21:02:14 2004
From: g0404203 at nus.edu.sg (Lee Ping Alison)
Date: Thu Dec 30 08:51:37 2004
Subject: [Bioperl-l] Re: Questions about Bio::AlignIO::maf
References:
Message-ID: <003001c4ee13$b28a8bb0$7347d90a@imcb.astar.edu.sg>

Hi Mr Thompson,

Thanks for the reply. I understand the need for the one-based inclusive coordinate system now, partly because the major genome browsers use it too. However, since you're using inclusive coords, shouldn't you add 1 to $start before calculating $end, since $start is zero-based?

Alison.

----- Original Message -----
From: "James Thompson"
To: "Lee Ping Alison"
Cc: "Allen Day" ; "Bioperl"
Sent: Wednesday, December 29, 2004 3:30 PM
Subject: Re: [Bioperl-l] Re: Questions about Bio::AlignIO::maf

> Alison (and Allen),
>
> I was the aforementioned bug fixer. :)
>
> Sorry if there's any confusion on this, but AFAIK Bioperl uses a one-based inclusive coordinate system. While maf may have its own opinions on the best way to do coordinates, maf is only one of the formats that are supported by Bio::AlignIO. The consensus in Bioperl appears to be that it makes more sense to use one consistent coordinate system within all of the modules rather than catering to the opinions and idiosyncrasies of all of the possible file formats. If we did not fix the off-by-one bug in maf.pm, then there would be consistency issues with Bio::Align::AlignI objects created from different file formats.
>
> Here's a link to a message from the mailing list that seems relevant to the topic at hand:
>
> http://bioperl.org/pipermail/bioperl-l/2002-June/008309.html
>
> Cheers,
>
> James Thompson
>
> On Wed, 29 Dec 2004, Lee Ping Alison wrote:
>
> > Hi,
> >
> > Mr Day, thanks a lot for helping me with my queries.
> >
> > I've just obtained the most recent bioperl-live code via cvs with the bug fixes you've mentioned. I'm wondering why the off-by-one bug fix (end = start+size-1) was necessary. I'm thinking that "end = start+size" is correct, because the MAF file format by UCSC states that coordinates are half-open, zero-based. I have understood this to mean that the coordinates in the "maf" module should be (start, end] (start exclusive, end inclusive). I've also tried several coordinates that agree with the UCSC Genome Browser, which uses [start, end]. Hence, in my opinion the bug fix was not necessary.
> >
> > Will someone please enlighten me on this?
> >
> > Thank you very much!
> >
> > Alison.
> >
> > ----- Original Message -----
> > From: Allen Day
> > To: Lee Ping Alison
> > Cc: Bioperl
> > Sent: 29 December, 2004 3:34 PM
> > Subject: Re: Questions about Bio::AlignIO::maf
> >
> > Hi Alison,
> >
> > I did not add strand information as I didn't need it at the time of writing. However, I believe this has come up on list recently and someone has already patched in strand support, as well as an off-by-one bug in my code. Can whoever did these patches recently pipe in? Thanks.
> >
> > Alison, please keep the bioperl list CCed in your reply.
> >
> > -Allen
> >
> > On Wed, 29 Dec 2004, Lee Ping Alison wrote:
> >
> > > Dear Mr Day,
> > >
> > > While reading the Bioperl 1.4 documentation for the "Bio::AlignIO::maf" module, I found your email address and I have some questions about how to use "maf."
> > >
> > > Am I right to say that the strand information of each sequence in an "maf" file is not recorded, when the LocateableSeq object is created in the nextAln() method? I observed that $strand was not one of the arguments in the call to the constructor.
> > >
> > > If yes, what is the reason for not using the strand information? And subsequently, if I need to retrieve the strand information, how should I go about it?
> > >
> > > Thank you very much for answering my queries.
> > >
> > > Best Regards,
> > > Alison
> > > (Institute of Molecular and Cell Biology, Singapore)
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l

From venkat at calmail.berkeley.edu Thu Dec 30 04:53:40 2004
From: venkat at calmail.berkeley.edu (Venky Nandagopal)
Date: Thu Dec 30 08:51:40 2004
Subject: [Bioperl-l] Re: Bioperl in 2005 (Jason Stajich)
In-Reply-To: <200412300927.iBU9QdKt030729@portal.open-bio.org>
References: <200412300927.iBU9QdKt030729@portal.open-bio.org>
Message-ID:

Hello

I haven't written to the list a lot in the past, nor have I followed it closely, but I do use Bioperl every day in my research and I think this is an amazingly useful resource you guys have created. Thank you. I hope you all have a wonderful 2005.

In re some of the things that Jason mentioned, it would be useful to have a link on the website with a brief tutorial on how to write a bioperl-compliant module. What are the guidelines, what classes should be inherited from, formats etc. For example, I've got a spidey parser lying somewhere, and I could probably quickly hack it up to be useful for others, but I'm reluctant to invest the time to read through the existing code to figure out what rules I should be following. I'm not a perl expert, but I'm not a novice either, and fairly technical guidelines would work fine.
If something like this exists already, a pointer to it would be great and ignore everything else I said.

Thanks for your time.
Venky

--
___
Venky Nandagopal
Graduate Student
Eisen Lab
UC Berkeley

From birney at ebi.ac.uk Thu Dec 30 09:09:23 2004
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu Dec 30 09:06:22 2004
Subject: [Bioperl-l] Re: Bioperl in 2005 (Jason Stajich)
In-Reply-To:
Message-ID:

On Thu, 30 Dec 2004, Venky Nandagopal wrote:

> Hello
>
> I haven't written to the list a lot in the past, nor have I followed it closely, but I do use Bioperl every day in my research and I think this is an amazingly useful resource you guys have created. Thank you. I hope you all have a wonderful 2005.
>
> In re some of the things that Jason mentioned, it would be useful to have a link on the website with a brief tutorial on how to write a bioperl compliant module. What are the guidelines, what classes should be inherited from, formats etc. For example, I've got a spidey parser lying somewhere, and I could probably quickly hack it up to be useful for others, but I'm reluctant to invest the time to read through the existing code to figure out what rules I should be following. I'm not a perl expert, but I'm not a novice either, and fairly technical guidelines would work fine. If something like this exists already, a pointer to it would be great and ignore everything else I said.
>

There is a brief document on bioperl conventions at:

biodesign.pod

in the top level of the bioperl distribution.

> Thanks for your time.
> Venky
>
> --
> ___
> Venky Nandagopal
> Graduate Student
> Eisen Lab
> UC Berkeley

From golharam at umdnj.edu Thu Dec 30 12:32:37 2004
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Dec 30 12:24:17 2004
Subject: [Bioperl-l] Bioperl in 2005
In-Reply-To: <79464705-59EB-11D9-B264-000393C44276@duke.edu>
Message-ID: <001201c4ee95$8f739130$6400a8c0@GOLHARMOBILE1>

Hi all,

I'd like to contribute a parser module to parse Spidey results. I took the sim4 parser and modified a little bit to properly read in spidey results. Everything else about it works the same as the sim4 parser as far as I can tell. How can I contribute this module?

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at The University of Medicine & Dentistry of NJ
Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu