From heikki at sanbi.ac.za Wed Oct 1 03:31:13 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 1 Oct 2008 09:31:13 +0200 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: <200810010931.14117.heikki@sanbi.ac.za> Cool! I have trouble understanding the values in different columns. Could you whip together a wiki page that explains in plain English how to read them? -Heikki On Tuesday 30 September 2008 23:56:04 Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those BioPerl > packages which make use of the Build.PL mechanism (except bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ > > These reports will help us to know the current 'quality' of the code in > SVN for most of the BioPerl modules. This idea was started by Nathan > Haigh and Sendu a long time ago and it was my fault to not implement on > time the necessary script to run the process on a daily basis, so > apologies for that. > > There are still a few things to be done in order to have this working as > it should: > > - Nathan, current Devel::Cover module from CPAN doesn't include the JS > modifications to make table columns sortable. Do you know what happened > to the code you contributed to the author for that? > > - Reports could be generated for the rest of the BioPerl packages as > soon as they're migrated to the Build.PL infrastructure. Anyone up for > that? > > - bioperl-db tests require BioSQL to be setup in the webserver machine, > and the same goes for bioperl-run's tests with ALL of its dependencies. 
> The bioperl.org site is co-hosted with all of the other OBF projects and > that machine also takes care of other things (mailing lists, etc), so I > would like your feedback on possible workarounds to not overload the > server if we want to setup such test reports. > > Thanks & regards, > Mauricio. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From jason at bioperl.org Wed Oct 1 04:07:49 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 1 Oct 2008 01:07:49 -0700 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: <7BFDA4CC-BFDB-473E-81C6-B4F2769A52C6@bioperl.org> Thanks for doing this Mauricio! Great to have this resource and to follow up on the excellent efforts by Nathan and Sendu to get this ball rolling.. We have a couple of Virtual Hosts through Chris Dagdigian and BioTeam's donated resources that we can setup postgres and mysql instances for biosql and even Bio::DB::GFF & Bio::DB::SeqFeature testing. Let's see what Chris's plans are for the current VM instance - we have talked about also starting to port some of the websites to separate instances of the VM to spread the load a little bit more. One idea that can follow out of doing this work is some sort of testable reference servers for some of the bio{*} tools to access some basic datasets and hosting. 
Maybe with a simple Swissprot instance and a slice of a genbank division so that working gbrowse backend & biosql instances can be used for code testing and development purposes. -jason On Sep 30, 2008, at 2:56 PM, Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those > BioPerl packages which make use of the Build.PL mechanism (except > bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ > > These reports will help us to know the current 'quality' of the code > in SVN for most of the BioPerl modules. This idea was started by > Nathan Haigh and Sendu a long time ago and it was my fault to not > implement on time the necessary script to run the process on a daily > basis, so apologies for that. > > There are still a few things to be done in order to have this > working as it should: > > - Nathan, current Devel::Cover module from CPAN doesn't include the > JS modifications to make table columns sortable. Do you know what > happened to the code you contributed to the author for that? > > - Reports could be generated for the rest of the BioPerl packages as > soon as they're migrated to the Build.PL infrastructure. Anyone up > for that? > > - bioperl-db tests require BioSQL to be setup in the webserver > machine, and the same goes for bioperl-run's tests with ALL of its > dependencies. The bioperl.org site is co-hosted with all of the > other OBF projects and that machine also takes care of other things > (mailing lists, etc), so I would like your feedback on possible > workarounds to not overload the server if we want to setup such test > reports. > > Thanks & regards, > Mauricio. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From bix at sendu.me.uk Wed Oct 1 04:43:10 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 01 Oct 2008 09:43:10 +0100 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: <48E3381E.9090908@sendu.me.uk> Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those BioPerl > packages which make use of the Build.PL mechanism (except bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ Brilliant, thanks for doing this Mauricio. > - Nathan, current Devel::Cover module from CPAN doesn't include the JS > modifications to make table columns sortable. Do you know what happened > to the code you contributed to the author for that? Could you, in any case, put Nathan's version in the private module area of the user running Build.PL, just so we have it in the meantime? From David.Messina at sbc.su.se Wed Oct 1 05:09:52 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 1 Oct 2008 11:09:52 +0200 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: <628aabb70810010209j525441c8o6b735b821ac2af76@mail.gmail.com> Great, Mauricio! This should be a big help in encouraging everyone to contribute tests. Thanks for taking the time to do this. 
Dave From spiros at lokku.com Wed Oct 1 05:13:30 2008 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 1 Oct 2008 10:13:30 +0100 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: Awesome work Mauricio, thanks for taking the time to do this. I'm sure it will greatly benefit us. Spiros On Tue, Sep 30, 2008 at 10:56 PM, Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those BioPerl > packages which make use of the Build.PL mechanism (except bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ > > These reports will help us to know the current 'quality' of the code in SVN > for most of the BioPerl modules. This idea was started by Nathan Haigh and > Sendu a long time ago and it was my fault to not implement on time the > necessary script to run the process on a daily basis, so apologies for that. > > There are still a few things to be done in order to have this working as it > should: > > - Nathan, current Devel::Cover module from CPAN doesn't include the JS > modifications to make table columns sortable. Do you know what happened to > the code you contributed to the author for that? > > - Reports could be generated for the rest of the BioPerl packages as soon as > they're migrated to the Build.PL infrastructure. Anyone up for that? > > - bioperl-db tests require BioSQL to be setup in the webserver machine, and > the same goes for bioperl-run's tests with ALL of its dependencies. The > bioperl.org site is co-hosted with all of the other OBF projects and that > machine also takes care of other things (mailing lists, etc), so I would > like your feedback on possible workarounds to not overload the server if we > want to setup such test reports. > > Thanks & regards, > Mauricio.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Oct 1 08:54:20 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 1 Oct 2008 07:54:20 -0500 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: <0C03A991-52DB-48EB-8C29-4242C000144B@illinois.edu> On Sep 30, 2008, at 4:56 PM, Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those > BioPerl packages which make use of the Build.PL mechanism (except > bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ > > These reports will help us to know the current 'quality' of the code > in SVN for most of the BioPerl modules. This idea was started by > Nathan Haigh and Sendu a long time ago and it was my fault to not > implement on time the necessary script to run the process on a daily > basis, so apologies for that. > > There are still a few things to be done in order to have this > working as it should: > > - Nathan, current Devel::Cover module from CPAN doesn't include the > JS modifications to make table columns sortable. Do you know what > happened to the code you contributed to the author for that? > > - Reports could be generated for the rest of the BioPerl packages as > soon as they're migrated to the Build.PL infrastructure. Anyone up > for that? Beyond bioperl-db (which Jason mentioned a solution for) and bioperl- pedigree which other distributions would we be talking about? I think bioperl-ext would be too problematic under the current build scheme. 
> - bioperl-db tests require BioSQL to be setup in the webserver > machine, and the same goes for bioperl-run's tests with ALL of its > dependencies. The bioperl.org site is co-hosted with all of the > other OBF projects and that machine also takes care of other things > (mailing lists, etc), so I would like your feedback on possible > workarounds to not overload the server if we want to setup such test > reports. I think Jason answered that one. > Thanks & regards, > Mauricio. Fantastic work Mauricio, thanks for taking the time to do this! chris From cjfields at illinois.edu Wed Oct 1 09:36:33 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 1 Oct 2008 08:36:33 -0500 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <7BFDA4CC-BFDB-473E-81C6-B4F2769A52C6@bioperl.org> References: <48E2A074.5060305@open-bio.org> <7BFDA4CC-BFDB-473E-81C6-B4F2769A52C6@bioperl.org> Message-ID: An HTML attachment was scrubbed... URL: From hartzell at alerce.com Wed Oct 1 12:33:23 2008 From: hartzell at alerce.com (George Hartzell) Date: Wed, 1 Oct 2008 09:33:23 -0700 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E2A074.5060305@open-bio.org> References: <48E2A074.5060305@open-bio.org> Message-ID: <18659.42579.544312.936500@almost.alerce.com> Awesome! Thanks for pushing this forward. g. From hartzell at alerce.com Wed Oct 1 20:32:48 2008 From: hartzell at alerce.com (George Hartzell) Date: Wed, 1 Oct 2008 17:32:48 -0700 Subject: [Bioperl-l] dpalign, local for one sequence, global for the other? Message-ID: <18660.5808.305033.605069@almost.alerce.com> I need to produce an alignment between a hunk of genomic sequence (in the sense that it hasn't had introns edited out or anything) that's on the order of a 1000 bases to a genome/chromosome/fragment of similar genomic sequence. In an ideal situation they'll be the same, differences will come from variations in the sources (e.g. 
the hunk might have been clipped out of genome revision X and the current genome might be X+i, or the hunk might have come from a paper (who knows where it came from...)). Nothing across species or across evolutionary time or anything fun. I'm happy to narrow the region of the genome hunk down using some/an/... heuristic first to avoid running DP against an entire chromosome. I need the alignment to account for all of the bases in the hunk. In dynamic programming terms, if the hunk is along the vertical axis, the path through the matrix would have to run from the top to the bottom (or vice versa). The projection of the path onto the horizontal/genomic axis can start/end wherever. I'd like to not write this [again] and was hoping to use the bioperl-ext dpalign stuff. I'm hopeful that "ENDSFREE" is just what I need, but from the docs I'm not convinced that it is. A more pessimistic reading makes it sound a lot like a local alignment. Can anyone out there who's familiar with the dpalign code tell me whether it can do what I need? Out of the box? With modifications? g. From nhaigh at sheffield.ac.uk Thu Oct 2 01:07:29 2008 From: nhaigh at sheffield.ac.uk (Nathan S. Watson-Haigh) Date: Thu, 2 Oct 2008 15:07:29 +1000 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E3381E.9090908@sendu.me.uk> References: <48E2A074.5060305@open-bio.org> <48E3381E.9090908@sendu.me.uk> Message-ID: Hi Sendu, The JS modifications made it into Devel::Cover 0.62, so no need for my modified Devel::Cover code now: http://search.cpan.org/src/PJCJ/Devel-Cover-0.62/CHANGES The latest version is now 0.64.
Nath -----Original Message----- From: Sendu Bala [mailto:bix at sendu.me.uk] Sent: Wednesday, 1 October 2008 6:43 PM To: Mauricio Herrera Cuadra Cc: bioperl-l; Nathan Haigh Subject: Re: Test coverage for BioPerl now available Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those BioPerl > packages which make use of the Build.PL mechanism (except bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ Brilliant, thanks for doing this Mauricio. > - Nathan, current Devel::Cover module from CPAN doesn't include the JS > modifications to make table columns sortable. Do you know what happened > to the code you contributed to the author for that? Could you, in any case, put Nathan's version in the private module area of the user running Build.PL, just so we have it in the meantime? From mauricio at open-bio.org Thu Oct 2 10:30:31 2008 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 02 Oct 2008 09:30:31 -0500 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: References: <48E2A074.5060305@open-bio.org> <48E3381E.9090908@sendu.me.uk> Message-ID: <48E4DB07.7040204@open-bio.org> We are using 0.64 to generate the reports. Maybe the feature is not enabled by default? Any hints on that? Mauricio. Nathan S. Watson-Haigh wrote: > Hi Sendu, > > The JS modifications made it into Devel::Cover 0.62, so no need for my > modified Devel::Cover code now: > http://search.cpan.org/src/PJCJ/Devel-Cover-0.62/CHANGES > > The latest version is now 0.64. 
> > Nath > > -----Original Message----- > From: Sendu Bala [mailto:bix at sendu.me.uk] > Sent: Wednesday, 1 October 2008 6:43 PM > To: Mauricio Herrera Cuadra > Cc: bioperl-l; Nathan Haigh > Subject: Re: Test coverage for BioPerl now available > > Mauricio Herrera Cuadra wrote: >> Hi all, >> >> Daily-updated test coverage reports are now available for those BioPerl >> packages which make use of the Build.PL mechanism (except bioperl-db): >> >> http://bioperl.org/test-coverage/bioperl-live/ >> http://bioperl.org/test-coverage/bioperl-network/ >> http://bioperl.org/test-coverage/bioperl-run/ > > Brilliant, thanks for doing this Mauricio. > > >> - Nathan, current Devel::Cover module from CPAN doesn't include the JS >> modifications to make table columns sortable. Do you know what happened >> to the code you contributed to the author for that? > > Could you, in any case, put Nathan's version in the private module area > of the user running Build.PL, just so we have it in the meantime? > > > > From lincoln.stein at gmail.com Thu Oct 2 11:41:00 2008 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 2 Oct 2008 11:41:00 -0400 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: References: <48E2A074.5060305@open-bio.org> <7BFDA4CC-BFDB-473E-81C6-B4F2769A52C6@bioperl.org> Message-ID: <6dce9a0b0810020840m5671d0c1ncb83a818bab316f@mail.gmail.com> Actually I have a family of kvm images with a host of gbrowse databases (of various types, including Bio::DB::GFF and Bio::SeqFeature::Store) running in it. I would be very happy to contribute this to the cause. Lincoln On Wed, Oct 1, 2008 at 9:36 AM, Chris Fields wrote: > Speaking of databases and testing, one thing I would add to the list is a > test aggregation server of some sort (maybe using Smolder). If a VM is set > up for BioSQL/Gbrowse-related tests it might be worth adding this in when we > have the tuits. 
> > chris > > > On Oct 1, 2008, at 3:07 AM, Jason Stajich wrote: > > Thanks for doing this Mauricio! Great to have this resource and to follow > up on the excellent efforts by Nathan and Sendu to get this ball rolling.. > > > We have a couple of Virtual Hosts through Chris Dagdigian and BioTeam's > donated resources that we can setup postgres and mysql instances for biosql > and even Bio::DB::GFF & Bio::DB::SeqFeature testing. Let's see what Chris's > plans are for the current VM instance - we have talked about also starting > to port some of the websites to separate instances of the VM to spread the > load a little bit more. > > > One idea that can follow out of doing this work is some sort of testable > reference servers for some of the bio{*} tools to access some basic datasets > and hosting. Maybe with a simple Swissprot instance and a slice of a > genbank division so that working gbrowse backend & biosql instances can be > used for code testing and development purposes. > > > -jason > > On Sep 30, 2008, at 2:56 PM, Mauricio Herrera Cuadra wrote: > > > Hi all, > > > Daily-updated test coverage reports are now available for those BioPerl > packages which make use of the Build.PL mechanism (except bioperl-db): > > > http://bioperl.org/test-coverage/bioperl-live/ > > http://bioperl.org/test-coverage/bioperl-network/ > > http://bioperl.org/test-coverage/bioperl-run/ > > > These reports will help us to know the current 'quality' of the code in SVN > for most of the BioPerl modules. This idea was started by Nathan Haigh and > Sendu a long time ago and it was my fault to not implement on time the > necessary script to run the process on a daily basis, so apologies for that. > > > There are still a few things to be done in order to have this working as it > should: > > > - Nathan, current Devel::Cover module from CPAN doesn't include the JS > modifications to make table columns sortable. Do you know what happened to > the code you contributed to the author for that? 
> > > - Reports could be generated for the rest of the BioPerl packages as soon > as they're migrated to the Build.PL infrastructure. Anyone up for that? > > > - bioperl-db tests require BioSQL to be setup in the webserver machine, and > the same goes for bioperl-run's tests with ALL of its dependencies. The > bioperl.org site is co-hosted with all of the other OBF projects and that > machine also takes care of other things (mailing lists, etc), so I would > like your feedback on possible workarounds to not overload the server if we > want to setup such test reports. > > > Thanks & regards, > > Mauricio. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Jason Stajich > > jason at bioperl.org > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Stacey Quinn Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 USA (516) 367-8380 Assistant: Sandra Michelsen From mauricio at open-bio.org Thu Oct 2 11:56:49 2008 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 02 Oct 2008 10:56:49 -0500 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <48E4DB07.7040204@open-bio.org> References: <48E2A074.5060305@open-bio.org> <48E3381E.9090908@sendu.me.uk> <48E4DB07.7040204@open-bio.org> Message-ID: <48E4EF41.1060806@open-bio.org> Disregard that, Adam gave me the tip on the '-report html_basic' option for `cover`, which generates the tables as we want. Reports are now updated to sortable views. Mauricio. Mauricio Herrera Cuadra wrote: > We are using 0.64 to generate the reports. Maybe the feature is not > enabled by default? Any hints on that? > > Mauricio. > > > Nathan S. Watson-Haigh wrote: >> Hi Sendu, >> >> The JS modifications made it into Devel::Cover 0.62, so no need for my >> modified Devel::Cover code now: >> http://search.cpan.org/src/PJCJ/Devel-Cover-0.62/CHANGES >> >> The latest version is now 0.64. >> >> Nath >> >> -----Original Message----- >> From: Sendu Bala [mailto:bix at sendu.me.uk] Sent: Wednesday, 1 October >> 2008 6:43 PM >> To: Mauricio Herrera Cuadra >> Cc: bioperl-l; Nathan Haigh >> Subject: Re: Test coverage for BioPerl now available >> >> Mauricio Herrera Cuadra wrote: >>> Hi all, >>> >>> Daily-updated test coverage reports are now available for those >>> BioPerl packages which make use of the Build.PL mechanism (except >>> bioperl-db): >>> >>> http://bioperl.org/test-coverage/bioperl-live/ >>> http://bioperl.org/test-coverage/bioperl-network/ >>> http://bioperl.org/test-coverage/bioperl-run/ >> >> Brilliant, thanks for doing this Mauricio. 
>> >> >>> - Nathan, current Devel::Cover module from CPAN doesn't include the >>> JS modifications to make table columns sortable. Do you know what >>> happened to the code you contributed to the author for that? >> >> Could you, in any case, put Nathan's version in the private module >> area of the user running Build.PL, just so we have it in the meantime? >> >> >> >> > From philsf79 at gmail.com Mon Oct 6 16:42:36 2008 From: philsf79 at gmail.com (Felipe Figueiredo) Date: Mon, 06 Oct 2008 17:42:36 -0300 Subject: [Bioperl-l] *Phylip*::ProtDist but no ::DnaDist? Message-ID: <1223325756.4538.12.camel@localhost> Hello, I see there is Bio::Tools::Run::Phylo::Phylip::ProtDist but I can't find a module supporting the dnadist program in doc.bp.o. I even tried substituting ::DnaDist and ::DNADist for ::ProtDist there, but those are 404. Is there any reason why PHYLIP support only includes proteins, and not nucleotides, or am I missing something? regards FF From hartzell at alerce.com Mon Oct 6 17:27:11 2008 From: hartzell at alerce.com (George Hartzell) Date: Mon, 6 Oct 2008 14:27:11 -0700 Subject: [Bioperl-l] existing support for location w/ offset? Message-ID: <18666.33455.279233.39395@almost.alerce.com> I have a community that likes to refer to positions as offsets, e.g. 100 bases upstream of the beginning of the exon, or 5 bases after the splice site.... They think of it as 34532+2 or 23451-23 (funny, if you hold down the shift key when you type that, you get @#$%!-@#, which is how I feel about it...). Are there any existing Bio::Location classes that would make keeping track of this info easier? g. From bix at sendu.me.uk Mon Oct 6 17:57:53 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 06 Oct 2008 22:57:53 +0100 Subject: [Bioperl-l] existing support for location w/ offset?
In-Reply-To: <18666.33455.279233.39395@almost.alerce.com> References: <18666.33455.279233.39395@almost.alerce.com> Message-ID: <48EA89E1.1020601@sendu.me.uk> George Hartzell wrote: > I have a community that likes to refer to positions as offsets, > e.g. 100 bases upstream of the beginning of the exon, or 5 bases after > the splice site.... They think of it as 34532+2 or 23451-23 (funny, > if you hold down the shift key when you type that, you get @#$%!-@#, > which is how I feel about it...). > > Are there any existing Bio::Location classes that would make keeping > track of this info easier? It might not be at all appropriate, and hopefully there are better solutions, but for relative positioning stuff you can try playing with Bio::Map stuff. From nhaigh at sheffield.ac.uk Sun Oct 5 22:39:54 2008 From: nhaigh at sheffield.ac.uk (Nathan S. Watson-Haigh) Date: Mon, 6 Oct 2008 12:39:54 +1000 Subject: [Bioperl-l] Test coverage for BioPerl now available In-Reply-To: <200810010931.14117.heikki@sanbi.ac.za> References: <48E2A074.5060305@open-bio.org> <200810010931.14117.heikki@sanbi.ac.za> Message-ID: <4B3C96F71A6345F8984762F14D9ED682@nexus.csiro.au> Firstly, thanks Mauricio for getting this set up. It should be really useful for everyone and make adding tests so much easier by focusing attention on problem areas! In fact, it could help users make the transition from using BioPerl to contributing code by giving them an entry point into understanding the modules. Increasing test coverage in a group of modules would make a good student project! Heikki, You should find this page useful: http://search.cpan.org/~pjcj/Devel-Cover-0.64/lib/Devel/Cover/Tutorial.pod I use the following as a list of priorities for code coverage: 1) Get the "subroutine" metric to 100% - every subroutine should have at least one test to check it's returning the correct value/object given at least one set of arguments.
2) Get the "statement" metric as close to 100% as possible by providing various inputs so that each statement is tested for BOTH the true and false possibilities. 3) Work on the "branch" and "path" metrics if, after performing the above two, these still have low coverage. You'll find that performing the above steps will have a knock-on effect of increasing these coverage figures at the same time. NOTE: Testing never proves the absence of faults, it only shows their presence. I think as a result of better test coverage, there will be a need to formalise/discuss the intended behaviour of some modules, as well as the choice over default values. Also, don't forget to test for situations where you expect a die/warn using Test::Exception and Test::Warn. WIKI: I think there needs to be some consolidation of testing practices/information etc on the wiki, especially in light of the test coverage pages, with the above page linked to. In particular the following pages have info about testing: http://www.bioperl.org/wiki/Project_priority_list#Module_testing http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests http://www.bioperl.org/wiki/Becoming_a_developer#Test http://www.bioperl.org/wiki/Advanced_BioPerl#Designing_Good_Tests Cheers, Nathan -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho Sent: Wednesday, 1 October 2008 5:31 PM To: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Test coverage for BioPerl now available Cool! I have trouble understanding the values in different columns. Could you whip together a wiki page that explains in plain English how to read them?
-Heikki On Tuesday 30 September 2008 23:56:04 Mauricio Herrera Cuadra wrote: > Hi all, > > Daily-updated test coverage reports are now available for those BioPerl > packages which make use of the Build.PL mechanism (except bioperl-db): > > http://bioperl.org/test-coverage/bioperl-live/ > http://bioperl.org/test-coverage/bioperl-network/ > http://bioperl.org/test-coverage/bioperl-run/ > > These reports will help us to know the current 'quality' of the code in > SVN for most of the BioPerl modules. This idea was started by Nathan > Haigh and Sendu a long time ago and it was my fault to not implement on > time the necessary script to run the process on a daily basis, so > apologies for that. > > There are still a few things to be done in order to have this working as > it should: > > - Nathan, current Devel::Cover module from CPAN doesn't include the JS > modifications to make table columns sortable. Do you know what happened > to the code you contributed to the author for that? > > - Reports could be generated for the rest of the BioPerl packages as > soon as they're migrated to the Build.PL infrastructure. Anyone up for > that? > > - bioperl-db tests require BioSQL to be setup in the webserver machine, > and the same goes for bioperl-run's tests with ALL of its dependencies. > The bioperl.org site is co-hosted with all of the other OBF projects and > that machine also takes care of other things (mailing lists, etc), so I > would like your feedback on possible workarounds to not overload the > server if we want to setup such test reports. > > Thanks & regards, > Mauricio. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From kurniawan.monica at gmail.com Wed Oct 8 00:03:08 2008 From: kurniawan.monica at gmail.com (Monica Kurniawan) Date: Wed, 8 Oct 2008 15:03:08 +1100 Subject: [Bioperl-l] Installing Bioperl-ext Message-ID: Hi Everyone, I am trying to install the bioperl-ext module since this morning without any luck! It keeps complaining that It can not find the io_lib-config. I have installed the staden io_lib version 1.8.12. My OS is ubuntu. Anyone have any experience with this ? Thanks Monica From michael.watson at bbsrc.ac.uk Wed Oct 8 10:35:57 2008 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 8 Oct 2008 15:35:57 +0100 Subject: [Bioperl-l] Bio::Biblio to access pubmed? Message-ID: <8975119BCD0AC5419D61A9CF1A923E9506D87DD2@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I am looking for methods of querying pubmed using perl, getting an object which I can then print out various attributes of the papers returned (author, title etc) in tabular format. I came across Bio::Biblio but I can't see any examples of using it with pubmed? Does anyone have any? 

Thanks
Mick

Head of Informatics
Institute for Animal Health
Compton
Berks
RG20 7NN
01635 578411

http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee.
If you have received this message in error please delete it & notify the
originator immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful.
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute.
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes.

From miraceti at gmail.com Wed Oct 8 15:05:38 2008
From: miraceti at gmail.com (miraceti)
Date: Wed, 8 Oct 2008 15:05:38 -0400
Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl
In-Reply-To: <200809191822.23503.heikki@sanbi.ac.za>
References: <200809101332.07137.heikki@sanbi.ac.za>
	<200809191822.23503.heikki@sanbi.ac.za>
Message-ID:

Hi, thanks for introducing this into bioperl.
I was having a hard time trying to figure out how to use mesquite on many
many trees automatically.
Now I can use bioperl instead.
I was testing the ps() function with some discrete character states,
and I noticed the ancestor states are different from what I would get by
hand.
I think it's because of this line:

map {$node->set_tag_value('ps_trait', $_)} keys %union;

It replaces the trait values on every iteration of the map,
instead of storing all possible values.
Could you look into this?
Thanks

Mira Han

On Fri, Sep 19, 2008 at 12:22 PM, Heikki Lehvaslaiho wrote:

> The code is now in SVN. Bio::Tree::TreeFunctionsI::add_trait() can be used
> to set trait values in a tree.
>
> Enjoy,
>
>     -Heikki
>
> On Wednesday 10 September 2008 13:32:06 Heikki Lehvaslaiho wrote:
> > FYI,
> >
> > I've been recently writing code to analyse phylogeny-trait associations.
> > These traits are typically geographical location of the sequence but they
> > can be any phenotypic characters associated with the sequences.
> >
> > This involves trees, i.e. Bio::Tree::Tree and Bio::Tree::Node objects and
> > strings describing the traits. I've been using tags to store trait values
> > within nodes. The tag methods are:
> >
> >   Bio::Tree::Node::add_tag_value
> >   Bio::Tree::Node::get_all_tags
> >   Bio::Tree::Node::get_tag_values
> >   Bio::Tree::Node::has_tag
> >   Bio::Tree::Node::remove_all_tags
> >   Bio::Tree::Node::remove_tag
> >
> > Question: Is there any particular reason why there is no
> > set_tag_value(scalar|@array) method?
> >
> > I am getting tired of writing:
> >   $node->remove_tag($key);
> >   map {$node->add_tag_value($key, $_)} @values;
> > so I am going to implement that unless there are strong objections.
> >
> > Otherwise it has been smooth sailing. I am going to add
> > Bio::Tree::TreeFunctions::is_binary() and start populating
> > Bio::Tree::Statistics soon with these methods:
> >
> >   ps() - Parsimony Score (PS) from Fitch 1971
> >   ai() - Association index (AI) of Wang et al. 2001
> >   mc() - Monophyletic Clade (MC) size statistics by Salemi et al. 2005
> >   cherries() - number of leaf node pairs
> >
> > If you have any comments, please feel free to post them here.
> >
> >     -Heikki

From heikki at sanbi.ac.za Thu Oct 9 01:36:47 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Oct 2008 07:36:47 +0200
Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl
In-Reply-To:
References: <200809101332.07137.heikki@sanbi.ac.za>
	<200809191822.23503.heikki@sanbi.ac.za>
Message-ID: <200810090736.47772.heikki@sanbi.ac.za>

Mira,

Thanks for spotting this. This really proved the point that many eyes are
better than one pair. I knew that the outcome for the test tree is 4 and still
accepted 5!

When writing this, I moved from using add_tag_value() to the new method
set_tag_value():

  # worked initially
  map {$node->add_tag_value('ps_trait', $_)} keys %union;

  # wrong
  map {$node->set_tag_value('ps_trait', $_)} keys %union;

  # correct and a lot simpler
  $node->set_tag_value('ps_trait', keys %union);

I'll commit the changes the moment I get svn to respond. The connection is
closing prematurely for me:

svn update
===========================================
 dev.open-bio.org - Authorized Access Only
===========================================
Connection closed by 207.154.17.71
svn: Connection closed unexpectedly

    -Heikki

On Wednesday 08 October 2008 21:05:38 miraceti wrote:
> Hi, thanks for introducing this into bioperl,
> I was having a hard time trying to figure out how to use mesquite on many
> many trees automatically.
> Now I can use bioperl instead.
> I was testing the ps() function with some discrete character states,
> and I noticed the ancestor states are different from what I would get by
> hand.
> I think it's because of this line
>
> map {$node->set_tag_value('ps_trait', $_)} keys %union;
>
> it replaces the trait values every time it does map,
> instead of storing all possible values.
> Could you look into this?
> thanks
>
> Mira Han

From heikki at sanbi.ac.za Thu Oct 9 04:22:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Oct 2008 10:22:57 +0200
Subject: [Bioperl-l] phylogeny-trait association methods into BioPerl
In-Reply-To: <200810090736.47772.heikki@sanbi.ac.za>
References: <200809101332.07137.heikki@sanbi.ac.za>
	<200810090736.47772.heikki@sanbi.ac.za>
Message-ID: <200810091022.57889.heikki@sanbi.ac.za>

Looks like the SVN problem was at my end. It is sorted out now and I've
committed the fix.

    -Heikki

On Thursday 09 October 2008 07:36:47 Heikki Lehvaslaiho wrote:
> I'll commit the changes the moment I get svn to respond. The connection is
> closing prematurely for me:
>
> svn update
> ===========================================
>  dev.open-bio.org - Authorized Access Only
> ===========================================
> Connection closed by 207.154.17.71
> svn: Connection closed unexpectedly

From sidd.basu at gmail.com Thu Oct 9 12:53:41 2008
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 9 Oct 2008 11:53:41 -0500
Subject: [Bioperl-l] Re: Bio::Biblio to access pubmed?
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9506D87DD2@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9506D87DD2@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <48ee3717.1386460a.239f.2631@mx.google.com>

Hi,

As an alternative, I also frequently use Bio::DB::EUtilities to query and
extract data from pubmed. Currently it returns XML, which needs to be parsed
to get specific information. Here is a small example using an XPath query....

#!/usr/bin/perl -w
use strict;
use Bio::DB::EUtilities;
use XML::Twig::XPath;

# PubMed ID from the command line, with a default for testing
my $pubmed_id = $ARGV[0] || '18835579';

my $eutils = Bio::DB::EUtilities->new(
    -eutil => 'efetch',
    -db    => 'pubmed',
    -id    => $pubmed_id,
);

# bail out if the HTTP request failed
if ( $eutils->get_Response->is_error() ) {
    die $eutils->get_Response->message(), "\n";
}

# parse the returned XML and pull out title and authors
my $twig = XML::Twig::XPath->new->parse( $eutils->get_Response->content() );
my ($title) = $twig->findnodes('//Article/ArticleTitle');
print $title->getValue, "\n";

my @authors = $twig->findnodes('//Article/AuthorList/Author');
print $_->getValue, "\n" foreach @authors;

-siddhartha

On Wed, 08 Oct 2008, michael watson (IAH-C) wrote:

> Hi
>
> I am looking for methods of querying pubmed using perl, getting an
> object which I can then print out various attributes of the papers
> returned (author, title etc) in tabular format.
>
> I came across Bio::Biblio but I can't see any examples of using it with
> pubmed? Does anyone have any?
>
> Thanks
> Mick

From Anthony.Underwood at hpa.org.uk Thu Oct 9 13:08:31 2008
From: Anthony.Underwood at hpa.org.uk (Anthony Underwood)
Date: Thu, 9 Oct 2008 18:08:31 +0100
Subject: [Bioperl-l] scf version 2 traces
Message-ID: <69E2D2428BD6C2429B8944FEC53B1EB9AEE77A@colhpaexc004.HPA.org.uk>

Hi all,

A long time ago (March 2004) I had a discussion with Chad about reading
scf files in Bioperl. I noticed there may be some problems with version
2 files.

I now mostly code in ruby and so am contributing to bioruby. I have been
writing code to extract trace information from scf files, based on some
code from another biorubyist for reading ABI files and on the code in
Bioperl. I now have this working and a much better understanding of
reading binary files.

I believe I have discovered the bugs in Bioperl for reading version 2
scf traces.

In scf.pm, in the _parse_v2_traces method, I believe the lines entering
the information into the traces array should be as below, since the
order is specified here:
http://staden.sourceforge.net/manual/formats_unix_4.html#SEC4

    push @{$traces->{'a'}},$read[$offset2];
    push @{$traces->{'t'}},$read[$offset2+1];
    push @{$traces->{'g'}},$read[$offset2+3];
    push @{$traces->{'c'}},$read[$offset2+2];

Also, the $buffer for this method passed in from the next_seq method is
incorrect because the offset isn't correct.
In the next_seq method the last of the following lines should be changed:

    $creator->{header} = $self->_get_header($buffer);
    if ($creator->{header}->{'version'} lt "3.00") {
        $self->debug("scf.pm is working with a version 2 scf.\n");
        # first gather the trace information
        $length = $creator->{header}->{'samples'} *
                  $creator->{header}->{sample_size}*4;
        $buffer = $self->read_from_buffer($fh, $buffer, $length,
                      $creator->{header}->{samples_offset});

to

        $buffer = $self->read_from_buffer($fh, $buffer, $length,
                      $creator->{header}->{sample_offset});

Note sample_offset, not samples_offset.

I have tested these corrections using other sequence viewers (Chromas,
FinchTV) and with these changes the output is now correct.

Can these be updated in the live code and the next release version?

Thanks

Anthony

Dr Anthony Underwood
Bioinformatics Unit | Statistics, Modelling and Bioinformatics Department
Centre for Infections
Health Protection Agency
61 Colindale Avenue
London
NW9 5HT
t: 0208 3276466
f: 0208 3276738
e: anthony.underwood at hpa.org.uk

-----------------------------------------
**************************************************************************
The information contained in the EMail and any attachments is
confidential and intended solely and for the attention and use of
the named addressee(s). It may not be disclosed to any other person
without the express authority of the HPA, or the intended
recipient, or both. If you are not the intended recipient, you must
not disclose, copy, distribute or retain this message or any part
of it. This footnote also confirms that this EMail has been swept
for computer viruses, but please re-sweep any attachments before
opening or saving. HTTP://www.HPA.org.uk
**************************************************************************

From vebaev at gmail.com Fri Oct 10 09:18:12 2008
From: vebaev at gmail.com (Vesselin Baev)
Date: Fri, 10 Oct 2008 16:18:12 +0300
Subject: [Bioperl-l] bio::graphics - xyplot problems
Message-ID:

Dear All,

I'm trying to plot a line histogram with the code below, but instead of a
line like this transcriptional profile -
http://oligomers.tamu.edu/gbrowse/tutorial/figures/figure12.gif
I get boxes ?!? - http://img147.imageshack.us/img147/9137/plotpz2.png

#!/usr/bin/perl

use Bio::Graphics;
use Bio::Graphics::Panel;
use Bio::SeqFeatureI;
use Bio::SeqFeature::Generic;

my $panel = Bio::Graphics::Panel->new(
    -length    => 1000,
    -width     => 800,
    -gridcolor => 'lightcyan',
    -grid      => 1,
);

my $track2 = $panel->add_track(
    -glyph        => 'xyplot',
    -graph_type   => 'line',
    -point_symbol => 'point',
    -max_score    => 50,
    -min_score    => 0,
    -scale        => 'right',    # a comma was missing after this value
    -key          => 'Methylation profile',
);

# add "subfeatures"
my @met_array = (0.2, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

# "<=" so that the last value in @met_array is not skipped
for (my $k = 0; $k <= $#met_array; $k++) {
    my $feature1 = Bio::SeqFeature::Generic->new(
        -start => $k,
        -end   => $k + 20,
        -score => $met_array[$k],
    );
    $track2->add_feature($feature1);
}
#==========END==========#

# writing in the file
open(IMAGEFILE, ">plot.png") or die "Cannot write plot.png: $!";
binmode IMAGEFILE;
print IMAGEFILE $panel->png;
close IMAGEFILE;

--
------------------------------------------------
Dr. Vesselin Baev
Research Assistant Professor
University of Plovdiv
Dept. Molecular Biology
Bioinformatics Group
Tzar Assen 24
Plovdiv 4000, BULGARIA
+359 32 261 (560)
+359 89 43 80 945
Skype: vebaev
vebaev at gmail.com
baev at uni-plovdiv.bg
http://plantgene.eu/

From hartzell at alerce.com Fri Oct 10 16:57:35 2008
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 10 Oct 2008 13:57:35 -0700
Subject: [Bioperl-l] bioperl.lisp, bioperl-object-start, use, and use base?
Message-ID: <18671.49599.871762.785099@almost.alerce.com>

Why does the template provided by bioperl-object-start both

  use Bio::Root::Root;

and

  use base qw(Bio::Root::Root);

Is the first one necessary for some reason?

g.

From hartzell at alerce.com Fri Oct 10 17:25:36 2008
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 10 Oct 2008 14:25:36 -0700
Subject: [Bioperl-l] bioperl.lisp, bioperl-object-start, use, and use base?
In-Reply-To: <5CA74927-F801-441B-ADF4-6580DD6307E5@illinois.edu>
References: <18671.49599.871762.785099@almost.alerce.com>
	<5CA74927-F801-441B-ADF4-6580DD6307E5@illinois.edu>
Message-ID: <18671.51280.12232.222819@almost.alerce.com>

Thanks. That's what I thought/googled. I was afraid that it was
some backward compatibility best practice or something. Two small
edits (scoured from perlmonks):

Chris Fields writes:
 > No, it shouldn't be necessary. "use 'Foo'" is the same as:
 >
 > BEGIN {
 >     require Foo;
 > }

It also calls that package's import, so it's:

  BEGIN {
      require Foo;
      Foo->import();
  }

 > ... and "use base 'Foo'" is the same as:
 >
 > BEGIN {
 >     require Foo;
 >     push @ISA, 'Foo';
 > }

and this doesn't do the ->import().

Thanks,

g.

From cjfields at illinois.edu Fri Oct 10 17:16:48 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 10 Oct 2008 16:16:48 -0500
Subject: [Bioperl-l] bioperl.lisp, bioperl-object-start, use, and use base?
In-Reply-To: <18671.49599.871762.785099@almost.alerce.com>
References: <18671.49599.871762.785099@almost.alerce.com>
Message-ID: <5CA74927-F801-441B-ADF4-6580DD6307E5@illinois.edu>

No, it shouldn't be necessary. "use 'Foo'" is the same as:

BEGIN {
    require Foo;
}

... and "use base 'Foo'" is the same as:

BEGIN {
    require Foo;
    push @ISA, 'Foo';
}

chris

On Oct 10, 2008, at 3:57 PM, George Hartzell wrote:

> Why does the template provided by bioperl-object-start both
>
>  use Bio::Root::Root;
>
> and
>
>  use base qw(Bio::Root::Root);
>
> Is the first one necessary for some reason?
>
> g.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From cjfields at illinois.edu Fri Oct 10 23:47:00 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 10 Oct 2008 22:47:00 -0500
Subject: [Bioperl-l] bioperl.lisp, bioperl-object-start, use, and use base?
In-Reply-To: <18671.51280.12232.222819@almost.alerce.com>
References: <18671.49599.871762.785099@almost.alerce.com>
	<5CA74927-F801-441B-ADF4-6580DD6307E5@illinois.edu>
	<18671.51280.12232.222819@almost.alerce.com>
Message-ID:

Yeah, I forgot the import bit. It really applies mainly when you 'use
Foo qw(bar baz)', but it's always helpful to keep in mind when it applies.

-c

From vebaev at gmail.com Sun Oct 12 16:44:39 2008
From: vebaev at gmail.com (Vesselin Baev)
Date: Sun, 12 Oct 2008 23:44:39 +0300
Subject: [Bioperl-l] xyplot howto?
Message-ID:

Dear All,

I have a file containing a methylation profile in GFF format, like this:

Chr1 ATIDB mCIP_col_BU_UB_methylation 1 24 1.5240512e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 25 59 1.4491698e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 60 112 8.0595555e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 113 153 6.0091032e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 154 184 4.9909350e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 185 218 5.7258270e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 219 253 6.7766268e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"

What kind of code should I use to display just a histogram image for a
desired region (coordinates in columns 3 and 4)?
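
Whatever ends up drawing the picture, the rows for the region have to be picked out first. A minimal sketch of that step in plain Perl (no BioPerl calls at all): the sub name scores_in_region and the window 50..160 are made up for illustration, and it assumes the rows are whitespace-separated exactly as pasted above, so that start and end are the fields at indices 3 and 4 when counting from zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of BioPerl): return [start, end, score]
# for every row on $seqid that overlaps the window $from..$to.
sub scores_in_region {
    my ($lines, $seqid, $from, $to) = @_;
    my @hits;
    for my $line (@$lines) {
        next if $line =~ /^\s*(?:#|$)/;          # skip comments and blank lines
        # Nine columns; the limit keeps the space inside the Name attribute.
        my @col = split ' ', $line, 9;
        next unless @col >= 6 && $col[0] eq $seqid;
        my ($start, $end, $score) = @col[3, 4, 5];
        next unless looks_like_number($score);   # a score of "." carries no value
        next if $end < $from || $start > $to;    # no overlap with the window
        push @hits, [ $start, $end, $score + 0 ];
    }
    return @hits;
}

# The rows from the message above.
my @gff = split /\n/, <<'END_GFF';
Chr1 ATIDB mCIP_col_BU_UB_methylation 1 24 1.5240512e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 25 59 1.4491698e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 60 112 8.0595555e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 113 153 6.0091032e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 154 184 4.9909350e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 185 218 5.7258270e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
Chr1 ATIDB mCIP_col_BU_UB_methylation 219 253 6.7766268e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
END_GFF

# Print the start-end range and score of each row that touches 50..160.
for my $hit (scores_in_region(\@gff, 'Chr1', 50, 160)) {
    printf "%d-%d\t%.4f\n", @$hit;
}
```

The start/end/score triples that come out of this are the values a plotting step would then consume; how they are handed to a particular glyph is left open here.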

Vesko

From cain.cshl at gmail.com Mon Oct 13 11:04:20 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 13 Oct 2008 11:04:20 -0400
Subject: [Bioperl-l] xyplot howto?
In-Reply-To:
References:
Message-ID: <536f21b00810130804s6dd324cp5241ce1894dfcac8@mail.gmail.com>

Hello Vesko,

Have you looked at the tutorial that covers creating xy/histogram
plots? You can either look at your local version of it, at

  http://localhost/gbrowse/tutorial/tutorial.html#graph

or the version on gmod.org:

  http://gmod.org/gbrowse-cgi/tutorial/tutorial.html#graph

Scott

On Sun, Oct 12, 2008 at 4:44 PM, Vesselin Baev wrote:
> Dear All,
> I have a file containing methylation profile in GFF format such:
> Chr1 ATIDB mCIP_col_BU_UB_methylation 1 24 1.5240512e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
> Chr1 ATIDB mCIP_col_BU_UB_methylation 25 59 1.4491698e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
> Chr1 ATIDB mCIP_col_BU_UB_methylation 60 112 8.0595555e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
> Chr1 ATIDB mCIP_col_BU_UB_methylation 113 153 6.0091032e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
> Chr1 ATIDB mCIP_col_BU_UB_methylation 154 184 4.9909350e-02 . . Name="methylation Chr1:mCIP_col_BU_UB"
> Chr1 ATIDB mCIP_col_BU_UB_methylation 185 218 5.7258270e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
> Chr1 ATIDB mCIP_col_BU_UB_methylation 219 253 6.7766268e-01 . . Name="methylation Chr1:mCIP_col_BU_UB"
>
> what type of code to apply to display just an image-histogram for a
> desired region (3 and 4 columns coordinates)?
>
> Vesko

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From six-guns at hotmail.com Tue Oct 14 02:38:57 2008
From: six-guns at hotmail.com (liuganqiang)
Date: Tue, 14 Oct 2008 14:38:57 +0800
Subject: [Bioperl-l] How to use UTR
Message-ID:

Dear Hilmar Lapp,

I want to use bioperl to retrieve the 3'UTR from a fasta sequence, but I
don't know how to use the Bio::SeqFeature::Gene::UTR module. Could you
show me an example script?
Thank you very much!

Sincerely,

Grant Liu
2008-10-14

From barry.moore at genetics.utah.edu Tue Oct 14 11:30:19 2008
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 14 Oct 2008 09:30:19 -0600
Subject: [Bioperl-l] How to use UTR
In-Reply-To:
References:
Message-ID:

Grant,

To know where the UTRs are in your sequence you will need to have some
kind of annotation associated with the sequence. A few common sources
for those annotations would be a GenBank, ENSEMBL, or GFF3 file -
something that says there is a 3' UTR located between the locations
1,234 - 1,456 on your sequence. Fasta sequence would normally not have
that kind of information associated with it. You may need to rethink
your source data.
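
As a rough illustration of why the coordinates matter, here is a minimal sketch in plain Perl (no BioPerl calls). The helper name three_prime_utrs, the toy sequence and the GFF3 rows are all made up for the example, and it assumes the annotation is tab-separated GFF3 that uses the Sequence Ontology type three_prime_UTR with 1-based, inclusive coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): cut every three_prime_UTR
# feature listed in $gff3 out of the sequence string $dna.
sub three_prime_utrs {
    my ($dna, $gff3) = @_;
    my @utrs;
    for my $line (split /\n/, $gff3) {
        next if $line =~ /^(?:#|\s*$)/;            # skip directives and blank lines
        my ($seqid, $source, $type, $start, $end, $score, $strand)
            = split /\t/, $line;
        next unless defined $type && $type eq 'three_prime_UTR';
        # GFF3 is 1-based and inclusive; substr is 0-based.
        my $utr = substr($dna, $start - 1, $end - $start + 1);
        if (defined $strand && $strand eq '-') {   # minus strand: reverse complement
            $utr = reverse $utr;
            $utr =~ tr/ACGTacgt/TGCAtgca/;
        }
        push @utrs, $utr;
    }
    return @utrs;
}

# Toy data, invented for the example: a 9 bp CDS followed by a 12 bp UTR.
my $dna  = 'ATGGCCTAAGGTTCCAATAAA';
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", 'seq1', 'example', 'CDS',             1,  9,  '.', '+', '0', 'ID=cds1'),
    join("\t", 'seq1', 'example', 'three_prime_UTR', 10, 21, '.', '+', '.', 'ID=utr1');

print "$_\n" for three_prime_utrs($dna, $gff3);   # prints GGTTCCAATAAA
```

With a bare fasta file there is nothing to feed in as the second argument, which is the whole problem; with a GenBank or ENSEMBL record the same idea applies and only the way the coordinates are read differs.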

You should have a look at a few of the very well written bioperl HOWTOs:

http://www.bioperl.org/wiki/HOWTO:Beginners
http://www.bioperl.org/wiki/HOWTO:SeqIO
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Have a look through those documents to get an idea of how to handle
sequences and annotations with bioperl, decide which approach applies
best to your data, and then if you have more questions about how to
implement the details drop us a line with more specifics about your
situation.

Barry

Barry Moore
Senior Research Specialist
Eccles Institute of Human Genetics
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

On Oct 14, 2008, at 12:38 AM, liuganqiang wrote:

> Dear Hilmar Lapp,
>
> I want to use bioperl to retrieve the 3'UTR from a fasta sequence, but
> I don't know how to use the Bio::SeqFeature::Gene::UTR module. Could you
> show me an example script?
> Thank you very much!
> Sincerely,
>
> Grant Liu
> 2008-10-14

From hartzell at alerce.com Tue Oct 14 17:52:26 2008
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 14 Oct 2008 14:52:26 -0700
Subject: [Bioperl-l] can someone confirm a small doc/comment bug in
	Bio::LocatableSeq?
Message-ID: <18677.5274.447430.707525@almost.alerce.com>

In the course of trying to understand how stuff works, I've noticed
what I'm pretty sure is a typo in the comments above:

  Bio::LocatableSeq::column_from_residue_number

The example says that column_from_residue_number(94) returns 5, but I
think that it should return 6 (and running the demo does indeed return
6).

If someone will confirm that I'm not Missing Something, I can
touch it up.

g.

From cjfields at illinois.edu Tue Oct 14 18:45:53 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Oct 2008 17:45:53 -0500
Subject: [Bioperl-l] can someone confirm a small doc/comment bug in Bio::LocatableSeq?
In-Reply-To: <18677.5274.447430.707525@almost.alerce.com>
References: <18677.5274.447430.707525@almost.alerce.com>
Message-ID: <0D147ED4-766E-4D54-8E7D-17630FBE93BF@illinois.edu>

Go ahead and fix that. The documentation indicates column numbering
starts at 1, so it should be 6.

chris

On Oct 14, 2008, at 4:52 PM, George Hartzell wrote:

> In the course of trying to understand how stuff works, I've noticed
> what I'm pretty sure is a typo in the comments above:
>
>  Bio::LocatableSeq::column_from_residue_number
>
> The example says that column_from_residue_number(94) returns 5, but I
> think that it should return 6 (and running the demo does indeed return
> 6).
>
> If someone will confirm that I'm not Missing Something, I can
> touch it up.
>
> g.

From Russell.Smithies at agresearch.co.nz Tue Oct 14 23:28:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 15 Oct 2008 16:28:23 +1300
Subject: [Bioperl-l] drawing a quality trace on a panel?
In-Reply-To: <0D147ED4-766E-4D54-8E7D-17630FBE93BF@illinois.edu>
References: <18677.5274.447430.707525@almost.alerce.com>
	<0D147ED4-766E-4D54-8E7D-17630FBE93BF@illinois.edu>
Message-ID:

Hi all,

I'm writing a contig viewer and want to add a trace of the quality data.
I can get the values OK, but how do I draw these on a panel with the
xyplot glyph?
Any ideas?

Here's some example code:
(sorry if Outlook screws up the formatting)

=====================================
use strict;
use warnings;
use GD;    # provides gdSmallFont
use Bio::Assembly::IO;
use Bio::Graphics;
use Bio::Graphics::Feature;
use Bio::SeqFeature::Generic;
use Bio::PrimarySeq;

my $infile = shift @ARGV;    # path to the .ace file

my $parser = Bio::Assembly::IO->new(-file => $infile, -format => "ace")
    or die $!;

# just work on the first contig
my ($contig) = $parser->next_assembly->all_contigs;

my $consensus      = $contig->get_consensus_sequence();
my @quality_values = @{ $contig->get_consensus_quality()->qual() };

# create panel
my $panel = Bio::Graphics::Panel->new(
    -length    => $consensus->length,
    -width     => gdSmallFont->width * 1.5 * $consensus->length,
    -bgcolor   => 'white',
    -grid      => 1,
    -pad_left  => 20,
    -pad_right => 20,
);

# add a scale
$panel->add_track(
    'arrow' => Bio::Graphics::Feature->new(
        -start => $consensus->start,
        -end   => $consensus->end,
    ),
    -bump   => 0,
    -double => 1,
    -tick   => 2,
);

my $consensus_feature = Bio::SeqFeature::Generic->new(
    -start        => $consensus->start,
    -end          => $consensus->end,
    -strand       => $consensus->strand,
    -display_name => $consensus->display_name,
);

my $consensus_seq = Bio::PrimarySeq->new( -seq => $consensus->seq );
$consensus_feature->attach_seq($consensus_seq);

$panel->add_track(
    $consensus_feature,
    -glyph    => 'segments',
    -height   => 10,
    -label    => 1,
    -draw_dna => 1,
    -bgcolor  => "white",
    -fgcolor  => "silver",
);

# somehow get the array of quality_values into an xyplot?

# other code to draw rest of the reads

# output
binmode STDOUT;
print $panel->png;
=======================================

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
======================================================================= From hartzell at alerce.com Tue Oct 14 23:49:47 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue, 14 Oct 2008 20:49:47 -0700 Subject: [Bioperl-l] can someone confirm a small doc/comment bug in Bio::LocatableSeq? In-Reply-To: <0D147ED4-766E-4D54-8E7D-17630FBE93BF@illinois.edu> References: <18677.5274.447430.707525@almost.alerce.com> <0D147ED4-766E-4D54-8E7D-17630FBE93BF@illinois.edu> Message-ID: <18677.26715.64414.957510@almost.alerce.com> Chris Fields writes: > Go ahead and fix that. The documentation indicates column numbering > starts at 1, so it should be 6. I also touched up the docs for location_from_column. There's still an inconsistency, the docs say it'll return undef for i columns before the first residue, but in reality it returns an IN-BETWEEN. Is it better to leave it as is or change the code to match the docs? Should it behave consistently for things past the end of the sequence if we fix the before? g. From vebaev at gmail.com Wed Oct 15 13:31:46 2008 From: vebaev at gmail.com (Vesselin Baev) Date: Wed, 15 Oct 2008 20:31:46 +0300 Subject: [Bioperl-l] xyplot only from numbers? Message-ID: Dear All, if I have only a bunch on numbers, can I use a xyplot to draw a histogram? Vesko -- ------------------------------------------------ Dr. Vesselin Baev Research Assistant Professor University of Plovdiv Dept. Molecular Biology Bioinformatics Group Tzar Assen 24 Plovdiv 4000, BULGARIA +359 32 261 (560) +359 89 43 80 945 Skype: vebaev vebaev at gmail.com baev at uni-plovdiv.bg http://plantgene.eu/ From michael.kiwala at gmail.com Wed Oct 15 13:55:37 2008 From: michael.kiwala at gmail.com (Michael Kiwala) Date: Wed, 15 Oct 2008 12:55:37 -0500 Subject: [Bioperl-l] significant bug with Bio::LocatableSeq Message-ID: <16fd5ab50810151055x1ec23ce9l51961ebb18d118a3@mail.gmail.com> I'd like to add Bio::Assembly::IO::ace to the list of affected parsers. 
I've been off the list for a while, so I know I'm just jumping right in the middle of something, and it's probably bigger than I know. But I'd like to help out anyway. :) I need some background on LocatableSeq in order to understand the problem. Is it not redundant to pass -start, -end, and -seq? Seems like normally you would only need -start or -end plus the -seq, right? Does anyone see a problem with just removing the -end argument to Bio::LocatableSeq->new() calls in B:A:IO:ace since the -start and -seq are already being provided? Thanks, Michael

> On Mon Sep 15 00:13:57 EDT 2008 Chris Fields wrote:
>
> While debugging some tests in bioperl, I noticed a fairly significant
> issue with Bio::LocatableSeq which is probably due to some
> inconsistencies with start/end coordinates. For some reason this
> started popping up with error messages recently when running AlignIO
> tests on bioperl-live (i.e. something changed which exposed the bug,
> maybe the verbosity level):
>
> 1..295
> ok 1 - use Bio::AlignIO;
> ok 2 - The object isa Bio::AlignIO
> ok 3 - The object isa Bio::Align::AlignI
> ok 4
> ok 5
> ok 6 - The object isa Bio::AlignIO
>
> --------------------- WARNING ---------------------
> MSG: In sequence 02 residue count gives end value 399.
> Overriding value [355] with value 399 for Bio::LocatableSeq::end().
> STACK Bio::LocatableSeq::end /Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/LocatableSeq.pm:150
> STACK Bio::LocatableSeq::new /Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/LocatableSeq.pm:103
> STACK Bio::AlignIO::arp::next_aln /Users/cjfields/bioperl/bioperl-live/blib/lib/Bio/AlignIO/arp.pm:106
> STACK toplevel t/AlignIO.t:34
> ---------------------------------------------------
> .... followed by tons of similar errors.
>
> The problem is, no change is ever made. This is demonstrated by the
> following:
>
> -----------------------------
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Bio::LocatableSeq;
>
> my $seq = Bio::LocatableSeq->new(
>     -id     => 'foo',
>     -seq    => 'A----TGCGCTTCCTCGCTTCCG',
>     -start  => 10,
>     -end    => 100, # intentionally bad
>     -strand => -1);
>
> print $seq->end."\n";
>
> -----------------------------
>
> Results:
>
> --------------------- WARNING ---------------------
> MSG: In sequence foo residue count gives end value 28.
> Overriding value [100] with value 28 for Bio::LocatableSeq::end().
> STACK Bio::LocatableSeq::end /Users/cjfields/bioperl/bioperl-live/Bio/LocatableSeq.pm:150
> STACK Bio::LocatableSeq::new /Users/cjfields/bioperl/bioperl-live/Bio/LocatableSeq.pm:103
> STACK toplevel seq.pl:7
> ---------------------------------------------------
> 100
>
> The warning pops up when -end is passed to LocatableSeq::new and
> indicates that the passed coordinate doesn't match up with the one
> calculated from the sequence (minus gaps). I've isolated the bug down
> to the end() method and am working on fixing it. Note that this
> affects LocatableSeq::length as well. This appears to affect arp,
> nexus, stockholm, and a few other AlignIO parsers as well.
>
> chris

-- I saved latin. What did you ever do? -Max Fischer
From cain.cshl at gmail.com Wed Oct 15 15:44:14 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 15 Oct 2008 15:44:14 -0400 Subject: [Bioperl-l] xyplot only from numbers? In-Reply-To: References: Message-ID: <536f21b00810151244x7e758314r6bb9bffc68ef0672@mail.gmail.com> Hi Vesko, Could you be a little more descriptive in what you want to do? I'm guessing you'd like to draw a histogram along a DNA sequence, but you only have coordinates and scores, but no actual DNA (ie, ATGC) sequence? You don't need DNA sequence to use BioGraphics.
Scott

On Wed, Oct 15, 2008 at 1:31 PM, Vesselin Baev wrote:
> Dear All,
> if I have only a bunch on numbers, can I use a xyplot to draw a histogram?
>
> Vesko

-- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research
From Russell.Smithies at agresearch.co.nz Wed Oct 15 15:40:41 2008 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 16 Oct 2008 08:40:41 +1300 Subject: [Bioperl-l] xyplot only from numbers? In-Reply-To: References: Message-ID: Hi Vesko, I managed to figure out how to draw my xyplot with data from an array of quality values and I'm sure you'll be able to adapt this example to suit your needs. The trick is to add one feature for the xyplot, then call add_SeqFeature for each of the values you want to use for the plot.
============================================
my $consensus = $contig->get_consensus_sequence();
my @quality_values = @{$contig->get_consensus_quality()->qual()};

my $consensus_quality_feature = Bio::SeqFeature::Generic->new(
    -start        => $consensus->start,
    -end          => $consensus->end,
    -strand       => $consensus->strand,
    -display_name => $consensus->display_name,
);

# add "subfeatures", one per consensus position.
# Positions are 1-based and the array is 0-based, so position $i + 1
# takes $quality_values[$i] (this assumes the consensus starts at 1).
for my $i (0 .. $#quality_values) {
    $consensus_quality_feature->add_SeqFeature(
        Bio::SeqFeature::Generic->new(
            -start => $i + 1,
            -end   => $i + 1,
            -score => $quality_values[$i],
        )
    );
}

$panel->add_track(
    $consensus_quality_feature,
    -glyph        => 'xyplot',
    -height       => 10,
    -graph_type   => 'boxes',
    -display_name => "Quality",
);
================================================
Russell
From vebaev at gmail.com Wed Oct 15 16:20:06 2008 From: vebaev at gmail.com (Vesselin Baev) Date: Wed, 15 Oct 2008 23:20:06 +0300 Subject: [Bioperl-l] xyplot only from numbers? Message-ID: Hi Scott, I mean I want to draw a methylation pattern from a GFF file. I have only the chromosome number, the coordinates and this GFF file (with methylation from TAIR). I extract the values for my desired region simply by scanning the GFF file with my perl script :) so that, for example, for a 1000nt region I have @met filled with (1.2, 1.5, 8.4, 6.3, 8.3, 0.1, ....and so on ), and I want to add a histogram to my Bio::Graphics panel with my 1000nt ruler on it. I know it is stupid doing it like this, but I was trying like crazy and got nothing :) Thanks!!! Vesko
From nickytong at gmail.com Thu Oct 16 17:28:26 2008 From: nickytong at gmail.com (pan tong) Date: Thu, 16 Oct 2008 16:28:26 -0500 Subject: [Bioperl-l] Enquery about the Bio::Tools::Run::Alignment::Blat module Message-ID: Dear bioperl team, I'm a graduate student. I've just begun using bioperl to conduct my research. I find the modules provided by bioperl very useful. Besides, there are many documents describing each module, which is very helpful. However, when I searched for the Blat module, there seemed to be few examples or documents about it. I have problems while using the blat module.
Here is my code:
----------------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Alignment::Blat;

my $in = Bio::SeqIO->new(-file => '124_Dmel_Enc[1][1].fa.txt', '-format' => 'Fasta');
while ( my $seq_object = $in->next_seq() )
{
    my $seq = $seq_object->seq();
    #print "$seq\n";
    my $factory = Bio::Tools::Alignment::Blat->new();
    my $DB = 'D.melanogaster';
    my @feats = $factory->run($seq_object, $DB);
}
----------------------------------------------------------------
When I execute it, the program exits with "Can't locate object method "new" via package "Bio::Tools::Alignment::Blat"..." Can you help modify my code? Or can you give me a detailed sample of Blat code that I can refer to? By the way, I need to specify the genome as D.melanogaster when I call the blat module. Thank you very much and I look forward to your reply. Yours, Pan -- Department of Quantitative Science M.D. Anderson Cancer Center Houston, Tx, 77054
From hlapp at gmx.net Thu Oct 16 18:12:13 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 16 Oct 2008 18:12:13 -0400 Subject: [Bioperl-l] SeqFeature scores In-Reply-To: <16770479-9818-4EF4-8AB4-7BCECB5FC482@illinois.edu> References: <0FC73752-0290-4937-AE6B-974CA53A7865@gmx.net> <16770479-9818-4EF4-8AB4-7BCECB5FC482@illinois.edu> Message-ID: <4B3C9707-11C8-49AC-834E-1961F5816920@gmx.net>

On Oct 14, 2008, at 1:28 PM, Chris Fields wrote:
> Personally I'm not sure how we'd store score data in BioSQL. Is
> 'score' within the schema? I suppose we could add it as a specific
> tag value but that seems potentially hackish and prone to naming
> conflicts.

Well, yes there might be naming conflicts, if someone wanted to add a tag to a seqfeature called 'score', for example.
But would there be cases where it would make sense to have a value stored in the feature's $feat->score() method *and* a (semantically) different one as the 'score' tag's value? Quite frankly I'd be hard-pressed to come up with a scenario where that might make sense. So as far as I am concerned I wouldn't actually have a problem with changing the implementation of score() to store/pull the value to/from the tag/value hash. In fact, that's what B::SF::Similarity does for the attributes it adds methods for (such as bits, significance, etc). Thoughts? I'm copying this to the Bioperl list as really it is a BioPerl/Bioperl-db issue. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From cjfields at illinois.edu Thu Oct 16 20:32:07 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Oct 2008 19:32:07 -0500 Subject: [Bioperl-l] SeqFeature scores In-Reply-To: <4B3C9707-11C8-49AC-834E-1961F5816920@gmx.net> References: <0FC73752-0290-4937-AE6B-974CA53A7865@gmx.net> <16770479-9818-4EF4-8AB4-7BCECB5FC482@illinois.edu> <4B3C9707-11C8-49AC-834E-1961F5816920@gmx.net> Message-ID: <126B4381-F15B-44FF-B91E-3A19523224B5@illinois.edu>

On Oct 16, 2008, at 5:12 PM, Hilmar Lapp wrote:
>
> On Oct 14, 2008, at 1:28 PM, Chris Fields wrote:
>
>> Personally I'm not sure how we'd store score data in BioSQL. Is
>> 'score' within the schema? I suppose we could add it as a specific
>> tag value but that seems potentially hackish and prone to naming
>> conflicts.
>
> Well, yes there might be naming conflicts, if someone wanted to add
> a tag to a seqfeature called 'score', for example.
>
> But would there be cases where it would make sense to have value
> stored in the feature's $feat->score() method *and* a (semantically)
> different one as the 'score' tag's value? Quite frankly I'd be
> hard-pressed to come up with a scenario where that might make sense.

Agreed. I think it's a very unlikely scenario, frankly, but never hurts to bring it up.

> So as far as I am concerned I wouldn't actually have a problem with
> changing the implementation of score() to store/pull the value
> to/from the tag/value hash. In fact, that's what B::SF::Similarity does
> for the attributes it adds methods for (such as bits, significance,
> etc).
>
> Thoughts? I'm copying this to the Bioperl list as really it is a
> BioPerl/Bioperl-db issue.
>
> -hilmar

Makes sense to me. I think anything else not supported in BioSQL/bioperl-db should likewise be stored in the tag/value hash, but score() is the only one that comes to mind at the moment. I'll make the change once the bug report is filed so we can track any problems I encounter. chris
From cjfields at illinois.edu Thu Oct 16 21:28:57 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Oct 2008 20:28:57 -0500 Subject: [Bioperl-l] significant bug with Bio::LocatableSeq In-Reply-To: <16fd5ab50810151055x1ec23ce9l51961ebb18d118a3@mail.gmail.com> References: <16fd5ab50810151055x1ec23ce9l51961ebb18d118a3@mail.gmail.com> Message-ID: <92992648-4B27-4128-AEA9-0D6B2FB1C9A1@illinois.edu> Michael, You can easily remove the -end call as the end will be calculated from the start position and the number of residues (non-gap positions), though it may be a good idea to initially test passing it in to make sure the gaps are being called correctly. The LocatableSeq bug in question wasn't explicitly checking for a miscalculated end even though it appeared to, so it was particularly nasty as several modules were either passing in the wrong -end or the gap symbols were not correctly set (so LocatableSeq incorrectly checked the ends). There are still a few issues to work out.
Primarily, we need a workaround for LocatableSeqs derived from HSPs where the sequences are translated, so we need to either bypass the end point check or work in a tuple when calculating the end. chris

Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign
From hartzell at alerce.com Thu Oct 16 22:29:32 2008 From: hartzell at alerce.com (George Hartzell) Date: Thu, 16 Oct 2008 19:29:32 -0700 Subject: [Bioperl-l] Enquery about the Bio::Tools::Run::Alignment::Blat module In-Reply-To: References: Message-ID: <1224210572.1404.1.camel@delicious>

On Thu, 2008-10-16 at 16:28 -0500, pan tong wrote:
> use Bio::Tools::Alignment::Blat;
> [...]
> When I excute it, the program exit with "Can't locate object method "new"
> via package "Bio::Tools::Alignment::Blat"..."
> Can you help modify my code?

I think that you should be using Bio::Tools::Run::Alignment::Blat. You'll need to change the use statement at line 5 and the package that you specify in the call to ->new(). I'm kind of surprised that your 'use' statement at line 5 actually works as is. g.
From heikki at sanbi.ac.za Fri Oct 17 03:41:59 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 17 Oct 2008 09:41:59 +0200 Subject: [Bioperl-l] FigTree extensions to nexus Message-ID: <200810170941.59878.heikki@sanbi.ac.za> FigTree is a graphical viewer of phylogenetic trees and a program for producing publication-ready figures written by Andrew Rambaut: http://tree.bio.ed.ac.uk/software/figtree/. I added code to Bio::TreeIO::nexus::write_tree (svn 14935) that adds labels and colors to the output as comments for FigTree to recognise. This code names and colors a cluster in a tree:

my $name = 'ClusterA';
my $color = '#ff0000'; # red
$stem_node->set_tag_value('label', $name);
$stem_node->set_tag_value('color', $color);
foreach my $node ($stem_node->get_all_Descendents) {
    $node->set_tag_value('color', $color);
}

Currently these tags are always printed out. If needed, it will be easy to add a switch to the nexus module to write them out only on request.
Enjoy, -Heikki -- Heikki Lehvaslaiho heikki at_sanbi _ac _za Senior Scientist skype: heikki_lehvaslaiho SANBI, South African National Bioinformatics Institute University of Western Cape, South Africa Phone: +27 21 959 2096 FAX: +27 21 959 2512
From me at hongyu.org Fri Oct 17 17:32:29 2008 From: me at hongyu.org (Hongyu Zhang) Date: Fri, 17 Oct 2008 14:32:29 -0700 (PDT) Subject: [Bioperl-l] contribute a new BioPerl module Message-ID: <683226.53324.qm@web51403.mail.re2.yahoo.com> I have written a new Bioperl module to parse the USPTO (United States Patent & Trademark Office) sequence listing files. I now intend to make it public, and am wondering whom I should contact to add it to the Bioperl archive. Thanks! Best, Hongyu Zhang, Ph.D. Ceres Inc., Thousand Oaks, CA Cell: 805-405-5394 Fax: 866-447-8750
From hartzell at alerce.com Fri Oct 17 18:03:42 2008 From: hartzell at alerce.com (George Hartzell) Date: Fri, 17 Oct 2008 15:03:42 -0700 Subject: [Bioperl-l] sanity check my understanding of bioperl's location terminology? Message-ID: <18681.3006.557134.242948@almost.alerce.com> Hi All, I'm not sure that I'm understanding Bioperl's location terminology and how it's carried through into some basic technology like the LocatableSeqs. Hopefully this is more about communicating in the shared bioperl language and I'm not just demonstrating how much biology I've forgotten. This is being driven by my gmap SearchIO parser (which hopefully will get committed at some point), which currently returns GenericHSPs in GenericHits and from which I can retrieve SimpleAlign's. I'm just not sure that I'm translating from gmap-speak to bioperl-speak correctly (assumptions, it's always assumptions...). There's the basic truism from e.g. Bio::Range:

length = end - start + 1
end >= start
strand = (-1 | 0 | +1)

So if I have seq_id => foo

5' AACTGTTTGG 3'
   1   5    1
            0

so

-start => 3, -end => 6, -strand => +1 would be: CTGT

and

-start => 4, -end => 4, -strand => +1 would be: T

Things get goofier when strand is -1, but I'm pretty confident that one would say

-start => 4, -end => 4, -strand => -1 would be: A

(but I'm worried that I should say it's T, or the reverse complement of T or something complicated) and slightly less confident that one would say

-start => 3, -end => 6, -strand => -1 would be: ACAG

(in other words, always spelling the sequence out 5' to 3' from the reverse strand in that range). Where I really get shaky about how I understand how things are supposed to be said is when LocatableSeqs get involved. If I have a simple align that contains a row w/

-seq_id => foo, -start => 3, -end => 6, -strand => 1

then I think that the row might look like this in an alignment:

moose CTCT
foo   CTGT
bar   C-GT

and that if

moose AGAG
foo   ACAG
bar   AC-G

were the rows in the alignment then its info would be

-seq_id => foo, -start => 3, -end => 6, -strand => -1.

Now here's where I really lose faith in my understanding. There's code in Bio::LocatableSeq::column_from_residue_number (around line 301) that tests the strand and if strand == -1 then it counts back from the -end coordinate towards the -start but if it's +1 then it counts up from 0 towards the -end. That only makes sense if the sequence string that's contained in the locatable seq that's part of the simple alignment is actually the forward strand, in which case the alignment would *look* really goofy visually. If on the other hand the string in the alignment were the 5' to 3' writing of reverse strand (which would look like it aligns with the other strings in the alignment because the bases would match) then the counting seems messed up.
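For readers who want to check the four start/end/strand examples above by hand, here is a minimal sketch in plain Perl with no BioPerl calls. The helper name range_seq is made up for illustration and is not a BioPerl method; the sketch only assumes 1-based inclusive coordinates and that a range on strand -1 is written as the reverse complement of the forward-strand substring, read 5' to 3'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# range_seq is a hypothetical helper, not part of BioPerl.
# It takes a forward-strand sequence plus 1-based, inclusive start/end
# coordinates and a strand, and returns the bases of that range
# written 5' to 3' on the requested strand.
sub range_seq {
    my ($seq, $start, $end, $strand) = @_;
    die "end must be >= start\n" if $end < $start;

    # length = end - start + 1; substr() offsets are 0-based
    my $sub = substr($seq, $start - 1, $end - $start + 1);

    if ($strand == -1) {
        # reverse complement: reverse the string, then swap the bases
        $sub = scalar reverse $sub;
        $sub =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $sub;
}

my $foo = 'AACTGTTTGG';
print range_seq($foo, 3, 6,  1), "\n";   # CTGT
print range_seq($foo, 4, 4,  1), "\n";   # T
print range_seq($foo, 4, 4, -1), "\n";   # A
print range_seq($foo, 3, 6, -1), "\n";   # ACAG
```

The sketch deliberately says nothing about alignment columns or gaps; it only covers the plain range convention.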
If someone wants to round out my understanding and squash any bugs in it, I can try to use it to seed a Location HowTo or something. Thanks, g.
From cjfields at illinois.edu Fri Oct 17 23:00:56 2008 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 17 Oct 2008 22:00:56 -0500 Subject: [Bioperl-l] contribute a new BioPerl module In-Reply-To: <683226.53324.qm@web51403.mail.re2.yahoo.com> References: <683226.53324.qm@web51403.mail.re2.yahoo.com> Message-ID: Hongyu, The best way to submit the module is to add it as an enhancement request to Bugzilla (http://bugzilla.bioperl.org/) along with relevant tests as described in the Writing Tests HOWTO (http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests). One of the core devs will try to vet the module to make sure everything is in place prior to adding it to bioperl-live. chris
From heikki at sanbi.ac.za Sat Oct 18 02:47:52 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sat, 18 Oct 2008 08:47:52 +0200 Subject: [Bioperl-l] sanity check my understanding of bioperl's location terminology? In-Reply-To: <18681.3006.557134.242948@almost.alerce.com> References: <18681.3006.557134.242948@almost.alerce.com> Message-ID: <200810180847.52715.heikki@sanbi.ac.za> George, You've got it all right except the last bit.
Bio::LocatableSeq::column_from_residue_number() is a special case because it's input is in original sequence coordinates, of which the LocatableSeq in question is part of. Look at the tests. I just added a few on column_from_residue_number() that show that if you take a revcom() on a LocatebleSeq the outcome of this method remains the same! The reason is that within a same alignment, a revcomed sequence is not a the same one any more. You can not put it back into the same alignment. The following demonstrates it by taking two sequences that happen to have almost identical in the wrong strand (I hope I did not mess this up by doing it by hand): seq1 ttaccta seq2 atgctat 1234567890123 seq1 --atg---gtaa- -1 seq2 --atg---ctat- 1 is $seq1->column_from_residue_number(5),5; is $seq1->column_from_residue_number(4),9; is $seq2->column_from_residue_number(5),10; is $seq2->column_from_residue_number(4),9; Maybe Ewan can be dragged from his bioperl retirement to point us to an old document somewhere that explains all the logic behind the way strand is used in bioperl? -Heikki On Saturday 18 October 2008 00:03:42 George Hartzell wrote: > Hi All, > > I'm not sure that I'm understanding Bioperl's location terminology and > how it's carried through into some basic technology like the > LocatableSeqs. Hopefully this is more about communicating in the > shared bioperl language and I'm not just demonstrating how much > biology I've forgotten. This is being driven by my gmap SearchIO > parser (which hopefully will get committed at some point), which > currently returns GenericHSPs in GenericHits and from which I can > retrieve SimpleAlign's. I'm just not sure that I'm translating from > gmap-speak to bioperl-speak correctly (assumptions, it's always > assumptions...). > > There's the basic truism from e.g. 
Bio::Range:
>
>   length = end - start + 1
>   end >= start
>   strand = (-1 | 0 | +1)
>
> So if I have seq_id => foo
>
>    5' AACTGTTTGG 3'
>       1   5    1
>                0
>
> so
>
>   -start => 3, -end => 6, -strand => +1 would be: CTGT
>
> and
>
>   -start => 4, -end => 4, -strand => +1 would be: T
>
> Things get goofier when strand is -1, but I'm pretty confident that
> one would say
>
>   -start => 4, -end => 4, -strand => -1 would be: A
>
> (but I'm worried that I should say it's T, or the reverse complement
> of T or something complicated)

You are taking the reverse complement of T, that is A.

> and slightly less confident that one would say
>
>   -start => 3, -end => 6, -strand => -1 would be: ACAG
>
> (in other words, always spelling the sequence out 5' to 3' from the
> reverse strand in that range).

Yes.

> Where I really get shaky about how I understand how things are
> supposed to be said is when LocatableSeqs get involved.
>
> If I have a simple align that contains a row w/
>
>   -seq_id => foo, -start => 3, -end => 6, -strand => 1
>
> then I think that the row might look like this in an alignment:
>
>   moose CTCT
>   foo   CTGT
>   bar   C-GT
>
> and that if
>
>   moose AGAG
>   foo   ACAG
>   bar   AC-G
>
> were the rows in the alignment then its info would be
>
>   -seq_id => foo, -start => 3, -end => 6, -strand => -1.
>
> Now here's where I really lose faith in my understanding. There's
> code in Bio::LocatableSeq::column_from_residue_number (around line
> 301) that tests the strand and
>
>   if strand == -1 then it counts back from the -end coordinate towards
>      the -start
>   but if it's +1 then it counts up from 0 towards the -end.

and it adds one to the final result.

> That only makes sense if the sequence string that's contained in the
> locatable seq that's part of the simple alignment is actually the
> forward strand, in which case the alignment would *look* really goofy
> visually.
If on the other hand the string in the alignment were the > 5' to 3' writing of reverse strand (which would look like it aligns > with the other strings in the alignment because the bases would match) > then the counting seems messed up. > > If someone wants to round out my understanding and squash any bugs in > it, I can try to use it to seed a Location HowTo or something. > > Thanks, > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From vincenza.maselli at gmail.com Tue Oct 21 06:34:21 2008 From: vincenza.maselli at gmail.com (Vincenza Maselli) Date: Tue, 21 Oct 2008 12:34:21 +0200 Subject: [Bioperl-l] problem with Bio::Das::FeatureTypeI::name Message-ID: <1f1de13e0810210334he069c4fh38c3e3a97514fe9@mail.gmail.com> Dear All I tried to write a simple script to write a gff file, but my script dead when try to execute the method _write_feature_3 in the gff.pm module but I got this message: Abstract method "Bio::Das::FeatureTypeI::name" is not implemented by package Bio::DB::GFF::Typename. This is not your fault - author of Bio::DB::GFF::Typename should be blamed! Do you have any suggestion to overcome this problem? Did someone implemented it? Thanks for the help Vincenza ================== START CODE ============================================================================== ! 
/usr/bin/perl

use strict;
use DBI;
use DBD::mysql;
use Data::Dumper;
use Bio::FeatureIO;
use Bio::DB::GFF;
use Bio::DB::GFF::Featname;
use Bio::SeqFeature::Generic;

my $hostname = 'localhost';
my $user = 'user';
my $password = 'passwd';
my $dsn = "dbi:mysql:database=dbname;host=$hostname";
my $dbh = DBI->connect($dsn, $user, $password,{PrintError=>0,RaiseError=>0}) || die "Can't connect to database:$DBI::errstr\n";

my $out = Bio::FeatureIO->new(-file => ">>test.gff" , -format => 'GFF' , -version => 3);

# queries
my $sql = qq{SELECT d.fid, d.fref, d.fstart, d.fstop, d.fbin, d.fscore, d.fstrand,
                    d.fphase, d.ftarget_start, d.ftarget_stop,
                    t.fmethod, t.fsource, g.gid,g.gclass,g.gname
             FROM fdata d, ftype t, fgroup g
             WHERE d.ftypeid = t.ftypeid AND d.gid = g.gid
             LIMIT 1};
my $sth = $dbh->prepare($sql);
$sth->execute;

# create GFF adaptor
my $gff_adaptor = Bio::DB::GFF->new(-dsn => $dsn, -user => $user, -pass => $password);

#initialize variables for feature object
my $factory = $gff_adaptor; #a Bio::DB::GFF adaptor object (or descendent)

while (my $ref = $sth->fetchrow_hashref){
    my $srcseq = "ATGCGGATAGACGATAGCGATAACCTATAGTAGATCCGCTCGATCGTAGC"; #the source sequence
    my $start = $ref->{'fstart'}; #start of this feature
    my $stop = $ref->{'fstop'}; #stop of this feature
    my $method = $ref->{'fmethod'}; #this feature's GFF method
    my $source = $ref->{'fsource'}; #this feature's GFF source
    my $score = $ref->{'fscore'}; #this feature's score
    my $fstrand = $ref->{'fstrand'}; #this feature's strand (relative to the source sequence, which has its own strandedness!)
    my $phase = $ref->{'fphase'}; #this feature's phase
    my $group = Bio::DB::GFF::Featname->new(-class => $ref->{'gclass'},-name => $ref->{'gname'}); #this feature's group
    my $db_id = $ref->{'fid'}; #this feature's internal database ID
    my $group_id = $ref->{'gid'};
    my $tstart = $ref->{'ftarget_start'};
    my $tstop = $ref->{'ftarget_stop'};

    #create feature object
    my $feat = Bio::DB::GFF::Feature->new(
        $factory, $srcseq, $start, $stop, $method, $source, $score,
        $fstrand, $phase, $group, $db_id, $group_id, $tstart, $tstop);

    #write out features
    my $seq_feat = Bio::SeqFeature::Generic->new( -gff3_string => $feat->gff3_string );
    my $annseq = Bio::SeqFeature::Annotated->new(-start => $start, -end => $stop, -phase => $phase);
    $annseq->add_SeqFeature($feat);
    $out->write_feature($annseq);
}

=========================== START RETURN ============================================================
>$ perl create_gff.pl

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::Das::FeatureTypeI::name" is not implemented by package Bio::DB::GFF::Typename.
This is not your fault - author of Bio::DB::GFF::Typename should be blamed!
STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::Root::RootI::throw_not_implemented /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/RootI.pm:680 STACK: Bio::Das::FeatureTypeI::name /usr/local/lib/perl5/site_perl/5.10.0/Bio/Das/FeatureTypeI.pm:142 STACK: Bio::FeatureIO::gff::_write_feature_3 /usr/local/lib/perl5/site_perl/5.10.0/Bio/FeatureIO/gff.pm:884 STACK: Bio::FeatureIO::gff::_write_feature_3 /usr/local/lib/perl5/site_perl/5.10.0/Bio/FeatureIO/gff.pm:934 STACK: Bio::FeatureIO::gff::write_feature /usr/local/lib/perl5/site_perl/5.10.0/Bio/FeatureIO/gff.pm:263 STACK: create_gff.pl:86 ---------------------------------------------------------------- ===================================================================================================== -- Vincenza Maselli Dept. of Soil, Plant, Environmental and Animal Production Sciences University of Naples "Federico II" Via Universita' 100 Parco Gussone - building number 75 "GenoPom" 80055 Portici, Naples, Italy phone: +39-081-2539246 web: http://cab.unina.it From smarkel at accelrys.com Tue Oct 21 16:49:21 2008 From: smarkel at accelrys.com (Scott Markel) Date: Tue, 21 Oct 2008 13:49:21 -0700 Subject: [Bioperl-l] SeqIO-based parser for Vector NTI sequence files Message-ID: <48FE4051.2010700@accelrys.com> I'm looking for a BioPerl-related solution to parsing Vector NTI sequence files. The genbank.pm parser will work, but it doesn't parse the COMMENT lines beyond grabbing the simple string value, so it misses all of the added information in those lines. If you know of any existing code, I'd be interesting in hearing about it. I checked BioPerl, BioJava, and EMBOSS documentation. I also checked the Invitrogen web site. Scott -- Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From David.Messina at sbc.su.se Tue Oct 21 20:21:00 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 21 Oct 2008 20:21:00 -0400 Subject: [Bioperl-l] latest BioPerl hard to find in CPAN? Message-ID: <03E32B10-DE46-4C8D-9BA6-B542E2CA2511@sbc.su.se> Hey everybody, When I looked in CPAN for bioperl 1.5.2 (the latest version available on CPAN), I couldn't find it. cpan[4]> i /bioperl/ Bundle Bundle::BioPerl (CRAFFI/Bundle-BioPerl-2.1.8.tar.gz) Distribution BIRNEY/bioperl-1.2.1.tar.gz Distribution BIRNEY/bioperl-1.2.2.tar.gz Distribution BIRNEY/bioperl-1.2.3.tar.gz Distribution BIRNEY/bioperl-1.2.tar.gz Distribution BIRNEY/bioperl-1.4.tar.gz Distribution BIRNEY/bioperl-db-0.1.tar.gz Distribution BIRNEY/bioperl-ext-1.4.tar.gz Distribution BIRNEY/bioperl-gui-0.7.tar.gz Distribution BIRNEY/bioperl-run-1.2.2.tar.gz Distribution BIRNEY/bioperl-run-1.4.tar.gz Distribution BOZO/Fry-Lib-BioPerl-0.15.tar.gz Distribution CRAFFI/Bundle-BioPerl-2.1.8.tar.gz Module = Bio::LiveSeq::IO::BioPerl (BIRNEY/bioperl-1.2.3.tar.gz) Module Bio::Phylo::Adaptor::Bioperl::Datum (RVOSA/Bio- Phylo-0.17_RC6.tar.gz) Module Bio::Phylo::Adaptor::Bioperl::Matrix (RVOSA/Bio- Phylo-0.17_RC6.tar.gz) Module Bio::Phylo::Adaptor::Bioperl::Node (RVOSA/Bio- Phylo-0.17_RC6.tar.gz) Module Bio::Phylo::Adaptor::Bioperl::Tree (RVOSA/Bio- Phylo-0.17_RC6.tar.gz) Module Fry::Lib::BioPerl (BOZO/Fry-Lib-BioPerl-0.15.tar.gz) Author BIOPERLML ("Bioperl-l" ) 20 items found I happen to know where it is, so I know it's there. 
cpan[7]> ls SENDU
Fetching with LWP:
  ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/CHECKSUMS
Fetching with LWP:
  ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/SE/CHECKSUMS
Fetching with LWP:
  ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/SE/SENDU/CHECKSUMS
 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz


But surely this will hamper others, no?


Dave

From hartzell at alerce.com Tue Oct 21 23:48:09 2008
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 21 Oct 2008 20:48:09 -0700
Subject: [Bioperl-l] sanity check my understanding of bioperl's location terminology?
In-Reply-To: <200810180847.52715.heikki@sanbi.ac.za>
References: <18681.3006.557134.242948@almost.alerce.com>
	<200810180847.52715.heikki@sanbi.ac.za>
Message-ID: <18686.41593.311775.547360@almost.alerce.com>

Heikki Lehvaslaiho writes:
 > George,
 >
 > You've got it all right except the last bit.
 >
 > Bio::LocatableSeq::column_from_residue_number() is a special case because its
 > input is in original sequence coordinates, of which the LocatableSeq in
 > question is a part.
 >
 > Look at the tests. I just added a few on column_from_residue_number() that
 > show that if you take a revcom() on a LocatableSeq the outcome of this method
 > remains the same! The reason is that within the same alignment, a revcomed
 > sequence is not the same one any more. You can not put it back into the same
 > alignment.
 >
 > The following demonstrates it by taking two sequences that happen to be
 > almost identical on the wrong strand (I hope I did not mess this up by doing
 > it by hand):
 >
 > seq1 ttaccta
 > seq2 atgctat

I think that it should be

   seq1 ttaccat

so that it matches the alignment in the example below.
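
The correction can be checked in a few lines of core Perl. What follows is only a sketch, not BioPerl code: revcom() and column_from_residue() are stand-ins written for illustration (the real methods live in Bio::LocatableSeq), and it assumes both rows have -start => 1 and -end => 7. It reverse-complements the ungapped seq1 row and recomputes the four column numbers used in the tests.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (illustrative helper, core Perl only).
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/acgtACGT/tgcaTGCA/;
    return $rc;
}

# Map a residue number, given in original sequence coordinates, to an
# alignment column. On strand -1 the first residue in the row carries the
# -end coordinate, so the counter runs downwards; otherwise it runs
# upwards from -start.
sub column_from_residue {
    my ($row, $start, $end, $strand, $resnumber) = @_;
    die "residue $resnumber is outside $start..$end\n"
        if $resnumber < $start || $resnumber > $end;
    my $current = $strand == -1 ? $end : $start;
    my $step    = $strand == -1 ? -1   : 1;
    my $column  = 0;
    for my $char (split //, $row) {
        $column++;
        next if $char eq '-' || $char eq '.';    # skip gap characters
        return $column if $current == $resnumber;
        $current += $step;
    }
    return;    # residue not present in this row
}

my $seq1_row = '--atg---gtaa-';    # strand -1 in the example
my $seq2_row = '--atg---ctat-';    # strand +1 in the example

(my $seq1_ungapped = $seq1_row) =~ tr/-//d;
print "seq1 on the original strand: ", revcom($seq1_ungapped), "\n";

print "seq1 residue 5 -> column ", column_from_residue($seq1_row, 1, 7, -1, 5), "\n";
print "seq1 residue 4 -> column ", column_from_residue($seq1_row, 1, 7, -1, 4), "\n";
print "seq2 residue 5 -> column ", column_from_residue($seq2_row, 1, 7,  1, 5), "\n";
print "seq2 residue 4 -> column ", column_from_residue($seq2_row, 1, 7,  1, 4), "\n";
```

The reverse complement of the ungapped row comes out as ttaccat, and the columns agree with the four "is" tests.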
 >      1234567890123
 > seq1 --atg---gtaa- -1
 > seq2 --atg---ctat-  1
 >
 >
 > is $seq1->column_from_residue_number(5),5;
 > is $seq1->column_from_residue_number(4),9;
 >
 > is $seq2->column_from_residue_number(5),10;
 > is $seq2->column_from_residue_number(4),9;
 >
 > Maybe Ewan can be dragged from his bioperl retirement to point us to an old
 > document somewhere that explains all the logic behind the way strand is used
 > in bioperl?
 >
 > 	-Heikki
 > [...]

Thanks!

It's great to have it all broken out, written down, and committed to
the tree. It even makes sense, in that way that only things involving
strand can.

g.

From bix at sendu.me.uk Wed Oct 22 02:45:40 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 22 Oct 2008 07:45:40 +0100
Subject: [Bioperl-l] latest BioPerl hard to find in CPAN?
In-Reply-To: <03E32B10-DE46-4C8D-9BA6-B542E2CA2511@sbc.su.se>
References: <03E32B10-DE46-4C8D-9BA6-B542E2CA2511@sbc.su.se>
Message-ID: <48FECC14.8080104@sendu.me.uk>

Dave Messina wrote:
> Hey everybody,
>
> When I looked in CPAN for bioperl 1.5.2 (the latest version available on
> CPAN), I couldn't find it.
>
> cpan[4]> i /bioperl/
[...]
> I happen to know where it is, so I know it's there.
>
> cpan[7]> ls SENDU
> Fetching with LWP:
>   ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/CHECKSUMS
> Fetching with LWP:
>   ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/SE/CHECKSUMS
> Fetching with LWP:
>
> ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/SE/SENDU/CHECKSUMS
>  5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
>   320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
>    99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
>   942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz
>
> But surely this will hamper others, no?

Probably, but that's the way it works with CPAN. Developer releases do
not show up in searches. You have to know the path. I think this is
explained in the installation instructions on the wiki.
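
What marks these tarballs as developer releases is the underscore in the version (1.5.2_102). As a rough illustration, here is a small core-Perl sketch that guesses this from the file name alone. The helper name is_developer_release is made up for this example, and the regex only approximates the naming heuristic that CPAN tooling applies; it does not query CPAN.

```perl
use strict;
use warnings;

# Hypothetical helper: guess from the file name whether a CPAN distribution
# is a developer release, i.e. its version contains an underscore followed
# by a digit (as in 1.5.2_102). This approximates the usual convention only.
sub is_developer_release {
    my ($path) = @_;
    (my $file = $path) =~ s{^.*/}{};          # drop the AUTHOR/ prefix
    $file =~ s/\.(?:tar\.gz|tgz|zip)$//;      # drop the archive suffix
    my ($version) = $file =~ /-([^-]+)$/;     # text after the last hyphen
    return 0 unless defined $version;
    return $version =~ /\d\D\d+_\d/ ? 1 : 0;
}

for my $dist (qw(
    SENDU/bioperl-1.5.2_102.tar.gz
    BIRNEY/bioperl-1.4.tar.gz
    RVOSA/Bio-Phylo-0.17_RC6.tar.gz
)) {
    printf "%-34s %s\n", $dist,
        is_developer_release($dist) ? 'developer release' : 'regular release';
}
```

Under this rule the SENDU tarballs count as developer releases, while Bio-Phylo-0.17_RC6 does not, which fits with it appearing in the "i /bioperl/" listing. A distribution that is hidden this way can still be installed from the cpan shell by giving the full path, e.g. "install SENDU/bioperl-1.5.2_102.tar.gz".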
From heikki at sanbi.ac.za Wed Oct 22 03:54:40 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 22 Oct 2008 09:54:40 +0200 Subject: [Bioperl-l] latest BioPerl hard to find in CPAN? In-Reply-To: <03E32B10-DE46-4C8D-9BA6-B542E2CA2511@sbc.su.se> References: <03E32B10-DE46-4C8D-9BA6-B542E2CA2511@sbc.su.se> Message-ID: <200810220954.40466.heikki@sanbi.ac.za> CPAN web query by typing 'cpan:bioperl' into konqueror that gets expanded to http://search.cpan.org/search?mode=dist&query=bioperl gives bioperl bioperl-1.5.2 as the first one in the list. Luckily, most people looking for a module will be using the web interface. That does not explain why the same query from the command line cpan[7]> d /bioperl/ does not show the latest release! More digging into this is needed. -Heikki On Wednesday 22 October 2008 02:21:00 Dave Messina wrote: > Hey everybody, > > When I looked in CPAN for bioperl 1.5.2 (the latest version available > on CPAN), I couldn't find it. > > cpan[4]> i /bioperl/ > Bundle Bundle::BioPerl (CRAFFI/Bundle-BioPerl-2.1.8.tar.gz) > Distribution BIRNEY/bioperl-1.2.1.tar.gz > Distribution BIRNEY/bioperl-1.2.2.tar.gz > Distribution BIRNEY/bioperl-1.2.3.tar.gz > Distribution BIRNEY/bioperl-1.2.tar.gz > Distribution BIRNEY/bioperl-1.4.tar.gz > Distribution BIRNEY/bioperl-db-0.1.tar.gz > Distribution BIRNEY/bioperl-ext-1.4.tar.gz > Distribution BIRNEY/bioperl-gui-0.7.tar.gz > Distribution BIRNEY/bioperl-run-1.2.2.tar.gz > Distribution BIRNEY/bioperl-run-1.4.tar.gz > Distribution BOZO/Fry-Lib-BioPerl-0.15.tar.gz > Distribution CRAFFI/Bundle-BioPerl-2.1.8.tar.gz > Module = Bio::LiveSeq::IO::BioPerl (BIRNEY/bioperl-1.2.3.tar.gz) > Module Bio::Phylo::Adaptor::Bioperl::Datum (RVOSA/Bio- > Phylo-0.17_RC6.tar.gz) > Module Bio::Phylo::Adaptor::Bioperl::Matrix (RVOSA/Bio- > Phylo-0.17_RC6.tar.gz) > Module Bio::Phylo::Adaptor::Bioperl::Node (RVOSA/Bio- > Phylo-0.17_RC6.tar.gz) > Module Bio::Phylo::Adaptor::Bioperl::Tree (RVOSA/Bio- > Phylo-0.17_RC6.tar.gz) > 
Module Fry::Lib::BioPerl (BOZO/Fry-Lib-BioPerl-0.15.tar.gz) > Author BIOPERLML ("Bioperl-l" ) > 20 items found > > > I happen to know where it is, so I know it's there. > > cpan[7]> ls SENDU > Fetching with LWP: > ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/CHECKSUMS > Fetching with LWP: > ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/SE/CHECKSUMS > Fetching with LWP: > > ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/S/SE/SENDU/CHECKSUMS > 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz > 320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz > 99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz > 942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz > > > But surely this will hamper others, no? > > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From David.Messina at sbc.su.se Wed Oct 22 09:19:07 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 22 Oct 2008 09:19:07 -0400 Subject: [Bioperl-l] latest BioPerl hard to find in CPAN? In-Reply-To: <48FECC14.8080104@sendu.me.uk> References: <03E32B10-DE46-4C8D-9BA6-B542E2CA2511@sbc.su.se> <48FECC14.8080104@sendu.me.uk> Message-ID: <5A3C18F9-920E-4877-B7E8-437A525912DE@sbc.su.se> Oh, that's right, thanks guys. I forgot about the dev release thing. 
YARWWN1.6

Dave

(yet another reason why we need 1.6)

From pabignone at gmail.com Wed Oct 22 10:02:18 2008
From: pabignone at gmail.com (Paola Bignone)
Date: Wed, 22 Oct 2008 15:02:18 +0100
Subject: [Bioperl-l] Run::Primer3 and no primer return
Message-ID: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com>

Dear all,

I am trying to use Primer3 through Bioperl.
I copied the basic script from the Run::Primer3 module documentation, and
although it is reading the sequence from the file, no primer is designed.
If I use that sequence in the web-based interface of Primer3, several primers
are obtained; so there is no problem with the sequence.

It is a very basic problem, but I cannot get this to work.
I will appreciate any help as I'm stuck even before I have started to change
the code to suit my needs.
TIA,
Paola

----
use Bio::Tools::Run::Primer3;
use Bio::SeqIO;

my $seqio=Bio::SeqIO->new(-file=>'data/test.fasta');
my $seq=$seqio->next_seq;

my $primer3 = Bio::Tools::Run::Primer3 -> new(
        -seq => $seq,
        -outfile => 'data/temp.out',
        -path => '/usr/local/pkgbin/primer3_core',
        );

unless ($primer3->executable) {
    print STDERR "primer3 can not be found. Is it installed?\n";
    exit(-1)
}

$results=$primer3->run();
print "There were ", $results->number_of_results, " primers\n";

----
the data/temp.out file is created but it is empty.

From roy.chaudhuri at gmail.com Wed Oct 22 10:56:05 2008
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 22 Oct 2008 15:56:05 +0100
Subject: [Bioperl-l] Run::Primer3 and no primer return
In-Reply-To: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com>
References: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com>
Message-ID: <48FF3F05.4060309@gmail.com>

Hi Paola,

I tried your code using a sequence I had lying around, and it seemed to
work fine, designing 5 primers. I then tried it with a sequence
consisting of just As, so no primers were designed.
However the temp.out file was not empty, it still contained lines with the input data (PRIMER_SEQUENCE_ID and SEQUENCE), so this suggests that Primer3 is not running correctly for you. Perhaps there is a file permissions issue? Is primer3_core executable? I notice that you haven't declared $results using "my". If you're not doing so already, include "use warnings; use strict;" at the top of your program, that might give you some more useful information on why things are going wrong. Roy. -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. Paola Bignone wrote: > Dear all, > > I trying to use Primer3 through Bioperl. > I copied the basic script from the Run::Primer3 documentation module, and > although it is reading the sequence from the file, no primer is designed. > If I use that sequence in the web-base interface of Primer3, several primers > were obtained; so there is no problem with the sequence. > > It is a very basic problem, but I cannot get this to work. > I will appreciate any help as I'm stuck even before I started to change the > code to suit my needs. > TIA, > Paola > > ---- > use Bio::Tools::Run::Primer3; > use Bio::SeqIO; > > my $seqio=Bio::SeqIO->new(-file=>'data/test.fasta'); > my $seq=$seqio->next_seq; > > my $primer3 = Bio::Tools::Run::Primer3 -> new( > -seq => $seq, > -outfile => 'data/temp.out', > -path => '/usr/local/pkgbin/primer3_core', > ); > > unless ($primer3->executable) { > print STDERR "primer3 can not be found. Is it installed?\n"; > exit(-1) > } > > $results=$primer3->run(); > print "There were ", $results->number_of_results, " primers\n"; > > ---- > the data/temp.out file is created but it is empty. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Oct 22 11:58:33 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 22 Oct 2008 11:58:33 -0400 Subject: [Bioperl-l] Run::Primer3 and no primer return In-Reply-To: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> References: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> Message-ID: <1504ABAD-1B26-47D8-8386-444D1E2617AE@sbc.su.se> Ciao Paola, I don't have Primer3 installed locally so I can't run your code -- hopefully someone so equipped will chime in. But one thing I would try if you haven't already is to run your local copy of Primer3 on your test data on the command line outside of Perl to verify that it is working. Even though your script is checking that Primer3 is installed, that's probably just looking for the presence of an executable file at the location you specified. Dave From smarkel at accelrys.com Wed Oct 22 11:57:37 2008 From: smarkel at accelrys.com (Scott Markel) Date: Wed, 22 Oct 2008 08:57:37 -0700 Subject: [Bioperl-l] Run::Primer3 and no primer return In-Reply-To: <48FF3F05.4060309@gmail.com> References: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> <48FF3F05.4060309@gmail.com> Message-ID: <48FF4D71.8040108@accelrys.com> Paola, In addition to the excellent points raised by Roy, I would suggest running the same command line outside of BioPerl. In addition to permissions issues, you'll also be able to track down problems with input files, file locations (missing leading slash), etc. Scott Roy Chaudhuri wrote: > Hi Paola, > > I tried your code using a sequence I had lying around, and it seemed to > work fine, designing 5 primers. I then tried it with a sequence > consisting of just As, so no primers were designed. 
However the temp.out > file was not empty, it still contained lines with the input data > (PRIMER_SEQUENCE_ID and SEQUENCE), so this suggests that Primer3 is not > running correctly for you. Perhaps there is a file permissions issue? Is > primer3_core executable? > > I notice that you haven't declared $results using "my". If you're not > doing so already, include "use warnings; use strict;" at the top of your > program, that might give you some more useful information on why things > are going wrong. > > Roy. > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > Paola Bignone wrote: >> Dear all, >> >> I trying to use Primer3 through Bioperl. >> I copied the basic script from the Run::Primer3 documentation module, and >> although it is reading the sequence from the file, no primer is designed. >> If I use that sequence in the web-base interface of Primer3, several >> primers >> were obtained; so there is no problem with the sequence. >> >> It is a very basic problem, but I cannot get this to work. >> I will appreciate any help as I'm stuck even before I started to >> change the >> code to suit my needs. >> TIA, >> Paola >> >> ---- >> use Bio::Tools::Run::Primer3; >> use Bio::SeqIO; >> >> my $seqio=Bio::SeqIO->new(-file=>'data/test.fasta'); >> my $seq=$seqio->next_seq; >> >> my $primer3 = Bio::Tools::Run::Primer3 -> new( >> -seq => $seq, >> -outfile => 'data/temp.out', >> -path => '/usr/local/pkgbin/primer3_core', >> ); >> >> unless ($primer3->executable) { >> print STDERR "primer3 can not be found. Is it installed?\n"; >> exit(-1) >> } >> >> $results=$primer3->run(); >> print "There were ", $results->number_of_results, " primers\n"; >> >> ---- >> the data/temp.out file is created but it is empty. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From roy.chaudhuri at gmail.com Wed Oct 22 13:43:55 2008 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 22 Oct 2008 18:43:55 +0100 Subject: [Bioperl-l] Run::Primer3 and no primer return In-Reply-To: <40d6e6580810220848ya17bef1ub5c3dc6df5989ae5@mail.gmail.com> References: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> <48FF3A60.7030605@cam.ac.uk> <40d6e6580810220848ya17bef1ub5c3dc6df5989ae5@mail.gmail.com> Message-ID: <48FF665B.7040105@gmail.com> Hi Paola, Please cc the bioperl-l mailing list in your replies, then others can contribute to the discussion. I'm not sure I follow, I think you're saying that primer3_core works when run from the command line with a correctly formatted input file (as suggested by Dave and Scott), but when you run it via your script you get the usage message? That would indicate that Bioperl is finding and running primer3_core, but not producing a valid input file. Looking at the code, it seems that a temporary file is produced for the primer3 input (using tempfile from Bio::Root::IO), so there could be a problem with that. 
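
One way to tell these cases apart is to hand primer3_core an input file written by hand. The lines below are a minimal core-Perl sketch of that: the tag names PRIMER_SEQUENCE_ID and SEQUENCE are the ones seen in temp.out with primer3 release 1.0, the lone "=" ends the record, and the helper name boulder_record and the placeholder sequence are made up for illustration. This is not what Bio::Tools::Run::Primer3 does internally.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: build one minimal Boulder-IO record for primer3_core.
# Each line is TAG=VALUE and a line holding only "=" terminates the record.
sub boulder_record {
    my ($id, $sequence) = @_;
    return "PRIMER_SEQUENCE_ID=$id\nSEQUENCE=$sequence\n=\n";
}

# Placeholder sequence; substitute the real one from data/test.fasta.
my $sequence = 'ACGT' x 50;

# Write the record to a temporary file that is kept after the script exits.
my ($fh, $filename) = tempfile('primer3_input_XXXX', SUFFIX => '.txt', TMPDIR => 1);
print {$fh} boulder_record('test', $sequence);
close $fh or die "Cannot close $filename: $!";

print "Wrote $filename\n";
print "Now try: primer3_core < $filename\n";
```

If primer3_core accepts this file on standard input but the script still only prints the usage message, the problem is more likely in how the input reaches the program than in the program itself.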
You haven't mentioned what BioPerl version you're using, but it'd be a good idea to upgrade to the latest (1.5.2) or even the one from Subversion, since there have been a few bugfixes to Bio::Tools::Run::Primer3 over the years. See: http://www.bioperl.org/wiki/Getting_BioPerl http://www.bioperl.org/wiki/Installing_BioPerl Roy. -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. Paola Bignone wrote: > Hi Roy, > > Thank you for you quick response. > > I normally run eprimer3 within a bash script, but I want to include the > searched primers in my perl script. I assumed that primer3_core was > working. > When I figured out the format of the input file, I managed to run > primer3_core from the command with the test sequence, giving me five > primers pairs. > > The output file is still empty. I have included the "use warnings; use > strict;" at the top of my script. > Even I changed the location of the file, to point to the executable > directly, rather the link. > > Could it be that the perl script is not calling the primer3_core, as > what it return is the usage message ? > USAGE: primer3_core [-format_output] [-2x_compat] [-strict_tags] > This is primer3 (primer3 release 1.0) > Input must be provided on standard input. > For example: > $ primer3_core < my_input_file > There were 0 primers > > I will contact the system administrator, as it seems that it is not > 'bioperl' related, as the script was taken from the module documentation > and it run in your hands. > Thanks again, > Paola > > > On Wed, Oct 22, 2008 at 3:36 PM, Roy Chaudhuri > wrote: > > Hi Paola, > > I tried your code using a sequence I had lying around, and it seemed > to work fine, designing 5 primers. I then tried it with a sequence > consisting of just As, so no primers were designed. 
However the > temp.out file was not empty, it still contained lines with the input > data (PRIMER_SEQUENCE_ID and SEQUENCE), so this suggests that > Primer3 is not running correctly for you. Perhaps there is a file > permissions issue? Is primer3_core executable? > > I notice that you haven't declared $results using "my". If you're > not doing so already, include "use warnings; use strict;" at the top > of your program, that might give you some more useful information on > why things are going wrong. > > Roy. > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > Paola Bignone wrote: > > Dear all, > > I trying to use Primer3 through Bioperl. > I copied the basic script from the Run::Primer3 documentation > module, and > although it is reading the sequence from the file, no primer is > designed. > If I use that sequence in the web-base interface of Primer3, > several primers > were obtained; so there is no problem with the sequence. > > It is a very basic problem, but I cannot get this to work. > I will appreciate any help as I'm stuck even before I started to > change the > code to suit my needs. > TIA, > Paola > > ---- > use Bio::Tools::Run::Primer3; > use Bio::SeqIO; > > my $seqio=Bio::SeqIO->new(-file=>'data/test.fasta'); > my $seq=$seqio->next_seq; > > my $primer3 = Bio::Tools::Run::Primer3 -> new( > -seq => $seq, > -outfile => 'data/temp.out', > -path => '/usr/local/pkgbin/primer3_core', > ); > > unless ($primer3->executable) { > print STDERR "primer3 can not be found. Is it installed?\n"; > exit(-1) > } > > $results=$primer3->run(); > print "There were ", $results->number_of_results, " primers\n"; > > ---- > the data/temp.out file is created but it is empty. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From caroline.johnston at iop.kcl.ac.uk Wed Oct 22 14:07:23 2008 From: caroline.johnston at iop.kcl.ac.uk (Caroline) Date: Wed, 22 Oct 2008 19:07:23 +0100 Subject: [Bioperl-l] Run::Primer3 and no primer return In-Reply-To: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> References: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> Message-ID: <1224698843.6967.38.camel@clive> Hi Paola, Your script seems to work for me. Do you want to chuck me a copy of your test sequence and I'll try it with that? Maybe the default primer3 settings in Bioperl are different to those used by the web interface? Cheers, Cass xx On Wed, 2008-10-22 at 15:02 +0100, Paola Bignone wrote: > > use Bio::Tools::Run::Primer3; > use Bio::SeqIO; > > my $seqio=Bio::SeqIO->new(-file=>'data/test.fasta'); > my $seq=$seqio->next_seq; > > my $primer3 = Bio::Tools::Run::Primer3 -> new( > -seq => $seq, > -outfile => 'data/temp.out', > -path => '/usr/local/pkgbin/primer3_core', > ); > > unless ($primer3->executable) { > print STDERR "primer3 can not be found. Is it installed?\n"; > exit(-1) > } > > $results=$primer3->run(); > print "There were ", $results->number_of_results, " primers\n"; > From davila at ioc.fiocruz.br Wed Oct 22 14:22:44 2008 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Wed, 22 Oct 2008 16:22:44 -0200 Subject: [Bioperl-l] Locus Tag vs Accession number mappings Message-ID: <48FF6F74.5000002@ioc.fiocruz.br> Dear colleagues, I wonder to know if there would be a way to use bioperl to generate a mapping of the NCBI Locus Tag ID (eg: MSMEG_2393, TA21330) to GenBank Accession Number (eg: AI568267,CR940347) or RefSeq accession number (eg: XM_949332.1) ? What would be the easiest way to do that ? I just asked NCBI-HelpDesk about this. 
Thanks, Alberto

From jason at bioperl.org Wed Oct 22 16:20:27 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 22 Oct 2008 16:20:27 -0400
Subject: [Bioperl-l] multi ID indexing Bio::DB::Fasta
Message-ID: <82C6946D-2BDD-4237-A1CB-4910DD8C2614@bioperl.org>

Any reason I should avoid making this change which allows a custom
makeid to return a list of IDs that can be indexed per sequence?
There is the issue that they need to be unique but that is always the
case. DBFasta tests pass fine after the change.

This would allow indexing by GI and Accession and LOCUS all from the
same file. This is the behavior for Bio::Index::Fasta and other
Bio::Index not sure if there was any reason to not support it in
Bio::DB::Fasta.

Basically it is as follows. I did some code indenting so there are
actually less changes than showing when I do a diff.

-    if ($id) {
-        my $seqlength = $pos - $offset - length($_);
-        $seqlength -= $termination_length * $seq_lines;
-        $offsets->{$id} = &{$self->{packmeth}}($offset,$seqlength,
-                                               $linelength,$firstline,
-                                               $type,$base);
-    }
-    $id = ref($self->{makeid}) eq 'CODE' ? $self->{makeid}->($_) : $1;

+    if (@id) {
+        my $seqlength = $pos - $offset - length($_);
+        $seqlength -= $termination_length * $seq_lines;
+        my $ppos = &{$self->{packmeth}}($offset,$seqlength,
+                                        $linelength,$firstline,
+                                        $type,$base);
+        for my $id (@id) { $offsets->{$id} = $ppos }
+    }
+    @id = ref($self->{makeid}) eq 'CODE' ? $self->{makeid}->($_) : $1;

-jason
--
Jason Stajich
jason at bioperl.org

From cjfields at illinois.edu Wed Oct 22 16:54:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 22 Oct 2008 15:54:42 -0500
Subject: [Bioperl-l] Locus Tag vs Accession number mappings
In-Reply-To: <48FF6F74.5000002@ioc.fiocruz.br>
References: <48FF6F74.5000002@ioc.fiocruz.br>
Message-ID:

On Oct 22, 2008, at 1:22 PM, Alberto Davila wrote:

> Dear colleagues,
>
> I wonder to know if there would be a way to use bioperl to generate
> a mapping of the NCBI Locus Tag ID (eg: MSMEG_2393, TA21330) to
> GenBank Accession Number (eg: AI568267,CR940347) or RefSeq accession
> number (eg: XM_949332.1) ?
>
> What would be the easiest way to do that ?
>
> I just asked NCBI-HelpDesk about this.
>
> Thanks, Alberto

For small lists (<500) you can query the nucleotide database directly
(you can add 'srcdb refseq[properties]' to the search term to limit to
just RefSeq):

-------------------------------------
use Bio::DB::EUtilities;

my @ids = qw(MSMEG_2393 TA21330);

my $term = join(' OR ',map {$_."[GENE]"} @ids);

my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -db    => 'nucleotide',
                                     -term  => $term);
my @uids = $eutil->get_ids;

$eutil->set_parameters(-eutil => 'esummary',-id => \@uids);
$eutil->print_DocSums;
-------------------------------------

You can 'epost' in increments if you have more IDs, up to 1000-2000 I
think. Beyond that, you should probably use one of the mapping files
located in the ftp.ncbi.nih.gov/gene/DATA folder and just use it
locally (initially index the data with DB_File, search using a tied
hash, etc).

chris

From hlapp at gmx.net Wed Oct 22 20:58:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 23 Oct 2008 08:58:47 +0800
Subject: [Bioperl-l] multi ID indexing Bio::DB::Fasta
In-Reply-To: <82C6946D-2BDD-4237-A1CB-4910DD8C2614@bioperl.org>
References: <82C6946D-2BDD-4237-A1CB-4910DD8C2614@bioperl.org>
Message-ID:

Sounds great to me.
I understand that this wouldn't *require* makeid to return a list, right? -hilmar On Oct 23, 2008, at 4:20 AM, Jason Stajich wrote: > Any reason I should avoid making this change which allows a custom > makeid to return a list of IDs that can be indexed per sequence? > There is the issue that they need to be unique but that is always > the case. DBFasta tests pass fine after the change. > > This would allow indexing by GI and Accession and LOCUS all from the > same file. This is the behavior for Bio::Index::Fasta and other > Bio::Index not sure if there was any reason to not support it in > Bio::DB::Fasta. > > Basically it is as follows. I did some code indenting so there are > actually less changes than showing when I do a diff. > > - if ($id) { > - my $seqlength = $pos - $offset - length($_); > - $seqlength -= $termination_length * $seq_lines; > - $offsets->{$id} = &{$self->{packmeth}}($offset,$seqlength, > - $linelength,$firstline, > - $type,$base); > - } > - $id = ref($self->{makeid}) eq 'CODE' ? $self->{makeid}->($_) : > $1; > > > + if (@id) { > + my $seqlength = $pos - $offset - length($_); > + $seqlength -= $termination_length * $seq_lines; > + my $ppos = &{$self->{packmeth}}($offset,$seqlength, > + $linelength,$firstline, > + $type,$base); > + for my $id (@id) { $offsets->{$id} = $ppos } > + } > + @id = ref($self->{makeid}) eq 'CODE' ? 
$self->{makeid}- > >($_) : $1; > > > > -jason > -- > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Thu Oct 23 01:04:08 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Oct 2008 01:04:08 -0400 Subject: [Bioperl-l] multi ID indexing Bio::DB::Fasta In-Reply-To: References: <82C6946D-2BDD-4237-A1CB-4910DD8C2614@bioperl.org> Message-ID: <840CE87D-8FA5-476C-8700-2B0D7D9523EB@bioperl.org> doesn't matter, it is all in list context so single-item returned is just fine. -jason On Oct 22, 2008, at 8:58 PM, Hilmar Lapp wrote: > Sounds great to me. I understand that this wouldn't *require* makeid > to return a list, right? > > -hilmar > > On Oct 23, 2008, at 4:20 AM, Jason Stajich wrote: > >> Any reason I should avoid making this change which allows a custom >> makeid to return a list of IDs that can be indexed per sequence? >> There is the issue that they need to be unique but that is always >> the case. DBFasta tests pass fine after the change. >> >> This would allow indexing by GI and Accession and LOCUS all from >> the same file. This is the behavior for Bio::Index::Fasta and >> other Bio::Index not sure if there was any reason to not support it >> in Bio::DB::Fasta. >> >> Basically it is as follows. I did some code indenting so there are >> actually less changes than showing when I do a diff. >> >> - if ($id) { >> - my $seqlength = $pos - $offset - length($_); >> - $seqlength -= $termination_length * $seq_lines; >> - $offsets->{$id} = &{$self->{packmeth}}($offset,$seqlength, >> - $linelength,$firstline, >> - $type,$base); >> - } >> - $id = ref($self->{makeid}) eq 'CODE' ? 
$self->{makeid}->($_) : >> $1; >> >> >> + if (@id) { >> + my $seqlength = $pos - $offset - length($_); >> + $seqlength -= $termination_length * $seq_lines; >> + my $ppos = &{$self->{packmeth}}($offset,$seqlength, >> + $linelength,$firstline, >> + $type,$base); >> + for my $id (@id) { $offsets->{$id} = $ppos } >> + } >> + @id = ref($self->{makeid}) eq 'CODE' ? $self->{makeid}- >> >($_) : $1; >> >> >> >> -jason >> -- >> Jason Stajich >> jason at bioperl.org >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > Jason Stajich jason at bioperl.org From lmanchon at univ-montp2.fr Thu Oct 23 03:04:15 2008 From: lmanchon at univ-montp2.fr (Laurent Manchon) Date: Thu, 23 Oct 2008 09:04:15 +0200 Subject: [Bioperl-l] method never found in esearch.pm package Message-ID: <5.0.2.1.2.20081023090125.00c4cba8@pop.univ-montp2.fr> Hi, your code below posted yesterday returns me this error: Can't locate object method "set_parameters" via package "Bio::DB::EUtilities::esearch" at line 20 i don't know why ! 
#!/usr/bin/perl

use Bio::DB::EUtilities;

my @ids = qw(MSMEG_2393 TA21330);

my $term = join(' OR ',map {$_."[GENE]"} @ids);

my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -db    => 'nucleotide',
                                     -term  => $term);
my @uids = $eutil->get_ids;

$eutil->set_parameters(-eutil => 'esummary',-id => \@uids);
$eutil->print_DocSums;

+---------------------------------------------+
Laurent Manchon
Email: lmanchon at univ-montp2.fr
+---------------------------------------------+

From cjfields at illinois.edu Thu Oct 23 08:25:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 23 Oct 2008 07:25:42 -0500
Subject: [Bioperl-l] method never found in esearch.pm package
In-Reply-To: <5.0.2.1.2.20081023090125.00c4cba8@pop.univ-montp2.fr>
References: <5.0.2.1.2.20081023090125.00c4cba8@pop.univ-montp2.fr>
Message-ID:

On Oct 23, 2008, at 2:04 AM, Laurent Manchon wrote:

> Hi,
>
> your code below posted yesterday returns me this error:
> Can't locate object method "set_parameters" via package
> "Bio::DB::EUtilities::esearch" at line 20
> i don't know why !
>
> #!/usr/bin/perl
>
> use Bio::DB::EUtilities;
>
> my @ids = qw(MSMEG_2393 TA21330);
>
> my $term = join(' OR ',map {$_."[GENE]"} @ids);
>
> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                      -db    => 'nucleotide',
>                                      -term  => $term);
> my @uids = $eutil->get_ids;
>
> $eutil->set_parameters(-eutil => 'esummary',-id => \@uids);
> $eutil->print_DocSums;

You'll need the latest code from subversion; EUtilities has been
extensively revised since the last developer release.

chris

From lstein at cshl.edu Thu Oct 23 09:38:15 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 23 Oct 2008 09:38:15 -0400
Subject: [Bioperl-l] multi ID indexing Bio::DB::Fasta
In-Reply-To: <82C6946D-2BDD-4237-A1CB-4910DD8C2614@bioperl.org>
References: <82C6946D-2BDD-4237-A1CB-4910DD8C2614@bioperl.org>
Message-ID: <6dce9a0b0810230638u282e4035if827f775a47010ac@mail.gmail.com>

I don't see a problem with this, provided all IDs are unique.
Lincoln On Wed, Oct 22, 2008 at 4:20 PM, Jason Stajich wrote: > Any reason I should avoid making this change which allows a custom makeid > to return a list of IDs that can be indexed per sequence? There is the > issue that they need to be unique but that is always the case. DBFasta > tests pass fine after the change. > > This would allow indexing by GI and Accession and LOCUS all from the same > file. This is the behavior for Bio::Index::Fasta and other Bio::Index not > sure if there was any reason to not support it in Bio::DB::Fasta. > > Basically it is as follows. I did some code indenting so there are > actually less changes than showing when I do a diff. > > - if ($id) { > - my $seqlength = $pos - $offset - length($_); > - $seqlength -= $termination_length * > $seq_lines; > - $offsets->{$id} = > &{$self->{packmeth}}($offset,$seqlength, > - > > $linelength,$firstline, > - > $type,$base); > - } > - $id = ref($self->{makeid}) eq 'CODE' ? > $self->{makeid}->($_) : $1; > > > + if (@id) { > + my $seqlength = $pos - $offset - length($_); > + $seqlength -= $termination_length * $seq_lines; > + my $ppos = &{$self->{packmeth}}($offset,$seqlength, > + $linelength,$firstline, > + $type,$base); > + for my $id (@id) { $offsets->{$id} = $ppos } > + } > + @id = ref($self->{makeid}) eq 'CODE' ? $self->{makeid}->($_) : $1; > > > > -jason > -- > Jason Stajich > jason at bioperl.org > > > > -- Lincoln D. Stein Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Stacey Quinn Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 USA (516) 367-8380 Assistant: Sandra Michelsen From briano at bioteam.net Thu Oct 23 12:22:09 2008 From: briano at bioteam.net (Brian Osborne) Date: Thu, 23 Oct 2008 12:22:09 -0400 Subject: [Bioperl-l] Fast BLAST parsing Message-ID: Bioperl, I'm not familiar with the very latest and greatest in BLAST parsing, perhaps you can help me here. 
I have a large Blast output file, it has multiple results in it. I'd
like to rapidly find the relevant result for a given query name, I
don't want to iterate over the results checking for query_name() each
time. How can I directly pull out a result using query name?

Thanks again,

Brian O.

--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

From cjfields at illinois.edu Thu Oct 23 13:09:04 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 23 Oct 2008 12:09:04 -0500
Subject: [Bioperl-l] Fast BLAST parsing
In-Reply-To:
References:
Message-ID:

Could you index the BLAST report using Bio::Index::Blast? From the
synopsis:

use Bio::Index::Blast;
my ($indexfile,$file1,$file2,$query);
my $index = Bio::Index::Blast->new(-filename => $indexfile,
                                   -write_flag => 1);
$index->make_index($file1,$file2);

my $data = $index->get_stream($query);

my $blast_report = $index->fetch_report($query);
print "query is ", $blast_report->query, "\n";
while ( my $result = $blast_report->next_result ) {
    print $result->algorithm, "\n";
    while ( my $hsp = $result->next_hit ) {
        print "\t name ", $hsp->name,
    }
    print "\n";
}

I think you can index using a callback on the query name (so you can
look up by various means).

chris

On Oct 23, 2008, at 11:22 AM, Brian Osborne wrote:

> Bioperl,
>
> I'm not familiar with the very latest and greatest in BLAST parsing,
> perhaps you can help me here. I have a large Blast output file, it
> has multiple results in it. I'd like to rapidly find the relevant
> result for a given query name, I don't want to iterate over the
> results checking for query_name() each time. How can I directly pull
> out a result using query name?
>
>
> Thanks again,
>
> Brian O.
> -- > Brian Osborne, PhD > BioTeam: http://bioteam.net > email: briano at bioteam.net > mobile: 978-317-3101 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jason at bioperl.org Thu Oct 23 15:29:20 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Oct 2008 15:29:20 -0400 Subject: [Bioperl-l] SimpleAlign - get_seq_by_id Message-ID: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a particular sequence from the alignment by ID. Not sure why this didn't exist before. -jason -- Jason Stajich jason at bioperl.org From jcorbo at pathology.wustl.edu Thu Oct 23 16:47:19 2008 From: jcorbo at pathology.wustl.edu (Corbo, Joseph) Date: Thu, 23 Oct 2008 15:47:19 -0500 Subject: [Bioperl-l] Need help installing Bioperl on Windows Message-ID: Greetings. I am trying to install Bioperl on my Windows XP machine and am having a problem I would greatly appreciate some help with. I am following the directions for installing Bioperl as given on the Bioperl website page "Installing Bioperl on Windows" under the section "Installation using the Perl Package Manager". I added the three new repositories (http://bioperl.org/DIST/ etc.) as they instruct. 
Then when I try to install Bioperl either using the GUI installation or the command line sequence "ppm-shell, search bioperl, install [number]" I get the following error message which I don't know how to fix: WARNING: Can't find any package that provides DB_File:: for Bundle-BioPerl-Core bioperl depends on Bundle-BioPerl-Core bioperl depends on SVG bioperl depends on GD-SVG bioperl depends on Spreadsheet-ParseExcel bioperl depends on XML-SAX bioperl depends on AcePerl bioperl depends on XML-SAX-ExpatXS bioperl depends on SOAP-Lite bioperl depends on SVG-Graph bioperl depends on Bio-ASN1-EntrezGene bioperl depends on XML-XPath bioperl depends on Convert-Binary-C bioperl depends on XML-Twig bioperl depends on Set-Scalar bioperl depends on Text-Shellwords bioperl depends on Data-Stag bioperl depends on libxml-perl bioperl depends on XML-Writer bioperl depends on Graph bioperl depends on Class-AutoClass bioperl depends on Clone bioperl depends on XML-DOM-XPath bioperl depends on IO-stringy bioperl depends on OLE-Storage_Lite bioperl depends on XML-NamespaceSupport bioperl depends on Cache-Cache bioperl depends on MIME-Lite bioperl depends on Math-Derivative bioperl depends on Math-Spline bioperl depends on Statistics-Descriptive bioperl depends on Tree-DAG_Node bioperl depends on Heap bioperl depends on Test-Deep bioperl depends on XML-DOM bioperl depends on XML-XPathEngine bioperl depends on Error bioperl depends on Email-Date-Format bioperl depends on MIME-Types bioperl depends on MailTools bioperl depends on Test-Tester bioperl depends on Test-NoWarnings bioperl depends on XML-RegExp bioperl depends on Test-Pod bioperl depends on TimeDate Any thoughts on how to fix this? Thanks, Joe Corbo
From cain.cshl at gmail.com Thu Oct 23 16:55:08 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 23 Oct 2008 16:55:08 -0400 Subject: [Bioperl-l] Need help installing Bioperl on Windows In-Reply-To: References: Message-ID: <536f21b00810231355i7a111337t5d9579a0c3fb0b1e@mail.gmail.com> Hi Joe, I just ran into the same problem when installing GBrowse on Windows. You need to add yet another repository: tcool http://ppm.tcool.org/archives/ which provides DB_File. Scott On Thu, Oct 23, 2008 at 4:47 PM, Corbo, Joseph wrote: > Greetings. I am trying to install Bioperl on my Windows XP machine and > am having a problem I would greatly appreciate some help with. I am > following the directions for installing Bioperl as given on the Bioperl > website page "Installing Bioperl on Windows" under the section > "Installation using the Perl Package Manager". I added the three new > repositories (http://bioperl.org/DIST/ etc.) as they instruct.
Then > when I try to install Bioperl either using the GUI installation or the > command line sequence "ppm-shell, search bioperl, install [number]" I > get the following error message which I don't know how to fix: > > > > WARNING: Can't find any package that provides DB_File:: for > Bundle-BioPerl-Core > > > > bioperl depends on Bundle-BioPerl-Core > > bioperl depends on SVG > > bioperl depends on GD-SVG > > bioperl depends on Spreadsheet-ParseExcel > > bioperl depends on XML-SAX > > bioperl depends on AcePerl > > bioperl depends on XML-SAX-ExpatXS > > bioperl depends on SOAP-Lite > > bioperl depends on SVG-Graph > > bioperl depends on Bio-ASN1-EntrezGene > > bioperl depends on XML-XPath > > bioperl depends on Convert-Binary-C > > bioperl depends on XML-Twig > > bioperl depends on Set-Scalar > > bioperl depends on Text-Shellwords > > bioperl depends on Data-Stag > > bioperl depends on libxml-perl > > bioperl depends on XML-Writer > > bioperl depends on Graph > > bioperl depends on Class-AutoClass > > bioperl depends on Clone > > bioperl depends on XML-DOM-XPath > > bioperl depends on IO-stringy > > bioperl depends on OLE-Storage_Lite > > bioperl depends on XML-NamespaceSupport > > bioperl depends on Cache-Cache > > bioperl depends on MIME-Lite > > bioperl depends on Math-Derivative > > bioperl depends on Math-Spline > > bioperl depends on Statistics-Descriptive > > bioperl depends on Tree-DAG_Node > > bioperl depends on Heap > > bioperl depends on Test-Deep > > bioperl depends on XML-DOM > > bioperl depends on XML-XPathEngine > > bioperl depends on Error > > bioperl depends on Email-Date-Format > > bioperl depends on MIME-Types > > bioperl depends on MailTools > > bioperl depends on Test-Tester > > bioperl depends on Test-NoWarnings > > bioperl depends on XML-RegExp > > bioperl depends on Test-Pod > > bioperl depends on TimeDate > > > > > > Any thoughts on how to fix this? 
Thanks, Joe Corbo > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From jcorbo at pathology.wustl.edu Thu Oct 23 17:17:19 2008 From: jcorbo at pathology.wustl.edu (Corbo, Joseph) Date: Thu, 23 Oct 2008 16:17:19 -0500 Subject: [Bioperl-l] Need help installing Bioperl on Windows In-Reply-To: <536f21b00810231355i7a111337t5d9579a0c3fb0b1e@mail.gmail.com> References: <536f21b00810231355i7a111337t5d9579a0c3fb0b1e@mail.gmail.com> Message-ID: Unfortunately, I tried the installation again after adding the tcool repository as Scott suggested, and I got exactly the same error message. Anybody have another idea? Thanks, Joe -----Original Message----- From: Scott Cain [mailto:cain.cshl at gmail.com] Sent: Thursday, October 23, 2008 3:55 PM To: Corbo, Joseph Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Need help installing Bioperl on Windows Hi Joe, I just ran into the same problem when installing GBrowse on Windows. You need to add yet another repository: tcool http://ppm.tcool.org/archives/ which provides DB_File. Scott On Thu, Oct 23, 2008 at 4:47 PM, Corbo, Joseph wrote: > Greetings. I am trying to install Bioperl on my Windows XP machine and > am having a problem I would greatly appreciate some help with. I am > following the directions for installing Bioperl as given on the Bioperl > website page "Installing Bioperl on Windows" under the section > "Installation using the Perl Package Manager". I added the three new > repositories (http://bioperl.org/DIST/ etc.) as they instruct. 
Then > when I try to install Bioperl either using the GUI installation or the > command line sequence "ppm-shell, search bioperl, install [number]" I > get the following error message which I don't know how to fix: > > > > WARNING: Can't find any package that provides DB_File:: for > Bundle-BioPerl-Core > > > > bioperl depends on Bundle-BioPerl-Core > > bioperl depends on SVG > > bioperl depends on GD-SVG > > bioperl depends on Spreadsheet-ParseExcel > > bioperl depends on XML-SAX > > bioperl depends on AcePerl > > bioperl depends on XML-SAX-ExpatXS > > bioperl depends on SOAP-Lite > > bioperl depends on SVG-Graph > > bioperl depends on Bio-ASN1-EntrezGene > > bioperl depends on XML-XPath > > bioperl depends on Convert-Binary-C > > bioperl depends on XML-Twig > > bioperl depends on Set-Scalar > > bioperl depends on Text-Shellwords > > bioperl depends on Data-Stag > > bioperl depends on libxml-perl > > bioperl depends on XML-Writer > > bioperl depends on Graph > > bioperl depends on Class-AutoClass > > bioperl depends on Clone > > bioperl depends on XML-DOM-XPath > > bioperl depends on IO-stringy > > bioperl depends on OLE-Storage_Lite > > bioperl depends on XML-NamespaceSupport > > bioperl depends on Cache-Cache > > bioperl depends on MIME-Lite > > bioperl depends on Math-Derivative > > bioperl depends on Math-Spline > > bioperl depends on Statistics-Descriptive > > bioperl depends on Tree-DAG_Node > > bioperl depends on Heap > > bioperl depends on Test-Deep > > bioperl depends on XML-DOM > > bioperl depends on XML-XPathEngine > > bioperl depends on Error > > bioperl depends on Email-Date-Format > > bioperl depends on MIME-Types > > bioperl depends on MailTools > > bioperl depends on Test-Tester > > bioperl depends on Test-NoWarnings > > bioperl depends on XML-RegExp > > bioperl depends on Test-Pod > > bioperl depends on TimeDate > > > > > > Any thoughts on how to fix this? 
Thanks, Joe Corbo > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cain.cshl at gmail.com Thu Oct 23 17:21:29 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 23 Oct 2008 17:21:29 -0400 Subject: [Bioperl-l] Need help installing Bioperl on Windows In-Reply-To: References: <536f21b00810231355i7a111337t5d9579a0c3fb0b1e@mail.gmail.com> Message-ID: <536f21b00810231421j4b80221eh2edccceae3886418@mail.gmail.com> Ah, I forgot to ask: are you using perl 5.10? I've found that to be completely broken for ActiveState's perl (other perl 5.10 builds seem to be fine). I would suggest you use perl 5.8. Scott On Thu, Oct 23, 2008 at 5:17 PM, Corbo, Joseph wrote: > Unfortunately, I tried the installation again after adding the tcool > repository as Scott suggested, and I got exactly the same error message. > Anybody have another idea? Thanks, Joe > > > -----Original Message----- > From: Scott Cain [mailto:cain.cshl at gmail.com] > Sent: Thursday, October 23, 2008 3:55 PM > To: Corbo, Joseph > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Need help installing Bioperl on Windows > > Hi Joe, > > I just ran into the same problem when installing GBrowse on Windows. > You need to add yet another repository: tcool > http://ppm.tcool.org/archives/ which provides DB_File. > > Scott > > > On Thu, Oct 23, 2008 at 4:47 PM, Corbo, Joseph > wrote: >> Greetings. I am trying to install Bioperl on my Windows XP machine > and >> am having a problem I would greatly appreciate some help with. 
I am >> following the directions for installing Bioperl as given on the > Bioperl >> website page "Installing Bioperl on Windows" under the section >> "Installation using the Perl Package Manager". I added the three new >> repositories (http://bioperl.org/DIST/ etc.) as they instruct. Then >> when I try to install Bioperl either using the GUI installation or the >> command line sequence "ppm-shell, search bioperl, install [number]" I >> get the following error message which I don't know how to fix: >> >> >> >> WARNING: Can't find any package that provides DB_File:: for >> Bundle-BioPerl-Core >> >> >> >> bioperl depends on Bundle-BioPerl-Core >> >> bioperl depends on SVG >> >> bioperl depends on GD-SVG >> >> bioperl depends on Spreadsheet-ParseExcel >> >> bioperl depends on XML-SAX >> >> bioperl depends on AcePerl >> >> bioperl depends on XML-SAX-ExpatXS >> >> bioperl depends on SOAP-Lite >> >> bioperl depends on SVG-Graph >> >> bioperl depends on Bio-ASN1-EntrezGene >> >> bioperl depends on XML-XPath >> >> bioperl depends on Convert-Binary-C >> >> bioperl depends on XML-Twig >> >> bioperl depends on Set-Scalar >> >> bioperl depends on Text-Shellwords >> >> bioperl depends on Data-Stag >> >> bioperl depends on libxml-perl >> >> bioperl depends on XML-Writer >> >> bioperl depends on Graph >> >> bioperl depends on Class-AutoClass >> >> bioperl depends on Clone >> >> bioperl depends on XML-DOM-XPath >> >> bioperl depends on IO-stringy >> >> bioperl depends on OLE-Storage_Lite >> >> bioperl depends on XML-NamespaceSupport >> >> bioperl depends on Cache-Cache >> >> bioperl depends on MIME-Lite >> >> bioperl depends on Math-Derivative >> >> bioperl depends on Math-Spline >> >> bioperl depends on Statistics-Descriptive >> >> bioperl depends on Tree-DAG_Node >> >> bioperl depends on Heap >> >> bioperl depends on Test-Deep >> >> bioperl depends on XML-DOM >> >> bioperl depends on XML-XPathEngine >> >> bioperl depends on Error >> >> bioperl depends on Email-Date-Format >> >> 
bioperl depends on MIME-Types >> >> bioperl depends on MailTools >> >> bioperl depends on Test-Tester >> >> bioperl depends on Test-NoWarnings >> >> bioperl depends on XML-RegExp >> >> bioperl depends on Test-Pod >> >> bioperl depends on TimeDate >> >> >> >> >> >> Any thoughts on how to fix this? Thanks, Joe Corbo >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From Kevin.M.Brown at asu.edu Thu Oct 23 18:51:31 2008 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 23 Oct 2008 15:51:31 -0700 Subject: [Bioperl-l] Need help installing Bioperl on Windows In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B4056AB5CC@EX02.asurite.ad.asu.edu> Which version of Perl did you install from Activestate? I have 5.8.8.820 and DB_File is provided by the Activestate repository. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Corbo, Joseph > Sent: Thursday, October 23, 2008 1:47 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Need help installing Bioperl on Windows > > Greetings. I am trying to install Bioperl on my Windows XP > machine and > am having a problem I would greatly appreciate some help with. 
I am > following the directions for installing Bioperl as given on > the Bioperl > website page "Installing Bioperl on Windows" under the section > "Installation using the Perl Package Manager". I added the three new > repositories (http://bioperl.org/DIST/ etc.) as they instruct. Then > when I try to install Bioperl either using the GUI installation or the > command line sequence "ppm-shell, search bioperl, install [number]" I > get the following error message which I don't know how to fix: > > > > WARNING: Can't find any package that provides DB_File:: for > Bundle-BioPerl-Core > > > > bioperl depends on Bundle-BioPerl-Core > > bioperl depends on SVG > > bioperl depends on GD-SVG > > bioperl depends on Spreadsheet-ParseExcel > > bioperl depends on XML-SAX > > bioperl depends on AcePerl > > bioperl depends on XML-SAX-ExpatXS > > bioperl depends on SOAP-Lite > > bioperl depends on SVG-Graph > > bioperl depends on Bio-ASN1-EntrezGene > > bioperl depends on XML-XPath > > bioperl depends on Convert-Binary-C > > bioperl depends on XML-Twig > > bioperl depends on Set-Scalar > > bioperl depends on Text-Shellwords > > bioperl depends on Data-Stag > > bioperl depends on libxml-perl > > bioperl depends on XML-Writer > > bioperl depends on Graph > > bioperl depends on Class-AutoClass > > bioperl depends on Clone > > bioperl depends on XML-DOM-XPath > > bioperl depends on IO-stringy > > bioperl depends on OLE-Storage_Lite > > bioperl depends on XML-NamespaceSupport > > bioperl depends on Cache-Cache > > bioperl depends on MIME-Lite > > bioperl depends on Math-Derivative > > bioperl depends on Math-Spline > > bioperl depends on Statistics-Descriptive > > bioperl depends on Tree-DAG_Node > > bioperl depends on Heap > > bioperl depends on Test-Deep > > bioperl depends on XML-DOM > > bioperl depends on XML-XPathEngine > > bioperl depends on Error > > bioperl depends on Email-Date-Format > > bioperl depends on MIME-Types > > bioperl depends on MailTools > > bioperl depends on 
Test-Tester > > bioperl depends on Test-NoWarnings > > bioperl depends on XML-RegExp > > bioperl depends on Test-Pod > > bioperl depends on TimeDate > > > > > > Any thoughts on how to fix this? Thanks, Joe Corbo > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Oct 23 20:54:10 2008 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Oct 2008 19:54:10 -0500 Subject: [Bioperl-l] Need help installing Bioperl on Windows In-Reply-To: <1A4207F8295607498283FE9E93B775B4056AB5CC@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B4056AB5CC@EX02.asurite.ad.asu.edu> Message-ID: <9CEB5105-4EDD-4B2B-85A1-771A27FF652C@illinois.edu> Apparently there are problems building DB_File with ActivePerl on most platforms with perl 5.10: http://community.activestate.com/node/2810 I suggest using ActivePerl 5.8 until this is resolved. chris On Oct 23, 2008, at 5:51 PM, Kevin Brown wrote: > Which version of Perl did you install from Activestate? > > I have 5.8.8.820 and DB_File is provided by the Activestate > repository. > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Corbo, Joseph >> Sent: Thursday, October 23, 2008 1:47 PM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Need help installing Bioperl on Windows >> >> Greetings. I am trying to install Bioperl on my Windows XP >> machine and >> am having a problem I would greatly appreciate some help with. I am >> following the directions for installing Bioperl as given on >> the Bioperl >> website page "Installing Bioperl on Windows" under the section >> "Installation using the Perl Package Manager". I added the three new >> repositories (http://bioperl.org/DIST/ etc.) as they instruct. 
Then >> when I try to install Bioperl either using the GUI installation or >> the >> command line sequence "ppm-shell, search bioperl, install [number]" I >> get the following error message which I don't know how to fix: >> >> >> >> WARNING: Can't find any package that provides DB_File:: for >> Bundle-BioPerl-Core >> >> >> >> bioperl depends on Bundle-BioPerl-Core >> >> bioperl depends on SVG >> >> bioperl depends on GD-SVG >> >> bioperl depends on Spreadsheet-ParseExcel >> >> bioperl depends on XML-SAX >> >> bioperl depends on AcePerl >> >> bioperl depends on XML-SAX-ExpatXS >> >> bioperl depends on SOAP-Lite >> >> bioperl depends on SVG-Graph >> >> bioperl depends on Bio-ASN1-EntrezGene >> >> bioperl depends on XML-XPath >> >> bioperl depends on Convert-Binary-C >> >> bioperl depends on XML-Twig >> >> bioperl depends on Set-Scalar >> >> bioperl depends on Text-Shellwords >> >> bioperl depends on Data-Stag >> >> bioperl depends on libxml-perl >> >> bioperl depends on XML-Writer >> >> bioperl depends on Graph >> >> bioperl depends on Class-AutoClass >> >> bioperl depends on Clone >> >> bioperl depends on XML-DOM-XPath >> >> bioperl depends on IO-stringy >> >> bioperl depends on OLE-Storage_Lite >> >> bioperl depends on XML-NamespaceSupport >> >> bioperl depends on Cache-Cache >> >> bioperl depends on MIME-Lite >> >> bioperl depends on Math-Derivative >> >> bioperl depends on Math-Spline >> >> bioperl depends on Statistics-Descriptive >> >> bioperl depends on Tree-DAG_Node >> >> bioperl depends on Heap >> >> bioperl depends on Test-Deep >> >> bioperl depends on XML-DOM >> >> bioperl depends on XML-XPathEngine >> >> bioperl depends on Error >> >> bioperl depends on Email-Date-Format >> >> bioperl depends on MIME-Types >> >> bioperl depends on MailTools >> >> bioperl depends on Test-Tester >> >> bioperl depends on Test-NoWarnings >> >> bioperl depends on XML-RegExp >> >> bioperl depends on Test-Pod >> >> bioperl depends on TimeDate >> >> >> >> >> >> Any thoughts on 
how to fix this? Thanks, Joe Corbo
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From heikki at sanbi.ac.za Fri Oct 24 02:32:49 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 24 Oct 2008 08:32:49 +0200
Subject: [Bioperl-l] SimpleAlign - get_seq_by_id
In-Reply-To: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org>
References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org>
Message-ID: <200810240832.49915.heikki@sanbi.ac.za>

The main reason it has not been in Bio::SimpleAlign is that a sequence ID is
not necessarily a unique identifier in an MSA. Multiple regions of the
sequence defined by one ID can be in one alignment.

The current code returns only the more or less randomly selected first
Bio::LocatableSeq object with that ID. Should we make it context sensitive
and return an array of sequences in array context?

That brings up another question: after the change, get_seq_by_id() will
behave differently from all other instances of that method, so should it be
renamed to reflect that?

-Heikki

On Thursday 23 October 2008 21:29:20 Jason Stajich wrote:
> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a
> particular sequence from the alignment by ID. Not sure why this didn't
> exist before.
> > -jason > -- > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From rvos at interchange.ubc.ca Fri Oct 24 02:40:25 2008 From: rvos at interchange.ubc.ca (Rutger Vos) Date: Fri, 24 Oct 2008 14:40:25 +0800 Subject: [Bioperl-l] SimpleAlign - get_seq_by_id In-Reply-To: <8254_1224830163_1224830163_14811_1224830159_1224830159_200810240832.49915.heikki@sanbi.ac.za> References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> <8254_1224830163_1224830163_14811_1224830159_1224830159_200810240832.49915.heikki@sanbi.ac.za> Message-ID: <2bb9b24a0810232340n3a005afel4b2e5dcff805730f@mail.gmail.com> I would be very hesitant to introduce array contexts - it just yields subtle bugs and maintenance issues. Perhaps a separate get_all_seqs_by_id would be better? On Fri, Oct 24, 2008 at 2:32 PM, Heikki Lehvaslaiho wrote: > > The main reason it has not been Bio::SeqAlign is that sequence ID not > necessarily a unique identifier in a MSA. Multiple regions of the sequence > defined by one ID can be in one. > > The current code returns only the more or less randomly selected first > Bio::LocatebleSeqI object with that ID. Should we make it context sensitive > and return an array of sequences in array context? > > That brings up an other question: After the change, the get_seq_by_id() will > behave differently from all other instances of that method, so should it be > renamed to reflect that? 
> > -Heikkki > > On Thursday 23 October 2008 21:29:20 Jason Stajich wrote: >> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a >> particular sequence from the alignment by ID. Not sure why this didn't >> exist before. >> >> -jason >> -- >> Jason Stajich >> jason at bioperl.org >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com From heikki at sanbi.ac.za Fri Oct 24 03:05:27 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 24 Oct 2008 09:05:27 +0200 Subject: [Bioperl-l] SimpleAlign - get_seq_by_id In-Reply-To: <200810240832.49915.heikki@sanbi.ac.za> References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> <200810240832.49915.heikki@sanbi.ac.za> Message-ID: <200810240905.27399.heikki@sanbi.ac.za> Spoke too soon: each_seq_with_id() already exists. Is there really a need for get_seq_by_id()? A more general observation: Bio::SimpleAlign with its 83 methods has grown too big to keep all the code (3055 lines total) in one file. Any volunteers to break it up into more manageable chunks? 
The methods in the current file have already been categorised which should help
in the task:

=head1 Modifier methods
=head1 Sequence selection methods
=head1 Create new alignments
=head1 Change sequences within the MSA
=head1 MSA attributes
=head1 Alignment descriptors
=head1 Alignment positions
=head1 Sequence names

The helper modules should go into Bio::Align name space.

-Heikki

On Friday 24 October 2008 08:32:49 Heikki Lehvaslaiho wrote:
> The main reason it has not been Bio::SeqAlign is that sequence ID not
> necessarily a unique identifier in a MSA. Multiple regions of the sequence
> defined by one ID can be in one.
>
> The current code returns only the more or less randomly selected first
> Bio::LocatebleSeqI object with that ID. Should we make it context sensitive
> and return an array of sequences in array context?
>
> That brings up an other question: After the change, the get_seq_by_id()
> will behave differently from all other instances of that method, so should
> it be renamed to reflect that?
>
> -Heikkki
>
> On Thursday 23 October 2008 21:29:20 Jason Stajich wrote:
> > I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a
> > particular sequence from the alignment by ID. Not sure why this didn't
> > exist before.
> >
> > -jason
> > --
> > Jason Stajich
> > jason at bioperl.org
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za Fri Oct 24 06:04:06 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 24 Oct 2008 12:04:06 +0200
Subject: [Bioperl-l] Bio::Tools reorganisation (long)
Message-ID: <200810241204.07002.heikki@sanbi.ac.za>

I was thinking of the proposed simplification of the BioPerl core and reading
http://www.bioperl.org/wiki/Proposed_core_modules_changes.

I realised that Bio::Tools really should be reorganised. At the moment it
holds at least five different categories of "tools".

The most difficult question here is, of course: How much backward
compatibility should be kept? I am in favour of doing quite drastic changes
if they help clarify the purpose of the modules.

Also, what is the defining rule to categorize the modules? Possibilities:

1. Local/web
2. Type of data: Sequences in databases/Analysis tools
3. Type of analysis

Below I've outlined what I think needs to be done, based on the assumption
that the local/web rule is the primary and type of analysis the secondary
organising principle.

None of this is in the bioperl wiki but can be put in there at any point of
the discussion.

-Heikki

1. Core functionality
=====================

Used by core sequence objects. e.g.:

Bio::Tools::CodonTable
Bio::Tools::GuessSeqFormat

Suggestion: Not called directly, so moving to, e.g.
Bio::Seq, should not be a problem. Can be implemented immediately.

2. Utilities
============

Perform a simple analysis related to sequences or sequence formats. All the
code is present within the module. e.g.:

Bio::Tools::IUPAC
Bio::Tools::OddCodes
Bio::Tools::ECnumber (?)

Suggestion: Separate them from tools into Bio::Utils within the core
package. Seldom used, so should not break backward compatibility too much.

3. Parsers for program outputs
==============================

Bulk of the Bio::Tools name space content. They need to be sorted into
categories when possible according to convention: Bio::Tools::Alignment,
Bio::Tools::Phylo.

Suggestion: Move into Bio::Tools::Parser (or Bio::Parser).

4. External local programme wrappers
====================================

Most of these, but not all, are in Bio::Tools::Run and already in the
bioperl-run package. They use parsers in the Bio::Tools name space
(category 3).

Suggestion: Move into Bio::Tools::RunLocal (or Bio::RunLocal) to shorten the
name.

5. Wrappers for remote (Web based) services
===========================================

Most of the service wrappers follow Bio::SimpleAnalysisI and are in
Bio::Tools::Analysis.

Examples of modules that are using the web but are among local application
wrappers:

Bio::Tools::Protparam
Bio::Tools::WebBlat

Modules using Web access, but are in the bioperl-run package:

Bio::Tools::Run::Pise*

Modules accessing the web for retrieving sequences: Bio::DB. This name space
contains modules for managing local sequence databases, accessing web based
sequence databases, and a variety of other objects: Bibliographic
references, sequence annotation, MeSH terms, Taxonomy.

Suggestion: move to Bio::Tools::RunExternal (or Bio::Web). Reorganise Bio::DB
in a similar manner into logical categories.
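On the backward-compatibility question raised at the top of this proposal, one possible approach is to leave a thin shim under the old module name that warns once and then delegates to the module's new home. The following is only a minimal sketch of that idea in plain Perl. The package names My::Parser::Example (new location) and My::Tools::Example (old location) are hypothetical, and nothing here is taken from the BioPerl code base or reflects how BioPerl handles deprecation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# New home of the module (hypothetical name, for illustration only).
package My::Parser::Example;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Hand back the next stored result, or undef when none are left.
sub next_result {
    my ($self) = @_;
    return shift @{ $self->{results} || [] };
}

# Old name kept as a thin shim (also hypothetical): it inherits every
# method from the new package and warns once when it is used.
package My::Tools::Example;

our @ISA = ('My::Parser::Example');
my $warned = 0;

sub new {
    my ($class, @args) = @_;
    warn "My::Tools::Example is deprecated; use My::Parser::Example instead\n"
        unless $warned++;
    return $class->SUPER::new(@args);
}

package main;

# Existing scripts that still use the old name keep working.
my $old_style = My::Tools::Example->new(results => [ 'r1', 'r2' ]);
print "old name still works: ",
    ($old_style->isa('My::Parser::Example') ? "yes" : "no"), "\n";
print "first result: ", $old_style->next_result, "\n";
```

With a shim like this, old scripts get a warning rather than a failure, and the shim can be dropped in a later release once callers have moved to the new name.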
--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From bosborne11 at verizon.net Fri Oct 24 08:47:21 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 24 Oct 2008 08:47:21 -0400
Subject: [Bioperl-l] Fast BLAST parsing
In-Reply-To:
References:
Message-ID:

Chris,

Here I am thinking there's some fancy new approach but it's the old standby
that does the trick! Also corrected that Synopsis, it didn't work for me.

Thanks again,

Brian O.

On Oct 23, 2008, at 1:09 PM, Chris Fields wrote:

> Could you index the BLAST report using Bio::Index::Blast? From the
> synopsis:
>
>     use Bio::Index::Blast;
>     my ($indexfile,$file1,$file2,$query);
>     my $index = Bio::Index::Blast->new(-filename => $indexfile,
>                                        -write_flag => 1);
>     $index->make_index($file1,$file2);
>
>     my $data = $index->get_stream($query);
>
>     my $blast_report = $index->fetch_report($query);
>     print "query is ", $blast_report->query, "\n";
>     while ( my $result = $blast_report->next_result ) {
>         print $result->algorithm, "\n";
>         while ( my $hsp = $result->next_hit ) {
>             print "\t name ", $hsp->name,
>         }
>         print "\n";
>     }
>
> I think you can index using a callback on the query name (so you can
> look up by various means).
>
> chris
>
>
> On Oct 23, 2008, at 11:22 AM, Brian Osborne wrote:
>
>> Bioperl,
>>
>> I'm not familiar with the very latest and greatest in BLAST
>> parsing, perhaps you can help me here. I have a large Blast output
>> file, it has multiple results in it. I'd like to rapidly find the
>> relevant result for a given query name, I don't want to iterate
>> over the results checking for query_name() each time.
How can I directly pull out a result using query name?
>>
>>
>> Thanks again,
>>
>> Brian O.
>> --
>> Brian Osborne, PhD
>> BioTeam: http://bioteam.net
>> email: briano at bioteam.net
>> mobile: 978-317-3101
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cserpell at dim.uchile.cl Fri Oct 24 10:50:21 2008
From: cserpell at dim.uchile.cl (Cristián Serpell)
Date: Fri, 24 Oct 2008 11:50:21 -0300
Subject: [Bioperl-l] Editing sub locations
Message-ID: <1C3558D8-5C59-4C20-845F-F243CDE58FAE@dim.uchile.cl>

Hi

I would like to know if there is a way to edit a sub location from a
Bio::Location::Split object, through the API.

For example, you can get an array with $subloc = $location->sub_Location(),
and then modify it with $subloc->start($subloc->start() + 1), but this does
not change the original sub location. If you get the array again, you get
the same original values (the same object).

One idea I have is to create a new Bio::Location::Split object and add the
modified sub locations to it, but then I don't know if you can change the
Bio::SeqFeature location manually, even when creating a new one.

The whole thing I'm doing is a program that "moves" an object, adding the
same value to the start and end values of everything.

Any idea would help

Thanks
Cristián

From jieuiuc at yahoo.com Fri Oct 24 14:06:44 2008
From: jieuiuc at yahoo.com (Jie Zhang)
Date: Fri, 24 Oct 2008 11:06:44 -0700 (PDT)
Subject: [Bioperl-l] help on Bio-Perl Installation
Message-ID: <431725.22025.qm@web31007.mail.mud.yahoo.com>

Hi,

I'm new to BioPerl and just finished installing BioPerl on Windows XP by
downloading and unpacking the file bioperl-1.5.2_102.tar.gz from the
Bioperl.org website, then strictly following the manual installation
instructions. All the Build and Test steps were fine, although some
unimportant modules failed to install. I was able to view the documentation
by typing perldoc Bio::Perl in the command window. However, when I tested
whether it is installed properly, I encountered a problem. I wrote a two-line
script file called bp.pl:

#!/bin/perl -w
use Bio::Perl;

The compilation step failed and gave me this message: "use not allowed in
the expression at bp.pl line 3, syntax error at bp.pl line 3, near "use
Bio::Perl"...."

That warning appeared no matter whether the script is "use Bio::Seq" or
other modules. It seems use is not allowed here. What could be wrong during
installation? Could you please help me?

Thank you very much

Jie

From cain.cshl at gmail.com Fri Oct 24 14:48:13 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 24 Oct 2008 14:48:13 -0400
Subject: [Bioperl-l] Need help installing Bioperl on Windows
In-Reply-To: <1A4207F8295607498283FE9E93B775B4056AB5CC@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4056AB5CC@EX02.asurite.ad.asu.edu>
Message-ID: <536f21b00810241148r4c1728afx20b59682fc4c7dbb@mail.gmail.com>

Unfortunately, DB_File for perl 5.8 went away with build 824. I don't
know why, but that is why the tcool repository is needed.

Scott

On Thu, Oct 23, 2008 at 6:51 PM, Kevin Brown wrote:
> Which version of Perl did you install from Activestate?
>
> I have 5.8.8.820 and DB_File is provided by the Activestate repository.
> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Corbo, Joseph >> Sent: Thursday, October 23, 2008 1:47 PM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Need help installing Bioperl on Windows >> >> Greetings. I am trying to install Bioperl on my Windows XP >> machine and >> am having a problem I would greatly appreciate some help with. I am >> following the directions for installing Bioperl as given on >> the Bioperl >> website page "Installing Bioperl on Windows" under the section >> "Installation using the Perl Package Manager". I added the three new >> repositories (http://bioperl.org/DIST/ etc.) as they instruct. Then >> when I try to install Bioperl either using the GUI installation or the >> command line sequence "ppm-shell, search bioperl, install [number]" I >> get the following error message which I don't know how to fix: >> >> >> >> WARNING: Can't find any package that provides DB_File:: for >> Bundle-BioPerl-Core >> >> >> >> bioperl depends on Bundle-BioPerl-Core >> >> bioperl depends on SVG >> >> bioperl depends on GD-SVG >> >> bioperl depends on Spreadsheet-ParseExcel >> >> bioperl depends on XML-SAX >> >> bioperl depends on AcePerl >> >> bioperl depends on XML-SAX-ExpatXS >> >> bioperl depends on SOAP-Lite >> >> bioperl depends on SVG-Graph >> >> bioperl depends on Bio-ASN1-EntrezGene >> >> bioperl depends on XML-XPath >> >> bioperl depends on Convert-Binary-C >> >> bioperl depends on XML-Twig >> >> bioperl depends on Set-Scalar >> >> bioperl depends on Text-Shellwords >> >> bioperl depends on Data-Stag >> >> bioperl depends on libxml-perl >> >> bioperl depends on XML-Writer >> >> bioperl depends on Graph >> >> bioperl depends on Class-AutoClass >> >> bioperl depends on Clone >> >> bioperl depends on XML-DOM-XPath >> >> bioperl depends on IO-stringy >> >> bioperl depends on OLE-Storage_Lite >> >> bioperl depends on XML-NamespaceSupport 
>> >> bioperl depends on Cache-Cache >> >> bioperl depends on MIME-Lite >> >> bioperl depends on Math-Derivative >> >> bioperl depends on Math-Spline >> >> bioperl depends on Statistics-Descriptive >> >> bioperl depends on Tree-DAG_Node >> >> bioperl depends on Heap >> >> bioperl depends on Test-Deep >> >> bioperl depends on XML-DOM >> >> bioperl depends on XML-XPathEngine >> >> bioperl depends on Error >> >> bioperl depends on Email-Date-Format >> >> bioperl depends on MIME-Types >> >> bioperl depends on MailTools >> >> bioperl depends on Test-Tester >> >> bioperl depends on Test-NoWarnings >> >> bioperl depends on XML-RegExp >> >> bioperl depends on Test-Pod >> >> bioperl depends on TimeDate >> >> >> >> >> >> Any thoughts on how to fix this? Thanks, Joe Corbo >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From jason at bioperl.org Sat Oct 25 11:44:43 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 25 Oct 2008 11:44:43 -0400
Subject: [Bioperl-l] Editing sub locations
In-Reply-To: <1C3558D8-5C59-4C20-845F-F243CDE58FAE@dim.uchile.cl>
References: <1C3558D8-5C59-4C20-845F-F243CDE58FAE@dim.uchile.cl>
Message-ID: <7DE1B520-4A96-4120-8E50-C01C28253101@bioperl.org>

you can replace the stored location for a SeqFeature like this:

$feature->location($newlocation);

You can also update the values for a sub location with this because
you have access to a reference to each of the individual sublocations

Try this out (also added this example to
http://bioperl.org/wiki/Module:Bio::Location::Split )

#!/usr/bin/perl -w
use Bio::Location::Split;
use Bio::Location::Simple;

my $split = Bio::Location::Split->new;
$split->add_sub_Location(Bio::Location::Simple->new(-start  => 1,
                                                    -end    => 20,
                                                    -strand => 1));
$split->add_sub_Location(Bio::Location::Simple->new(-start  => 25,
                                                    -end    => 35,
                                                    -strand => 1));
print $split->to_FTstring(),"\n";
for my $subloc ( $split->each_Location ) {
    $subloc->start($subloc->start + 1001);
    $subloc->end($subloc->end + 1001);
}
print $split->to_FTstring(),"\n";

On Oct 24, 2008, at 10:50 AM, Cristián Serpell wrote:

> Hi
>
> I would like to know if there is a way to edit a sub location from a
> Bio::Location::Split object, through the API.
>
> For example, you can get an array with $subloc = $location->sub_Location(),
> and then modify it with $subloc->start($subloc->start() + 1), but this
> does not change the original sub location.
> IF you get the array, you get the same original values (the same
> object).
>
> One idea I have got is to create a new Bio::Location::Split object
> and adding the modified sub locations, but then I don't know if you
> can change the Bio::SeqFeature location manually, even when creating
> a new one.
>
> The whole thing I'm doing is a program that "moves" an object,
> adding the same value to start and end values to everything.
>
> Any idea would help
>
> Thanks
> Cristián
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From jason at bioperl.org Sat Oct 25 11:47:12 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 25 Oct 2008 11:47:12 -0400
Subject: [Bioperl-l] SimpleAlign - get_seq_by_id
In-Reply-To: <200810240905.27399.heikki@sanbi.ac.za>
References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> <200810240832.49915.heikki@sanbi.ac.za> <200810240905.27399.heikki@sanbi.ac.za>
Message-ID: <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org>

you're right - it should be rolled back. I guess each_seq_xxx gets
the job done.

We have a real problem with each vs get for our API mixture. I think
there had been some logic there at one point but I think it is
confusingly mixed now.
Perhaps a cleaned-up API with deprecated aliases would be an okay way
to move, at some point, towards something more standardized.

It would make sense to also see about implementing a Gblocks-style
filtering method as well (but not in SimpleAlign, given the number of
methods already, as you mention!).

-jason

On Oct 24, 2008, at 3:05 AM, Heikki Lehvaslaiho wrote:

> Spoke too soon: each_seq_with_id() already exists. Is there really a
> need for
> get_seq_by_id()?
>
> A more general observation: Bio::SimpleAlign with its 83 methods has
> grown too
> big to keep all the code (3055 lines total) in one file. Any
> volunteers to
> break it up into more manageable chunks?
> > The methods in the current file have already been categorised which > should help > in the task: > > =head1 Modifier methods > =head1 Sequence selection methods > =head1 Create new alignments > =head1 Change sequences within the MSA > =head1 MSA attributes > =head1 Alignment descriptors > =head1 Alignment positions > =head1 Sequence names > > The helper modules should go into Bio::Align name space. > > > -Heikki > > > On Friday 24 October 2008 08:32:49 Heikki Lehvaslaiho wrote: >> The main reason it has not been Bio::SeqAlign is that sequence ID not >> necessarily a unique identifier in a MSA. Multiple regions of the >> sequence >> defined by one ID can be in one. >> >> The current code returns only the more or less randomly selected >> first >> Bio::LocatebleSeqI object with that ID. Should we make it context >> sensitive >> and return an array of sequences in array context? >> >> That brings up an other question: After the change, the >> get_seq_by_id() >> will behave differently from all other instances of that method, so >> should >> it be renamed to reflect that? >> >> -Heikkki >> >> On Thursday 23 October 2008 21:29:20 Jason Stajich wrote: >>> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a >>> particular sequence from the alignment by ID. Not sure why this >>> didn't >>> exist before. 
>>> >>> -jason >>> -- >>> Jason Stajich >>> jason at bioperl.org >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From avilella at gmail.com Sat Oct 25 13:31:53 2008 From: avilella at gmail.com (Albert Vilella) Date: Sat, 25 Oct 2008 18:31:53 +0100 Subject: [Bioperl-l] SimpleAlign - get_seq_by_id In-Reply-To: <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org> References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> <200810240832.49915.heikki@sanbi.ac.za> <200810240905.27399.heikki@sanbi.ac.za> <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org> Message-ID: <358f4d650810251031v79d48ca9ye5054eb8ff74875b@mail.gmail.com> > > It would make sense to also see about implementing Gblocks style filtering > method as well (but not in SimpleAlign give then number of methods already > as you mention!). Yes, that would be really useful :-) Some of the aligner programs, like T-Coffee, also have the ability to produce a score matrix for a given MSA. This matrix-style notation of alignment "quality" could be used to filter given a threshold not only vertically (columns) but also horizontally (e.g., mispredicted coding exons that don't align well to the rest of the proteins). 
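The two directions of filtering described above (dropping poorly scored columns, and dropping poorly scored sequences) can be sketched in plain Perl. This is only an illustration under stated assumptions: the score matrix is assumed to hold one row per sequence and one value per alignment column on a 0 to 9 scale, and the helper names keep_rows and keep_columns, the toy data and the threshold of 5 are all made up for the example. It does not use any existing BioPerl or T-Coffee interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of a list of numbers (0 for an empty list).
sub mean { return @_ ? sum(@_) / @_ : 0 }

# Indices of rows (sequences) whose mean score is at least $min.
sub keep_rows {
    my ($scores, $min) = @_;
    return grep { mean(@{ $scores->[$_] }) >= $min } 0 .. $#$scores;
}

# Indices of columns whose mean score over all rows is at least $min.
sub keep_columns {
    my ($scores, $min) = @_;
    my $ncol = @{ $scores->[0] };
    return grep {
        my $col = $_;
        mean(map { $_->[$col] } @$scores) >= $min;
    } 0 .. $ncol - 1;
}

# Toy data (assumed): three aligned sequences, five columns each,
# with a per-residue quality score for every position.
my @ids    = qw(seq1 seq2 seq3);
my @aln    = qw(ACGTA ACGTA TTGAC);
my @scores = (
    [ 9, 9, 8, 2, 9 ],
    [ 9, 8, 9, 1, 9 ],
    [ 2, 3, 8, 1, 2 ],
);

# Filter "horizontally" first, then "vertically" on the surviving rows.
my @rows = keep_rows(\@scores, 5);
my @cols = keep_columns([ @scores[@rows] ], 5);

for my $r (@rows) {
    my $filtered = join '', map { substr($aln[$r], $_, 1) } @cols;
    print "$ids[$r]\t$filtered\n";
}
```

Filtering rows before columns is a choice made for this sketch: a badly aligned sequence would otherwise pull down the column means and cause good columns to be discarded.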
I think there is growing interest in the concept of meta-aligners nowadays, i.e., methods that will combine the resulting MSAs from a set of aligners and produce a final MSA jointly with a score matrix of alignment quality: http://nar.oxfordjournals.org/cgi/content/full/34/6/1692 It would be very interesting to have the methods for this kind of alignment filtering in Bioperl so that one can call methods from a $sa object and get filtered alignments given defined thresholds. > -jason > > On Oct 24, 2008, at 3:05 AM, Heikki Lehvaslaiho wrote: > > Spoke too soon: each_seq_with_id() already exists. Is there really a need >> for >> get_seq_by_id()? >> >> A more general observation: Bio::SimpleAlign with its 83 methods has grown >> too >> big to keep all the code (3055 lines total) in one file. Any volunteers to >> break it up into more manageable chunks? >> >> The methods in the current file have already been categorised which should >> help >> in the task: >> >> =head1 Modifier methods >> =head1 Sequence selection methods >> =head1 Create new alignments >> =head1 Change sequences within the MSA >> =head1 MSA attributes >> =head1 Alignment descriptors >> =head1 Alignment positions >> =head1 Sequence names >> >> The helper modules should go into Bio::Align name space. >> >> >> -Heikki >> >> >> On Friday 24 October 2008 08:32:49 Heikki Lehvaslaiho wrote: >> >>> The main reason it has not been Bio::SeqAlign is that sequence ID not >>> necessarily a unique identifier in a MSA. Multiple regions of the >>> sequence >>> defined by one ID can be in one. >>> >>> The current code returns only the more or less randomly selected first >>> Bio::LocatebleSeqI object with that ID. Should we make it context >>> sensitive >>> and return an array of sequences in array context? >>> >>> That brings up an other question: After the change, the get_seq_by_id() >>> will behave differently from all other instances of that method, so >>> should >>> it be renamed to reflect that? 
>>> >>> -Heikkki >>> >>> On Thursday 23 October 2008 21:29:20 Jason Stajich wrote: >>> >>>> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a >>>> particular sequence from the alignment by ID. Not sure why this didn't >>>> exist before. >>>> >>>> -jason >>>> -- >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> -- >> ______ _/ _/_____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/________________________________________________________ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Sat Oct 25 13:32:54 2008 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 25 Oct 2008 12:32:54 -0500 Subject: [Bioperl-l] SimpleAlign - get_seq_by_id In-Reply-To: <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org> References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> <200810240832.49915.heikki@sanbi.ac.za> <200810240905.27399.heikki@sanbi.ac.za> <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org> Message-ID: <2569959C-7D75-454B-9FEC-CD51AA210F62@illinois.edu> Issues with naming each_* vs get_* vs next_* have been raised in the past, though I can't find them on the mail list archives. 
I have similar issues with num_* vs no_*. Maybe we should come up with some basic coding guidelines in a HOWTO (tests, method names, etc). We already have some basic documentation for coding standards with some suggestions (best practices, advanced bioperl, etc), so maybe these should be consolidated into a single resource and revised by the core devs to reflect what we expect. BTW, I think an API cleanup is worth doing, but I don't see it getting done before a 1.6 release until we agree on some simple coding conventions. However, for SimpleAlign, we could run a simple cleanup by moving non-AlignI (utility) methods to the Bio::Align::Utilities module and deprecating use of the Bio::SimpleAlign versions (i.e. warn and delegate to the Utilities versions in the meantime, then remove after 1.6). chris On Oct 25, 2008, at 10:47 AM, Jason Stajich wrote: > you're right - it should be rolled back. I guess each_seq_xxx gets > the job done. > > We have a real problem with each vs get for our API mixture. I think > there had been some logic there at one point but I think it is > confusingly mixed now. > Perhaps a cleaned up API with deprecated aliases would be okay way > to at some point move towards more standardized. > > It would make sense to also see about implementing Gblocks style > filtering method as well (but not in SimpleAlign give then number of > methods already as you mention!). > > -jason > On Oct 24, 2008, at 3:05 AM, Heikki Lehvaslaiho wrote: > >> Spoke too soon: each_seq_with_id() already exists. Is there really >> a need for >> get_seq_by_id()? >> >> A more general observation: Bio::SimpleAlign with its 83 methods >> has grown too >> big to keep all the code (3055 lines total) in one file. Any >> volunteers to >> break it up into more manageable chunks? 
>> >> The methods in the current file have already been categorised which >> should help >> in the task: >> >> =head1 Modifier methods >> =head1 Sequence selection methods >> =head1 Create new alignments >> =head1 Change sequences within the MSA >> =head1 MSA attributes >> =head1 Alignment descriptors >> =head1 Alignment positions >> =head1 Sequence names >> >> The helper modules should go into Bio::Align name space. >> >> >> -Heikki >> >> >> On Friday 24 October 2008 08:32:49 Heikki Lehvaslaiho wrote: >>> The main reason it has not been Bio::SeqAlign is that sequence ID >>> not >>> necessarily a unique identifier in a MSA. Multiple regions of the >>> sequence >>> defined by one ID can be in one. >>> >>> The current code returns only the more or less randomly selected >>> first >>> Bio::LocatebleSeqI object with that ID. Should we make it context >>> sensitive >>> and return an array of sequences in array context? >>> >>> That brings up an other question: After the change, the >>> get_seq_by_id() >>> will behave differently from all other instances of that method, >>> so should >>> it be renamed to reflect that? >>> >>> -Heikkki >>> >>> On Thursday 23 October 2008 21:29:20 Jason Stajich wrote: >>>> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a >>>> particular sequence from the alignment by ID. Not sure why this >>>> didn't >>>> exist before. 
>>>> >>>> -jason >>>> -- >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> ______ _/ _/ >> _____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/ >> ________________________________________________________ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at illinois.edu Sat Oct 25 15:05:09 2008 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 25 Oct 2008 14:05:09 -0500 Subject: [Bioperl-l] Bio::Tools reorganisation (long) In-Reply-To: <200810241204.07002.heikki@sanbi.ac.za> References: <200810241204.07002.heikki@sanbi.ac.za> Message-ID: On Oct 24, 2008, at 5:04 AM, Heikki Lehvaslaiho wrote: > I was thinking of the proposed simplification of the BioPerl core and > reading http://www.bioperl.org/wiki/Proposed_core_modules_changes. > > I realised that Bio::Tools really should be reorganised. At the > moment it holds at least five different kinds of categories > of "tools". > > The most difficult question here is, of course: How much backward > compatibility should be kept? 
I am in favour of doing quite > drastic changes if they help clarify the purpose of the modules. It's probably best to just bite the bullet, fix as many bugs within reason within the next month or two, and just put out a 1.6 release. Past that point I think we should focus on what a bioperl 2.0 should be like and work towards that w/o the overhead of worrying about breaking old code. > Also, what is the defining rule to categorize the modules? > > Possibilities: > > 1. Local/web > 2. Type of data: Sequences in databases/Analysis tools > 3. Type of analysis > > Below I've outlined what I think need to be done based on > assumption that local/web rule is primary and type of analysis is the > secondary organising principle. > > > None of this is in the bioperl wiki but can be put in there at any > point of > the discussion. > > > -Heikki You're free to modify the 'proposed changes' page as you want; we can use the discussion page as well for ideas. > 1. Core functionality > ===================== > > Used by core sequence objects. > > e.g.: > Bio::Utils::Codontable > Bio::Utils::GuessSeqFormat > > Suggestion: Not called directly, so moving to. e.g. Bio::Seq, > should not be a problem. Can be implemented immediately. +1 > 2. Utilities > ============ > > Perform a simple analysis related to sequences or sequence > formats. All the code is present within the module. > > e.g.: > Bio::Tools::IUPAC > Bio::Tools::OddCodes > Bio::Tools::ECnumber (?) > > Suggestion: Separate them from tools into Bio::Utils within the > core package. Seldom used, so should not break backward > compatibility too much. +1 > 3. Parsers for program outputs > ============================== > > Bulk of the Bio::Tools name space content. They need to be sorted into > categories when possible according to convention: > Bio::Tools::Alignment, Bio::Tools, Phylo. > > Suggestion: Move into Bio::Tools::Parser(, or Bio::Parser). 
Some of the tools combine parsers with simple container objects, so they aren't easily separated (e.g. Bio::Tools::EUtilities, which parses output from NCBI's eutils and represents data from them as Bio* container objects). I suppose I could move the simple containers into their own unique namespace... > 4. External local programme wrappers > ==================================== > > Most of these, but not all, are in Bio::Tools::Run and already in > bioperl-run package. They use parsers in Bio::Tools name > space (category 3.). > > Suggestion: Move into Bio::Tools::RunLocal, (or Bio::RunLocal) to > shorten the > name. Maybe just Bio::Run or Bio::Wrapper? > 5. Wrappers for remote (Web based) services > =========================================== > > Most of the service wrappers follow Bio::SimpleAlignI and are in > Bio::Tools::Analysis. > > Examples of modules that are using web but are among local > application wrappers: > > Bio::Tools::Protparam > Bio::Tools::WebBlat WebBlat is deprecated (no longer maintained). http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13520 > Modules using Web access, but are in the bioperl-run package: > > Bio::Tools::Run::Pise* > > Modules accessing web for retrieving sequences: Bio::DB. > > This name space contains modules for managing local sequence > databases, > accessing web based sequence databases, and a variety of other > objects: Bibliographic references, sequence annotation, MeSH > terms, Taxonomy. > > Suggestion: move to Bio::Tools::RunExternal, (or > Bio::Web). Reorganise Bio::DB in the similar manner to logical > categories. I think this is okay, though the shorter the namespace the better. 
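One way to soften such moves, sketched under the assumption that an old module name stays around for a release as a thin shim. The package names `My::Utils::OddCodes` and `My::Tools::OddCodes` are hypothetical stand-ins, not existing Bioperl modules, and `seq_length` is a toy method.

```perl
use strict;
use warnings;

# Hypothetical new home of a relocated module.
package My::Utils::OddCodes;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}

# Toy method standing in for the real analysis code.
sub seq_length {
    my ($self) = @_;
    return length $self->{seq};
}

# Old name kept as a thin subclass: warns once, then behaves like the new class.
package My::Tools::OddCodes;
our @ISA = ('My::Utils::OddCodes');

my $warned = 0;
sub new {
    my ($class, @args) = @_;
    warn "My::Tools::OddCodes is deprecated, use My::Utils::OddCodes\n"
        unless $warned++;
    return $class->SUPER::new(@args);
}

package main;

my $old = My::Tools::OddCodes->new(-seq => 'ACDEFGH');   # warns once
my $new = My::Utils::OddCodes->new(-seq => 'ACDEFGH');   # silent
print ref($old), ' length ', $old->seq_length, "\n";
print ref($new), ' length ', $new->seq_length, "\n";
```

Old code keeps working and keeps its class name, so `isa` checks against either namespace still pass; the shim can be deleted in a later release.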
chris

From hlapp at gmx.net Sun Oct 26 01:59:50 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 25 Oct 2008 22:59:50 -0700
Subject: [Bioperl-l] SimpleAlign - get_seq_by_id
In-Reply-To: <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org>
References: <16B97FCC-FA65-4FD5-AD49-A3E34F7FC07A@bioperl.org> <200810240832.49915.heikki@sanbi.ac.za> <200810240905.27399.heikki@sanbi.ac.za> <828931CB-922D-4D40-8DBB-9C806D225A63@bioperl.org>
Message-ID: <0D127B21-B343-4CBA-A74D-EDCCF46A433D@gmx.net>

On Oct 25, 2008, at 8:47 AM, Jason Stajich wrote:

> We have a real problem with each vs get for our API mixture. I think
> there had been some logic there at one point but I think it is
> confusingly mixed now.

The each_XXX is (supposed to be) old-style. I started cleaning this up to use get_XXX a long time ago but didn't get much beyond Bio::SeqI and Bio::SeqFeatureI.

> you're right - it should be rolled back. I guess each_seq_xxx gets
> the job done.

Well, couldn't it be that for some implementations iterating over all sequences is way more expensive than what is possible using an indexed access?
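To make the trade-off concrete, a minimal sketch in plain Perl. `ToyAlign` and its methods are hypothetical and only mimic the method names discussed in this thread; nothing here is the Bio::SimpleAlign implementation. It contrasts a linear scan with a hash index, and returns every match in list context but only the first in scalar context, since an ID need not be unique in an MSA.

```perl
use strict;
use warnings;

package ToyAlign;

sub new {
    my ($class) = @_;
    return bless { seqs => [], by_id => {} }, $class;
}

# Each sequence is a plain hash: { id => ..., start => ..., end => ... }.
sub add_seq {
    my ($self, $seq) = @_;
    push @{ $self->{seqs} }, $seq;
    push @{ $self->{by_id}{ $seq->{id} } }, $seq;   # index maintained on insert
    return;
}

# Linear scan: cost grows with the number of sequences.
sub each_seq_with_id {
    my ($self, $id) = @_;
    return grep { $_->{id} eq $id } @{ $self->{seqs} };
}

# Indexed access: one hash lookup, with a context-sensitive return value.
sub get_seq_by_id {
    my ($self, $id) = @_;
    my @hits = @{ $self->{by_id}{$id} || [] };
    return wantarray ? @hits : $hits[0];
}

package main;

my $aln = ToyAlign->new;
$aln->add_seq({ id => 'seqA', start => 1,   end => 50  });
$aln->add_seq({ id => 'seqB', start => 1,   end => 48  });
$aln->add_seq({ id => 'seqA', start => 120, end => 170 });   # same ID, second region

my @all   = $aln->get_seq_by_id('seqA');      # list context: both regions
my $first = $aln->get_seq_by_id('seqA');      # scalar context: first one added
my @scan  = $aln->each_seq_with_id('seqA');   # same answer, found by scanning

printf "%d hits, first starts at %d, scan found %d\n",
    scalar @all, $first->{start}, scalar @scan;
```

The index costs some memory and has to be kept in sync whenever sequences are added, removed or renamed, which is the price of the faster lookup.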
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From jason at bioperl.org Sun Oct 26 19:08:15 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 26 Oct 2008 16:08:15 -0700
Subject: [Bioperl-l] problem with Bio::Das::FeatureTypeI::name
In-Reply-To: <1f1de13e0810210334he069c4fh38c3e3a97514fe9@mail.gmail.com>
References: <1f1de13e0810210334he069c4fh38c3e3a97514fe9@mail.gmail.com>
Message-ID: <065E973E-FEF4-4FA0-AB83-952B13E782A9@bioperl.org>

I'm a little confused why you have any SQL code in your program - that should all be handled under the hood by Bio::DB::GFF -- if you let it build your features for you it might work better.

At any rate, I can only guess this is because you aren't building the Feature object correctly or that the feature object you are passing in isn't respecting the API FeatureIO is using.

Can you get what you want just by doing a simple

Did you try calling the methods for getting segments (sequences) with features out via $gff_adaptor ?

-jason

On Oct 21, 2008, at 3:34 AM, Vincenza Maselli wrote:

> Dear All
>
> I tried to write a simple script to write a gff file, but my script
> dies when it tries to execute the method _write_feature_3 in the
> gff.pm module, and I get this message:
>
> Abstract method "Bio::Das::FeatureTypeI::name" is not implemented by
> package Bio::DB::GFF::Typename.
> This is not your fault - author of Bio::DB::GFF::Typename should be
> blamed!
>
> Do you have any suggestions to overcome this problem? Has someone
> implemented it?
>
> Thanks for the help
>
> Vincenza
>
>
> ================== START CODE ==================
> #!/usr/bin/perl
>
> use strict;
> use DBI;
> use DBD::mysql;
> use Data::Dumper;
> use Bio::FeatureIO;
> use Bio::DB::GFF;
> use Bio::DB::GFF::Featname;
> use Bio::SeqFeature::Generic;
>
> my $hostname = 'localhost';
> my $user = 'user';
> my $password = 'passwd';
> my $dsn = "dbi:mysql:database=dbname;host=$hostname";
>
>
> my $dbh = DBI->connect($dsn, $user, $password,
>                        {PrintError=>0,RaiseError=>0})
>     || die "Can't connect to database:$DBI::errstr\n";
>
> my $out = Bio::FeatureIO->new(-file => ">>test.gff" ,
>                               -format => 'GFF' ,
>                               -version => 3);
>
> # queries
>
> my $sql = qq{SELECT d.fid, d.fref, d.fstart, d.fstop, d.fbin, d.fscore,
>     d.fstrand, d.fphase, d.ftarget_start, d.ftarget_stop,
>     t.fmethod, t.fsource,
>     g.gid,g.gclass,g.gname
>     FROM fdata d, ftype t, fgroup g
>     WHERE d.ftypeid = t.ftypeid
>     AND d.gid = g.gid
>     LIMIT 1};
> my $sth = $dbh->prepare($sql);
> $sth->execute;
>
> # create GFF adaptor
>
> my $gff_adaptor = Bio::DB::GFF->new(-dsn => $dsn,
>                                     -user => $user,
>                                     -pass => $password);
>
> #initialize variables for feature object
>
> my $factory = $gff_adaptor; #a Bio::DB::GFF adaptor object (or descendent)
>
> while (my $ref = $sth->fetchrow_hashref){
>     my $srcseq = "ATGCGGATAGACGATAGCGATAACCTATAGTAGATCCGCTCGATCGTAGC"; #the source sequence
>     my $start = $ref->{'fstart'}; #start of this feature
>     my $stop = $ref->{'fstop'}; #stop of this feature
>     my $method = $ref->{'fmethod'}; #this feature's GFF method
>     my $source = $ref->{'fsource'}; #this feature's GFF source
>     my $score = $ref->{'fscore'}; #this feature's score
>     my $fstrand = $ref->{'fstrand'}; #this feature's strand (relative to the source sequence, which has its own strandedness!)
>     my $phase = $ref->{'fphase'}; #this feature's phase
>     my $group = Bio::DB::GFF::Featname->new(-class => $ref->{'gclass'},-name => $ref->{'gname'}); #this feature's group
>     my $db_id = $ref->{'fid'}; #this feature's internal database ID
>     my $group_id = $ref->{'gid'};
>     my $tstart = $ref->{'ftarget_start'};
>     my $tstop = $ref->{'ftarget_stop'};
>     #create feature object
>
>     my $feat = Bio::DB::GFF::Feature->new(
>         $factory,
>         $srcseq,
>         $start,
>         $stop,
>         $method,
>         $source,
>         $score,
>         $fstrand,
>         $phase,
>         $group,
>         $db_id,
>         $group_id,
>         $tstart,
>         $tstop);
>
>     #write out features
>     my $seq_feat = Bio::SeqFeature::Generic->new( -gff3_string => $feat->gff3_string );
>     my $annseq = Bio::SeqFeature::Annotated->new(-start => $start,
>                                                  -end => $stop,
>                                                  -phase => $phase);
>     $annseq->add_SeqFeature($feat);
>     $out->write_feature($annseq);
> }
>
> =========================== START RETURN ===========================
>
> $ perl create_gff.pl
>
>
> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Abstract method "Bio::Das::FeatureTypeI::name" is not implemented by
> package Bio::DB::GFF::Typename.
> This is not your fault - author of Bio::DB::GFF::Typename should be
> blamed!
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
> STACK: Bio::Root::RootI::throw_not_implemented /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/RootI.pm:680
> STACK: Bio::Das::FeatureTypeI::name /usr/local/lib/perl5/site_perl/5.10.0/Bio/Das/FeatureTypeI.pm:142
> STACK: Bio::FeatureIO::gff::_write_feature_3 /usr/local/lib/perl5/site_perl/5.10.0/Bio/FeatureIO/gff.pm:884
> STACK: Bio::FeatureIO::gff::_write_feature_3 /usr/local/lib/perl5/site_perl/5.10.0/Bio/FeatureIO/gff.pm:934
> STACK: Bio::FeatureIO::gff::write_feature /usr/local/lib/perl5/site_perl/5.10.0/Bio/FeatureIO/gff.pm:263
> STACK: create_gff.pl:86
> ----------------------------------------------------------------
>
> ======================================================================
>
> --
> Vincenza Maselli
> Dept. of Soil, Plant, Environmental and Animal Production Sciences
> University of Naples "Federico II"
> Via Universita' 100
> Parco Gussone - building number 75 "GenoPom"
> 80055 Portici, Naples, Italy
> phone: +39-081-2539246
> web: http://cab.unina.it

Jason Stajich
jason at bioperl.org

From David.Messina at sbc.su.se Sun Oct 26 20:27:57 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Oct 2008 20:27:57 -0400
Subject: [Bioperl-l] Bio::DB::GenPept fetch pooping out every ~200 queries?
References: <52773.143.48.140.8.1225064553.squirrel@webmail.cornell.edu>
Message-ID: <2C06DD89-6B4F-462E-BDBD-4AC8B7A61D4A@sbc.su.se>

Hey everyone,

We're seeing a weird behavior when fetching GenPept protein records by accession number over the net.
We get a bogus "acc does not exist" error after 200-250 queries, and that triggers a "resource not available". Repeat the query with the same accession, though, and it's retrieved just fine.

Anyone else experience this, understand why, or have a solution?

On a related note, this is a fatal error. Seems to me it should be just a warn so additional queries can continue. What's the rationale for fatal?

Errors and code below.

Thanks!
Dave

> ------------- EXCEPTION -------------
> MSG: Couldn't fork: Resource temporarily unavailable
> STACK Bio::DB::WebDBSeqI::_open_pipe /Network/Servers/courses2.cshl.edu/Users/jramsey/Library/Perl/5.8.8/Bio/DB/WebDBSeqI.pm:709
> STACK Bio::DB::WebDBSeqI::_stream_request /Network/Servers/courses2.cshl.edu/Users/jramsey/Library/Perl/5.8.8/Bio/DB/WebDBSeqI.pm:738
> STACK Bio::DB::WebDBSeqI::get_seq_stream /Network/Servers/courses2.cshl.edu/Users/jramsey/Library/Perl/5.8.8/Bio/DB/WebDBSeqI.pm:455
> STACK Bio::DB::NCBIHelper::get_Stream_by_acc /Network/Servers/courses2.cshl.edu/Users/jramsey/Library/Perl/5.8.8/Bio/DB/NCBIHelper.pm:466
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /Network/Servers/courses2.cshl.edu/Users/jramsey/Library/Perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
> STACK toplevel ./genbank:26
> -------------------------------------
>
>
> ------------- EXCEPTION -------------
> MSG: acc AAF52404 does not exist
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /Network/Servers/courses2.cshl.edu/Users/jramsey/Library/Perl/5.8.8/Bio/DB/WebDBSeqI.pm:182
> STACK toplevel ./genbank:26
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenPept;
> use Bio::SeqIO;
> use Getopt::Long;
>
> my $idfile;
> my $format = 'genbank';
> GetOptions(
>     'i|input:s' => \$idfile,
>     'f|format:s' => \$format,
> );
> $idfile = shift @ARGV if ! defined $idfile;
> my $db = Bio::DB::GenPept->new;
>
> my $out = Bio::SeqIO->new(-format => $format,
>                           -file => ">$idfile.out",
> );
>
> my $fh;
> open($fh, $idfile) || die "cannot open '$idfile': $!";
>
> while (<$fh>) {
>     my $id = $_;
>     chomp($id); # refseq id from the file
>     my $seq = $db->get_Seq_by_acc($id);
>     if ( $seq ) {
>         $out->write_seq($seq);
>     } else {
>         warn("Didn't find seq $id\n");
>     }
> }
>

From cserpell at dim.uchile.cl Mon Oct 27 08:30:56 2008
From: cserpell at dim.uchile.cl (Cristián Serpell)
Date: Mon, 27 Oct 2008 09:30:56 -0300
Subject: [Bioperl-l] Editing sub locations
In-Reply-To: <7DE1B520-4A96-4120-8E50-C01C28253101@bioperl.org>
References: <1C3558D8-5C59-4C20-845F-F243CDE58FAE@dim.uchile.cl> <7DE1B520-4A96-4120-8E50-C01C28253101@bioperl.org>
Message-ID: <40332BEC-6F76-46A8-A2A6-CAF010A95C00@dim.uchile.cl>

Thanks

It seems that location replacing such as in $feature->location($newlocation); was added in a newer version, and I was looking at an older API.

It worked.
Cristián

On 25-10-2008, at 12:44, Jason Stajich wrote:

> you can replace the stored location for a SeqFeature like this:
> $feature->location($newlocation);
>
> You can also update the values for a sub location with this because
> you have access to a reference to each of the individual sublocations
>
> Try this out (also added this example to
> http://bioperl.org/wiki/Module:Bio::Location::Split )
> #!/usr/bin/perl -w
> use Bio::Location::Split;
> use Bio::Location::Simple;
>
> my $split = Bio::Location::Split->new;
> $split->add_sub_Location(Bio::Location::Simple->new(-start => 1,
>                                                     -end => 20,
>                                                     -strand => 1));
> $split->add_sub_Location(Bio::Location::Simple->new(-start => 25,
>                                                     -end => 35,
>                                                     -strand => 1));
>
> print $split->to_FTstring(),"\n";
> for my $subloc ( $split->each_Location ) {
>     $subloc->start($subloc->start + 1001);
>     $subloc->end($subloc->end + 1001);
> }
>
> print $split->to_FTstring(),"\n";
>
> On Oct 24, 2008, at 10:50 AM, Cristián Serpell wrote:
>
>> Hi
>>
>> I would like to know if there is a way to edit a sub location from
>> a Bio::Location::Split object, through the API.
>>
>> For example, you can get an array with $subloc =
>> $location->sub_Location(), and then modify it with
>> $subloc->start($subloc->start() + 1), but this does not change the
>> original sub location. If you get the array again, you get the same
>> original values (the same object).
>>
>> One idea I have is to create a new Bio::Location::Split object
>> and add the modified sub locations, but then I don't know if
>> you can change the Bio::SeqFeature location manually, even when
>> creating a new one.
>>
>> The whole thing I'm doing is a program that "moves" an object,
>> adding the same value to start and end values to everything.
>> >> Any idea would help >> >> Thanks >> Cristi?n >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bosborne11 at verizon.net Mon Oct 27 15:13:46 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 27 Oct 2008 15:13:46 -0400 Subject: [Bioperl-l] Getting Genomic Sequences using Bioperl In-Reply-To: <2AA014D73FB87241B45863F19A3817B701EB9829@e2k3ms2.urmc-sh.rochester.edu> References: <2AA014D73FB87241B45863F19A3817B701EB9829@e2k3ms2.urmc-sh.rochester.edu> Message-ID: Craig, I'm CCing the Bioperl list in case someone has suggestions. If I were faced with this task I'd either use the ENSEMBL API (described on the Getting Genomic Sequences page) or the eutils API at NCBI or I'd take a look at the UCSC Genome Browser. There are people who are experts at using the eutils API in this list, perhaps they'll have some specific suggestions about how to get genomic sequences starting with Gene ids. Brian O. On Oct 27, 2008, at 2:24 PM, Benson, Craig C wrote: > > Hi Brian, > > I was wondering if you knew of a way to set the species genome and > build when using Bio::DB::GenBank (to get the sequence given > specific coordinates) or when using Bio::DB::EntrezGene (to get the > coordinates)? For instance, I have a list of ~300 genes (w/ Gene > ID) and I'd like to use a perl script to retrieve the dna sequence > 50bp up and down stream from the transcription start site of these > genes from both the human genome and the mouse genome. 
I'm assuming > that the example code on the bioperl documentation page for "Getting > Genomic Sequences" (http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences > ) defaults to the most recent human genome. Is that correct? Is > there a better way to retrieve these two sequences besides using > bioperl? > > Thanks! > > > Craig C. Benson, MD > Med-Peds Residency Program > University of Rochester Medical Center > > From bamboowarrior at gmail.com Mon Oct 27 20:08:39 2008 From: bamboowarrior at gmail.com (John O. Woods) Date: Mon, 27 Oct 2008 19:08:39 -0500 Subject: [Bioperl-l] Getting gene symbol and gene descriptions Message-ID: <91656c3f0810271708md7f5e30u493cfb4f740327eb@mail.gmail.com> Hi folks, I have lists of protein-coding genes in fruit fly (by FlyBase id), and in arabidopsis thaliana (by TAIR locus ID: AT...). I need to get the gene symbol name and gene description for each of these. I tried using GenPept::get_Stream_by_id, but I can't figure out how to extract the symbol or description that way. I've also tried BioMart, but that doesn't seem to have data for Arabidopsis (and it's timing out for fruit fly for some reason). There's also a flat file on TAIR's website, but it looks to be out of date. I'd much prefer to get the symbols that are considered "primary" in GenPept. I was also wondering: I know FlyBase allows anonymous access to its Chado DB. Is this possible with TAIR? They seem to ignore my emails, sadly--or perhaps I'm sending them to the wrong place. Cheers, John Woods UT Austin From awitney at sgul.ac.uk Tue Oct 28 09:37:23 2008 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 28 Oct 2008 13:37:23 +0000 Subject: [Bioperl-l] Fast BLAST parsing In-Reply-To: References: Message-ID: <320764BA-7CB4-49BC-94F4-872D8260A87D@sgul.ac.uk> Just wondering, could Bio::Index::Blast be used to generate the Index as the BLAST is performed, say with Bio::Tools::Run::StandAloneBlast? 
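Not an answer about Bio::Index::Blast itself, but a sketch of the bookkeeping such an on-the-fly index would need: remember the file position where each report starts, then seek back to it by query name. It is plain Perl and rests on assumptions about the classic text report layout (each report opens with a program banner line such as BLASTP and names its query on a "Query= name" line); `index_reports` and `fetch_report` are hypothetical helper names.

```perl
use strict;
use warnings;

# Record the byte offset where each report begins, keyed by query name.
sub index_reports {
    my ($fh) = @_;
    my (%offset_of, $report_start);
    while (1) {
        my $pos  = tell $fh;             # position before reading the line
        my $line = <$fh>;
        last unless defined $line;
        if ($line =~ /^T?BLAST[NPX]\b/) {
            $report_start = $pos;        # a new report begins here
        }
        elsif (defined $report_start && $line =~ /^Query=\s*(\S+)/) {
            $offset_of{$1} = $report_start;
            undef $report_start;
        }
    }
    return \%offset_of;
}

# Jump straight to one report and return its text.
sub fetch_report {
    my ($fh, $offset_of, $query) = @_;
    return unless exists $offset_of->{$query};
    seek $fh, $offset_of->{$query}, 0;
    my $text = <$fh>;                          # banner line
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^T?BLAST[NPX]\b/;    # next report begins
        $text .= $line;
    }
    return $text;
}

# Two toy reports held in memory so the sketch runs without real BLAST output.
my $reports = join '',
    "BLASTP 2.2.18\n", "Query= q1\n", "hit_one\n",
    "BLASTP 2.2.18\n", "Query= q2\n", "hit_two\n";
open my $fh, '<', \$reports or die "cannot open in-memory file: $!";

my $index = index_reports($fh);
print fetch_report($fh, $index, 'q2');
```

For a report that is still being written, the same loop could be re-run from the last recorded position each time the file grows; whether Bio::Index::Blast can be driven that way is not something this sketch claims.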
thanks adam On 23 Oct 2008, at 18:09, Chris Fields wrote: > Could you index the BLAST report using Bio::Index::Blast? From the > synopsis: > > use Bio::Index::Blast; > my ($indexfile,$file1,$file2,$query); > my $index = Bio::Index::Blast->new(-filename => $indexfile, > -write_flag => 1); > $index->make_index($file1,$file2); > > my $data = $index->get_stream($query); > > my $blast_report = $index->fetch_report($query); > print "query is ", $blast_report->query, "\n"; > while ( my $result = $blast_report->next_result ) { > print $result->algorithm, "\n"; > while ( my $hsp = $result->next_hit ) { > print "\t name ", $hsp->name, > } > print "\n"; > } > > I think you can index using a callback on the query name (so you can > look up by various means). > > chris > > > On Oct 23, 2008, at 11:22 AM, Brian Osborne wrote: > >> Bioperl, >> >> I'm not familiar with the very latest and greatest in BLAST >> parsing, perhaps you can help me here. I have a large Blast output >> file, it has multiple results in it. I'd like to rapidly find the >> relevant result for a given query name, I don't want to iterate >> over the results checking for query_name() each time. How can I >> directly pull out a result using query name? >> >> >> Thanks again, >> >> Brian O. >> -- >> Brian Osborne, PhD >> BioTeam: http://bioteam.net >> email: briano at bioteam.net >> mobile: 978-317-3101 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Oct 28 10:40:52 2008 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 28 Oct 2008 09:40:52 -0500 Subject: [Bioperl-l] Fast BLAST parsing In-Reply-To: <320764BA-7CB4-49BC-94F4-872D8260A87D@sgul.ac.uk> References: <320764BA-7CB4-49BC-94F4-872D8260A87D@sgul.ac.uk> Message-ID: Never tried it to be honest, but I can see where that would be helpful. It would probably require a bit of I/O magic (checking file position before/after parse, finding the query, etc) but I think it's a possibility. You can always add this in as an enhancement request to Bugzilla; patches are also welcome! chris On Oct 28, 2008, at 8:37 AM, Adam Witney wrote: > > Just wondering, could Bio::Index::Blast be used to generate the > Index as the BLAST is performed, say with > Bio::Tools::Run::StandAloneBlast? > > thanks > > adam > > On 23 Oct 2008, at 18:09, Chris Fields wrote: > >> Could you index the BLAST report using Bio::Index::Blast? From the >> synopsis: >> >> use Bio::Index::Blast; >> my ($indexfile,$file1,$file2,$query); >> my $index = Bio::Index::Blast->new(-filename => $indexfile, >> -write_flag => 1); >> $index->make_index($file1,$file2); >> >> my $data = $index->get_stream($query); >> >> my $blast_report = $index->fetch_report($query); >> print "query is ", $blast_report->query, "\n"; >> while ( my $result = $blast_report->next_result ) { >> print $result->algorithm, "\n"; >> while ( my $hsp = $result->next_hit ) { >> print "\t name ", $hsp->name, >> } >> print "\n"; >> } >> >> I think you can index using a callback on the query name (so you >> can look up by various means). 
>> >> chris >> >> >> On Oct 23, 2008, at 11:22 AM, Brian Osborne wrote: >> >>> Bioperl, >>> >>> I'm not familiar with the very latest and greatest in BLAST >>> parsing, perhaps you can help me here. I have a large Blast output >>> file, it has multiple results in it. I'd like to rapidly find the >>> relevant result for a given query name, I don't want to iterate >>> over the results checking for query_name() each time. How can I >>> directly pull out a result using query name? >>> >>> >>> Thanks again, >>> >>> Brian O. >>> -- >>> Brian Osborne, PhD >>> BioTeam: http://bioteam.net >>> email: briano at bioteam.net >>> mobile: 978-317-3101 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jensen at fortinbras.us Tue Oct 28 11:31:55 2008 From: jensen at fortinbras.us (Mark A. Jensen - Fortinbras Research) Date: Tue, 28 Oct 2008 11:31:55 -0400 Subject: [Bioperl-l] a query-based interface to the LANL HIV db Message-ID: Hello all, I've been working on a set of Bioperl modules that access the (difficult to navigate) Los Alamos HIV sequence database. The modules have a couple of features that I think could boost the usage of that resource. 
What makes LANL really useful is the richness of the annotations, with information on geography, virology, patient health, and other epidemiologically useful stuff. However, the db interface is cgi-based and difficult to use in batch or to create complex queries without cramping your right index finger. The modules take care of this, slurp the desired annotations and attach them correctly (I think) to Bio::Seq objects delivered by a SeqIO stream. The interface understands NCBI-like query strings (with ANDs, ORs, and parens). I've put the APIs up on my site so as not to cram your inbox: please see http://fortinbras.us/HIVQueryAPI. All the modules exist, work, and have their own .t files. I'd appreciate your comments. cheers, Mark Jensen From cjm at berkeleybop.org Mon Oct 27 20:44:24 2008 From: cjm at berkeleybop.org (Chris Mungall) Date: Mon, 27 Oct 2008 17:44:24 -0700 Subject: [Bioperl-l] Getting gene symbol and gene descriptions In-Reply-To: <91656c3f0810271708md7f5e30u493cfb4f740327eb@mail.gmail.com> References: <91656c3f0810271708md7f5e30u493cfb4f740327eb@mail.gmail.com> Message-ID: <0A0F36C4-247E-4668-A0A8-A9E97502E9F8@berkeleybop.org> You could do a SQL query for these on the GO database: http://berkeleynop.org/goose On Oct 27, 2008, at 5:08 PM, John O. Woods wrote: > Hi folks, > > I have lists of protein-coding genes in fruit fly (by FlyBase id), > and in > arabidopsis thaliana (by TAIR locus ID: AT...). I need to get the gene > symbol name and gene description for each of these. > > I tried using GenPept::get_Stream_by_id, but I can't figure out how to > extract the symbol or description that way. > > I've also tried BioMart, but that doesn't seem to have data for > Arabidopsis > (and it's timing out for fruit fly for some reason). > > There's also a flat file on TAIR's website, but it looks to be out > of date. > I'd much prefer to get the symbols that are considered "primary" in > GenPept. 
> > > I was also wondering: I know FlyBase allows anonymous access to its > Chado > DB. Is this possible with TAIR? They seem to ignore my emails, > sadly--or > perhaps I'm sending them to the wrong place. > > Cheers, > John Woods > UT Austin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hartzell at alerce.com Tue Oct 28 13:00:02 2008 From: hartzell at alerce.com (George Hartzell) Date: Tue, 28 Oct 2008 10:00:02 -0700 Subject: [Bioperl-l] Getting gene symbol and gene descriptions In-Reply-To: <0A0F36C4-247E-4668-A0A8-A9E97502E9F8@berkeleybop.org> References: <91656c3f0810271708md7f5e30u493cfb4f740327eb@mail.gmail.com> <0A0F36C4-247E-4668-A0A8-A9E97502E9F8@berkeleybop.org> Message-ID: <18695.17682.136284.574262@almost.alerce.com> Chris Mungall writes: > > You could do a SQL query for these on the GO database: > > http://berkeleynop.org/goose > [...] berkelynop.org is a no-op. This works though: http://berkeleybop.org/goose g. From maj at fortinbras.us Tue Oct 28 14:33:16 2008 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 28 Oct 2008 14:33:16 -0400 Subject: [Bioperl-l] HIV db post with linebreaks. Message-ID: <729E7804B62D45E59824423AE790ACC0@NewLife> [once more, with linebreaks] Hello all, I've been working on a set of Bioperl modules that access the (difficult to navigate) Los Alamos HIV sequence database. The modules have a couple of features that I think could boost the usage of that resource. What makes LANL really useful is the richness of the annotations, with information on geography, virology, patient health, and other epidemiologically useful stuff. However, the db interface is cgi-based and difficult to use in batch or to create complex queries without cramping your right index finger. 
The modules take care of this, slurp the desired annotations and attach them correctly (I think) to Bio::Seq objects delivered by a SeqIO stream. The interface understands NCBI-like query strings (with ANDs, ORs, and parens). I've put the APIs up on my site so as not to cram your inbox: please see http://fortinbras.us/HIVQueryAPI All the modules exist, work, and have their own .t files. I'd appreciate your comments. cheers, Mark Jensen From cjfields at illinois.edu Wed Oct 29 12:32:44 2008 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 29 Oct 2008 11:32:44 -0500 Subject: [Bioperl-l] HIV db post with linebreaks. In-Reply-To: <729E7804B62D45E59824423AE790ACC0@NewLife> References: <729E7804B62D45E59824423AE790ACC0@NewLife> Message-ID: <7BD7D126-8DC6-419E-B993-58CD249CB199@illinois.edu> Mark, Looks very interesting and useful! We would need to know what prereqs are needed for the modules; we are trying to refrain from adding any more than we already have. Also, recent updates to Bio::AnnotationI which disallow operator overloading (read: maintenance nightmare) require a display_text() method for data comparison. Feel free to attach the modules to a bug report in Bugzilla (as an enhancement request) so we can test them out. chris On Oct 28, 2008, at 1:33 PM, Mark A. Jensen wrote: > [once more, with linebreaks] > Hello all, > > I've been working on a set of Bioperl modules that access the > (difficult to navigate) Los Alamos HIV sequence database. The modules > have a couple of features that I think could boost the usage of that > resource. What makes LANL really useful is the richness of the > annotations, with information on geography, virology, patient health, > and other epidemiologically useful stuff. However, the db interface is > cgi-based and difficult to use in batch or to create complex queries > without cramping your right index finger. 
The modules take care of > this, slurp the desired annotations and attach them correctly (I > think) to Bio::Seq objects delivered by a SeqIO stream. The interface > understands NCBI-like query strings (with ANDs, ORs, and parens). > > I've put the APIs up on my site so as not to cram your inbox: please > see > > http://fortinbras.us/HIVQueryAPI > > All the modules exist, work, and have their own .t files. I'd > appreciate your comments. > > cheers, > Mark Jensen > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From clements at nescent.org Wed Oct 29 17:53:31 2008 From: clements at nescent.org (Dave Clements) Date: Wed, 29 Oct 2008 14:53:31 -0700 Subject: [Bioperl-l] bp_genbank2gff3.pl error: "MSG: structure_type 2 is currently unknown" In-Reply-To: References: Message-ID: Hello all, I'm trying to translate the threespine stickleback genome from Ensembl in GenBank format ( ftp://ftp.ensembl.org/pub/current_genbank/gasterosteus_aculeatus/) into GFF3 format using the bp_genbank2gff3.pl script. I get several data errors and I've contacted Ensembl about some of them. However, I also have a question about one of the errors. 
I get this error many times while parsing the files:

---
# working on region:scaffold:BROADS1:scaffold_180:1:137802:1, Gasterosteus aculeatus, 30-JUN-2008, Gasterosteus aculeatus scaffold scaffold_180 BROADS1 full sequence 1..137802 reannotated via EnsEMBL
scaffold:BROADS1:scaffold_180:1:137802:1 Unflattening error:
Details:
------------- EXCEPTION -------------
MSG: structure_type 2 is currently unknown
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.8.8/Bio/SeqFeature/Tools/Unflattener.pm:1445
STACK (eval) /usr/local/bin/bp_genbank2gff3.pl:895
STACK main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:894
STACK toplevel /usr/local/bin/bp_genbank2gff3.pl:411
-------------------------------------

# Possible gene unflattening error withscaffold:BROADS1:scaffold_180:1:137802:1: consult STDERR
---

The code snippet that generates this message is:

  1432     # TYPE CONTAINMENT HIERARCHY (aka partonomy)
  1433     # set the containment hierarchy if desired
  1434     # see docs for structure_type() method
  1435     if ($structure_type) {
  1436         if ($structure_type == 1) {
  1437             $self->partonomy(
  1438                 {CDS => 'gene',
  1439                  exon => 'CDS',
  1440                  intron => 'CDS',
  1441                 }
  1442             );
  1443         }
  1444         else {
  1445             $self->throw("structure_type $structure_type is currently unknown");
  1446         }
  1447     }

I get this error if I specify --noCDS or --CDS. I also get it if I
parse the EMBL format files instead. However, if I specify "--filter
exon --filter mRNA" (I have to specify both) the errors go away.

According to
http://search.cpan.org/~birney/bioperl/Bio/SeqFeature/Tools/Unflattener.pm#structure_type
(and my copy of the PM), 0 and 1 are the only valid values for this.
However, $structure_type gets set by this chunk of code:

  1337     # Are there any mRNA features in the record?
  1338     if ($n_mrnas == 0) {
  1339         # NO mRNAs:
  1340         # looks like structure_type == 1
* 1341         $structure_type = 1;
  1342         $need_to_infer_mRNAs = 1;
  1343     }
  1344     elsif ($n_mrnas_attached_to_gene == 0) {
  1345         # $n_mrnas > 0
  1346         # $n_mrnas_attached_to_gene = 0
  1347         #
  1348         # The entries _do_ contain mRNA features,
  1349         # but none of them are part of a group/gene, i.e. they
  1350         # are 'floating'
  1351
  1352         # this is an annoying weird file that has some floating
  1353         # mRNA features;
  1354         # eg ftp.ncbi.nih.gov/genomes/Schizosaccharomyces_pombe/
  1355
  1356         if ($self->verbose) {
  1357             my @floating_mrnas =
  1358                 grep {$_->primary_tag eq 'mRNA' &&
  1359                       !$_->has_tag($group_tag)} @flat_seq_features;
  1360             printf STDERR "Unattached mRNAs:\n";
  1361             foreach my $mrna (@floating_mrnas) {
  1362                 $self->_write_sf_detail($mrna);
  1363             }
  1364             printf STDERR "Don't know how to deal with these; filter at source?\n";
  1365         }
  1366
  1367         foreach (@flat_seq_features) {
  1368             if ($_->primary_tag eq 'mRNA') {
  1369                 # what should we do??
  1370
  1371                 # I think for pombe we just have to filter
  1372                 # out bogus mRNAs prior to starting
  1373             }
  1374         }
  1375
  1376         # looks like structure_type == 2
* 1377         $structure_type = 2;
  1378         $need_to_infer_mRNAs = 1;
  1379     }
  1380     else {
  1381     }

I've attached a file containing only scaffold_180 (cleaned up some),
but it may be too big to make it through the list's filters. If that
happens the files are at
ftp://ftp.ensembl.org/pub/current_genbank/gasterosteus_aculeatus/.
Scaffold_180 is in the "0" data file. I've also appended the relevant
parts of the file at the end.

Can someone explain what the comments mean by:

    The entries _do_ contain mRNA features,
    but none of them are part of a group/gene, i.e. they
    are 'floating'

    this is an annoying weird file that has some floating
    mRNA features;

The mRNAs all appear to have gene names associated with them. What am
I missing? Any ideas?
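
To make the 'floating' comment concrete, the branch quoted above can be read as a three-way classification. What follows is only a minimal sketch of that decision, not the Unflattener code: features are plain hashes rather than Bio::SeqFeature objects, classify_structure and the sample data are hypothetical names, and an mRNA is assumed to count as attached when it carries the group tag (this mirrors the has_tag($group_tag) test in the verbose branch). The group tag is a parameter because the thread does not say which tag the module actually groups on for this record.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SeqFeature::Tools::Unflattener.
# Each feature is a hash: { primary_tag => ..., tags => { name => value } }.
sub classify_structure {
    my ($group_tag, @features) = @_;

    my @mrnas    = grep { $_->{primary_tag} eq 'mRNA' } @features;
    # Assumption: "attached" means the mRNA carries the group tag.
    my @attached = grep { exists $_->{tags}{$group_tag} } @mrnas;

    return 1 if @mrnas == 0;       # no mRNA features at all
    return 2 if @attached == 0;    # mRNAs present, but every one is "floating"
    return 0;                      # at least one mRNA carries the group tag
}

# Sample features modelled on the first gene of the scaffold_180 record:
# the gene feature has /gene and /locus_tag, the mRNA and CDS have /gene only.
my @features = (
    { primary_tag => 'gene',
      tags => { gene => 'ENSGACG00000001596', locus_tag => 'TOP1 (2 of 2)' } },
    { primary_tag => 'mRNA',
      tags => { gene => 'ENSGACG00000001596' } },
    { primary_tag => 'CDS',
      tags => { gene => 'ENSGACG00000001596' } },
);

for my $tag (qw(gene locus_tag)) {
    printf "group tag %-9s => structure_type %d\n",
        $tag, classify_structure($tag, @features);
}

# Dropping the mRNA features first (compare "--filter mRNA") leaves no mRNAs.
my @no_mrna = grep { $_->{primary_tag} ne 'mRNA' } @features;
printf "mRNAs filtered out  => structure_type %d\n",
    classify_structure('locus_tag', @no_mrna);
```

With this sample data the sketch reports 0 when grouping on gene, 2 when grouping on locus_tag, and 1 once the mRNA features are removed. Whether the module really groups on locus_tag for the Ensembl files is not established here; the sketch only shows that an mRNA can have a /gene name and still count as floating if the grouping is done on a different tag.
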
Thanks, Dave C LOCUS scaffold_180 137802 bp DNA HTG 30-JUN-2008 DEFINITION Gasterosteus aculeatus scaffold scaffold_180 BROADS1 full sequence 1..137802 reannotated via EnsEMBL ACCESSION scaffold:BROADS1:scaffold_180:1:137802:1 VERSION scaffold_180BROADS1 KEYWORDS . SOURCE three-spined stickleback ORGANISM Gasterosteus aculeatus Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei; Acanthomorpha; Acanthopterygii; Percomorpha; Gasterosteiformes; Gasterosteidae; Gasterosteus. COMMENT This sequence was annotated by the Ensembl system. Please visit the Ensembl web site, http://www.ensembl.org/ for more information. COMMENT All feature locations are relative to the first (5') base of the sequence in this file. The sequence presented is always the forward strand of the assembly. Features that lie outside of the sequence contained in this file have clonal location coordinates in the format: .:.. COMMENT The /gene indicates a unique id for a gene, /note="transcript_id=..." a unique id for a transcript, /protein_id a unique id for a peptide and note="exon_id=..." a unique id for an exon. These ids are maintained wherever possible between versions. COMMENT All the exons and transcripts in Ensembl are confirmed by similarity to either protein or cDNA sequences. FEATURES Location/Qualifiers source 1..137802 /organism="Gasterosteus aculeatus" /db_xref="taxon:69293" gene complement(1399..13644) /gene=ENSGACG00000001596 /locus_tag="TOP1 (2 of 2)" /note="DNA topoisomerase 1 (EC 5.99.1.2) (DNA topoisomerase I). 
[Source:Uniprot/SWISSPROT;Acc:P11387]" mRNA join(complement(2230..2423),complement(1399..1718), complement(3787..3936),complement(4166..4260), complement(4370..4497),complement(5297..5451), complement(5953..5999),complement(6147..6212), complement(6228..6374),complement(6548..6582), complement(6594..6760),complement(7035..7222), complement(7299..7421),complement(7497..7662), complement(7676..7735),complement(8704..8786), complement(8863..8950),complement(9267..9302), complement(9718..9792),complement(9899..9954), complement(10037..10139),complement(10661..10751), complement(13289..13313),complement(13612..13644)) /gene="ENSGACG00000001596" /note="transcript_id=ENSGACT00000002089" CDS join( complement(2321..2423), complement(3787..3936),complement(4166..4260), complement(4370..4497),complement(5297..5451), complement(5953..5999),complement(6147..6212), complement(6228..6374),complement(6548..6582), complement(6594..6760),complement(7035..7222), complement(7299..7421),complement(7497..7662), complement(7676..7735),complement(8704..8786), complement(8863..8950),complement(9267..9302), complement(9718..9792),complement(9899..9954), complement(10037..10139),complement(10661..10751), complement(13289..13313),complement(13612..13644)) /gene="ENSGACG00000001596" /protein_id="ENSGACP00000002084" /note="transcript_id=ENSGACT00000002089" /db_xref="HGNC_curated_gene:TOP1 (2 of 2)" /translation="MSGGHAHAHAQVNSGSKGSETHKHKEKHKEHRHKEHRKEKEREK LKHSNSEHKDPAEKKLRDKQKLKHSNGSSEKPREKRREEKIQPSHVEKPKKEKENGFV RERSPSALKSEPEEDNGFYPSPQHLNTCRAESAGRDVGLEYRPKKIKSEHDKKAKKRK QEYEEDEEEDIKPKKKTRDQKATQGKKIKKEEEKWKCVCKERTETSRRHSLVCGPTFL TPSWIVDLWDLFAGKPMKLKPPAEEVATFFAKMLDHEYTTKDIFRKNFFKDWRKEMTS EEKSKLSDLNKCDFGEMSEYFKAQSEARKQMSKEEKQKLKEENERLLQEYGFCIMDNH KERIGNFRIEPPGLFRGRGDHPKMGMLKRRIRPEDIIINCSKDSKQPKPPPGTKWKEV RHDNKVTWLASWTENIQGSIKYIMLNPSSRIKVPCQHMTTEKKGSVSLMNNSLRWEPA ALSLRAPGGVRFLLERFGLRIVSEQQLQRNSGFDENTFFFNKMDSWTIMAASYAGKEK QNCCHKLHAEAEVYAGEQYLLCPRAPFESSSSPLQTSILNKHLQELMDGLTAKVFRTY 
NASITLQQQLKELACPDDSLPAKVLSYNRANRAVAILCNHQRAPPKTFEKSMQNLQTK IDEKQNQLSAARKQLKSAKAAHKTSHDDKSRKAWKVKRKAVQRIEEQLMKLQVQATDR EENKQIALGTSKLNYLDPRISVAWCKKWAVPIEKIYNKTQREKFAWAIDMAEKDFEF" gene complement(16523..17577) /gene=ENSGACG00000001598 mRNA join(complement(16523..16551), complement(16802..17577)) /gene="ENSGACG00000001598" /note="transcript_id=ENSGACT00000002091" CDS join(complement(16523..16551), complement(16802..17399)) /gene="ENSGACG00000001598" /protein_id="ENSGACP00000002086" /note="transcript_id=ENSGACT00000002091" /translation="MSPPPAPQVKGQPSPAPAVVSATADSHQSLVERTGQGPPGAVPP QVLHPPAIQIEAIAPPTSAPAASNNITAPTASSPTPAASQVAVPTPIISQAPVPSTAA ASNQAQAVAPQPPAVALAGASTSVAATLVSTAAPVQRPVPSVVPIVAGSGPSLEAVAT TSSPVANPSGVPPAQPNPPAVERPMPPTAASAAITQTSPVSIQQAPPSQ" gene complement(18492..25815) /gene=ENSGACG00000001600 mRNA join(complement(18492..19760), complement(19856..20080), complement(20334..20468), complement(20661..20713), complement(20841..20959), complement(21093..21501), complement(21610..21727), complement(21929..22470), complement(23568..23708), complement(23816..24424), complement(24488..24643), complement(24749..24877), complement(24989..25111), complement(25218..25373), complement(25716..25815)) /gene="ENSGACG00000001600" /note="transcript_id=ENSGACT00000002099" CDS join(complement(18492..19760), complement(19856..20080), complement(20334..20468), complement(20661..20713), complement(20841..20959), complement(21093..21501), complement(21610..21727), complement(21929..22470), complement(23568..23708), complement(23816..24424), complement(24488..24643), complement(24749..24877), complement(24989..25111), complement(25218..25373), complement(25716..25815)) /gene="ENSGACG00000001600" /protein_id="ENSGACP00000002094" /note="transcript_id=ENSGACT00000002099" /translation="SAVFIAFRGNMEDEDFSLKLDSILSGIPNMLDMASERLQPQHVE PWNSVRVTFNIPRDAAERLRLLAQNNQQQLRDLGILSVQIEGEGAINVAVGPNRGQDV RVNGPTGAPGQMRMDVGFSGQPGPGGVRMANPAMVPPGPGIAGQAMVPGSSGQMHPRI 
QRPTSQTGSDGTDPMMAGMSVQQQQQPLQHQQAGPHVPGPMPQAAHHLQALQGGRPLN PAAQAQLSQLGPRPPFNPSGQMAVPPGWNQLPSGVLQPPATQGSPAWRKPPPQAQMVP RPPSLATVQTPSHPPPPYPFGSQQAGQVFNAIGQLQQQQQTGVGQFAAPQPKGLQTGP GGVAGPPRPPPPLPPTSGPQGNLTAKSPGSSSSPFQQGSPGTPPMRPTTPQGFPQGVG SPGRAALGQPGNMQQGFMGMPQHGQPGAQVHPVITGMPKRPMGFPNPNFVQGQVSGST PGTPVGGASQQLQGNQAMTHTGALPSASTPNSMQGPPHAQPNVMGVQSGMAGLPPGTT AGPSMGQQQPGLQTQMMGLQHQAQPVSSSPSQKVQGQGGGQTVLSRPLSQGQRGGMTP PKQMMPQQGQGVMHGQGQMVGGQGHQAMLMQQQQQQNSMMEQMVANQMQGNKQPFGGK IPAGVMPGQMMRGPAPNVPGNMVQFQGQQQHQQMNQQQPQQVPIAGNPNQAMGMHGQQ LRLPAGHPLTAQQHPHPLGDPNGGTGDLGVQQMVPDMQAQQQQGMMGGPQHMQMGNGH FAGHGMNFNSQFQGQMPMAGACVQPGGFPVSKDVTLTSPLLVNLLQSDISASQFGPGG KQGAGGGNQAKPKKKKPARKKKSKEGDGPHGLDAAAGMEDSELPNLGGEQSLGLENSG QKLPEFANRPAGFPGQAGDQRVLQQVPMQMQMQSLQNAQGPQGMTGPQAPGQGQPQMH PHQLQQQPQQSNLLQQMLMMLKMQQEQAKNRMSIPPGGQIPPRGMGNPPEVQRLPVSQ QNNMPVMISLPGHGGVPPSPDKARGMPLMVNPQLAGAVRRMSHPDAGQGLQGAGSEEA IAHQKQPGGPDVGLQHPGNGNQQMMANQGSNAHMMKQGPGPSPMPQHTGASPQQQLPS QPQQGGPMPGLHFPNVPTTSQSSRPKTPNRASPRPYHHPLTPTNRPPSTEPSEINLSP ERLNASIAGLFPPKINIPLPPRQPNLNRGFDQQGLNPTTLKAIGQAPPSLTLPGNSNN GSVGGNNNQQPFSTGSGVGGAGGKQDKQPGGQAKRASPSNSRRSSPASSRKSATPSPG RQKGTKMAINCPPPQQQLVGSQAQTTMLSPASALPNPLSMPSQVSGAVEAQQTQSPFH GMQGNAAEGIRESQGMATAEQRQVPQTPPQPLRELSAPRMASPRFPLPQQPKPDLEVK AGTVDRLPVQTPPVPDSEASPTLRAAPTSLNQLLDNSAIANMPPRAGQNT" gene complement(28650..36301) /gene=ENSGACG00000001608 /locus_tag="GGTL3 (1 of 2)" /note="Gamma-glutamyltransferase 4 precursor (EC 2.3.2.2) (Gamma- glutamyltranspeptidase 4) (GGT 4) (Gamma- glutamyltransferase-like 3) (Gamma-glutamyltransferase- like 5) [Contains: Gamma- glutamyltransferase 4 heavy chain; Gamma-glutamyltransferase 4 light chain [Source:Uniprot/SWISSPROT;Acc:Q9UJ14]" mRNA join(complement(28650..28810), complement(29299..29398), complement(29498..29635), complement(29741..29855), complement(30289..30441), complement(30537..30625), complement(31122..31249), complement(31894..31981), complement(32781..32974), complement(33108..33181), complement(33575..33642), complement(34074..34191), 
complement(34272..34423), complement(34505..34731), complement(35555..35616), complement(35650..35756)) /gene="ENSGACG00000001608" /note="transcript_id=ENSGACT00000002107" CDS join(complement(28650..28810), complement(29299..29398), complement(29498..29635), complement(29741..29855), complement(30289..30441), complement(30537..30625), complement(31122..31249), complement(31894..31981), complement(32781..32974), complement(33108..33181), complement(33575..33642), complement(34074..34191), complement(34272..34423), complement(34505..34731), complement(35555..35616), complement(35650..35756)) /gene="ENSGACG00000001608" /protein_id="ENSGACP00000002102" /note="transcript_id=ENSGACT00000002107" /db_xref="HGNC_curated_gene:GGTL3 (1 of 2)" /translation="RTEDKSANPETTLGSAYSPVDYMSITSFPRLPEDDKGDNTLKLR KGEENALSEQDTDPDVFLKSAHLQRLPSSASDLASHEIASLRETRTDPFTEDCACQRD GLTVIITAGLTFALGVTVALIMQIYLGPPQIFNQGAVVTDVAQCTSLGFDVLERQGSS VDAAIAAALCLGIVHPHTSGIGGGGVMLVHNIRRNETRVIDFRETAPAAISEEMLLTK LHLNPGLLVGVPGMLSGLHQAHQLYGRMPWKDVVTMAAEVARTGFNVTHDLAEALAKA KDQNMSDAFGHLFLPDGQPPPSGLLTRRLDLAAILDAVASKGTSEFYSENLTREMAAA VQAAGGVLTEEDFGNYSTVLQQPAEIIYQGHHVMAAPAPHAGIALIAALNILEGYNIT SQVPRNSTYHWIAEALKISLALASGLGDPMYDTSISDVVAKMLSKSQASLLRQMINDS QAFPVGHYAPSFTLETGAAAAQVMVMGPDDHIVSVMSSLNKPFGSGIVTPSGILLNSQ ILDFSWPNKTRGSSPNPHNSLQPGKRPMSFLMPTAVRPAVGLCGTYVAVGSSDGEKAL SGITQVLMNVLSSRKNMSDSLAYGRLHPHLLPNMLLVDSEFEDEDVELLQAKGHKVER RDVLSLVEGTRRTNDLIIGVKDPRSADASALTMS" mRNA join(complement(29318..29398), complement(29498..29635), complement(29741..29855), complement(30289..30441), complement(30537..30625), complement(31122..31249), complement(31894..31981), complement(32781..32974), complement(33108..33181), complement(33575..33642), complement(34074..34191), complement(34272..34423), complement(34505..34731), complement(35555..35849), complement(36242..36301)) /gene="ENSGACG00000001608" /note="transcript_id=ENSGACT00000002113" CDS join(complement(29318..29398), complement(29498..29635), complement(29741..29855), 
complement(30289..30441), complement(30537..30625), complement(31122..31249), complement(31894..31981), complement(32781..32974), complement(33108..33181), complement(33575..33642), complement(34074..34191), complement(34272..34423), complement(34505..34731), complement(35555..35690)) /gene="ENSGACG00000001608" /protein_id="ENSGACP00000002108" /note="transcript_id=ENSGACT00000002113" /translation="MSITSFPRLPEDDNAAAAAAPAPAPGDNTLKLRKGEENALSEQD TDPDVFLKSAHLQRLPSSASDLASHEIASLRETRTDPFTEDCACQRDGLTVIITAGLT FALGVTVALIMQIYLGPPQIFNQGAVVTDVAQCTSLGFDVLERQGSSVDAAIAAALCL GIVHPHTSGIGGGGVMLVHNIRRNETRVIDFRETAPAAISEEMLLTKLHLNPGLLVGV PGMLSGLHQAHQLYGRMPWKDVVTMAAEVARTGFNVTHDLAEALAKAKDQNMSDAFGH LFLPDGQPPPSGLLTRRLDLAAILDAVASKGTSEFYSENLTREMAAAVQAAGGVLTEE DFGNYSTVLQQPAEIIYQGHHVMAAPAPHAGIALIAALNILEGYNITSQVPRNSTYHW IAEALKISLALASGLGDPMYDTSISDVVAKMLSKSQASLLRQMINDSQAFPVGHYAPS FTLETGAAAAQVMVMGPDDHIVSVMSSLNKPFGSGIVTPSGILLNSQILDFSWPNKTR GSSPNPHNSLQPGKRPMSFLMPTAVRPAVGLCGTYVAVGSSDGEKALSGITQVLMNVL SSRKNMSDSLAYGRLHPHLLP" gene 53987..55637 /gene=ENSGACG00000001618 /locus_tag="SNAI1 (2 of 2)" /note="Zinc finger protein SNAI1 (Protein snail homolog 1) (Protein sna). 
[Source:Uniprot/SWISSPROT;Acc:O95863]" mRNA join(53987..54136,54343..54534,54562..54707,54756..54915, 55148..55314,55605..55637) /gene="ENSGACG00000001618" /note="transcript_id=ENSGACT00000002120" CDS join(54052..54136,54343..54534,54562..54707,54756..54915, 55148..55314,55605..55637) /gene="ENSGACG00000001618" /protein_id="ENSGACP00000002115" /note="transcript_id=ENSGACT00000002120" /db_xref="HGNC_curated_gene:SNAI1 (2 of 2)" /translation="MPRSFLVKKYFSNRKPSWDRDSQLESQAAFVPESFAQAELPTQN GSFALTCYPTGPSFSGVGVLPAPLSPIAPASPSPSPLGPLDLSSAPSSNGGRTSDPPS PDVVQHAFHCLRCTSSYSSLSALSHHQASHHQASQRARQRPAFHCKHCPKEYTSLGAL KMHIRSHTLPCVCPTCGKAFSRPWLLRGHIRTHTGERPFACQHCNRAFADRSNLRAHL QKHPEVKKYQCGSCSRTFSRMFLLLNTAPPGAGVCAPLRGNIQ" mRNA join(54052..54136,54343..54535,54563..54731,54756..54915, 55148..55290,55293..55316,55420..55428) /gene="ENSGACG00000001618" /note="transcript_id=ENSGACT00000002124" CDS join(54052..54136,54343..54535,54563..54731,54756..54915, 55148..55290,55293..55316,55420..55428) /gene="ENSGACG00000001618" /protein_id="ENSGACP00000002119" /note="transcript_id=ENSGACT00000002124" /translation="MPRSFLVKKYFSNRKPSWDRDSQLESQAAFVPESFAQAELPTQN GSFALTCYPTGPSFSGVGVLPAPLSPIAPASPSPSPLGPLDLSSAPSSSGGRTSDPPS PDVVQHAFHCLRCTSSYSSLSALSHHQASHHQASQRARQQHSSPLPPRPAFHCKHCPK EYTSLGALKMHIRSHTLPCVCPTCGKAFSRPWLLRGHIRTHTGERPFACQHCNRAFAD RSNLRAHLQKHPEVKKYQCGSCSRTFSRMFLLQHSASGCCPPC" gene 69424..106380 /gene=ENSGACG00000001624 mRNA join(69424..70049,70523..70546,105631..105758, 106305..106380) /gene="ENSGACG00000001624" /note="transcript_id=ENSGACT00000002125" CDS 69516..69824 /gene="ENSGACG00000001624" /protein_id="ENSGACP00000002120" /note="transcript_id=ENSGACT00000002125" /translation="MKRLKNLIMLTIDLTKIPSQRRSLPLLTRGRFVRRPQAFLAAFV VVWPDCRRVQSSEDPSIAARSLHLNICFKGCDRRREHDLLHLISNKTNIKKGKTKTKC L" gene complement(69458..72266) /gene=ENSGACG00000001627 mRNA join(complement(69458..70045), complement(70528..70553), complement(71520..71631), complement(72179..72266)) /gene="ENSGACG00000001627" 
/note="transcript_id=ENSGACT00000002128" CDS complement(69515..69880) /gene="ENSGACG00000001627" /protein_id="ENSGACP00000002123" /note="transcript_id=ENSGACT00000002128" /translation="MRYSGPFACVFKLFYYNTHKHLVLVLPFLMLVLFDIKCNRSCSL LLSQPLKHIFKCKLRAAMDGSSLLCTRRQSGQTTTNAAKNACGRRTKRPRVSNGSERR WEGIFVKSIVNIIKFLSLFI" gene complement(74610..77849) /gene=ENSGACG00000001630 /locus_tag="SAMHD1 (2 of 3)" /note="SAM domain and HD domain-containing protein 1 (Dendritic cell-derived IFNG-induced protein) (DCIP) (Monocyte protein 5) (MOP-5). [Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]" mRNA join(complement(74610..74684), complement(74761..74861), complement(74977..75120), complement(75749..75819), complement(75907..76022), complement(76434..76594), complement(77706..77849)) /gene="ENSGACG00000001630" /note="transcript_id=ENSGACT00000002132" CDS join(complement(74612..74684), complement(74761..74861), complement(74977..75120), complement(75749..75819), complement(75907..76022), complement(76434..76594), complement(77706..77738)) /gene="ENSGACG00000001630" /protein_id="ENSGACP00000002127" /note="transcript_id=ENSGACT00000002132" /db_xref="HGNC_curated_gene:SAMHD1 (2 of 3)" /translation="MAGRPSDLLGKVFNDPIHGHMEMHPLLIRIIDTPQFQRLRHIKQ LGGVYFVFPGASHNRFEHSLGVAHLAGELVRDLKQRQPDLNITDRDVLCVQIAGLCHD LGHGPFSHMFDGMFIPKARPGLTWKHEKASVEMFDHLVADNDLKPVMKEHGLKLPEDL VFIKELMDPKDPKDPWSYKGRLENKSFLYEIVSNKRNAIDVDKWDYFARDCYHLGIKN NFDHGRCLMFARVCE" gene complement(90678..100224) /gene=ENSGACG00000001632 /locus_tag="SAMHD1 (1 of 3)" /note="SAM domain and HD domain-containing protein 1 (Dendritic cell-derived IFNG-induced protein) (DCIP) (Monocyte protein 5) (MOP-5). 
[Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]" mRNA join(complement(90678..91320), complement(91530..91667), complement(91829..91933), complement(92020..92107), complement(92441..92582), complement(93438..93553), complement(93628..93719), complement(94565..94673), complement(94750..94850), complement(94942..95100), complement(95645..95715), complement(95803..95918), complement(98829..98989), complement(99304..99376), complement(99751..99817), complement(99921..100224)) /gene="ENSGACG00000001632" /note="transcript_id=ENSGACT00000002142" CDS join(complement(91198..91320), complement(91530..91667), complement(91829..91933), complement(92020..92107), complement(92441..92582), complement(93438..93553), complement(93628..93719), complement(94565..94673), complement(94750..94850), complement(94942..95100), complement(95645..95715), complement(95803..95918), complement(98829..98989), complement(99304..99376), complement(99751..99817), complement(99921..100089)) /gene="ENSGACG00000001632" /protein_id="ENSGACP00000002136" /note="transcript_id=ENSGACT00000002142" /db_xref="HGNC_curated_gene:SAMHD1 (1 of 3)" /translation="MASRKRSFPPDSSLSAPGKRAPGPGAPQTDYAGWGAEETCRYLR AEGLGEWEDAFREHRITGVGLRYLADADLEKMGLKFLGDRLRVLHSLRTLWQIEVEPS KVFNDPIHGHMEMHPLLIRIIDTPQFQRLRHIKQLGGAYFVFPGASHNRFEHSLGVGH LAGQLVRALDQRQPELHITRRDVLCVQIAGLCHDLGHGPFSHMFDGKFIPKARPGFTW KHEDASVKMFDHLVADNDLQPVMKEHGLVLPEDLDFIKEQIAGPMDPKDMKKLEWPYR GRPKDKSFLYEIVSNKRNGIDVDKWDYFARDCYHLGIKNNFDYGRCLMFAKVCEVDGQ KHICTRDKEVGNLYDMFHTRNCLHRRAYQHKVAKIVETMITEAFLKADGHILFEGSKG KMFSLSTAIDDMEAYTKVTVDNVFEQILNSSSAALKDSREILKNVVCRRLYKCLGHTQ ADQHENVPQKERIASWEADLARCASQDVVLNPEDFIIDVINLDYGMKEKNPINSVRFY SKDDPSKAVQIRKNQVSKLLPEQFAEQLIRVYCKKLDSRSLEAAKKNFVQWCMDENFS KPQDGDIIAPELTPLKPSRQEDDDNNKKEVNPVGKARIQLFER" gene complement(105654..110675) /gene=ENSGACG00000001639 /locus_tag="SAMHD1 (3 of 3)" /note="SAM domain and HD domain-containing protein 1 (Dendritic cell-derived IFNG-induced protein) (DCIP) (Monocyte protein 5) (MOP-5). 
[Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]" mRNA join(complement(105654..105755), complement(106302..106406), complement(106493..106528), complement(106762..106946), complement(107965..108080), complement(108157..108248), complement(108694..108802), complement(108879..109033), complement(109155..109214), complement(109846..109916), complement(110004..110119), complement(110524..110675)) /gene="ENSGACG00000001639" /note="transcript_id=ENSGACT00000002145" CDS join(complement(105654..105755), complement(106302..106406), complement(106493..106528), complement(106762..106946), complement(107965..108080), complement(108157..108248), complement(108694..108802), complement(108879..109033), complement(109155..109214), complement(109846..109916), complement(110004..110119), complement(110524..110675)) /gene="ENSGACG00000001639" /protein_id="ENSGACP00000002139" /note="transcript_id=ENSGACT00000002145" /db_xref="HGNC_curated_gene:SAMHD1 (3 of 3)" /translation="DPIHGHMEMHPLLIRIIDTPQFQRLRRIKQLGGAYFVFPGASHN RFEHSLGVAHLAGKLVRALDQRQGDLHIDDRDVLCVQIAGLCHDLGHGPFSHMFDGKF IPKARPGFTWKHEKASVEMFDHLVADNDLQPNDHVVLFVPDVTVVSSPQWPYRGRLEN KSFLYEIVSNKRNCIDVDKWDYFARDCYHLGIKNNFDHGRCLMFARVCEVDGQKQICF RDKEVEDLYDMFYTRICLHRRAYQHKAANIVETMITEAFWKADGHIEFEGSGGQKFKL SDTIKDMEAYTKVTDDVFEKILNSSSDELKDSREILQDVVCRRIYKCIGQAQPTQPTT VTVSVIIFSYFTLEKLEADVVLNPEDFIIDVINLDYGMKEENPIDRVRFYSKDDPDKG FQIPQNQVFGFLPEKFTKELIRVYCKKLDSESLKAAKDNFK" exon complement(10037..10139) /note="exon_id=ENSGACE00000016865" exon complement(8863..8950) /note="exon_id=ENSGACE00000016876" exon complement(7299..7421) /note="exon_id=ENSGACE00000016885" exon complement(9899..9954) /note="exon_id=ENSGACE00000016866" exon complement(5297..5451) /note="exon_id=ENSGACE00000016904" exon complement(4370..4497) /note="exon_id=ENSGACE00000016905" exon complement(7676..7735) /note="exon_id=ENSGACE00000016881" exon complement(7035..7222) /note="exon_id=ENSGACE00000016889" exon complement(5953..5999) /note="exon_id=ENSGACE00000016902" exon complement(6228..6374) 
/note="exon_id=ENSGACE00000016896" exon complement(4166..4260) /note="exon_id=ENSGACE00000016911" exon complement(1399..1718) /note="exon_id=ENSGACE00000016924" exon complement(10661..10751) /note="exon_id=ENSGACE00000016863" exon complement(6548..6582) /note="exon_id=ENSGACE00000016893" exon complement(13289..13313) /note="exon_id=ENSGACE00000016862" exon complement(9267..9302) /note="exon_id=ENSGACE00000016873" exon complement(9718..9792) /note="exon_id=ENSGACE00000016870" exon complement(13612..13644) /note="exon_id=ENSGACE00000016858" exon complement(3787..3936) /note="exon_id=ENSGACE00000016915" exon complement(6147..6212) /note="exon_id=ENSGACE00000016899" exon complement(6594..6760) /note="exon_id=ENSGACE00000016891" exon complement(2230..2423) /note="exon_id=ENSGACE00000016920" exon complement(8704..8786) /note="exon_id=ENSGACE00000016877" exon complement(7497..7662) /note="exon_id=ENSGACE00000016884" exon complement(16523..16551) /note="exon_id=ENSGACE00000016940" exon complement(16802..17577) /note="exon_id=ENSGACE00000016938" exon complement(23568..23708) /note="exon_id=ENSGACE00000016969" exon complement(18492..19760) /note="exon_id=ENSGACE00000017004" exon complement(20661..20713) /note="exon_id=ENSGACE00000016989" exon complement(21093..21501) /note="exon_id=ENSGACE00000016980" exon complement(19856..20080) /note="exon_id=ENSGACE00000016996" exon complement(24749..24877) /note="exon_id=ENSGACE00000016956" exon complement(25218..25373) /note="exon_id=ENSGACE00000016951" exon complement(25716..25815) /note="exon_id=ENSGACE00000016949" exon complement(21929..22470) /note="exon_id=ENSGACE00000016971" exon complement(24989..25111) /note="exon_id=ENSGACE00000016953" exon complement(20841..20959) /note="exon_id=ENSGACE00000016984" exon complement(20334..20468) /note="exon_id=ENSGACE00000016991" exon complement(24488..24643) /note="exon_id=ENSGACE00000016961" exon complement(23816..24424) /note="exon_id=ENSGACE00000016965" exon complement(21610..21727) 
/note="exon_id=ENSGACE00000016974" exon complement(36242..36301) /note="exon_id=ENSGACE00000017125" exon complement(29741..29855) /note="exon_id=ENSGACE00000017086" exon complement(34272..34423) /note="exon_id=ENSGACE00000017035" exon complement(33575..33642) /note="exon_id=ENSGACE00000017051" exon complement(35650..35756) /note="exon_id=ENSGACE00000017021" exon complement(29318..29398) /note="exon_id=ENSGACE00000017135" exon complement(34074..34191) /note="exon_id=ENSGACE00000017044" exon complement(29299..29398) /note="exon_id=ENSGACE00000017095" exon complement(30537..30625) /note="exon_id=ENSGACE00000017071" exon complement(33108..33181) /note="exon_id=ENSGACE00000017056" exon complement(31894..31981) /note="exon_id=ENSGACE00000017062" exon complement(35555..35849) /note="exon_id=ENSGACE00000017127" exon complement(35555..35616) /note="exon_id=ENSGACE00000017028" exon complement(31122..31249) /note="exon_id=ENSGACE00000017066" exon complement(29498..29635) /note="exon_id=ENSGACE00000017091" exon complement(28650..28810) /note="exon_id=ENSGACE00000017102" exon complement(34505..34731) /note="exon_id=ENSGACE00000017031" exon complement(32781..32974) /note="exon_id=ENSGACE00000017060" exon complement(30289..30441) /note="exon_id=ENSGACE00000017076" exon 55148..55290 /note="exon_id=ENSGACE00000017228" exon 54343..54535 /note="exon_id=ENSGACE00000017218" exon 55293..55316 /note="exon_id=ENSGACE00000017233" exon 54563..54731 /note="exon_id=ENSGACE00000017224" exon 55605..55637 /note="exon_id=ENSGACE00000017197" exon 54756..54915 /note="exon_id=ENSGACE00000017188" exon 54343..54534 /note="exon_id=ENSGACE00000017167" exon 53987..54136 /note="exon_id=ENSGACE00000017156" exon 54052..54136 /note="exon_id=ENSGACE00000017212" exon 55148..55314 /note="exon_id=ENSGACE00000017193" exon 54562..54707 /note="exon_id=ENSGACE00000017179" exon 55420..55428 /note="exon_id=ENSGACE00000017240" exon 106305..106380 /note="exon_id=ENSGACE00000017258" exon 69424..70049 
/note="exon_id=ENSGACE00000017248" exon 105631..105758 /note="exon_id=ENSGACE00000017255" exon 70523..70546 /note="exon_id=ENSGACE00000017250" exon complement(70528..70553) /note="exon_id=ENSGACE00000017275" exon complement(69458..70045) /note="exon_id=ENSGACE00000017281" exon complement(72179..72266) /note="exon_id=ENSGACE00000017268" exon complement(71520..71631) /note="exon_id=ENSGACE00000017272" exon complement(75907..76022) /note="exon_id=ENSGACE00000017299" exon complement(74977..75120) /note="exon_id=ENSGACE00000017305" exon complement(74761..74861) /note="exon_id=ENSGACE00000017308" exon complement(76434..76594) /note="exon_id=ENSGACE00000017296" exon complement(75749..75819) /note="exon_id=ENSGACE00000017303" exon complement(77706..77849) /note="exon_id=ENSGACE00000017294" exon complement(74610..74684) /note="exon_id=ENSGACE00000017310" exon complement(98829..98989) /note="exon_id=ENSGACE00000017342" exon complement(93438..93553) /note="exon_id=ENSGACE00000017375" exon complement(90678..91320) /note="exon_id=ENSGACE00000017391" exon complement(94750..94850) /note="exon_id=ENSGACE00000017366" exon complement(99921..100224) /note="exon_id=ENSGACE00000017324" exon complement(99751..99817) /note="exon_id=ENSGACE00000017329" exon complement(94565..94673) /note="exon_id=ENSGACE00000017369" exon complement(91530..91667) /note="exon_id=ENSGACE00000017386" exon complement(95645..95715) /note="exon_id=ENSGACE00000017355" exon complement(91829..91933) /note="exon_id=ENSGACE00000017381" exon complement(92020..92107) /note="exon_id=ENSGACE00000017379" exon complement(99304..99376) /note="exon_id=ENSGACE00000017335" exon complement(94942..95100) /note="exon_id=ENSGACE00000017360" exon complement(92441..92582) /note="exon_id=ENSGACE00000017377" exon complement(93628..93719) /note="exon_id=ENSGACE00000017372" exon complement(95803..95918) /note="exon_id=ENSGACE00000017347" exon complement(110004..110119) /note="exon_id=ENSGACE00000017407" exon complement(110524..110675) 
/note="exon_id=ENSGACE00000017404" exon complement(107965..108080) /note="exon_id=ENSGACE00000017418" exon complement(106493..106528) /note="exon_id=ENSGACE00000017422" exon complement(108157..108248) /note="exon_id=ENSGACE00000017414" exon complement(108694..108802) /note="exon_id=ENSGACE00000017413" exon complement(109846..109916) /note="exon_id=ENSGACE00000017408" exon complement(109155..109214) /note="exon_id=ENSGACE00000017410" exon complement(106762..106946) /note="exon_id=ENSGACE00000017420" exon complement(105654..105755) /note="exon_id=ENSGACE00000017425" exon complement(106302..106406) /note="exon_id=ENSGACE00000017424" exon complement(108879..109033) /note="exon_id=ENSGACE00000017412" misc_feature 1..5487 /note="contig contig_13399 1..5487(1)" misc_feature 5852..7735 /note="contig contig_13400 1..1884(1)" misc_feature 8660..14728 /note="contig contig_13401 1..6069(1)" misc_feature 15200..39327 /note="contig contig_13402 1..24128(1)" misc_feature 46327..47864 /note="contig contig_13403 1..1538(1)" misc_feature 50118..51320 /note="contig contig_13404 1..1203(1)" misc_feature 53911..55318 /note="contig contig_13405 1..1408(1)" misc_feature 55419..56091 /note="contig contig_13406 1..673(1)" misc_feature 56664..57183 /note="contig contig_13407 1..520(1)" misc_feature 57284..83877 /note="contig contig_13408 1..26594(1)" misc_feature 83978..137802 /note="contig contig_13409 1..53825(1)"
BASE COUNT 32610 a 28563 c 28961 g 33195 t 14473 n
ORIGIN
1 GGTTTACCTC CCGGGGGGGG GGCGACACGG CGGAGTTGCC CCCCCGGAGG GAACCAGCCG

--
Fill out the GMOD Community Survey NOW and win some GMOD Gear:
http://gmod.org/wiki/GMOD_News#2008_GMOD_Community_Survey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scaffold_180.genbank
Type: application/octet-stream
Size: 213889 bytes
Desc: not available
URL:

From pabignone at gmail.com Thu Oct 30 09:22:06 2008
From: pabignone at gmail.com (Paola Bignone)
Date: Thu, 30 Oct 2008 13:22:06 +0000
Subject: [Bioperl-l] Run::Primer3 and no primer return
In-Reply-To: <48FF665B.7040105@gmail.com>
References: <40d6e6580810220702w3017c3e8s9d2f3c7542e8a585@mail.gmail.com> <48FF3A60.7030605@cam.ac.uk> <40d6e6580810220848ya17bef1ub5c3dc6df5989ae5@mail.gmail.com> <48FF665B.7040105@gmail.com>
Message-ID: <40d6e6580810300622n7f2c4883r5ee1dc7a47293ae3@mail.gmail.com>

Thank you all for the suggestions. The simple script from the documentation runs well. It seems that some of the settings on the server were the problem; the system administrator sorted that out for me.

Cheers,
Paola

From jieuiuc at yahoo.com Thu Oct 30 13:56:53 2008
From: jieuiuc at yahoo.com (Jie Zhang)
Date: Thu, 30 Oct 2008 10:56:53 -0700 (PDT)
Subject: [Bioperl-l] help: can't run Bioperl
Message-ID: <595974.61204.qm@web31008.mail.mud.yahoo.com>

Hi,

I'm new to BioPerl and just finished installing BioPerl on Windows XP using PPM, strictly following the instructions. At the end, a message showed that 45 packages were installed for Bioperl CORE and Bioperl. However, when I tested whether it was installed properly, I encountered a problem. I wrote a two-line script file called bp.pl:

#!/bin/perl -w
use Bio::Perl;

The compilation step failed and gave me this message: "use not allowed in the expression at bp.pl line 3, syntax error at bp.pl line 3, near "use Bio::Perl"...."

That warning appeared whether the script used "use Bio::Seq" or other modules. What could be wrong? Is it an installation problem? Could you please help me?

Thank you very much
Jie

From Kevin.M.Brown at asu.edu Thu Oct 30 15:25:35 2008
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 30 Oct 2008 12:25:35 -0700
Subject: [Bioperl-l] help: can't run Bioperl
In-Reply-To: <595974.61204.qm@web31008.mail.mud.yahoo.com>
References: <595974.61204.qm@web31008.mail.mud.yahoo.com>
Message-ID: <1A4207F8295607498283FE9E93B775B40573013D@EX02.asurite.ad.asu.edu>

Could be a line ending problem in your script. You say it is a two-line script, but perl is breaking on the third line.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jie Zhang
> Sent: Thursday, October 30, 2008 10:57 AM
> To: bioperl
> Subject: [Bioperl-l] help: can't run Bioperl
>
> Hi,
>
> I'm new to BioPerl and just finished installing BioPerl on
> Windows XP using PPM, strictly following the instructions. At
> the end, a message showed that 45 packages were installed for
> Bioperl CORE and Bioperl. However, when I tested whether it was
> installed properly, I encountered a problem. I wrote a two-line
> script file called bp.pl:
>
> #!/bin/perl -w
> use Bio::Perl;
>
> The compilation step failed and gave me this message: "use not
> allowed in the expression at bp.pl line 3, syntax error at
> bp.pl line 3, near "use Bio::Perl"...."
>
> That warning appeared whether the script used "use Bio::Seq"
> or other modules. What could be wrong? Is it an
> installation problem? Could you please help me?
>
> Thank you very much
>
> Jie
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
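
The line-ending suggestion above can be checked directly. What follows is only a minimal sketch and not something posted in the thread: it reads a script in raw mode and counts Windows (CRLF), Unix (LF) and old Mac (CR) line endings, so you can see whether a file you believe has two lines really does. The names count_line_endings, slurp_raw and check_eol.pl are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the line-ending styles found in a string of raw file contents.
# Returns a hash ref with keys crlf (Windows), lf (Unix) and cr (old Mac).
# (Hypothetical helper, not part of BioPerl.)
sub count_line_endings {
    my ($text) = @_;
    my %count;
    $count{crlf} = () = $text =~ /\r\n/g;        # CR immediately followed by LF
    $count{lf}   = () = $text =~ /(?<!\r)\n/g;   # LF not preceded by CR
    $count{cr}   = () = $text =~ /\r(?!\n)/g;    # CR not followed by LF
    return \%count;
}

# Read a whole file in raw mode so no newline translation hides the bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the entire file in one go
    my $text = <$fh>;
    close $fh;
    return $text;
}

# When given a file name on the command line, print a one-line report.
if (@ARGV) {
    my $count = count_line_endings(slurp_raw($ARGV[0]));
    printf "%s: CRLF=%d LF=%d CR=%d\n", $ARGV[0], @{$count}{qw(crlf lf cr)};
}
```

Saved as check_eol.pl (an assumed file name), it could be run as "perl check_eol.pl bp.pl". A mix of styles, or more line endings than the two you expect, would be consistent with the line-ending explanation; a clean result would point elsewhere, for example at stray characters pasted into the file.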