From wes.barris at csiro.au Mon Sep 1 01:29:13 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Mon Sep 1 01:28:18 2003
Subject: [Bioperl-l] ace to msf format?
In-Reply-To:
References: <3F527B6F.5030206@csiro.au>
Message-ID: <3F52D929.2080305@csiro.au>

Jason Stajich wrote:

> Each Contig is a Bio::Align::AlignI - so in theory you can manipulate
> them as if they are Bio::SimpleAlign objects. Robson can clarify if there
> are any caveats there.
>
>
> But you want to do this to have access to each contig in the scaffold:
> foreach my $contig ( $scaffold->all_contigs ) {
>    # process Bio::Assembly::Contig object here
> }

Thanks Jason, that makes sense. Perhaps I'm missing something obvious
but I am getting an error when treating each contig as a Bio::SimpleAlign
object. Here is my code:

#!/usr/local/bin/perl -w
#
use strict;
use Bio::Assembly::IO;
use Bio::AlignIO;
#
my $usage = "Usage: $0 \n";
my $infile = shift or die $usage;

my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace');
my $assembly = $io->next_assembly;

foreach my $contig ($assembly->all_contigs()) {
   my $name = "cn".$contig->id;
   print("$name\n");
   my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$name");
   $outstream->write_aln($contig);
   undef $outstream;
   }

And here is the runtime error:

cn1
Use of uninitialized value in hash element at
/usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990.
Use of uninitialized value in hash element at
/usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990.
Can't call method "alphabet" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/Bio/AlignIO/msf.pm line 180, line 33990.

I am using bioperl-1.2.2.

>
> Your code below is calling it in scalar context which will just have $aln
> being set to the length of the returned array.
>
> -jason
>
> On Mon, 1 Sep 2003, Wes Barris wrote:
>
>
>>Brian Osborne wrote:
>>
>>
>>>Wes,
>>>
>>>I don't think this is possible in Bioperl.
To put it more generally, AlignIO >>>can't accommodate Assembly objects currently. AlignIO is the module that >>>takes in a variety of alignment formats and interconverts them, analogous to >>>SeqIO. I'll be corrected if I'm wrong. >>> >>>Brian O. >> >>I am kind of new to this so I could be wrong but isn't an Assembly a group >>of alignments? So, from one assemble, a group of alignments could be >>generated? >> >> >>>-----Original Message----- >>>From: bioperl-l-bounces@portal.open-bio.org >>>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris >>>Sent: Thursday, August 28, 2003 7:58 PM >>>To: Bioperl Mailing List >>>Subject: [Bioperl-l] ace to msf format? >>> >>>Can anyone give me a hint as to how I could use bioperl to read in >>>an ACE assembly and write out an MSF formatted alignment? This shows >>>what I have figured out so far: >>> >>>#!/usr/local/bin/perl -w >>># >>>use strict; >>>use Bio::Assembly::IO; >>># >>>my $usage = "Usage: $0 \n"; >>>my $infile = shift or die $usage; >>> >>>my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); >>>my $assembly = $io->next_assembly; >>> >>>my $aln = $assembly->all_contigs(); >>> >>>-- >>>Wes Barris >>>E-Mail: Wes.Barris@csiro.au >>> >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu -- Wes Barris E-Mail: Wes.Barris@csiro.au
From heikki at ebi.ac.uk Mon Sep 1 05:48:54 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Sep 1 05:47:52 2003 Subject: [Bioperl-l] bugs on branch; tests on main trunk In-Reply-To: References: Message-ID: <1062409729.2062.37.camel@bala> On Fri, 2003-08-29 at 17:56, Ewan Birney wrote: > I have fixed the translate() and pdb res bug on the branch. > Great. > Heikki or Rob --- does RestrictionEnzyme *really* need Storeable? > Storeable doesn't come by default on systems, so if it didn't need > it then it would be more useful not to use it. Any chance of this? Storable is used to do deep cloning of Enzyme objects. Storable is also used by Bio::DB::FileCache and Bio::SeqFeature::Collection (or is the documentation for Collection outdated?). Storable is part of 5.8.1 distribution. There is Clone in CPAN which is faster but less systems are bound to have is. If you think is is critical to get rid of this dependency I can rewrite the cloning method.
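For readers unfamiliar with the deep cloning being discussed, here is a minimal sketch of the difference between a shallow copy, Storable's dclone, and a hand-written recursive copy of the kind that could replace the Storable dependency. It is not the RestrictionEnzyme code: the My::Enzyme class, its fields and the clone_data helper are all invented for illustration.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-in for an enzyme object: a blessed hash holding a
# nested structure. The class name and fields are invented for this sketch.
my $enzyme = bless({
    name => 'EcoRI',
    site => 'GAATTC',
    cut  => { position => 1, overhang => "5'" },
}, 'My::Enzyme');

# Dependency-free alternative (hypothetical helper): recursively copy
# hashes and arrays. Anything else (code refs, globs, ...) is shared
# rather than copied.
sub clone_data {
    my ($data) = @_;
    my $class = ref($data);
    return $data unless $class;    # plain scalar: copied by value

    my $copy;
    if (UNIVERSAL::isa($data, 'HASH')) {
        $copy = {};
        $copy->{$_} = clone_data($data->{$_}) for keys %$data;
    }
    elsif (UNIVERSAL::isa($data, 'ARRAY')) {
        $copy = [ map { clone_data($_) } @$data ];
    }
    else {
        return $data;
    }
    # Re-bless only if the original was an object.
    return ($class eq 'HASH' || $class eq 'ARRAY')
        ? $copy
        : bless($copy, $class);
}

my $shallow = bless({ %$enzyme }, ref($enzyme));  # copies the top level only
my $deep    = dclone($enzyme);                    # Storable: full copy
my $manual  = clone_data($enzyme);                # hand-written: full copy

# Change the original after copying.
$enzyme->{cut}{position} = 99;

print "shallow copy sees: $shallow->{cut}{position}\n";
print "dclone copy sees:  $deep->{cut}{position}\n";
print "manual copy sees:  $manual->{cut}{position}\n";
```

The shallow copy still shares the nested cut hash with the original and so sees the change, while both deep copies keep the old value. The hand-written version only handles hashes and arrays, which may or may not be enough for the real Enzyme objects.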
-Heikki > > I put a require eval() in tutorial around the restriction enzyme stuff. > > > > > Chris (and the unflattening crew...) the Unflattener is issueing alot of > warnings with -w --- any chance of one of you looking at it? > > > However, I now have on the main trunk: > > > All tests successful, 33 subtests skipped. > Files=168, Tests=7643, 383 wallclock secs (287.31 cusr + 25.46 csys = 312.77 CPU) > > > > > > > Pretty darn impressive. > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at ebi.ac.uk Mon Sep 1 06:21:09 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Sep 1 06:20:12 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: References: Message-ID: <1062411662.2062.44.camel@bala> I am here! Two week without email on holiday needs one week to sort things out. I am ~ ready again. Jason, Can you do the supervise the 1.2.3 release? I'll start looking into 1.3/1.4. I'll post an other message to start a new thread on it. -Heikki On Fri, 2003-08-29 at 17:50, Ewan Birney wrote: > On Fri, 29 Aug 2003, Lincoln Stein wrote: > > > What are the timetables for 1.2.3 and 1.3? > > > > > 1.2.3 soon(ish) > > 1.3 Heikki should decide. I am not on campus at the moment, so will try to > track him down next week... 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at ebi.ac.uk Mon Sep 1 06:49:12 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Sep 1 06:48:06 2003 Subject: [Bioperl-l] planning for bioperl 1.4 Message-ID: <1062413344.2062.69.camel@bala> Summer is over and it is time to start preparing next major release of bioperl which will be release 1.4. Before it is out there will be a series of tarballed development releases named 1.3.0, 1.3.1, 1.3.2, ... We will tag them in the cvs head but will not branch until 1.4.0 release. First, we need to get a clearer picture what has been added into bioperl since 1.2 release was made at the end of year 2002. Please, add them into Changes file in the CVS head. Never mind the format, we can sort that out later. I'll go through news to see what major additions there have been. Secondly, all developers, please post here about projects you'd like to finish before the release and your estimate when it would be ready for release. Depending on those comments, we can readjust the release schedule but I'd like to see first 1.3.X out in a month (some time after 1.2.3) and 1.4 in two months at most. 
-Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From Luc.Gauthier at aventis.com Mon Sep 1 10:16:18 2003 From: Luc.Gauthier at aventis.com (Luc.Gauthier@aventis.com) Date: Mon Sep 1 10:15:43 2003 Subject: [Bioperl-l] (no subject) Message-ID: <6FA8B454A1DF1E4A97F0A48DCC324EAB7200F0@crbsmxsusr05.pharma.aventis.com> From markus at kador.de Mon Sep 1 10:19:45 2003 From: markus at kador.de (Markus Kador) Date: Mon Sep 1 10:18:29 2003 Subject: [Bioperl-l] GeneDB Question Message-ID: <5603E980-DC87-11D7-9123-000393CEC144@kador.de> Hi, I would like to get sequence data form GeneDB (http://www.genedb.org/) in my perl script. Since there is no module available I wanted to ask if anyone has ever done that or has any pointers on how to achive that. Specifically the blast server would be interesting. Thanks in advance, Markus From kdj at sanger.ac.uk Mon Sep 1 12:46:01 2003 From: kdj at sanger.ac.uk (Keith James) Date: Mon Sep 1 12:46:03 2003 Subject: [Bioperl-l] GeneDB Question In-Reply-To: <5603E980-DC87-11D7-9123-000393CEC144@kador.de> References: <5603E980-DC87-11D7-9123-000393CEC144@kador.de> Message-ID: >>>>> "Markus" == Markus Kador writes: Markus> Hi, I would like to get sequence data form GeneDB Markus> (http://www.genedb.org/) in my perl script. Since there Markus> is no module available I wanted to ask if anyone has ever Markus> done that or has any pointers on how to achive Markus> that. Specifically the blast server would be interesting. As I'm at Sanger I've just been round to the genedb office to ask about this. 
I think that you will have to try screen-scraping the omniblast page (rather than the individual organism blast pages). This way you can search all the data but only have to maintain your script to mirror the changes to one submission web page. However, that page is subject to periodic changes in formatting and in the number and labelling of radio buttons and checkboxes. As you know, there is no public server or API. There is no likelihood of these becoming available in the forseeable future, so a web-scraper may be worth the effort. I also asked about ftp availability of the data because I think that if you have the resources (disk space & local blast) your best option is to ftp the data to your local machine. Due to ongoing data-release policy issues the ftp site data is not complete for some organisms. You would need to contact the genedb people directly about that. HTH Keith -- - Keith James Microarray Facility, Team 65 - - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From lstein at cshl.edu Mon Sep 1 18:18:31 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Sep 1 18:17:56 2003 Subject: [Bioperl-l] Bio::Graphics::Panel, -spacing => 0 constructor problem. In-Reply-To: <200308222223.h7MMN14c022854@mx3.nyu.edu> References: <200308222223.h7MMN14c022854@mx3.nyu.edu> Message-ID: <200309011818.31470.lstein@cshl.edu> Spacing adds additional padding between tracks. You cannot get them to overly each other. Possibly -start and -end are not doing what you think they should do. Lincoln On Friday 22 August 2003 06:23 pm, Philip MacMenamin wrote: > Hi, > > Am I right in thinking that the '-spacing' constructor for > Bio::Graphics::Panel, if set to 0, should result in no space between > tracks? ie That the tracks are either over-laying eachother, or squashed > down onto the same plane as the previous one. > > I cannot get this to happen if this is its purpose. It continues to stack > the tracks on the panel as per default. 
(So I have left it out, its not > useful to see this). I know that it works, because I have seen it working > in the wormbase UTRs. > > Here is some code: > # if (scalar @threePrimeUTR >0) > # { > # $panel->add_track(generic=>\@threePrimeUTR, > # -bgcolor => 'lightblue', > # -fgcolor => 'black', > # # -bump => +1, > # -spacing => 0, > # -utr_color => '#D0D0D0', ##whats this about?, chnging makes no dif? > # -font2color => 'blue', > # -height => 10, > # -description => 1, > # -label => '3 prime UTR' > # } > > I have also tried to set spacing to 0 on the tracks surrounding the UTRs, > but to no avail. > > Also, on a slightly differant vein, I cant seem to get the > Bio::Graphics::Panel start end constructors to work either. All of which is > making me increasingly suspicious of my perl skills. It just makes no > differance if I provide these arguments or not. The segment or sequence obj > always over-rides the start stop args. Not a massive problem, but it has > confused me. > > More code: > > my $panel = Bio::Graphics::Panel->new( -segment => $segment, > -width => 600, > -key_color => '#ffffcc', > -start =>$panelStart, > -end =>$panelEnd, > # -start => 4110000, > ); > > Any help is of course appreciated. -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From Richard.Adams at ed.ac.uk Tue Sep 2 04:16:11 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Tue Sep 2 04:15:02 2003 Subject: [Bioperl-l] planning for bioperl 1.4 Message-ID: <3F5451CB.B920E8A8@ed.ac.uk> Re Bio::Tools::Analysis::Protein/DNA modules, I need to write some example scripts and more documentation about how to write new modules of this type - will be able to do that by end of September. Just out of interest, How is it decided what goes in a new release? Is it everything that works or is there some sort of selection made? 
Richard -- Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From birney at ebi.ac.uk Tue Sep 2 04:29:52 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Sep 2 04:28:45 2003 Subject: [Bioperl-l] planning for bioperl 1.4 In-Reply-To: <3F5451CB.B920E8A8@ed.ac.uk> Message-ID: On Tue, 2 Sep 2003, Richard Adams wrote: > Re Bio::Tools::Analysis::Protein/DNA modules, > > I need to write some example scripts and more documentation about > how to write new modules > of this type - will be able to do that by end of September. > Just out of interest, > How is it decided what goes in a new release? > Is it everything that works or is there some sort of selection made? > Everything that works ;) From heikki at nildram.co.uk Tue Sep 2 04:31:42 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Sep 2 04:30:53 2003 Subject: [Bioperl-l] GeneDB Question In-Reply-To: References: <5603E980-DC87-11D7-9123-000393CEC144@kador.de> Message-ID: <1062491502.2035.11.camel@bala> Markus, Since screen-scraping is what is needed the absolutely easiest way to do it is to use WWW::Mechanize. If you want to be a bit more compatible to most installations, you can use bioperl module Bio::WebAgent which is built on top of LWP::UserAgent. Incidently, WWW::Mechanize is a subclass of LWP::UserAgent, too, so you could test for the availability and sneakily bless Bio::WebAgent into WWW::Mechanize! Have a look at Bio::DB::MeSH for examples. I got carried away and included code based on several different modules. (The MeSH modulue will be renamed at some point.) -Heikki On Mon, 2003-09-01 at 17:47, Keith James wrote: > >>>>> "Markus" == Markus Kador writes: > > Markus> Hi, I would like to get sequence data form GeneDB > Markus> (http://www.genedb.org/) in my perl script. 
Since there > Markus> is no module available I wanted to ask if anyone has ever > Markus> done that or has any pointers on how to achive > Markus> that. Specifically the blast server would be interesting. > > As I'm at Sanger I've just been round to the genedb office to ask > about this. > > I think that you will have to try screen-scraping the omniblast page > (rather than the individual organism blast pages). This way you can > search all the data but only have to maintain your script to mirror > the changes to one submission web page. However, that page is subject > to periodic changes in formatting and in the number and labelling of > radio buttons and checkboxes. > > As you know, there is no public server or API. There is no likelihood > of these becoming available in the forseeable future, so a web-scraper > may be worth the effort. > > I also asked about ftp availability of the data because I think that > if you have the resources (disk space & local blast) your best option > is to ftp the data to your local machine. Due to ongoing data-release > policy issues the ftp site data is not complete for some > organisms. You would need to contact the genedb people directly about > that. > > HTH > > Keith -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at nildram.co.uk Tue Sep 2 04:41:51 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Sep 2 04:41:09 2003 Subject: [Bioperl-l] planning for bioperl 1.4 In-Reply-To: <3F5451CB.B920E8A8@ed.ac.uk> References: <3F5451CB.B920E8A8@ed.ac.uk> Message-ID: <1062492111.2040.22.camel@bala> On Tue, 2003-09-02 at 09:16, Richard Adams wrote: > Re Bio::Tools::Analysis::Protein/DNA modules, > > I need to write some example scripts and more documentation about > how to write new modules > of this type - will be able to do that by end of September. Great! > Just out of interest, > How is it decided what goes in a new release? > Is it everything that works or is there some sort of selection made? It is in the discretion of the release manager. In practise, everything that works somehow is included unless there is a danger of seriously confusing users. If the code is not really ready to be used, we can include it and just not announce it. The main thing is that there are tests for the code and they pass. If anyone feels that their code is not ready for the release, please let me know and it will be stripped off the release branch. 
-Heikki > Richard > > > -- > Dr Richard Adams > Bioinformatician, > Psychiatric Genetics Group, > Medical Genetics, > Molecular Medicine Centre, > Western General Hospital, > Crewe Rd West, > Edinburgh UK > EH4 2XU > > Tel: 44 131 651 1084 > richard.adams@ed.ac.uk > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________
From Richard.Adams at ed.ac.uk Tue Sep 2 09:58:37 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Tue Sep 2 09:57:30 2003 Subject: [Bioperl-l] codon useage modules Message-ID: <3F54A20D.C2BBBD56@ed.ac.uk> Hi, I'm writing a couple of modules for interrogating the codon usage database s(http://www.kazusa.or.jp/codon/) and retrieving its statistics; Module 1 inherits from Bio::WebAgent and contacts the DB with a species query, gets the appropriate Codon Useage Table and objectifies it. Module 2 supplies the methods and just inherits from Bio::Root::Root but uses modules like Bio::SeqUtils and Bio::Tools::CodonTable quite heavily. methods include e.g., $ct->get_rel_frequency('TGT'), $ct->get_aa_frequency('Leu'), $ct->preferred_codon('Arg'), . Any recommendations for where these modules should be put in the CVS? Cheers Richard -- Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk
From wcui at UDel.Edu Tue Sep 2 10:49:28 2003 From: wcui at UDel.Edu (Wenwu Cui) Date: Tue Sep 2 10:48:13 2003 Subject: [Bioperl-l] RemoteBlast user: NCBI changer RID extension In-Reply-To: <3F54A20D.C2BBBD56@ed.ac.uk> Message-ID: <000701c37161$692a1d80$b6b6af80@HAIYAN> Wenwu Cui Department of Biological Sciences University of Delaware I contacted the webmaster of NCBI blast server and she told me that they had changed the RID extension by adding .BLASTQ3 at the end of RID. So modify your RemoteBlast.pm (there is a post in August by Dr. Sergey V. Orlov tells you how to do it) and it will work. Wenwu Cui Department of Biological Sciences University of Delaware Email: wcui@udel.edu From rfsouza at citri.iq.usp.br Tue Sep 2 10:59:30 2003 From: rfsouza at citri.iq.usp.br (Robson Francisco de Souza) Date: Tue Sep 2 11:18:17 2003 Subject: [Bioperl-l] ace to msf format? In-Reply-To: <3F52D929.2080305@csiro.au> Message-ID: Hi Wes and Jason, There are indeed some caveats when trying to use Bio::Assembly::Contig objects as Bio::Align::AlignI objects. Not all methods defined in this interface are implemented and some are not working (checked it yesterday using Wes's code). Most routines that are not working can be corrected without much work and some not yet implemented are easy to write but I'm not sure we'll ever get full compliance to the AlignI interface. I'd like to discuss that further but for now let me just clarify why I believe there will be no way to print contig using msf.pm: contigs are not flush, i.e. most contigs will be alignments of sequences of different lengths and, even worst, sequences in a contig may be only locally aligned to each other, which implies that some regions of any sequence in the alignment might not be aligned to the contig consensus but will get printed to MSF any way.
As far as I understand AlignI interface, such an alignment (a set of local alignments) is not supported. I've been considering removing AlignI from @ISA in Bio::Assembly::Contig and defining a ContigI interface for it as it seems to me that AlignI interface is not generic enough to describe contigs. The main problem is that any sequence in a contig is only partially aligned to a consensus's subsequence, qich makes some of the methods from AlignI non-sense (e.g. Bio::Align::AlignI::length, which is used by msf.pm). I'd like to hear comments from others on this. So, do not try to use MSF, CLUSTALW or other format of multiple global alignment for printing assemblies, you wont get what you want. Robson On Mon, 1 Sep 2003, Wes Barris wrote: > Thanks Jason, that makes sense. Perhaps I'm missing something obvious > but I am getting an error when treating each contig as a Bio::SimpleAlign > object. Here is my code: > > #!/usr/local/bin/perl -w > # > use strict; > use Bio::Assembly::IO; > use Bio::AlignIO; > # > my $usage = "Usage: $0 \n"; > my $infile = shift or die $usage; > > my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); > my $assembly = $io->next_assembly; > > foreach my $contig ($assembly->all_contigs()) { > my $name = "cn".$contig->id; > print("$name\n"); > my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$name"); > $outstream->write_aln($contig); > undef $outstream; > } > > And here is the runtime error: > > cn1 > Use of uninitialized value in hash element at > /usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990. > Use of uninitialized value in hash element at > /usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990. > Can't call method "alphabet" on an undefined value at > /usr/lib/perl5/site_perl/5.6.1/Bio/AlignIO/msf.pm line 180, line 33990. > > I am using bioperl-1.2.2. 
> > > > > > Your code below is calling it in scalar context which will just have $aln > > being set to the length of the returned array. > > > > -jason > > > > On Mon, 1 Sep 2003, Wes Barris wrote: > > > > > >>Brian Osborne wrote: > >> > >> > >>>Wes, > >>> > >>>I don't think this is possible in Bioperl. To put it more generally, AlignIO > >>>can't accommodate Assembly objects currently. AlignIO is the module that > >>>takes in a variety of alignment formats and interconverts them, analogous to > >>>SeqIO. I'll be corrected if I'm wrong. > >>> > >>>Brian O. > >> > >>I am kind of new to this so I could be wrong but isn't an Assembly a group > >>of alignments? So, from one assemble, a group of alignments could be > >>generated? > >> > >> > >>>-----Original Message----- > >>>From: bioperl-l-bounces@portal.open-bio.org > >>>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris > >>>Sent: Thursday, August 28, 2003 7:58 PM > >>>To: Bioperl Mailing List > >>>Subject: [Bioperl-l] ace to msf format? > >>> > >>>Can anyone give me a hint as to how I could use bioperl to read in > >>>an ACE assembly and write out an MSF formatted alignment? 
This shows > >>>what I have figured out so far: > >>> > >>>#!/usr/local/bin/perl -w > >>># > >>>use strict; > >>>use Bio::Assembly::IO; > >>># > >>>my $usage = "Usage: $0 \n"; > >>>my $infile = shift or die $usage; > >>> > >>>my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); > >>>my $assembly = $io->next_assembly; > >>> > >>>my $aln = $assembly->all_contigs(); > >>> > >>>-- > >>>Wes Barris > >>>E-Mail: Wes.Barris@csiro.au > >>> > >>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l@portal.open-bio.org > >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> > >> > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > > -- > Wes Barris > E-Mail: Wes.Barris@csiro.au > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From natg at shore.net Tue Sep 2 11:26:51 2003 From: natg at shore.net (Nathan (Nat) Goodman) Date: Tue Sep 2 11:27:04 2003 Subject: [Bioperl-l] Desperately seeking object model for drug screening experiments in mouse models Message-ID: <001701c37166$a2151780$3300a8c0@goodmandesktop> Hi Folks I need a Perl object model for drug screening experiments in mouse models. Does anyone have such a thing? I'd also appreciate advice on what such a model should cover. Thanks, Nat Goodman From jason at cgt.duhs.duke.edu Tue Sep 2 11:30:44 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Sep 2 11:30:04 2003 Subject: [Bioperl-l] ace to msf format? In-Reply-To: References: Message-ID: Perhaps it make sense to instead derive a flushed alignment from a Contig - i.e. a get_aln() method - which will make a new SimpleAlign object and padding the individual sequences with the necessary leading and trailing gap characters? Wes - if this is something you need, perhaps you could look into trying to write a method of this sort? 
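The padding idea suggested above could be sketched roughly as follows. This is plain Perl on a made-up data structure, not a BioPerl method: the read names, the start and seq fields and the flush_reads function are all assumptions for illustration, and a real get_aln() would have to pull the coordinates out of the Bio::Assembly::Contig object itself.

```perl
use strict;
use warnings;

# Hypothetical input: each read has a name, a 1-based start column in the
# contig's padded coordinates, and its already-gapped sequence string.
my @reads = (
    { name => 'read1', start => 1, seq => 'ACGTAC'  },
    { name => 'read2', start => 4, seq => 'TAC-GGT' },
    { name => 'read3', start => 8, seq => 'GGTTA'   },
);

# Pad every read with leading and trailing gap characters so that all
# rows have the same length (a "flush" alignment).
# Returns the alignment length and a hash ref of name => padded string.
sub flush_reads {
    my ($gap, @reads) = @_;

    # The alignment length is the rightmost column covered by any read.
    my $length = 0;
    for my $read (@reads) {
        my $end = $read->{start} + length($read->{seq}) - 1;
        $length = $end if $end > $length;
    }

    my %padded;
    for my $read (@reads) {
        my $lead  = $read->{start} - 1;
        my $trail = $length - $lead - length($read->{seq});
        $padded{ $read->{name} } =
            ($gap x $lead) . $read->{seq} . ($gap x $trail);
    }
    return ($length, \%padded);
}

my ($aln_length, $padded) = flush_reads('-', @reads);
print "alignment length: $aln_length\n";
printf "%-6s %s\n", $_->{name}, $padded->{ $_->{name} } for @reads;
```

The padded strings could then presumably be wrapped in Bio::LocatableSeq objects and added to a new Bio::SimpleAlign, which AlignIO should be able to write. Note that this does nothing about Robson's other point: read ends that do not actually align to the consensus would still be written out as if they did.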
-jason On Tue, 2 Sep 2003, Robson Francisco de Souza wrote: > > Hi Wes and Jason, > > There are indeed some caveats when trying to use > Bio::Assembly::Contig objects as Bio::Align::AlignI objects. Not all > methods defined in this interface are implemented and some are not > working (checked it yesterday using Wes's code). Most routines that are > not working can be corrected without much work and some not yet > implemented are easy to write but I'm not sure we'll ever get full > compliance to the AlignI interface. > I'd like to discuss that further but for now let me just clarify > why I believe there will be no way to print contig using msf.pm: contigs > are not flush, i.e. most contigs will be alignments of sequences of > different lengths and, even worst, sequences in a contig may be only > locally aligned to each other, which implies that some regions of any > sequence in the alignment might not be aligned to the contig consensus but > will get printed to MSF any way. As far as I understand AlignI interface, > such an alignment (a set of local alignments) is not supported. > I've been considering removing AlignI from @ISA in > Bio::Assembly::Contig and defining a ContigI interface for it as it seems > to me that AlignI interface is not generic enough to describe contigs. > The main problem is that any sequence in a contig is only partially > aligned to a consensus's subsequence, qich makes some of the methods from > AlignI non-sense (e.g. Bio::Align::AlignI::length, which is used by > msf.pm). I'd like to hear comments from others on this. > So, do not try to use MSF, CLUSTALW or other format of multiple > global alignment for printing assemblies, you wont get what you want. > > Robson > > On Mon, 1 Sep 2003, Wes Barris wrote: > > Thanks Jason, that makes sense. Perhaps I'm missing something obvious > > but I am getting an error when treating each contig as a Bio::SimpleAlign > > object. 
Here is my code: > > > > #!/usr/local/bin/perl -w > > # > > use strict; > > use Bio::Assembly::IO; > > use Bio::AlignIO; > > # > > my $usage = "Usage: $0 \n"; > > my $infile = shift or die $usage; > > > > my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); > > my $assembly = $io->next_assembly; > > > > foreach my $contig ($assembly->all_contigs()) { > > my $name = "cn".$contig->id; > > print("$name\n"); > > my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$name"); > > $outstream->write_aln($contig); > > undef $outstream; > > } > > > > And here is the runtime error: > > > > cn1 > > Use of uninitialized value in hash element at > > /usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990. > > Use of uninitialized value in hash element at > > /usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990. > > Can't call method "alphabet" on an undefined value at > > /usr/lib/perl5/site_perl/5.6.1/Bio/AlignIO/msf.pm line 180, line 33990. > > > > I am using bioperl-1.2.2. > > > > > > > > > > Your code below is calling it in scalar context which will just have $aln > > > being set to the length of the returned array. > > > > > > -jason > > > > > > On Mon, 1 Sep 2003, Wes Barris wrote: > > > > > > > > >>Brian Osborne wrote: > > >> > > >> > > >>>Wes, > > >>> > > >>>I don't think this is possible in Bioperl. To put it more generally, AlignIO > > >>>can't accommodate Assembly objects currently. AlignIO is the module that > > >>>takes in a variety of alignment formats and interconverts them, analogous to > > >>>SeqIO. I'll be corrected if I'm wrong. > > >>> > > >>>Brian O. > > >> > > >>I am kind of new to this so I could be wrong but isn't an Assembly a group > > >>of alignments? So, from one assemble, a group of alignments could be > > >>generated? 
> > >> > > >> > > >>>-----Original Message----- > > >>>From: bioperl-l-bounces@portal.open-bio.org > > >>>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris > > >>>Sent: Thursday, August 28, 2003 7:58 PM > > >>>To: Bioperl Mailing List > > >>>Subject: [Bioperl-l] ace to msf format? > > >>> > > >>>Can anyone give me a hint as to how I could use bioperl to read in > > >>>an ACE assembly and write out an MSF formatted alignment? This shows > > >>>what I have figured out so far: > > >>> > > >>>#!/usr/local/bin/perl -w > > >>># > > >>>use strict; > > >>>use Bio::Assembly::IO; > > >>># > > >>>my $usage = "Usage: $0 \n"; > > >>>my $infile = shift or die $usage; > > >>> > > >>>my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); > > >>>my $assembly = $io->next_assembly; > > >>> > > >>>my $aln = $assembly->all_contigs(); > > >>> > > >>>-- > > >>>Wes Barris > > >>>E-Mail: Wes.Barris@csiro.au > > >>> > > >>> > > >>>_______________________________________________ > > >>>Bioperl-l mailing list > > >>>Bioperl-l@portal.open-bio.org > > >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >>> > > >> > > >> > > >> > > > > > > -- > > > Jason Stajich > > > Duke University > > > jason at cgt.mc.duke.edu > > > > > > -- > > Wes Barris > > E-Mail: Wes.Barris@csiro.au > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From kenneth_fitzpatrick_ro at germancash4click.de Tue Sep 2 10:28:46 2003 From: kenneth_fitzpatrick_ro at germancash4click.de (Kenneth T. Fitzpatrick) Date: Tue Sep 2 13:27:38 2003 Subject: [Bioperl-l] How was it lat ely? 
Message-ID: <01e901c3715e$cc7101d2$114a67f6@xxdg1g2>

From vesko_baev at abv.bg Tue Sep 2 15:12:09 2003
From: vesko_baev at abv.bg (Vesko Baev)
Date: Tue Sep 2 15:11:06 2003
Subject: [Bioperl-l] help- remote blast
Message-ID: <522242135.1062529929121.JavaMail.nobody@java1.ni.bg>

Hi,

How can I tell my remoteblast script to search the Arabidopsis genome database?

Thank you in advance!

Vesselin Baev

From wcui at UDel.Edu Tue Sep 2 15:37:28 2003
From: wcui at UDel.Edu (Wenwu Cui)
Date: Tue Sep 2 15:36:22 2003
Subject: [Bioperl-l] help- remote blast
In-Reply-To: <522242135.1062529929121.JavaMail.nobody@java1.ni.bg>
Message-ID: <000b01c37189$a5368de0$b6b6af80@HAIYAN>

$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = $org;

$org can be any organism you want.

Wenwu Cui
Department of Biological Sciences
University of Delaware
Email: wcui@udel.edu
Homepage: http://mywpages.comcast.net/wcui/

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Vesko Baev
Sent: Tuesday, September 02, 2003 3:12 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] help- remote blast

Hi,

How can I tell my remoteblast script to search the Arabidopsis genome database?

Thank you in advance!

Vesselin Baev
_______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From heikki at nildram.co.uk Tue Sep 2 17:14:08 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Sep 2 17:28:02 2003 Subject: [Bioperl-l] codon useage modules In-Reply-To: <3F54A20D.C2BBBD56@ed.ac.uk> References: <3F54A20D.C2BBBD56@ed.ac.uk> Message-ID: <1062537248.1956.10.camel@bala> On Tue, 2003-09-02 at 14:58, Richard Adams wrote: > Hi, > I'm writing a couple of modules for interrogating the codon usage > database s(http://www.kazusa.or.jp/codon/) > and retrieving its statistics; > > Module 1 inherits from Bio::WebAgent and contacts the DB with a species > query, gets the appropriate Codon Useage Table > and objectifies it. Bio::DB::CUTG > Module 2 supplies the methods and just inherits from Bio::Root::Root but > uses modules like Bio::SeqUtils > and Bio::Tools::CodonTable quite heavily. > > methods include e.g., $ct->get_rel_frequency('TGT'), > $ct->get_aa_frequency('Leu'), > $ct->preferred_codon('Arg'), Do you think there will be other modules that deal with codon usage? I suspect there will be (generate codon usage from sequences, compare codon usages, ...) and it would be warrant giving them their own name space: Bio::CodonUsage. Your module could be Bio::CodonUsage::Table -Heikki > Any recommendations for where these modules should be put in the CVS? 
> > Cheers > > Richard > > > -- > Dr Richard Adams > Bioinformatician, > Psychiatric Genetics Group, > Medical Genetics, > Molecular Medicine Centre, > Western General Hospital, > Crewe Rd West, > Edinburgh UK > EH4 2XU > > Tel: 44 131 651 1084 > richard.adams@ed.ac.uk > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From pm66 at nyu.edu Tue Sep 2 18:44:27 2003 From: pm66 at nyu.edu (Philip MacMenamin) Date: Tue Sep 2 18:46:00 2003 Subject: [Bioperl-l] Bio::Graphics::Panel, -spacing => 0 constructor problem. In-Reply-To: <200309011818.31470.lstein@cshl.edu> References: <200308222223.h7MMN14c022854@mx3.nyu.edu> <200309011818.31470.lstein@cshl.edu> Message-ID: <200309022247.h82MlCAS000711@mx2.nyu.edu> Thanks Lincoln, I was under the impression that the "spacing" argument was referring to spacing on the y axis, and that it defaults to 5 between the stack of tracks. And that if it was set to zero, then the track would lie on the same plain as the previous in the stack. Where in fact it adds additional space between tracks, over and above the normal amount of space. I would like there to be no vertical (y) space between certain tracks, ie the curatedGenes track and the UTR track, similar to the way that wormbase have it. They should not overly eachother, due to them existing in differant x space. What area of Bio::Graphics should I look at to do this? Philip On Monday 01 September 2003 06:18 pm, Lincoln Stein wrote: > Spacing adds additional padding between tracks. You cannot get them to > overly each other. Possibly -start and -end are not doing what you think > they should do. > > Lincoln From ymc at paxil.stanford.edu Tue Sep 2 19:06:09 2003 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Tue Sep 2 19:04:58 2003 Subject: [Bioperl-l] E-value of a combined alignment? 
Message-ID: Hi folks, I am aligning mRNAs against human genome using ungapped tblastx. I got a bunch of HSPs with different e-values. I can observed that some of them should be in the same group because they are exons of a gene. But then what is the e-value of all these HSPs combined? I know the formulas of e-value and bit score for BLOSUM62: Let S' be bit score, S be score, e be e-value, m be the length of HSP, n be length of database. S' = (0.318 * S - ln(0.135)) / ln(2) e = m * n / (2^(S')) I am guessing the formula for the e-value of non-overlapping combined e-value to be: S'' = (0.318 * sum_of_S - ln(0.135)) / ln(2) e' = sum_of_m * n / (2^(S'')) Is this correct? Or do you know the right way to calculate it? Thanks in advance. Yee Man From wes.barris at csiro.au Tue Sep 2 19:09:40 2003 From: wes.barris at csiro.au (Wes Barris) Date: Tue Sep 2 21:14:44 2003 Subject: [Bioperl-l] ace to msf format? In-Reply-To: References: Message-ID: <3F552334.2040601@csiro.au> Jason Stajich wrote: > Perhaps it make sense to instead derive a flushed alignment from a Contig > - i.e. a get_aln() method - which will make a new SimpleAlign object and > padding the individual sequences with the necessary leading and trailing > gap characters? Yes, that is exactly what I want. > Wes - if this is something you need, perhaps you could look into trying to > write a method of this sort? Yes, I will try. However, I will first have to better understand how all the pieces of Bioperl "fit together" before I will know where to begin. > > -jason > > On Tue, 2 Sep 2003, Robson Francisco de Souza wrote: > > >> Hi Wes and Jason, >> >> There are indeed some caveats when trying to use >>Bio::Assembly::Contig objects as Bio::Align::AlignI objects. Not all >>methods defined in this interface are implemented and some are not >>working (checked it yesterday using Wes's code). 
Most routines that are >>not working can be corrected without much work and some not yet >>implemented are easy to write but I'm not sure we'll ever get full >>compliance to the AlignI interface. >> I'd like to discuss that further but for now let me just clarify >>why I believe there will be no way to print contig using msf.pm: contigs >>are not flush, i.e. most contigs will be alignments of sequences of >>different lengths and, even worst, sequences in a contig may be only >>locally aligned to each other, which implies that some regions of any >>sequence in the alignment might not be aligned to the contig consensus but >>will get printed to MSF any way. As far as I understand AlignI interface, >>such an alignment (a set of local alignments) is not supported. >> I've been considering removing AlignI from @ISA in >>Bio::Assembly::Contig and defining a ContigI interface for it as it seems >>to me that AlignI interface is not generic enough to describe contigs. >>The main problem is that any sequence in a contig is only partially >>aligned to a consensus's subsequence, qich makes some of the methods from >>AlignI non-sense (e.g. Bio::Align::AlignI::length, which is used by >>msf.pm). I'd like to hear comments from others on this. >> So, do not try to use MSF, CLUSTALW or other format of multiple >>global alignment for printing assemblies, you wont get what you want. >> >> Robson >> >>On Mon, 1 Sep 2003, Wes Barris wrote: >> >>>Thanks Jason, that makes sense. Perhaps I'm missing something obvious >>>but I am getting an error when treating each contig as a Bio::SimpleAlign >>>object. 
Here is my code: >>> >>>#!/usr/local/bin/perl -w >>># >>>use strict; >>>use Bio::Assembly::IO; >>>use Bio::AlignIO; >>># >>>my $usage = "Usage: $0 \n"; >>>my $infile = shift or die $usage; >>> >>>my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); >>>my $assembly = $io->next_assembly; >>> >>>foreach my $contig ($assembly->all_contigs()) { >>> my $name = "cn".$contig->id; >>> print("$name\n"); >>> my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$name"); >>> $outstream->write_aln($contig); >>> undef $outstream; >>> } >>> >>>And here is the runtime error: >>> >>>cn1 >>>Use of uninitialized value in hash element at >>>/usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990. >>>Use of uninitialized value in hash element at >>>/usr/lib/perl5/site_perl/5.6.1/Bio/Assembly/Contig.pm line 1305, line 33990. >>>Can't call method "alphabet" on an undefined value at >>>/usr/lib/perl5/site_perl/5.6.1/Bio/AlignIO/msf.pm line 180, line 33990. >>> >>>I am using bioperl-1.2.2. >>> >>> >>> >>>>Your code below is calling it in scalar context which will just have $aln >>>>being set to the length of the returned array. >>>> >>>>-jason >>>> >>>>On Mon, 1 Sep 2003, Wes Barris wrote: >>>> >>>> >>>> >>>>>Brian Osborne wrote: >>>>> >>>>> >>>>> >>>>>>Wes, >>>>>> >>>>>>I don't think this is possible in Bioperl. To put it more generally, AlignIO >>>>>>can't accommodate Assembly objects currently. AlignIO is the module that >>>>>>takes in a variety of alignment formats and interconverts them, analogous to >>>>>>SeqIO. I'll be corrected if I'm wrong. >>>>>> >>>>>>Brian O. >>>>> >>>>>I am kind of new to this so I could be wrong but isn't an Assembly a group >>>>>of alignments? So, from one assemble, a group of alignments could be >>>>>generated? 
>>>>> >>>>> >>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces@portal.open-bio.org >>>>>>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris >>>>>>Sent: Thursday, August 28, 2003 7:58 PM >>>>>>To: Bioperl Mailing List >>>>>>Subject: [Bioperl-l] ace to msf format? >>>>>> >>>>>>Can anyone give me a hint as to how I could use bioperl to read in >>>>>>an ACE assembly and write out an MSF formatted alignment? This shows >>>>>>what I have figured out so far: >>>>>> >>>>>>#!/usr/local/bin/perl -w >>>>>># >>>>>>use strict; >>>>>>use Bio::Assembly::IO; >>>>>># >>>>>>my $usage = "Usage: $0 \n"; >>>>>>my $infile = shift or die $usage; >>>>>> >>>>>>my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); >>>>>>my $assembly = $io->next_assembly; >>>>>> >>>>>>my $aln = $assembly->all_contigs(); >>>>>> >>>>>>-- >>>>>>Wes Barris >>>>>>E-Mail: Wes.Barris@csiro.au >>>>>> >>>>>> >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l@portal.open-bio.org >>>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> >>>>> >>>>> >>>>-- >>>>Jason Stajich >>>>Duke University >>>>jason at cgt.mc.duke.edu >>> >>> >>>-- >>>Wes Barris >>>E-Mail: Wes.Barris@csiro.au >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu -- Wes Barris E-Mail: Wes.Barris@csiro.au From vesko_baev at abv.bg Wed Sep 3 05:30:38 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Wed Sep 3 05:29:47 2003 Subject: [Bioperl-l] BioPerl host space? 
Message-ID: <102876638.1062581438784.JavaMail.nobody@storage.ni.bg>

Hi,

Can anyone tell me where I can find a hosting server with Bioperl installed, where I could put and test my CGI Bioperl scripts? I know of some servers, but they support only Perl, without Bioperl.

Thanks a lot, friends!

--------------
Vesselin Baev
Plovdiv, BULGARIA

From heikki at ebi.ac.uk Wed Sep 3 06:23:30 2003
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Sep 3 06:22:57 2003
Subject: [Bioperl-l] BioPerl host space?
In-Reply-To: <102876638.1062581438784.JavaMail.nobody@storage.ni.bg>
References: <102876638.1062581438784.JavaMail.nobody@storage.ni.bg>
Message-ID: <1062584605.2062.64.camel@bala>

Vesko,

I am afraid I cannot help you find an open server with Bioperl, but...

1. You should really do the development and testing on your local computer. There are plenty of free web server software packages out there.

2. If you know a server you can use, you can always copy Bioperl over and add a 'use lib' line with the path in your CGI script, pointing to where you've put it. If size is a problem, you can delete all the unused modules and other files from the server.

Yours,
-Heikki

On Wed, 2003-09-03 at 10:30, Vesko Baev wrote:
> Hi,
> Can anyone tell me where I can find a hosting server with Bioperl installed, where I could put and test my CGI Bioperl scripts? I know of some servers, but they support only Perl, without Bioperl.
>
> Thanks a lot, friends!
>
> --------------
> Vesselin Baev
> Plovdiv, BULGARIA
>
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From ik1 at sanger.ac.uk Wed Sep 3 07:46:02 2003 From: ik1 at sanger.ac.uk (Ian Korf) Date: Wed Sep 3 07:44:47 2003 Subject: [Bioperl-l] E-value of a combined alignment? In-Reply-To: Message-ID: <3178565C-DE04-11D7-A14A-0003930EC19A@sanger.ac.uk> There are several publications on combined statistical significance of local alignment scores. The ones implemented in BLAST are not exactly the same as the publications though. You can get a pretty decent approximation by subtracting log(KMN) for each gap, but this isn't the proper formula. WU-BLAST is much better for combined statistics than NCBI-BLAST because it shows the actual groups with the -links parameter and allows you to limit the number of groups with the -topcomboN and -topcomboE parameters. It also lets you fine-tune the groupings a bit with -olmax and -olfmax. If the sequences aren't too diverged, you might be better off keeping X low though. -Ian On Wednesday, September 3, 2003, at 12:06 AM, Yee Man Chan wrote: > > Hi folks, > > I am aligning mRNAs against human genome using ungapped tblastx. I > got a bunch of HSPs with different e-values. I can observed that some > of > them should be in the same group because they are exons of a gene. But > then what is the e-value of all these HSPs combined? 
> > I know the formulas of e-value and bit score for BLOSUM62: > > Let S' be bit score, S be score, e be e-value, m be the length of HSP, > n be length of database. > > S' = (0.318 * S - ln(0.135)) / ln(2) > > e = m * n / (2^(S')) > > I am guessing the formula for the e-value of > non-overlapping combined e-value to be: > > S'' = (0.318 * sum_of_S - ln(0.135)) / ln(2) > > e' = sum_of_m * n / (2^(S'')) > > Is this correct? Or do you know the right way to calculate it? > > Thanks in advance. > Yee Man > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From guojun at idmb.tamu.edu Tue Sep 2 23:12:12 2003 From: guojun at idmb.tamu.edu (Guojun Yang) Date: Wed Sep 3 08:26:52 2003 Subject: [Bioperl-l] ask for help about the new RID format on NCBI server Message-ID: <9DB60E0436608046813742ED5E9989EF01D4BA@mail.idmb.tamu.edu> I sent an email to NCBI service BLAST help (see the email below) and was told that we need to change our regex format. We do not need to use regex for RID in our perl script for Remoteblast. Do we need to change the source code for a certain module? If so, would any one please let me know how to do that? With appreciation, Guojun -----Original Message----- From: Tao, Tao (NIH/NLM/NCBI) To: 'Guojun Yang' Cc: 'blast-help@ncbi.nlm.nih.gov' Sent: 9/2/2003 11:56 AM Subject: RE: [blast-help] Qblast server problem? The RID format has changed to track our internal reorganization. You would need to change your script to cope with this change. Regards, .... NCBI USER SErvice -----Original Message----- From: Guojun Yang [mailto:guojun@idmb.tamu.edu] Sent: Tuesday, September 02, 2003 12:18 PM To: 'blast-help@ncbi.nlm.nih.gov' Subject: [blast-help] Qblast server problem? Dear Blast Help, We are running a perl program using API to send and retrieve BLAST results. It was working very well before. 
However, in the last week (including the past weekend), we have always received an error. The output shows the RID number followed by the word "error". We tried to use that RID to retrieve the results from the NCBI page, but it did not work, even after many attempts. Would you please help me sort this out?

Sincerely,
Guojun
IDMB & Biology Dept
Texas A&M University

From rah at mrc-dunn.cam.ac.uk Wed Sep 3 08:16:17 2003
From: rah at mrc-dunn.cam.ac.uk (Richard Harrington)
Date: Wed Sep 3 09:09:38 2003
Subject: [Bioperl-l] E-value of a combined alignment?
In-Reply-To:
References:
Message-ID: <42167.193.60.85.21.1062591377.squirrel@www.mrc-dunn.cam.ac.uk>

Hi,

>
> Hi folks,
>
> I am aligning mRNAs against human genome using ungapped tblastx. I
> got a bunch of HSPs with different e-values. I can observe that some of
> them should be in the same group because they are exons of a gene. But
> then what is the e-value of all these HSPs combined?
>
> I know the formulas of e-value and bit score for BLOSUM62:
>
> Let S' be bit score, S be score, e be e-value, m be the length of HSP, n
> be length of database.
>
> S' = (0.318 * S - ln(0.135)) / ln(2)
>
> e = m * n / (2^(S'))
>
> I am guessing the formula for the e-value of
> non-overlapping combined e-value to be:
>
> S'' = (0.318 * sum_of_S - ln(0.135)) / ln(2)
>
> e' = sum_of_m * n / (2^(S''))
>
> Is this correct? Or do you know the right way to calculate it?
>
> Thanks in advance.
> Yee Man
>

Another good place to post this query might be the new 'Sequence Searching mailing list' (ssml):

http://bioinformatics.org/mailman/listinfo/ssml-general

Cheers,
Richard

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Richard Harrington,
HUS Graduate Vice-President, Homerton College.
PhD student, Bioinformatics Group,
MRC-Dunn Human Nutrition Unit, Cambridge University.
e-mail: rah@mrc-dunn.cam.ac.uk telephone: +44 (0) 1223 252861 _______________________________________________ From cdwan at mail.ahc.umn.edu Wed Sep 3 10:02:18 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Wed Sep 3 10:01:04 2003 Subject: [Bioperl-l] E-value of a combined alignment? In-Reply-To: Message-ID: > I am aligning mRNAs against human genome using ungapped tblastx. I > got a bunch of HSPs with different e-values. I can observed that some of > them should be in the same group because they are exons of a gene. But > then what is the e-value of all these HSPs combined? > > I know the formulas of e-value and bit score for BLOSUM62: > > Let S' be bit score, S be score, e be e-value, m be the length of HSP, > n be length of database. > > S' = (0.318 * S - ln(0.135)) / ln(2) > > e = m * n / (2^(S')) > > I am guessing the formula for the e-value of > non-overlapping combined e-value to be: > > S'' = (0.318 * sum_of_S - ln(0.135)) / ln(2) > > e' = sum_of_m * n / (2^(S'')) > > Is this correct? Or do you know the right way to calculate it? If you assume that gaps are irrelevant to your similarity metric (say, you want to totally discount introns AND assume that all gaps are caused by non conservation in introns) this is the way to go. This formula pretends that the gaps don't exist and that you're dealing with one long HSP. I believe that this is actually the behavior of NCBI's BLASTP. All of the HSP's in a hit get the same evalue, which is about what you would get if you summed the bit scores of the HSPs and then calculated a final evalue. If you want to include some sort of penalty for the gaps, Ian Korf's suggestion (log(KMN) for each gap) is a good starting point. If "p" scores were really probabilities, we could combine them using the formulas for either dependent or independent events. Has anyone tried this? -Chris Dwan CCGB, University of Minnesota. 
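
For reference, the two formulas quoted in this thread can be written out as a short Perl sketch. The constants 0.318 and 0.135 are the BLOSUM62 values given in the original question. The HSP scores, lengths and database size below are invented numbers, and the "naive combined" line simply applies the guessed sum formula as posted. It is not offered as what any BLAST implementation actually computes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Constants quoted in the thread for BLOSUM62.
my $LAMBDA = 0.318;
my $K      = 0.135;

# Bit score from a raw score: S' = (lambda * S - ln K) / ln 2
sub bit_score {
    my ($raw) = @_;
    return ($LAMBDA * $raw - log($K)) / log(2);
}

# E-value for HSP length m and database length n: e = m * n / 2^S'
sub e_value {
    my ($m, $n, $bits) = @_;
    return $m * $n / (2 ** $bits);
}

# Invented example data: [raw score, HSP length].
my @hsps = ([60, 40], [45, 30]);
my $n    = 3e9;    # invented database length

my ($sum_s, $sum_m) = (0, 0);
for my $h (@hsps) {
    my ($s, $m) = @$h;
    my $bits = bit_score($s);
    printf "HSP raw=%d len=%d bits=%.1f e=%.3g\n",
        $s, $m, $bits, e_value($m, $n, $bits);
    $sum_s += $s;
    $sum_m += $m;
}

# The guessed combination from the question: sum scores and lengths,
# then treat the result as one long HSP.
my $combined_bits = bit_score($sum_s);
printf "naive combined: bits=%.1f e=%.3g\n",
    $combined_bits, e_value($sum_m, $n, $combined_bits);
```

Because 2^(-S') equals K * exp(-lambda * S), the e_value() above is the same quantity as K * m * n * exp(-lambda * S), which is a convenient check on the arithmetic.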
From quickster333 at hotmail.com Wed Sep 3 13:14:24 2003 From: quickster333 at hotmail.com (Johnny Amos) Date: Wed Sep 3 13:13:09 2003 Subject: [Bioperl-l] SeqIO: Validate File-Format Message-ID: Hello, I am using SeqIO as part of a web-application. I want to verify that files are input in FASTA format, but if they are not, I would like to catch the error and present an error page. SeqIO dies if given a bad file-format: is there anyway to get it to warn with a return code, instead? Or is there another way I should be approaching this problem? TIA, Johnny _________________________________________________________________ Get MSN 8 and enjoy automatic e-mail virus protection. http://join.msn.com/?page=features/virus From ik1 at sanger.ac.uk Wed Sep 3 13:33:48 2003 From: ik1 at sanger.ac.uk (Ian Korf) Date: Wed Sep 3 13:32:35 2003 Subject: [Bioperl-l] E-value of a combined alignment? In-Reply-To: Message-ID: > I believe that this is actually the behavior of NCBI's BLASTP. All of > the > HSP's in a hit get the same evalue, which is about what you would get > if > you summed the bit scores of the HSPs and then calculated a final > evalue. This is definitely not what BLASTP or any other BLAST does. If this was the case, you could sum up the scores for highly insignificant HSPs (e.g. those with an E-value of 1.0) and come up with a very good E-value. The log(KMN) penalty for each HSP subtracts the background expected alignment score [every search has a score with an E-value of 1.0, and this is log(KMN) in the limit of large sequences]. Combining alignments is not so straightforward if you want the HSPs to be consistent (e.g. the N-termini match and the C-termini match rather than the N-terminus matching the C-terminus). In this case, one must evaluate all HSPs to compare the overlaps. Since this is a quadratic operation, it doesn't scale well to large sequences. 
Setting high values of single-HSP cutoffs helps offset the cost as does gapped alignment, which produces fewer HSPs. The cutoff value is hard-coded in NCBI-BLAST but not WU-BLAST (parameters are S2 and gapS2). > If "p" scores were really probabilities, we could combine them using > the > formulas for either dependent or independent events. Has anyone tried > this? There's loads of literature on this topic already. The papers are mostly theoretical though, and do not really concern themselves with the practicality of biological sequences. Finite sequence lengths pose some problems. For example, the log(KMN) expected score is a little too high. BLAST therefore uses some heuristics to bring this down. The code in NCBI-BLAST that does this is a little frightening and I've no idea what WU-BLAST does though it seems to take length into account in some manner. This stuff (and a whole lot more) is discussed in the O'Reilly BLAST book (sorry for the shameless plug). -Ian > > -Chris Dwan > CCGB, University of Minnesota. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From brian_osborne at cognia.com Wed Sep 3 13:44:51 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Sep 3 13:48:05 2003 Subject: [Bioperl-l] SeqIO: Validate File-Format In-Reply-To: Message-ID: Johnny, You can put the code that might normally throw the fatal error into an eval{} block, like this: ~>perl -e 'use Bio::SeqIO; $io = Bio::SeqIO->new(-file => "test.fa", -format => "swiss" ); eval { $seq = $io->next_seq; }; print $@ if $@;' In this situation there's no fatal error and the error message will be found in $@. Brian O. 
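
The same eval pattern, spelled out as a small script that returns a status and a message instead of dying. To keep it runnable without Bioperl, the strict check here is a hypothetical check_fasta() routine written for this example. In a real script the body of the eval would hold the SeqIO calls from the one-liner above, and whether SeqIO throws for a particular malformed file depends on the parser in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical strict check (not a Bioperl call): die unless the text
# looks like FASTA.
sub check_fasta {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    die "empty input\n" unless @lines;
    die "first line is not a '>' header\n" unless $lines[0] =~ /^>/;
    for my $line (@lines) {
        next if $line =~ /^>/;
        die "unexpected characters in sequence line: $line\n"
            unless $line =~ /^[A-Za-z*\-]+\s*$/;
    }
    return 1;
}

# Wrap the call in eval so that a failure becomes a return value plus
# an error message taken from $@.
sub try_fasta {
    my ($text) = @_;
    my $ok = eval { check_fasta($text) };
    return $ok ? (1, '') : (0, $@);
}

for my $input (">seq1\nACGT\n", "ID   P12345\n") {
    my ($ok, $err) = try_fasta($input);
    print $ok ? "looks like FASTA\n" : "rejected: $err";
}
```

In a web application the caller can then branch on the status and render an error page with the message, rather than letting the exception end the request.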
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Johnny Amos Sent: Wednesday, September 03, 2003 1:14 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] SeqIO: Validate File-Format Hello, I am using SeqIO as part of a web-application. I want to verify that files are input in FASTA format, but if they are not, I would like to catch the error and present an error page. SeqIO dies if given a bad file-format: is there anyway to get it to warn with a return code, instead? Or is there another way I should be approaching this problem? TIA, Johnny _________________________________________________________________ Get MSN 8 and enjoy automatic e-mail virus protection. http://join.msn.com/?page=features/virus _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason at cgt.duhs.duke.edu Wed Sep 3 13:54:07 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Sep 3 13:52:42 2003 Subject: [Bioperl-l] SeqIO: Validate File-Format In-Reply-To: References: Message-ID: This has been discussed on the list in the past - http://search.open-bio.org and plugging in "SeqIO and validate" brings up some links. The recent thread starts here: http://bioperl.org/pipermail/bioperl-l/2003-June/012416.html On Wed, 3 Sep 2003, Johnny Amos wrote: > Hello, > > I am using SeqIO as part of a web-application. I want to verify that files > are input in FASTA format, but if they are not, I would like to catch the > error and present an error page. SeqIO dies if given a bad file-format: is > there anyway to get it to warn with a return code, instead? Or is there > another way I should be approaching this problem? > > TIA, > Johnny > > _________________________________________________________________ > Get MSN 8 and enjoy automatic e-mail virus protection. 
> http://join.msn.com/?page=features/virus > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Wed Sep 3 14:28:09 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Sep 3 14:26:42 2003 Subject: [Bioperl-l] remote blast fix Message-ID: I've committed the very simple fix. If you don't want to/cannot deal with upgrading your Bioperl code locally you can just add this to your scripts. This module was written to be intentionally flexible so I encourage folks who use it to read the code and to read more about how to use the CGI script from the NCBI site. use Bio::Tools::Run::RemoteBlast; $Bio::Tools::Run::RemoteBlast::RIDLINE = 'RID\s+\=\s+(\S+)'; See: http://bugzilla.open-bio.org/show_bug.cgi?id=1500 for more info. -- Jason Stajich Duke University jason at cgt.mc.duke.edu From heikki at nildram.co.uk Wed Sep 3 15:57:52 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Wed Sep 3 15:56:37 2003 Subject: cloning and Storable Re: [Bioperl-l] bugs on branch; tests on main trunk In-Reply-To: <1062409729.2062.37.camel@bala> References: <1062409729.2062.37.camel@bala> Message-ID: <1062619071.2047.43.camel@bala> I've removed the dependency on Storable. Storable is still used if it is installed. Local code can clone everything except circular references. If someone knows how to do it, I'd be happy to receive help. Not having it here does not really matter because the main use of the clone method is to allow in-memory creation of a new enzyme based on an existing one. The clone code is written in a very general way and should be able to deep copy any in-memory objects. If you need to add a clone method to your own classes, copy from there.
Ewan feels strongly that deep cloning is too prone to errors to be a general property of bioperl objects, so better not add this into Bio::Root::Root, although it would be handy. -Heikki On Mon, 2003-09-01 at 10:48, Heikki Lehvaslaiho wrote: > On Fri, 2003-08-29 at 17:56, Ewan Birney wrote: > > Heikki or Rob --- does RestrictionEnzyme *really* need Storeable? > > Storeable doesn't come by default on systems, so if it didn't need > > it then it would be more useful not to use it. Any chance of this? > > Storable is used to do deep cloning of Enzyme objects. Storable is also > used by Bio::DB::FileCache and Bio::SeqFeature::Collection (or is the > documentation for Collection outdated?). > > Storable is part of 5.8.1 distribution. There is Clone in CPAN which is > faster but less systems are bound to have is. > > If you think is is critical to get rid of this dependency I can rewrite > the cloning method. > > -Heikki > > > > > I put a require eval() in tutorial around the restriction enzyme stuff. > > > > > > > > > > Chris (and the unflattening crew...) the Unflattener is issueing alot of > > warnings with -w --- any chance of one of you looking at it? > > > > > > However, I now have on the main trunk: > > > > > > All tests successful, 33 subtests skipped. > > Files=168, Tests=7643, 383 wallclock secs (287.31 cusr + 25.46 csys = 312.77 CPU) > > > > > > > > > > > > > > Pretty darn impressive. > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l From quickster333 at hotmail.com Wed Sep 3 17:21:19 2003 From: quickster333 at hotmail.com (Johnny Amos) Date: Wed Sep 3 17:20:09 2003 Subject: [Bioperl-l] SeqIO: Validate File-Format Message-ID: Ooops... I thought I had already searched the archive at: http://bioperl.org/pipermail/bioperl-l/ When I used the search form there, nothing came up under "validate AND format". 
Hmm, actually now that I try it again, I can't really get *any* hits from that search form, although it works from the link you have. Is there a trick to that search form I don't know? And thank you for the references: they solve my problem exactly. :) Johnny >From: Jason Stajich >To: Johnny Amos >CC: bioperl-l@bioperl.org >Subject: Re: [Bioperl-l] SeqIO: Validate File-Format >Date: Wed, 3 Sep 2003 13:54:07 -0400 (EDT) > >This has been discussed on the list in the past - >http://search.open-bio.org >and plugging in "SeqIO and validate" brings up some links. > >The recent thread starts here: > http://bioperl.org/pipermail/bioperl-l/2003-June/012416.html > >On Wed, 3 Sep 2003, Johnny Amos wrote: > > > Hello, > > > > I am using SeqIO as part of a web-application. I want to verify that >files > > are input in FASTA format, but if they are not, I would like to catch >the > > error and present an error page. SeqIO dies if given a bad file-format: >is > > there anyway to get it to warn with a return code, instead? Or is there > > another way I should be approaching this problem? > > > > TIA, > > Johnny > > > > _________________________________________________________________ > > Get MSN 8 and enjoy automatic e-mail virus protection. > > http://join.msn.com/?page=features/virus > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >-- >Jason Stajich >Duke University >jason at cgt.mc.duke.edu _________________________________________________________________ Help protect your PC: Get a free online virus scan at McAfee.com.
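For the SeqIO validation thread above, one option that avoids dying altogether is a plain pre-check before the upload is handed to the parser. The sketch below is only an illustration in core Perl, not BioPerl code: fasta_problems is a hypothetical helper name, and its rules (a header must be '>' followed by an identifier; sequence lines may contain only letters, '*' and '-') are deliberately simple assumptions that a real application may need to loosen or tighten.

```perl
use strict;
use warnings;

# Check FASTA-looking text line by line and collect problems instead of dying.
# Returns a list of human-readable problem strings; an empty list means
# these (deliberately simple) rules found nothing wrong.
sub fasta_problems {
    my ($text) = @_;
    my @problems;
    my ($seen_header, $residues, $line_no) = (0, 0, 0);

    for my $line (split /\r?\n/, $text) {
        $line_no++;
        next if $line =~ /^\s*$/;    # ignore blank lines
        if ($line =~ /^>/) {
            push @problems, "line $line_no: header has no identifier"
                unless $line =~ /^>\S/;
            push @problems, "line $line_no: previous record has no sequence"
                if $seen_header && !$residues;
            ($seen_header, $residues) = (1, 0);
        }
        elsif (!$seen_header) {
            push @problems, "line $line_no: sequence data before first '>' header";
        }
        elsif ($line =~ /[^A-Za-z*\-]/) {
            push @problems, "line $line_no: unexpected character in sequence";
        }
        else {
            $residues += length $line;
        }
    }
    push @problems, "no '>' header found" unless $seen_header;
    push @problems, "last record has no sequence" if $seen_header && !$residues;
    return @problems;
}

my @problems = fasta_problems(">imk13\ngtattacgagcacac\n");
print @problems ? join("\n", @problems) . "\n" : "looks like FASTA\n";
```

Because the helper returns a list rather than throwing, the calling code can decide whether to show the messages on an error page or go on to parse the file.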
From Richard.Adams at ed.ac.uk Thu Sep 4 03:50:15 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Thu Sep 4 03:49:06 2003 Subject: [Bioperl-l] codon useage modules References: <3F54A20D.C2BBBD56@ed.ac.uk> <1062537248.1956.10.camel@bala> Message-ID: <3F56EEB6.474D35B6@ed.ac.uk> Just looking at the Bio::DB:: classes, there are a lot of interface classes but as far as I can tell they're intended for sequence databases. Would it be useful to have a pure abstract interface class for accessing non-sequence-centred biological DBs (e.g., splice site DBs, codonuseage DBs, protein function DBs), to ensure consistent nomenclature? For example, get_local_request ()(if DB is stored locally) get_web_request () (for web DB) would retrieve the request in raw text format post_process_data () would return an object representationof the data. Richard Heikki Lehvaslaiho wrote: > On Tue, 2003-09-02 at 14:58, Richard Adams wrote: > > Hi, > > I'm writing a couple of modules for interrogating the codon usage > > database s(http://www.kazusa.or.jp/codon/) > > and retrieving its statistics; > > > > Module 1 inherits from Bio::WebAgent and contacts the DB with a species > > query, gets the appropriate Codon Useage Table > > and objectifies it. > > Bio::DB::CUTG > > > Module 2 supplies the methods and just inherits from Bio::Root::Root but > > uses modules like Bio::SeqUtils > > and Bio::Tools::CodonTable quite heavily. > > > > methods include e.g., $ct->get_rel_frequency('TGT'), > > $ct->get_aa_frequency('Leu'), > > $ct->preferred_codon('Arg'), > > Do you think there will be other modules that deal with codon usage? > I suspect there will be (generate codon usage from sequences, compare > codon usages, ...) and it would be warrant giving them their own name > space: Bio::CodonUsage. Your module could be > > Bio::CodonUsage::Table > > -Heikki > > > Any recommendations for where these modules should be put in the CVS? 
> > > > Cheers > > > > Richard > > > > > > -- > > Dr Richard Adams > > Bioinformatician, > > Psychiatric Genetics Group, > > Medical Genetics, > > Molecular Medicine Centre, > > Western General Hospital, > > Crewe Rd West, > > Edinburgh UK > > EH4 2XU > > > > Tel: 44 131 651 1084 > > richard.adams@ed.ac.uk > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From birney at ebi.ac.uk Thu Sep 4 03:53:42 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Sep 4 03:52:43 2003 Subject: cloning and Storable Re: [Bioperl-l] bugs on branch; tests on main trunk In-Reply-To: <1062619071.2047.43.camel@bala> Message-ID: On Wed, 3 Sep 2003, Heikki Lehvaslaiho wrote: > I've removed the dependency for Storable. Storable is still used if it > is installed. Local code can clone everything except circular > references. If someone knows how to do it, I'd be happy to receive help. > Not having it here does not really matter because the the main use of > the clone method is to allow in-memory creation of a new enzyme based on > an existing one. > > The clone code is written in very general way and should be able to deep > copy any in-memory objects. If you need to add a clone method your own > classes, copy from there. Ewan feels strongly that deep cloning is too > prone to errors to be a general property of bioperl objects, so better > not add this into Bio::Root::Root, although it would be handy. I am willing to be overruled if there are a lot of people who agree with Heikki, but clone() methods, in my view, just promise something (the ability to correctly make an independent copy of all connected objects) without being able to deliver.
The problem is with objects that either have eccentric memory layouts (such as bound XS code; not that we have many of these) or have implicit singleton-style characteristics (e.g., adaptors to databases which have session information). A clone() which naively attempts to just in-memory copy everything will truly fall over on the first case and probably cause a complex problem on the second case. Remember that these objects may not be the top-level ones, but rather be held onto in the object graph. Furthermore, I rarely see the need for clone; in most systems just reference passing is fine, and clone() is at best used as a shorthand for a specific constructor (which is what it is doing in restriction enzyme), where I would argue the "full memory copy" is really a shorthand for "build me a new RE with precisely the same attributes" which can then be modified. So, I would argue that clone() on RE's is better written as a type of new option $new_re = new RestrictionEnzyme ( -template => $old_re); and we don't have clone on the Root::Object. Currently Heikki is swayed enough by this argument to keep the clone() method specific to RE's. If Jason/Lincoln/Hilmar all (or mostly...) liked clone() on the Root object then I'd have to concede From hlapp at gmx.net Thu Sep 4 04:30:16 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Sep 4 04:29:01 2003 Subject: cloning and Storable Re: [Bioperl-l] bugs on branch; tests on main trunk In-Reply-To: Message-ID: <02D54730-DEB2-11D7-AF80-000A959EB4C4@gmx.net> On Thursday, September 4, 2003, at 12:53 AM, Ewan Birney wrote: > If Jason/Lincoln/Hilmar all (or mostly...) liked clone() on the Root > object then I'd have to concede > My view is that it can potentially create more damage if people do start to copy&paste the implementation from the RE module into others.
It seems to me that a clone() method on Root can strike a better balance, provided it is documented as to what the caveats are and that you should actually not be using this method unless you understand those caveats. Copy constructors are handy at times, and setting actual attributes from those of the template just isn't very robust (what if a parent class adds an attribute? Also, an inheriting class is implicitly burdened by having to override the copy constructor if it adds an attribute). -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From vesko_baev at abv.bg Thu Sep 4 07:19:09 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Thu Sep 4 07:18:14 2003 Subject: [Bioperl-l] comparing two seq Message-ID: <1235175531.1062674349495.JavaMail.nobody@java1.ni.bg> Hi, Which way is the best (and which module is best to use) for comparing two sequences (rna & dna)? thanks a lot friends!
Vesko Plovdiv, BULGARIA From Richard.Adams at ed.ac.uk Thu Sep 4 08:18:22 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Thu Sep 4 08:17:08 2003 Subject: [Bioperl-l] codon useage modules Message-ID: <3F572D8E.5E1D8843@ed.ac.uk> I've committed 2 modules to the CVS using Heikki's suggested namespace: Bio::DB::CUTG.pm is for retrieving species-specific codon usage tables from a web database or from a local file, e.g.:
my $db = Bio::DB::CUTG->new(-sp => 'Pan troglodytes');
$db->get_web_request();
$db->write_data(-file => ">savetolocalfile");
## get a Bio::CodonUsage::Table object:
my $cut = $db->next_data();
Bio::CodonUsage::Table.pm provides methods on an objectified codon usage table, e.g.:
$cut->aafrequency() - frequency of that aa in the organism's proteins
$cut->codon_rel_frequency('CTG') - relative use of a particular codon
$cut->get_coding_gc(1) - GC content of a particular codon position
etc. Test script and data also committed. Any comments welcome! Richard -- Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From heikki at ebi.ac.uk Thu Sep 4 10:32:58 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Thu Sep 4 10:31:45 2003 Subject: [Bioperl-l] codon useage modules In-Reply-To: <3F572D8E.5E1D8843@ed.ac.uk> References: <3F572D8E.5E1D8843@ed.ac.uk> Message-ID: <1062685973.3600.17.camel@bala> Richard, Great to see the modules in so soon! The test script t/DBCUTG.t cannot find the cutg example file. You should run tests like:
perl -w t/DBCUTG.t
and the first argument to catfile needs to be "t":
ok $db->get_local_request(-file=> Bio::Root::IO->catfile("t", "data", "MmCT")), 1;
After this test #13 fails.
Rather than having codon table parser inside Bio::DB::CUTG, I think it would be better to have it in a different module inside Bio::CodonUsage so that Bio::DB::CUTG need to deal only with web access. Do you think you could place the IO code in, e.g, Bio::CodonUsage::IO ? Then we'd have : Bio::CodonUsage::Table - codon usage storage object Bio::CodonUsage::IO - read and write codon tables Bio::DB::CUTG - retrieve codon info from CUTG Yours, -Heikki On Thu, 2003-09-04 at 13:18, Richard Adams wrote: > I've committed 2 modules to the CVS using Heikki's suggested namespace: > > Bio::DB::CUTG.pm is for retrieving species specific codon usage tables > from a web database or from > a local file. > e.g., my $db = Bio::DB::CUTG->new(-sp => 'Pan troglodytes'); > $db->get_web_request(); > $db->write_data(-file ">savetolocalfile") > ## get a Bio::CodonUsage::Table object: > my $cut = $db->next_data(); > > Bio::CodonUsage::Table .pm provides methods on an objectified Codon > usage table. > > e.g., $cut->aafrequency() - frequency of that aa in > organisms proteins > $cut->codon_rel_frequency('CTG') - relative use > of a particular codon > $cut->get_coding_gc(1) - GC content of a > particular codon position > etc., > > Test script and data also committed. > > Any comments welcome! > > Richard > > -- > Dr Richard Adams > Bioinformatician, > Psychiatric Genetics Group, > Medical Genetics, > Molecular Medicine Centre, > Western General Hospital, > Crewe Rd West, > Edinburgh UK > EH4 2XU > > Tel: 44 131 651 1084 > richard.adams@ed.ac.uk > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at ebi.ac.uk Thu Sep 4 10:49:11 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Thu Sep 4 10:48:10 2003 Subject: [Bioperl-l] comparing two seq In-Reply-To: <1235175531.1062674349495.JavaMail.nobody@java1.ni.bg> References: <1235175531.1062674349495.JavaMail.nobody@java1.ni.bg> Message-ID: <1062686950.3587.32.camel@bala> Vesko, Perl is really not language to write CPU intensive applications, so the only sequence comparison implementations we have in bioperl are written in C and are not part of the standard distribution. Look for bioperl-ext cvs module. What we try to do is to have perl wrappers for other programs. Check e.g.the EMBOSS suite at http://www.emboss.org/. The best program we have a wrapper for for rna/dna alignments is Sim4. The wrappers are all in the separate run module in cvs or tar file: http://www.bioperl.org/DIST/current_run_stable.tar.gz Sim4 can be tricky to compile so check if there is a binary for your platform. For example, I've seen an RPM file for Mandrake Linux. -Heikki On Thu, 2003-09-04 at 12:19, Vesko Baev wrote: > Hi, > Which way is the best (and which module is best to use)for comparing two sequnces (rna & dna)? > > thanks a lot friends! > > Vesko > Plovdiv,BULAGRIA > > ----------------------------------------------------------------- > http://nova.GBG.bg - ??????? ?? ????! ?????, ????? ? ?????????????. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From Marc.Logghe at devgen.com Thu Sep 4 10:55:10 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Sep 4 10:53:54 2003 Subject: [Bioperl-l] comparing two seq Message-ID: > -----Original Message----- > From: Heikki Lehvaslaiho [mailto:heikki@ebi.ac.uk] > Sent: Thursday, September 04, 2003 4:49 PM > To: Vesko Baev > Cc: Bioperl > Subject: Re: [Bioperl-l] comparing two seq > > > Vesko, > > Perl is really not language to write CPU intensive > applications, so the > only sequence comparison implementations we have in bioperl > are written > in C and are not part of the standard distribution. Look for > bioperl-ext > cvs module. > > What we try to do is to have perl wrappers for other programs. Check > e.g.the EMBOSS suite at http://www.emboss.org/. The best > program we have > a wrapper for for rna/dna alignments is Sim4. The wrappers are all in > the separate run module in cvs or tar file: > http://www.bioperl.org/DIST/current_run_stable.tar.gz > > Sim4 can be tricky to compile so check if there is a binary for your > platform. For example, I've seen an RPM file for Mandrake Linux. Also for RH and Suse on http://www.biolinux.org/sim4.html ML
From vesko_baev at abv.bg Thu Sep 4 11:56:23 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Thu Sep 4 11:55:21 2003 Subject: [Bioperl-l] retrieving seq from miltifasta files Message-ID: <1600227422.1062690983227.JavaMail.nobody@java1.ni.bg> Hello, I'm a new user of Bioperl. For several hours I've been reading the POD for indexing multiple fasta files (Bio::Index::Fasta) but I don't get it. For example, I have one file (database) 'elegans.fa' which contains:
>imkl2
cgatcgatcgac
>imk13
gtattacgagcacac
>imk14
.......
And I want to retrieve the 'imk13' sequence as a Seq object. How do I do it? I looked in Fasta.pm and there is a script, but I don't know how to make it work.
# Complete code for making an index for several
# fasta files
use Bio::Index::Fasta;
use strict;
my $Index_File_Name = shift;
my $inx = Bio::Index::Fasta->new('-filename' => $Index_File_Name, '-write_flag' => 1);
$inx->make_index(@ARGV);

# Print out several sequences present in the index
# in Fasta format (a second, separate script)
use Bio::Index::Fasta;
use Bio::SeqIO;
use strict;
my $Index_File_Name = shift;
my $inx = Bio::Index::Fasta->new('-filename' => $Index_File_Name);
my $out = Bio::SeqIO->new('-format' => 'Fasta', '-fh' => \*STDOUT);
foreach my $id (@ARGV) {
    my $seq = $inx->fetch($id); # Returns Bio::Seq object
    # or, alternatively: my $seq = $inx->get_Seq_by_id($id); # identical to fetch
    $out->write_seq($seq);
}

THANKS!!!! BioFRIENDS Vesko Plovdiv, Bulgaria
From Lobvi.Matamoros at crchul.ulaval.ca Thu Sep 4 13:58:16 2003 From: Lobvi.Matamoros at crchul.ulaval.ca (Lobvi Matamoros) Date: Thu Sep 4 11:57:03 2003 Subject: [Bioperl-l] script Message-ID: <4.2.0.58.20030904115614.00a43708@drs.crchul.ulaval.ca> Hi: Does anyone have a script to count how many proteins there are in a database/file in FASTA format? Thanks in advance for your help Lobvi Lobvi Matamoros Fernández, Ph.D Post-doctoral fellow Centre de Recherche du CHUL 2705 Boul. Laurier, T3-80 Sainte-Foy (Québec) G1V 4G2 CANADA Tel: 418-6542261 FAX:418-654-2279
From birney at ebi.ac.uk Thu Sep 4 12:06:57 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Sep 4 12:05:49 2003 Subject: [Bioperl-l] script In-Reply-To: <4.2.0.58.20030904115614.00a43708@drs.crchul.ulaval.ca> Message-ID: On Thu, 4 Sep 2003, Lobvi Matamoros wrote: > > Hi: > > Does any one have an script to count how many proteins do you have in a > database/file in FASTA format grep -c '>' filename > > Thanks in advance for your help > > Lobvi > > Lobvi Matamoros Fernández, Ph.D > Post-doctoral fellow > > Centre de Recherche du CHUL > 2705 Boul.
Laurier, T3-80 > Sainte-Foy (Qu?bec) > G1V 4G2 CANADA > Tel: 418-6542261 > FAX:418-654-2279 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From james.wasmuth at ed.ac.uk Thu Sep 4 12:12:21 2003 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Thu Sep 4 12:10:09 2003 Subject: [Bioperl-l] script References: <4.2.0.58.20030904115614.00a43708@drs.crchul.ulaval.ca> Message-ID: <3F576465.7040602@ed.ac.uk> If its a standard FASTA format file, then at the command line prompt type: grep ">" file.fa | wc -l hth james Lobvi Matamoros wrote: > > Hi: > > Does any one have an script to count how many proteins do you have in > a database/file in FASTA format > > Thanks in advance for your help > > Lobvi > > Lobvi Matamoros Fern?ndez, Ph.D > Post-doctoral fellow > > Centre de Recherche du CHUL > 2705 Boul. Laurier, T3-80 > Sainte-Foy (Qu?bec) > G1V 4G2 CANADA > Tel: 418-6542261 > FAX:418-654-2279 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Nematode Bioinformatics Blaxter Nematode Genomics Group Institute of Cell, Animal and Population Biology Ashworth Labs University of Edinburgh King's Buildings Edinburgh EH9 3JT UK (+44)(0)131 650 7403 From Luc.Gauthier at aventis.com Thu Sep 4 12:12:50 2003 From: Luc.Gauthier at aventis.com (Luc.Gauthier@aventis.com) Date: Thu Sep 4 12:11:42 2003 Subject: [Bioperl-l] Bio::SearchIO::sim4 parser Message-ID: <6FA8B454A1DF1E4A97F0A48DCC324EAB819C9D@crbsmxsusr05.pharma.aventis.com> Hi everybody, I am currently writing modules that find great benefit in using the Bio::SearchIO system. 
A couple of days ago, I ran into a problem. I am using BioPerl 1.2.2, which does not contain a parser for sim4 output. So I started to use Bio::Tools::Sim4::Result from Ewan Birney and Hilmar Lapp. Great module, but it kind of "broke" the generic way of coding I was following with Bio::SearchIO.

Then I found out that a sim4 parser existed in the "live" version of BioPerl. No problem for me, I have control over the libraries I use, so I downloaded the code from CVS and started using this new version. Unfortunately, it again did not work as I was expecting. It seems to treat hits as distinct results, so it could not fit in the system I was developing.

So I wrote a new parser, based on that Bio::SearchIO::sim4 module and on Bio::Tools::Sim4::Result too. Actually, I only modified some methods, mainly 'next_result'. In the end, I get a parser that works great for me. It can parse 0/1/3/4 outputs (like the existing modules) and gets maximum data, including sequences and homology string, along with gaps and everything. Above all, each alignment is now treated as a hit.

I would greatly appreciate contributing to the project, if it could be of interest to anyone! I must also admit that it would ease future maintenance for me if it became part of the standard distribution! ;)

Well, would someone want to see the code? How do you proceed when you want to contribute to BioPerl?

Thank you and have a good day!

Luc Gauthier

From ak at ebi.ac.uk Thu Sep 4 12:33:07 2003
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Thu Sep 4 12:31:55 2003
Subject: [Bioperl-l] script
In-Reply-To: <3F576465.7040602@ed.ac.uk>
References: <4.2.0.58.20030904115614.00a43708@drs.crchul.ulaval.ca> <3F576465.7040602@ed.ac.uk>
Message-ID: <20030904163307.GA16670@ebi.ac.uk>

If it's not a Unix system, this [untested] Perl snippet will do
approximately the same thing:

$/ = "\n>";
$count = 0;

open(IN, "file.fa") or die;
while (<IN>) { $count++ }
close(IN);

print "No. of seqs: ", $count, "\n";


On Thu, Sep 04, 2003 at 05:12:21PM +0100, James Wasmuth wrote:
> If its a standard FASTA format file, then at the command line prompt type:
>
> grep ">" file.fa | wc -l
>
> hth
> james
>
> Lobvi Matamoros wrote:
>
> >
> >Hi:
> >
> >Does any one have an script to count how many proteins do you have in
> >a database/file in FASTA format
> >
> >Thanks in advance for your help
[cut]

--
a n d r e     ( Andreas Kähäri                          )  0 1 0 0 0
a s . k a     ) EMBL, European Bioinformatics Institute (  1 0 0 0 1
h a r i @     ( Wellcome Trust Genome Campus, Hinxton   )  0 0 1 1 1
e b i . a     ) Cambridge, CB10 1SD                     (  0 0 1 0 0
c . u k       ( United Kingdom                          )  0 0 0 1

From james.wasmuth at ed.ac.uk Thu Sep 4 12:45:05 2003
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Thu Sep 4 12:42:53 2003
Subject: [Bioperl-l] script
References: <4.2.0.58.20030904115614.00a43708@drs.crchul.ulaval.ca> <3F576465.7040602@ed.ac.uk> <20030904163307.GA16670@ebi.ac.uk>
Message-ID: <3F576C11.9050801@ed.ac.uk>

Sorry to be anally retentive, but just in case it is on a Windows machine, and the fasta file has funny carriage return formatting which may not be picked up with '\n', use:

open IN, "<file.fa";
while (<IN>) { $lines .= $_; }
$count++ while ($lines =~ m/^>/mg);   # /m lets ^ match at the start of every line
print "No. of seqs: ", $count, "\n";

I think all bases have been covered now, except loading it into BioSeqIO... :-p

james

Andreas Kahari wrote:

>If it's not a Unix system, this [untested] Perl snippet will do
>approximately the same thing:
>
>$/ = "\n>";
>$count = 0;
>
>open(IN, "file.fa") or die;
>while (<IN>) { $count++ }
>close(IN);
>
>print "No. of seqs: ", $count, "\n";
>
>
>On Thu, Sep 04, 2003 at 05:12:21PM +0100, James Wasmuth wrote:
>
>
>>If its a standard FASTA format file, then at the command line prompt type:
>>
>>grep ">" file.fa | wc -l
>>
>>hth
>>james
>>
>>Lobvi Matamoros wrote:
>>
>>
>>
>>>Hi:
>>>
>>>Does any one have an script to count how many proteins do you have in
>>>a database/file in FASTA format
>>>
>>>Thanks in advance for your help
>>>
>>>
>[cut]
>
>
>

--
Nematode Bioinformatics
Blaxter Nematode Genomics Group
Institute of Cell, Animal and Population Biology
Ashworth Labs
University of Edinburgh
King's Buildings
Edinburgh
EH9 3JT
UK

(+44)(0)131 650 7403

From brian_osborne at cognia.com Thu Sep 4 12:40:58 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Sep 4 12:44:01 2003
Subject: [Bioperl-l] script
In-Reply-To: <4.2.0.58.20030904115614.00a43708@drs.crchul.ulaval.ca>
Message-ID:

Lobvi,

If you really mean "proteins" and for some reason there may be both protein and nucleotide sequences in your file then something like:

~>perl -e 'use Bio::SeqIO; $in = Bio::SeqIO->new(-file => $ARGV[0]); while ($seq = $in->next_seq ){ $count++ if ($seq->alphabet eq "protein") } print $count;' test.fa

Otherwise, the grep.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Lobvi Matamoros
Sent: Thursday, September 04, 2003 1:58 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] script

Hi:

Does any one have an script to count how many proteins do you have in a
database/file in FASTA format

Thanks in advance for your help

Lobvi

Lobvi Matamoros Fernández, Ph.D
Post-doctoral fellow

Centre de Recherche du CHUL
2705 Boul. Laurier, T3-80
Sainte-Foy (Québec)
G1V 4G2 CANADA
Tel: 418-6542261
FAX:418-654-2279

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From Lobvi.Matamoros at crchul.ulaval.ca Thu Sep 4 15:43:36 2003
From: Lobvi.Matamoros at crchul.ulaval.ca (Lobvi Matamoros)
Date: Thu Sep 4 13:42:22 2003
Subject: [Bioperl-l] script for Win32
Message-ID: <4.2.0.58.20030904134056.00a58738@drs.crchul.ulaval.ca>

Hi:

Does any one have an script (Win32) to count how many proteins do you have in a
database/file in FASTA format

Thanks in advance for your help

Lobvi

Lobvi Matamoros Fernández, Ph.D
Post-doctoral fellow

Centre de Recherche du CHUL
2705 Boul. Laurier, T3-80
Sainte-Foy (Québec)
G1V 4G2 CANADA
Tel: 418-6542261
FAX:418-654-2279

From james.wasmuth at ed.ac.uk Thu Sep 4 14:11:57 2003
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Thu Sep 4 14:09:48 2003
Subject: [Bioperl-l] Running Blast and Hiding Error Message
Message-ID: <3F57806D.7050904@ed.ac.uk>

Dear All,

could someone enlighten me as to whether there is a way I can pipe off an error / warning message printed by StandAlone blast. I know how to do this usually, but am unsure if I can when I launch Blast using Bioperl.

cheers
james

--
Blaxter Nematode Genomics Group
Institute of Cell, Animal and Population Biology
Ashworth Labs
University of Edinburgh
King's Buildings
Edinburgh
EH9 3JT
UK

(+44)(0)131 650 7403

From Richard.Holland at agresearch.co.nz Thu Sep 4 17:10:14 2003
From: Richard.Holland at agresearch.co.nz (Holland, Richard)
Date: Thu Sep 4 17:08:59 2003
Subject: [Bioperl-l] Blastx parser misses scores
Message-ID:

Hi,

I have run into a problem with Bio::SearchIO::blast parsing blastx result files. This may affect other blast outputs as well but I'm not sure.

At the top of a blastx output there is a summary of the best hits in the results file.
Then, all the hits are listed, even the ones which are not in the best hits list.

The Bio::Perl parser successfully parses all the hits from the file; however, it only returns scores for those which appear in the summary. I have found the code which does this in Bio::SearchIO::blast and noticed that this seems to be deliberate - in all cases, blastx or not, the scores are taken from the summary, and the scores in the hit details appear to be ignored.

Is this a feature or a bug? We would like to be able to use Bio::Perl to parse out all the results from our blast reports including all their scores and details, regardless of whether or not they appear in the best hits summary.

Can anyone help?

cheers,
Richard

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From jason at cgt.duhs.duke.edu Thu Sep 4 17:38:31 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Sep 4 17:37:02 2003
Subject: [Bioperl-l] Blastx parser misses scores
In-Reply-To:
References:
Message-ID:

Can you please provide an example report and code which doesn't behave as you would expect.

Are you talking about the case where you have 50 hits listed in the summary but say only 25 HSP alignments?

On Fri, 5 Sep 2003, Holland, Richard wrote:

> Hi,
>
> I have run into a problem with Bio::SearchIO::blast parsing blastx
> result files. This may affect other blast outputs as well but I'm not
> sure.
>
> At the top of a blastx output there is a summary of the best hits in the
> results file. Then, all the hits are listed, even the ones which are not
> in the best hits list.
>
> The Bio::Perl parser successfully parses all the hits from the file,
> however it only returns scores for those which appear in the summary. I
> have found the code which does this in Bio::SearchIO::blast and noticed
> that this seems to be deliberate - in all cases, blastx or not, the
> scores are taken from the summary, and the scores in the hit details
> appear to be ignored.
>
> Is this a feature or a bug? We would like to be able to use Bio::Perl to
> parse out all the results from our blast reports including all their
> scores and details, regardless of whether or not they appear in the best
> hits summary.
>
> Can anyone help?
>
> cheers,
> Richard
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From Richard.Holland at agresearch.co.nz Thu Sep 4 18:14:23 2003
From: Richard.Holland at agresearch.co.nz (Holland, Richard)
Date: Thu Sep 4 18:13:25 2003
Subject: [Bioperl-l] Blastx parser misses scores
Message-ID:

> Are you talking about the case where you have 50 hits listed in the
> summary but say only 25 HSP alignments?

Not sure. There are 10 hits listed in the summary and 18 detailed below it. We only get scores reported by the parser for the 10 in the summary.

> Can you please provide an example report and code which doesn't
> behave as you would expect.

The blast report in question is at the end of this email.

Our code follows:

===========

my $blastin = Bio::SearchIO->new(-fh=>$fileRef,-format=>"blast");

while (1) {
    my $result = $blastin->next_result;
    if (not $result) { last; }

    my $QueryID = $result->query_name;
    my $QueryLength = $result->query_length;

    while(my $hit = $result->next_hit()) {
        my $hitid = $hit->name;
        my $score = $hit->raw_score;
        my $description = $hit->name . " " . $hit->description;

        while (my $hsp = $hit->next_hsp) {
            my $expectation = $hsp->evalue;
            my $frame = ($hsp->query->frame + 1) * $hsp->query->strand;
            my $strand = $hsp->strand;
            my $hitlength = $hit->length;
            my $identities = $hsp->num_identical;
            my $overlaps = $hsp->length('total');
            my $gaps = $hsp->gaps;
            my $qstart = $hsp->start('query');
            my $qstop = $hsp->end('query');
            my $hstart = $hsp->start('hit');
            my $hstop = $hsp->end('hit');
            my $positives = $hsp->num_conserved;

            # Truncated - code goes here that processes the results
        }
    }
}

===========

The blast report looks like this. In the code above, all scores ($hit->raw_score) for hits ">SW:SSRP_DROME Q05344 drosophila melanogaster (fruit fly).
single-strand recognition" onwards come out as null:

===========

BLASTX 2.2.4 [Aug-26-2002]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 010404CS0701000001
         (668 letters)

Database: /home/seqstore/ncbi/blast/data/swplus
           954,989 sequences; 303,757,025 total letters

Searching..................................................done

                                                                    Score     E
Sequences producing significant alignments:                         (bits)  Value

SP_PL:O04235 O04235 vicia faba (broad bean). transcription facto...   358   3e-98
SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (mada...   313   9e-85
SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). str...   309   1e-83
SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). ...   306   1e-82
SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early ...   306   1e-82
SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002          301   3e-81
SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87....   120   9e-27
SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure s...   115   5e-25
SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific re...   114   6e-25
SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific re...   108   5e-23

>SP_PL:O04235 O04235 vicia faba (broad bean). transcription factor.
10/2002 Length = 642 Score = 358 bits (919), Expect = 3e-98 Identities = 172/194 (88%), Positives = 184/194 (94%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 MTDGHLFNNITLG RGGTNPGQIKI+SGGILWKRQGGGK+I+VDK DI+ VTWMKVP++N Sbjct: 1 MTDGHLFNNITLGXRGGTNPGQIKIYSGGILWKRQGGGKTIDVDKTDIMGVTWMKVPKTN 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 QLGVQIKDGL YKFTGFRDQDV+SLTNFFQNTFGI V+EKQLSV+GRNWG+VDLNGNMLA Sbjct: 61 QLGVQIKDGLLYKFTGFRDQDVVSLTNFFQNTFGITVEEKQLSVTGRNWGEVDLNGNMLA 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV 620 FMVGSKQAFEV LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLME+SFHIP+SNTQFV Sbjct: 121 FMVGSKQAFEVSLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEMSFHIPSSNTQFV 180 Query: 621 GDENTPPXQVFRXK 662 GDEN P QVFR K Sbjct: 181 GDENRPSAQVFRDK 194 >SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (madagascar periwinkle). structure-specific recognition protein 1 homolog (hmg protein). 9/2003 Length = 639 Score = 313 bits (802), Expect = 9e-85 Identities = 153/194 (78%), Positives = 174/194 (88%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 M DGHLFNNITLGGRGGTNPGQ+++ SGGILWK+QGG K++EVDK+D+V +TWMKVPRSN Sbjct: 1 MADGHLFNNITLGGRGGTNPGQLRVHSGGILWKKQGGAKAVEVDKSDMVGLTWMKVPRSN 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 QLGV+IKDGLFYKFTGFRDQDV SLT++ Q+T GI +EKQLSVSG+NWG+VDLNGNML Sbjct: 61 QLGVRIKDGLFYKFTGFRDQDVASLTSYLQSTCGITPEEKQLSVSGKNWGEVDLNGNMLT 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEF/MWMTQLEPM\EKDSLMEISFHIPNSNTQ 614 F+VGSKQAFEV LADV+QT LQGKNDV+LEF MWM LE M K+SLMEISFH+PNSNTQ Sbjct: 121 FLVGSKQAFEVSLADVAQTQLQGKNDVMLEF MWMILLEQM RKNSLMEISFHVPNSNTQ 178 Query: 615 FVGDENTPPXQVFRXK 662 FVGDEN PP QVFR K Sbjct: 179 FVGDENRPPAQVFRDK 194 >SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). structure-specific recognition protein 1 homolog (hmg protein). 
9/2003 Length = 646 Score = 309 bits (792), Expect = 1e-83 Identities = 148/191 (77%), Positives = 167/191 (86%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 M DGH FNNI+L GRGG NPG +KI SGGI WK+QGGGK++EVD++DIVSV+W KV +SN Sbjct: 1 MADGHSFNNISLSGRGGKNPGLLKINSGGIQWKKQGGGKAVEVDRSDIVSVSWTKVTKSN 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 QLGV+ KDGL+YKF GFRDQDV SL++FFQ+++G EKQLSVSGRNWG+VDL+GN L Sbjct: 61 QLGVKTKDGLYYKFVGFRDQDVPSLSSFFQSSYGKTPDEKQLSVSGRNWGEVDLHGNTLT 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV 620 F+VGSKQAFEV LADVSQT LQGKNDV LEFHVDDT GANEKDSLMEISFHIPNSNTQFV Sbjct: 121 FLVGSKQAFEVSLADVSQTQLQGKNDVTLEFHVDDTAGANEKDSLMEISFHIPNSNTQFV 180 Query: 621 GDENTPPXQVF 653 GDEN PP QVF Sbjct: 181 GDENRPPSQVF 191 >SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). 10/2002 Length = 641 Score = 306 bits (784), Expect = 1e-82 Identities = 141/190 (74%), Positives = 164/190 (86%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+ Sbjct: 1 MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 QLGV+ KDGLFYKF GFR+QDV SLTNF Q G++ EKQLSVSG+NWG +D+NGNML Sbjct: 61 QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV 620 FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+ Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL 180 Query: 621 GDENTPPXQV 650 GDEN QV Sbjct: 181 GDENRTAAQV 190 >SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early drought induced protein. 
3/2003 Length = 641 Score = 306 bits (784), Expect = 1e-82 Identities = 141/190 (74%), Positives = 164/190 (86%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+ Sbjct: 1 MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 QLGV+ KDGLFYKF GFR+QDV SLTNF Q G++ EKQLSVSG+NWG +D+NGNML Sbjct: 61 QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV 620 FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+ Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL 180 Query: 621 GDENTPPXQV 650 GDEN QV Sbjct: 181 GDENRTAAQV 190 >SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002 Length = 639 Score = 301 bits (772), Expect = 3e-81 Identities = 138/190 (72%), Positives = 162/190 (84%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 MTDGH FNNI LGGRGGTNPGQ K+ SGG+ WKRQGGGK+IE+DKAD+ +VTWMKVPR+ Sbjct: 1 MTDGHHFNNILLGGRGGTNPGQFKVHSGGLAWKRQGGGKTIEIDKADVTAVTWMKVPRAY 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 QLGV+IK GLFY+F GFR+QDV +LTNF Q G+ EKQLSVSG+NWG +D++GNML Sbjct: 61 QLGVRIKAGLFYRFIGFREQDVSNLTNFIQKNMGVTPDEKQLSVSGQNWGGIDIDGNMLT 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV 620 FMVGSKQAFEV L DV+QT +QGK DV+LE HVDDTTGANEKDSLM++SFH+P SNTQFV Sbjct: 121 FMVGSKQAFEVSLPDVAQTQMQGKTDVLLELHVDDTTGANEKDSLMDLSFHVPTSNTQFV 180 Query: 621 GDENTPPXQV 650 GDE+ PP + Sbjct: 181 GDESRPPAHI 190 >SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87. 
10/2002 Length = 693 Score = 120 bits (302), Expect = 9e-27 Identities = 64/173 (36%), Positives = 100/173 (56%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 M D FN+I +G N G++++ G+++K GK + ADI V W +V + Sbjct: 1 MADTLEFNDIYQEVKGSMNDGRLRLSRAGLMYKNNKTGKVENISAADIAEVVWRRVALGH 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 + + G YK+ GFR+ + L ++F++ F + + EK L V G NWG V G +L+ Sbjct: 61 GIKLLTNGGHVYKYDGFRETEYDKLFDYFKSHFSVELVEKDLCVKGWNWGSVRFGGQLLS 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 F +G + AFE+PL++VSQ GKN+V LEFH +D + + SLMEI F++P Sbjct: 121 FDIGDQPAFELPLSNVSQCT-TGKNEVTLEFHQND----DSEVSLMEIRFYVP 168 >SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure specific recognition protein 1. 3/2003 Length = 711 Score = 115 bits (287), Expect = 5e-25 Identities = 59/167 (35%), Positives = 97/167 (57%) Frame = +3 Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI 278 FN+I +G N G++++ GI++K GK + ++ W +V + L + Sbjct: 7 FNDIFQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT 66 Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK 458 K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G + Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ 126 Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 FE+PL++VSQ GKN+V LEFH +D + + SLME+ F++P Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168 >SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific recognition protein 1 (ssrp1) (recombination signal sequence recognition protein) (t160) (chromatin-specific transcription elongation factor 80 kda subunit) (fact 80 kda subunit). 
9/2003 Length = 709 Score = 114 bits (286), Expect = 6e-25 Identities = 58/167 (34%), Positives = 97/167 (57%) Frame = +3 Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI 278 FN++ +G N G++++ GI++K GK + ++ W +V + L + Sbjct: 7 FNDVYQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT 66 Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK 458 K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G + Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ 126 Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 FE+PL++VSQ GKN+V LEFH +D + + SLME+ F++P Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168 >SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific recognition protein 1 (ssrp1) (recombination signal sequence recognition protein) (t160). 9/2003 Length = 708 Score = 108 bits (270), Expect = 5e-23 Identities = 56/167 (33%), Positives = 95/167 (56%) Frame = +3 Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI 278 FN+I +G N G++++ GI++K GK + ++ W +V + L + Sbjct: 7 FNDIFQEVKGSMNDGRLRLSPSGIIFKNSKTGKVDNIQAGELTEGIWPRVALGHGLKLLT 66 Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK 458 K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G + Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ 126 Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 FE+PL++VS Q + +V LEFH +D + + SLME+ F++P Sbjct: 127 PVFEIPLSNVSSVP-QARIEVTLEFHQND----DPEVSLMEVRFYVP 168 >SW:SSRP_DROME Q05344 drosophila melanogaster (fruit fly). single-strand recognition protein (ssrp) (chorion-factor 5). 
9/2003 Length = 723 Score = 101 bits (251), Expect = 7e-21 Identities = 63/173 (36%), Positives = 92/173 (52%) Frame = +3 Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN 260 MTD +N+I RG G++K+ I++K GK ++ DI + K + Sbjct: 1 MTDSLEYNDINAEVRGVLCSGRLKMTEQNIIFKNTKTGKVEQISAEDIDLINSQKFVGTW 60 Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA 440 L V K G+ ++FTGFRD + L F + + + EK++ V G NWG G++L+ Sbjct: 61 GLRVFTKGGVLHRFTGFRDSEHEKLGKFIKAAYSQEMVEKEMCVKGWNWGTARFMGSVLS 120 Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 F SK FEVPL+ VSQ + GKN+V LEFH +D L+E+ FHIP Sbjct: 121 FDKESKTIFEVPLSHVSQC-VTGKNEVTLEFHQNDDAPV----GLLEMRFHIP 168 >SP_FUN:O94529 O94529 schizosaccharomyces pombe (fission yeast). putative structure specific recognition protein. 3/2003 Length = 512 Score = 96.7 bits (239), Expect = 2e-19 Identities = 48/161 (29%), Positives = 86/161 (52%), Gaps = 2/161 (1%) Frame = +3 Query: 138 PGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRD 317 PG+++I G+ WK + + ++I W + R +L + +K GF Sbjct: 19 PGKLRIAPSGLGWKSPSLAEPFTLPISEIRRFCWSRFARGYELKIILKSKDPVSLDGFSQ 78 Query: 318 QDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQT 497 +D+ L N + F + +++K+ S+ G NWG+ + G+ L F V S+ AFE+P++ V+ T Sbjct: 79 EDLDDLINVIKQNFDMGIEQKEFSIKGWNWGEANFLGSELVFDVNSRPAFEIPISAVTNT 138 Query: 498 NLQGKNDVILEFHV--DDTTGANEKDSLMEISFHIPNSNTQ 614 NL GKN+V LEF D + + D L+E+ ++P + + Sbjct: 139 NLSGKNEVALEFSTTDDKQIPSAQVDELVEMRLYVPGTTAK 179 >SW:SSRP_CHICK Q04678 gallus gallus (chicken). structure-specific recognition protein 1 (ssrp1) (recombination signal sequence recognition protein) (t160) (fragment). 
9/2003 Length = 669 Score = 95.9 bits (237), Expect = 3e-19 Identities = 48/131 (36%), Positives = 79/131 (59%) Frame = +3 Query: 207 VDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQL 386 + +++ W +V + L + K+G YK+ GFR+ + L++FF+ + + + EK L Sbjct: 5 IQASELAEGVWRRVALGHGLKLLTKNGHVYKYDGFRESEFDKLSDFFKAHYRLELAEKDL 64 Query: 387 SVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEK 566 V G NWG V G +L+F +G + FE+PL++VSQ GKN+V LEFH +D + + Sbjct: 65 CVKGWNWGTVRFGGQLLSFDIGEQPVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAE 119 Query: 567 DSLMEISFHIP 599 SLME+ F++P Sbjct: 120 VSLMEVRFYVP 130 >SP_IN:Q8IL56 Q8il56 plasmodium falciparum (isolate 3d7). structure specific recognition protein, putative. 3/2003 Length = 506 Score = 94.0 bits (232), Expect = 1e-18 Identities = 50/170 (29%), Positives = 89/170 (51%), Gaps = 5/170 (2%) Frame = +3 Query: 120 GRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN-----QLGVQIKD 284 G GG++ G ++ + + WK + + +DI W+K +N +LG + K+ Sbjct: 21 GFGGSDFGSFRMSNEFLGWKNKKTNNVYQYKCSDIDEGCWIKTSYNNNRLHLKLG-ESKE 79 Query: 285 GLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQA 464 + F GF D++V +T FQ F I + ++++ G NWG+ L + L F + +K A Sbjct: 80 NIIIYFDGFPDRNVNEITQHFQKYFNIRLNNRKIATKGWNWGEFKLENSNLCFDIDNKYA 139 Query: 465 FEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQ 614 F +P +++Q N+Q K D+ +EF D+ +D L EI F+ P+ N + Sbjct: 140 FNLPTNNINQLNVQIKTDIAMEFKNDENNNKGNEDFLAEIRFYYPHENDE 189 >SW:SSRP_CAEEL P41848 caenorhabditis elegans. probable structure-specific recognition protein 1 (ssrp1) (recombination signal sequence recognition protein). 
9/2003 Length = 697 Score = 92.0 bits (227), Expect = 4e-18 Identities = 48/153 (31%), Positives = 82/153 (53%) Frame = +3 Query: 141 GQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQ 320 G +K+ + +K GGKS+ V +DI + W K+ L V + DG ++F GF+D Sbjct: 20 GTLKLTEKSLNFKGDKGGKSVNVTGSDIDKLKWQKLGNKPGLRVGLNDGGAHRFGGFKDT 79 Query: 321 DVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTN 500 D+ + +F + + ++ + L + G N+G ++ G + F K FE+P +VS Sbjct: 80 DLEKIQSFTSSNWSQSIDQSNLFIKGWNYGQAEVKGKTVEFSWEDKPIFEIPCTNVSNV- 138 Query: 501 LQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 + KN+ +LEFH +D + K LME+ FH+P Sbjct: 139 IANKNEAVLEFHQND----DSKVQLMEMRFHMP 167 >SW:YMG9_YEAST Q04636 saccharomyces cerevisiae (baker's yeast). hypothetical 63.0 kda protein in dak1-orc1 intergenic region. 5/2000 Length = 552 Score = 89.0 bits (219), Expect = 4e-17 Identities = 50/161 (31%), Positives = 80/161 (49%), Gaps = 8/161 (4%) Frame = +3 Query: 141 GQIKIFSGGILWK--RQGGGKSIEVDK------ADIVSVTWMKVPRSNQLGVQIKDGLFY 296 G+ +I G+ WK GG + + K ++ +V W + R L + K+ Sbjct: 17 GRFRIADSGLGWKISTSGGSAANQARKPFLLPATELSTVQWSRGCRGYDLKINTKNQGVI 76 Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP 476 + GF D + N F F I V++++ S+ G NWG DL N + F + K FE+P Sbjct: 77 QLDGFSQDDYNLIKNDFHRRFNIQVEQREHSLRGWNWGKTDLARNEMVFALNGKPTFEIP 136 Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 A ++ TNL KN+V +EF++ D D L+E+ F+IP Sbjct: 137 YARINNTNLTSKNEVGIEFNIQDEEYQPAGDELVEMRFYIP 177 >SP_IN:O01683 O01683 caenorhabditis elegans. c32f10.5 protein. 
3/2003 Length = 689 Score = 86.7 bits (213), Expect = 2e-16 Identities = 50/186 (26%), Positives = 90/186 (47%) Frame = +3 Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI 278 F + + G G + + I + GGKS+ + D+ + W K+ L V + Sbjct: 6 FKGVYVEDIGHLTCGTLTLTENSINFIGDKGGKSVYITGTDVDKLKWQKLGNKPGLRVGL 65 Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK 458 DG ++F GF D D+ + +F + + ++ + L ++G N+G D+ G + F ++ Sbjct: 66 SDGGAHRFGGFLDDDLQKIQSFTSSNWSKSINQSNLFINGWNYGQADVKGKNIEFSWENE 125 Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFVGDENTP 638 FE+P +VS + KN+ ILEFH ++ K LME+ FH+P +E+T Sbjct: 126 PIFEIPCTNVSNV-IANKNEAILEFHQNE----QSKVQLMEMRFHMP---VDLENEEDTD 177 Query: 639 PXQVFR 656 + F+ Sbjct: 178 KVEEFK 183 >SP_FUN:Q9HFC4 Q9hfc4 zygosaccharomyces rouxii (candida mogii). ssrp1-like protein (fragment). 10/2002 Length = 542 Score = 85.1 bits (209), Expect = 5e-16 Identities = 48/165 (29%), Positives = 79/165 (47%), Gaps = 8/165 (4%) Frame = +3 Query: 141 GQIKIFSGGILWKRQGGGKSIE--------VDKADIVSVTWMKVPRSNQLGVQIKDGLFY 296 G+ +I G+ WK G S + ++ +V W + R +L V K+ Sbjct: 45 GRFRIADSGLGWKSANAGGSAANQSKQPFLLPATELSTVQWSRGCRGFELKVNTKNQGVV 104 Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP 476 + GF D + N F F + V+ K+ S+ G NWG DL N + F + + +FEVP Sbjct: 105 QLDGFAPDDFNLIKNDFHRRFNVQVEPKEHSLRGWNWGKADLARNEMVFALNGRPSFEVP 164 Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNT 611 A ++ TNL K +V +EF++ D D L+E+ ++P + T Sbjct: 165 YARINNTNLTSKTEVAIEFNLADENYQPAGDELVEMRLYVPGTVT 209 Database: /home/seqstore/ncbi/blast/data/swplus Posted date: Apr 15, 2003 12:04 PM Number of letters in database: 303,757,025 Number of sequences in database: 954,989 Lambda K H 0.318 0.135 0.401 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 385,793,622 Number of Sequences: 954989 Number of extensions: 8541745 Number of successful extensions: 21678 Number of sequences better 
than 1.0e-06: 36 Number of HSP's better than 0.0 without gapping: 21171 Number of HSP's successfully gapped in prelim test: 0 Number of HSP's that attempted gapping in prelim test: 0 Number of HSP's gapped (non-prelim): 21664 length of database: 303,757,025 effective HSP length: 116 effective length of database: 192,978,301 effective search space used: 20455699906 frameshift window, decay const: 50, 0.1 T: 12 A: 40 X1: 16 ( 7.3 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 41 (21.7 bits) =========== Richard Holland Bioinformatics Database Developer ITS, Agresearch Invermay x3279 -----Original Message----- From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] Sent: Friday, 5 September 2003 9:39 a.m. To: Holland, Richard Cc: bioperl-l@bioperl.org; McCulloch, Alan Subject: Re: [Bioperl-l] Blastx parser misses scores Can you please provide and example report and code which doesn't behave as you would expect. Are you talking about the case where you have 50 hits listed in the summary but say only 25 HSP alignments? On Fri, 5 Sep 2003, Holland, Richard wrote: > Hi, > > I have run into a problem with Bio::SearchIO::blast parsing blastx > result files. This may affect other blast outputs as well but I'm not > sure. > > At the top of a blastx output there is a summary of the best hits in > the results file. Then, all the hits are listed, even the ones which > are not in the best hits list. > > The Bio::Perl parser successfully parses all the hits from the file, > however it only returns scores for those which appear in the summary. > I have found the code which does this in Bio::SearchIO::blast and > noticed that this seems to be deliberate - in all cases, blastx or > not, the scores are taken from the summary, and the scores in the hit > details appear to be ignored. > > Is this a feature or a bug? 
We would like to be able to use Bio::Perl > to parse out all the results from our blast reports including all > their scores and details, regardless of whether or not they appear in > the best hits summary. > > Can anyone help? > > cheers, > Richard > ====================================================================== > = > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From jason at cgt.duhs.duke.edu Thu Sep 4 21:49:06 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Sep 4 21:47:36 2003 Subject: [Bioperl-l] Blastx parser misses scores In-Reply-To: References: Message-ID: The Hit summary is where the score comes from; if there isn't a listing for the hit you are interested in, we can't really report it - the parser is just providing an object representation of what is in the input file. You can get the HSP bit score, z score, evalue from these fields $hsp->bits, $hsp->score, $hsp->evalue. You can get more hit scores reported by changing your blast parameters so it will report more hits in the summary (-v parameter) if you want to see these summary values. There were just recently some messages on this list talking about how the summary scores are computed in case you wanted to think about constructing them yourself. -jason On Fri, 5 Sep 2003, Holland, Richard wrote: > > Are you talking about the case where you have 50 hits listed in the > summary but say only 25 HSP alignments? > > Not sure. There are 10 hits listed in the summary and 18 detailed below > it. We only get scores reported by the parser for the 10 in the summary. > > > Can you please provide an example report and code which doesn't > behave as you would expect. > > The blast report in question is at the end of this email. > > Our code follows: > > =========== > > my $blastin = > Bio::SearchIO->new(-fh=>$fileRef,-format=>"blast"); > > while (1) { > my $result = $blastin->next_result; > if (not $result) { last; } > > my $QueryID = $result->query_name; > my $QueryLength = $result->query_length; > > while(my $hit = $result->next_hit()) { > my $hitid = $hit->name; > my $score = $hit->raw_score; > my $description = $hit->name . " " .
> $hit->description; > while (my $hsp = $hit->next_hsp) { > my $expectation = $hsp->evalue; > my $frame = ($hsp->query->frame + 1) * > $hsp->query->strand; > my $strand = $hsp->strand; > my $hitlength = $hit->length; > my $identities = $hsp->num_identical; > my $overlaps = $hsp->length('total'); > my $gaps = $hsp->gaps; > my $qstart = $hsp->start('query'); > my $qstop = $hsp->end('query'); > my $hstart = $hsp->start('hit'); > my $hstop = $hsp->end('hit'); > my $positives = $hsp->num_conserved; > # Truncated - code goes here that processes the > results > } > } > } > > =========== > > The blast report looks like this. In the code above, all scores > ($hit->raw_score) for hits ">SW:SSRP_DROME Q05344 drosophila > melanogaster (fruit fly). single-strand recognition" onwards come out as > null: > > =========== > > BLASTX 2.2.4 [Aug-26-2002] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. > > Query= 010404CS0701000001 > (668 letters) > > Database: /home/seqstore/ncbi/blast/data/swplus > 954,989 sequences; 303,757,025 total letters > > Searching..................................................done > > Score > E > Sequences producing significant alignments: (bits) > Value > > SP_PL:O04235 O04235 vicia faba (broad bean). transcription facto... > 358 3e-98 > SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (mada... > 313 9e-85 > SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). str... > 309 1e-83 > SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). ... > 306 1e-82 > SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early ... > 306 1e-82 > SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 10/2002 > 301 3e-81 > SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87.... 
> 120 9e-27 > SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure s... > 115 5e-25 > SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific re... > 114 6e-25 > SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific re... > 108 5e-23 > > >SP_PL:O04235 O04235 vicia faba (broad bean). transcription factor. > 10/2002 > Length = 642 > > Score = 358 bits (919), Expect = 3e-98 > Identities = 172/194 (88%), Positives = 184/194 (94%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > MTDGHLFNNITLG RGGTNPGQIKI+SGGILWKRQGGGK+I+VDK DI+ VTWMKVP++N > Sbjct: 1 MTDGHLFNNITLGXRGGTNPGQIKIYSGGILWKRQGGGKTIDVDKTDIMGVTWMKVPKTN > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > QLGVQIKDGL YKFTGFRDQDV+SLTNFFQNTFGI V+EKQLSV+GRNWG+VDLNGNMLA > Sbjct: 61 QLGVQIKDGLLYKFTGFRDQDVVSLTNFFQNTFGITVEEKQLSVTGRNWGEVDLNGNMLA > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV > 620 > FMVGSKQAFEV LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLME+SFHIP+SNTQFV > Sbjct: 121 FMVGSKQAFEVSLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEMSFHIPSSNTQFV > 180 > > Query: 621 GDENTPPXQVFRXK 662 > GDEN P QVFR K > Sbjct: 181 GDENRPSAQVFRDK 194 > > > >SW:SSTP_CATRO Q39601 catharanthus roseus (rosy periwinkle) (madagascar > periwinkle). > structure-specific recognition protein 1 homolog (hmg > protein). 
9/2003 > Length = 639 > > Score = 313 bits (802), Expect = 9e-85 > Identities = 153/194 (78%), Positives = 174/194 (88%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > M DGHLFNNITLGGRGGTNPGQ+++ SGGILWK+QGG K++EVDK+D+V +TWMKVPRSN > Sbjct: 1 MADGHLFNNITLGGRGGTNPGQLRVHSGGILWKKQGGAKAVEVDKSDMVGLTWMKVPRSN > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > QLGV+IKDGLFYKFTGFRDQDV SLT++ Q+T GI +EKQLSVSG+NWG+VDLNGNML > Sbjct: 61 QLGVRIKDGLFYKFTGFRDQDVASLTSYLQSTCGITPEEKQLSVSGKNWGEVDLNGNMLT > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEF/MWMTQLEPM\EKDSLMEISFHIPNSNTQ > 614 > F+VGSKQAFEV LADV+QT LQGKNDV+LEF MWM LE M K+SLMEISFH+PNSNTQ > Sbjct: 121 FLVGSKQAFEVSLADVAQTQLQGKNDVMLEF MWMILLEQM RKNSLMEISFHVPNSNTQ > 178 > > Query: 615 FVGDENTPPXQVFRXK 662 > FVGDEN PP QVFR K > Sbjct: 179 FVGDENRPPAQVFRDK 194 > > > >SW:SSRP_ARATH Q05153 arabidopsis thaliana (mouse-ear cress). > structure-specific > recognition protein 1 homolog (hmg protein). 9/2003 > Length = 646 > > Score = 309 bits (792), Expect = 1e-83 > Identities = 148/191 (77%), Positives = 167/191 (86%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > M DGH FNNI+L GRGG NPG +KI SGGI WK+QGGGK++EVD++DIVSV+W KV +SN > Sbjct: 1 MADGHSFNNISLSGRGGKNPGLLKINSGGIQWKKQGGGKAVEVDRSDIVSVSWTKVTKSN > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > QLGV+ KDGL+YKF GFRDQDV SL++FFQ+++G EKQLSVSGRNWG+VDL+GN L > Sbjct: 61 QLGVKTKDGLYYKFVGFRDQDVPSLSSFFQSSYGKTPDEKQLSVSGRNWGEVDLHGNTLT > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV > 620 > F+VGSKQAFEV LADVSQT LQGKNDV LEFHVDDT GANEKDSLMEISFHIPNSNTQFV > Sbjct: 121 FLVGSKQAFEVSLADVSQTQLQGKNDVTLEFHVDDTAGANEKDSLMEISFHIPNSNTQFV > 180 > > Query: 621 GDENTPPXQVF 653 > GDEN PP QVF > Sbjct: 181 GDENRPPSQVF 191 > > > >SP_PL:Q9LGR0 Q9lgr0 oryza sativa (rice). ests au069334(c60619). 
10/2002 > Length = 641 > > Score = 306 bits (784), Expect = 1e-82 > Identities = 141/190 (74%), Positives = 164/190 (86%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+ > Sbjct: 1 MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > QLGV+ KDGLFYKF GFR+QDV SLTNF Q G++ EKQLSVSG+NWG +D+NGNML > Sbjct: 61 QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV > 620 > FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+ > Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL > 180 > > Query: 621 GDENTPPXQV 650 > GDEN QV > Sbjct: 181 GDENRTAAQV 190 > > > >SP_PL:Q8LKS8 Q8lks8 oryza sativa (indica cultivar-group). early drought > induced > protein. 3/2003 > Length = 641 > > Score = 306 bits (784), Expect = 1e-82 > Identities = 141/190 (74%), Positives = 164/190 (86%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > MTDGHLFNNI LGGR G+NPGQ K++SGG+ WKRQGGGK+IE++K+D+ SVTWMKVPR+ > Sbjct: 1 MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAY > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > QLGV+ KDGLFYKF GFR+QDV SLTNF Q G++ EKQLSVSG+NWG +D+NGNML > Sbjct: 61 QLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINGNMLT > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV > 620 > FMVGSKQAFEV LADVSQT +QGK DV+LEFHVDDTTG NEKDSLM++SFH+P SNTQF+ > Sbjct: 121 FMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFL > 180 > > Query: 621 GDENTPPXQV 650 > GDEN QV > Sbjct: 181 GDENRTAAQV 190 > > > >SP_PL:Q9LEF5 Q9lef5 zea mays (maize). ssrp1 protein. 
10/2002 > Length = 639 > > Score = 301 bits (772), Expect = 3e-81 > Identities = 138/190 (72%), Positives = 162/190 (84%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > MTDGH FNNI LGGRGGTNPGQ K+ SGG+ WKRQGGGK+IE+DKAD+ +VTWMKVPR+ > Sbjct: 1 MTDGHHFNNILLGGRGGTNPGQFKVHSGGLAWKRQGGGKTIEIDKADVTAVTWMKVPRAY > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > QLGV+IK GLFY+F GFR+QDV +LTNF Q G+ EKQLSVSG+NWG +D++GNML > Sbjct: 61 QLGVRIKAGLFYRFIGFREQDVSNLTNFIQKNMGVTPDEKQLSVSGQNWGGIDIDGNMLT > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFV > 620 > FMVGSKQAFEV L DV+QT +QGK DV+LE HVDDTTGANEKDSLM++SFH+P SNTQFV > Sbjct: 121 FMVGSKQAFEVSLPDVAQTQMQGKTDVLLELHVDDTTGANEKDSLMDLSFHVPTSNTQFV > 180 > > Query: 621 GDENTPPXQV 650 > GDE+ PP + > Sbjct: 181 GDESRPPAHI 190 > > > >SP_OV:Q9W602 Q9w602 xenopus laevis (african clawed frog). duf87. > 10/2002 > Length = 693 > > Score = 120 bits (302), Expect = 9e-27 > Identities = 64/173 (36%), Positives = 100/173 (56%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > M D FN+I +G N G++++ G+++K GK + ADI V W +V + > Sbjct: 1 MADTLEFNDIYQEVKGSMNDGRLRLSRAGLMYKNNKTGKVENISAADIAEVVWRRVALGH > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > + + G YK+ GFR+ + L ++F++ F + + EK L V G NWG V G +L+ > Sbjct: 61 GIKLLTNGGHVYKYDGFRETEYDKLFDYFKSHFSVELVEKDLCVKGWNWGSVRFGGQLLS > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > F +G + AFE+PL++VSQ GKN+V LEFH +D + + SLMEI F++P > Sbjct: 121 FDIGDQPAFELPLSNVSQCT-TGKNEVTLEFHQND----DSEVSLMEIRFYVP 168 > > > >SP_RO:Q8CGA6 Q8cga6 mus musculus (mouse). similar to structure specific > recognition protein 1. 
3/2003 > Length = 711 > > Score = 115 bits (287), Expect = 5e-25 > Identities = 59/167 (35%), Positives = 97/167 (57%) > Frame = +3 > > Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI > 278 > FN+I +G N G++++ GI++K GK + ++ W +V + L + > Sbjct: 7 FNDIFQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT > 66 > > Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK > 458 > K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G + > Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ > 126 > > Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > FE+PL++VSQ GKN+V LEFH +D + + SLME+ F++P > Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168 > > > >SW:SSRP_HUMAN Q08945 homo sapiens (human). structure-specific > recognition protein 1 > (ssrp1) (recombination signal sequence recognition > protein) (t160) (chromatin-specific transcription > elongation factor 80 kda subunit) (fact 80 kda subunit). > 9/2003 > Length = 709 > > Score = 114 bits (286), Expect = 6e-25 > Identities = 58/167 (34%), Positives = 97/167 (57%) > Frame = +3 > > Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI > 278 > FN++ +G N G++++ GI++K GK + ++ W +V + L + > Sbjct: 7 FNDVYQEVKGSMNDGRLRLSRQGIIFKNSKTGKVDNIQAGELTEGIWRRVALGHGLKLLT > 66 > > Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK > 458 > K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G + > Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ > 126 > > Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > FE+PL++VSQ GKN+V LEFH +D + + SLME+ F++P > Sbjct: 127 PVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAEVSLMEVRFYVP 168 > > > >SW:SSRP_MOUSE Q08943 mus musculus (mouse). structure-specific > recognition protein 1 > (ssrp1) (recombination signal sequence recognition > protein) (t160). 
9/2003 > Length = 708 > > Score = 108 bits (270), Expect = 5e-23 > Identities = 56/167 (33%), Positives = 95/167 (56%) > Frame = +3 > > Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI > 278 > FN+I +G N G++++ GI++K GK + ++ W +V + L + > Sbjct: 7 FNDIFQEVKGSMNDGRLRLSPSGIIFKNSKTGKVDNIQAGELTEGIWPRVALGHGLKLLT > 66 > > Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK > 458 > K+G YK+ GFR+ + L++FF+ + + + EK L V G NWG V G +L+F +G + > Sbjct: 67 KNGHVYKYDGFRESEFEKLSDFFKTHYRLELMEKDLCVKGWNWGTVKFGGQLLSFDIGDQ > 126 > > Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > FE+PL++VS Q + +V LEFH +D + + SLME+ F++P > Sbjct: 127 PVFEIPLSNVSSVP-QARIEVTLEFHQND----DPEVSLMEVRFYVP 168 > > > >SW:SSRP_DROME Q05344 drosophila melanogaster (fruit fly). single-strand > recognition > protein (ssrp) (chorion-factor 5). 9/2003 > Length = 723 > > Score = 101 bits (251), Expect = 7e-21 > Identities = 63/173 (36%), Positives = 92/173 (52%) > Frame = +3 > > Query: 81 MTDGHLFNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN > 260 > MTD +N+I RG G++K+ I++K GK ++ DI + K + > Sbjct: 1 MTDSLEYNDINAEVRGVLCSGRLKMTEQNIIFKNTKTGKVEQISAEDIDLINSQKFVGTW > 60 > > Query: 261 QLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLA > 440 > L V K G+ ++FTGFRD + L F + + + EK++ V G NWG G++L+ > Sbjct: 61 GLRVFTKGGVLHRFTGFRDSEHEKLGKFIKAAYSQEMVEKEMCVKGWNWGTARFMGSVLS > 120 > > Query: 441 FMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > F SK FEVPL+ VSQ + GKN+V LEFH +D L+E+ FHIP > Sbjct: 121 FDKESKTIFEVPLSHVSQC-VTGKNEVTLEFHQNDDAPV----GLLEMRFHIP 168 > > > >SP_FUN:O94529 O94529 schizosaccharomyces pombe (fission yeast). > putative structure > specific recognition protein. 
3/2003 > Length = 512 > > Score = 96.7 bits (239), Expect = 2e-19 > Identities = 48/161 (29%), Positives = 86/161 (52%), Gaps = 2/161 (1%) > Frame = +3 > > Query: 138 PGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRD > 317 > PG+++I G+ WK + + ++I W + R +L + +K GF > Sbjct: 19 PGKLRIAPSGLGWKSPSLAEPFTLPISEIRRFCWSRFARGYELKIILKSKDPVSLDGFSQ > 78 > > Query: 318 QDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQT > 497 > +D+ L N + F + +++K+ S+ G NWG+ + G+ L F V S+ AFE+P++ V+ T > Sbjct: 79 EDLDDLINVIKQNFDMGIEQKEFSIKGWNWGEANFLGSELVFDVNSRPAFEIPISAVTNT > 138 > > Query: 498 NLQGKNDVILEFHV--DDTTGANEKDSLMEISFHIPNSNTQ 614 > NL GKN+V LEF D + + D L+E+ ++P + + > Sbjct: 139 NLSGKNEVALEFSTTDDKQIPSAQVDELVEMRLYVPGTTAK 179 > > > >SW:SSRP_CHICK Q04678 gallus gallus (chicken). structure-specific > recognition > protein 1 (ssrp1) (recombination signal sequence > recognition protein) (t160) (fragment). 9/2003 > Length = 669 > > Score = 95.9 bits (237), Expect = 3e-19 > Identities = 48/131 (36%), Positives = 79/131 (59%) > Frame = +3 > > Query: 207 VDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQL > 386 > + +++ W +V + L + K+G YK+ GFR+ + L++FF+ + + + EK L > Sbjct: 5 IQASELAEGVWRRVALGHGLKLLTKNGHVYKYDGFRESEFDKLSDFFKAHYRLELAEKDL > 64 > > Query: 387 SVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEK > 566 > V G NWG V G +L+F +G + FE+PL++VSQ GKN+V LEFH +D + + > Sbjct: 65 CVKGWNWGTVRFGGQLLSFDIGEQPVFEIPLSNVSQCT-TGKNEVTLEFHQND----DAE > 119 > > Query: 567 DSLMEISFHIP 599 > SLME+ F++P > Sbjct: 120 VSLMEVRFYVP 130 > > > >SP_IN:Q8IL56 Q8il56 plasmodium falciparum (isolate 3d7). structure > specific > recognition protein, putative. 
3/2003 > Length = 506 > > Score = 94.0 bits (232), Expect = 1e-18 > Identities = 50/170 (29%), Positives = 89/170 (51%), Gaps = 5/170 (2%) > Frame = +3 > > Query: 120 GRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSN-----QLGVQIKD > 284 > G GG++ G ++ + + WK + + +DI W+K +N +LG + K+ > Sbjct: 21 GFGGSDFGSFRMSNEFLGWKNKKTNNVYQYKCSDIDEGCWIKTSYNNNRLHLKLG-ESKE > 79 > > Query: 285 GLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQA > 464 > + F GF D++V +T FQ F I + ++++ G NWG+ L + L F + +K A > Sbjct: 80 NIIIYFDGFPDRNVNEITQHFQKYFNIRLNNRKIATKGWNWGEFKLENSNLCFDIDNKYA > 139 > > Query: 465 FEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQ 614 > F +P +++Q N+Q K D+ +EF D+ +D L EI F+ P+ N + > Sbjct: 140 FNLPTNNINQLNVQIKTDIAMEFKNDENNNKGNEDFLAEIRFYYPHENDE 189 > > > >SW:SSRP_CAEEL P41848 caenorhabditis elegans. probable > structure-specific > recognition protein 1 (ssrp1) (recombination signal > sequence recognition protein). 9/2003 > Length = 697 > > Score = 92.0 bits (227), Expect = 4e-18 > Identities = 48/153 (31%), Positives = 82/153 (53%) > Frame = +3 > > Query: 141 GQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQIKDGLFYKFTGFRDQ > 320 > G +K+ + +K GGKS+ V +DI + W K+ L V + DG ++F GF+D > Sbjct: 20 GTLKLTEKSLNFKGDKGGKSVNVTGSDIDKLKWQKLGNKPGLRVGLNDGGAHRFGGFKDT > 79 > > Query: 321 DVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVPLADVSQTN > 500 > D+ + +F + + ++ + L + G N+G ++ G + F K FE+P +VS > Sbjct: 80 DLEKIQSFTSSNWSQSIDQSNLFIKGWNYGQAEVKGKTVEFSWEDKPIFEIPCTNVSNV- > 138 > > Query: 501 LQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > + KN+ +LEFH +D + K LME+ FH+P > Sbjct: 139 IANKNEAVLEFHQND----DSKVQLMEMRFHMP 167 > > > >SW:YMG9_YEAST Q04636 saccharomyces cerevisiae (baker's yeast). > hypothetical 63.0 > kda protein in dak1-orc1 intergenic region. 
5/2000 > Length = 552 > > Score = 89.0 bits (219), Expect = 4e-17 > Identities = 50/161 (31%), Positives = 80/161 (49%), Gaps = 8/161 (4%) > Frame = +3 > > Query: 141 GQIKIFSGGILWK--RQGGGKSIEVDK------ADIVSVTWMKVPRSNQLGVQIKDGLFY > 296 > G+ +I G+ WK GG + + K ++ +V W + R L + K+ > Sbjct: 17 GRFRIADSGLGWKISTSGGSAANQARKPFLLPATELSTVQWSRGCRGYDLKINTKNQGVI > 76 > > Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP > 476 > + GF D + N F F I V++++ S+ G NWG DL N + F + K FE+P > Sbjct: 77 QLDGFSQDDYNLIKNDFHRRFNIQVEQREHSLRGWNWGKTDLARNEMVFALNGKPTFEIP > 136 > > Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIP 599 > A ++ TNL KN+V +EF++ D D L+E+ F+IP > Sbjct: 137 YARINNTNLTSKNEVGIEFNIQDEEYQPAGDELVEMRFYIP 177 > > > >SP_IN:O01683 O01683 caenorhabditis elegans. c32f10.5 protein. 3/2003 > Length = 689 > > Score = 86.7 bits (213), Expect = 2e-16 > Identities = 50/186 (26%), Positives = 90/186 (47%) > Frame = +3 > > Query: 99 FNNITLGGRGGTNPGQIKIFSGGILWKRQGGGKSIEVDKADIVSVTWMKVPRSNQLGVQI > 278 > F + + G G + + I + GGKS+ + D+ + W K+ L V + > Sbjct: 6 FKGVYVEDIGHLTCGTLTLTENSINFIGDKGGKSVYITGTDVDKLKWQKLGNKPGLRVGL > 65 > > Query: 279 KDGLFYKFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSK > 458 > DG ++F GF D D+ + +F + + ++ + L ++G N+G D+ G + F ++ > Sbjct: 66 SDGGAHRFGGFLDDDLQKIQSFTSSNWSKSINQSNLFINGWNYGQADVKGKNIEFSWENE > 125 > > Query: 459 QAFEVPLADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNTQFVGDENTP > 638 > FE+P +VS + KN+ ILEFH ++ K LME+ FH+P +E+T > Sbjct: 126 PIFEIPCTNVSNV-IANKNEAILEFHQNE----QSKVQLMEMRFHMP---VDLENEEDTD > 177 > > Query: 639 PXQVFR 656 > + F+ > Sbjct: 178 KVEEFK 183 > > > >SP_FUN:Q9HFC4 Q9hfc4 zygosaccharomyces rouxii (candida mogii). > ssrp1-like protein > (fragment). 
10/2002 > Length = 542 > > Score = 85.1 bits (209), Expect = 5e-16 > Identities = 48/165 (29%), Positives = 79/165 (47%), Gaps = 8/165 (4%) > Frame = +3 > > Query: 141 GQIKIFSGGILWKRQGGGKSIE--------VDKADIVSVTWMKVPRSNQLGVQIKDGLFY > 296 > G+ +I G+ WK G S + ++ +V W + R +L V K+ > Sbjct: 45 GRFRIADSGLGWKSANAGGSAANQSKQPFLLPATELSTVQWSRGCRGFELKVNTKNQGVV > 104 > > Query: 297 KFTGFRDQDVLSLTNFFQNTFGIAVKEKQLSVSGRNWGDVDLNGNMLAFMVGSKQAFEVP > 476 > + GF D + N F F + V+ K+ S+ G NWG DL N + F + + +FEVP > Sbjct: 105 QLDGFAPDDFNLIKNDFHRRFNVQVEPKEHSLRGWNWGKADLARNEMVFALNGRPSFEVP > 164 > > Query: 477 LADVSQTNLQGKNDVILEFHVDDTTGANEKDSLMEISFHIPNSNT 611 > A ++ TNL K +V +EF++ D D L+E+ ++P + T > Sbjct: 165 YARINNTNLTSKTEVAIEFNLADENYQPAGDELVEMRLYVPGTVT 209 > > > Database: /home/seqstore/ncbi/blast/data/swplus > Posted date: Apr 15, 2003 12:04 PM > Number of letters in database: 303,757,025 > Number of sequences in database: 954,989 > > Lambda K H > 0.318 0.135 0.401 > > Gapped > Lambda K H > 0.267 0.0410 0.140 > > > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Hits to DB: 385,793,622 > Number of Sequences: 954989 > Number of extensions: 8541745 > Number of successful extensions: 21678 > Number of sequences better than 1.0e-06: 36 > Number of HSP's better than 0.0 without gapping: 21171 > Number of HSP's successfully gapped in prelim test: 0 > Number of HSP's that attempted gapping in prelim test: 0 > Number of HSP's gapped (non-prelim): 21664 > length of database: 303,757,025 > effective HSP length: 116 > effective length of database: 192,978,301 > effective search space used: 20455699906 > frameshift window, decay const: 50, 0.1 > T: 12 > A: 40 > X1: 16 ( 7.3 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 41 (21.7 bits) > > =========== > > Richard Holland > Bioinformatics Database Developer > ITS, Agresearch Invermay x3279 > > > > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Friday, 5 September 2003 9:39 a.m. 
> To: Holland, Richard > Cc: bioperl-l@bioperl.org; McCulloch, Alan > Subject: Re: [Bioperl-l] Blastx parser misses scores > > > Can you please provide and example report and code which doesn't behave > as you would expect. > > Are you talking about the case where you have 50 hits listed in the > summary but say only 25 HSP alignments? > > > On Fri, 5 Sep 2003, Holland, Richard wrote: > > > Hi, > > > > I have run into a problem with Bio::SearchIO::blast parsing blastx > > result files. This may affect other blast outputs as well but I'm not > > sure. > > > > At the top of a blastx output there is a summary of the best hits in > > the results file. Then, all the hits are listed, even the ones which > > are not in the best hits list. > > > > The Bio::Perl parser successfully parses all the hits from the file, > > however it only returns scores for those which appear in the summary. > > I have found the code which does this in Bio::SearchIO::blast and > > noticed that this seems to be deliberate - in all cases, blastx or > > not, the scores are taken from the summary, and the scores in the hit > > details appear to be ignored. > > > > Is this a feature or a bug? We would like to be able to use Bio::Perl > > to parse out all the results from our blast reports including all > > their scores and details, regardless of whether or not they appear in > > the best hits summary. > > > > Can anyone help? > > > > cheers, > > Richard > > ====================================================================== > > = > > Attention: The information contained in this message and/or > attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or > privileged > > material. Any review, retransmission, dissemination or other use of, > or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by > AgResearch > > Limited. 
If you have received this message in error, please notify the > > sender immediately. > > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From birney at ebi.ac.uk Fri Sep 5 04:46:15 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Fri Sep 5 04:44:59 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: Message-ID: Hmmmm. That looks v. neat, in particular the lazy-loading. I still worry about clone() but my sense is that this is going to get in. Can we have a big warning on the clone() function that it might get itself truly tangled up if it hits exotic objects in its path... ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
----------------------------------------------------------------- From vesko_baev at abv.bg Fri Sep 5 05:38:25 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Fri Sep 5 05:37:11 2003 Subject: [Bioperl-l] 2seq compare? Message-ID: <1036420488.1062754705436.JavaMail.nobody@storage.ni.bg> Hi friends, I'm new to this area. I would like to compare and align a DNA sequence with a DNA gene sequence; which module should I use? bl2seq? simple align? pSW (but I understand that it works only with proteins) or??? Thanks, Vesko. From heikki at ebi.ac.uk Fri Sep 5 05:40:31 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Fri Sep 5 05:39:23 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: References: Message-ID: <1062754830.4056.24.camel@localhost> Will, This does not solve my problem but is a great module to add into bioperl. I'll do it right now if you promise to write a test file in the near future. ;-) I added the following text into docs: Anyone planning to use Bio::Root::Storable in bioperl modules: Storable is not part of all perl core libraries. When inheriting from it, you have to do: eval { require Storable; }; and fail gracefully. -Heikki On Fri, 2003-09-05 at 09:39, Will Spooner wrote: > This may be a good time to mention that we have developed a > Bio::Root::Storable module for use with the Ensembl web site. > > This module is used specifically for serialising/retrieving Search objects > to disk after parsing with SearchIO. It is in production, and works very > well, even in a high-throughput environment. > > The implementation is generic (can be inherited by any bioperl object), > and even implements a clone() method. > > I have attached the module to this mail, and pasted the description below. > I would be delighted to see this module incorporated into BioPerl if > appropriate.
I don't know whether it will solve Heikki's problem, but may > provide an alternative to writing a new serialiser! > > Regards, > > Will > > --- > Will Spooner whs@sanger.ac.uk > Ensembl Web Developer http://www.ensembl.org > > > NAME > Bio::Root::Storable - object serialisation methods > > SYNOPSIS > my $storable = Bio::Root::Storable->new(); > > # Store/retrieve using class retriever > my $token = $storable->store(); > my $storable2 = Bio::Root::Storable->retrieve( $token ); > > # Store/retrieve using object retriever > my $storable2 = $storable->new_retrievable(); > $storable2->retrieve(); > > DESCRIPTION > Generic module that allows objects to be safely stored/retrieved from > disk. Can be inherited by any BioPerl object. As it will not usually be > the first class in the inheritance list, _initialise_storable() should > be called during object instantiation. > > Currently stores objects in binary format (using the Perl Storable > module). This can cause problems when storing and retrieving with different > versions of Storable (e.g. on different machines). An ASCII storage > option (using Data::Dumper) may be implemented in the future. > > Object storage is recursive; if the object being stored contains other > storable objects, these will be stored separately, and replaced by a > skeleton object in the parent hierarchy. When the parent is later > retrieved, its children remain in the skeleton state until explicitly > retrieved by the parent. This lazy-retrieve approach has obvious memory > efficiency benefits for certain applications. > -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs.
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at ebi.ac.uk Fri Sep 5 06:25:43 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Fri Sep 5 06:25:33 2003 Subject: [Bioperl-l] 2seq compare? In-Reply-To: <1036420488.1062754705436.JavaMail.nobody@storage.ni.bg> References: <1036420488.1062754705436.JavaMail.nobody@storage.ni.bg> Message-ID: <1062757543.4058.31.camel@localhost> Vesko, Basic, no frills sequence alignment is best done using tcoffee and clustalw. Both programs have bioperl wrappers. Yours, -Heikki On Fri, 2003-09-05 at 10:38, Vesko Baev wrote: > Hi friends, > I'm new to this area. > I would like to compare and align a DNA sequence with a DNA gene sequence; which module should I use? > bl2seq? > simple align? > pSW (but I understand that it works only with proteins) > or??? > > Thanks, Vesko. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From mkur at poczta.gazeta.pl Wed Sep 3 21:55:56 2003 From: mkur at poczta.gazeta.pl (Michal Kurowski) Date: Fri Sep 5 08:26:20 2003 Subject: [Bioperl-l] SeqFactory diff Message-ID: <20030904015556.GA9204@calvados> Hi, The patch is required (!)
to make it possible to define a package
in the same file where one actually uses Bio::Seq::SeqFactory.

Cheers,

-- 
Michal Kurowski
-------------- next part --------------
--- SeqFactory.pm	Sat Nov  9 21:24:06 2002
+++ /usr/lib/perl5/site_perl/Bio/Seq/SeqFactory.pm	Thu Sep  4 03:43:50 2003
@@ -140,8 +140,8 @@
 sub type{
    my ($self,$value) = @_;
    if( defined $value) {
-       eval "require $value";
-       if( $@ ) { $self->throw("$@: Unrecognized Sequence type for SeqFactory '$value'");}
+#      eval "require $value";
+#      if( $@ ) { $self->throw("$@: Unrecognized Sequence type for SeqFactory '$value'");}
 
        my $a = bless {},$value;
        unless( $a->isa('Bio::PrimarySeqI') ||

From whs at sanger.ac.uk  Fri Sep  5 04:39:58 2003
From: whs at sanger.ac.uk (Will Spooner)
Date: Fri Sep 5 08:26:46 2003
Subject: [Bioperl-l] cloning and Storable
Message-ID:

This may be a good time to mention that we have developed a
Bio::Root::Storable module for use with the Ensembl web site.

This module is used specifically for serialising/retrieving Search objects
to disk after parsing with SearchIO. It is in production, and works very
well, even in a high-throughput environment.

The implementation is generic (can be inherited by any bioperl object),
and even implements a clone() method.

I have attached the module to this mail, and pasted the description below.
I would be delighted to see this module incorporated into BioPerl if
appropriate. I don't know whether it will solve Heikki's problem, but may
provide an alternative to writing a new serialiser!
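
As a rough illustration of the store/retrieve/clone idea described above,
here is a minimal sketch that uses only the plain Perl Storable and
File::Temp modules. It is not the Bio::Root::Storable API: store_state()
and retrieve_state() are hypothetical helper names made up for this
example, and the file name returned by store_state() merely plays the
role of the token.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable   qw( freeze thaw dclone );
use File::Temp qw( tempfile );

# Hypothetical helper (not part of BioPerl): freeze a data structure into
# a temporary file and return the file name, which acts as the "token".
sub store_state {
    my ($data) = @_;
    my ( $fh, $file ) = tempfile( 'XXXXXXXX', TMPDIR => 1, UNLINK => 0 );
    binmode $fh;
    print {$fh} freeze($data);
    close $fh or die "Cannot close $file: $!";
    return $file;
}

# Hypothetical helper (not part of BioPerl): read the frozen bytes back
# from the state file and thaw them into a new data structure.
sub retrieve_state {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    local $/ = undef;    # slurp the whole file in one read
    my $frozen = <$fh>;
    close $fh;
    return thaw($frozen);
}

# Stand-in for a parsed search result; the keys are invented for the example.
my $result = { query => 'seq1', hits => [ 'hitA', 'hitB' ] };

my $token = store_state($result);      # serialise to disk
my $copy  = retrieve_state($token);    # rebuild from disk
my $clone = dclone($result);           # in-memory deep copy, no file involved
unlink $token;                         # tidy up the state file

print "retrieved ", scalar @{ $copy->{hits} }, " hits for $copy->{query}\n";
```

The sketch leaves out what the module adds on top of this, namely the
skeleton objects and the lazy retrieval of nested storable objects.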

Regards,

Will

---
Will Spooner                  whs@sanger.ac.uk
Ensembl Web Developer         http://www.ensembl.org


NAME
    Bio::Root::Storable - object serialisation methods

SYNOPSIS
      my $storable = Bio::Root::Storable->new();

      # Store/retrieve using class retriever
      my $token     = $storable->store();
      my $storable2 = Bio::Root::Storable->retrieve( $token );

      # Store/retrieve using object retriever
      my $storable2 = $storable->new_retrievable();
      $storable2->retrieve();

DESCRIPTION
    Generic module that allows objects to be safely stored/retrieved from
    disk. Can be inherited by any BioPerl object. As it will not usually be
    the first class in the inheritance list, _initialise_storable() should
    be called during object instantiation.

    Currently stores objects in binary format (using the Perl Storable
    module). This can cause problems when storing and retrieving with different
    versions of Storable (e.g. on different machines). An ASCII storage
    option (using Data::Dumper) may be implemented in the future.

    Object storage is recursive; if the object being stored contains other
    storable objects, these will be stored separately, and replaced by a
    skeleton object in the parent hierarchy. When the parent is later
    retrieved, its children remain in the skeleton state until explicitly
    retrieved by the parent. This lazy-retrieve approach has obvious memory
    efficiency benefits for certain applications.

-------------- next part --------------
=head1 NAME

Bio::Root::Storable - object serialisation methods

=head1 SYNOPSIS

  my $storable = Bio::Root::Storable->new();

  # Store/retrieve using class retriever
  my $token     = $storable->store();
  my $storable2 = Bio::Root::Storable->retrieve( $token );

  # Store/retrieve using object retriever
  my $storable2 = $storable->new_retrievable();
  $storable2->retrieve();

=head1 DESCRIPTION

Generic module that allows objects to be safely stored/retrieved from
disk. Can be inherited by any BioPerl object.
As it will not usually be the first class in the inheritance list,
_initialise_storable() should be called during object instantiation.

Currently stores objects in binary format (using the Perl Storable
module). This can cause problems when storing and retrieving with different
versions of Storable (e.g. on different machines). An ASCII storage
option (using Data::Dumper) may be implemented in the future.

Object storage is recursive; if the object being stored contains other
storable objects, these will be stored separately, and replaced by a
skeleton object in the parent hierarchy. When the parent is later
retrieved, its children remain in the skeleton state until explicitly
retrieved by the parent. This lazy-retrieve approach has obvious memory
efficiency benefits for certain applications.

=cut

# Let the code begin...

package Bio::Root::Storable;

use strict;
#use Data::Dumper qw( Dumper );
use Storable qw( freeze thaw dclone );

use vars qw(@ISA);

use Bio::Root::Root;
use Bio::Root::IO;

@ISA = qw( Bio::Root::Root );

#----------------------------------------------------------------------

=head2 new

  Arg [1]   : -workdir  => filesystem path,
              -template => tmpfile template,
              -suffix   => tmpfile suffix,
  Function  : Builds a new Bio::Root::Storable inheriting object
  Returntype: Bio::Root::Storable inheriting object
  Exceptions:
  Caller    :
  Example   : $storable = Bio::Root::Storable->new()

=cut

sub new {
  my ($caller, @args) = @_;
  my $self = $caller->SUPER::new(@args);
  $self->_initialise_storable;
  return $self;
}

#----------------------------------------------------------------------

=head2 _initialise_storable

  Arg [1]   : See 'new' method
  Function  : Initialises storable-specific attributes
  Returntype: boolean
  Exceptions:
  Caller    :
  Example   :

=cut

sub _initialise_storable {
  my $self = shift;
  my( $workdir, $template, $suffix ) =
      $self->_rearrange([qw(WORKDIR TEMPLATE SUFFIX)], @_ );
  $workdir  && $self->workdir ( $workdir );
  $template && $self->template( $template );
  $suffix   &&
               $self->suffix  ( $suffix );
  return 1;
}

#----------------------------------------------------------------------

=head2 statefile

  Arg [1]   : string (optional)
  Function  : Accessor for the file to write state into.
              Should not normally be used as a setter - let Root::IO
              do this for you.
  Returntype: string
  Exceptions:
  Caller    : Bio::Root::Storable->store
  Example   : my $statefile = $obj->statefile();

=cut

sub statefile{

  my $key  = '_statefile';
  my $self = shift;

  if( @_ ){ $self->{$key} = shift }

  if( ! $self->{$key} ){ # Create a new statefile

    my $workdir  = $self->workdir;
    my $template = $self->template;
    my $suffix   = $self->suffix;

    # TODO: add cleanup and unlink methods. For now, we'll keep the
    # statefile hanging around.
    my @args = ( CLEANUP=>0, UNLINK=>0 );
    if( $template ){ push( @args, 'TEMPLATE' => $template )};
    if( $workdir  ){ push( @args, 'DIR'      => $workdir  )};
    if( $suffix   ){ push( @args, 'SUFFIX'   => $suffix   )};
    my( $fh, $file ) = Bio::Root::IO->new->tempfile( @args );

    $self->{$key} = $file;
  }

  return $self->{$key};
}

#----------------------------------------------------------------------

=head2 workdir

  Arg [1]   : string (optional) (TODO - convert to array for x-platform)
  Function  : Accessor for the statefile directory. Defaults to
              $Bio::Root::IO::TEMPDIR
  Returntype: string
  Exceptions:
  Caller    :
  Example   : $obj->workdir('/tmp/foo');

=cut

sub workdir {
  my $key = '_workdir';
  my $self = shift;
  if( @_ ){
    my $caller = join( ', ', (caller(0))[1..2] );
    $self->{$key} && $self->debug("Overwriting workdir: probably bad!");
    $self->{$key} = shift
  }
  $self->{$key} ||= $Bio::Root::IO::TEMPDIR;
  return $self->{$key};
}

#----------------------------------------------------------------------

=head2 template

  Arg [1]   : string (optional)
  Function  : Accessor for the statefile template.
              Defaults to XXXXXXXX
  Returntype: string
  Exceptions:
  Caller    :
  Example   : $obj->template('RES_XXXXXXXX');

=cut

sub template {
  my $key = '_template';
  my $self = shift;
  if( @_ ){ $self->{$key} = shift }
  $self->{$key} ||= 'XXXXXXXX';
  return $self->{$key};
}

#----------------------------------------------------------------------

=head2 suffix

  Arg [1]   : string (optional)
  Function  : Accessor for the statefile suffix.
  Returntype: string
  Exceptions:
  Caller    :
  Example   : $obj->suffix('.state');

=cut

sub suffix {
  my $key = '_suffix';
  my $self = shift;
  if( @_ ){ $self->{$key} = shift }
  return $self->{$key};
}

#----------------------------------------------------------------------

=head2 new_retrievable

  Arg [1]   : Same as for 'new'
  Function  : Similar to store, except returns a 'skeleton' of the calling
              object, rather than the statefile.
              The skeleton can be repopulated by calling 'retrieve'. This
              will be a clone of the original object.
  Returntype: Bio::Root::Storable inheriting object
  Exceptions:
  Caller    :
  Example   : my $skel = $obj->new_retrievable(); # skeleton
              $skel->retrieve();                  # clone

=cut

sub new_retrievable{
  my $self = shift;
  my @args = @_;

  $self->_initialise_storable( @args );

  if( $self->retrievable ){ return $self->clone } # Clone retrievable
  return bless( { _statefile   => $self->store(@args),
                  _workdir     => $self->workdir,
                  _suffix      => $self->suffix,
                  _template    => $self->template,
                  _retrievable => 1 }, ref( $self ) );
}

#----------------------------------------------------------------------

=head2 retrievable

  Arg [1]   : none
  Function  : Reports whether the object is in 'skeleton' state, and the
              'retrieve' method can be called.
  Returntype: boolean
  Exceptions:
  Caller    :
  Example   : if( $obj->retrievable ){ $obj->retrieve }

=cut

sub retrievable {
  my $self = shift;
  if( @_ ){ $self->{_retrievable} = shift }
  return $self->{_retrievable};
}

#----------------------------------------------------------------------

=head2 token

  Arg [1]   : None
  Function  : Accessor for token attribute
  Returntype: string. Whatever retrieve needs to retrieve.
              This base implementation returns the statefile
  Exceptions:
  Caller    :
  Example   : my $token = $obj->token();

=cut

sub token{
  my $self = shift;
  return $self->statefile;
}

#----------------------------------------------------------------------

=head2 store

  Arg [1]   : none
  Function  : Saves a serialised representation of the object structure
              to disk. Returns the name of the file that the object was
              saved to.
  Returntype: string
  Exceptions:
  Caller    :
  Example   : my $token = $obj->store();

=cut

sub store{
  my $self = shift;
  my $statefile = $self->statefile;
  my $store_obj = $self->serialise;
  my $io = Bio::Root::IO->new( ">$statefile" );
  $io->_print( $store_obj->freeze() );
  $self->debug( "STORING $store_obj to $statefile\n" );
  return $statefile;
}

#----------------------------------------------------------------------

=head2 serialise

  Arg [1]   : none
  Function  : Prepares the serialised representation of the object.
              Object attribute names starting with '__' are skipped.
              This is useful for those that do not serialise too well
              (e.g. filehandles).
              Attributes are examined for other storable objects. If these
              are found they are serialised separately using
              'new_retrievable'
  Returntype: Bio::Root::Storable inheriting object, ready to be frozen
  Exceptions:
  Caller    :
  Example   : my $serialised = $obj->serialise();

=cut

sub serialise{
  my $self = shift;

  # Create a new object of same class that is going to be serialised
  my $store_obj = bless( {}, ref( $self ) );

  my %retargs = ( -workdir =>$self->workdir,
                  -suffix  =>$self->suffix,
                  -template=>$self->template );
  # Assume that other storable bio objects held by this object are
  # only 1-deep.

  foreach my $key( keys( %$self ) ){
    if( $key =~ /^__/ ){ next } # Ignore keys starting with '__'
    my $value = $self->{$key};

    # Scalar value
    if( ! ref( $value ) ){
      $store_obj->{$key} = $value;
    }

    # Bio::Root::Storable obj: save placeholder
    elsif( ref($value) =~ /^Bio::/ and $value->isa('Bio::Root::Storable') ){
      # Bio::Root::Storable
      $store_obj->{$key} = $value->new_retrievable( %retargs );
      next;
    }

    # Arrayref value. Look for Bio::Root::Storable objs
    elsif( ref( $value ) eq 'ARRAY' ){
      my @ary;
      foreach my $val( @$value ){
        if( ref($val) =~ /^Bio::/ and $val->isa('Bio::Root::Storable') ){
          push( @ary, $val->new_retrievable( %retargs ) );
        }
        else{ push( @ary, $val ) }
      }
      $store_obj->{$key} = \@ary;
    }

    # Hashref value. Look for Bio::Root::Storable objs
    elsif( ref( $value ) eq 'HASH' ){
      my %hash;
      foreach my $k2( keys %$value ){
        my $val = $value->{$k2};
        if( ref($val) =~ /^Bio::/ and $val->isa('Bio::Root::Storable') ){
          $hash{$k2} = $val->new_retrievable( %retargs );
        }
        else{ $hash{$k2} = $val }
      }
      $store_obj->{$key} = \%hash;
    }

    # Unknown, just add to the store object regardless
    else{ $store_obj->{$key} = $value }
  }
  $store_obj->retrievable(0); # Once deserialised, obj not retrievable
  return $store_obj;
}

#----------------------------------------------------------------------

=head2 retrieve

  Arg [1]   : string; filesystem location of the state file to be retrieved
  Function  : Retrieves a stored object from disk.
              Note that the retrieved object will be blessed into its original
              class, and not the
  Returntype: Bio::Root::Storable inhereting object
  Exceptions:
  Caller    :
  Example   : my $obj = Bio::Root::Storable->retrieve( $token );

=cut

sub retrieve{
  my( $caller, $statefile ) = @_;

  my $self = {};
  my $class = ref( $caller ) || $caller;

  # Is this a call on a retrievable object?
  if( ref( $caller ) and
      $caller->retrievable ){
    $self = $caller;
    $statefile = $self->statefile;
  }
  bless( $self, $class );

  # Recover serialised object
  if( !
      -f $statefile ){
    $self->throw( "Token $statefile is not found" );
  }
  my $io = Bio::Root::IO->new( $statefile );
  local $/ = undef();
  my $state_str = $io->_readline;

  # Dynamic-load modules required by stored object
  my $stored_obj;
  my $success;
  for( my $i=0; $i<10; $i++ ){
    eval{ $stored_obj = thaw( $state_str ) };
    if( ! $@ ){ $success=1; last }
    my $package;
    if( $@ =~ /Cannot restore overloading/i ){
      my $postmatch = $';
      if( $postmatch =~ /\(package +([\w\:]+)\)/ ) {
        $package = $1;
      }
    }
    if( $package ){
      eval "require $package"; $self->throw($@) if $@;
    }
    else{ $self->throw($@) }
  }
  if( ! $success ){ $self->throw("maximum number of requires exceeded" ) }

  if( ! ref( $stored_obj ) ){
    $self->throw( "Token $statefile returned no data" );
  }
  map{ $self->{$_} = $stored_obj->{$_} } keys %$stored_obj; # Copy hash keys
  $self->retrievable(0);

  # Maintain class of stored obj
  return $self;
}

#----------------------------------------------------------------------

=head2 clone

  Arg [1]   : none
  Function  : Returns a clone of the calling object
  Returntype: Bio::Root::Storable inheriting object
  Exceptions:
  Caller    :
  Example   : my $clone = $obj->clone();

=cut

sub clone {
  my $self = shift;
  return dclone( $self );
}

#----------------------------------------------------------------------

=head2 remove

  Arg [1]   : none
  Function  : Clears the stored object from disk
  Returntype: boolean
  Exceptions:
  Caller    :
  Example   : $obj->remove();

=cut

sub remove {
  my $self = shift;
  if( -e $self->statefile ){
    unlink( $self->statefile );
  }
  return 1;
}

#----------------------------------------------------------------------

1;

From whs at sanger.ac.uk  Fri Sep  5 10:16:00 2003
From: whs at sanger.ac.uk (Will Spooner)
Date: Fri Sep 5 10:14:43 2003
Subject: [Bioperl-l] Re: cloning and Storable
In-Reply-To: <1062754830.4056.24.camel@localhost>
Message-ID:

On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote:

> Will,
>
> This does not solve my problem but is a great module to add into
> bioperl.
I'll do it right now if you promise to write a test file in
> near future. ;-)
>
>
> I added the following text into docs:
>
>   Anyone planning to use Bio::Root::Storable in bioperl modules:
>   Storable is not part of all perl core libraries. When inheriting from
>   it, you have to do:
>       eval { require Storable; };
>   and fail gracefully.
>

I have plans to add an ASCII option that replaces Storable with
Data::Dumper. That's a core Perl 5.8 module, but I don't know about
earlier versions.

Will

From brian_osborne at cognia.com  Fri Sep  5 10:50:06 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Sep 5 10:52:56 2003
Subject: [Bioperl-l] RE: [Bioperl-guts-l] [Bug 1509] Two tutorial problems: missing datafile and incorrect example
In-Reply-To: <200309051406.h85E6f5l017949@portal.>
Message-ID:

Jim,

> How about putting the data in a known production area, say
> /usr/local/bioperl/data?

That's possible, yes. I only speak for myself but I'd say that it comes down
to "cost/benefit". I tend to see the documentation in bptutorial as the most
important thing, as opposed to the running of the example scripts.
This is not the result of any study of the issue, you understand, just my
guess. I think that the most important thing for the novice is to actually
write code; running the examples is interesting but not enlightening.

Now, if you don't agree and would like to do this yourself, well, that's
fine too! ;-)

I just corrected #2. Thank you,

Brian O.

-----Original Message-----
From: bioperl-guts-l-bounces@portal.open-bio.org
[mailto:bioperl-guts-l-bounces@portal.open-bio.org]On Behalf Of
bugzilla-daemon@portal.open-bio.org
Sent: Friday, September 05, 2003 10:07 AM
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] [Bug 1509] Two tutorial problems: missing
datafile and incorrect example

http://bugzilla.bioperl.org/show_bug.cgi?id=1509

jwinkle@doit.wisc.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Additional Comments From jwinkle@doit.wisc.edu  2003-09-05 10:06 -------
Thanks Brian, for the quick fix on #1. But while dying more gracefully is an
improvement, the novice is still going to be hitting a roadblock. How about
putting the data in a known production area, say /usr/local/bioperl/data?

Did #2 (the incorrect example) get fixed as well?

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l

From hlapp at gmx.net  Fri Sep  5 11:06:16 2003
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Sep 5 11:04:58 2003
Subject: [Bioperl-l] SeqFactory diff
In-Reply-To: <20030904015556.GA9204@calvados>
Message-ID: <7F63FB0D-DFB2-11D7-B807-000A959EB4C4@gmx.net>

Michal,

could you post a test case please along with the error that it produces.
The lines you're commenting out are there for a conscious reason, and
just removing them is unlikely to be accepted into the codebase without
further grounds that that would be the sole way to let you do what you
wanted to do.

	-hilmar

On Wednesday, September 3, 2003, at 06:55 PM, Michal Kurowski wrote:

>
> Hi,
>
> The patch is required (!) to make it possible to define package
> in the same file where one actually uses Bio::Seq::SeqFactory.
>
> Cheers,
>
>
> --
> Michal Kurowski
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From mkur at poczta.gazeta.pl  Fri Sep  5 12:10:17 2003
From: mkur at poczta.gazeta.pl (Michal Kurowski)
Date: Fri Sep 5 12:08:59 2003
Subject: [Bioperl-l] Re: SeqFactory diff
In-Reply-To: <7F63FB0D-DFB2-11D7-B807-000A959EB4C4@gmx.net>
References: <20030904015556.GA9204@calvados>
	<7F63FB0D-DFB2-11D7-B807-000A959EB4C4@gmx.net>
Message-ID: <20030905161017.GA18746@calvados>

Hilmar Lapp [hlapp@gmx.net] wrote:

> could you post a test case please along with the error that it produces.

MSG: Can't locate MySeq.pm in @INC (@INC contains: CODE(0x84d79b8) /usr/lib/perl5/i386-linux /usr/lib/perl5 /usr/lib/perl5/site_perl/i386-linux /usr/lib/perl5/site_perl /usr/lib/perl5/site_perl .) at (eval 8) line 3.
: Unrecognized Sequence type for SeqFactory 'MySeq' STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/Bio/Root/Root.pm:342 STACK: Bio::Seq::SeqFactory::type /usr/lib/perl5/site_perl/Bio/Seq/SeqFactory.pm:144 STACK: Bio::Seq::SeqFactory::new /usr/lib/perl5/site_perl/Bio/Seq/SeqFactory.pm:103 STACK: /home/michal/bin/uploader2.pl:49 > The lines you're commenting out are there for a conscious reason, and > just removing them is unlikely to be accepted into the code base without > further grounds that that would be the sole way to let you do what you > wanted to do. You're right. The change breaks a lot of code and I reverted it. Sorry, a long working hours case ;-). The problem remains though. require/use will not do if you have a PrimarySeq derived class and a client code in the same file. -- Michal Kurowski From heikki at ebi.ac.uk Fri Sep 5 12:16:08 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Fri Sep 5 12:14:56 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: References: Message-ID: <1062778568.4063.42.camel@localhost> Will, We are claiming to support perl 5.005 and luckily it was the first perl release that contained Data::Dumper in core, so go ahead and add it. You could use it as the default storage method as it is guarantied to be present. How do I know? RpmFind has info about all the rpms for a the past few years. Here is a list of 5.005 modules: http://fr2.rpmfind.net//linux/RPM/redhat/6.2/i386/perl-5.00503-10.i386.html -Heikki On Fri, 2003-09-05 at 15:16, Will Spooner wrote: > On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > > > Will, > > > > This does not solve my problem but is a great module to add into > > bioperl. I'll do it right now if you promise to write a test file in > > near future. ;-) > > > > > > I added the following text into docs: > > > > Anyone planning to use Bio::Root::Storable in bioperl modules: > > Storable is not part of all perl core libraries. 
When inheriting from > > it, you have to do: > > eval { require Storable; }; > > and fail gracefully. > > > > I have plans to add an ASCII option that replaces Storable with > Data::Dumper. That's a core Perl 5.8 module, but I don't know about > earlier versions. > > Will > > > > > > -Heikki > > > > On Fri, 2003-09-05 at 09:39, Will Spooner wrote: > > > This may be a good time to mention that we have developed a > > > Bio::Root::Storable module for use with the Ensembl web site. > > > > > > This module is used specifically for serialising/retrieving Search objects > > > to disk after parsing with SearchIO. It is in production, and works very > > > well, even in a high-throughput environment. > > > > > > The implementation is generic (can be inhereted by any bioperl object), > > > and even implements a clone() method. > > > > > > I have attached the module to this mail, and pasted the description below. > > > I would be delighted to see this module incorperated into BioPerl if > > > appropriate. I don't know whether it will solve Hekki's problem, but may > > > provide an alternative to writing a new serialiser! > > > > > > Regards, > > > > > > Will > > > > > > --- > > > Will Spooner whs@sanger.ac.uk > > > Ensembl Web Developer http://www.ensembl.org > > > > > > > > > NAME > > > Bio::Root::Storable - object serialisation methods > > > > > > SYNOPSIS > > > my $storable = Bio::Root::Storable->new(); > > > > > > # Store/retrieve using class retriever > > > my $token = $storable->store(); > > > my $storable2 = Bio::Root::Storable->retrieve( $token ); > > > > > > # Store/retrieve using object retriever > > > my $storable2 = $storable->new_retrievable(); > > > $storable2->retrieve(); > > > > > > DESCRIPTION > > > Generic module that allows objects to be safely stored/retrieved from > > > disk. Can be inhereted by any BioPerl object. 
As it will not usually be > > > the first class in the inheretence list, _initialise_storable() should > > > be called during object instantiation. > > > > > > Currently stores objects in binary format (using the Perl Storable > > > module). This can cause problems when storing and retrieving with different > > > versions of Storable (e.g. on different machines). An ASCII storage > > > option (using Data::Dumper) may be implemented in the future. > > > > > > Object storage is recursive; If the object being stored contains other > > > storable objects, these will be stored seperately, and replaced by a > > > skeleton object in the parent heirarchy. When the parent is later > > > retrieved, its children remain in the skeleton state until explicitly > > > retrieved by the parent. This lazy-retrieve approach has obvious memory > > > efficiency benefits for certain applications. > > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From whs at sanger.ac.uk Fri Sep 5 13:10:23 2003 From: whs at sanger.ac.uk (Will Spooner) Date: Fri Sep 5 13:09:04 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: <1062778568.4063.42.camel@localhost> Message-ID: On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > Will, > > We are claiming to support perl 5.005 and luckily it was the first perl > release that contained Data::Dumper in core, so go ahead and add it. You > could use it as the default storage method as it is guarantied to be > present. Data::Dumper is a great debug tool, but has it's little ...problems... Now solved though - new code goes with Storable if it can, but uses Data::Dumper as a fallback. Users can set $Bio::Root::Storable::BINARY to false to force the use of Data::Dumper. Have mailed the latest module + test seperately. > > How do I know? RpmFind has info about all the rpms for a the past few > years. Here is a list of 5.005 modules: > > http://fr2.rpmfind.net//linux/RPM/redhat/6.2/i386/perl-5.00503-10.i386.html Thats a good trick ;) Will > > -Heikki > > On Fri, 2003-09-05 at 15:16, Will Spooner wrote: > > On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > > > > > Will, > > > > > > This does not solve my problem but is a great module to add into > > > bioperl. I'll do it right now if you promise to write a test file in > > > near future. ;-) > > > > > > > > > I added the following text into docs: > > > > > > Anyone planning to use Bio::Root::Storable in bioperl modules: > > > Storable is not part of all perl core libraries. When inheriting from > > > it, you have to do: > > > eval { require Storable; }; > > > and fail gracefully. > > > > > > > I have plans to add an ASCII option that replaces Storable with > > Data::Dumper. That's a core Perl 5.8 module, but I don't know about > > earlier versions. 
> > > > Will > > > > > > > > > > -Heikki > > > > > > On Fri, 2003-09-05 at 09:39, Will Spooner wrote: > > > > This may be a good time to mention that we have developed a > > > > Bio::Root::Storable module for use with the Ensembl web site. > > > > > > > > This module is used specifically for serialising/retrieving Search objects > > > > to disk after parsing with SearchIO. It is in production, and works very > > > > well, even in a high-throughput environment. > > > > > > > > The implementation is generic (can be inhereted by any bioperl object), > > > > and even implements a clone() method. > > > > > > > > I have attached the module to this mail, and pasted the description below. > > > > I would be delighted to see this module incorperated into BioPerl if > > > > appropriate. I don't know whether it will solve Hekki's problem, but may > > > > provide an alternative to writing a new serialiser! > > > > > > > > Regards, > > > > > > > > Will > > > > > > > > --- > > > > Will Spooner whs@sanger.ac.uk > > > > Ensembl Web Developer http://www.ensembl.org > > > > > > > > > > > > NAME > > > > Bio::Root::Storable - object serialisation methods > > > > > > > > SYNOPSIS > > > > my $storable = Bio::Root::Storable->new(); > > > > > > > > # Store/retrieve using class retriever > > > > my $token = $storable->store(); > > > > my $storable2 = Bio::Root::Storable->retrieve( $token ); > > > > > > > > # Store/retrieve using object retriever > > > > my $storable2 = $storable->new_retrievable(); > > > > $storable2->retrieve(); > > > > > > > > DESCRIPTION > > > > Generic module that allows objects to be safely stored/retrieved from > > > > disk. Can be inhereted by any BioPerl object. As it will not usually be > > > > the first class in the inheretence list, _initialise_storable() should > > > > be called during object instantiation. > > > > > > > > Currently stores objects in binary format (using the Perl Storable > > > > module). 
This can cause problems when storing and retrieving with different > > > > versions of Storable (e.g. on different machines). An ASCII storage > > > > option (using Data::Dumper) may be implemented in the future. > > > > > > > > Object storage is recursive; If the object being stored contains other > > > > storable objects, these will be stored seperately, and replaced by a > > > > skeleton object in the parent heirarchy. When the parent is later > > > > retrieved, its children remain in the skeleton state until explicitly > > > > retrieved by the parent. This lazy-retrieve approach has obvious memory > > > > efficiency benefits for certain applications. > > > > > > > -- > > > ______ _/ _/_____________________________________________________ > > > _/ _/ http://www.ebi.ac.uk/mutations/ > > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > > ___ _/_/_/_/_/________________________________________________________ > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom
> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
>

From Lobvi.Matamoros at crchul.ulaval.ca Fri Sep 5 15:20:40 2003
From: Lobvi.Matamoros at crchul.ulaval.ca (Lobvi Matamoros)
Date: Fri Sep 5 13:19:26 2003
Subject: [Bioperl-l] Bio::SeqIO
Message-ID: <4.2.0.58.20030904162355.00a4ce28@drs.crchul.ulaval.ca>

Hi everybody:

When I try to run a Perl script that uses the Bio::SeqIO object, I get
this message:

C:\Perl\site\lib>perl Convert.pl
Bio::SeqIO: txt cannot be found
Exception
------------- EXCEPTION -------------
MSG: Failed to load module Bio::SeqIO::txt. Can't locate Bio\SeqIO\txt.pm in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at C:/Perl/site/lib/Bio/Root/Root.pm line 407.
STACK Bio::Root::Root::_load_module C:/Perl/site/lib/Bio/Root/Root.pm:409
STACK (eval) C:/Perl/site/lib/Bio/SeqIO.pm:535
STACK Bio::SeqIO::_load_format_module C:/Perl/site/lib/Bio/SeqIO.pm:534
STACK Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:367
STACK toplevel Convert.pl:2
--------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time
Can't call method "next_seq" on an undefined value at Convert.pl line 6.

Since I believed this was an installation problem, I tried to install
Bio::SeqIO, but I got the following message:

Error installing package 'Bio-SeqIO': Could not locate a PPD file for package Bio-SeqIO

On the other hand, I was able to run all the test scripts contained in the
bptutorial.pl file, which is included in the bioperl package. I am working
on Windows, but I was able to run these scripts by just unzipping the
bioperl file and copying the t folder to the directory where bioperl was
installed with the ppm command, as recommended for Windows users.
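
The stack trace above shows Bio::SeqIO::new calling _load_format_module, which tries to load a parser module named after the requested format (here Bio::SeqIO::txt). That suggests the script asked for a format called "txt", which is not a SeqIO format, rather than a broken installation. A minimal sketch of that lookup in core Perl follows. The helper name seqio_format_available is hypothetical and is not part of BioPerl; it only mimics the "format name to module name" step and reports whether such a module can be found in @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT a BioPerl function): report whether a parser
# module for the given SeqIO format name can be loaded from @INC.
sub seqio_format_available {
    my ($format) = @_;

    # Assumption: the format name is lowercased and appended to the
    # Bio::SeqIO namespace, e.g. 'fasta' -> Bio/SeqIO/fasta.pm
    my $path = 'Bio/SeqIO/' . lc($format) . '.pm';

    # require dies if the file cannot be found or compiled; eval traps that.
    return eval { require $path; 1 } ? 1 : 0;
}

for my $format (qw(fasta genbank txt)) {
    printf "%-8s %s\n", $format,
        seqio_format_available($format) ? 'available' : 'not found';
}
```

On a machine with BioPerl installed, fasta and genbank should be reported as available and txt as not found; without BioPerl all three are not found. If the input file is plain sequence text, the 'raw' format may be what was intended, but that is a guess since Convert.pl itself was not posted.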
C:\Perl\site\lib> From heikki at ebi.ac.uk Fri Sep 5 14:04:48 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Fri Sep 5 14:03:29 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: References: Message-ID: <1062785088.4056.51.camel@localhost> Will, Your quick! I merged your next version into the committed one. Hopefully everything is fine. The test logic seems to be OK but it would be great to get confirmation that all tests pass from someone without Storable. -Heikki On Fri, 2003-09-05 at 18:10, Will Spooner wrote: > On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > > > Will, > > > > We are claiming to support perl 5.005 and luckily it was the first perl > > release that contained Data::Dumper in core, so go ahead and add it. You > > could use it as the default storage method as it is guarantied to be > > present. > > Data::Dumper is a great debug tool, but has it's little ...problems... > > Now solved though - new code goes with Storable if it can, but uses > Data::Dumper as a fallback. Users can set $Bio::Root::Storable::BINARY to > false to force the use of Data::Dumper. > > Have mailed the latest module + test seperately. > > > > > How do I know? RpmFind has info about all the rpms for a the past few > > years. Here is a list of 5.005 modules: > > > > http://fr2.rpmfind.net//linux/RPM/redhat/6.2/i386/perl-5.00503-10.i386.html > > Thats a good trick ;) > > Will > > > > > > > > > -Heikki > > > > On Fri, 2003-09-05 at 15:16, Will Spooner wrote: > > > On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > > > > > > > Will, > > > > > > > > This does not solve my problem but is a great module to add into > > > > bioperl. I'll do it right now if you promise to write a test file in > > > > near future. ;-) > > > > > > > > > > > > I added the following text into docs: > > > > > > > > Anyone planning to use Bio::Root::Storable in bioperl modules: > > > > Storable is not part of all perl core libraries. 
When inheriting from > > > > it, you have to do: > > > > eval { require Storable; }; > > > > and fail gracefully. > > > > > > > > > > I have plans to add an ASCII option that replaces Storable with > > > Data::Dumper. That's a core Perl 5.8 module, but I don't know about > > > earlier versions. > > > > > > Will > > > > > > > > > > > > > > -Heikki > > > > > > > > On Fri, 2003-09-05 at 09:39, Will Spooner wrote: > > > > > This may be a good time to mention that we have developed a > > > > > Bio::Root::Storable module for use with the Ensembl web site. > > > > > > > > > > This module is used specifically for serialising/retrieving Search objects > > > > > to disk after parsing with SearchIO. It is in production, and works very > > > > > well, even in a high-throughput environment. > > > > > > > > > > The implementation is generic (can be inhereted by any bioperl object), > > > > > and even implements a clone() method. > > > > > > > > > > I have attached the module to this mail, and pasted the description below. > > > > > I would be delighted to see this module incorperated into BioPerl if > > > > > appropriate. I don't know whether it will solve Hekki's problem, but may > > > > > provide an alternative to writing a new serialiser! 
> > > > > > > > > > Regards, > > > > > > > > > > Will > > > > > > > > > > --- > > > > > Will Spooner whs@sanger.ac.uk > > > > > Ensembl Web Developer http://www.ensembl.org > > > > > > > > > > > > > > > NAME > > > > > Bio::Root::Storable - object serialisation methods > > > > > > > > > > SYNOPSIS > > > > > my $storable = Bio::Root::Storable->new(); > > > > > > > > > > # Store/retrieve using class retriever > > > > > my $token = $storable->store(); > > > > > my $storable2 = Bio::Root::Storable->retrieve( $token ); > > > > > > > > > > # Store/retrieve using object retriever > > > > > my $storable2 = $storable->new_retrievable(); > > > > > $storable2->retrieve(); > > > > > > > > > > DESCRIPTION > > > > > Generic module that allows objects to be safely stored/retrieved from > > > > > disk. Can be inhereted by any BioPerl object. As it will not usually be > > > > > the first class in the inheretence list, _initialise_storable() should > > > > > be called during object instantiation. > > > > > > > > > > Currently stores objects in binary format (using the Perl Storable > > > > > module). This can cause problems when storing and retrieving with different > > > > > versions of Storable (e.g. on different machines). An ASCII storage > > > > > option (using Data::Dumper) may be implemented in the future. > > > > > > > > > > Object storage is recursive; If the object being stored contains other > > > > > storable objects, these will be stored seperately, and replaced by a > > > > > skeleton object in the parent heirarchy. When the parent is later > > > > > retrieved, its children remain in the skeleton state until explicitly > > > > > retrieved by the parent. This lazy-retrieve approach has obvious memory > > > > > efficiency benefits for certain applications. 
> > > > > > > > > -- > > > > ______ _/ _/_____________________________________________________ > > > > _/ _/ http://www.ebi.ac.uk/mutations/ > > > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > > > ___ _/_/_/_/_/________________________________________________________ > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From hlapp at gnf.org Fri Sep 5 14:30:16 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Sep 5 14:28:57 2003 Subject: [Bioperl-l] Re: SeqFactory diff Message-ID: <833E32F61B9F8746878F2A1865BECE60530E22@EXCHCLUSTER01.lj.gnf.org> Could you post your code too? Otherwise it will be very tedious to try to reproduce the error. Ideally, if you can provide a self-contained script or collection of files/scripts that demonstrate the error, that'd be very helpful. Otherwise the chances of somebody sitting down and trying to debug this are not sky high ... -hilmar > -----Original Message----- > From: Michal Kurowski [mailto:mkur@poczta.gazeta.pl] > Sent: Friday, September 05, 2003 9:10 AM > To: Hilmar Lapp > Cc: Bioperl > Subject: [Bioperl-l] Re: SeqFactory diff > > > Hilmar Lapp [hlapp@gmx.net] wrote: > > > could you post a test case please along with the error that it > > produces. > > MSG: Can't locate MySeq.pm in @INC (@INC contains: CODE(0x84d79b8) > /usr/lib/perl5/i386-linux /usr/lib/perl5 > /usr/lib/perl5/site_perl/i386-linux > /usr/lib/perl5/site_perl /usr/lib/perl5/site_perl .) at (eval > 8) line 3. > : Unrecognized Sequence type for SeqFactory 'MySeq' > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/Bio/Root/Root.pm:342 > STACK: Bio::Seq::SeqFactory::type > /usr/lib/perl5/site_perl/Bio/Seq/SeqFactory.pm:144 > STACK: Bio::Seq::SeqFactory::new > /usr/lib/perl5/site_perl/Bio/Seq/SeqFactory.pm:103 > STACK: /home/michal/bin/uploader2.pl:49 > > > The lines you're commenting out are there for a conscious > reason, and > > just removing them is unlikely to be accepted into the code > base without > > further grounds that that would be the sole way to let you > do what you > > wanted to do. > > You're right. The change breaks a lot of code and I reverted > it. 
Sorry, a long working hours case ;-). > > The problem remains though. require/use will not do if you have > a PrimarySeq derived class and a client code in the same file. > > > -- > Michal Kurowski > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From amackey at pcbi.upenn.edu Fri Sep 5 14:57:00 2003 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Fri Sep 5 14:55:40 2003 Subject: [Bioperl-l] Fwd: Help Message-ID: Begin forwarded message: > From: Preetam Shah > Date: Fri Sep 5, 2003 2:28:17 PM US/Eastern > To: amackey@virginia.edu > Subject: Help > > > > Dear Dr. Mackey, > I was wondering if you could help me in any way. I saw you name > on the > web when I did a search on this matter. Currently NCBI's batch Entrez > allows me to download batch sequences with a file of Gi's and Accession > numbers. However I want to be able to download the only specific > sequences > based on GSS Ids eg. ENTIE01TF. I have done this in the past but NCBI > has > changed this...please see e-mail below. Do you have a script that might > help me extract this informaton. I can run Bioperl or perl but I am > not a > programmer. Any help is appreciated. I have to retrieve thousands of > sequences at a time and manually creating a file of Gi's or Accession > numbers is tedious. I have files of GSS Ids. Thanks for your help. > > E mail from NCBI: > Date: Wed, 3 Sep 2003 18:10:45 -0400 > From: "Gabrielian, Andrei (NIH/NLM/NCBI)" > To: 'Preetam Shah' > Subject: RE: Urgent > Dear NCBI user, > Batch Entrez that allowed to use anything except GB Id or gi, was > broken > (was doing all-fields search, whis is wrong). Now it is fixed, and it > accepts Genbank accession numbers or gi's only. > You can download the corresponding files from the FTP site and run the > script on them to fish out the necessary sequences. I do not see any > alternative. 
> Regards, > A.Gabrielian > NCBI Help desk > > > Regards, > Preetam Shah, Ph.D. > > > > > From lstein at cshl.edu Fri Sep 5 16:50:11 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Sep 5 16:49:25 2003 Subject: [Bioperl-l] Fwd: Help In-Reply-To: References: Message-ID: <200309051650.11182.lstein@cshl.edu> I think this has been true for the past 8 months at least when I rewrote DB::Genbank to work through GI and accession numbers. Lincoln On Friday 05 September 2003 02:57 pm, Aaron J. Mackey wrote: > Begin forwarded message: > > From: Preetam Shah > > Date: Fri Sep 5, 2003 2:28:17 PM US/Eastern > > To: amackey@virginia.edu > > Subject: Help > > > > > > > > Dear Dr. Mackey, > > I was wondering if you could help me in any way. I saw you name > > on the > > web when I did a search on this matter. Currently NCBI's batch Entrez > > allows me to download batch sequences with a file of Gi's and Accession > > numbers. However I want to be able to download the only specific > > sequences > > based on GSS Ids eg. ENTIE01TF. I have done this in the past but NCBI > > has > > changed this...please see e-mail below. Do you have a script that might > > help me extract this informaton. I can run Bioperl or perl but I am > > not a > > programmer. Any help is appreciated. I have to retrieve thousands of > > sequences at a time and manually creating a file of Gi's or Accession > > numbers is tedious. I have files of GSS Ids. Thanks for your help. > > > > E mail from NCBI: > > Date: Wed, 3 Sep 2003 18:10:45 -0400 > > From: "Gabrielian, Andrei (NIH/NLM/NCBI)" > > To: 'Preetam Shah' > > Subject: RE: Urgent > > Dear NCBI user, > > Batch Entrez that allowed to use anything except GB Id or gi, was > > broken > > (was doing all-fields search, whis is wrong). Now it is fixed, and it > > accepts Genbank accession numbers or gi's only. > > You can download the corresponding files from the FTP site and run the > > script on them to fish out the necessary sequences. 
I do not see any > > alternative. > > Regards, > > A.Gabrielian > > NCBI Help desk > > > > > > Regards, > > Preetam Shah, Ph.D. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From birney at ebi.ac.uk Sat Sep 6 04:38:22 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Sat Sep 6 04:37:07 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: <1062785088.4056.51.camel@localhost> Message-ID: On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > Will, > > Your quick! I merged your next version into the committed one. Hopefully > everything is fine. The test logic seems to be OK but it would be great > to get confirmation that all tests pass from someone without Storable. maybe time to give will an a/c on bioperl directly ... ;) Whaddya reckon? From heikki at nildram.co.uk Sat Sep 6 07:17:56 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Sat Sep 6 07:16:35 2003 Subject: [Bioperl-l] Re: cloning and Storable In-Reply-To: References: Message-ID: <1062847075.5141.2.camel@localhost> Definitely. Would you like one, Will? -Heikki On Sat, 2003-09-06 at 09:38, Ewan Birney wrote: > On Fri, 5 Sep 2003, Heikki Lehvaslaiho wrote: > > > Will, > > > > Your quick! I merged your next version into the committed one. Hopefully > > everything is fine. The test logic seems to be OK but it would be great > > to get confirmation that all tests pass from someone without Storable. > > maybe time to give will an a/c on bioperl directly ... ;) Whaddya reckon? 
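
The fallback described earlier in this thread (use Storable when it can be loaded, otherwise Data::Dumper, with a flag to force the ASCII route) can be sketched in core Perl as below. This is an illustration under stated assumptions, not the code of Bio::Root::Storable: the variable $BINARY only plays the role of $Bio::Root::Storable::BINARY mentioned above, and the subs serialise and deserialise are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: stands in for $Bio::Root::Storable::BINARY from the thread.
our $BINARY = 1;

# Load Storable at run time and remember whether that worked, so the
# script still runs on a Perl that lacks the module.
my $HAVE_STORABLE = eval { require Storable; 1 } ? 1 : 0;

# Hypothetical helper: returns the mode used and the serialised string.
sub serialise {
    my ($data) = @_;
    if ($BINARY && $HAVE_STORABLE) {
        return ('binary', Storable::freeze($data));
    }
    return ('ascii', Dumper($data));    # text of the form "$VAR1 = {...};"
}

# Hypothetical helper: the inverse of serialise().
sub deserialise {
    my ($mode, $string) = @_;
    return Storable::thaw($string) if $mode eq 'binary';

    my $VAR1;        # Data::Dumper output assigns to $VAR1
    eval $string;    # only do this with data you wrote yourself
    die "Could not restore data: $@" if $@;
    return $VAR1;
}

my $record = { id => 'seq1', length => 102 };
for my $binary (1, 0) {
    local $BINARY = $binary;
    my ($mode, $frozen) = serialise($record);
    my $copy = deserialise($mode, $frozen);
    print "$mode: id=$copy->{id} length=$copy->{length}\n";
}
```

Setting $BINARY to a false value exercises the Data::Dumper path even when Storable is present, which is one way to approximate the "someone without Storable" test case on a machine that has it.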
-- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From vesko_baev at abv.bg Sat Sep 6 08:29:26 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Sat Sep 6 08:28:11 2003 Subject: [Bioperl-l] can't start 'water' Message-ID: <1355899327.1062851366478.JavaMail.nobody@app1.ni.bg> Hi, In my computer I have bioperl-pack & Bioperl-run-pack, and I HAVE water.pm,but: message in dos-prompt: -------------------- WARNING --------------------- MSG: Application [water] is not available! --------------------------------------------------- Can't call method "run" on an undefined value at water.pl line 22. Here is my script: #!usr/bin/perl use Bio::Factory::EMBOSS; use Bio::AlignIO; my $seqobj1 = Bio::Seq->new(-seq => "gacgggtatctttaggcggacttagg"); my $seqobj2 = Bio::Seq->new(-seq => "gacgtgtatctttaggcggtcttacc"); my @seqs_to_check; # this would be a list of seqs to compare push (@seqs_to_check,seqobj2); # get an EMBOSS application object from the EMBOSS factory $factory = new Bio::Factory::EMBOSS; # $application = $factory->program('embossversion'); # run the application with an optional hash containing parameters # $result = $application->run(); # returns a string or creates a file # print $result . 
"\n"; $water = $factory->program('water'); # here is an example of running the application # water can compare 1 seq against 1->many sequences # in a database using Smith-Waterman my $wateroutfile = 'mirna.water'; $water->run({ '-sequencea' => $seqobj1, '-seqall' => \@seqs_to_check, '-gapopen' => '10.0', '-gapextend' => '0.5', '-outfile' => $wateroutfile}); # now you might want to get the alignment use Bio::AlignIO; my $alnin = new Bio::AlignIO(-format => 'emboss', -file => $wateroutfile); while( my $aln = $alnin->next_aln ) { # process the alignment -- these will be Bio::SimpleAlign objects } AND when I wrote 'use Bio::Tools::Run::PiseApplication::water;' the message was: Can't locate XML/Parser/PerlSAX.pm in @INC?!?!?! WHAT TO DO? Vesko Thenks!! ----------------------------------------------------------------- http://nova.GBG.bg - ??????? ?? ????! ?????, ????? ? ?????????????. From jason at cgt.duhs.duke.edu Sat Sep 6 11:03:41 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Sep 6 11:02:19 2003 Subject: [Bioperl-l] can't start 'water' In-Reply-To: <1355899327.1062851366478.JavaMail.nobody@app1.ni.bg> References: <1355899327.1062851366478.JavaMail.nobody@app1.ni.bg> Message-ID: do you have EMBOSS installed? On Sat, 6 Sep 2003, Vesko Baev wrote: > Hi, > In my computer I have bioperl-pack & Bioperl-run-pack, and I HAVE water.pm,but: > message in dos-prompt: > > -------------------- WARNING --------------------- > MSG: Application [water] is not available! > --------------------------------------------------- > Can't call method "run" on an undefined value at water.pl line 22. 
> > Here is my script: > #!usr/bin/perl > use Bio::Factory::EMBOSS; > use Bio::AlignIO; > my $seqobj1 = Bio::Seq->new(-seq => "gacgggtatctttaggcggacttagg"); > my $seqobj2 = Bio::Seq->new(-seq => "gacgtgtatctttaggcggtcttacc"); > my @seqs_to_check; # this would be a list of seqs to compare > push (@seqs_to_check,seqobj2); > # get an EMBOSS application object from the EMBOSS factory > $factory = new Bio::Factory::EMBOSS; > # $application = $factory->program('embossversion'); > # run the application with an optional hash containing parameters > # $result = $application->run(); # returns a string or creates a file > # print $result . "\n"; > > $water = $factory->program('water'); > > # here is an example of running the application > # water can compare 1 seq against 1->many sequences > # in a database using Smith-Waterman > > my $wateroutfile = 'mirna.water'; > $water->run({ '-sequencea' => $seqobj1, > > '-seqall' => \@seqs_to_check, > '-gapopen' => '10.0', > '-gapextend' => '0.5', > '-outfile' => $wateroutfile}); > # now you might want to get the alignment > use Bio::AlignIO; > my $alnin = new Bio::AlignIO(-format => 'emboss', > -file => $wateroutfile); > > while( my $aln = $alnin->next_aln ) { > # process the alignment -- these will be Bio::SimpleAlign objects > } > > AND when I wrote 'use Bio::Tools::Run::PiseApplication::water;' > the message was: > Can't locate XML/Parser/PerlSAX.pm in @INC?!?!?! > > WHAT TO DO? > Vesko > Thenks!! > > > > ----------------------------------------------------------------- > http://nova.GBG.bg - ??????? ?? ????! ?????, ????? ? ?????????????. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From mkur at poczta.gazeta.pl Fri Sep 5 19:08:02 2003 From: mkur at poczta.gazeta.pl (Michal Kurowski) Date: Sat Sep 6 12:26:58 2003 Subject: [Bioperl-l] Re: SeqFactory diff In-Reply-To: <833E32F61B9F8746878F2A1865BECE60530E22@EXCHCLUSTER01.lj.gnf.org> References: <833E32F61B9F8746878F2A1865BECE60530E22@EXCHCLUSTER01.lj.gnf.org> Message-ID: <20030905230802.GA23819@calvados> Hilmar Lapp [hlapp@gnf.org] wrote: > Could you post your code too? Otherwise it will be very tedious to try > to reproduce the error. Ideally, if you can provide a self-contained > script or collection of files/scripts that demonstrate the error, that'd > be very helpful. > > Otherwise the chances of somebody sitting down and trying to debug this > are not sky high ... Of course, I can. It is in an attachment. It's just a dirty script that won't ever work outside here because database requirement. Thanks for your help, -- Michal Kurowski -------------- next part -------------- A non-text attachment was scrubbed... Name: uploader2.pl Type: application/x-perl Size: 7543 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20030906/68055387/uploader2.bin From vesko_baev at abv.bg Sat Sep 6 12:34:06 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Sat Sep 6 12:32:59 2003 Subject: [Bioperl-l] TCoffee question? Message-ID: <560874888.1062866046363.JavaMail.nobody@storage.ni.bg> Hi, I have some ERROR MESAGES that I don't understand that they mean? The sctipt using TCoffee (I've got the run::al.::Tcoffee module, but I do not know if it wants any TCoffee external program ot something like that?! 
there is a script & ERR.messages: #!/usr/bin/perl use Bio::Seq; use Bio::Tools::Run::Alignment::TCoffee; #Build a Bio::Seq obj1 $seqobj1 = Bio::Seq->new(-seq => "gatgggtataataggtggactta"); #Build a Bio::Seq obj2 $gene_seqobj2 = Bio::Seq->new(-seq => "gacgggtatctttaggcggacttag"); # Build a tcoffee alignment factory @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'output' => 'clustalw', 'outfile'=> 'coffee.out'); $factory = new Bio::Tools::Run::Alignment::TCoffee (@params); # WAY ONE with file # $aln = $factory->align('el.fa'); # WAY TWO with 2 seq obj my @seq_array =(); push (@seq_array, $seqobj1); push (@seq_array, $gene_seqobj2); $seq_array_ref = \@seq_array; $aln = $factory->align($seq_array_ref); my $s1_perid = $aln->average_percentage_identity; print $s1_perid; in the way with fasta file I've got: ------------- EXCEPTION ------------- MSG: TCoffee call crashed: 0 [command -in=el.fa,XBLOSUM,Mlalign_id_pair,Mclusta lw_pair -ktuple=2 -outfile=coffee.out -output=clustalw] STACK Bio::Tools::Run::Alignment::TCoffee::_run D:/Perl/site/lib/Bio/Tools/Run/A lignment/TCoffee.pm:814 STACK Bio::Tools::Run::Alignment::TCoffee::align D:/Perl/site/lib/Bio/Tools/Run/ Alignment/TCoffee.pm:719 STACK toplevel coffee.pl:21 -------------------------------------- in the WAY 2 (with the 2 seqobj) the ERROR MESSAGE i telling me that in my array there is less then 2 seq obj (but I pushed 2 obj!) THANKS!! AGAIN! ----------------------------------------------------------------- http://nova.GBG.bg - ??????? ?? ????! ?????, ????? ? ?????????????. From shawnh at fugu-sg.org Sat Sep 6 13:39:18 2003 From: shawnh at fugu-sg.org (Shawn Hoon) Date: Sat Sep 6 13:34:53 2003 Subject: [Bioperl-l] TCoffee question? In-Reply-To: <560874888.1062866046363.JavaMail.nobody@storage.ni.bg> References: <560874888.1062866046363.JavaMail.nobody@storage.ni.bg> Message-ID: <0A9046C6-E091-11D7-A4CE-000A95783436@fugu-sg.org> Yes u will need TCoffee installed. 
bioperl-run is simply a suite of perl wrapper modules around the binaries. The urls to download the programs may be found in INSTALL.PROGRAMS in the bioperl-run package. shawn On Sunday, September 7, 2003, at 12:34 AM, Vesko Baev wrote: > Hi, > I have some ERROR MESAGES that I don't understand that they mean? > The sctipt using TCoffee (I've got the run::al.::Tcoffee module, but I > do not know if it wants any TCoffee external program ot something like > that?! > there is a script & ERR.messages: > > > > #!/usr/bin/perl > use Bio::Seq; > use Bio::Tools::Run::Alignment::TCoffee; > > > #Build a Bio::Seq obj1 > $seqobj1 = Bio::Seq->new(-seq => "gatgggtataataggtggactta"); > #Build a Bio::Seq obj2 > $gene_seqobj2 = Bio::Seq->new(-seq => "gacgggtatctttaggcggacttag"); > > > # Build a tcoffee alignment factory > @params = ('ktuple' => 2, > 'matrix' => 'BLOSUM', > 'output' => 'clustalw', > 'outfile'=> 'coffee.out'); > $factory = new Bio::Tools::Run::Alignment::TCoffee (@params); > > # WAY ONE with file > # $aln = $factory->align('el.fa'); > > # WAY TWO with 2 seq obj > my @seq_array =(); > > push (@seq_array, $seqobj1); > push (@seq_array, $gene_seqobj2); > > $seq_array_ref = \@seq_array; > $aln = $factory->align($seq_array_ref); > my $s1_perid = $aln->average_percentage_identity; > print $s1_perid; > > in the way with fasta file I've got: > ------------- EXCEPTION ------------- > MSG: TCoffee call crashed: 0 [command > -in=el.fa,XBLOSUM,Mlalign_id_pair,Mclusta > lw_pair -ktuple=2 -outfile=coffee.out -output=clustalw] > > STACK Bio::Tools::Run::Alignment::TCoffee::_run > D:/Perl/site/lib/Bio/Tools/Run/A > lignment/TCoffee.pm:814 > STACK Bio::Tools::Run::Alignment::TCoffee::align > D:/Perl/site/lib/Bio/Tools/Run/ > Alignment/TCoffee.pm:719 > STACK toplevel coffee.pl:21 > > -------------------------------------- > in the WAY 2 (with the 2 seqobj) the ERROR MESSAGE i telling me that > in my array there is less then 2 seq obj (but I pushed 2 obj!) > > THANKS!! AGAIN! 
> > > ----------------------------------------------------------------- > http://nova.GBG.bg - ??????? ?? ????! ?????, ????? ? ?????????????. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -shawn From vesi_radeva at mail.bg Sun Sep 7 03:57:33 2003 From: vesi_radeva at mail.bg (vesi_radeva@mail.bg) Date: Sun Sep 7 03:56:15 2003 Subject: [Bioperl-l] TCoffee message Message-ID: <1062921453.e3518f27ae8fa@mail.bg> Hello Dear Colleagues, I'm a new user of BioPerl! I've wrote a simple script for TCoffee program, but when I start the script there is a error mesage like this: ------ Can't call method "isa" on unblessed reference at D:/Perl/site/lib/Bio/Root/IO.p m line 487, chunk 1. ------ My script is simple and I can't find where is my error!?: use Bio::Seq; use Bio::SeqIO; use Bio::Tools::Run::Alignment::TCoffee; $ENV{TCOFFEEDIR} = '/perl/tcoffee'; #The gene obj $geneDNA = 'aaagtgaccgtagcgagctgcatacttccaaaagaagtattgtagaacggggtggtagt'; $geneRNA = $geneDNA; $geneRNA =~ tr/Tt/Uu/; $gene = Bio::Seq->new( -seq => $geneRNA, -id => "transctibed_gene"); $DB = Bio::SeqIO->new(-file => 'theseqs.fa', -format=>'fasta'); # TCoffee @params = ('ktuple' => 2, 'matrix' => 'BLOSUM', 'output' => 'clustalw', 'outfile'=> 'mi.out'); $factory = new Bio::Tools::Run::Alignment::TCoffee (@params); my $mi=$DB->next_seq(); push (@seq_array, $mi,$gene); push (@seq_array, $gene); $seq_array_ref = \@seq_array; $aln = $factory->align($seq_array_ref); Thanks in advance for your help! Sincerely Yours, Victoria R. ---------------------- ????? ????-???? ?????? 12MB ????? ?? ???? SMS ?? ??? ??e?? ? ??? ????? ?????????! POP3/WAP ?????? 
_________________________________________ HOB ?E???ATEH A?PEC - http://mail.bg/new/ From vesko_baev at abv.bg Sun Sep 7 12:00:23 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Sun Sep 7 11:59:09 2003 Subject: [Bioperl-l] BioPerl problem - can anyone help me? Message-ID: <1788294638.1062950423519.JavaMail.nobody@storage.ni.bg> >Hello Dear Colleagues, >I'm a new user of BioPerl!(student in fact) >My idea is to get the seq obj from multifasta file and align it with ither seq obj. >I've wrote a simple script for TCoffee program, but when I start the script I've got error message. If I use '$aln = $factory->align($FILE);' every thing is ok (with fasta file). >but if I use '$aln = $factory->align($seq_array_ref); >there is a error mesage like this: >------ >Can't call method "isa" on unblessed reference at D:/Perl/site/lib/Bio/Root/IO.p >m line 487, chunk 1. >------ >My script is simple and I can't find where is my error!?: > > >use Bio::Seq; >use Bio::SeqIO; >use Bio::Tools::Run::Alignment::TCoffee; >$ENV{TCOFFEEDIR} = '/perl/tcoffee'; > >#The gene obj > $geneDNA = 'aaagtgaccgtagcgagctgcatacttccaaaagaagtattgtagaacggggtggtagt'; > $geneRNA = $geneDNA; > $geneRNA =~ tr/Tt/Uu/; > $gene = Bio::Seq->new( -seq => $geneRNA, > -id => "transctibed_gene"); > $DB = Bio::SeqIO->new(-file => 'theseqs.fa', > -format=>'fasta'); > # TCoffee > @params = ('ktuple' => 2, > 'matrix' => 'BLOSUM', > 'output' => 'clustalw', > 'outfile'=> 'mi.out'); > $factory = new Bio::Tools::Run::Alignment::TCoffee (@params); > > my $mi=$DB->next_seq(); > push (@seq_array, $mi,$gene); > push (@seq_array, $gene); > $seq_array_ref = \@seq_array; > $aln = $factory->align($seq_array_ref); >----------------------- >And when I use : >while ( my $seq = $DB->next_seq() ) { > push (@seq_array, $seq) ;} >my $seq_array_ref = \@seq_array; >$aln = $factory->align($seq_array_ref); >Again I have the same error message > >I tryed everything, but it works only when I give it the fasta file to align!HELP! 
>
>Thanks in advance for your help!
>Sincerely Yours, Vesselin Baev
>Bulgaria

From lstein at cshl.edu Sun Sep 7 16:05:17 2003
From: lstein at cshl.edu (lstein@cshl.edu)
Date: Sun Sep 7 16:04:00 2003
Subject: [Bioperl-l] Bio::DB::GFF feature request granted
Message-ID: <200309072005.h87K5HOp005396@pronto.lsjs.org>

Hi Sheldon,

Your request for a delete() method for the segment object has now been
granted. CVS update bioperl-live to get your present!

The syntax is:

    $segment = $db->segment(Clone => 'M1022.1');
    $segment->delete(-range_type => 'contains',
                     -type       => 'UTR');

-range_type can be one of 'contains', 'contained_in', or 'overlaps',
the same way features() works. It defaults to 'overlaps'.

-type is an optional list of feature types to delete. It defaults to
all features.

Hopefully this will handle Artemis' needs.

Lincoln

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From letondal at pasteur.fr Sun Sep 7 16:56:52 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Sun Sep 7 16:55:32 2003
Subject: [Bioperl-l] can't start 'water'
In-Reply-To: <1355899327.1062851366478.JavaMail.nobody@app1.ni.bg>; from vesko_baev@abv.bg on Sat, Sep 06, 2003 at 03:29:26PM +0300
References: <1355899327.1062851366478.JavaMail.nobody@app1.ni.bg>
Message-ID: <20030907225652.A71012@electre.pasteur.fr>

On Sat, Sep 06, 2003 at 03:29:26PM +0300, Vesko Baev wrote:
> [...]
> AND when I wrote 'use Bio::Tools::Run::PiseApplication::water;'
> the message was:
> Can't locate XML/Parser/PerlSAX.pm in @INC?!?!?!

You need to have XML::Parser::PerlSAX, LWP::UserAgent and
HTTP::Request::Common in order to use Pise bioperl modules.

BTW, you should not use a PiseApplication directly, but rather through a
factory.
So, instead of: use Bio::Tools::Run::PiseApplication::water; you rather put: use Bio::Tools::Run::AnalysisFactory::Pise; my $factory = new Bio::Tools::Run::AnalysisFactory::Pise(); my $water = $factory->program('water'); See Bio::Tools::Run::AnalysisFactory::Pise synopsis for further details. -- Catherine Letondal -- Pasteur Institute Computing Center From Mason.Christopher at mayo.edu Sun Sep 7 17:04:13 2003 From: Mason.Christopher at mayo.edu (Christopher Mason) Date: Sun Sep 7 17:03:17 2003 Subject: [Bioperl-l] Problems with biosql Message-ID: <92446450.1062950653@[172.23.198.245]> Howdy- I'm using the latest CVS versions (as of 5 Sep 03) of bioperl-live, bioperl-db, and biosql-schema with the latest PostgreSQL (7.3.4) and Perl (5.8.0). I'm trying to load all of swiss-prot into a biosql database, and it's not going well. Although my ultimate goal is to manipulate this database from java, I'm using perl because the various docs I've read seem to indicate this is the way to go for loading (if it's not, please tell me). There are some errors output when loading the schema (see below). But in general, creating the database seems to work. However, when trying to run: > bioperl-db/scripts/biosql/load_seqdatabase.pl --dbname biosql > --driver Pg --format swiss --dbuser cmason > --namespace bioperl sprot.dat I immediately get this error: > Could not store P15711: > ------------- EXCEPTION ------------- > MSG: You're trying to lie about the length: is 102 but you say 924 (P15711 is the very first entry in the file.) (Full traceback below.) which seems to be generated here: Bio/PrimarySeq.pm:419 > "You're trying to lie about the length: ". 
> "is $len but you say ".$val); called from here: Bio/DB/BioSQL/BiosequenceAdaptor.pm:252 > $obj->alphabet($rows->[3]) if $rows->[3]; > $obj->seq($rows->[4]) if $rows->[4]; > $obj->length($rows->[2]) if $rows->[2]; # <---- 252 > if($obj->isa("Bio::DB::PersistentObjectI") && $rows is > [1, undef, 924, protein, undef, 1] Commenting out the indicated line seems to prevent this error message. However, then I get, about two days later, this message: > Out of memory! The state of the database is odd: > biosql=# select count(bioentry_id) from bioentry; > count > ------- > 1 > (1 row) but: > biosql=# select count (seqfeature_id) from location; > count > ------- > 1329 > (1 row) and: ># du -sk /home/postgres/ > 739724 /home/postgres (There are no other database besides biosql.) (I tried VACUUMing the database which caused it to grow by about 100MB, but nothing else shows up.) It's hard to tell how far it's gotten when it runs out of memory. I sort of expected the size of the finished database to be somewhat larger than the size of the flat file. But even if it's almost finished, it's incredibly slow (at least 1,300 minutes of user time, not counting postgres). Would mysql be much faster? Or should I simply be prepared to wait a long time? Has anyone tried this recently (importing all of swiss prot into a biosql database) with any database (postgres, mysql, oracle, etc.)? If so, can you give me (even approximate) performance numbers (for loading, selecting a sequence, etc.) and ultimate database size on disk? I'm trying to determine if this is a viable way of architecting my application (which incidentally, will probably be written in java, not perl). Also, why is this code spread out over three different CVS modules? 
Thanks, -c When loading the schema: >> psql biosql < biosqldb-views-pg.sql > ERROR: Relation "seqfeature_key" does not exist > ERROR: view "gff" does not exist > ERROR: Relation "ontology_term" does not exist > ERROR: Relation "ontology_term" does not exist > ERROR: Relation "fasta" does not exist > ERROR: Relation "ontology_term" does not exist > ERROR: parser: parse error at end of input > ERROR: RemoveFunction: function compl(text) does not exist > CREATE FUNCTION > ERROR: RemoveFunction: function reverse(text) does not exist > ERROR: stat failed on file > '/home/cjm/cvs/biosql-schema/ext/biosqldb-funcs.so': No such file or > directory ERROR: Function reverse("unknown") does not exist > Unable to identify a function that satisfies the given argument > types You may need to add explicit typecasts > ERROR: RemoveFunction: function get_subseq(text, integer, integer, > integer) does not exist CREATE FUNCTION > get_subseq > ------------ > bc > (1 row) > > ERROR: view "gffseq" does not exist > ERROR: Relation "seqfeature_key" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist > ERROR: Relation "seqfeature_key_v" does not exist and: >> psql biosql < biosql-accelerators-pg.sql > ERROR: RemoveFunction: function biosql_accelerators_level() does not > exist CREATE FUNCTION > ERROR: RemoveFunction: function intern_ontology_term(text) does not exist > CREATE FUNCTION > ERROR: RemoveFunction: function intern_seqfeature_source(text) does not > exist CREATE FUNCTION > ERROR: RemoveFunction: function create_seqfeature(integer, text, text) > does not exist CREATE FUNCTION > ERROR: RemoveFunction: function 
create_seqfeature_onespan(integer, text, > text, integer, integer, integer) does not exist CREATE FUNCTION Then when trying to load: > ------------- EXCEPTION ------------- > MSG: You're trying to lie about the length: is 102 but you say 924 > STACK Bio::PrimarySeq::length > /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:419 STACK > Bio::DB::Persistent::PersistentObject::AUTOLOAD > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:541 > STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:612 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:541 > STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BiosequenceAdaptor.pm:254 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:12 > 78 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:966 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:851 > STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284 > STACK Bio::DB::BioSQL::SeqAdaptor::attach_children > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/SeqAdaptor.pm:279 STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:13 > 09 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:966 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:851 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > 
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:204
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) ./load_seqdatabase.pl:446
> STACK toplevel ./load_seqdatabase.pl:429
>
> --------------------------------------

--
[ Christopher Mason MPRC Bioinformatics cjm37@mayo.edu ]

From chenn at cshl.edu Sun Sep 7 23:56:06 2003
From: chenn at cshl.edu (Jack Chen)
Date: Sun Sep 7 23:54:43 2003
Subject: [Bioperl-l] clustalw alignment display format
Message-ID:

Hi,

I have checked the BioPerl API but could not find a module that can
display the output of clustalw alignment. This site:

http://embnet.cifn.unam.mx/perl-doc/Bio/SimpleAlign.html

indicates that the subroutines has been implemented before but not in the
BioPerl package yet. Any module in BioPerl can do the display?

Thanks

Jack

++++++++++++++++++++++++++++++++++++++++++++
o-o Jack Chen, Stein Laboratory
o---o Cold Spring Harbor Laboratory
o----o #5 Williams, 1 Bungtown Road
O----O Cold Spring Harbor, NY, 11724
0--o Tel: 1 516 367 8394
O e-mail: chenn@cshl.org
o-o Website: http://www.wormbase.org
+++++++++++++++++++++++++++++++++++++++++++++

From shawnh at fugu-sg.org Mon Sep 8 01:40:11 2003
From: shawnh at fugu-sg.org (Shawn Hoon)
Date: Mon Sep 8 01:39:52 2003
Subject: [Bioperl-l] clustalw alignment display format
In-Reply-To:
Message-ID:

You want to use Bio::AlignIO.

use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;

my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.

my $aio = Bio::AlignIO->new(-fh=>\*STDOUT,-format=>'clustalw');
$aio->write_aln($aln);

replace clustalw with any other multiple alignment format that can be
found in Bio/AlignIO/

hth,

shawn

On Sun, 7 Sep 2003, Jack Chen wrote:

> Hi,
>
> I have checked the BioPerl API but could not find a module that can
> display the output of clustalw alignment. This site:
>
> http://embnet.cifn.unam.mx/perl-doc/Bio/SimpleAlign.html
>
> indicates that the subroutines has been implemented before but not in the
> BioPerl package yet. Any module in BioPerl can do the display?
> > Thanks > > Jack > > ++++++++++++++++++++++++++++++++++++++++++++ > o-o Jack Chen, Stein Laboratory > o---o Cold Spring Harbor Laboratory > o----o #5 Williams, 1 Bungtown Road > O----O Cold Spring Harbor, NY, 11724 > 0--o Tel: 1 516 367 8394 > O e-mail: chenn@cshl.org > o-o Website: http://www.wormbase.org > +++++++++++++++++++++++++++++++++++++++++++++ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ******************************** * Shawn Hoon * http://www.fugu-sg.org/~shawnh ******************************** From Richard.Adams at ed.ac.uk Mon Sep 8 03:54:23 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Mon Sep 8 03:53:02 2003 Subject: [Bioperl-l] codon useage modules Message-ID: <3F5C35AF.DAF8CC7@ed.ac.uk> Heikki, Have made suggested changes to Bio::CodonUsage::Table, Bio::CodonUsage::IO and Bio::DB::CUTG and CVSd them. Richard -- Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From heikki at ebi.ac.uk Mon Sep 8 11:54:21 2003 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon Sep 8 11:52:57 2003 Subject: [Bioperl-l] codon useage modules In-Reply-To: <3F5C35AF.DAF8CC7@ed.ac.uk> References: <3F5C35AF.DAF8CC7@ed.ac.uk> Message-ID: <1063036460.17679.5.camel@localhost> Looks good. Cheers, -Heikki On Mon, 2003-09-08 at 08:54, Richard Adams wrote: > Heikki, > Have made suggested changes to Bio::CodonUsage::Table, > Bio::CodonUsage::IO and Bio::DB::CUTG and CVSd them. 
> > Richard > > -- > Dr Richard Adams > Bioinformatician, > Psychiatric Genetics Group, > Medical Genetics, > Molecular Medicine Centre, > Western General Hospital, > Crewe Rd West, > Edinburgh UK > EH4 2XU > > Tel: 44 131 651 1084 > richard.adams@ed.ac.uk > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From pm66 at nyu.edu Mon Sep 8 14:16:56 2003 From: pm66 at nyu.edu (Philip MacMenamin) Date: Mon Sep 8 14:20:09 2003 Subject: [Bioperl-l] Connecting Bio::DB::GFF::Features on a Bio::Graphics::Panel Message-ID: <200309081821.h88ILS4c020668@mx3.nyu.edu> Hi, I am creating Bio::DB::GFF::Features, on the fly, (as in at run time, nothing to do with Drosophila) using the data contained in a GFF database. I generate a Bio::DB::GFF::Feature, with the aid of data I have retrieved from the database, and assign them the same group and type name. (This is in contrast to getting features from a segment taken from the DB, ie my @pcr_products = $segment->features(-Types => "structural:GenePair_STS"); ) Code example: my @KKs; my $newFeature = Bio::DB::GFF::Feature->new($db,'RNAiPredicted',$KKstart,$KKstop); $newFeature->group('someGroup'); $newFeature->type('newType'); push @KKs, $newFeature; I then draw these features, which appear on the panel similar to various other features I have pulled directly from the database. They appear on the same plane, as desired. But I would like them to be joined using a dashed connector. 
To do this, would you use an aggregator, and if so, how?

I have tried things like:

my $KKref = \@KKs;
my $KKagg = Bio::DB::GFF::Aggregator->new(-method => 'RNAiPredicted',
                                          );

along with:

if ( scalar @KKs > 0 ) {
    $panel->add_track(group =>$KKFeatures,
                      -bgcolor => 'blue',
                      -height => 4,
                      -fgcolor => 'darkgreen',
                      -key => 'RNAi:predicted',
                      -connect => 1,
                      -connector => 'dashed',
                      -bump => +1,
                      -connect_color => 'lightpurple',
                      # -label => 1,
                      # -description => 1
                      );
}

Maybe I don't need to use Bio::DB::GFF::Feature; maybe I could use a
Bio::Graphics::Feature? I tried this, but again, I could not get them to
be connected.

Thank you for any ideas,
Philip.

###Previously I asked:###
"I would like there to be no vertical (y) space between certain tracks,
i.e. the curatedGenes track and the UTR track, similar to the way that
wormbase has it. They should not overlie each other, because they occupy
different x space. What area of Bio::Graphics should I look at to do this?
"

This worked for me:

my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'transcript',
                                               -sub_parts => ['exon','UTR','CDS' ]
                                               );

and load them out of the segment into a single array, i.e.

my @features = $segment1->features('transcript');

Then these features can be painted on the panel with the exons on the same
y co-ordinate as the UTRs, similar to wormbase.

From pm66 at nyu.edu Mon Sep 8 14:47:26 2003
From: pm66 at nyu.edu (Philip MacMenamin)
Date: Mon Sep 8 14:50:34 2003
Subject: [Bioperl-l] Connecting Bio::DB::GFF::Features on a Bio::Graphics::Panel
Message-ID: <200309081851.h88IpvIM026914@mx2.nyu.edu>

Apologies for the previous question. I realised the solution to the problem
just after I sent the question. Very simple, very silly of me. Always the
way: stare at it for ages, then show someone else the code, and the
answer instantly becomes painfully obvious.

Use a Bio::Graphics::Feature object, and just tell it the segments in the
constructor at the start. Works fine.
No aggregator, no Bio::DB::GFF::Features, all very simple and above board. $joinedFeatureRef = Bio::Graphics::Feature->new(-segments =>[ [$KKstart,$KKstop],[($KKstopStop-1),$KKstopStop]]); and if ( scalar @KKs > 0 ) { $panel->add_track(group =>$joinedFeatureRef, -bgcolor => 'blue', -height => 4, -fgcolor => 'darkgreen', -key => 'SuchAndSuch', -connect => 1, -connector => 'dashed', -bump => +1, -connect_color => 'lightpurple', ); } All the best. Philip. From pstogios at uhnres.utoronto.ca Mon Sep 8 15:22:16 2003 From: pstogios at uhnres.utoronto.ca (Peter Stogios) Date: Mon Sep 8 15:20:55 2003 Subject: [Bioperl-l] Bio::SimpleAlign problems Message-ID: Hello, I would like some help with Bio::SimpleAlign and Bio::AlignIO. I am trying to do some VERY simple tasks but the AlignIO module is being difficult in reading alignment files. It does not seem to read many formats correctly. I am using the sample code included at the SimpleAlign documentation page. The code of interest is: $str = Bio::AlignIO->new('-file' => 'testaln.aln'); $aln = $str->next_aln(); print $aln->no_residues, "\n"; print $aln->no_sequences, "\n"; I have tried loading ClustalX format version 1.81, ClustalW version 1.5, MSF format, and PHYLIP format, without having success. I am sure the alignment files are in their correct formats, since other programs can read them. Can someone please inform me what is the preferred format for reading by AlignIO and SimpleAlign? Also, should I specify the format of the alignment in the Bio::AlignIO-->new line? Thank you very much in advance, Peter Stogios -- ____________________________________________________________ Peter Stogios | Ontario Cancer Institute Graduate Student | Princess Margaret Hospital G. Prive Lab | 610 University Ave. Rm.7-207 Dept. of Medical Biophysics | M5G 2M9 University of Toronto | (416) 946-2000 ex. 
5615
pstogios@uhnres.utoronto.ca http://xtal.uhnres.utoronto.ca/prive
____________________________________________________________

From jason at cgt.duhs.duke.edu Mon Sep 8 15:31:52 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Sep 8 15:30:31 2003
Subject: [Bioperl-l] Bio::SimpleAlign problems
In-Reply-To:
References:
Message-ID:

You need to specify the format with the -format flag, otherwise it tries
to guess based on the extension - .aln maps to clustalw I think.

$str = Bio::AlignIO->new(-file => 'testaln.aln', -format=>'clustalw');
..
or
$str = Bio::AlignIO->new(-file => 'testaln.phy', -format=>'phylip');
or
$str = Bio::AlignIO->new(-file => 'testaln.msf', -format=>'msf');

If someone wanted to contribute something new to these modules, a little
method which would guess the format by peeking at the first few lines
would be a welcome addition. Root::IO->_pushback makes this really easy
to do without worrying about reopening the file or dealing with file
streams.

-jason

On Mon, 8 Sep 2003, Peter Stogios wrote:

> Hello,
>
> I would like some help with Bio::SimpleAlign and Bio::AlignIO.
>
> I am trying to do some VERY simple tasks but the AlignIO module is being
> difficult in reading alignment files. It does not seem to read many
> formats correctly.
>
> I am using the sample code included at the SimpleAlign documentation page.
> The code of interest is:
>
> $str = Bio::AlignIO->new('-file' => 'testaln.aln');
> $aln = $str->next_aln();
> print $aln->no_residues, "\n";
> print $aln->no_sequences, "\n";
>
> I have tried loading ClustalX format version 1.81, ClustalW version 1.5,
> MSF format, and PHYLIP format, without having success. I am sure the
> alignment files are in their correct formats, since other programs can
> read them.
>
> Can someone please inform me what is the preferred format for reading by
> AlignIO and SimpleAlign? Also, should I specify the format of the
> alignment in the Bio::AlignIO-->new line?
> > Thank you very much in advance, > > Peter Stogios > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From wxu at msi.umn.edu Mon Sep 8 19:33:29 2003 From: wxu at msi.umn.edu (wxu) Date: Mon Sep 8 19:32:10 2003 Subject: [Bioperl-l] Bio::DB::GenBank problem? Message-ID: Hello, I wrote a simple bioperl program to retrieve genBank sequences by genBank accession numbers using $gb->get_Stream_by_id function. But it seems to me it can only work on GI number, not work on accession numbers. I use $gb->get_Stream_by_acc, the result is the same. In my following code, it only give me 2981014 sequences. I search NCBI web manually, all "J00522","AF303112","2981014","BG065024","BG065143" work. And I tried $gb->get_Seq_by_acc("J00522"); It complains "No such acc error". What is the problem? Your help would be greatly appreciated. Wayne -- ---------------------------------------------------------- #!/usr/local/bin/perl use lib "/usr/local/bioperl/bioperl-1.2"; use lib qw(/home/wxu/bioperl); use Bio::Perl; use Bio::DB::GenBank; use Bio::SeqIO; my $gb = new Bio::DB::GenBank(); my $seqio = $gb->get_Stream_by_id(["J00522","AF303112","2981014","BG065024","BG065143"]) ; while( my $seq = $seqio->next_seq ) { #print "seq length is ", $seq->length,"\n"; write_sequence('>>outputfile', 'fasta', $seq) } ---------------------------------------------------------------------------- -- Wayne Xu Computational Genomics Specialist www.msi.umn.edu/user_support/compgen Supercomputing Institute 550 Walter Library 117 Pleasant Street SE University of Minnesota Minneapolis, Minnesota 55455 email: wxu@msi.umn.edu help email: help@msi.umn.edu phone: 612-624-1447 help phone: 612-626-0802 fax: 612-624-8861 From james.wasmuth at ed.ac.uk Tue Sep 9 05:04:05 2003 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Tue Sep 9 05:03:24 2003 Subject: [Bioperl-l] Bio::DB::GenBank problem? 
References: Message-ID: <3F5D9785.8050909@ed.ac.uk> Hi Wayne, I identified this a few months back, and from what I can remember Heikki fixed it and is now in the CVS. james wxu wrote: >Hello, >I wrote a simple bioperl program to retrieve genBank sequences by genBank >accession numbers using $gb->get_Stream_by_id function. >But it seems to me it can only work on GI number, not work on accession >numbers. I use $gb->get_Stream_by_acc, the result is the same. In my >following code, it only give me 2981014 sequences. I search NCBI web >manually, all "J00522","AF303112","2981014","BG065024","BG065143" work. >And I tried $gb->get_Seq_by_acc("J00522"); It complains "No such acc error". > >What is the problem? >Your help would be greatly appreciated. > >Wayne >-- > >---------------------------------------------------------- >#!/usr/local/bin/perl >use lib "/usr/local/bioperl/bioperl-1.2"; >use lib qw(/home/wxu/bioperl); > >use Bio::Perl; >use Bio::DB::GenBank; >use Bio::SeqIO; >my $gb = new Bio::DB::GenBank(); > >my $seqio = >$gb->get_Stream_by_id(["J00522","AF303112","2981014","BG065024","BG065143"]) >; > >while( my $seq = $seqio->next_seq ) { > #print "seq length is ", $seq->length,"\n"; > write_sequence('>>outputfile', 'fasta', $seq) > } >---------------------------------------------------------------------------- >-- >Wayne Xu >Computational Genomics Specialist www.msi.umn.edu/user_support/compgen > >Supercomputing Institute >550 Walter Library >117 Pleasant Street SE >University of Minnesota >Minneapolis, Minnesota 55455 >email: wxu@msi.umn.edu help email: help@msi.umn.edu >phone: 612-624-1447 help phone: 612-626-0802 >fax: 612-624-8861 > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Nematode Bioinformatics Blaxter Nematode Genomics Group Institute of Cell, Animal and Population Biology Ashworth Labs University of Edinburgh King's Buildings 
Edinburgh EH9 3JT UK (+44)(0)131 650 7403 From brian_osborne at cognia.com Tue Sep 9 08:02:41 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Sep 9 08:05:22 2003 Subject: [Bioperl-l] Bio::SimpleAlign problems In-Reply-To: Message-ID: Peter, AlignIO will attempt to guess the format based on the file suffix, here are the rules, from Bio/AlignIO.pm: return 'fasta' if /\.(fasta|fast|seq|fa|fsa|nt|aa)$/i; return 'msf' if /\.(msf|pileup|gcg)$/i; return 'pfam' if /\.(pfam|pfm)$/i; return 'selex' if /\.(selex|slx|selx|slex|sx)$/i; return 'phylip' if /\.(phylip|phlp|phyl|phy|phy|ph)$/i; return 'nexus' if /\.(nexus|nex)$/i; return 'mega' if( /\.(meg|mega)$/i ); return 'clustalw' if( /\.aln$/i ); return 'meme' if( /\.meme$/i ); return 'emboss' if( /\.(water|needle)$/i ); return 'psi' if( /\.psi$/i ); If you suspect that AlignIO isn't parsing your alignment files correctly then you may want to compare them to files that it certainly can parse. These files are in the t/ directory and they're used by AlignIO.t, so they're parseable: data/testaln.fasta data/testaln.pfam data/testaln.mase data/testaln.phylip data/testaln.prodom data/testaln.msf data/testaln.selex data/testaln.nexus That's odd that there's no clustalw file there, I will add a test for it in AlignIO.t. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Peter Stogios Sent: Monday, September 08, 2003 3:22 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Bio::SimpleAlign problems Hello, I would like some help with Bio::SimpleAlign and Bio::AlignIO. I am trying to do some VERY simple tasks but the AlignIO module is being difficult in reading alignment files. It does not seem to read many formats correctly. I am using the sample code included at the SimpleAlign documentation page. 
The code of interest is: $str = Bio::AlignIO->new('-file' => 'testaln.aln'); $aln = $str->next_aln(); print $aln->no_residues, "\n"; print $aln->no_sequences, "\n"; I have tried loading ClustalX format version 1.81, ClustalW version 1.5, MSF format, and PHYLIP format, without having success. I am sure the alignment files are in their correct formats, since other programs can read them. Can someone please inform me what is the preferred format for reading by AlignIO and SimpleAlign? Also, should I specify the format of the alignment in the Bio::AlignIO-->new line? Thank you very much in advance, Peter Stogios -- ____________________________________________________________ Peter Stogios | Ontario Cancer Institute Graduate Student | Princess Margaret Hospital G. Prive Lab | 610 University Ave. Rm.7-207 Dept. of Medical Biophysics | M5G 2M9 University of Toronto | (416) 946-2000 ex. 5615 pstogios@uhnres.utoronto.ca http://xtal.uhnres.utoronto.ca/prive ____________________________________________________________ _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From yaofx at xymu.net Mon Sep 8 20:56:53 2003 From: yaofx at xymu.net (yaofx) Date: Tue Sep 9 08:30:45 2003 Subject: [Fwd: Re: [Bioperl-l] About Bio::DB::GenBank!] Message-ID: <3F5D2555.8030007@xymu.net> -------- Original Message -------- Subject: Re: [Bioperl-l] About Bio::DB::GenBank! Date: Fri, 01 Aug 2003 13:18:22 -0400 From: Jonathan Manning Organization: Whitehead Institute Center for Genome Research To: yaofx CC: bioperl References: <3F2A6D86.9040006@xymu.net> I just encountered the same error. It looks like both Bio::DB::GenBank and Bio::DB::GenPept search the protein database. So, Bio::DB::GenBank is not returning anything when you query for an accession. 
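
The mismatch described above comes down to a single query parameter in the efetch request. As a rough sketch only, in plain Perl with no BioPerl calls: efetch_url is a hypothetical helper, not part of BioPerl, and the base address and parameter names are copied from the 2003 URL quoted in this message, so they may no longer resolve.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): assemble an efetch URL
# from the parameters that appear in the verbose output.
sub efetch_url {
    my (%param) = @_;
    my $base = 'http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi';
    # Sort the keys so the query string is stable from run to run.
    my $query = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$base?$query";
}

my %common = (
    rettype => 'fasta',
    id      => 'AC068609',
    tool    => 'bioperl',
    retmode => 'text',
);

# The request reported for the affected releases: a nucleotide
# accession looked up in the protein database.
print efetch_url(%common, db => 'protein'), "\n";

# The same request pointed at the nucleotide database instead.
print efetch_url(%common, db => 'nucleotide'), "\n";
```

Comparing the two printed URLs shows that only the db value differs, which is the one-word change suggested for Bio/DB/GenBank.pm.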
To verify this, change: my $gb = new Bio::DB::GenBank; to: my $gb = new Bio::DB::GenBank(-format => 'fasta', -verbose => 1); For me, this prints: url is http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?rettype=fasta&db=protein&id=AC068609&tool=bioperl&retmode=text&usehistory=n Sure enough, visiting this gives me a blank page. However, if I substitute 'db=nucleotide' for 'db=protein' in that url, it works. This bug seems to exist in 1.2 and 1.2.1, but I think is fixed in 1.2.2. (at least in the CVS head I checked...) Either upgrade to 1.2.2 or edit Bio/DB/GenBank.pm and change 'protein' to 'nucleotide' in the BEGIN block. ~Jonathan yaofx wrote: > Hello, > > I have installed Perl 5.6.1 for WIN32, and Bioperl version 1.2.1. > The following is the script ,which can retrieve data from GenBank by > sequences' gi, > but can not get the results by accession number. > I replace "get_Stream_by_id" with ""get_Stream_by_acc", also failed. > > The error message is : > > ------------- EXCEPTION ------------- > MSG: WebDBSeqI Error - check query sequences! > > STACK Bio::DB::WebDBSeqI::get_seq_stream > c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:46 > 4 > STACK Bio::DB::WebDBSeqI::get_Stream_by_id > c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: > 259 > STACK toplevel web_gi2seq_vi.pl:50 > > -------------------------------------- > > BTW, I don't change any code about Bioperl package. > What's matter with it? and how will i do next? 
>
> Thanks in advance for any kind of help
>
> Fengxia
>
>
> #!/usr/bin/perl
>
> $idlist = $ARGV[0];
>
> if (@ARGV != 1){
>     print "USAGE: perl web_id2seq.pl \n";
>     exit(1);
> }
>
> $faoutfile = $idlist."_fa.txt";
> (unlink $faoutfile) if (-e $faoutfile);
>
> open (INPUT,$idlist);
>
> while ($line = <INPUT>){
>     chomp ($line);
>     $line =~ s/\r//;
>     push (@querylist,$line);
> }
> $list = join ",",@querylist;
> close INPUT;
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> my $gb = new Bio::DB::GenBank;
>
> my $seqout = new Bio::SeqIO(-file => ">$faoutfile", -format => 'fasta');
> my $seqio = $gb->get_Stream_by_id([$list]);
>
> while($seq = $seqio->next_seq ) {
>     $seqout->write_seq($seq);
> }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jonathan Manning
Whitehead Institute Center for Genome Research
Finishing Process Analyst / Data Analyst

From michael.watson at bbsrc.ac.uk Tue Sep 9 08:36:03 2003
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue Sep 9 08:35:42 2003
Subject: [Bioperl-l] Help with Bio::DB::Query::GenBank
Message-ID: <20B7EB075F2D4542AFFAF813E98ACD9301C00AA7@cl-exsrv1.irad.bbsrc.ac.uk>

Hi

I am having various problems with Bio::DB::Query::GenBank. I have just
downloaded Bioperl 1.2.2 from the website.

First of all, the example gives me errors. If I use the query string
'Oryza[Oragnism]', I get an error message stating that "Organism".
However, it does return a nice count of 5879 records. But if I search NCBI
using the web, Oryza[Organism] actually returns 525828 records - quite
significantly more than the 5879 that Bio::DB::Query::GenBank returns!

What I am really trying to do is to get all the Gallus gallus genome
sequences out and downloaded onto my server. I was hoping to do this using
Bioperl, but using "Gallus" as the query string returns 2421 using Bioperl
and nearly 600,000 using Entrez....
Can anyone please enlighten me?? Thanks Mick From brian_osborne at cognia.com Tue Sep 9 08:46:24 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Sep 9 08:48:59 2003 Subject: [Bioperl-l] Bio::SimpleAlign problems In-Reply-To: Message-ID: Peter, AlignIO is parsing my clustalw version 1.74 files correctly. ~>perl -e 'use Bio::AlignIO; $io = Bio::AlignIO->new(-file => "test.aln" ); $aln = $io->next_aln; print $aln->consensus_string;' MNEGEHQIKLDELFEKLLRARKIFKNKDVLRHSWEPKDLPHRHEQIEALAQILVPVLRGETMKIIFCGHHACELGE DRGT KGFVIDELKDVDEDRNGKVDVIEINCEHMDTHYRVLPNIAKLFDDCTGIGVPMHGGPTDEVTAKLKQVIDMKERFV IIVL DEIDKLVKKSGDEVLYSLTRINTELKRAKVSVIGISNDLKFKEYLDPRVLSSLSEEEVVFPPYDANQLRDILTQRA EEAF YPGVLDEGVIPLCAALAAREHGDARKALDLLRVAGEIAEREGASKVTEKHVWKAQEKIEQDMMEEVIKTLPLQSKV LLYA IVLLDENGDLPANTGDVYAVYRELCEYIDLEPLTQRRISDLINELDMLGIINAKVVSKGRYGRTKEIRLMVTSYKI RNVL RYDYSIQPLLTISLKSEQRRLI The input file is t/data/testaln.aln, in the latest bioperl-live. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Peter Stogios Sent: Monday, September 08, 2003 3:22 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Bio::SimpleAlign problems Hello, I would like some help with Bio::SimpleAlign and Bio::AlignIO. I am trying to do some VERY simple tasks but the AlignIO module is being difficult in reading alignment files. It does not seem to read many formats correctly. I am using the sample code included at the SimpleAlign documentation page. The code of interest is: $str = Bio::AlignIO->new('-file' => 'testaln.aln'); $aln = $str->next_aln(); print $aln->no_residues, "\n"; print $aln->no_sequences, "\n"; I have tried loading ClustalX format version 1.81, ClustalW version 1.5, MSF format, and PHYLIP format, without having success. I am sure the alignment files are in their correct formats, since other programs can read them. 
Can someone please inform me what is the preferred format for reading by
AlignIO and SimpleAlign? Also, should I specify the format of the
alignment in the Bio::AlignIO->new line?

Thank you very much in advance,
Peter Stogios
--
____________________________________________________________
Peter Stogios               |  Ontario Cancer Institute
Graduate Student            |  Princess Margaret Hospital
G. Prive Lab                |  610 University Ave. Rm.7-207
Dept. of Medical Biophysics |  M5G 2M9
University of Toronto       |  (416) 946-2000 ex. 5615
pstogios@uhnres.utoronto.ca    http://xtal.uhnres.utoronto.ca/prive
____________________________________________________________

From jason at cgt.duhs.duke.edu Tue Sep 9 10:05:43 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Sep 9 10:04:10 2003
Subject: [Bioperl-l] 1.2.3 release prep, modules needing maintainers
Message-ID:

1.2.3 release schedule
======================

I'm planning to do 1.2.3 release candidate by next Monday (15-Sep-03).
This is contingent on my current full plate of things to do not getting
any bigger...

I think there were a couple of lingering bugs that should get squashed,
I'll have to check and see what is left.

The notables off the top of my head:

 - A request to add the subsequence retrieval from Entrez to
   DB::GenBank/DB::GenPept, there is a patch in bugzilla which needs to be
   tested and applied.

 - A couple of TreeIO bugs that I would like to fix before the
   release, but I am not sure it will get done. Will do the best I can.
   These are mostly fixed on the main trunk so will try and migrate them
   over.

My intention is to wait a week, have people test 1.2.3 candidate and then
release 1.2.3 on (22-Sep-2003) unless there are any showstoppers.
I will need volunteers to test the release on different OSes so please
try and help out when the candidates are made.
You can start testing early if you want by checking out the branch
'branch-1-2' from CVS.


Modules needing maintainers
===========================

A side note- Lincoln has wanted to wash his hands of DB::GenBank
DB::GenBank::Query so if there are volunteers out there, we could do with
some other people stepping up to maintain this code.

Similarly I do not have any interest in maintaining StandAloneBlast,
RemoteBlast because they are not part of my own research direction.
There are several feature requests, reported bugs in bugzilla about these.
We would warmly welcome someone to step up and try and add these feature
requests.


-jason
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From wxu at msi.umn.edu Tue Sep 9 10:53:01 2003
From: wxu at msi.umn.edu (wxu)
Date: Tue Sep 9 10:51:36 2003
Subject: [Bioperl-l] Bio::DB::GenBank problem?
In-Reply-To: <3F5D9785.8050909@ed.ac.uk>
Message-ID:

Thanks, James,
I reinstalled v1.2.2, and it works fine now.

Wayne
--
Wayne Xu
Computational Genomics Specialist     www.msi.umn.edu/user_support/compgen

Supercomputing Institute
550 Walter Library
117 Pleasant Street SE
University of Minnesota
Minneapolis, Minnesota 55455
email: wxu@msi.umn.edu        help email: help@msi.umn.edu
phone: 612-624-1447           help phone: 612-626-0802
fax:   612-624-8861

-----Original Message-----
From: James Wasmuth [mailto:james.wasmuth@ed.ac.uk]
Sent: Tuesday, September 09, 2003 4:04 AM
To: wxu
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Bio::DB::GenBank problem?

Hi Wayne,

I identified this a few months back, and from what I can remember Heikki
fixed it and it is now in the CVS.

james

wxu wrote:

>Hello,
>I wrote a simple bioperl program to retrieve genBank sequences by genBank
>accession numbers using $gb->get_Stream_by_id function.
>But it seems to me it can only work on GI number, not work on accession
>numbers. I use $gb->get_Stream_by_acc, the result is the same. In my
>following code, it only give me 2981014 sequences.
>I search NCBI web
>manually, all "J00522","AF303112","2981014","BG065024","BG065143" work.
>And I tried $gb->get_Seq_by_acc("J00522"); It complains "No such acc error".
>
>What is the problem?
>Your help would be greatly appreciated.
>
>Wayne
>--
>
>----------------------------------------------------------
>#!/usr/local/bin/perl
>use lib "/usr/local/bioperl/bioperl-1.2";
>use lib qw(/home/wxu/bioperl);
>
>use Bio::Perl;
>use Bio::DB::GenBank;
>use Bio::SeqIO;
>my $gb = new Bio::DB::GenBank();
>
>my $seqio = $gb->get_Stream_by_id(["J00522","AF303112","2981014","BG065024","BG065143"]);
>
>while( my $seq = $seqio->next_seq ) {
>    #print "seq length is ", $seq->length,"\n";
>    write_sequence('>>outputfile', 'fasta', $seq)
>}
>----------------------------------------------------------------------------
>--
>Wayne Xu
>Computational Genomics Specialist     www.msi.umn.edu/user_support/compgen
>
>Supercomputing Institute
>550 Walter Library
>117 Pleasant Street SE
>University of Minnesota
>Minneapolis, Minnesota 55455
>email: wxu@msi.umn.edu        help email: help@msi.umn.edu
>phone: 612-624-1447           help phone: 612-626-0802
>fax:   612-624-8861
>

--
Nematode Bioinformatics
Blaxter Nematode Genomics Group
Institute of Cell, Animal and Population Biology
Ashworth Labs
University of Edinburgh
King's Buildings
Edinburgh
EH9 3JT
UK
(+44)(0)131 650 7403

From michael.watson at bbsrc.ac.uk Tue Sep 9 10:59:18 2003
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue Sep 9 11:00:08 2003
Subject: [Bioperl-l] Help with Bio::DB::Query::GenBank
Message-ID: <20B7EB075F2D4542AFFAF813E98ACD9301C00AAB@cl-exsrv1.irad.bbsrc.ac.uk>

OK I have fixed my own problem (or one in Bio::DB::Query::GenBank -
actually in WebQuery.pm, actually in URI.pm!!)
In WebQuery.pm, sub _get_request the code is:

$uri->query_form(@params)

if one does a simple "print $uri\n" we find that the url string says:

db=+nucleotide

hello? where did that "+" come from?????

OK so if I do a $uri =~ s/\+// in my code to get rid of it, i get the
correct number of results :-D

So perhaps my URI.pm is incorrect....

Mick

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
Sent: 09 September 2003 13:36
To: 'bioperl-l@bioperl.org'
Subject: [Bioperl-l] Help with Bio::DB::Query::GenBank

Hi

I am having various problems with Bio::DB::Query::GenBank. I have just
downloaded Bioperl 1.2.2 from the website.

First of all the example gives me errors. If i use the query string
'Oryza[Oragnism]', i get an error message stating that "Organism".
However, it does return a nice count of 5879 records. Only if I search
NCBI using the web, Oryza[Organism] actually returns 525828 records -
quite significantly more than the 5879 that Bio::DB::Query::GenBank
returns!

What I am really trying to do is to get all the Gallus gallus genome
sequences out and downloaded onto my server. I was hoping to do this
using Bioperl, but using "Gallus" as the query string returns 2421 using
Bioperl and nearly 600,000 using Entrez....

Can anyone please enlighten me??

Thanks
Mick

From basu at pharm.sunysb.edu Tue Sep 9 14:47:16 2003
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Tue Sep 9 14:18:43 2003
Subject: [Bioperl-l] Warning and error message using Bio::Biblio module
Message-ID: <3F5E2034.3010109@pharm.sunysb.edu>

Hi,
I am using bioperl-1.2.2 and getting these warning message when using
the Bio::Biblio module.

++++++++++++++++++++++++++
#!/usr/bin/perl -w
use strict;
use Bio::Biblio;
++++++++++++++++++++++

Just using that module generates following warning messages
=================================================================
Use of uninitialized value in sprintf at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BiblioI.pm line 90.
Use of uninitialized value in sprintf at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BiblioI.pm line 90.
Use of uninitialized value in sprintf at /usr/lib/perl5/site_perl/5.8.0/Bio/Biblio.pm line 196.
Use of uninitialized value in sprintf at /usr/lib/perl5/site_perl/5.8.0/Bio/Biblio.pm line 196.
Use of uninitialized value in sprintf at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Biblio/soap.pm line 127.
Use of uninitialized value in sprintf at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Biblio/soap.pm line 127.
===================================================================

Other than that any script using that module returns the error
"Unexpected Content-Type '' returned". Even the one liners like

perl -MBio::Biblio -e 'print join ("\n", @{ new Bio::Biblio->find ("brazma")->get_all_ids })'
perl -MBio::Biblio -e 'print new Bio::Biblio->find ("Java")->find ("perl")->get_count'

gives the same error.

Any idea/suggestions.

-siddhartha

From lstein at cshl.edu Tue Sep 9 15:43:45 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 15:43:03 2003
Subject: [Bioperl-l] 1.2.3 release prep, modules needing maintainers
In-Reply-To:
References:
Message-ID: <200309091543.45146.lstein@cshl.edu>

Previous remarks aside, I would like to pipe in here to say how much FUN
maintaining Bio::DB::Genbank has been and how much I ENVY the LUCKY GUY
who will be taking over this HIGHLY ENTERTAINING activity.

Lincoln

On Tuesday 09 September 2003 10:05 am, Jason Stajich wrote:
> 1.2.3 release schedule
> ======================
>
> I'm planning to do 1.2.3 release candidate by next Monday (15-Sep-03).
> This is contingent on my current full plate of things to do not getting
> any bigger...
>
> I think there were a couple of lingering bugs that should get squashed,
> I'll have to check and see what is left.
>
> The notables off the top of my head:
>
>  - A request to add the subsequence retrieval from Entrez to
>    DB::GenBank/DB::GenPept, there is a patch in bugzilla which needs to be
>    tested and applied.
>
>  - A couple of TreeIO bugs that I would like to fix before the
>    release, but I am not sure it will get done. Will do the best I can.
>    These are mostly fixed on the main trunk so will try and migrate them
>    over.
>
> My intention is to wait a week, have people test 1.2.3 candidate and then
> release 1.2.3 on (22-Sep-2003) unless there are any showstoppers.
> I will need volunteers to test the release on different OSes so please
> try and help out when the candidates are made. You can start testing
> early if you want by checking out the branch 'branch-1-2' from CVS.
>
>
> Modules needing maintainers
> ===========================
>
> A side note- Lincoln has wanted to wash his hands of DB::GenBank
> DB::GenBank::Query so if there are volunteers out there, we could do with
> some other people stepping up to maintain this code.
>
> Similarly I do not have any interest in maintaining StandAloneBlast,
> RemoteBlast because they are not part of my own research direction.
> There are several feature requests, reported bugs in bugzilla about these.
> We would warmly welcome someone to step up and try and add these feature
> requests.
>
>
> -jason
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Tue Sep 9 15:47:55 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 15:47:02 2003
Subject: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script
In-Reply-To:
References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <200309091547.55821.lstein@cshl.edu>

Sorry about any confusion this caused. However, it is mentioned in the
docs for WebDBSeqI. Perhaps the default should be changed to "tempfile",
which should work in all cases.

Lincoln

On Wednesday 20 August 2003 01:05 pm, Jason Stajich wrote:
> > So your script is doing what it's supposed to, it's just that some other
> > stuff is getting out on STDOUT before your webserver is able to get in
> > on the act.
> >
> > Having played a bit, this proves to be interesting:
> >
> > #!/usr/bin/perl -w
> > use strict;
> > use Bio::DB::GenBank;
> >
> > close STDOUT;
> >
> > my $d = Bio::DB::GenBank->new();
> > my $seq = $d -> get_Seq_by_gi('163483');
> >
> >
> > This gives me:
> >
> > print() on closed filehandle STDOUT at
> > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701
> >
> > So WebDBSeqI.pm is usurping STDOUT as part of its query. This probably
> > explains what you're getting. Apache will redirect STDOUT straight to
> > the return stream for the connection. This means it gets the output
> > intended for WebDBSeqI and it appears in your programs output. You then
> > get the output you printed.
>
> This is part of Lincoln's rechaining of the IO and using fork - looking
> at his comments in the code.
> # Try to create a stream using POSIX fork-and-pipe facility.
> # this is a *big* win when fetching thousands of sequences from
> # a web database because we can return the first entry while
> # transmission is still in progress.
> # Also, no need to keep sequence in memory or in a temporary file.
> # If this fails (Windows, MacOS 9), we fall back to non-pipelined
> # access.
>
> You can turn this off by adding to the DB::GenBank init
> my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string');
>
> -retrievaltype => 'io_string' (for in-memory holding of the sequence
>                                before parsing)
> or
> -retrievaltype => 'temp' (for use of tempfiles, but I'm not 100%
>                           this code has gotten a workout to cleanup
>                           until the program exits which might be
>                           a problem for mod_perl running scripts)
>
> > If this is right, you should have some interesting error messages in
> > your logs if you run your script with warnings enabled.
> >
> > I can't see an immediate fix for this, short of running your fetch as a
> > completely detached process with a separate STDOUT, but that kind of
> > defeats the point of using mod-perl. The use of a pipe from STDOUT to
> > read the results of a webquery seem pretty engrained into WebQueryI.pm
> > and it may not be trivial to change it.
> >
> > Maybe others will be able to think of a simpler work-round?
> >
> >
> > Simon.
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Tue Sep 9 15:51:18 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 15:50:09 2003
Subject: [Bioperl-l] Bio::DB::GFF problem
In-Reply-To: <3F3AC384.5010100@gsf.de>
References: <3F3AC384.5010100@gsf.de>
Message-ID: <200309091551.18347.lstein@cshl.edu>

Hi,

I'm catching up on my bioperl mail after being on vacation for August.

The problem is that you need an entry for the whole chromosome because
the segment() call needs to retrieve it in order to get the length.
Something like this will do the trick:

  1 EnsEMBL chromosome 1 1502911 . . . Chromosome 1

Lincoln

On Wednesday 13 August 2003 07:02 pm, Matthias Wahl wrote:
> Hi all!
>
> I have trouble in using Bio::DB::GFF with the following code:
>
> my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'gene_density'
>                                                -sub_parts =>
> 'EnsEMBL:gene_density');
>
> my $gff_db = Bio::DB::GFF->new(-adaptor =>'dbi::mysqlopt',
>                                -dsn=>'dbi:mysql:Mus_musculus_GFF',
>                                -user => 'xxxxx',
>                                -pass => 'xxxxx',
>                                -aggregator => $aggregator
>                               );
>
>
> Calling
>
> $gff_db->segment(-class=>'Chromosome',
>                  -value=>'1');
>
> always returns undef (whatever arguments I use)!
> The database has been generated by loading a GFF file of the following
> format:
>
> 1 EnsEMBL gene_density 1000001 2000000 0 Chromosome 1
>
> 1 EnsEMBL gene_density 2000001 3000000 0 Chromosome 1
>
> 1 EnsEMBL gene_density 3000001 4000000 1 Chromosome 1
>
> 1 EnsEMBL gene_density 4000001 5000000 12 Chromosome 1
>
> 1 EnsEMBL gene_density 5000001 6000000 4 Chromosome 1
>
> with load_gff.PLS (columns are tab-separated, the 9th column consists of
> 'Chromosome' and name, separated by space), both with and without the
> associated sequence file.
>
> Calling
>
> $gff_db->features()
>
> works fine. But I need aggregated features for generating a
> Bio::Graphics xyplot (to plot the gene density for a particular
> chromosome).
>
> Many thanks,
>
> Matthias

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Tue Sep 9 16:14:45 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 16:13:38 2003
Subject: [Bioperl-l] Bio::Graphics::Panel, -spacing => 0 constructor problem.
In-Reply-To: <200309022247.h82MlCAS000711@mx2.nyu.edu>
References: <200308222223.h7MMN14c022854@mx3.nyu.edu>
	<200309011818.31470.lstein@cshl.edu>
	<200309022247.h82MlCAS000711@mx2.nyu.edu>
Message-ID: <200309091614.45277.lstein@cshl.edu>

Hi Philip,

Just put both features into the same track and use a callback to choose
different glyphs for them:

[MY FEATURE]
feature = feature1 feature2
glyph   = sub { my $feature = shift;
                return 'arrow' if $feature->method eq 'feature1';
                return 'generic' if $feature->method eq 'feature2';
              }

You can do the same thing for any of the options.

Lincoln

On Tuesday 02 September 2003 06:44 pm, Philip MacMenamin wrote:
> Thanks Lincoln,
>
> I was under the impression that the "spacing" argument was referring to
> spacing on the y axis, and that it defaults to 5 between the stack of
> tracks.
> And that if it was set to zero, then the track would lie on the
> same plane as the previous in the stack.
>
> Where in fact it adds additional space between tracks, over and above the
> normal amount of space.
>
> I would like there to be no vertical (y) space between certain tracks, ie
> the curatedGenes track and the UTR track, similar to the way that wormbase
> have it. They should not overlie each other, due to them existing in
> different x space. What area of Bio::Graphics should I look at to do this?
>
> Philip
>
> On Monday 01 September 2003 06:18 pm, Lincoln Stein wrote:
> > Spacing adds additional padding between tracks. You cannot get them to
> > overlie each other. Possibly -start and -end are not doing what you think
> > they should do.
> >
> > Lincoln
>

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Tue Sep 9 16:10:48 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 16:17:54 2003
Subject: [Bioperl-l] Bio::Graphics
In-Reply-To: <5.1.1.6.0.20030909160020.00b28d50@valmont>
References: <5.1.1.6.0.20030806090354.00b28208@valmont>
	<5.1.1.6.0.20030909160020.00b28d50@valmont>
Message-ID: <200309091610.48730.lstein@cshl.edu>

I'm sorry for doubting. This seems to be a bug in
Bio::SeqFeature::Generic in the released bioperl 1.2.2. Enclosed is a
patch file to fix it, and this will be included in next week's 1.2.3.

Alternatively you can do one of the following:

1) In all examples, replace "Bio::SeqFeature::Generic" with
"Bio::Graphics::Feature". The latter is a plug-in replacement for
Bio::SeqFeature::Generic but doesn't have the aforementioned bug.

alternatively

2) After creating the Bio::SeqFeature::Generic, call its display_name()
method to set the name:

  my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
                                              -start=>$start,-end=>$end);
  $feature->display_name($name);

Lincoln

On Tuesday 09 September 2003 10:10 am, Laurence Amilhat wrote:
> Sorry I am a little confused,
>
> The updated tutorial is the 2003-05-15 one? Is the one I am looking at.
> I tried the example and it still doesn't work.
>
> Scott cain gave me another example to make the same thing, and skip the
> problem, see below.
> (From his point of view your example will work on bioperl 1.3)
>
> Sincerely,
>
> Laurence
>
>
> #!/usr/local/bin/perl -w
> use strict;
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
>
> my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
> my $track=$panel->add_track(-glyph =>'generic',
>                             -label => sub{my $self=shift;
>                                           return $self->seq_id;},
>                             -description=> sub{my $self=shift;
>                                                return $self->score;},
>                             -bgcolor => sub{my $self=shift;
>                                             my $score = $self->score;
>                                             if ($score >= 1000) {
>                                                 return 'red';
>                                             } else {
>                                                 return 'green';
>                                             }
>                                         });
> while ()
> {
>     chomp;
>     next if /^\#/;
>     my ($name,$score,$start,$end)=split /\s+/;
>     my $feature=Bio::SeqFeature::Generic->new(-seq_id=>$name,
>                                               -score=>$score,
>                                               -start=>$start,
>                                               -end=>$end);
>     $track->add_feature($feature);
> }
> print $panel->png;
>
> At 11:39 29/08/2003 -0400, you wrote:
> >Sorry for responding so late to this e-mail. You were looking at an out
> > of date tutorial that no longer matches the code base. However, the
> > tutorial has now been updated and the examples should work.
> >
> >Lincoln
> >
> >On Wednesday 06 August 2003 03:09 am, Laurence Amilhat wrote:
> > > Hi,
> > >
> > > I try to learn how to use the module Bio::Graphics.
> > > I found the How To from Lincoln Stein on the web. I try to practice with
> > > the examples, it's working except for the labels of the features that
> > > don't appear on my figure.
> > > Does anybody ever use this module?
> > >
> > > This is the example:
> > > #!/usr/local/public/bin/perl
> > >
> > > use strict;
> > > use lib '/homej/bioinf/lamilhat/PERL_MODULE/lib/perl5/site_perl/5.005/BIOPERL/lib/site_perl/5.6.1/';
> > > use Bio::Graphics;
> > > use Bio::SeqFeature::Generic;
> > >
> > > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
> > > my $track=$panel->add_track(-glyph =>'generic',-label =>1);
> > >
> > >
> > > while (<>)
> > > {
> > >     chomp;
> > >     next if /^\#/;
> > >     my ($name,$score,$start,$end)=split /\t+/;
> > >     print STDERR "$name\n";
> > >     my $feature=
> > > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end);
> > >     $track->add_feature($feature);
> > > }
> > >
> > > print $panel->png;
> > >
> > >
> > > And this is the Data to parse with the example:
> > > #hit    score   start   end
> > > truc1   381     2       200
> > > truc2   210     2       210
> > > truc3   800     2       200
> > > truc4   1000    380     921
> > > truc5   812     402     972
> > > truc6   1200    400     970
> > > bum     400     300     620
> > > pres1   127     310     700
> > >
> > >
> > > Thanks,
> > >
> > > Laurence.
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > INRA, UMR INRA/UBP Amélioration et Santé des Plantes
> > > 234 avenue du Brézet
> > > 63039 Clermont-Ferrand Cedex 2
> > >
> > > Tel 04 73 62 48 37
> > > Fax 04 73 62 44 53
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >--
> >========================================================================
> >Lincoln D. Stein                           Cold Spring Harbor Laboratory
> >lstein@cshl.org                                   Cold Spring Harbor, NY
> >========================================================================
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> INRA, UMR INRA/UBP Amélioration et Santé
> des Plantes
> 234 avenue du Brézet
> 63039 Clermont-Ferrand Cedex 2
>
> Tel 04 73 62 40 87
> Fax 04 73 62 44 53
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: generic_diff.pm
Type: text/x-perl
Size: 1608 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20030909/aa033cae/generic_diff.bin

From wxu at msi.umn.edu Tue Sep 9 10:52:22 2003
From: wxu at msi.umn.edu (wxu)
Date: Tue Sep 9 16:18:04 2003
Subject: [Fwd: Re: [Bioperl-l] About Bio::DB::GenBank!]
In-Reply-To: <3F5D2555.8030007@xymu.net>
Message-ID:

Hi Jonathan,
Thank you very much for your help. I reinstalled v1.2.2, and it works
fine now.

Thanks again,

Wayne
--
Wayne Xu
Computational Genomics Specialist     www.msi.umn.edu/user_support/compgen

Supercomputing Institute
550 Walter Library
117 Pleasant Street SE
University of Minnesota
Minneapolis, Minnesota 55455
email: wxu@msi.umn.edu        help email: help@msi.umn.edu
phone: 612-624-1447           help phone: 612-626-0802
fax:   612-624-8861

-----Original Message-----
From: yaofx [mailto:yaofx@xymu.net]
Sent: Monday, September 08, 2003 7:57 PM
To: wxu@msi.umn.edu
Cc: bioperl-l@bioperl.org
Subject: [Fwd: Re: [Bioperl-l] About Bio::DB::GenBank!]

-------- Original Message --------
Subject: Re: [Bioperl-l] About Bio::DB::GenBank!
Date: Fri, 01 Aug 2003 13:18:22 -0400
From: Jonathan Manning
Organization: Whitehead Institute Center for Genome Research
To: yaofx
CC: bioperl
References: <3F2A6D86.9040006@xymu.net>

I just encountered the same error. It looks like both Bio::DB::GenBank
and Bio::DB::GenPept search the protein database. So, Bio::DB::GenBank
is not returning anything when you query for an accession.

To verify this, change:

my $gb = new Bio::DB::GenBank;

to:

my $gb = new Bio::DB::GenBank(-format => 'fasta', -verbose => 1);

For me, this prints:

url is http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?rettype=fasta&db=protein&id=AC068609&tool=bioperl&retmode=text&usehistory=n

Sure enough, visiting this gives me a blank page. However, if I
substitute 'db=nucleotide' for 'db=protein' in that url, it works.

This bug seems to exist in 1.2 and 1.2.1, but I think is fixed in 1.2.2.
(at least in the CVS head I checked...) Either upgrade to 1.2.2 or edit
Bio/DB/GenBank.pm and change 'protein' to 'nucleotide' in the BEGIN
block.

~Jonathan

yaofx wrote:

> Hello,
>
> I have installed Perl 5.6.1 for WIN32, and Bioperl version 1.2.1.
> The following is the script ,which can retrieve data from GenBank by
> sequences' gi,
> but can not get the results by accession number.
> I replace "get_Stream_by_id" with ""get_Stream_by_acc", also failed.
>
> The error message is :
>
> ------------- EXCEPTION -------------
> MSG: WebDBSeqI Error - check query sequences!
>
> STACK Bio::DB::WebDBSeqI::get_seq_stream c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:464
> STACK Bio::DB::WebDBSeqI::get_Stream_by_id c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:259
> STACK toplevel web_gi2seq_vi.pl:50
>
> --------------------------------------
>
> BTW, I don't change any code about Bioperl package.
> What's matter with it? and how will i do next?
>
> Thanks in advance for any kind of help
>
> Fengxia
>
>
> #!/usr/bin/perl
>
> $idlist = $ARGV[0];
>
> if (@ARGV != 1){
>     print "USAGE: perl web_id2seq.pl \n";
>     exit(1);
> }
>
> $faoutfile = $idlist."_fa.txt";
> (unlink $faoutfile) if (-e $faoutfile);
>
> open (INPUT,$idlist);
>
> while ($line = <INPUT>){
>     chomp ($line);
>     $line =~ s/\r//;
>     push (@querylist,$line);
> }
> $list = join ",",@querylist;
> close INPUT;
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> my $gb = new Bio::DB::GenBank;
>
> my $seqout = new Bio::SeqIO(-file => ">$faoutfile", -format => 'fasta');
> my $seqio = $gb->get_Stream_by_id([$list]);
>
> while($seq = $seqio->next_seq ) {
>     $seqout->write_seq($seq);
> }

--
Jonathan Manning
Whitehead Institute Center for Genome Research
Finishing Process Analyst / Data Analyst

From lstein at cshl.edu Tue Sep 9 16:31:46 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 16:30:38 2003
Subject: [Bioperl-l] Connecting Bio::DB::GFF::Features on a Bio::Graphics::Panel
In-Reply-To: <200309081851.h88IpvIM026914@mx2.nyu.edu>
References: <200309081851.h88IpvIM026914@mx2.nyu.edu>
Message-ID: <200309091631.46491.lstein@cshl.edu>

You beat me to it, but I wanted to warn everyone away from creating
Bio::DB::GFF::Feature objects directly. This class is intended for
features that are tied to a Bio::DB::GFF database and it will probably
not do what you want if you create it in memory.

To do what Philip wants you can either create a Bio::Graphics::Feature
using the new() constructor with nested coordinates, or do something like
the following with Bio::SeqFeature::Generic:

  my $feature1 = Bio::SeqFeature::Generic->new(-start=>$KKstart,-end=>$KKstop);
  my $feature2 = Bio::SeqFeature::Generic->new(-start=>$KKstopStop-1,-end=>$KKstopStop);
  my $feature = Bio::SeqFeature::Generic->new(-primary=>'group');
  $feature->add_SeqFeature($feature1, 'EXPAND');
  $feature->add_SeqFeature($feature2,'EXPAND');

and then add $feature to the track. There is default behavior that any
feature of type "group" is rendered with a dotted line connecting the two
features. This can be overridden.

I leave it as an exercise to the readers how to join two features that
themselves contain subfeatures or split locations.

Lincoln

On Monday 08 September 2003 02:47 pm, Philip MacMenamin wrote:
> Apologies for previous question. I realised the solution to the problem
> just after I sent the question. Very simple, very silly of me. Always the
> way, stare at it for ages, then show some one else the code, and the answer
> instantly becomes painfully obvious.
>
> Use a Bio::Graphics::Feature object, and just tell it the segments in the
> constructor at the start. Works fine. No aggregator, no
> Bio::DB::GFF::Features, all very simple and above board.
>
> $joinedFeatureRef = Bio::Graphics::Feature->new(-segments =>[
>     [$KKstart,$KKstop],[($KKstopStop-1),$KKstopStop]]);
>
> and
>
> if ( scalar @KKs > 0 )
> {
>     $panel->add_track(group =>$joinedFeatureRef,
>                       -bgcolor => 'blue',
>                       -height => 4,
>                       -fgcolor => 'darkgreen',
>                       -key => 'SuchAndSuch',
>                       -connect => 1,
>                       -connector => 'dashed',
>                       -bump => +1,
>                       -connect_color => 'lightpurple',
>                      );
> }
>
> All the best.
> Philip.

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From heikki at nildram.co.uk Tue Sep 9 16:58:55 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Tue Sep 9 16:57:30 2003
Subject: [Bioperl-l] Warning and error message using Bio::Biblio module
In-Reply-To: <3F5E2034.3010109@pharm.sunysb.edu>
References: <3F5E2034.3010109@pharm.sunysb.edu>
Message-ID: <1063141134.2038.35.camel@localhost>

Siddhartha,

There has been server problems in fetching entries and I am not sure
what the current situation is. Martin Senger who should know has been
travelling lately but should be soon back.

However, simply using the modules should not give you error messages. My
guess is that you do not have SOAP::Lite installed. Can you run 'man
SOAP::Lite'?

-Heikki

On Tue, 2003-09-09 at 19:47, Siddhartha Basu wrote:
> Hi,
> I am using bioperl-1.2.2 and getting these warning message when using
> the Bio::Biblio module.
>
> ++++++++++++++++++++++++++
> #!/usr/bin/perl -w
> use strict;
> use Bio::Biblio;
> ++++++++++++++++++++++
>
> Just using that module generates following warning messages
> =================================================================
> Use of uninitialized value in sprintf at
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BiblioI.pm line 90.
> Use of uninitialized value in sprintf at
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BiblioI.pm line 90.
> Use of uninitialized value in sprintf at
> /usr/lib/perl5/site_perl/5.8.0/Bio/Biblio.pm line 196.
> Use of uninitialized value in sprintf at
> /usr/lib/perl5/site_perl/5.8.0/Bio/Biblio.pm line 196.
> Use of uninitialized value in sprintf at
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Biblio/soap.pm line 127.
> Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Biblio/soap.pm line 127. > =================================================================== > > Other than that any script using that module returns the error > "Unexpected Content-Type '' returned". Even the one liners like > perl -MBio::Biblio -e 'print join ("\n", @{ new Bio::Biblio->find > ("brazma")->get_all_ids })' > perl -MBio::Biblio -e 'print new Bio::Biblio->find ("Java")->find > ("perl")->get_count' > > gives the same error. > > Any idea/suggestions. > > -siddhartha > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Tue Sep 9 17:16:03 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Sep 9 17:14:57 2003 Subject: [Bioperl-l] GFF to gene structure pictures In-Reply-To: References: Message-ID: <200309091716.03966.lstein@cshl.edu> Did you get this working? It should be a short script: #!/usr/bin/perl use strict; use Bio::Graphics::FeatureFile; my $file = Bio::Graphics::FeatureFile->new(-file=>shift); my ($number_rendered,$panel) = $file->render; print $panel->png; Lincoln On Thursday 21 August 2003 11:14 am, Andrew Ram wrote: > Hi everyone > I would like to convert my gene structures I have in GFF or GTF format to > nice pictures probably using the BioGraphics tools. Can someone out there > help me with any scripts? > Thanks very much in advance-Looking forward to hearing from the group! > Andrew > > _________________________________________________________________ > Add photos to your messages with MSN 8. Get 2 months FREE*. 
> http://join.msn.com/?page=features/featuredemail > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From heikki at nildram.co.uk Tue Sep 9 17:26:47 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Sep 9 17:31:18 2003 Subject: [Bioperl-l] Warning and error message using Bio::Biblio module In-Reply-To: <3F5E478F.6020908@pharm.sunysb.edu> References: <3F5E2034.3010109@pharm.sunysb.edu> <1063141134.2038.35.camel@localhost> <3F5E478F.6020908@pharm.sunysb.edu> Message-ID: <1063142807.2038.43.camel@localhost> I got it. The warnings are completely harmless. The offending lines should look like this: $VERSION = do { my @r = (q$Revision: 1.5 $ =~ /\d+/g); sprintf "%d.%-02d", @r }; but since I disabled the CVS keyword expansion before exporting code into distribution tarball, there is no revision string to match the regular expression. Sorry, -Heikki On Tue, 2003-09-09 at 22:35, Siddhartha Basu wrote: > Hi Heikki, > > Heikki Lehvaslaiho wrote: > > Siddhartha, > > > > There has been server problems in fetching entries and I am not sure > > what the current situation is. Martin Senger who should know has been > > travelling lately but should be soon back. > Ok, i understand. > > > > > However, simply using the modules should not give you error messages. My > > guess is that you do not have SOAP::Lite installed. Can you run 'man > > SOAP::Lite'? > Yes, SOAP::Lite is installed and i can get the manual with man and > perldoc command. The script "biblio_soap.pl" under the scripts/biblio > directory is also running properly with the warnings. 
Here is the output > > perl biblio_soap.pl > 1..10 > # Running under perl version 5.008001 for linux > # Current time local: Tue Sep 9 17:32:25 2003 > # Current time GMT: Tue Sep 9 21:32:25 2003 > # Using Test.pm version 1.24 > Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BiblioI.pm line 90. > Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BiblioI.pm line 90. > Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/Biblio.pm line 196. > Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/Biblio.pm line 196. > Contact to SOAP server at 127.0.0.1:4444 (server PID: 14301) > Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Biblio/soap.pm line 127. > Use of uninitialized value in sprintf at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Biblio/soap.pm line 127. > ok 1 > ok 2 > ok 3 > ok 4 > ok 5 > ok 6 > ok 7 > ok 8 > ok 9 > ok 10 > SOAP server 14301 killed > > -siddhartha > > > > > -Heikki > > -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From lstein at cshl.edu Tue Sep 9 17:28:12 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Sep 9 17:32:22 2003 Subject: [Bioperl-l] Help with Bio::DB::Query::GenBank In-Reply-To: <20B7EB075F2D4542AFFAF813E98ACD9301C00AA7@cl-exsrv1.irad.bbsrc.ac.uk> References: <20B7EB075F2D4542AFFAF813E98ACD9301C00AA7@cl-exsrv1.irad.bbsrc.ac.uk> Message-ID: <200309091728.12574.lstein@cshl.edu> This has been fixed in the forthcoming 1.2.3 release. 
I've also corrected some errors in the synopsis (most importantly ids() instead of get_Ids()) Lincoln On Tuesday 09 September 2003 08:36 am, michael watson (IAH-C) wrote: > Hi > > I am having various problems with Bio::DB::Query::GenBank. > > I have just downloaded Bioperl 1.2.2 from the website. > > First of all the example gives me errors. If i use the query string > 'Oryza[Oragnism]', i get an error message stating that > "Organism". However, it does return a nice > count of 5879 records. Only if I search NCBI using the web, > Oryza[Organism] actuallu returns 525828 records - quite significantly more > than the 5879 that Bio::DB::Query::GenBank returns! > > What I am really trying to do is to get all the Gallus gallus genome > sequences out and downloaded onto my server. I was hoping to do this using > Bioperl, but using "Gallus" as the query string returns 2421 using Bioperl > and nearly 600,000 using Entrez.... > > Can anyone please enlighten me?? > > Thanks > > Mick > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From lstein at cshl.edu Tue Sep 9 17:37:12 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Sep 9 17:36:31 2003 Subject: [Bioperl-l] load_gff.pl question In-Reply-To: <45F418C6-C78A-11D7-879A-0003935652B4@biosci.cbs.umn.edu> References: <45F418C6-C78A-11D7-879A-0003935652B4@biosci.cbs.umn.edu> Message-ID: <200309091737.12788.lstein@cshl.edu> I know it is very late to be answering this (I was on vacation in the month of August) but the problem is the database schema, which has a uniqueness constraint on features. To be considered unique, a feature must have a unique combination of seq_id,start,end,feature_type and name. 
If you are trying to load several features that have the same position and type, then try giving each of them a different name in order to differentiate them. Alternatively, you could go into the database and redefine this constraint. Lincoln On Tuesday 05 August 2003 05:17 pm, Shin Enomoto wrote: > I am getting erratic results with the load_gff.pl. > > 1) I was trying to load a table of 900 lines. Of the 900 entries, ~30 > are highly unique and they all loaded normally. The remainder was ~600, > ~250 and 4 of very similar items only differing slightly. It loaded > 6/600, 5/250 and 2/4. > > 2) I have 10 large tables of ~250000 lines each. I was able to load the > first table. load_gff.pl will not load any other tables. > > Where do I start to customize this script to allow loading of large > number of similar entities? > > > > > Shin Enomoto > 295 ASLVM > 1988 Fitch Ave. > St. Paul, MN 55108 > > 612-625-7737 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From lstein at cshl.edu Tue Sep 9 18:03:00 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Sep 9 18:13:16 2003 Subject: [Bioperl-l] multiple inheritance In-Reply-To: References: Message-ID: <200309091803.00099.lstein@cshl.edu> I know I'm responding kind of late, but the easiest way to weasle out of this one is just to explicitly call new() in both parents. 

    sub new {
        my $class = shift;
        my $self  = $class->B::new(@_);
        $self->C::new(@_);
    }

Or you can generalize this:

    sub new {
        my $class = shift;
        my $self  = $class;
        for my $parent (@ISA) {
            my $method = "$parent\:\:new";
            $self = $self->$method(@_);
        }
        $self;
    }

Note that A will end up being invoked twice (oops) and that the
initializer must be prepared to receive a first argument containing an
initialized object rather than a class name. NEXT will solve the first
problem, but not the second.

Better perhaps would be to separate object creation from initialization
in all classes:

    sub new {
        my $class = shift;
        my $self  = bless {}, ref $class || $class;
        $self->initialize(@_);
        $self;    # return the object, not whatever initialize() returned
    }

This way one would never override new() and instead would inherit like
this:

    sub initialize {
        my $self = shift;
        for my $parent (@ISA) {
            next unless $parent->can('initialize');
            my $method = "$parent\:\:initialize";
            $self->$method(@_);
        }
    }

There's probably a clever way to turn this into a method that can be
called in the Bio::Root class. Let's see:

    sub initialize_chain {
        my $self    = shift;
        my $package = ref $self;
        for my $parent (@{"$package\:\:ISA"}) {
            next unless $parent->can('initialize');
            my $method = "$parent\:\:initialize";
            $self->$method(@_);
        }
    }

So now we override with:

    sub initialize {
        my $self = shift;
        $self->initialize_chain(@_);
        # now do our own initialization
    }

A little more work and we can prevent the base class from initializing
twice.

Lincoln

On Wednesday 30 July 2003 04:12 pm, Jason Stajich wrote:
> Hmm - how should we solve the multiple inheritance when we want to chain
> both constructors. Using SUPER just goes up the tree and will follow B
> first up to A and never call C's constructor.
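[Editorial sketch, not part of the original message.] The idea in the reply above (create the object once, then run each class's initializer, without running the shared base class twice) can be tried in a single file. This is only a minimal illustration under assumptions: the class names Base, Left, Right and Bottom and the helper _init_all are hypothetical stand-ins for the A/B/C/D diamond, and the walk uses an explicit "seen" hash passed down the recursion rather than any code posted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @LOG;    # records the order in which initializers run (package main)

package Base;

sub new {
    my $class = shift;
    my $self  = bless {}, ref($class) || $class;
    # Walk the inheritance graph starting from the object's own class.
    $self->_init_all(ref $self, {}, @_);
    return $self;
}

# Post-order walk: initialize parents first, and each class only once.
# _init_all is a hypothetical helper name, not a BioPerl method.
sub _init_all {
    my ($self, $class, $seen, @args) = @_;
    return if $seen->{$class}++;
    no strict 'refs';
    $self->_init_all($_, $seen, @args) for @{"${class}::ISA"};
    # Only call an initialize() defined directly in this package,
    # so an inherited one is never run a second time.
    if (defined &{"${class}::initialize"}) {
        &{"${class}::initialize"}($self, @args);
    }
}

sub initialize { push @main::LOG, 'Base' }

package Left;
our @ISA = ('Base');
sub initialize { push @main::LOG, 'Left' }

package Right;
our @ISA = ('Base');
sub initialize { push @main::LOG, 'Right' }

package Bottom;
our @ISA = ('Left', 'Right');
sub initialize { push @main::LOG, 'Bottom' }

package main;

Bottom->new(qw(some initialization stuff));
print join(' ', @main::LOG), "\n";    # prints: Base Left Right Bottom
```

Passing the "seen" hash as an argument keeps the bookkeeping local to one constructor call, at the cost of every subclass relying on the base class's new(); whether that trade-off suits Bio::Root is a separate question.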
> > package D; > @ISA = qw(B C); > > A > / \ > B C > \ / > D > > > The Right Way* to do this is of course not having multiple inheritance OR > to use The Damian's NEXT > http://search.cpan.org/author/DCONWAY/NEXT-0.50/lib/NEXT.pm > > We run into this for SeqFeature::SimilarityPair which ISA FeaturePair and > a Similarity - the soln there was not to rely on the constructor for > initializing parameters. > > I am hitting it again for my Tree::AlleleNode objects which are > PopGen::Individuals (genotype containers) and Tree::Node (as part of the > coalescent). > > My soln will be to explictly code all the initialization parameters for > the skipped superclass (C as in above example) as copy+paste > > NEXT is part of perl 5.8.0 and would remove a lot of issues wrt to > chaining destructors that we have some code hacks for in Bio::Root::Root. > But I am wary of adding another module dependancy. > > Comments? > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From lstein at cshl.edu Tue Sep 9 18:45:22 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Sep 9 18:45:49 2003 Subject: [Bioperl-l] multiple inheritance In-Reply-To: References: Message-ID: <200309091845.22355.lstein@cshl.edu> OK, here's my solution to this particular problem. 
The inheritance hierarchy is:

      A
     / \
    B   C
     \ /
      D
      |
      E

Here's the test script:

    use E;
    my $a1 = A->new(qw(A initialization stuff)); print "\n";
    my $b1 = B->new(qw(B initialization stuff)); print "\n";
    my $c1 = C->new(qw(C initialization stuff)); print "\n";
    my $d1 = D->new(qw(D initialization stuff)); print "\n";
    my $e1 = E->new(qw(E initialization stuff)); print "\n";

And here's its output:

    A: initializing a A=HASH(0x8124394) with A initialization stuff

    A: initializing a B=HASH(0x81244cc) with B initialization stuff
    B: initializing a B=HASH(0x81244cc) with B initialization stuff

    A: initializing a C=HASH(0x813b2c4) with C initialization stuff
    C: initializing a C=HASH(0x813b2c4) with C initialization stuff

    A: initializing a D=HASH(0x813b324) with D initialization stuff
    B: initializing a D=HASH(0x813b324) with D initialization stuff
    C: initializing a D=HASH(0x813b324) with D initialization stuff
    D: initializing a D=HASH(0x813b324) with D initialization stuff

    A: initializing a E=HASH(0x813b378) with E initialization stuff
    B: initializing a E=HASH(0x813b378) with E initialization stuff
    C: initializing a E=HASH(0x813b378) with E initialization stuff
    D: initializing a E=HASH(0x813b378) with E initialization stuff
    E: initializing a E=HASH(0x813b378) with E initialization stuff

As shown in the attached code, there's some magic in base class A that
causes chaining to occur and prevents initialization from occurring more
than once per class per object. Each subclass has an initialization
method that looks like this:

    sub initialize {
        my $self = shift;
        $self->initialize_chain(@_);
        # do own initialization
    }

Lincoln

# ------------------------
# FILE: A.pm
# ------------------------
package A;

our %seenit;

sub new {
    my $class = shift;
    my $self  = bless {}, ref $class || $class;
    $self->initialize(@_);
    $self;
}

sub initialize {
    my $self = shift;
    warn __PACKAGE__ . ": initializing a $self with @_\n";
}

sub initialize_chain {
    my $self = shift;
    # start with a fresh %seenit for the outermost call only
    local %seenit unless %seenit;
    $self->_initialize_chain(@_);
}

sub _initialize_chain {
    my $self    = shift;
    # package of the class whose initialize() called initialize_chain()
    my $package = caller(1);
    for my $parent (@{"$package\:\:ISA"}) {
        next if $seenit{$parent}++;
        next unless $parent->can('initialize');
        my $initialize = "$parent\:\:initialize";
        $self->$initialize(@_);
    }
}

1;

# ------------------------
# FILE: B.pm
# ------------------------
package B;
use base 'A';

sub initialize {
    my $self = shift;
    $self->initialize_chain(@_);
    warn __PACKAGE__ . ": initializing a $self with @_\n";
}

1;

# ------------------------
# FILE: C.pm
# ------------------------
package C;
use base 'A';

sub initialize {
    my $self = shift;
    $self->initialize_chain(@_);
    warn __PACKAGE__ . ": initializing a $self with @_\n";
}

1;

# ------------------------
# FILE: D.pm
# ------------------------
package D;
use base 'B','C';

sub initialize {
    my $self = shift;
    $self->initialize_chain(@_);
    warn __PACKAGE__ . ": initializing a $self with @_\n";
}

1;

# ------------------------
# FILE: E.pm
# ------------------------
package E;
use base 'D';

sub initialize {
    my $self = shift;
    $self->initialize_chain(@_);
    warn __PACKAGE__ . ": initializing a $self with @_\n";
}

1;

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Tue Sep 9 18:47:59 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Sep 9 18:47:30 2003
Subject: [Bioperl-l] No version numbers on bioperl pms.
In-Reply-To: <200307302021.h6UKLkOU027301@mx2.nyu.edu>
References: <3F27F4F6.7030501@nyu.edu> <200307302021.h6UKLkOU027301@mx2.nyu.edu>
Message-ID: <200309091847.59264.lstein@cshl.edu>

Sort of a problem here. I originally had version numbers in all my
modules, but they were removed from the live branch when we went to the
overall BioPerl versioning system.
However, before the 1.2 release, I moved many of my modules from "live" into branch-1-2. This meant that the modules went into bioperl 1.2.X without any versions at all. Lincoln On Wednesday 30 July 2003 04:15 pm, Philip MacMenamin wrote: > I am aware that the overall version of BioPerl is 1.2.2. I dont think I am > interested in this though. I want the version of the specific modules. I > have looked at the Bio::DB::GFF perldoc, and all I see is $Id$. There is > nothing to the left or right of this. > > When I grep through it therefore, I get this returned: > # $Id$ > > This is not helpful. When I perldoc it, there is nothing either. > > When I look at the other pms not in BioPerl (say Fatal.pm), there is a > version number (ie $VERSION = 1.02) that is used and needed by things such > as Makemaker. Without $VERSION makemaker may fail to work, and probably > give some not helpful message (unless you have the newest version). > > Now, if you are telling me that by virtue of the fact that this is bioPerl > 1.2.2 *all* pms contained within are implicitly 1.2.2 this is another > story. And I would be happy. But, I dont know that this is the case. > > All I want to know is what version of Bio::DB::GFF and Bio::Graphics is the > latest. (I downloaded it using cpan, not from source) > > Thanks, > Philip. > > On Wednesday 30 July 2003 04:02 pm, you wrote: > > % perldoc -m Bio::DB::GFF | grep Id > > > > There are only explict versions when we do a release. 1.2.2 is the latest > > stable release. > > > > Did you download the code from CVS, a release from the website, > > from http://bioperl.org/SRC? > > > > On Wed, 30 Jul 2003, Philip MacMenamin wrote: > > > OK, most of them have $Id$ at the start. (Although Graphics.pm does > > > not). However, if it is there, what do you do with it? When I perldoc > > > it, I see no version number. 
> > > > > > To be honest, all I want to know is the versions of the Bio::DB::GFF > > > Bio::Graphics that would have been downloaded in the last few days. (ie > > > I assume the most recent version) > > > > > > Thanks. > > > > > > On Wednesday 30 July 2003 02:59 pm, you wrote: > > > > you can also look at the > > > > $Id$ > > > > in almost every file which list the Revision of the particular > > > > instance of the code you have. > > > > > > > > -jason > > > > > > > > On Wed, 30 Jul 2003, Philip MacMenamin wrote: > > > > > Hi Jason, > > > > > > > > > > OK, thats fine. However, the overall version number is not what I > > > > > need. > > > > > > > > > > I would like to know what version of Bio::DB::GFF i have, and what > > > > > version of Bio::Graphics. Because at the moment I just guessed a > > > > > $VERSION number and hacked it into them, and hoped for the best. > > > > > And, although GMOD installs, it doesnt work properly. And, its > > > > > probably nothing to do with the version I guessed, but I really > > > > > dont know. > > > > > > > > > > It just seems that it might be straight forward to do things the > > > > > same way as in the rest of perl (from what I can see of the rest of > > > > > perl anyway), to make things easier for people like me, OR, to put > > > > > some comment in the code at the point were the VERSION number > > > > > usually is, (like line 2 or somthing) saying this is version such > > > > > and such, but we dont use $VERSION numbers for reason X. > > > > > > > > > > So, if anyone knows the version number of Bio::DB::GF, and > > > > > Bio::Graphics that I downloaded in the last couple of days is, I > > > > > would be much obliged to them if they let me know (so I dont have > > > > > to just guess it). > > > > > > > > > > Thanks again, > > > > > Philip. > > > > > > > > > > On Wednesday 30 July 2003 01:05 pm, you wrote: > > > > > > We're aware. 
> > > > > > > > > > > > We've implemented a new system > > > > > > Bio::Root::Version > > > > > > which will now make the overall version number of package > > > > > > available. > > > > > > > > > > > > This is only for current CVS bioperl-live so will be in bioperl > > > > > > 1.4 > > > > > > > > > > > > -jason > > > > > > > > > > > > On Wed, 30 Jul 2003, philip wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I assume that you are aware that your BioPerl pms are not > > > > > > > versioned. > > > > > > > > > > > > > > This seems unusual to me, however I am no perl head. > > > > > > > > > > > > > > BUT, it can cause problems (which took me a LONG time to > > > > > > > understand, see previous comment about my lack of perl cool > > > > > > > points) when I was trying to set up GMOD. What was happening > > > > > > > that when Lincolns Makefile.PL was demanding such and such > > > > > > > version of GFF etc, makemaker was going off, and looking for > > > > > > > such and such version number, was getting nothing, and saying > > > > > > > that I didnt have GFF at all. The newest makemaker does not do > > > > > > > this, it gives out about not understanding the version number > > > > > > > or something. > > > > > > > > > > > > > > But, I don't know, I have looked at some other ordinary pms and > > > > > > > they all seem to have VERSION numbers. So, I just thought that > > > > > > > I would post this up here. > > > > > > > > > > > > > > All the best, > > > > > > > Philip. 
> > > > > > > > > > > > > > _______________________________________________ > > > > > > > Bioperl-l mailing list > > > > > > > Bioperl-l@portal.open-bio.org > > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From lstein at cshl.edu Tue Sep 9 18:56:46 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Sep 9 18:55:32 2003 Subject: [Bioperl-l] problem with Graphics In-Reply-To: <3F3C5984.3070401@csiro.au> References: <200308121545.33474.sobrien@umail.ucsb.edu> <3F3C5984.3070401@csiro.au> Message-ID: <200309091856.46751.lstein@cshl.edu> If you are using ActiveState perl on Windows or RedHat Perl (which is a bad thing -- get rid of it), then you must do this: binmode(STDOUT); print $panel->png; Somebody decided that it would be great idea to enable linefeed/carriage return translation by default on these platforms. Lincoln On Thursday 14 August 2003 11:54 pm, Wes Barris wrote: > Sean O'Brien wrote: > > Hi, > > > > I have been trying to get BioPerl to output png's, but I seem to be > > getting invalid png files. I have a fresh install of libgd, version > > 2.0.15 and my GD version is 2.07. I installed Bundle::BioPerl, and after > > having no luck, I installed BioPerl version 1.2.2 from the sources in > > current_core_stable.tar.gz. When I run 'make test' I get an ok for > > BioGraphics. Also, when I run the first script described in the Bio > > Graphics tutorial, it runs with no errors and outputs some data which > > appears as though it could be an image. However, the file seems to be of > > an invalid png format because it cannot be opened by display, galeon or > > the GIMP. This is pretty frustrating because everything apears to be > > installing/running fine, but then the image is somehow corrupted. What > > might I have done wrong/ need to do to make this work. Thanks. 
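[Editorial sketch, not part of the original message.] The effect that binmode() guards against can be seen without GD or Bio::Graphics. The following is a minimal illustration under assumptions: it writes the eight-byte PNG signature to scratch files, once with a newline-translating `:crlf` layer forced on (standing in for a platform that translates by default) and once after a plain binmode(), and compares the sizes on disk. The helper name bytes_on_disk is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The first eight bytes of every PNG file; two of them are "\n" (0x0A).
my $png_magic = "\x89PNG\r\n\x1a\n";

# Write $bytes to a scratch file and return the resulting size on disk.
# $mode is 'crlf' (force newline translation) or 'binmode' (disable it).
sub bytes_on_disk {
    my ($mode, $bytes) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    if ($mode eq 'crlf') {
        binmode($fh, ':crlf');    # each "\n" is written as "\r\n"
    }
    else {
        binmode($fh);             # bytes pass through untouched
    }
    print {$fh} $bytes;
    close $fh or die "close $path: $!";
    return -s $path;
}

my $translated = bytes_on_disk('crlf',    $png_magic);
my $untouched  = bytes_on_disk('binmode', $png_magic);

print "with newline translation: $translated bytes\n";
print "after binmode:            $untouched bytes\n";
```

Any extra byte inside the signature is enough for an image viewer to reject the file, which matches the "invalid png" symptom described in the thread.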
>
> Hi Sean,
>
> The examples that output png files never worked for me either. To fix
> them, I changed this line:
>
> print $panel->png;
>
> to something like this:
>
> open(OUT, ">junk.png");
> binmode OUT;
> print OUT $panel->png;
> close(OUT);
> print("Wrote junk.png\n");

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From gniu at sibs.ac.cn Wed Sep 10 01:46:58 2003
From: gniu at sibs.ac.cn (Gang NIU)
Date: Wed Sep 10 01:44:05 2003
Subject: [Bioperl-l] (no subject)
Message-ID: <200309100544.h8A5i1sX012165@portal.>

From lichunjiang at sibs.ac.cn Wed Sep 10 02:04:07 2003
From: lichunjiang at sibs.ac.cn
Date: Wed Sep 10 02:02:40 2003
Subject: [Bioperl-l] source codes required
Message-ID: <1063123447$40701068@lichunjiang@sibs.ac.cn>

Hello:

I'm new to this mailing list as well as to BioPerl. I now need a program
that parses .gbk files into a relatively small database without losing
the information of interest. Another program I need is one to BLAST the
whole human genome with input query sequences. I've written a program
which works well with the small fly genome but fails with the much
larger human genome on my PC.

I would appreciate it very much if you could send me the source code for
these two programs.

Holly

From lichunjiang at sibs.ac.cn Wed Sep 10 03:16:12 2003
From: lichunjiang at sibs.ac.cn
Date: Wed Sep 10 03:14:46 2003
Subject: [Bioperl-l] genbank parsing
Message-ID: <1063124172$1635499643@lichunjiang@sibs.ac.cn>

Hello:

I'm new to BioPerl. Could you send me source code for a program that
parses GenBank files? I would appreciate your help very much.

Holly

From shawnh at fugu-sg.org Wed Sep 10 08:03:09 2003
From: shawnh at fugu-sg.org (Shawn Hoon)
Date: Wed Sep 10 07:58:28 2003
Subject: [Bioperl-l] StandAloneBlast for Wu-Blast
Message-ID:

I just committed support for WU-Blast in StandAloneBlast.
It returns a Bio::SearchIO report. I have tested it with some parameters,
but I expect some differences in parsing the output when certain
WU-Blast-specific parameters are used. People with more experience with
WU-Blast, please check it out.

cheers,

shawn

From matthias.wahl at gsf.de Wed Sep 10 17:58:20 2003
From: matthias.wahl at gsf.de (Matthias Wahl)
Date: Wed Sep 10 08:41:51 2003
Subject: [Bioperl-l] Bio::DB::GFF problem
Message-ID: <3F5F9E7C.9070807@gsf.de>

Hi Lincoln,

Thank you very much for your input. The $gff_db->segment call now works
fine. However, I am still not able to generate an xyplot.

Using the following GFF file:

1  chromosome  Component     1        195869683  .  .  .  Sequence "1"
1  EnsEMBL     gene_density  1        200000     0  .  .  gene_density "1.density"
1  EnsEMBL     gene_density  200001   400000     0  .  .  gene_density "1.density"
1  EnsEMBL     gene_density  400001   600000     0  .  .  gene_density "1.density"
1  EnsEMBL     gene_density  600001   800000     0  .  .  gene_density "1.density"
1  EnsEMBL     gene_density  800001   1000000    0  .  .  gene_density "1.density"
1  EnsEMBL     gene_density  1000001  1200000    0  .  .  gene_density "1.density"
...

I tried to aggregate all gene_density features:

my $aggregator = Bio::DB::GFF::Aggregator->new(-method    => 'density',
                                               -sub_parts => 'gene_density');

my $gff_db = Bio::DB::GFF->new(-adaptor    => 'dbi::mysqlopt',
                               -dsn        => 'dbi:mysql:Mus_musculus_GFF',
                               -user       => 'XXXXX',
                               -pass       => 'XXXXX',
                               -aggregator => $aggregator,
                              );

my $gene_density = $gff_db->segment($chromosome_name)
    or die "Can not retrieve GFF segment: ".$gff_db->error;

my @features = $gene_density->features('gene_density');

This all works fine. But when I try to draw an xyplot

$picture->add_track(\@features,
                    -glyph => 'xyplot');

I end up with bars in my image, but no plot!

Could you by chance tell me what I am doing wrong?

Many thanks

Matthias

Lincoln Stein wrote:

>Hi,
>
>I'm catching up on my bioperl mail after being on vacation for August.
The >problem is that you need an entry for the whole chromosome because the >segment() call needs to retrieve it in order to get the length. Something >like this will do the trick: > >1 EnsEMBL chromosome 1 1502911 . . . Chromosome 1 > >Lincoln > >On Wednesday 13 August 2003 07:02 pm, Matthias Wahl wrote: > > >>Hi all! >> >>I have trouble in using Bio::DB::GFF with the following code: >> >>my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'gene_density' >> -sub_parts => >>'EnsEMBL:gene_density'); >> >>my $gff_db = Bio::DB::GFF->new(-adaptor =>'dbi::mysqlopt', >> -dsn=>'dbi:mysql:Mus_musculus_GFF', >> -user => 'xxxxx', >> -pass => 'xxxxx', >> -aggregator => $aggregator >> ); >> >> >>Calling >> >>$gff_db->segment(-class=>'Chromosome', >> -value=>'1'); >> >>always returns undef (whatever arguments I use)! >>The database has been generated by loading a GFF file of the following >>format: >> >>1 EnsEMBL gene_density 1000001 2000000 0 >>Chromosome 1 >> >>1 EnsEMBL gene_density 2000001 3000000 0 >>Chromosome 1 >> >>1 EnsEMBL gene_density 3000001 4000000 1 >>Chromosome 1 >> >>1 EnsEMBL gene_density 4000001 5000000 12 >>Chromosome 1 >> >>1 EnsEMBL gene_density 5000001 6000000 4 >>Chromosome 1 >> >>with load_gff.PLS (columns are tab-seperated, the 9th column consists of >>'Chromosome' and name, seperated by space), both with and without the >>associated sequence file. >> >>Calling >> >>$gff_db->features() >> >>works fine. But I need aggregated features for generating a >>Bio::Graphics xyplot (to plot the gene density for a particular >>chromosome). 
>> >>Many thanks, >> >>Matthias >> >> > > > -- Matthias Wahl GSF-National Research Center for Environment and Health Institute of Developmental Genetics Ingolstaedter Landstrasse 1 D-85764 Neuherberg Germany TEL: ++49 89 3187-4117,-2638 FAX: ++49 89 3187-3099 E-mail: matthias.wahl@gsf.de WWW: http://www.gsf.de/idg From birney at ebi.ac.uk Wed Sep 10 10:35:39 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Wed Sep 10 10:34:10 2003 Subject: [Bioperl-l] (no subject) Message-ID: <0C892EA0-E39C-11D7-8765-000393CBD5AE@ebi.ac.uk> I have been scripting primer design for a while where I find I have better control over the heuristics and (importantly) can include BLAST/exonerate matching of a region to its own genome to find unique-in-genome areas. I know Primer3 is out there, but in some cases, making sure you design a primer in a non-duplicated region is more important than getting the right G/C content etc. I'd like to propose the following modules: Bio::Primer::Feature.pm a single primer, SeqFeatureI compliant, start/end on a sequence, reuses the seq(), has gc content methods and has_inversion($size) which gives back the first inverted string over size or undef if none. Bio::Primer::Pair.pm a pair of primers, having left and right Bio::Primer::Feature.pm's with "joint" methods such as diff_gc(), the difference in GC content between the two pairs Bio::Primer::AssessmentI,pm interfaces which defines the method $score = $assor->assess($pair); Bio::Primer::Design.pm takes a sequence, an optional left hand region (defaults to 50bp), an optional right hand region (defaults to 50bp), an optional primer size (default of 20), an optional prune score and a list of Bio::Primer::AssessmentI.pm compliant modules. design works the following way: generates every left hand and right hand primer of size foreach left,right pair, applies each Assessment module in turn. 
If the score falls below prune at any point, discards this pair immedaitely (therefore by setting prune to - say - -100 and having an assessment module of inversion_greater_than_5 give -200 then primer pairs with this are never considered, to keep the list manageable if needs be). stores final score for this pair provides final "best pair" or complete list Assessment modules first up would be: Bio::Primer::Assessment::inversion_length.pm Bio::Primer::Assessment::GC_content.pm Bio::Primer::Assessment::GC_matching.pm (primers should have the same melting temperature) Bio::Primer::Assessment::product_length.pm (ideal product length of around 1KB) these would all take some "weight" constructor to allow them to be weighted differently I'd also build in Bio::SearchIO or SeqFeature based modules which "banned" certain regions of the sequence from being used. I thought about putting the Bio::Primer::Feature.pm in Bio::SeqFeature::Primer.pm but I thought that keeping all the modules together made more sense. This could also go off Bio::Tools::Primer::* if people prefered. any views? From jmanning at genome.wi.mit.edu Wed Sep 10 11:02:13 2003 From: jmanning at genome.wi.mit.edu (Jonathan M. Manning) Date: Wed Sep 10 11:00:44 2003 Subject: [Bioperl-l] primer selection References: <0C892EA0-E39C-11D7-8765-000393CBD5AE@ebi.ac.uk> Message-ID: <3F5F3CF5.8080603@genome.wi.mit.edu> I too have an internal set of scripts that handles primer selection, using Bio::Seq objects and primer3, then blasting against E.Coli to screen out vector matches. I'm not entirely convinced about the 'Pair of primers' approach you've listed here - I'd favor it being primarily for selection of single left or right primers, though these objects could certainly be paired up later, and additional 'Assessments' be run on them. You'll have to get a reply from someone else regarding how it best fits into BioPerl, but I'd like to see something like this included, and I'd be willing to contribute to it. 
I was planning a rewrite of my scripts anyway... time to modularize.

~Jonathan

From vesko_baev at abv.bg Wed Sep 10 13:10:13 2003
From: vesko_baev at abv.bg (Vesko Baev)
Date: Wed Sep 10 13:09:28 2003
Subject: [Bioperl-l] foreach (@array)?
Message-ID: <469001640.1063213813115.JavaMail.nobody@app1.ni.bg>

Hi,
When I start my script, it returns results only for the first element of
@array. On the line 'foreach (@array){' the array holds two or more
elements, but every time it returns the result for the first element
only:

foreach (@mirna) {
  $id=$_;
  while (my $RNA=$DB->next_seq()) {
    $RNAid=$RNA->display_id;
    if ($RNAid eq $id) {
      $seqRNA=$RNA->seq();
    }
    else {next};

    for (my $i=0; $i

References: <469001640.1063213813115.JavaMail.nobody@app1.ni.bg>
Message-ID: <20030910173453.GA23688@bioinfo.ucr.edu>

On Wed 09/10/03 20:10, Vesko Baev wrote:
> Hi,
> When I start my script, it returns results only for the first element
> of @array. On the line 'foreach (@array){' the array holds two or more
> elements, but every time it returns the result for the first element
> only:
>

My guess would be this is an issue with $_, so use

    foreach my $id (@mirna) {

rather than:

> foreach (@mirna) {
>   $id=$_;
>   while (my $RNA=$DB->next_seq()) {
>     $RNAid=$RNA->display_id;
>     if ($RNAid eq $id) {
>       $seqRNA=$RNA->seq();
>     }
>     else {next};
>
>     for (my $i=0; $i
>       $subgene=substr($geneRNA,$i,length($seqRNA));
>       $percentage = align_subs($seqRNA, $subgene);
>
>       if($percent<$percentage) {
>         push (@RNAname,$id);
>         push (@subgene, $subgene);
>         push (@percentage,$percentage);
>         push (@position,($i+1));
>       }
>     }
>   };
> };
> };
> Thanks!

-- 
----------------------------
| Josh Lauricha            |
| laurichj@bioinfo.ucr.edu |
| Bioinformatics, UCR      |
|--------------------------|

From sbour at niaid.nih.gov Wed Sep 10 13:44:30 2003
From: sbour at niaid.nih.gov (Stephan Bour)
Date: Wed Sep 10 13:43:40 2003
Subject: [Bioperl-l] foreach (@array)?
In-Reply-To: <469001640.1063213813115.JavaMail.nobody@app1.ni.bg>
Message-ID:

I believe you need to assign each array element to a scalar variable for
the foreach loop to work:

    foreach $mirna (@mirna) {

Stephan.

From jmanning at genome.wi.mit.edu Wed Sep 10 14:13:03 2003
From: jmanning at genome.wi.mit.edu (Jonathan M. Manning)
Date: Wed Sep 10 14:11:39 2003
Subject: [Bioperl-l] foreach (@array)?
References: <469001640.1063213813115.JavaMail.nobody@app1.ni.bg>
Message-ID: <3F5F69AF.4010401@genome.wi.mit.edu>

The other replies suggested assigning the foreach. While it's probably a
good idea, you don't *have* to assign it to a variable. What you have
should work.

The real problem here is the next_seq() call. This function consumes
your $DB Bio::SeqIO object. Once you get a next_seq, you can never go
back. I think you're trying to iterate over it multiple times.

Here's a quick solution:

my %sequences;

## This goes through your sequences once, and stores them for later use.
while(my $RNA=$DB->next_seq()) {
  ## Added bonus of indexing by display_id.
  ## this allows for quick lookup later
  my $RNAid = $RNA->display_id;
  $sequences{$RNAid} = $RNA;
}

## Now just use:
foreach $id (@mirna) {
  next unless (exists $sequences{$id} && defined $sequences{$id});
  $seqRNA = $sequences{$id}->seq();
  # rest of your code here...
  for (my $i=0; $i

-- 
Jonathan Manning
Whitehead Institute Center for Genome Research
Finishing Process Analyst / Data Analyst

From skirov at utk.edu Wed Sep 10 15:00:49 2003
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Sep 10 14:59:24 2003
Subject: [Bioperl-l] foreach (@array)?
Message-ID: <3F5F74E1.4070006@utk.edu>

Hi Vesko,
My understanding is that you are trying to extract the sequence
information for a list of ids you have in @mirna. While the solution
given by Jonathan is good, there might be a memory problem if your
sequences are big and/or the database contains a considerable number of
sequences. Now there is another solution:

my (@found,@ids);
while(my $RNA=$DB->next_seq()) {
  my $id=$RNA->display_id;
  push @found, $RNA if (grep(/\b$id\b/,@mirna));
  push @ids,$id;
}

After that you will have an array with the Bio::Seq objects that are in
your initial list (@mirna), and you can extract any info you want using
the Bio::Seq methods on the fly. You also have an array (@ids) with the
retrieved sequences, so you can see which ones you are missing by
comparing @ids with @mirna.
I think this should work.
-----------
By the way, where do you work?
Good luck!

Subject: [Bioperl-l] phd format conversion.
Message-ID: <20030910162712.14078.qmail@web14203.mail.yahoo.com>

Dear sir,
I'm using bioperl version 1.0.1. I could convert fasta to most of the
formats except phd. I'm sending the error message. Please explain what I
should do. The error message is as follows:

[root@Host0 bioperl-0.9.0]# perl conphd.pl
Bio/SeqIO/phd.pm: phd cannot be found
Exception
Can't locate Bio/SeqIO/phd.pm in @INC (@INC contains:
  /usr/lib/perl5/5.8.0/i386-linux-thread-multi
  /usr/lib/perl5/5.8.0
  /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
  /usr/lib/perl5/site_perl/5.8.0
  /usr/lib/perl5/site_perl
  /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
  /usr/lib/perl5/vendor_perl/5.8.0
  /usr/lib/perl5/vendor_perl
  .) at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm line 477.
For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time
Can't use an undefined value as a symbol reference at conphd.pl line 9,
line 1.

I'll be grateful if you could give suggestions to solve this problem.

with regards
sankari

From scassidy at accelrys.com Wed Sep 10 18:34:10 2003
From: scassidy at accelrys.com (Susan Cassidy)
Date: Wed Sep 10 19:37:41 2003
Subject: [Bioperl-l] Question about genbank.pm/SeqIO support for info in SEGMENT Genbank line
Message-ID:

If a Genbank entry has a SEGMENT line, like the following (partial)
entry:

LOCUS       AARPOB2      871 bp    DNA     linear   BCT 03-FEB-2000
DEFINITION  Abiotrophia adiacens RNA polymerase beta subunit (rpoB) gene, partial cds.
ACCESSION   AF194508
VERSION     AF194508.1  GI:6449110
KEYWORDS    .
SEGMENT     2 of 2
SOURCE      Granulicatella adiacens

I do not see anywhere that that information is saved/returned via any
SeqIO method. I looked at genbank.pm, and did not see any reference to
that line at all.

Am I missing something?

Thanks,
Susan Cassidy

From lichunjiang at sibs.ac.cn Wed Sep 10 22:23:26 2003
From: lichunjiang at sibs.ac.cn (=?gb2312?B??=)
Date: Wed Sep 10 22:21:59 2003
Subject: [Bioperl-l] questions: [NULL_Caption] FATAL ERROR: CoreLib [001.000] 1>: Failed to allocate 10000 byte
Message-ID: <1063211006$1781643295@lichunjiang@sibs.ac.cn>

Hi,
I've written a program to do a local blast and extract information on
the genes nearest to the hits. The program works well with the fly
genome, but it fails with the human genome. The following error is
reported:

[NULL_Caption] FATAL ERROR: CoreLib [001.000] 1>: Failed to allocate 10000 bytes

Is that due to the large data size? Any help would be appreciated!

#! /usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SearchIO;

my $bits=200;
my $infile="#defined inflie";
my $database='##';
open(OU,">defined outfile") or die "cannot open output.\n";
print OU "\#Input :\t\t $infile\n";
print OU "\#Bits Limit:\t$bits\n";
my @params=('database' =>$database,
            'program' =>'blastn',
            'e' =>'10',
            'W' =>'15',
            '_READMETHOD' =>'Blast');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $io=Bio::SeqIO->new('-file' => $infile, '-format' =>'fasta');
my $Oinseq;
while($Oinseq=$io->next_seq) {
  my $blast_report = $factory->blastall($Oinseq);
  print "blast\n";
  my $hsptable;
  my $searchio = new Bio::SearchIO('-file' => $blast_report->file);
  my $blast_result = $searchio->next_result;
  while (my $hit=$blast_result->next_hit) {
    my $hitbits = $hit->bits();
    $hitbits =~s/^e/1e/;
    next if ($hitbits<=$bits);
    while (my $hsp=$hit->next_hsp()) {
      my $hspbits=$hsp->bits();
      $hspbits=~s/^e/1e/;
      if ($hspbits<=$bits) {next;}
      else {$hsptable->{$hit->accession}->{$hsp->hstart}=$hsp->hend; print"hash\n";}
    }
  }
  my $count=1;
  my $acc; my $hsphash; my $hspst; my $hspend;
  my $eq; my $aseq; my $cseq; my $Oseq;
  my $read=1; my $write=1;
  my $bgene; my $hgene;
  while(($acc,$hsphash)=each %$hsptable) {
    my $ino = Bio::SeqIO->new('-file'=>"F:/data/$database/$acc.gbk", '-format'=>'genbank');
    my $Oseq = $ino->next_seq;
    print " read $read\n";
    my @feat=$Oseq->get_SeqFeatures();
    print "features\n";
    $read++;
    while(($hspst,$hspend)= each %$hsphash) {
      print OU "#$count ACC: $acc\tStart: $hspst\tEnd: $hspend#\n";
      $count++;
      foreach my $feat(@feat) {
        if ($feat->primary_tag eq 'CDS') {
          my $featend = $feat->end;
          my $featstart = $feat->start;
          if( ($hspst-$featend)>=0) {
            $bgene='';
            if ($feat->has_tag('gene')) {$bgene= join('',$feat->each_tag_value('gene')); next;}
          }
          if (($hspend-$featstart)<=0) {
            $hgene='';
            if ($feat->has_tag('gene')) {$hgene= join('',$feat->each_tag_value('gene')); last;}
          }
        }
      }
      #$eq=$Oseq->subseq($hspst-100,$hspend+100);
      print OU "$bgene and $hgene\n";
      # print OU "$eq \n\n";
    }
  }
}

From lichunjiang at sibs.ac.cn Wed Sep 10 22:25:29 2003
From: lichunjiang at sibs.ac.cn (=?gb2312?B??=)
Date: Wed Sep 10 22:24:02 2003
Subject: [Bioperl-l] genbank file parsing
Message-ID: <1063211129$612724617@lichunjiang@sibs.ac.cn>

Hi,
I'm new to bioperl. Could you send me source code for a program that
parses GenBank files? Your help would be much appreciated.
Holly

From lichunjiang at sibs.ac.cn Wed Sep 10 22:32:55 2003
From: lichunjiang at sibs.ac.cn (=?gb2312?B??=)
Date: Wed Sep 10 22:33:25 2003
Subject: [Bioperl-l] sources code help
Message-ID: <1063211575$246224939@lichunjiang@sibs.ac.cn>

Hello:
I'm new to this mailing list as well as to bioperl. I now need a program
to parse .gbk files into a relatively small database without losing the
information of interest. Another program I need is one to BLAST the
whole human genome with input query sequences. I've written a program
which works well with the small fly genome but fails with the much
larger human genome.
I would appreciate it if you could send me source code for these two
programs.

good wishes
Holly

From wes.barris at csiro.au Wed Sep 10 23:24:24 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Wed Sep 10 23:23:08 2003
Subject: [Bioperl-l] genbank file parsing
In-Reply-To:
References:
Message-ID: <3F5FEAE8.3070601@csiro.au>

Jiang Holly wrote:
> Hi,
> I'm a freshman using bioperl, will you send me source codes for program
> parsing genbank file? You're appreciated very much for your help.
> Holly

There are some good examples of how to do this in the "HOW TO" section
of the bioperl.org web site:

http://www.bioperl.org/HOWTOs/html/SeqIO.html

-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au

From andrew at anatomy.otago.ac.nz Thu Sep 11 00:54:36 2003
From: andrew at anatomy.otago.ac.nz (Andrew Macgregor)
Date: Thu Sep 11 00:53:13 2003
Subject: [Bioperl-l] Small change in UniGene file format
Message-ID: <0B141079-E414-11D7-A8EE-00039399CEDC@anatomy.otago.ac.nz>

Andrew Walsh pointed out in bug report 1491 that the NCBI *.data files
now include a version number at the end of the accession number in each
SEQUENCE line, e.g. ACC=BQ190891.1

I have modified the UniGene module to handle this. The resulting Seq obj
in this case now returns BQ190891 as the accession number and 1 as the
version. The module can still parse the older format, i.e. accession
numbers without version info.

I hope this conforms to conventions regarding accession numbers in
bioperl.

-- Andrew.

From fangl at genomics.org.cn Thu Sep 11 01:48:18 2003
From: fangl at genomics.org.cn (Magic Fang)
Date: Thu Sep 11 01:43:55 2003
Subject: [Bioperl-l] draw vertical line and text string in Bio::Graphics::Panel object
Message-ID: <200309111344562.SM02252@magicpc>

Dear colleagues,
can Bio::Graphics::Panel draw a vertical line, and output text on the
track I want? My task is to draw a cDNA-to-genome mapping.
Thank you.
Magic Fang

From fangl at genomics.org.cn Thu Sep 11 02:43:26 2003
From: fangl at genomics.org.cn (Magic Fang)
Date: Thu Sep 11 02:39:28 2003
Subject: [Bioperl-l] Bio::DB::GenBank problem
Message-ID: <200309111439531.SM02252@magicpc>

Dear colleagues,
can anybody tell me why the code below gives different results:

use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

$gb=Bio::DB::GenBank->new(-format => 'fasta');
$gb->proxy([ftp, http], 'http://192.168.4.7:80');
$seq=$gb->get_Seq_by_acc('AC130605.2');
print $seq->display_id, "\t", $seq->length, "\n";

get normal result

----------------------------------------------------------------
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

$gb=Bio::DB::GenBank->new(-format => 'fasta');
$gb->proxy([ftp, http], 'http://192.168.4.7:80');
$seq=$gb->get_Seq_by_acc('AC130605');
print $seq->display_id, "\t", $seq->length, "\n";

------------- EXCEPTION -------------
MSG: acc does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:177
STACK toplevel db_test.pl:6

-------------------------------------------------------------------
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

$gb=Bio::DB::GenBank->new(-format => 'genbank');
$gb->proxy([ftp, http], 'http://192.168.4.7:80');
$seq=$gb->get_Seq_by_acc('AC130605');
print $seq->display_id, "\t", $seq->length, "\n";

------------- EXCEPTION -------------
MSG: acc does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:177
STACK toplevel db_test.pl:6
--------------------------------------

----------------------------------------------------------------------
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

$gb=Bio::DB::GenBank->new(-format => 'genbank');
$gb->proxy([ftp, http], 'http://192.168.4.7:80');
$seq=$gb->get_Seq_by_acc('AC130605.2');
print $seq->display_id, "\t", $seq->length, "\n";

------------- EXCEPTION -------------
MSG: acc does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:177
STACK toplevel db_test.pl:6
--------------------------------------

Thank you.

From Weiner at urz.uni-hd.de Thu Sep 11 03:45:00 2003
From: Weiner at urz.uni-hd.de (January Weiner 3)
Date: Thu Sep 11 03:43:33 2003
Subject: [Bioperl-l] Parsing tfasta reports
In-Reply-To: <200309111439531.SM02252@magicpc>
Message-ID:

Hello,
I'm trying to parse fasta reports using Bio::SearchIO. It works fine for
standard fasta reports; however, whenever I run tfasta, I get the
warning "MSG: unrecognized FASTA family report file!", even though the
report file produced looks quite normal. Reports from other FASTA
programs seem to be parsed fine. What am I doing wrong? What should I do
to check what's going on?

Thanks,
j.

----)-\//-///-----------------------------------January-Weiner-3-------
This is the end of your work in Windows ?

From jason at cgt.duhs.duke.edu Thu Sep 11 07:46:48 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Sep 11 07:45:18 2003
Subject: [Bioperl-l] Parsing tfasta reports
In-Reply-To:
References:
Message-ID:

I have never had this problem with any of the flavors of fasta I am
running - what version of fasta?

Why don't you post this as a bug report at http://bugzilla.bioperl.org/
with an example report as an attachment.

-jason

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From brian_osborne at cognia.com Thu Sep 11 07:47:09 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Sep 11 07:49:40 2003
Subject: [Bioperl-l] sources code help
In-Reply-To: <1063211575$246224939@lichunjiang@sibs.ac.cn>
Message-ID:

Holly,

Although it sounds like you've created your own relational database, you
might want to consider using a "standard" schema, like Biosql. With the
Biosql database and the bioperl-db package installed you can load your
sequences using the available loading scripts, extract sequences from
the database as sequence objects, and so on. Very powerful, available
for Mysql, Postgres, and Oracle. Take a look at the most recent
biodatabases.pod file for a brief introduction.

Brian O.

From brian_osborne at cognia.com Thu Sep 11 08:21:39 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Sep 11 08:24:10 2003
Subject: [Bioperl-l] phd format conversion.
In-Reply-To: <20030910162712.14078.qmail@web14203.mail.yahoo.com>
Message-ID:

Sankari,

I can't answer your question but I do have a couple of thoughts. First,
you might consider upgrading to version 1.2.2 or getting the upcoming
1.2.3; there are a number of important bug fixes in these 2 versions,
and version 1.0.1 is quite old.

Second, I'm wondering why you want to convert from fasta to phd, since
*phd files contain not just sequence but also quality scores for each
base. Fasta files, of course, don't have quality information. It sounds
like you're testing out various format conversions, but this is one that
normally one would not choose to do. In fact in 1.2.2 this conversion
isn't possible; you get this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::SeqWithQuality object to write_scf as a
parameter named "SeqWithQuality"

Brian O.

From lstein at cshl.edu Thu Sep 11 20:52:49 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu Sep 11 20:57:25 2003
Subject: [Bioperl-l] Bio::DB::GenBank problem
In-Reply-To: <200309111439531.SM02252@magicpc>
References: <200309111439531.SM02252@magicpc>
Message-ID: <200309112052.49357.lstein@cshl.edu>

Just forget about requesting the format. It works fine with both
AC130605 and AC130605.2 with a simple:

    $gb = Bio::DB::GenBank->new();
    $seq=$gb->get_Seq_by_acc('AC130605');

The -format argument is undocumented and intended solely for internal
use, which means that if you use it, expect to get burned.
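
A minimal sketch of the call above, assuming Bio::DB::GenBank is
installed and the NCBI servers are reachable. fetch_by_accession is a
hypothetical helper (not a BioPerl function) that only wraps
get_Seq_by_acc in an eval, so that one missing accession does not abort
a whole batch; nothing touches the network unless accessions are passed
on the command line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: fetch one record and turn a
# thrown exception ("acc does not exist") into undef plus a warning.
sub fetch_by_accession {
    my ($db, $acc) = @_;
    my $seq = eval { $db->get_Seq_by_acc($acc) };
    warn "could not fetch $acc: $@" unless defined $seq;
    return $seq;
}

# Only contact GenBank when accessions are given, e.g.
#   perl fetch.pl AC130605 AC130605.2
if (@ARGV) {
    require Bio::DB::GenBank;
    my $gb = Bio::DB::GenBank->new();    # note: no -format argument
    for my $acc (@ARGV) {
        my $seq = fetch_by_accession($gb, $acc) or next;
        print $seq->display_id, "\t", $seq->length, "\n";
    }
}
```

The proxy setting from the original question is left out here; it would
go right after the constructor, as in the original scripts.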
Lincoln

-- 
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Thu Sep 11 20:53:54 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu Sep 11 20:59:05 2003
Subject: [Bioperl-l] draw vertical line and text string in Bio::Graphics::Panel object
In-Reply-To: <200309111344562.SM02252@magicpc>
References: <200309111344562.SM02252@magicpc>
Message-ID: <200309112053.54519.lstein@cshl.edu>

It is unclear from your description what you want to do, exactly.

Lincoln

On Thursday 11 September 2003 01:48 am, Magic Fang wrote:
> Dear my colleagues, can Bio::Graphics::Panel draw vertical line, and output
> text on the track i want? My task is to draw cDNA to genome mapping. Thank
> you.
>
> Magic Fang

-- 
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From jason at cgt.duhs.duke.edu Thu Sep 11 21:08:30 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Sep 11 21:06:41 2003
Subject: [Bioperl-l] blast overview script
Message-ID:

I wrote a very simple script to display an overview of hits from a
SearchIO parsed report.

scripts/graphics/search_overview.PLS

With a little work we might try and include it as an optional part of
HTMLResultWriter.
-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From wes.barris at csiro.au Fri Sep 12 01:52:32 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Fri Sep 12 01:51:15 2003
Subject: [Bioperl-l] How do you pull features out of a genbank file
Message-ID: <3F615F20.4030803@csiro.au>

Hi,

I have been struggling with the bioperl documentation trying to figure
out how to pull features out of a genbank file. The following perl code
obviously does not work (because I made up the "next_feature" thing).

#!/usr/local/bin/perl -w
#
use strict;
use Bio::SeqIO;
#
my $usage = "Usage $0 \n";
my $infile = shift or die $usage;
my $outfile = shift or die $usage;
my $seq_in = Bio::SeqIO->new( -format => 'genbank', -file => $infile);
my $seq_out = Bio::SeqIO->new( -format => 'fasta', -file => ">$outfile");
while (my $seq = $seq_in->next_seq()) {
   my $newid = "gi|".$seq->primary_id."|ref|".$seq->accession.".".$seq->version."|";
   $seq->id($newid);
   while (my $feature = $seq->next_feature()) { # <-- what should I put here???
      print("$feature\n");
      }
   $seq_out->write_seq($seq);
   }

What I would like to do is to gain access to the variations that are
shown in the sample genbank file snippet below:

FEATURES             Location/Qualifiers
     source          1..1470
                     /organism="Bos taurus"
                     /mol_type="mRNA"
                     /db_xref="taxon:9913"
                     /chromosome="14"
                     /map="14"
     gene            1..1470
                     /gene="DGAT1"
                     /db_xref="LocusID:282609"
     misc_feature    730..1422
                     /gene="DGAT1"
                     /note="ARE1; Region: COG5056, ARE1, Acyl-CoA cholesterol
                     acyltransferase [Lipid metabolism]"
                     /db_xref="CDD:COG5056"
     variation       694
                     /gene="DGAT1"
                     /note="single nucleotide polymorphism (snp)"
                     /replace="a"
     variation       695
                     /gene="DGAT1"
                     /note="single nucleotide polymorphism (snp)"
                     /replace="a"
     variation       747
                     /gene="DGAT1"
                     /note="single nucleotide polymorphism (snp)"
                     /replace="t"
     variation       1235^1236
                     /gene="DGAT1"
                     /replace="g"

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From Richard.Adams at ed.ac.uk Fri Sep 12 05:15:10 2003
From: Richard.Adams at ed.ac.uk (Richard Adams)
Date: Fri Sep 12 05:13:42 2003
Subject: [Bioperl-l] How do you pull features out of a genbank file
Message-ID: <3F618E9E.737F79D9@ed.ac.uk>

Wes,
This should work:

my $seq = $seq_in->next_seq();
## now get an array of all seq features
my @fts = $seq->all_SeqFeatures;
# and an array of alleles
my @allele = grep {$_->primary_tag eq 'variation'} @fts;
for my $var (@allele) {
    print "residue ", $var->start, " has alternative nucleotide ",
          ($var->each_tag_value('replace'))[0], "\n";
}

There is a good explanation of these methods in the Bio::SeqFeatureI and
Bio::SeqFeature::Generic documentation.

Richard

--
Dr Richard Adams
Bioinformatician,
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU

Tel: 44 131 651 1084
richard.adams@ed.ac.uk

From rmdeng5 at yahoo.com Fri Sep 12 05:10:05 2003
From: rmdeng5 at yahoo.com (CAD Engineer)
Date: Fri Sep 12 08:09:47 2003
Subject: [Bioperl-l] Resume from Mechanical, Civil & Structural Designer
Message-ID:
<200309121209.h8CC9QMg023300@portal.open-bio.org> RIC M'SIE San Jose, CA 95131 USA Tel (408) 482-2840 rmdeng2@yahoo.com OBJECTIVE: STRUCTURAL & MECHANICAL DESIGNER CIVIL, ARCHITECTURAL, TRANSPORTATION CAD Operator EXPERIENCE: 89 - present DESIGNER, ENGINEER, CAD MANAGER; Engineering & Design Service, Project Management & Development. Preparing technical documentation, calculations, layouts drawings & propositions. CAD Management and Operations, drafting & redesigning. Intergraph, MicroStation, Autodesk, ACAD, Win, Net, Softdesk Mgmt Civil, Bridges and Structural Design, Plans, Mapping, Detail Freeway & Roadway, data translation & inserting. Script & CAD automation. Geological Structures, Viaducts, Freeways, Highways, Shopping Center. Architectural and Environmental Projects and cooperation; military facilities and plans, Cities, Airports, remediation drawings upgrade, correcting and redesign. Traffic design & problem analyzes-reorganize. Freeway Design & Drafting Support, Site analyzing for Caltrans, Architectural, Archeotype & Electrical drawings, "as is" and initial design; Develop remediation procedure and equipment for lead painted buildings. Construction management, Job site inspection, civil & structural support Mechanical Evaluations - Design - Service and Maintenance; R&D. EDUCATION: College - BS Degree - CAD, Engineering DOS, UNIX, MAC, SUN computers; WP, dBase, Lotus, Network, Lisp, Windows & Applications: Excel, Words, Access, Power Point & more METRIC, SOLAR, AutoCAD/Computer Instructor. Transportation Spec. Personal Designer, MS Project, C, Script, File Management, File transfer, Nastran, Algor, Solid Works. Learn quickly, work independly, shift, overtime. 
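
[Editorial sketch, not part of the original thread.] The approach Richard Adams describes in the "How do you pull features out of a genbank file" thread (collect the sequence's features, keep those whose primary tag is 'variation', read the /replace qualifier) might look roughly like the following. This is a minimal sketch under stated assumptions: it uses get_SeqFeatures, has_tag and get_tag_values, which are the names in later BioPerl releases and may not match the bioperl-1.2.x calls quoted in the thread (all_SeqFeatures, each_tag_value). The helper name variation_report and the in-memory demo record are hypothetical; a real script would take $seq from Bio::SeqIO->next_seq instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical helper: one "position<TAB>alleles" string per
# 'variation' feature attached to the sequence object.
sub variation_report {
    my ($seq) = @_;
    my @lines;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'variation';
        # /replace may be absent, so check before asking for its values
        my @alleles = $feat->has_tag('replace')
                    ? $feat->get_tag_values('replace')
                    : ();
        push @lines, join("\t", $feat->start, join(',', @alleles));
    }
    return @lines;
}

# Stand-in for a record read from a GenBank file (assumption: made-up
# sequence, with two features modelled on the snippet in the thread).
my $seq = Bio::Seq->new(-id => 'demo', -seq => 'ACGT' x 400);
$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -start => 1, -end => 1470,
        -primary => 'gene', -tag => { gene => 'DGAT1' },
    )
);
$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -start => 694, -end => 694,
        -primary => 'variation', -tag => { replace => 'a' },
    )
);

print "$_\n" for variation_report($seq);
```

Filtering on primary_tag rather than on the position in the feature list keeps the code independent of how many source, gene or misc_feature entries precede the variations.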
From lstein at cshl.edu Thu Sep 11 21:07:57 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Sep 12 08:30:01 2003 Subject: [Bioperl-l] Bio::DB::GFF problem In-Reply-To: <3F5F9E7C.9070807@gsf.de> References: <3F5F9E7C.9070807@gsf.de> Message-ID: <200309112107.57017.lstein@cshl.edu> The enclosed script and its output illustrates what you need to do. Your mistake was to fetch "gene_density" features, which are the unaggregated subparts. You want to fetch features of type "density". This is the method name that you gave to the aggregator, and so is the method name that you need to fetch. Also, you don't need to create an aggregator object. This shortcut works just fine: $db = Bio::DB::GFF->new(-adaptor => 'memory', -aggregator => 'density{gene_density}'); The ability to use IO::String as input to load_gff() is a relatively new feature. Lincoln On Wednesday 10 September 2003 05:58 pm, Matthias Wahl wrote: > Hi Lincoln, > > Thank you very much for your input. The $gff_db->segment call now works > fine. > > However, I am still not able to generate an xyplot. Using the following > GFF file: > > 1 chromosome Component 1 195869683 . . . > Sequence "1" > 1 EnsEMBL gene_density 1 200000 0 . . > gene_density "1.density" > 1 EnsEMBL gene_density 200001 400000 0 . . > gene_density "1.density" > 1 EnsEMBL gene_density 400001 600000 0 . . > gene_density "1.density" > 1 EnsEMBL gene_density 600001 800000 0 . . > gene_density "1.density" > 1 EnsEMBL gene_density 800001 1000000 0 . . > gene_density "1.density" > 1 EnsEMBL gene_density 1000001 1200000 0 . . > gene_density "1.density" > .. 
> > I tried to aggregate all gene_density features > > my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'density' > > -sub_parts => 'gene_density'); > > my $gff_db = Bio::DB::GFF->new(-adaptor =>'dbi::mysqlopt', > > -dsn=>'dbi:mysql:Mus_musculus_GFF', > -user => > 'XXXXX', > -pass => > 'XXXXX, -aggregator => $aggregator, > ); > my $gene_density = $gff_db->segment($chromosome_name) or die "Can not > retrieve GFF segment: ".$gff_db->error; > my @features = $gene_density->features('gene_density'); > > This all works fine. But when I try to draw a xyplot > > $picture->add_track(\@features, > -glyph => 'xyplot'); > > I end up with bars in my image, but no plot! > > Could you by chance tell me what I am doing wrong? > > Many thanks > > Matthias > > Lincoln Stein wrote: > >Hi, > > > >I'm catching up on my bioperl mail after being on vacation for August. > > The problem is that you need an entry for the whole chromosome because > > the segment() call needs to retrieve it in order to get the length. > > Something like this will do the trick: > > > >1 EnsEMBL chromosome 1 1502911 . . . Chromosome 1 > > > >Lincoln > > > >On Wednesday 13 August 2003 07:02 pm, Matthias Wahl wrote: > >>Hi all! > >> > >>I have trouble in using Bio::DB::GFF with the following code: > >> > >>my $aggregator = Bio::DB::GFF::Aggregator->new(-method => > >> 'gene_density' -sub_parts => > >>'EnsEMBL:gene_density'); > >> > >>my $gff_db = Bio::DB::GFF->new(-adaptor =>'dbi::mysqlopt', > >> -dsn=>'dbi:mysql:Mus_musculus_GFF', > >> -user => 'xxxxx', > >> -pass => 'xxxxx', > >> -aggregator => $aggregator > >> ); > >> > >> > >>Calling > >> > >>$gff_db->segment(-class=>'Chromosome', > >> -value=>'1'); > >> > >>always returns undef (whatever arguments I use)! 
> >>The database has been generated by loading a GFF file of the following > >>format: > >> > >>1 EnsEMBL gene_density 1000001 2000000 0 > >>Chromosome 1 > >> > >>1 EnsEMBL gene_density 2000001 3000000 0 > >>Chromosome 1 > >> > >>1 EnsEMBL gene_density 3000001 4000000 1 > >>Chromosome 1 > >> > >>1 EnsEMBL gene_density 4000001 5000000 12 > >>Chromosome 1 > >> > >>1 EnsEMBL gene_density 5000001 6000000 4 > >>Chromosome 1 > >> > >>with load_gff.PLS (columns are tab-seperated, the 9th column consists of > >>'Chromosome' and name, seperated by space), both with and without the > >>associated sequence file. > >> > >>Calling > >> > >>$gff_db->features() > >> > >>works fine. But I need aggregated features for generating a > >>Bio::Graphics xyplot (to plot the gene density for a particular > >>chromosome). > >> > >>Many thanks, > >> > >>Matthias -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) -------------- next part -------------- A non-text attachment was scrubbed... Name: gene_density_histo.pl Type: text/x-perl Size: 1168 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20030911/e0e4affa/gene_density_histo.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: histogram.png Type: image/png Size: 1100 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20030911/e0e4affa/histogram.png From lstein at cshl.edu Fri Sep 12 09:40:29 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Sep 12 10:03:46 2003 Subject: [Bioperl-l] Bio::DB::GFF & Bio::Graphics updates in branch-1-2 Message-ID: <200309120940.29910.lstein@cshl.edu> Hi, I've folded recent changes to Bio::DB::GFF and Bio::Graphics into the branch, and have also updated the tutorial material. There's also now scripts for populating Bio::DB::GFF databases from Genbank, EMBL and UCSC. 
Some intrepid soul should see whether the EMBL loader works with EnsEMBL's EMBL format. I inadvertently broke Bio::DB::GFF's flat file loading last night through this morning when adding support for tied filehandles. My apologies to anyone who was unlucky enough to CVS update during that period of time. Lincoln From markw at illuminae.com Fri Sep 12 10:35:44 2003 From: markw at illuminae.com (Mark Wilkinson) Date: Fri Sep 12 10:35:46 2003 Subject: [BioPerl] Re: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script In-Reply-To: <200309091547.55821.lstein@cshl.edu> References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk> <200309091547.55821.lstein@cshl.edu> Message-ID: <1063377422.1754.23.camel@localhost.localdomain> there is actually a similar problem somewhere else in the code. Even if you use retrievaltype => 'io_string', there are cases where it will fail with the same symptoms. If you try to do a get_Seq_by_acc using a RefSeq identifier (e.g. NC_003992) you get the following warning in your errorlog: -------------------- WARNING --------------------- MSG: [gb|NC_003992] is not a normal sequence database but a RefSeq entry. Redirecting the request. Unfortunately, somewhere in that redirection something is printed to STDOUT because the next message is: --------------------------------------------------- [Fri Sep 12 11:29:50 2003] [error] [client 24.78.208.156] malformed header from script. Bad header=LOCUS NC_003992 : Services.cgi [Fri Sep 12 11:29:50 2003] [warn] /cgi-bin/Services.cgi did not send an HTTP header So, this re-direction fails in a CGI environment :-( Same problem with retrievaltype => 'tempfile' M On Tue, 2003-09-09 at 13:47, Lincoln Stein wrote: > Sorry about any confusion this caused. However, it is mentioned in the docs > for WebDBSeqI. Perhaps the default should be changed to "tempfile", which > should work in all cases. 
> > Lincoln > > On Wednesday 20 August 2003 01:05 pm, Jason Stajich wrote: > > > So your script is doing what it's supposed to, it's just that some other > > > stuff is getting out on STDOUT before your webserver is able to get in > > > on the act. > > > > > > Having played a bit, this proves to be interesting: > > > > > > #!/usr/bin/perl -w > > > use strict; > > > use Bio::DB::GenBank; > > > > > > close STDOUT; > > > > > > my $d = Bio::DB::GenBank->new(); > > > my $seq = $d -> get_Seq_by_gi('163483'); > > > > > > > > > This gives me: > > > > > > print() on closed filehandle STDOUT at > > > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701 > > > > > > So WebDBSeqI.pm is usurping STDOUT as part of its query. This probably > > > explains what you're getting. Apache will redirect STDOUT straight to > > > the return stream for the connection. This means it gets the output > > > intended for WbDBSeq and it appears in your programs output. You then > > > get the output you printed. > > > > This is part of Lincoln's rechaining of the IO and using fork - looking > > at his comments in the code. > > # Try to create a stream using POSIX fork-and-pipe facility. > > # this is a *big* win when fetching thousands of sequences from > > # a web database because we can return the first entry while > > # transmission is still in progress. > > # Also, no need to keep sequence in memory or in a temporary file. > > # If this fails (Windows, MacOS 9), we fall back to non-pipelined > > # access. 
> > > > You can turn this off by adding to the DB::GenBank init > > my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string'); > > > > -retrievaltype => 'io_string' (for in-memory holding of the sequence > > before parsing) > > or > > -retrievaltype => 'temp' (for use of tempfiles, but I'm not 100% > > this code has gotten a workout to cleanup > > until the program exits which might be > > a problem for mod_perl running scripts) > > > > > If this is right, you should have some interesting error messages in > > > your logs if you run your script with warnings enabled. > > > > > > I can't see an immediate fix for this, short of running your fetch as a > > > completely detached process with a separate STDOUT, but that kind of > > > defeats the point of using mod-perl. The use of a pipe from STDOUT to > > > read the results of a webquery seem pretty engrained into WebQueryI.pm > > > and it may not be trivial to change it. > > > > > > Maybe others will be able to think of a simpler work-round? > > > > > > > > > Simon. 
> > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Mark Wilkinson Illuminae From jason at cgt.duhs.duke.edu Fri Sep 12 10:43:57 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Sep 12 10:42:26 2003 Subject: [BioPerl] Re: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script In-Reply-To: <1063377422.1754.23.camel@localhost.localdomain> References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk> <200309091547.55821.lstein@cshl.edu> <1063377422.1754.23.camel@localhost.localdomain> Message-ID: You might try setting verbose => -1 in your code which uses Bio::DB::GenBank. On Fri, 12 Sep 2003, Mark Wilkinson wrote: > there is actually a similar problem somewhere else in the code. Even if > you use retrievaltype => 'io_string', there are cases where it will fail > with the same symptoms. > > If you try to do a get_Seq_by_acc using a RefSeq identifier (e.g. > NC_003992) you get the following warning in your errorlog: > > -------------------- WARNING --------------------- > MSG: [gb|NC_003992] is not a normal sequence database but a RefSeq > entry. Redirecting the request. > > Unfortunately, somewhere in that redirection something is printed to > STDOUT because the next message is: > > > --------------------------------------------------- > [Fri Sep 12 11:29:50 2003] [error] [client 24.78.208.156] malformed > header from script. 
Bad header=LOCUS NC_003992 : > Services.cgi > [Fri Sep 12 11:29:50 2003] [warn] /cgi-bin/Services.cgi did not send an > HTTP header > > So, this re-direction fails in a CGI environment :-( > > Same problem with retrievaltype => 'tempfile' > > M > > > > On Tue, 2003-09-09 at 13:47, Lincoln Stein wrote: > > Sorry about any confusion this caused. However, it is mentioned in the docs > > for WebDBSeqI. Perhaps the default should be changed to "tempfile", which > > should work in all cases. > > > > Lincoln > > > > On Wednesday 20 August 2003 01:05 pm, Jason Stajich wrote: > > > > So your script is doing what it's supposed to, it's just that some other > > > > stuff is getting out on STDOUT before your webserver is able to get in > > > > on the act. > > > > > > > > Having played a bit, this proves to be interesting: > > > > > > > > #!/usr/bin/perl -w > > > > use strict; > > > > use Bio::DB::GenBank; > > > > > > > > close STDOUT; > > > > > > > > my $d = Bio::DB::GenBank->new(); > > > > my $seq = $d -> get_Seq_by_gi('163483'); > > > > > > > > > > > > This gives me: > > > > > > > > print() on closed filehandle STDOUT at > > > > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701 > > > > > > > > So WebDBSeqI.pm is usurping STDOUT as part of its query. This probably > > > > explains what you're getting. Apache will redirect STDOUT straight to > > > > the return stream for the connection. This means it gets the output > > > > intended for WbDBSeq and it appears in your programs output. You then > > > > get the output you printed. > > > > > > This is part of Lincoln's rechaining of the IO and using fork - looking > > > at his comments in the code. > > > # Try to create a stream using POSIX fork-and-pipe facility. > > > # this is a *big* win when fetching thousands of sequences from > > > # a web database because we can return the first entry while > > > # transmission is still in progress. > > > # Also, no need to keep sequence in memory or in a temporary file. 
> > > # If this fails (Windows, MacOS 9), we fall back to non-pipelined > > > # access. > > > > > > You can turn this off by adding to the DB::GenBank init > > > my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string'); > > > > > > -retrievaltype => 'io_string' (for in-memory holding of the sequence > > > before parsing) > > > or > > > -retrievaltype => 'temp' (for use of tempfiles, but I'm not 100% > > > this code has gotten a workout to cleanup > > > until the program exits which might be > > > a problem for mod_perl running scripts) > > > > > > > If this is right, you should have some interesting error messages in > > > > your logs if you run your script with warnings enabled. > > > > > > > > I can't see an immediate fix for this, short of running your fetch as a > > > > completely detached process with a separate STDOUT, but that kind of > > > > defeats the point of using mod-perl. The use of a pipe from STDOUT to > > > > read the results of a webquery seem pretty engrained into WebQueryI.pm > > > > and it may not be trivial to change it. > > > > > > > > Maybe others will be able to think of a simpler work-round? > > > > > > > > > > > > Simon. 
> > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Jason Stajich > > > Duke University > > > jason at cgt.mc.duke.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Fri Sep 12 10:54:21 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Sep 12 10:53:07 2003 Subject: [BioPerl] Re: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script In-Reply-To: <1063377422.1754.23.camel@localhost.localdomain> References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk> <200309091547.55821.lstein@cshl.edu> <1063377422.1754.23.camel@localhost.localdomain> Message-ID: So it is because the new DB::RefSeq object which is created doesn't inherit the input params of the DB::GenBank object. Applied a fix in CVS. Added a method to Bio::DB::NCBIHelper which has the method refseq_db which you can use to get/set the Bio::DB::RefSeq object. This should also be a little smarter/faster for repeated queries on the same db handle since it caches the RefSeq handle. Interestingly we have only implemented RefSeq retrieval from the EBI server - someone should look into how to best retrieve RefSeqs from Entrez as well and make an RefSeqEntrez interface. setting $db->verbose(-1) will prevent the redirection -jason On Fri, 12 Sep 2003, Mark Wilkinson wrote: > there is actually a similar problem somewhere else in the code. Even if > you use retrievaltype => 'io_string', there are cases where it will fail > with the same symptoms. > > If you try to do a get_Seq_by_acc using a RefSeq identifier (e.g. 
> NC_003992) you get the following warning in your errorlog: > > -------------------- WARNING --------------------- > MSG: [gb|NC_003992] is not a normal sequence database but a RefSeq > entry. Redirecting the request. > > Unfortunately, somewhere in that redirection something is printed to > STDOUT because the next message is: > > > --------------------------------------------------- > [Fri Sep 12 11:29:50 2003] [error] [client 24.78.208.156] malformed > header from script. Bad header=LOCUS NC_003992 : > Services.cgi > [Fri Sep 12 11:29:50 2003] [warn] /cgi-bin/Services.cgi did not send an > HTTP header > > So, this re-direction fails in a CGI environment :-( > > Same problem with retrievaltype => 'tempfile' > > M > > > > On Tue, 2003-09-09 at 13:47, Lincoln Stein wrote: > > Sorry about any confusion this caused. However, it is mentioned in the docs > > for WebDBSeqI. Perhaps the default should be changed to "tempfile", which > > should work in all cases. > > > > Lincoln > > > > On Wednesday 20 August 2003 01:05 pm, Jason Stajich wrote: > > > > So your script is doing what it's supposed to, it's just that some other > > > > stuff is getting out on STDOUT before your webserver is able to get in > > > > on the act. > > > > > > > > Having played a bit, this proves to be interesting: > > > > > > > > #!/usr/bin/perl -w > > > > use strict; > > > > use Bio::DB::GenBank; > > > > > > > > close STDOUT; > > > > > > > > my $d = Bio::DB::GenBank->new(); > > > > my $seq = $d -> get_Seq_by_gi('163483'); > > > > > > > > > > > > This gives me: > > > > > > > > print() on closed filehandle STDOUT at > > > > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701 > > > > > > > > So WebDBSeqI.pm is usurping STDOUT as part of its query. This probably > > > > explains what you're getting. Apache will redirect STDOUT straight to > > > > the return stream for the connection. This means it gets the output > > > > intended for WbDBSeq and it appears in your programs output. 
You then > > > > get the output you printed. > > > > > > This is part of Lincoln's rechaining of the IO and using fork - looking > > > at his comments in the code. > > > # Try to create a stream using POSIX fork-and-pipe facility. > > > # this is a *big* win when fetching thousands of sequences from > > > # a web database because we can return the first entry while > > > # transmission is still in progress. > > > # Also, no need to keep sequence in memory or in a temporary file. > > > # If this fails (Windows, MacOS 9), we fall back to non-pipelined > > > # access. > > > > > > You can turn this off by adding to the DB::GenBank init > > > my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string'); > > > > > > -retrievaltype => 'io_string' (for in-memory holding of the sequence > > > before parsing) > > > or > > > -retrievaltype => 'temp' (for use of tempfiles, but I'm not 100% > > > this code has gotten a workout to cleanup > > > until the program exits which might be > > > a problem for mod_perl running scripts) > > > > > > > If this is right, you should have some interesting error messages in > > > > your logs if you run your script with warnings enabled. > > > > > > > > I can't see an immediate fix for this, short of running your fetch as a > > > > completely detached process with a separate STDOUT, but that kind of > > > > defeats the point of using mod-perl. The use of a pipe from STDOUT to > > > > read the results of a webquery seem pretty engrained into WebQueryI.pm > > > > and it may not be trivial to change it. > > > > > > > > Maybe others will be able to think of a simpler work-round? > > > > > > > > > > > > Simon. 
> > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Jason Stajich > > > Duke University > > > jason at cgt.mc.duke.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From lstein at cshl.edu Fri Sep 12 12:03:28 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Sep 12 12:02:49 2003 Subject: cloning and Storable Re: [Bioperl-l] bugs on branch; tests on main trunk In-Reply-To: References: Message-ID: <200309121203.28932.lstein@cshl.edu> I don't want to see clone() placed in Bio::Root::Root, because as Ewan says it is not guaranteed to work in all cases, and will probably break at the worst time. Also, I tend to use non-hashed implementations such as blessed arrayrefs and flyweights that will break generic cloning code that expects a hashref. I don't mind seeing it placed into a util class that can be multiply-inherited by a subclass that needs the functionality: package Bio::Root::Cloneable; sub clone { my $self = shift; my %copy = %$self; return bless \%copy,ref $self; } ... package NaiveSubclass; @ISA = qw(Bio::Root::Root Bio::Root::Cloneable); ... package SomethingElse; sub do_something_that_needs_cloning { my $self = shift; my $obj = shift; if ($obj->can('clone')) { } else { $self->throw('passed an unclonable object'); } } Lincoln On Thursday 04 September 2003 03:53 am, Ewan Birney wrote: > On Wed, 3 Sep 2003, Heikki Lehvaslaiho wrote: > > I've removed the dependency for Storable. Storable is still used if it > > is installed. Local code can clone everything except circular > > references. If someone knows how to do it, I'd be happy to receive help. 
> > Not having it here does not really matter because the the main use of > > the clone method is to allow in-memory creation of a new enzyme based on > > an existing one. > > > > The clone code is written in very general way and should be able to deep > > copy any in-memory objects. If you need to add a clone method your own > > classes, copy from there. Ewan feels strongly that deep cloning is too > > prone to errors to be a general property of bioperl objects, so better > > not add this into Bio::Root::Root, although it would be handy. > > I am willing to be overruled if there are alot of people who agree with > Heikki, but clone() methods are, in my view, just promise something (the > ability to correctly make a independent copy of all connected objects) > without being able to deliver. > > > The problem is with objects that either have eccentric memory layouts > (such as bound XS code; not that we have many of these) or have implicit > singleton style characteristics (eg, adaptors to databases which have > session information). a clone() which naively attempts to just in-memory > copy everything with truely fall over on teh first case and probably cause > a complex problem on the second case. Remember that these objects may not > be the top level ones, but rather be held onto in the object graph. > > > Furthermore, I rarely see the need for clone; in most systems just > reference passing is fine, and clone() is at best used as a shorthand for > a specific constructor, (which is what it is doing in restriction enzyme) > where I would argue the "full memory copy" is really a shorthand for > "build me a new RE with precisely the same attributes" which can then be > modified. > > > So, I would argue that clone() on RE's is better written as a type of > new option > > $new_re = new RestrictionEnzyme ( -template => $old_re); > > > and we don't have clone on the Root::Object. 
Currently Heikki is swayed
> enough by this argument to keep the clone() method specific to RE's.
>
>
> If Jason/Lincoln/Hilmar all (or mostly...) liked clone() on the Root
> object then I'd have to concede
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================

From jiang_holly2003 at hotmail.com Fri Sep 12 14:10:05 2003
From: jiang_holly2003 at hotmail.com (Jiang Holly)
Date: Fri Sep 12 14:08:33 2003
Subject: [Bioperl-l] puzzling questions:genbank file changed?
Message-ID:

Hi,

I've encountered a puzzling problem. Since the human genome data set is
too large to run a local blast against, I format a fasta file for each
chromosome contig (e.g. NT_011875). I put the corresponding genbank file
in the directory D:\data\NT_011875. When the program reads the genbank
file, the following error occurs:

D:\data\NT_011875\NT_011903.gbk not found.

When I rename NT_011875.gbk to NT_011903.gbk, the program runs well.

Thanks very much for any help!

Holly

From vesko_baev at abv.bg Fri Sep 12 15:11:14 2003
From: vesko_baev at abv.bg (Vesko Baev)
Date: Fri Sep 12 15:09:46 2003
Subject: [Bioperl-l] Bio::Graphics output
Message-ID: <1950800394.1063393874989.JavaMail.nobody@app2.ni.bg>

Hello,
I've read the Graphics HOWTO, and all the scripts in there end with
"print $panel->png;". But my script is a CGI and it generates an HTML
page. What should I put at the end of my script to show the image in the
HTML page?

Thanks!
From Marc.Logghe at devgen.com Fri Sep 12 15:17:24 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Fri Sep 12 15:26:51 2003 Subject: [Bioperl-l] Bio::Graphics output Message-ID: > -----Original Message----- > From: Vesko Baev [mailto:vesko_baev@abv.bg] > Sent: Friday, September 12, 2003 9:11 PM > To: bioperl-l@bioperl.org > Subject: [Bioperl-l] Bio::Graphics output > > > Hello, > I've read the Graphics:HOWTO, and all the scripts in there > end with "print $panel->png;". > But my script is a CGI and it makes an HTML page. What should I put > at the end of my script to generate an image in the HTML page? You have to output the image to a file and make an HTML img tag pointing to that file. HTH, Marc
From nel at birc.dk Fri Sep 12 15:39:47 2003 From: nel at birc.dk (Niels Larsen) Date: Fri Sep 12 15:43:07 2003 Subject: [Bioperl-l] Bio::Graphics output In-Reply-To: References: Message-ID: <200309122139.47541.nel@birc.dk> > You have to output the image to a file and make an html img tag pointing to that file. > HTH, > Marc Or have a script make it while the page is loaded. That works well for a few large images; many small ones would slow the server. Niels L
From vesko_baev at abv.bg Fri Sep 12 16:27:12 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Fri Sep 12 16:25:47 2003 Subject: [Bioperl-l] Bio::Graphics output to file! Message-ID: <683144613.1063398432791.JavaMail.nobody@storage.ni.bg> Hi, I created an empty file called image.png and wrote in my script: open (IMAGEFILE,">image.png"); print IMAGEFILE $panel->png; The script runs OK, but when I open the file with my favorite image software I get the error message: "PNG decoder error", "this is not valid png file" ?!?!?
From Jonathan_Epstein at nih.gov Fri Sep 12 17:29:15 2003 From: Jonathan_Epstein at nih.gov (Jonathan Epstein) Date: Fri Sep 12 17:31:35 2003 Subject: [Bioperl-l] Bio::Graphics output to file! In-Reply-To: <683144613.1063398432791.JavaMail.nobody@storage.ni.bg> Message-ID: <5.1.1.6.0.20030912172711.01d060b8@nihexchange4.nih.gov> Making it a binary file would probably help, depending upon the operating system and version of Perl you're using, i.e. open (IMAGEFILE,">image.png"); binmode IMAGEFILE; print IMAGEFILE $panel->png; HTH, Jonathan At 11:27 p.m. 9/12/2003 +0300, Vesko Baev wrote: >Hi, >I created an empty file called image.png and wrote in my script: > >open (IMAGEFILE,">image.png"); >print IMAGEFILE $panel->png; > >The script runs OK, but when I open the file with my favorite image software I get the error message: >"PNG decoder error", "this is not valid png file" > >?!?!?
From Birth at mpimp-golm.mpg.de Fri Sep 12 03:00:10 2003 From: Birth at mpimp-golm.mpg.de (Petra Birth) Date: Fri Sep 12 18:04:14 2003 Subject: [Bioperl-l] search/parse in genomatix Message-ID: <6039BB6EE5EEBB4C8F2A78F4874861D83CF839@MAIL.mpimp-golm.mpg.de> Hi, I am trying to search for known cis elements in the promoter region of many genes of Arabidopsis (in genomatix/martinspector). Does anybody know if a Bioperl module exists to do this? Thanks Petra
From Birth at mpimp-golm.mpg.de Thu Sep 11 09:39:17 2003 From: Birth at mpimp-golm.mpg.de (Petra Birth) Date: Fri Sep 12 18:04:24 2003 Subject: [Bioperl-l] search in genomatix Message-ID: <6039BB6EE5EEBB4C8F2A78F4874861D83CF838@MAIL.mpimp-golm.mpg.de> Hi, I am trying to search in genomatix (martinspector) for known cis elements in the promoter region of genes of Arabidopsis. Does somebody know if a bioperl tool exists to do this?
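[Editor's sketch, relating to the "Bio::Graphics output to file!" thread earlier in this archive.] The binmode suggestion can be tried without Bio::Graphics at all. The following is only a minimal illustration in plain Perl: the helper name write_png_file and the sample bytes are assumptions made for the example, and in a real script the data would be whatever $panel->png returns.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): write raw image bytes to a
# file without any newline translation.
sub write_png_file {
    my ($path, $bytes) = @_;
    open(my $fh, '>', $path) or die "Cannot open $path for writing: $!";
    binmode($fh);    # avoid LF -> CRLF translation on platforms such as Windows
    print {$fh} $bytes or die "Cannot write to $path: $!";
    close($fh) or die "Cannot close $path: $!";
    return -s $path; # size on disk in bytes
}

# Stand-in data: the 8-byte PNG signature plus two bytes that text mode
# on Windows would rewrite, which is the kind of damage a decoder rejects.
my $sample = "\x89PNG\r\n\x1a\n" . "\x0a\x0a";
my $size   = write_png_file('image.png', $sample);
print "wrote $size bytes\n";
```

In a CGI script that prints the image directly, the same binmode call would apply to STDOUT before the image data is printed.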
From Birth at mpimp-golm.mpg.de Thu Sep 11 09:29:22 2003 From: Birth at mpimp-golm.mpg.de (Petra Birth) Date: Fri Sep 12 18:04:29 2003 Subject: [Bioperl-l] modul to search in genomatix Message-ID: <6039BB6EE5EEBB4C8F2A78F4874861D83CF837@MAIL.mpimp-golm.mpg.de> Hi, I am trying to search in genomatix (martinspector) for known cis elements in the promoter region of genes of Arabidopsis. Does somebody know if a bioperl tool exists to do this?
From brian_osborne at cognia.com Fri Sep 12 12:33:18 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Sep 12 19:31:23 2003 Subject: [Bioperl-l] phd format conversion. In-Reply-To: <20030912161436.98479.qmail@web14206.mail.yahoo.com> Message-ID: Sankari, If fasta is the only format available to you then you'll probably have to create a SeqWithQuality object using the sequence in the fasta files. Take a look at this link: http://bioperl.org/Core/Latest/bptutorial.html#iii.7.6_incorporating_quality_data_in_sequence_annotation_(seqwithquality) Since there's no quality data in the fasta file you'll just have to make up reasonable values. If you're new to bioperl then I'd also recommend that you take a look at the other bptutorial sections on the Seq object and on SeqIO, as it sounds like you're going to be reading from and writing to sequence files. Brian O. -----Original Message----- From: sankari thirumal [mailto:sankari_thirumal@yahoo.com] Sent: Friday, September 12, 2003 12:15 PM To: Brian Osborne Subject: RE: [Bioperl-l] phd format conversion. Dear sir, I want to use the phd format for SNP detection. Only phd format is accepted by the software for SNP detection. So it is mandatory for me to convert the sequence to .phd format. Please help me in this regard. with regards sankari Brian Osborne wrote: Sankari, I can't answer your question but I do have a couple of thoughts.
First, you might consider upgrading to version 1.2.2 or get the upcoming 1.2.3, there are a number of important bug fixes in these 2 versions, version 1.0.1 is quite old. Second, I'm wondering why you want to convert from fasta to phd, since *phd files contain not just sequence but also quality scores for each base. Fasta files, of course, don't have quality information. It sounds like you're testing out various format conversions but this is one that normally one would not choose to do. In fact in 1.2.2 this conversion isn't possible, you get this error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: You must pass a Bio::Seq::SeqWithQuality object to write_scf as a parameter named "SeqWithQuality" Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of sankari thirumal Sent: Wednesday, September 10, 2003 12:27 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] phd format conversion. Dear sir, i'm using bioperl version 1.0.1. I could convert fasta to most of the formats except phd. I'm sending the error mesaage. please explain me what i should do. the error message is as follows: [root@Host0 bioperl-0.9.0]# perl conphd.pl Bio/SeqIO/phd.pm: phd cannot be found Exception Can't locate Bio/SeqIO/phd.pm in @INC (@INC contains: /usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl .) at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm line 477. For more information about the SeqIO system please see the SeqIO docs. This includes ways of checking for formats at compile time, not run time Can't use an undefined value as a symbol reference at conphd.pl line 9, line 1. 
I'll be grateful if you could give suggestions to solve this problem. with regards sankari
From lehvasla at ebi.ac.uk Sun Sep 14 00:39:03 2003 From: lehvasla at ebi.ac.uk (lehvasla@ebi.ac.uk) Date: Sun Sep 14 00:37:33 2003 Subject: cloning and Storable Re: [Bioperl-l] bugs on branch; tests on main trunk In-Reply-To: <200309121203.28932.lstein@cshl.edu> References: <200309121203.28932.lstein@cshl.edu> Message-ID: <1123.144.137.208.245.1063514343.squirrel@webmail.ebi.ac.uk> This is the best suggestion so far. Bio::Root::Cloneable it will be. -Heikki > I don't want to see clone() placed in Bio::Root::Root, because as Ewan > says it > is not guaranteed to work in all cases, and will probably break at the > worst > time. Also, I tend to use non-hashed implementations such as blessed > arrayrefs and flyweights that will break generic cloning code that expects > a > hashref. I don't mind seeing it placed into a util class that can be > multiply-inherited by a subclass that needs the functionality: > > package Bio::Root::Cloneable; > sub clone { > my $self = shift; > my %copy = %$self; > return bless \%copy,ref $self; > } > > ... > > package NaiveSubclass; > @ISA = qw(Bio::Root::Root Bio::Root::Cloneable); > > ... > package SomethingElse; > sub do_something_that_needs_cloning { > my $self = shift; > my $obj = shift; > if ($obj->can('clone')) { > } > else { > $self->throw('passed an unclonable object'); > } > } > > Lincoln > > On Thursday 04 September 2003 03:53 am, Ewan Birney wrote: >> On Wed, 3 Sep 2003, Heikki Lehvaslaiho wrote: >> > I've removed the dependency for Storable. Storable is still used if it >> > is installed. Local code can clone everything except circular >> > references. If someone knows how to do it, I'd be happy to receive >> help.
>> > Not having it here does not really matter because the the main use of >> > the clone method is to allow in-memory creation of a new enzyme based >> on >> > an existing one. >> > >> > The clone code is written in very general way and should be able to >> deep >> > copy any in-memory objects. If you need to add a clone method your own >> > classes, copy from there. Ewan feels strongly that deep cloning is too >> > prone to errors to be a general property of bioperl objects, so better >> > not add this into Bio::Root::Root, although it would be handy. >> >> I am willing to be overruled if there are alot of people who agree with >> Heikki, but clone() methods are, in my view, just promise something (the >> ability to correctly make a independent copy of all connected objects) >> without being able to deliver. >> >> >> The problem is with objects that either have eccentric memory layouts >> (such as bound XS code; not that we have many of these) or have implicit >> singleton style characteristics (eg, adaptors to databases which have >> session information). a clone() which naively attempts to just in-memory >> copy everything with truely fall over on teh first case and probably >> cause >> a complex problem on the second case. Remember that these objects may >> not >> be the top level ones, but rather be held onto in the object graph. >> >> >> Furthermore, I rarely see the need for clone; in most systems just >> reference passing is fine, and clone() is at best used as a shorthand >> for >> a specific constructor, (which is what it is doing in restriction >> enzyme) >> where I would argue the "full memory copy" is really a shorthand for >> "build me a new RE with precisely the same attributes" which can then be >> modified. >> >> >> So, I would argue that clone() on RE's is better written as a type of >> new option >> >> $new_re = new RestrictionEnzyme ( -template => $old_re); >> >> >> and we don't have clone on the Root::Object. 
Current Heikki is swayed >> enough by this argument to keep the clone() method specific to RE's. >> >> >> If Jason/Lincoln/Hilmar all (or mostly...) liked clone() on the Root >> object then I'd have to conceed >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > ======================================================================== > Lincoln D. Stein Cold Spring Harbor Laboratory > lstein@cshl.org Cold Spring Harbor, NY > ======================================================================== > > From hag442 at mail.usask.ca Sun Sep 14 01:11:39 2003 From: hag442 at mail.usask.ca (Haidan Guo) Date: Sun Sep 14 01:10:07 2003 Subject: [Bioperl-l] About the bioperl on Tigr_XML access problem Message-ID: <1063516299.3f63f88b335f6@my.usask.ca> Dear Mr. Josh Lauricha: I have seen your posting on the web about to access the TIGR_XML format by bioperl SeqIO on May 28,2003. I am interested in it. May I ask you a question that "Have you gotten any result about this now from the web answer or your reseach result"? Best Regards Wendy Guo From brian_osborne at cognia.com Mon Sep 15 07:46:00 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Sep 15 07:48:24 2003 Subject: [Bioperl-l] RE: [NORDNS] [Bioperl-guts-l] [Bug 1508] SeqIO allows fasta to masquerade as genbank or gcg or ace In-Reply-To: <200309132221.h8DMLO2x021465@portal.open-bio.org> Message-ID: Jason, The problem is fundamentally one of consistency. If you give SeqIO a fasta file and say "-format => genbank" then there's no error. If you give the same file with "-format => embl" there's an error. In my opinion there should always be an error so one can use eval. One way or the other, let's make up our minds! Brian O. 
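[Editor's sketch.] The point about wanting a consistent failure "so one can use eval" can be illustrated without SeqIO itself. The code below is only a sketch under stated assumptions: the two parser stand-ins and the helper try_next_seq are hypothetical and are not BioPerl code. They mimic a reader that dies on the wrong format and one that silently returns nothing, and show a caller treating both outcomes as failure.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a parser's next_seq(): one dies on
# unexpected input, the other quietly returns nothing.
my %parser = (
    strict  => sub { die "input does not look like the stated format\n" },
    lenient => sub { return },
);

# Treat both an exception and an empty result as failure, so the caller
# gets a consistent answer whichever way the parser behaves.
sub try_next_seq {
    my ($next_seq) = @_;
    my $seq = eval { $next_seq->() };
    if ($@) {
        my $msg = $@;
        chomp $msg;
        return (undef, "parse error: $msg");
    }
    return (undef, "no sequence read") unless defined $seq;
    return ($seq, undef);
}

for my $name (sort keys %parser) {
    my ($seq, $err) = try_next_seq($parser{$name});
    print "$name: ", (defined $err ? $err : "ok"), "\n";
}
```

The design choice here is that the caller, not the parser, decides that an empty result counts as an error; that sidesteps the empty-file question raised in the bug comments at the cost of one extra check per call.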
~>perl -e 'use Bio::SeqIO; $io = Bio::SeqIO->new(-file => "test.fa", -format => "genbank" ); $seq = $io->next_seq;' Giving SeqIO a fasta file with a stated format of "genbank", "ace", or "gcg" and calling next_seq() results in no error. Change "genbank" to "embl", "swiss", "raw", "fastq" or "pir" and it's a fatal error where the message hints that the actual and stated formats are different. My guess is that the latter result is preferable. -----Original Message----- From: bioperl-guts-l-bounces@portal.open-bio.org [mailto:bioperl-guts-l-bounces@portal.open-bio.org]On Behalf Of bugzilla-daemon@portal.open-bio.org Sent: Saturday, September 13, 2003 6:21 PM To: bioperl-guts-l@bioperl.org Subject: [NORDNS] [Bioperl-guts-l] [Bug 1508] SeqIO allows fasta to masquerade as genbank or gcg or ace http://bugzilla.bioperl.org/show_bug.cgi?id=1508 ------- Additional Comments From jason@open-bio.org 2003-09-13 18:21 ------- So you're wanting it to throw a nonfatal error when we've specified the format as 'FASTA' but in fact the format is not FASTA? I don't know how to do this and not also throw an error when passed an empty file. (Maybe we should be throwing non-fatal errors there as well). My strategy would be to throw some sort of warning/error when we get to the end of the file/stream and the number of sequences read is still 0. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
From apark at dyax.com Mon Sep 15 09:48:21 2003 From: apark at dyax.com (Al Park) Date: Mon Sep 15 09:46:47 2003 Subject: [Bioperl-l] SOAP and bioperl Message-ID: Hello everyone, I have encountered a problem using SOAP::Lite and bioperl and I was wondering if anyone knew of a solution.
My SOAP client calls a perl module which executes the run method in the Factory::EMBOSS module. The error that the client generates is: Can't call method "run" on an undefined variable at xxx where the undefined variable is $app. I use the following in my module being called by the SOAP client: my $factory = Bio::Factory::EMBOSS -> new(); my $app = $factory->program($program); $app->run({ '-sequencea' => $seq_to_test, '-graph' => 'none', '-outfile' => $outfile}); Anyone have a solution, tell me what I'm doing wrong, or can just point me in a direction? Thanks in advance! -Al Park From brian_osborne at cognia.com Mon Sep 15 10:41:03 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Sep 15 10:43:40 2003 Subject: [NORDNS] [Bioperl-l] SOAP and bioperl In-Reply-To: Message-ID: Al, What is the value of $program in your code? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Al Park Sent: Monday, September 15, 2003 9:48 AM To: bioperl-l@portal.open-bio.org Subject: [NORDNS] [Bioperl-l] SOAP and bioperl Hello everyone, I have encountered a problem using SOAP::Lite and bioperl and I was wondering if anyone knew of a solution. My SOAP client calls a perl module which executes the run method in the Factory::EMBOSS module. The error that the client generates is: Can't call method "run" on an undefined variable at xxx where the undefined variable is $app. I use the following in my module being called by the SOAP client: my $factory = Bio::Factory::EMBOSS -> new(); my $app = $factory->program($program); $app->run({ '-sequencea' => $seq_to_test, '-graph' => 'none', '-outfile' => $outfile}); Anyone have a solution, tell me what I'm doing wrong, or can just point me in a direction? Thanks in advance! 
-Al Park _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From apark at dyax.com Mon Sep 15 10:52:52 2003 From: apark at dyax.com (Al Park) Date: Mon Sep 15 10:51:12 2003 Subject: [NORDNS] [Bioperl-l] SOAP and bioperl Message-ID: Brian, For example it would be iep. I understand that there are various methods of entering sequence files into the different tools via -sequencea or -sequence or -seqall, etc depending on the tool itself. -Al -----Original Message----- From: Brian Osborne [mailto:brian_osborne@cognia.com] Sent: Monday, September 15, 2003 10:41 AM To: Al Park; bioperl-l@portal.open-bio.org Subject: RE: [NORDNS] [Bioperl-l] SOAP and bioperl Al, What is the value of $program in your code? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Al Park Sent: Monday, September 15, 2003 9:48 AM To: bioperl-l@portal.open-bio.org Subject: [NORDNS] [Bioperl-l] SOAP and bioperl Hello everyone, I have encountered a problem using SOAP::Lite and bioperl and I was wondering if anyone knew of a solution. My SOAP client calls a perl module which executes the run method in the Factory::EMBOSS module. The error that the client generates is: Can't call method "run" on an undefined variable at xxx where the undefined variable is $app. I use the following in my module being called by the SOAP client: my $factory = Bio::Factory::EMBOSS -> new(); my $app = $factory->program($program); $app->run({ '-sequencea' => $seq_to_test, '-graph' => 'none', '-outfile' => $outfile}); Anyone have a solution, tell me what I'm doing wrong, or can just point me in a direction? Thanks in advance! 
-Al Park _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l
From MEC at Stowers-Institute.org Mon Sep 15 12:25:07 2003 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Mon Sep 15 12:23:33 2003 Subject: [Bioperl-l] SeqIO::GeneConstructionKit ?? Message-ID: I'm looking at reformatting some hundreds (thousands) of GCK molecules into genbank (or embl) format. The only option I can suss out is to use a menu option to export one-at-a-time into GCC format (supported by GCK) and then translate using either a ReadSeq or SeqIO based script. But that manual export is going to be painful. Any ideas? Cheers, Malcolm Cook Database Applications Manager Stowers Institute for Medical Research 1000 E 50th Street Kansas City, MO 64110 tel: 816-926-4449 fax: (816) 926-2098
From jason at cgt.duhs.duke.edu Mon Sep 15 13:05:33 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Sep 15 13:03:45 2003 Subject: [Bioperl-l] bioperl 1.2.3 RC1 Message-ID:
Bioperl 1.2.3 Release candidate 1
Available from
  http://bioperl.org/DIST/bioperl-1.2.3-rc1.tar.gz
  http://bioperl.org/DIST/bioperl-1.2.3-rc1.tar.bz2
  http://bioperl.org/DIST/bioperl-1.2.3-rc1.zip
MD5 for these releases
  e44d4c8b0d3e5aedba8da3f0b6d3d0c3 bioperl-1.2.3-rc1.tar.bz2
  d8a3fcf235ed1de67bb50c928d0c6d57 bioperl-1.2.3-rc1.tar.gz
  7512abce2dd12950ff74dbd2fba4373f bioperl-1.2.3-rc1.zip
This release has been tested and passed on the following platforms & perl versions:
  Mac OS X: perl 5.6.0, perl 5.8.0
  i686-Linux: perl 5.6.1, perl 5.8.0
  alpha-OSF1: perl 5.6.1
Developers: If you have any last minute changes - you need to get them in by tomorrow or let me know what you still have pending and I can hold up making a final release.
Users: We would appreciate you downloading this release candidate, running the test suite on your systems and ideally running some of your existing scripts which depend on Bioperl with the release so that we might find any potential incompatibilities. This release should be completely compatible with any 1.2.x dependent Bioperl scripts. Barring any reports of problems with this release candidate I would like to make the 1.2.3 release on Wed. -jason -- Jason Stajich Duke University jason at cgt.mc.duke.edu
From deltoro at mail.ev1.net Mon Sep 15 15:18:02 2003 From: deltoro at mail.ev1.net (deltoro) Date: Mon Sep 15 15:11:31 2003 Subject: [Bioperl-l] Would like to use bioperl and techniques for analysis DNA sequences. Message-ID: <200309151418.AA44368066@mail.ev1.net> Greetings! Allow me to introduce myself... My name is David Del Toro and I am a senior at the University of Houston-Downtown. I am majoring in Computer Science and have decided to do a project in bioinformatics.
I would like permission to use your resources such as sequence databases and code for reference in writing a data mining engine only for the purposes of construction a senior project. If this is possible, please let me know. I would really appreciate it. Sincerely, David Del Toro Computer Science Student ________________________________________________________________ Sent via the EV1 webmail system at mail.ev1.net From jason at cgt.duhs.duke.edu Mon Sep 15 16:58:47 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Sep 15 16:56:55 2003 Subject: [Bioperl-l] Would like to use bioperl and techniques for analysis DNA sequences. In-Reply-To: <200309151418.AA44368066@mail.ev1.net> References: <200309151418.AA44368066@mail.ev1.net> Message-ID: Do whatever you like with the codes - the Bioperl source code is provided under the Perl Artistic license. http://bioperl.org/Core/Latest/LICENSE as long as you adhere to that (which is pretty basic), no problem. Sequence databases are redistributed under the restrictions of the providers NCBI, EMBL/EBI, DDBJ. -jason On Mon, 15 Sep 2003, deltoro wrote: > Greetings! > > Allow me to introduce my self... > My name is David Del Toro and I am a senior at the University of > Houston-Downtown. I am majoring in Computer Science and have > decided to do a project in bioinformatics. I would like > permission to use your resources such as sequence databases and > code for reference in writing a data mining engine only for the > purposes of construction a senior project. If this is possible, > please let me know. I would really appreciate it. 
> > Sincerely, > David Del Toro > Computer Science Student > > > > ________________________________________________________________ > Sent via the EV1 webmail system at mail.ev1.net > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Tue Sep 16 13:42:59 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Sep 16 13:41:08 2003 Subject: [Bioperl-l] SeqIO::GeneConstructionKit ?? In-Reply-To: References: Message-ID: I'm not really familiar with the format but in principal if you write a SeqIO parser for GCK then it is trivial to do the conversion. Depending on how much information you are trying to save from the conversion - just the sequence data or presumably the annotation data as well... Look at SeqIO/genbank.pm and SeqIO/embl.pm the next_seq method to see how we build a sequence object up from the parsed data. -jason On Mon, 15 Sep 2003, Cook, Malcolm wrote: > I'm looking at reforming some hunders (thousands) or GCK molecules into > genbank (or embl) format. > Only option I can suss out is to uyse menu option to export > one-at-a-time into GCC format (supported by GCK) and then translate > using either ReadSeq or SeqIO based script. > > But that manual export is going to be painful. > > Any ideas? > > Cheers, > > Malcolm Cook > Database Applications Manager > Stowers Institute for Medical Research > 1000 E 50th Street > Kansas City, MO 64110 > tel: 816-926-4449 > fax: (816) 926-2098 > > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From MEC at Stowers-Institute.org Tue Sep 16 15:16:55 2003 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Tue Sep 16 15:15:25 2003 Subject: [Bioperl-l] SeqIO::GeneConstructionKit ?? Message-ID: Thanks for the directions Jason, I've written SeqIO parsers before for other formats (i.e. 
DNAStar / Lasergene) and scripted SeqIO allot. But GCK has a binary format that I was hoping not to have to crack if it had already been done. Regards, Malcolm > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Tuesday, September 16, 2003 12:43 PM > To: Cook, Malcolm > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] SeqIO::GeneConstructionKit ?? > > > I'm not really familiar with the format but in principal if > you write a > SeqIO parser for GCK then it is trivial to do the conversion. > Depending > on how much information you are trying to save from the > conversion - just > the sequence data or presumably the annotation data as > well... Look at > SeqIO/genbank.pm and SeqIO/embl.pm the next_seq method to see > how we build > a sequence object up from the parsed data. > > From hlapp at gnf.org Tue Sep 16 19:51:07 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Sep 16 19:49:11 2003 Subject: [Bioperl-l] Small change in UniGene file format In-Reply-To: <0B141079-E414-11D7-A8EE-00039399CEDC@anatomy.otago.ac.nz> Message-ID: Cool. I've been hit by this recently and it was on my list of things to do. Would you care migrating this fix to the branch? Otherwise I may try, but I'm afraid I'll be cut from the internet in a couple minutes and not be back before Thursday. Or Jason? (The reason I'm asking is that it would help everyone loading Unigene into Biosql.) -hilmar On Wednesday, September 10, 2003, at 09:54 PM, Andrew Macgregor wrote: > Andrew Walsh pointed out in bug report 1491 that the NCBI *.data files > now include a version number at the end of the accession number in > each SEQUENCE line e.g. ACC=BQ190891.1 > > I have modified the UniGene module to handle this. The resulting Seq > obj in this case now returns BQ190891 as the accession number and 1 as > the version. The module can still parse the older format, e.g. > accession numbers without version info. 
I hope this conforms to > conventions regarding accession numbers in bioperl. > > -- Andrew. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason at cgt.duhs.duke.edu Tue Sep 16 20:04:27 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Sep 16 20:03:29 2003 Subject: [Bioperl-l] Small change in UniGene file format In-Reply-To: References: Message-ID: I think Andrew already did it - any chance you can give it a quick test to make sure things are a-okay. -j On Tue, 16 Sep 2003, Hilmar Lapp wrote: > Cool. I've been hit by this recently and it was on my list of things to > do. > > Would you care migrating this fix to the branch? Otherwise I may try, > but I'm afraid I'll be cut from the internet in a couple minutes and > not be back before Thursday. Or Jason? (The reason I'm asking is that > it would help everyone loading Unigene into Biosql.) > > -hilmar > > On Wednesday, September 10, 2003, at 09:54 PM, Andrew Macgregor wrote: > > > Andrew Walsh pointed out in bug report 1491 that the NCBI *.data files > > now include a version number at the end of the accession number in > > each SEQUENCE line e.g. ACC=BQ190891.1 > > > > I have modified the UniGene module to handle this. The resulting Seq > > obj in this case now returns BQ190891 as the accession number and 1 as > > the version. The module can still parse the older format, e.g. > > accession numbers without version info. I hope this conforms to > > conventions regarding accession numbers in bioperl. > > > > -- Andrew. 
> > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From andrew at anatomy.otago.ac.nz Tue Sep 16 20:07:54 2003 From: andrew at anatomy.otago.ac.nz (Andrew Macgregor) Date: Tue Sep 16 20:06:15 2003 Subject: [Bioperl-l] Small change in UniGene file format In-Reply-To: Message-ID: Yes - I did migrate the fix to the branch and hopefully everything is OK ;) - Andrew. On Wednesday, September 17, 2003, at 12:04 PM, Jason Stajich wrote: > I think Andrew already did it - any chance you can give it a quick > test to > make sure things are a-okay. > > -j > On Tue, 16 Sep 2003, Hilmar Lapp wrote: > >> Cool. I've been hit by this recently and it was on my list of things >> to >> do. >> >> Would you care migrating this fix to the branch? Otherwise I may try, >> but I'm afraid I'll be cut from the internet in a couple minutes and >> not be back before Thursday. Or Jason? (The reason I'm asking is that >> it would help everyone loading Unigene into Biosql.) >> >> -hilmar >> >> On Wednesday, September 10, 2003, at 09:54 PM, Andrew Macgregor >> wrote: >> >>> Andrew Walsh pointed out in bug report 1491 that the NCBI *.data >>> files >>> now include a version number at the end of the accession number in >>> each SEQUENCE line e.g. ACC=BQ190891.1 >>> >>> I have modified the UniGene module to handle this. The resulting Seq >>> obj in this case now returns BQ190891 as the accession number and 1 >>> as >>> the version. The module can still parse the older format, e.g. >>> accession numbers without version info. I hope this conforms to >>> conventions regarding accession numbers in bioperl. >>> >>> -- Andrew. 
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From steve_chervitz at affymetrix.com Wed Sep 17 05:04:29 2003 From: steve_chervitz at affymetrix.com (Steve Chervitz) Date: Wed Sep 17 05:02:10 2003 Subject: [Bioperl-l] Re: [Bioperl-announce-l] bioperl 1.2.3 RC1 In-Reply-To: Message-ID: On Monday, Sep 15, 2003, at 10:05 US/Pacific, Jason Stajich wrote: > > Developers: > If you have any last minute changes - you need to get them in by > tomorrow > or let me know what you still have pending and I can hold up making a > final release. I put in a small addition to SeqIO::fasta that allows you to specify which type of identifier to appear after the '>' in the generated fasta output (e.g., accession, accession.version, display, or primary). The new method is called preferred_id_type. This is only applicable to fasta format; other formats have special slots for different types of identifiers, so it didn't make sense to put this in SeqIO.pm. I also added a test for it in SeqIO.t. Jason: Since this was such a trivial addition and doesn't affect existing functionality, I put in on the 1.2 branch as well as bioperl-live. However, since it is new functionality, I should have put it on the main trunk only. Let me know if you want to back it out of the branch. One issue I haven't addressed is how to deal with SeqIO::MultiFile. This module should delegate any method calls it can't handle (such as preferred_id_type) to its current SeqIO object. This would also address other format-specific methods such as SeqIO::fasta::width(). 
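[Editor's sketch.] The delegation idea raised for SeqIO::MultiFile, forwarding any method the wrapper cannot handle to its current object, can be shown in plain Perl with AUTOLOAD. This is only an illustration under assumptions: Demo::FastaWriter and Demo::MultiWrapper are hypothetical packages invented for the example and say nothing about how Bio::SeqIO::MultiFile is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical format-specific writer with a method the wrapper does not define.
package Demo::FastaWriter;
sub new { my ($class) = @_; return bless { width => 60 }, $class }
sub width {
    my ($self, $value) = @_;
    $self->{width} = $value if defined $value;
    return $self->{width};
}

# Hypothetical wrapper that forwards unknown method calls to its current object.
package Demo::MultiWrapper;
our $AUTOLOAD;
sub new {
    my ($class, $current) = @_;
    return bless { current => $current }, $class;
}
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $method = $AUTOLOAD) =~ s/.*:://;    # strip the package name
    return if $method eq 'DESTROY';          # never forward object destruction
    my $target = $self->{current};
    die "Cannot delegate '$method': no current object\n"
        unless defined $target;
    die "Cannot delegate '$method': not supported by " . ref($target) . "\n"
        unless $target->can($method);
    return $target->$method(@args);
}

package main;

my $wrapper = Demo::MultiWrapper->new(Demo::FastaWriter->new);
$wrapper->width(80);    # forwarded to Demo::FastaWriter::width
print "width is now ", $wrapper->width, "\n";
```

Checking can() before forwarding keeps a misspelled method name from failing silently; the wrapper dies with a message naming both the method and the class that lacks it.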
Steve
From Marc.Logghe at devgen.com Wed Sep 17 07:31:34 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Sep 17 07:32:28 2003 Subject: [Bioperl-l] order of sublocations Message-ID: Hi, I could not find anything in the 'DDBJ/EMBL/GenBank Feature Table Definition' about this, but apparently applications exist (e.g. VectorNTI) which are really picky about the order of sublocations in split locations.
CDS join(complement(649..>1045),complement(129..218),complement(<1..66))
for instance, is not accepted by vectorNTI. The following is OK:
CDS join(complement(<1..66),complement(129..218),complement(649..>1045))
I expect this is more a problem of vectorNTI, rather than BioPerl ;-) but anyhow, this can easily be fixed by sorting the sublocations first: in Bio::Location::Split
sub to_FTstring {
    my ($self) = @_;
    my @strs;
    foreach my $loc ( sort { $a->start <=> $b->start } $self->sub_Location() ) {
        #             ~~~~~~~~~~~~~~~~~~~~~~~
        my $str = $loc->to_FTstring();
        # we only append the remote seq_id if it hasn't been done already
        # by the sub-location (which it should if it knows it's remote)
        # (and of course only if it's necessary)
        if( (! $loc->is_remote) &&
            defined($self->seq_id) && defined($loc->seq_id) &&
            ($loc->seq_id ne $self->seq_id) ) {
            $str = sprintf("%s:%s", $loc->seq_id, $str);
        }
        push @strs, $str;
    }
    my $str = sprintf("%s(%s)",lc $self->splittype, join(",", @strs));
    return $str;
};
Cheers, Marc
From senger at ebi.ac.uk Wed Sep 17 08:09:32 2003 From: senger at ebi.ac.uk (Martin Senger) Date: Wed Sep 17 08:07:52 2003 Subject: [Bioperl-l] Re: SOAP and bioperl Message-ID: > I have encountered a problem using SOAP::Lite and bioperl and I was > wondering if anyone knew of a solution. > What makes you think that it is a SOAP-related problem? From your snippet of code I see that the value ($app) returned from $factory->program is undefined. There may be various reasons for that, but I do not see any connection with SOAP. Am I missing something?
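[Editor's sketch.] The observation that $app is undefined suggests checking the factory's return value before calling a method on it. The code below is only a sketch under assumptions: Demo::Factory and Demo::App are hypothetical stand-ins and do not reproduce the behaviour of Bio::Factory::EMBOSS. It shows the failure being reported at the point where the object is requested rather than at the later method call.

```perl
use strict;
use warnings;

# Hypothetical factory: returns an object for known program names,
# and undef for anything else.
package Demo::Factory;
sub new { my ($class) = @_; return bless { known => { iep => 1 } }, $class }
sub program {
    my ($self, $name) = @_;
    return unless defined $name && $self->{known}{$name};
    return bless { name => $name }, 'Demo::App';
}

package Demo::App;
sub run { my ($self) = @_; return "ran $self->{name}" }

package main;

# Check what the factory returned before using it, so the error message
# names the program that could not be set up.
sub run_program {
    my ($factory, $name) = @_;
    my $app = $factory->program($name);
    die "No application object for program '"
        . (defined $name ? $name : 'undef') . "'\n"
        unless defined $app;
    return $app->run;
}

my $factory = Demo::Factory->new;
print run_program($factory, 'iep'), "\n";
my $ok = eval { run_program($factory, 'not_a_program'); 1 };
print "failed as expected: $@" unless $ok;
```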
Regards, Martin -- Martin Senger EMBL Outstation - Hinxton Senger@EBI.ac.uk European Bioinformatics Institute Phone: (+44) 1223 494636 Wellcome Trust Genome Campus (Switchboard: 494444) Hinxton Fax : (+44) 1223 494468 Cambridge CB10 1SD United Kingdom http://industry.ebi.ac.uk/~senger From jason at cgt.duhs.duke.edu Wed Sep 17 08:17:35 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Sep 17 08:15:53 2003 Subject: [Bioperl-l] order of sublocations In-Reply-To: References: Message-ID: This alone won't work because of the case when we have remote locations the sorting will mess up that order. Besides the order specified does mean something - we had to go through this with the spliced_seq method to generate the spliced out sequence. If we're specifying a gene on the complement it would be incorrect to sort the exons like you have shown - that puts them in the wrong order. Perhaps complement(join(<1..66,129..218)) is what VectorNTI wants - but we've been down that road before in getting the code to output this and it would require doing something a bit different to distribute the complementation status to all the subfeatures. To please VectorNTI it seems to me we need to split CDSes into individual exons - if this is in regard to SequenceDumping in Gbrowse - I have a 'CDS' track which I use to dump the CDSes as individual features rather than joined together. -jason On Wed, 17 Sep 2003, Marc Logghe wrote: > Hi, > I could not find anything back in the 'DDBJ/EMBL/GenBank Feature Table Definition' about this, but apparently applications exist (like e.g. VectorNTI) which are really picky about the order of sublocations in split locations. > CDS join(complement(649..>1045),complement(129..218), > complement(<1..66)) > for instance, is not accepted by vectorNTI. 
The following is OK: > CDS join(complement(<1..66),complement(129..218), > complement(649..>1045)) > I expect this is more a problem of vectorNTI, rather than BioPerl ;-) but anyhow, this can easily be fixed by sorting the sublocations first: > in Bio::Location::Split > sub to_FTstring { > my ($self) = @_; > my @strs; > foreach my $loc ( sort { $a->start <=> $b->start } $self->sub_Location() ) { > # ~~~~~~~~~~~~~~~~~~~~~~~ > my $str = $loc->to_FTstring(); > # we only append the remote seq_id if it hasn't been done already > # by the sub-location (which it should if it knows it's remote) > # (and of course only if it's necessary) > if( (! $loc->is_remote) && > defined($self->seq_id) && defined($loc->seq_id) && > ($loc->seq_id ne $self->seq_id) ) { > $str = sprintf("%s:%s", $loc->seq_id, $str); > } > push @strs, $str; > } > > my $str = sprintf("%s(%s)",lc $self->splittype, join(",", @strs)); > return $str; > }; > > Cheers, > Marc > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From apark at dyax.com Wed Sep 17 08:45:40 2003 From: apark at dyax.com (Al Park) Date: Wed Sep 17 08:43:58 2003 Subject: [Bioperl-l] Re: SOAP and bioperl Message-ID: Thanks for the suggestions, I've actually figured this one out. I forgot to call a method properly in a different module that I wrote that was actually causing the error. Al -----Original Message----- From: Martin Senger [mailto:senger@ebi.ac.uk] Sent: Wednesday, September 17, 2003 8:10 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Re: SOAP and bioperl > I have encountered a problem using SOAP::Lite and bioperl and I was > wondering if anyone knew of a solution. > What does make you think that it is a SOAP-related problem? From your snippet of code I see that the value ($app) returned from $factory->program is undefined. 
It may be because of various reasons, but I do not see any connection with the SOAP. Am I missing something? Regards, Martin -- Martin Senger EMBL Outstation - Hinxton Senger@EBI.ac.uk European Bioinformatics Institute Phone: (+44) 1223 494636 Wellcome Trust Genome Campus (Switchboard: 494444) Hinxton Fax : (+44) 1223 494468 Cambridge CB10 1SD United Kingdom http://industry.ebi.ac.uk/~senger _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason at cgt.duhs.duke.edu Wed Sep 17 09:52:43 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Sep 17 09:51:14 2003 Subject: [Bioperl-l] Re: [Bioperl-announce-l] bioperl 1.2.3 RC1 In-Reply-To: References: Message-ID: On Wed, 17 Sep 2003, Steve Chervitz wrote: > On Monday, Sep 15, 2003, at 10:05 US/Pacific, Jason Stajich wrote: > > > > Developers: > > If you have any last minute changes - you need to get them in by > > tomorrow > > or let me know what you still have pending and I can hold up making a > > final release. > > I put in a small addition to SeqIO::fasta that allows you to specify > which type of identifier to appear after the '>' in the generated fasta > output (e.g., accession, accession.version, display, or primary). The > new method is called preferred_id_type. This is only applicable to > fasta format; other formats have special slots for different types of > identifiers, so it didn't make sense to put this in SeqIO.pm. I also > added a test for it in SeqIO.t. > > Jason: Since this was such a trivial addition and doesn't affect > existing functionality, I put in on the 1.2 branch as well as > bioperl-live. However, since it is new functionality, I should have put > it on the main trunk only. Let me know if you want to back it out of > the branch. It is fine Steve - a couple of new features have already sneaked their way onto the branch. 
I think it will be many months until 1.4 is ready to go stable, so I'd
rather put some of the (simple) new features out there if they don't cause
any backwards compatibility problems.

>
> One issue I haven't addressed is how to deal with SeqIO::MultiFile.
> This module should delegate any method calls it can't handle (such as
> preferred_id_type) to its current SeqIO object. This would also address
> other format-specific methods such as SeqIO::fasta::width().
>
> Steve
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From cjm at fruitfly.org Wed Sep 17 09:59:36 2003
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed Sep 17 09:57:53 2003
Subject: [Bioperl-l] proposed changes to RangeI.pm
Message-ID:

both intersection() and union() are documented as returning a
(start, end, strand) triple.

in actual fact, intersection() returns a RangeI compliant object, and
union() returns either a RangeI object or a triple depending on
wantarray()

I have fixed things so that both intersection() and union() return either
a RangeI object or a triple depending on wantarray() - following the
principle of least surprise - and documented this. The test suite passes.

This will break code like this:

  $h = { 'range' => $sf->intersection($sf2) }

since wantarray will be true here (the hash constructor imposes list
context); however, this code violates the previously documented interface
anyway.

I have also added a new method disconnected_ranges() to RangeI

I could easily migrate this method somewhere else, but it seems to belong
with other geometrical methods such as intersection and union

here is the pod docs:

 Title   : disconnected_ranges
 Usage   : my @disc_ranges = Bio::Range->disconnected_ranges(@ranges);
 Function: finds the minimal set of ranges such that each input range
           is fully contained by at least one output range, and none of
           the output ranges overlap
 Args    : a list of ranges
 Returns : a list of objects of the same type as the input
           (conforms to RangeI)

=cut

is this a good time to check these changes in?
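
The two ideas in the message above (a return value that depends on
wantarray(), and merging ranges into a non-overlapping set) can be sketched
in plain Perl. This is only an illustration under stated assumptions: ranges
are [start, end] array references rather than RangeI objects, strand is
ignored, and the two subs are hypothetical stand-ins, not the code that was
checked into Bio::RangeI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a context-sensitive intersection(): ranges are
# [start, end] array references, not real RangeI objects.
sub intersection {
    my ($r1, $r2) = @_;
    my $start = $r1->[0] > $r2->[0] ? $r1->[0] : $r2->[0];
    my $end   = $r1->[1] < $r2->[1] ? $r1->[1] : $r2->[1];
    return if $start > $end;    # the ranges do not overlap
    # list context: flat (start, end) list; scalar context: one reference
    return wantarray ? ($start, $end) : [$start, $end];
}

# Hypothetical stand-in for disconnected_ranges(): sort by start, then fold
# each range into the previous output range whenever the two overlap.
sub disconnected_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # overlaps the last output range: extend its end if needed
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$r ];    # copy so the input is not modified
        }
    }
    return @merged;
}

# List context returns the flat list.
my @pair = intersection([1, 10], [5, 20]);

# Inside a hash or list constructor the call is in list context, so
# scalar() is needed to get a single value for the 'range' key.
my %h = (range => scalar(intersection([1, 10], [5, 20])));

my @out = disconnected_ranges([10, 20], [15, 30], [40, 50], [45, 46]);

print "pair:   @pair\n";                          # pair:   5 10
print "hash:   $h{range}[0]..$h{range}[1]\n";     # hash:   5..10
print "merged: ", join(", ", map { "$_->[0]..$_->[1]" } @out), "\n";
                                                  # merged: 10..30, 40..50
```

Forcing scalar context with scalar() is one way a caller could keep a
hash-constructor call working once the method becomes context-sensitive.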
From clangin at siu.edu Wed Sep 17 10:32:25 2003 From: clangin at siu.edu (Chet Langin) Date: Wed Sep 17 10:40:01 2003 Subject: [Bioperl-l] Finding modules on hard disk Message-ID: <1063809145.3f68707981744@webmail.siu.edu> When looking at the code for a module, I would like to be able to quickly find and look at another module which might be used by the one I'm looking at. For example, if the code I'm looking at has... use AnotherModule; ...how do I find AnotherModule so that I can look at the code in it? This would be like the "whereis" command. Also, I have noticed that some modules appear more than once in my hard disk directory structure. How do I know which one is being used? ,,Chet Langin,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ~~~Diagonally parked in a parallel universe~~~~~~~ ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From Marc.Logghe at devgen.com Wed Sep 17 10:51:33 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Sep 17 10:49:51 2003 Subject: [Bioperl-l] Finding modules on hard disk Message-ID: > -----Original Message----- > From: Chet Langin [mailto:clangin@siu.edu] > Sent: Wednesday, September 17, 2003 4:32 PM > To: bioperl-l@bioperl.org > Subject: [Bioperl-l] Finding modules on hard disk > > > When looking at the code for a module, I would like > to be able to quickly find and look at another module > which might be used by the one I'm looking at. > > For example, if the code I'm looking at has... > > use AnotherModule; > when you want to look at the documentation: perldoc AnotherModule (e.g. 
perldoc Bio::Seq) when you want to see the actual code in the module: perldoc -m AnotherModule when you want to know where the file is: perldoc -l AnotherModule HTH, Marc From shawnh at stanford.edu Wed Sep 17 11:07:34 2003 From: shawnh at stanford.edu (Shawn Hoon) Date: Wed Sep 17 11:02:29 2003 Subject: [Bioperl-l] Finding modules on hard disk In-Reply-To: <1063809145.3f68707981744@webmail.siu.edu> References: <1063809145.3f68707981744@webmail.siu.edu> Message-ID: do perldoc -l AnotherModule On Wednesday, September 17, 2003, at 7:32 AM, Chet Langin wrote: > When looking at the code for a module, I would like > to be able to quickly find and look at another module > which might be used by the one I'm looking at. > > For example, if the code I'm looking at has... > > use AnotherModule; > > ...how do I find AnotherModule so that I can look > at the code in it? > > This would be like the "whereis" command. > > Also, I have noticed that some modules appear more > than once in my hard disk directory structure. How > do I know which one is being used? > > > ,,Chet Langin,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, > > ~~~Diagonally parked in a parallel universe~~~~~~~ > > > > > > > ---------------------------------------------------------------- > This message was sent using IMP, the Internet Messaging Program. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -shawn From jdw at ou.edu Wed Sep 17 11:24:13 2003 From: jdw at ou.edu (James D. White) Date: Wed Sep 17 11:24:26 2003 Subject: [Bioperl-l] phd format conversion. References: <200309171133.h8HBXwMg017048@portal.open-bio.org> Message-ID: <3F687C9D.8080803@ou.edu> Sankari, Phd files are the normal output of the Phred basecalling program written by Phil Green. 
If you have the original data files from a sequencer (for example, ab1 or scf files), you could run Phred to produce phd files with real quality values. This is not a Bioperl solution, but may be a better one if you have the original trace file data, not just the sequence. If all you have is the sequence, then other solutions are required. For this specific problem I do have a program, fastaq2phd, to create phd files from a fasta file. The program may be found at: . This is not a Bioperl program. It was written before I knew about Bioperl. If this is all the function you need it should work, but a Bioperl solution would probably be a better choice if you need to integrate it with other code. James D. White (jdw@ou.edu) Department of Chemistry and Biochemistry University of Oklahoma 620 Parrington Oval, Room 313 Norman, OK 73019-3051 >Date: Fri, 12 Sep 2003 12:33:18 -0400 >From: "Brian Osborne" >Subject: RE: [Bioperl-l] phd format conversion. >To: "sankari thirumal" >Cc: bioperl-l@bioperl.org >Message-ID: >Content-Type: text/plain; charset="us-ascii" > >Sankari, > >If fasta is the only format available to you then you'll probably have to >create a SeqWithQuality object using the sequence in the fasta files. Take a >look at this link: > >http://bioperl.org/Core/Latest/bptutorial.html#iii.7.6_incorporating_quality >_data_in_sequence_annotation_(seqwithquality) > >Since there's no quality data in the fasta file you'll just have to make up >reasonable values. > >If you're new to bioperl then I'd also recommend that you take a look at the >other bptutorial sections on the Seq object and on SeqIO as it's sounding >like you're going to be reading from and writing to sequence files. > > >Brian O. > >-----Original Message----- >From: sankari thirumal [mailto:sankari_thirumal@yahoo.com] >Sent: Friday, September 12, 2003 12:15 PM >To: Brian Osborne >Subject: RE: [Bioperl-l] phd format conversion. > >Dear sir, > >I want use the phd format for SNP detection. 
Only phd format is accepted by >the software for SNP detection. SO it is mandatory for me to convert the >sequence to .phd format. Please help me in this regard. > >with regards > >sankari > > >Brian Osborne wrote: >Sankari, > >I can't answer your question but I do have a couple of thoughts. First, you >might consider upgrading to version 1.2.2 or get the upcoming 1.2.3, there >are a number of important bug fixes in these 2 versions, version 1.0.1 is >quite old. Second, I'm wondering why you want to convert from fasta to phd, >since *phd files contain not just sequence but also quality scores for each >base. Fasta files, of course, don't have quality information. It sounds like >you're testing out various format conversions but this is one that normally >one would not choose to do. In fact in 1.2.2 this conversion isn't possible, >you get this error: > >------------- EXCEPTION: Bio::Root::Exception ------------- >MSG: You must pass a Bio::Seq::SeqWithQuality object to write_scf as a >parameter named "SeqWithQuality" > >Brian O. > >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of sankari thirumal >Sent: Wednesday, September 10, 2003 12:27 PM >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] phd format conversion. > >Dear sir, > >i'm using bioperl version 1.0.1. I could convert fasta to most of the >formats except phd. I'm sending the error mesaage. please explain me what i >should do. >the error message is as follows: > >[root@Host0 bioperl-0.9.0]# perl conphd.pl >Bio/SeqIO/phd.pm: phd cannot be found >Exception Can't locate Bio/SeqIO/phd.pm in @INC (@INC contains: >/usr/lib/perl5/5.8.0/i386-linux-thread-multi /usr/lib/perl5/5.8.0 >/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi >/usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl >/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi >/usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl .) 
at >/usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm line 477. > >For more information about the SeqIO system please see the SeqIO docs. >This includes ways of checking for formats at compile time, not run time >Can't use an undefined value as a symbol reference at conphd.pl line 9, >line 1. > >i'll be grateful if could give suugestions to solve this problem > >with regards >sankari > > > From dhoworth at mrc-lmb.cam.ac.uk Wed Sep 17 11:57:18 2003 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Wed Sep 17 11:55:36 2003 Subject: [Bioperl-l] SeqIO::GeneConstructionKit ?? References: Message-ID: <3F68845E.4030507@mrc-lmb.cam.ac.uk> Cook, Malcolm wrote: > I'm looking at reforming some hunders (thousands) or GCK molecules > into genbank (or embl) format. Don't know anything about the format but since there've been no other suggestions here are a couple of thoughts. > GCK has a binary format that I was hoping not to have to crack if it had already been done. Is it possible that the company would supply details of the format if you asked them? That would make the problem a lot easier. > Only option I can suss out is to uyse menu option to export > one-at-a-time into GCC format (supported by GCK) and then translate > using either ReadSeq or SeqIO based script. You could perhaps automate the GUI operation with one of the macro/robot/automation tools so this wouldn't be too difficult. HTH, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From gilbertd at indiana.edu Wed Sep 17 14:51:54 2003 From: gilbertd at indiana.edu (Don Gilbert) Date: Wed Sep 17 14:50:24 2003 Subject: [Bioperl-l] bioperl mirror at iubio.bio.indiana.edu now daily updated Message-ID: <01993920-E940-11D7-AB8B-000393B8D01C@indiana.edu> Re: http://www.bioperl.org/Core/Latest/index.shtml MIRRORS: iubio.bio.indiana.edu - Updates on a 20 day cycle so may not always be current. 
I updated IUBio mirroring of this to daily, in case you all start cranking
out new revisions.

-- Don
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/

From lstein at cshl.edu Wed Sep 17 15:03:27 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Sep 17 15:02:01 2003
Subject: [Bioperl-l] Bio::Graphics output
In-Reply-To: <1950800394.1063393874989.JavaMail.nobody@app2.ni.bg>
References: <1950800394.1063393874989.JavaMail.nobody@app2.ni.bg>
Message-ID: <200309171503.27082.lstein@cshl.edu>

You need two scripts: one to produce an HTML page that contains an <img>
tag, and the other to produce the output that goes into the tag. Example:

Alternatively, you can do it all with one script which uses different CGI
parameters to toggle between HTML production and PNG production.

Lincoln

On Friday 12 September 2003 03:11 pm, Vesko Baev wrote:
> Hello,
> I've read the Graphics:HOWTO, and all the scripts in there are ending
> "print $panel->png;". But my script is a CGI and it makes an HTML-page.
> What to put in the end of my script to generate image in HTML-page?
>
> Thanks!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================

From lstein at cshl.edu Wed Sep 17 15:04:16 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Sep 17 15:02:34 2003
Subject: [Bioperl-l] Bio::Graphics output to file!
In-Reply-To: <683144613.1063398432791.JavaMail.nobody@storage.ni.bg>
References: <683144613.1063398432791.JavaMail.nobody@storage.ni.bg>
Message-ID: <200309171504.16765.lstein@cshl.edu>

You may need to put the filehandle into binary mode with:

  binmode(IMAGEFILE);

This is necessary on Windows systems, as well as on RedHat systems that are
using the evil RedHat-patched Perl.

Lincoln

On Friday 12 September 2003 04:27 pm, Vesko Baev wrote:
> Hi,
> I created an empty file called: image.png and wrote in my script:
>
> open (IMAGEFILE,">image.png");
> print IMAGEFILE $panel->png;
>
> The running of a script is OK, but when I open the file with my favorite
> image software I've got the error message: "PNG decoder error", "this is
> not valid png file"
>
> ?!?!?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================

From chenn at cshl.edu Wed Sep 17 15:21:24 2003
From: chenn at cshl.edu (Jack Chen)
Date: Wed Sep 17 15:19:43 2003
Subject: [Bioperl-l] blat module
Message-ID:

Hi All,

I am trying to use the Blat.pm module. I have a simple script like this:

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Blat;

my @bllat_feat;

my $blat_parser = new Bio::Tools::Blat(-filename => "./temp");

while( my $blat_feat = $blat_parser->next_result() ) {
    warn "here...\n";
    push @bllat_feat, $blat_feat;
}

In the script, temp is a blat result file, sitting in the current
directory. The content of the file is shown below (sorry that the lines in
the file are wrapped). The problem I have is that the program never runs.
It gets stuck right after I execute the command. Any suggestion? Jack psLayout version 3 match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T blockSizes qStarts tStarts match match count bases count bases name size start end name size start end --------------------------------------------------------------------------------------------------------------------------------------------------------------- 80 0 0 0 0 0 0 0 + origi 80 0 80 CHROMOSOME_III 13783268 13703494 13703574 1 80, 0, 13703494, ++++++++++++++++++++++++++++++++++++++++++++ o-o Jack Chen, Stein Laboratory o---o Cold Spring Harbor Laboratory o----o #5 Williams, 1 Bungtown Road O----O Cold Spring Harbor, NY, 11724 0--o Tel: 1 516 367 8394 O e-mail: chenn@cshl.org o-o Website: http://www.wormbase.org +++++++++++++++++++++++++++++++++++++++++++++ From jason at cgt.duhs.duke.edu Wed Sep 17 15:22:43 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Sep 17 15:21:22 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] How can i get Data from GenBank file In-Reply-To: <3F68B36C.BAECA819@cifn.unam.mx> References: <3F68B36C.BAECA819@cifn.unam.mx> Message-ID: use Bio::SeqIO; my $seqio = new Bio::SeqIO(-file => $filename, -format => 'genbank'); my $seq = $seqio->next_seq; my $species = $seq->species; see docs on Bio::Species for how to access the fields. % perldoc Bio::Species Post these types of questions on bioperl-l in the future please. -jason On Wed, 17 Sep 2003, Fabiola [iso-8859-1] Sánchez wrote: > Hello everybody! > > How i can get ORGANISM from genBank file > and how i can get from FEATURES > /gene > /product > /note > using Bioperl > now i using > use Bio::SeqIO; > > Thanks > Fabi > > > // > LOCUS A16STM112 1346 bp DNA linear BCT > 31-OCT-1996 > DEFINITION Actinomyces species 16S ribosomal RNA (isolate TM112). > ACCESSION X92698 > VERSION X92698.1 GI:1655518 > KEYWORDS 16S ribosomal RNA; 16S rRNA gene; spacer region. 
> SOURCE Actinomycetales > ORGANISM Actinomycetales > Bacteria; Actinobacteria; Actinobacteridae. > REFERENCE 1 > AUTHORS Rheims,H., Sproer,C., Rainey,F.A. and Stackebrandt,E. > TITLE Molecular biological evidence for the occurrence of > uncultured > members of the actinomycete line of descent in different > environments and geographical locations > JOURNAL Microbiology (Reading, Engl.) 142 (Pt 10), 2863-2870 (1996) > MEDLINE 97039856 > PUBMED 8885402 > REFERENCE 2 (bases 1 to 1346) > AUTHORS Rheims,H. > TITLE Direct Submission > JOURNAL Submitted (31-OCT-1995) H. Rheims, DSMZ, Deutsche Sammlung > von > Mikroorganismen u. Zellkulturen GmbH, Mascheroder Weg 1b, D- > 38124 > Braunschweig, FRG > REMARK Revised by submittor 30-OCT-96 > FEATURES Location/Qualifiers > source 1..1346 > /organism="Actinomycetales" > /mol_type="genomic DNA" > /db_xref="taxon:2037" > /clone="TM112" > gene 1..1346 > /gene="16S rRNA" > rRNA 1..1346 > /partial > /gene="16S rRNA" > /product="16S ribosomal RNA" > BASE COUNT 329 a 325 c 422 g 267 t 3 others > ORIGIN > 1 cgctggcggc gtgcttaaca catgcaagtc gaacggaatc caaggagctt > gctccgaaag > 61 atttagtggc gaacgggtga gtaacacgtg agcaacctgc cccgaagatt > gggataacac > 121 cgggaaaccg gtgctaatac cgaataccct caacctgtcg catgacagga > ggaggaaatg > 181 tcttatcgct tcgggagggg ctcgcggccc atcagcttgt tggnggggta > acggcccacc > 241 aaggcaacga cggatagctg gtctgagagg acgatcagcc acactggaac > tgagacacgg > 301 tccagactcc tacgggaggc agcagtgggg aatcttgcgc aatgggcgaa > agcctgacgc > 361 agcaacgccg cgtgagggac gaaggctttc tgagttgtaa acctctttcg > acaggaacga > 421 ttgtgacggt acctgtagaa gaagcaccgg ccaactatgt gccagcagtc > gcggtgatac > 481 atagggtgca agcgttattc ggatttattg ggcgtaaaga gctcgcaggc > ggntcaacaa > 541 gtcggntgtt aaacccccag gctcaacctg gggccgccac ctgaaactgt > tgtgactaga > 601 gtttggtagg ggatcacgga attcctggtg tagcggtggt atgcgcagat > atcaggagga > 661 acaccagtag cgaaggcggt gatctgggcc aatactgacg ctgaggagcg > aaagcgtggg > 721 gagcgaacag gattagatac cctggtagtc cacgccgtaa acgttgggca > 
ctaggtgtgg > 781 ggacttttca acggtttcgg tgtcgcaggt aacgcatgaa gtgtcccgcc > tggggagtac > 841 ggtcgcaaga ctaaaactca aagaaattga cggtggcccg cacatgcagt > ggagcatgtg > 901 gcttatttcg atgcaacgcg aaaaacctta cctagatttg acatgctggg > aaaagccaca > 961 gagatgttgt gtccttcggg gcccagcaca ggtggtgcat cgctgtcgtc > agctcgtgtc > 1021 gtgagatgtc gcgttaagtc ccgcaacgag cgcaaccctt gttctatgtt > gccagcacgt > 1081 aatggcgggg actcgtagaa gactgtcggg gtcaactcgg aggaaggtgg > ggacgacgtc > 1141 aagtcatcat gccccttacg tctagggctg cacacatgat acaatgggcg > gtacagaggg > 1201 ctgctaaacc gcgaggtgga gccaatccct aaaaccgctc tcagttcaga > ttgcaggctg > 1261 caactcgcct gcatgaagtt ggagttgcta gtaatcccgg atcagcattg > ccggggtgaa > 1321 tacgttcccg ggccttgtac acaccg > // > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Wed Sep 17 15:28:08 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Sep 17 15:26:19 2003 Subject: [Bioperl-l] blat module In-Reply-To: References: Message-ID: You want: my $blat_parser = new Bio::Tools::Blat(-file => "./temp"); Otherwise, in absence of any arguments that Root::IO knows about, it will try and read from STDIN. The new - more experimental Bio::SearchIO::blast is in CVS on the main trunk if you want to give that whirl too. It *might* require you to have stripped out the header line ahead of time, can't remember if I added in a check to remove that or not. -jason On Wed, 17 Sep 2003, Jack Chen wrote: > Hi All, > > I am trying to use the Blat.pm module. 
I have a simple script like this: > > #!/usr/bin/perl -w > use strict; > use Bio::Tools::Blat; > > my @bllat_feat; > > my $blat_parser = new Bio::Tools::Blat(-filename => "./temp"); > > while( my $blat_feat = $blat_parser->next_result() ) { > warn "here...\n"; > push @bllat_feat, $blat_feat; > } > > > In the script, temp is a blat result file, sitting in the current > directory. The content of the file is (sorry that the lines in the file > are wrapped). The probelm I have is that the program never runs. It gets > stuck right after I execute the command. Any suggestion? > > Jack > > psLayout version 3 > > match mis- rep. N's Q gap Q gap T gap T gap strand Q > Q Q Q T T T T blockSizes > qStarts tStarts > match match count bases count bases > name size start end name size start > end > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > 80 0 0 0 0 0 0 0 + > origi 80 0 80 CHROMOSOME_III 13783268 13703494 > 13703574 1 80, 0, 13703494, > > > > > ++++++++++++++++++++++++++++++++++++++++++++ > o-o Jack Chen, Stein Laboratory > o---o Cold Spring Harbor Laboratory > o----o #5 Williams, 1 Bungtown Road > O----O Cold Spring Harbor, NY, 11724 > 0--o Tel: 1 516 367 8394 > O e-mail: chenn@cshl.org > o-o Website: http://www.wormbase.org > +++++++++++++++++++++++++++++++++++++++++++++ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From chenn at cshl.edu Wed Sep 17 15:48:59 2003 From: chenn at cshl.edu (Jack Chen) Date: Wed Sep 17 15:47:25 2003 Subject: [Bioperl-l] blat module In-Reply-To: Message-ID: Thanks for your prompt reply! I think the Bio::Tools::Blat documentation should be correct accordingly. 
I got an error message like this: Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/site_perl/5.6.0/Bio/Tools/Blat.pm line 126, line 2. the line read: unless ( $matches =~/^\d+$/ ){ Thanks, jack On Wed, 17 Sep 2003, Jason Stajich wrote: > You want: > my $blat_parser = new Bio::Tools::Blat(-file => "./temp"); > > Otherwise, in absence of any arguments that Root::IO knows about, it will > try and read from STDIN. > > The new - more experimental Bio::SearchIO::blast is in CVS on the main > trunk if you want to give that whirl too. It *might* require you to have > stripped out the header line ahead of time, can't remember if I added in > a check to remove that or not. > > -jason > On Wed, 17 Sep 2003, Jack Chen wrote: > > > Hi All, > > > > I am trying to use the Blat.pm module. I have a simple script like this: > > > > #!/usr/bin/perl -w > > use strict; > > use Bio::Tools::Blat; > > > > my @bllat_feat; > > > > my $blat_parser = new Bio::Tools::Blat(-filename => "./temp"); > > > > while( my $blat_feat = $blat_parser->next_result() ) { > > warn "here...\n"; > > push @bllat_feat, $blat_feat; > > } > > > > > > In the script, temp is a blat result file, sitting in the current > > directory. The content of the file is (sorry that the lines in the file > > are wrapped). The probelm I have is that the program never runs. It gets > > stuck right after I execute the command. Any suggestion? > > > > Jack > > > > psLayout version 3 > > > > match mis- rep. 
N's Q gap Q gap T gap T gap strand Q > > Q Q Q T T T T blockSizes > > qStarts tStarts > > match match count bases count bases > > name size start end name size start > > end > > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > 80 0 0 0 0 0 0 0 + > > origi 80 0 80 CHROMOSOME_III 13783268 13703494 > > 13703574 1 80, 0, 13703494, > > > > > > > > > > ++++++++++++++++++++++++++++++++++++++++++++ > > o-o Jack Chen, Stein Laboratory > > o---o Cold Spring Harbor Laboratory > > o----o #5 Williams, 1 Bungtown Road > > O----O Cold Spring Harbor, NY, 11724 > > 0--o Tel: 1 516 367 8394 > > O e-mail: chenn@cshl.org > > o-o Website: http://www.wormbase.org > > +++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From clangin at siu.edu Wed Sep 17 18:27:37 2003 From: clangin at siu.edu (Chet Langin) Date: Wed Sep 17 20:17:23 2003 Subject: [Bioperl-l] Circular References Message-ID: <1063837657.3f68dfd988c17@webmail.siu.edu> Thanks to the "perldoc -l ModuleName" responders! It looks like some modules have circular references, like this (pseudocode)... ModuleA require ModuleB ModuleB if not using ModuleA, require ModuleA Is this possible? ,,Chet Langin,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ~~~Diagonally parked in a parallel universe~~~~~~~ ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. 
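
The pseudocode in the question above can be tried out directly. The sketch
below is an illustration only: ModuleA and ModuleB are hypothetical modules
served from memory through an @INC hook so that the example fits in one
file. It relies on documented behaviour of require, which records a file in
%INC before compiling it, so the second, circular require finds the entry
already present and returns without loading the file again.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Source for two hypothetical modules that require each other.
my %source = (
    'ModuleA.pm' => "package ModuleA;\nrequire ModuleB;\nsub name { 'A' }\n1;\n",
    'ModuleB.pm' => "package ModuleB;\nrequire ModuleA;\nsub name { 'B' }\n1;\n",
);

my %loads;    # how many times each file was actually handed to the compiler

# An @INC hook that serves the two modules from memory instead of from disk.
unshift @INC, sub {
    my (undef, $file) = @_;
    return unless exists $source{$file};    # let Perl search the rest of @INC
    $loads{$file}++;
    open my $fh, '<', \$source{$file} or die "cannot open $file: $!";
    return $fh;
};

# Loading ModuleA pulls in ModuleB, which in turn asks for ModuleA again.
require ModuleA;

print "ModuleA.pm loaded $loads{'ModuleA.pm'} time(s)\n";
print "ModuleB.pm loaded $loads{'ModuleB.pm'} time(s)\n";
print join(" ", ModuleA->name, ModuleB->name), "\n";
```

One caveat with circular loading in general: code that runs while ModuleB is
being loaded cannot yet rely on everything ModuleA sets up at run time,
because ModuleA has not finished executing at that point.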
From jason at cgt.duhs.duke.edu Wed Sep 17 23:22:39 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Sep 17 23:20:38 2003
Subject: [Bioperl-l] Release Announcement: Bioperl 1.2.3
Message-ID:

Bioperl 1.2.3
-------------

On behalf of the Bioperl developers, I am pleased to announce the release
of Bioperl 1.2.3. This is the set of Core libraries which constitutes
Bioperl and covers areas like Sequence file parsing, Sequence Feature
representations, Database access to flatfile and web-based sequence
databases, Alignment parsing and manipulation, parsing of and data
representation of output from a majority of standard bioinformatics tools,
and many more features.

This release includes several major bug fixes since the 1.2.2 release
earlier this summer and provides some minor new functionality. This release
is intended to be compatible with code which has been programmed using the
API in the 1.2.x series of releases.

The release is available as always from http://bioperl.org/DIST/
  http://bioperl.org/DIST/bioperl-1.2.3.tar.bz2
  http://bioperl.org/DIST/bioperl-1.2.3.tar.gz
  http://bioperl.org/DIST/bioperl-1.2.3.zip

As well as from mirrors generously provided by Don Gilbert at Indiana
University
  http://iubio.bio.indiana.edu/

MD5 signatures for the release files
  c64219b6540a722e781a53aea215ebc8  bioperl-1.2.3.tar.bz2
  72b4a23f7372e820a7a7d9a72e7a0e76  bioperl-1.2.3.tar.gz
  e3bef5ca6ec6692bc253b75046100b64  bioperl-1.2.3.zip

HTML-ized documentation for the release is available from the
documentation website http://doc.bioperl.org/
  http://doc.bioperl.org/releases/bioperl-1.2.3/

Related Projects
----------------

The Generic Genome Browser (www.gmod.org), which depends on bioperl, will
likely release a new version which will utilize features in bioperl 1.2.3.
This is coordinated by Lincoln Stein and Scott Cain. Code is available
from the project site at http://www.gmod.org.

We plan to release a new version of bioperl-run in the coming week.
bioperl-run is a package of perl module wrappers around many applications common to bioinformatics analyses. Shawn Hoon is responsible for overseeing this release. Previous and future releases are available at http://bioperl.org/DIST/ and in CPAN and as always from our CVS repository http://cvs.open-bio.org. A brand new package bioperl-microarray will be released this week as well, version 0.1. This is a project headed by Allen Day and he will be announcing a code release shortly. The code will be available from CPAN, http://bioperl.org/DIST, and http://cvs.open-bio.org/. Another recent project which has not been released yet, but should be out this fall is bioperl-pedigree. This will include codes for parsing and representing pedigree data and will interface with genotype parsing and representations already part of the Bioperl Core. This package is overseen by Jason Stajich. Code is available from our CVS repository and will be available at http://bioperl.org/DIST. Contributors ------------ The hard work of many people has gone into this release, please see AUTHORS file for a complete list of individuals who have contributed to the project. We would especially like to thank those who have provided bug reports and feedback about the modules to help us improve them. We would like to welcome several new developers who have joined us recently to provide code improvements and implementations of new areas for Bioperl. Future plans ------------ We intend to focus our energy on the next set of developer's releases in the Fall of 2003 which will be numbered 1.3.x and will lead to the next stable release 1.4 in 2004. We encourage new and old developers to be part of the development cycle as well as users to provide feedback and bug reports. 
Bugs ---- Bugs should be reported at our bug tracking site http://bugzilla.bioperl.org/ A synopsis of changes from the Changes file ------------------------------------------------------------------------ 1.2.3 Stable release update o Bug #1475 - Fix and add speedup to spliced_seq for remote location handling. o Bug #1477 - Sel --> Sec abbreviation fixed o Fix bug #1487 where parsing in-between locations when end < start caused the FTLocationFactory logic to fail. o Fix bug #1489 which was not dealing with keywords as an arrayref properly (this is fixed on the main trunk because keywords returns a string and the array is accessible via get_keywords). o Bio::Tree::Tree memory leak (bug #1480) fixed Added a new initialization option -nodelete which won't try and cleanup the containing nodes if this is true. o Bug with parsing labeled nodes with Bio::TreeIO::newick fixed this was only present on the branch for the 1.2.1 and 1.2.2 series - Also merged main trunk changes to the branch which make newick -> nhx round tripping more effective (storing branch length and bootstrap values in the same location for NodeNHX and Node implementations.) Fixes to TreeIO parsing for labeled internal also required small changes to TreeIO::nhx. Improved tests for this module as well. o Bio::SearchIO - Fixed bugs in BLAST parsing which couldn't parse NCBI gapped blast properly (was losing hit significance values due to the extra unexpected column). - Parsing of blastcl3 (netblast from NCBI) now can handle case of integer overflow (# of letters in nt seq dbs is > MAX_INT) although doesn't try to correct it - will get the negative number for you. Added a test for this as well. - Fixed HMMER parsing bug which prevented parsing when a hmmpfam report has no top-level family classification scores but does have scores and alignments for individual domains.
- Parsing FASTA reports where ungapped percent ID is < 10 and the regular expression to match the line was missing the possibility of an extra space. This is rare, which is why we probably did not catch it before. - BLAST parsing picks up more of the statistics/parameter fields at the bottom of reports. Still not fully complete. - SearchIO::Writer::HTMLResultWriter and TextResultWriter were fixed to include many improvements and added flexiblity in outputting the files. Bug #1495 was also fixed in the process. o Bio::DB::GFF - Update for GFF3 compatibility. - Added scripts for importing from UCSC and GenBank. - Added a 1.2003 version number. o Bio::Graphics - Updated tutorial. - Added a 1.2003 version number. o SeqIO::swiss Bug #1504 fixed with swiss writing which was not properly writing keywords out. o Bio::SeqIO::genbank - Fixed bug/enhancement #1513 where dates of the form D-MMM-YYYY were not parsed. Even though this is invalid format we can handle it - and also cleanup the date string so it is properly formatted. - Bug/enhancement #1517 fixed so that SEGMENT line can be parsed and written with Genbank format. Similarly bug #1515 is fixed to parse in the ORIGIN text. o Bio::SeqIO::fasta, a new method called preferred_id_type allows you to specify the ID type, one of (accession accession.version display primary). See Bio::SeqIO::preferred_id_type method documentation for more information. o Unigene parsing updated to handle file format changes by NCBI Jason Stajich on behalf of the Bioperl developers. -- Jason Stajich Duke University jason at cgt.mc.duke.edu
From Richard.Adams at ed.ac.uk Thu Sep 18 10:12:47 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Thu Sep 18 10:11:06 2003 Subject: [Bioperl-l] interfaces Message-ID: <3F69BD5F.BC3C923B@ed.ac.uk> I was wondering if there was any aim or intent to make interface classes purely abstract? It seems that some interfaces (e.g., RangeI )contain extensive implementation and others e.g., Bio::RichSeqI are indeed abstract. As far as I know it's deemed A Good Idea to have interfaces abstract.
Some changes would just involve creating a base class implementing the interface, which I'd volunteer to do, at least for the classes I use regularly. Or is it too late to do this now without breaking everyone's code? Richard Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From jason at cgt.duhs.duke.edu Thu Sep 18 10:28:07 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Sep 18 10:26:19 2003 Subject: [Bioperl-l] interfaces In-Reply-To: <3F69BD5F.BC3C923B@ed.ac.uk> References: <3F69BD5F.BC3C923B@ed.ac.uk> Message-ID: We've gone back and forth on this. I think we settled on implementations which do not rely on any internal data structures (i.e. which only use methods defined by the interface or the interface's superclass(es) ) are okay. Arguments pro and con are still welcomed of course. -jason On Thu, 18 Sep 2003, Richard Adams wrote: > I was wondering if there was any aim or intent to make interface classes > purely abstract? > > It seems that some interfaces (e.g., RangeI )contain extensive > implementation and others > > e.g., Bio::RichSeqI are indeed abstract. As far as I know it's deemed A > Good Idea to have > > interfaces abstract. > > Some changes would just involve creating a base class implementing the interface, which I'd volunteer > > to do, at least for the classes I use regularly. > > Or is it too late to do this now without breaking everyone's code? 
> > Richard > > Dr Richard Adams > Bioinformatician, > Psychiatric Genetics Group, > Medical Genetics, > Molecular Medicine Centre, > Western General Hospital, > Crewe Rd West, > Edinburgh UK > EH4 2XU > > Tel: 44 131 651 1084 > richard.adams@ed.ac.uk > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From kvddrift at earthlink.net Thu Sep 18 12:21:33 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu Sep 18 12:20:02 2003 Subject: [Bioperl-l] Re: bioperl 1.2.3 RC1 Message-ID: <2B623BE0-E9F4-11D7-B5E5-003065A5FDCC@earthlink.net> Hi, Just installed RC1 on my Mac - no problems! One comment (not fatal, I guess ;) The makefile doesn't contain the Script Install Section. It is present in a version in the cvs tree. See revision 1.63 on http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ Makefile.PL?cvsroot=bioperl. Is this part not to be included in 1.2.3? thanks, - Koen. From brian_osborne at cognia.com Thu Sep 18 14:31:57 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Sep 18 14:34:56 2003 Subject: [NORDNS] [Bioperl-l] Re: bioperl 1.2.3 RC1 In-Reply-To: <2B623BE0-E9F4-11D7-B5E5-003065A5FDCC@earthlink.net> Message-ID: Koen, No script installation in 1.2.3 since 1.2.3 is based on 1.2, in essence. This feature will formally appear in 1.3. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Koen van der Drift Sent: Thursday, September 18, 2003 12:22 PM To: Jason Stajich; bioperl-l@bioperl.org Subject: [NORDNS] [Bioperl-l] Re: bioperl 1.2.3 RC1 Hi, Just installed RC1 on my Mac - no problems! One comment (not fatal, I guess ;) The makefile doesn't contain the Script Install Section. It is present in a version in the cvs tree. 
See revision 1.63 on http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Makefile.PL?cvsroot=bioperl. Is this part not to be included in 1.2.3? thanks, - Koen. _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason at cgt.duhs.duke.edu Thu Sep 18 19:02:47 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Sep 18 19:00:46 2003 Subject: [Bioperl-l] Bio::Tools::GFF GFF3 parsing Message-ID: I added support for GFF3 parsing to Bio::Tools::GFF and added some simple tests. I'm not 100% sure I have GFF3 output correct so Chris Mungall, Lincoln if you don't mind giving it a look over that would be great. If I've duplicated functionality from somewhere else let me know, but I think Bio::Tools::GFF needs to be able to parse in GFF3 format at some point. The GFF3 parsing seems to work fine for processing the BGDP annotations so I feel confident it is working correctly, but more testing is welcomed! There is no real support for Unicode type output; the simplest solution would be to rely on HTML::Entities for encoding non-ASCII codes. I wasn't sure I wanted to make Tools::GFF depend on this right now so I've just implemented a simple encoding for '=,;'. Feel free to fix this if it needs to be more aggressive. -jason -- Jason Stajich Duke University jason at cgt.mc.duke.edu From allenday at ucla.edu Thu Sep 18 21:12:55 2003 From: allenday at ucla.edu (Allen Day) Date: Thu Sep 18 21:11:09 2003 Subject: [Bioperl-l] Release Announcement: bioperl-microarray 0.1 In-Reply-To: Message-ID: The Bioperl developers are pleased to announce a 0.1 release of bioperl-microarray, a Bioperl extension package dedicated to manipulation of microarray data. The package is implemented using IO conventions Bioperl developers should already be familiar with.
Data types currently supported are: Affymetrix GeneChip CEL files (read and write) Affymetrix GeneChip CDF files (read) Affymetrix GeneChip Microarray Suite 5.0 normalized files (read) Affymetrix GeneChip dChip normalized files (read) Data types for which support is planned in a 0.2 release include: Bio::MAGE objects and MAGE-ML (read and write) Affymetrix SNPChip genotype files (read) GenePix GPR files (read) Release packages are available from: http://www.bioperl.org/DIST/ http://www.bioperl.org/DIST/bioperl-microarray-0.1.tar.bz2 http://www.bioperl.org/DIST/bioperl-microarray-0.1.tar.gz http://www.bioperl.org/DIST/bioperl-microarray-0.1.zip as well as from CPAN: http://search.cpan.org/author/ALLENDAY/ Documentation available from: http://doc.bioperl.org/bioperl-microarray/ Subscribe to the bioperl-microarray mailing list at: http://bioperl.org/mailman/listinfo/bioperl-microarray/ Report bugs at: http://bugzilla.bioperl.org/ -Allen -------------------- Allen Day Human Genetics, UCLA From ssilrich at yahoo.com Thu Sep 18 23:39:28 2003 From: ssilrich at yahoo.com (ELECTRICIAN) Date: Fri Sep 19 02:39:11 2003 Subject: [Bioperl-l] R E S U M E Message-ID: <200309190638.h8J6clMg021465@portal.open-bio.org> Resume from: Rich for Job or Service " no job too small " E L E C T R I C I A N Tel. (408) 482-2102 rysio3@yahoo.com WIRING & INSTALLATION Hands on electrical installations perform fitting, mounting, laying cables on Commercial, Industrial, residential new & existing buildings. Electrical Power Supply for Lights, Plugs, Receptacles, Panels, & Fuse boxes, Emergency Generators wiring and testing, Transformers, Power Lines & conduit layout, bending and mounting, parking lighting, lamps, switches, SOLAR PROJECTS, posts and underground installations. 
Shopping Centers; grocery stories, hardware stories, restaurants & residential - housing areas, computer business & fast food units installation & buildings; Solar Panels, Sun Tracking, Flywheel Storage & electric cars systems modify, Natural Energy in Remote areas install. LOW VOLTAGE Office Home Yard Patio Parking 12 / 24 Volt audio & video equipment, Computer & data network wiring, data backup and UPS; Monitoring Video Control & backup tapes set up and mounting, electro-optical assemblies & subsystems. DC Power Supply, Switch & Motion sensors Alarm. Fire & safety systems install. Fiber Optics systems, PLC setup, Master Control Center, cable modems & cable TV install. Network, UPS Battery Backup mounting and charging systems; Power supply testing, troubleshooting, and analyzing to a components level. Electric Vehicles Design, Assembly & Installations. CC TV & Cameras, Security Systems & Sensors for Safety, Fire sprinklers and traffic Monitoring & Door Control. Telephones / Net move & install. TECHNICIAN Use lab & shop equipment, mechanical, electrical & electronic tools, measurement & testing equipment, video cameras & microscopes. Support scientists & electronic engineers. Mechanical & Electro-Mech. Design. OFFICE, ELECTRICAL AND MECHANICAL PROJECTS Electrical & Network Sketches, one line diagrams, and "as is" drawings update. Customizing Electronic and Electrical Components & Parts, Layouts electronic and electrical schematic, connectors and mechanical detailing. Quotes, supply, bids and job estimating. Customers contact, inspection, project mgmt & supervision of electricians & material handling; Use CAD, Windows and applications; ELECTRICAL & MAINTENANCE SERVICE US Citizen; open for travel . 
From vesko_baev at abv.bg Fri Sep 19 02:55:21 2003 From: vesko_baev at abv.bg (Vesko Baev) Date: Fri Sep 19 02:53:39 2003 Subject: [Bioperl-l] GD problem Message-ID: <871010117.1063954521108.JavaMail.nobody@storage.ni.bg> Hello, We have Linux SuSE, Perl(latest ver.),Bioperl(latest ver.).We installed GD package and then we started the programm we have this error (but we have libgd): Can't locate loadible object for module GD What's that mean? Thanks! From simonchanx at hotmail.com Fri Sep 19 02:58:26 2003 From: simonchanx at hotmail.com (Simon Chan) Date: Fri Sep 19 02:56:41 2003 Subject: [Bioperl-l] Bio::TreeIO::tabtree Question Message-ID: Greetings, I've got a question regarding Bio::TreeIO::tabtree I want to draw phylogenetic trees in via ASCII and was quite happy to find tabtree. I inputed a newick file that contained the tree: (B,(A,C,E),D); In my output file, I got this: B D A C E I was expecting that there would be branches (ie: " ---" ) linking the letters together. What's wrong? Many thanks for your time! Any suggestions/comments would be greatly appreciated.
My code is below:

#!/usr/bin/perl -w
use Bio::TreeIO;
my $treeFile = "newickTreeFile.txt";
my $in = new Bio::TreeIO(-file => "$treeFile", -format => 'newick');
my $out = new Bio::TreeIO(-file => '>output', -format => 'tabtree');
while( my $tree = $in->next_tree ) {
    $out->write_tree($tree);
}

_________________________________________________________________ Add photos to your messages with MSN 8. Get 2 months FREE*. http://join.msn.com/?page=features/featuredemail
From Marc.Logghe at devgen.com Fri Sep 19 03:03:49 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Fri Sep 19 03:02:02 2003 Subject: [Bioperl-l] GD problem Message-ID: Not sure, but you probably need gd *and* gd-devel installed. HTH, Marc > -----Original Message----- > From: Vesko Baev [mailto:vesko_baev@abv.bg] > Sent: Friday, September 19, 2003 8:55 AM > To: bioperl-l@bioperl.org > Subject: [Bioperl-l] GD problem > > > Hello, > We have Linux SuSE, Perl(latest ver.),Bioperl(latest ver.).We > installed GD package and then we started the programm we have > this error (but we have libgd): > Can't locate loadible object for module GD > > What's that mean? > Thanks! > > > ----------------------------------------------------------------- > http://club.abv.bg/jsp/abvCard.jsp - ???? ??? - ??????? ? ???? > ?? ????? ? ???? ! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >
From brian_osborne at cognia.com Fri Sep 19 07:43:31 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Sep 19 07:46:26 2003 Subject: [Bioperl-l] Version 1.2.3 class diagram? Message-ID: Bioperl-l, I would like to make a PDF class diagram for the package, but I don't have Dia, can someone help me out? I have the XML from Autodial. From what I can gather one can open this file in Dia and print to PS, then use something like ImageMagick to convert to PDF. Perhaps there are other ways? Brian O.
From jason at cgt.duhs.duke.edu Fri Sep 19 09:09:45 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Sep 19 09:07:54 2003 Subject: [Bioperl-l] Version 1.2.3 class diagram? In-Reply-To: References: Message-ID: Use ps2pdf for converting postscript files into pdf. If you don't have it on your cygwin box it is installed on 'pub'. -jason On Fri, 19 Sep 2003, Brian Osborne wrote: > Bioperl-l, > > I would like to make a PDF class diagram for the package, but I don't have > Dia, can someone help me out? I have the XML from Autodial. From what I can > gather one can open this file in Dia and print to PS, then use something > like ImageMagick to convert to PDF. Perhaps there are other ways? > > Brian O. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From basm101 at york.ac.uk Fri Sep 19 09:26:44 2003 From: basm101 at york.ac.uk (basm101) Date: Fri Sep 19 09:23:12 2003 Subject: [Bioperl-l] Trees as graphics Message-ID: <3F6B0414.6070009@york.ac.uk> Hi there, Is there a bioperl feature that allows tree files to be converted easily from PHYLIP Newick format into gif files or other graphics ? At the moment to do this I have to go into TreeView and do "save as graphic". A Perl module to speed this up would be very useful. Thanks, basm101 From jason at cgt.duhs.duke.edu Fri Sep 19 09:38:38 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Sep 19 09:36:54 2003 Subject: [Bioperl-l] Trees as graphics In-Reply-To: <3F6B0414.6070009@york.ac.uk> References: <3F6B0414.6070009@york.ac.uk> Message-ID: Install the bioperl-run package. Bio::Tools::Run::Phylo::Phylip::DrawTree or Bio::Tools::Run::Phylo::Phylip::DrawGram are wrappers around drawtree and drawgram respectively.
(we currently only support Phylip 3.5 menus) I tend to use treeplot as well (http://www.cnrs-gif.fr/pge/bioinfo) and output as adobe illustrator file which can be tweaked. No bioperl wrapper about this yet, but it is quite easy to script with it already. -jason On Fri, 19 Sep 2003, basm101 wrote: > Hi there, > > Is there a bioperl feature that allows tree files to be converted easily > from PHYLIP Newick format > into gif files or other graphics ? At the moment to do this I have to go > into TreeView and do "save > as graphic". A Perl module to speed this up would be very useful. > > Thanks, > basm101 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From allenday at ucla.edu Fri Sep 19 12:58:53 2003 From: allenday at ucla.edu (Allen Day) Date: Fri Sep 19 12:57:22 2003 Subject: [Bioperl-l] Trees as graphics In-Reply-To: Message-ID: i also have an adapter to draw trees as SVG from newick format. it's still unreleased, i plan on getting it out within the next couple of weeks. -allen On Fri, 19 Sep 2003, Jason Stajich wrote: > Install the biperl-run package. > Bio::Tools::Run::Phylo::Phylip::DrawTree > or > Bio::Tools::Run::Phylo::Phylip::DrawGram > > are wrappers around drawtree and drawgram respectively. > (we currently only support Phylip 3.5 menus) > > I tend to use treeplot as well (http://www.cnrs-gif.fr/pge/bioinfo) and > output as adobe illustrator file which can be tweaked. No bioperl wrapper > about this yet, but it is quite easy to script with it already. > > -jason > On Fri, 19 Sep 2003, basm101 wrote: > > > Hi there, > > > > Is there a bioperl feature that allows tree files to be converted easily > > from PHYLIP Newick format > > into gif files or other graphics ? At the moment to do this I have to go > > into TreeView and do "save > > as graphic". 
A Perl module to speed this up would be very useful. > > > > Thanks, > > basm101 > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gnf.org Fri Sep 19 13:37:28 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Sep 19 13:35:21 2003 Subject: [Bioperl-l] Re: [BioSQL-l] gene ontology questions revisited In-Reply-To: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de> Message-ID: On 9/19/03 5:51 AM, "Daniel Lang" wrote: > But another one occurred while loading the data: > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were > ("MetaCyc","2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN","0") FKs () > ERROR: value too long for type character varying(40) > --------------------------------------------------- The problem here is that the references for GO terms are modeled as DBXrefs with dbname and accession. This sometimes applies quite well, but often the reference in the GO.defs file is used in a far wider sense. In the example above for instance, the reference is in fact to a term in another ontology (MetaCyc), so should be a term relationship rather than a reference. So, what you're seeing is the result of deficiencies in the flat file representation (term references can be any of lit.reference, dbxref, and ontology term) and consequently in the parser (who doesn't try to be smarter than the flat file representation). Unfortunately that assessment doesn't help you much. 
What I did locally (I obviously ran into the same problem) is widening the accession column in dbxref to 64 chars, which is I thought a somewhat reasonable compromise. You don't want to open it up completely and water down the relational model just because a certain flat file format is deficient in its expressivity). This doesn't fix the problem that something ends up as a dbxref when it should rather be a term relationship. Anyone else got a good idea here? I'm cc'ing the bioperl list since this is rather an issue of the object-space representation than one of the schema. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Fri Sep 19 13:51:17 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Sep 19 13:49:08 2003 Subject: [Bioperl-l] proposed changes to RangeI.pm In-Reply-To: Message-ID: On 9/17/03 6:59 AM, "Chris Mungall" wrote: > > is this a good time to check these changes in? I think so ... Heikki? -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Fri Sep 19 14:36:29 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Sep 19 14:34:24 2003 Subject: [Bioperl-l] Problems with biosql In-Reply-To: <92446450.1062950653@[172.23.198.245]> Message-ID: First off, sorry for the late reply. I've been on vacation. On 9/7/03 2:04 PM, "Christopher Mason" wrote: > Howdy- > > I'm using the latest CVS versions (as of 5 Sep 03) of bioperl-live, > bioperl-db, and biosql-schema with the latest PostgreSQL (7.3.4) and Perl > (5.8.0). I'm trying to load all of swiss-prot into a biosql database, and > it's not going well. 
Although my ultimate goal is to manipulate this > database from java, I'm using perl because the various docs I've read seem > to indicate this is the way to go for loading (if it's not, please tell > me). You should be able to use biojava for loading as well, but bioperl-db has the much configurable script load_seqdatabase.pl. First off, after you unpacked or downloaded bioperl-db, did you run the test suite? Did all tests pass? > > There are some errors output when loading the schema (see below). But in > general, creating the database seems to work. > > However, when trying to run: > >> bioperl-db/scripts/biosql/load_seqdatabase.pl --dbname biosql >> --driver Pg --format swiss --dbuser cmason >> --namespace bioperl sprot.dat > > I immediately get this error: > >> Could not store P15711: >> ------------- EXCEPTION ------------- >> MSG: You're trying to lie about the length: is 102 but you say 924 > > (P15711 is the very first entry in the file.) > (Full traceback below.) This would mean that P15711 was already in the database, with an erroneous (truncated?) sequence. Did you load that before, and did you manipulate the seq column in biosequence? > > which seems to be generated here: > > Bio/PrimarySeq.pm:419 >> "You're trying to lie about the length: ". >> "is $len but you say ".$val); > > called from here: > > Bio/DB/BioSQL/BiosequenceAdaptor.pm:252 >> $obj->alphabet($rows->[3]) if $rows->[3]; >> $obj->seq($rows->[4]) if $rows->[4]; >> $obj->length($rows->[2]) if $rows->[2]; # <---- 252 >> if($obj->isa("Bio::DB::PersistentObjectI") && > > $rows is > >> [1, undef, 924, protein, undef, 1] > > > Commenting out the indicated line seems to prevent this error message. > However, then I get, about two days later, this message: > >> Out of memory! These two should be unrelated. If all goes correctly, the length column should have a value identical to the length of the sequence, and therefore there should be no exception. 
In your case, the length of the sequence as it comes back from the database is much shorter than the one in the flat file. Somewhere in between it must have gotten truncated or updated. > > The state of the database is odd: > >> biosql=# select count(bioentry_id) from bioentry; >> count >> ------- >> 1 >> (1 row) > > but: > >> biosql=# select count (seqfeature_id) from location; >> count >> ------- >> 1329 >> (1 row) This is indeed odd. There should be UK and FK constraints that should completely prevent this from happening. Have you tinkered with the constraints? What is the result of biosql=# select count(distinct bioentry_id) from seqfeature; and biosql=# select count(distinct seqfeature_id) from location; I suspect there is something wrong with either the installation of Pg or with the instantiation of biosql. I'd start over with a fresh instance. Try to insert something made up into seqfeature. It should fail. Same for location. Try to cut out the first say 10 entries from sprot and load them and see whether you succeed. You can supply --testonly to have everything rolled back at the end automatically. Also, note that it is advisable to pre-load the taxonomy database (see load_taxonomy.pl in the biosql-schema module). Otherwise you'll see errors from misparsed organisms. > > and: > >> # du -sk /home/postgres/ >> 739724 /home/postgres > > (There are no other database besides biosql.) > > (I tried VACUUMing the database which caused it to grow by about 100MB, but > nothing else shows up.) You mean shrink, not grow? > > It's hard to tell how far it's gotten when it runs out of memory. I sort > of expected the size of the finished database to be somewhat larger than > the size of the flat file. It should be at least 3-4 times that size b/c of indexes. > > But even if it's almost finished, it's incredibly slow (at least 1,300 > minutes of user time, not counting postgres). Would mysql be much faster? > Or should I simply be prepared to wait a long time? 
> > Has anyone tried this recently (importing all of swiss prot into a biosql > database) with any database (postgres, mysql, oracle, etc.)? If so, can > you give me (even approximate) performance numbers (for loading, selecting > a sequence, etc.) and ultimate database size on disk? I'm doing this on a regular basis for a biosql instance on Oracle. Swissprot and TREMBL load (or update) over night when run in parallel. The problem with Pg is that is relies a lot on index statistics. When you're first building your database, you ramp up the content from zero to a couple hundred thousand bioentries (and 3-5 times more features, annotation associations, etc). So, a while into the upload run, the statistics are grossly wrong and may lead Pg making the wrong decisions on the query plan. You can try to run vacuumdb --analyze every hour or so while it's loading (you can run this concurrently meanwhile) to try to remedy this problem. > I'm trying to > determine if this is a viable way of architecting my application (which > incidentally, will probably be written in java, not perl). This is along our architecture here. We do the data loads in perl, whereas the web-app for browsing and searching uses the J2EE stack. Works fine. > > Also, why is this code spread out over three different CVS modules? Because those are the components. Biosql is the schema (which, as you noticed already, is not tied to perl or bioperl in any way), then you have the bio* library according to your language preference (e.g., bioperl), and the third component is the language bindings, which for bioperl has traditionally been in its own module (bioperl-db). > > Thanks, > > -c > > When loading the schema: > >>> psql biosql < biosqldb-views-pg.sql This hasn't been updated for the latest version of the schema. You don't need this. 
>> ERROR: Relation "seqfeature_key" does not exist >> ERROR: view "gff" does not exist >> ERROR: Relation "ontology_term" does not exist >> ERROR: Relation "ontology_term" does not exist >> ERROR: Relation "fasta" does not exist >> ERROR: Relation "ontology_term" does not exist >> ERROR: parser: parse error at end of input >> ERROR: RemoveFunction: function compl(text) does not exist >> CREATE FUNCTION >> ERROR: RemoveFunction: function reverse(text) does not exist >> ERROR: stat failed on file >> '/home/cjm/cvs/biosql-schema/ext/biosqldb-funcs.so': No such file or >> directory ERROR: Function reverse("unknown") does not exist >> Unable to identify a function that satisfies the given argument >> types You may need to add explicit typecasts >> ERROR: RemoveFunction: function get_subseq(text, integer, integer, >> integer) does not exist CREATE FUNCTION >> get_subseq >> ------------ >> bc >> (1 row) >> >> ERROR: view "gffseq" does not exist >> ERROR: Relation "seqfeature_key" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist >> ERROR: Relation "seqfeature_key_v" does not exist > > and: > >>> psql biosql < biosql-accelerators-pg.sql Same applies here. Although I'm not sure whether biojava still depends on it or not. 
-hilmar >> ERROR: RemoveFunction: function biosql_accelerators_level() does not >> exist CREATE FUNCTION >> ERROR: RemoveFunction: function intern_ontology_term(text) does not exist >> CREATE FUNCTION >> ERROR: RemoveFunction: function intern_seqfeature_source(text) does not >> exist CREATE FUNCTION >> ERROR: RemoveFunction: function create_seqfeature(integer, text, text) >> does not exist CREATE FUNCTION >> ERROR: RemoveFunction: function create_seqfeature_onespan(integer, text, >> text, integer, integer, integer) does not exist CREATE FUNCTION > > > Then when trying to load: > > >> ------------- EXCEPTION ------------- >> MSG: You're trying to lie about the length: is 102 but you say 924 >> STACK Bio::PrimarySeq::length >> /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:419 STACK >> Bio::DB::Persistent::PersistentObject::AUTOLOAD >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:541 >> STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:612 >> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:541 >> STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BiosequenceAdaptor.pm:254 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:12 >> 78 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:966 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:851 >> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284 >> STACK Bio::DB::BioSQL::SeqAdaptor::attach_children >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/SeqAdaptor.pm:279 STACK >> 
Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:13 >> 09 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:966 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:851 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:204 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 >> STACK Bio::DB::Persistent::PersistentObject::store >> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 >> STACK (eval) ./load_seqdatabase.pl:446 >> STACK toplevel ./load_seqdatabase.pl:429 >> >> -------------------------------------- > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sun Sep 21 00:41:18 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Sep 21 00:39:17 2003 Subject: [Bioperl-l] slides of persistent bioperl bosc03 talk Message-ID: I offered the slides a while ago and then got dragged away by other things before being able to follow through. I've posted them now: http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf I also wrote a news entry which I guess needs a while to propagate. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From birney at ebi.ac.uk Sun Sep 21 02:13:15 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Sun Sep 21 07:29:46 2003 Subject: [Bioperl-l] Exonerate vulgar lines; SearchIO model Message-ID: I have added vulgar line parsing to the exonerate output. I made a simpler model than the cigar line parsing of having just the M state durations as being HSPs. (this will get checked into the main trunk) This brings up an issue - should HSPs in the SearchIO objects be ungapped or gapped? It looks as if gapped cases are allowed - is this the case? Should we flag ungapped vs gapped (or is this done already somehow?) I am starting to grok more why this event passing system is useful (abstracts out parsing from object creation, more graceful about partial information etc) but it does seem... quite alot of scaffolding... I guess we should make SeqIO work like this at some point, but that's definitely not in my critical path at the moment. From jason at cgt.duhs.duke.edu Sun Sep 21 09:07:42 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sun Sep 21 09:05:51 2003 Subject: [Bioperl-l] Re: Exonerate vulgar lines; SearchIO model In-Reply-To: References: Message-ID: On Sun, 21 Sep 2003, Ewan Birney wrote: > > I have added vulgar line parsing to the exonerate output. I made > a simpler model than the cigar line parsing of having just the M > state durations as being HSPs. cool! Is there a switch in SearchIO::exonerate to look for '^cigar' versus '^vulgar' lines? > > (this will get checked into the main trunk) > > > This brings up an issue - should HSPs in the SearchIO objects > be ungapped or gapped? It looks as if gapped cases are allowed > - is this the case? Should we flag ungapped vs gapped (or is > this done already somehow?) > Basically both are allowed. That flag would be the gap count in the HSP I guess. 
In general they should be ungapped, but we handle 'small gaps' in
the sense that FASTA or BLAST HSPs can contain gaps (whose location we
do have access to by virtue of the gap character in the homology line).

In fact the HitI->gaps call only collects the count of all the gaps for
the contained HSPs - a la exons on genomic DNA we might count gaps in the
(cDNA) exon alignment but not the overall (intron) gaps introduced by the
alignment. What do you think should be done?

>
> I am starting to grok more why this event passing system is useful
> (abstracts out parsing from object creation, more graceful about
> partial information etc) but it does seem... quite alot of
> scaffolding...
>

I agree - now that it has sort of gotten out there and we at least have
something to evaluate, I am game for looking at some refactoring. Had to
make it work first...

>
> I guess we should make SeqIO work like this at some point, but
> that's definitely not in my critical path at the moment.
>

Ditto.

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From donald.jackson at bms.com Sun Sep 21 09:50:58 2003
From: donald.jackson at bms.com (Donald G. Jackson)
Date: Sun Sep 21 09:45:01 2003
Subject: [Bioperl-l] Re: Trees as graphics
Message-ID: <3F6DACC2.7060103@bms.com>

basm101,

to follow up on Jason's post -

Phylip's drawgram program will output postscript. If you just want to
change the format, the 'ps2gif' program (part of Ghostscript - standard
on most linux distros, available for Windows) is the easiest way. I use
ImageMagick's 'convert' program (www.imagemagick.org??) which lets me add
legends and headers too.

Don Jackson
BMS Bioinformatics

From senger at ebi.ac.uk Sun Sep 21 10:13:04 2003
From: senger at ebi.ac.uk (Martin Senger)
Date: Sun Sep 21 10:11:19 2003
Subject: [Bioperl-l] Re: Warning and error message using Bio::Biblio module
Message-ID:

This reply is quite delayed - I am sorry for that. I hope that thanks to
Heikki you already have an answer anyway.
Heikki was right that there were some periods when the biblio server
did not respond. I have fixed it - and I hope that now my cron jobs
informing me about the server failure will work correctly so I will know
sooner that something is going wrong :-)

Heikki was also right that the simple 'use Bio::Biblio' should not
generate any warnings even if the biblio server is not responding. Let me
know if you still (or in the future will) have similar problems.

I have now tested the examples you gave in the email, with the following
correct results:

perl -MBio::Biblio -e 'print new Bio::Biblio->find("perl")->get_count'
8245

perl -MBio::Biblio -e 'print join ("\n", @{ new Bio::Biblio->find("brazma")->get_all_ids })'
MEDLINENEW/12386000
MEDLINENEW/12386004
MEDLINENEW/12387284
MEDLINENEW/12225585
MEDLINENEW/12540298
MEDLINENEW/12519944
MEDLINENEW/12519949
MEDLINENEW/12652749
MEDLINENEW/12424109
MEDLINENEW/12529438
MEDLINE2003/894225
MEDLINE2003/10693778
MEDLINE2003/10977099
MEDLINE2003/10967323
MEDLINE2003/11238066
MEDLINE2003/11580977
MEDLINE2003/11696959
MEDLINE2003/11726920
MEDLINE2003/12049748
MEDLINE2003/12227734
MEDLINE2003/8877502
MEDLINE2003/9322017
MEDLINE2003/9672833

Regards,
Martin

--
Martin Senger
EMBL Outstation - Hinxton            Senger@EBI.ac.uk
European Bioinformatics Institute    Phone: (+44) 1223 494636
Wellcome Trust Genome Campus         (Switchboard: 494444)
Hinxton                              Fax : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                       http://industry.ebi.ac.uk/~senger

From jason at cgt.duhs.duke.edu Sun Sep 21 10:32:36 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Sun Sep 21 10:30:42 2003
Subject: [Bioperl-l] Drawing ASCII trees (was Bio::TreeIO::tabtree Question)
In-Reply-To:
References:
Message-ID:

As I responded to Simon off-list, tabtree was just a play format output I
had started but never finished. This would be a fun project for the
compsci types out there. A bunch of solutions exist for this problem; it
just requires applying them to the Bio::Tree::Node API.
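
To make the idea concrete, here is a minimal sketch in plain Perl of the
usual recursive approach to connecting nodes with lines. It does not touch
the Bio::Tree::Node API at all: the nested-hash tree (with its 'name' and
'children' keys) and the ascii_tree_lines helper are made up for
illustration, and a real change to tabtree would walk the node objects
instead.

```perl
use strict;
use warnings;

# Hypothetical tree structure, for illustration only: each node is a hash
# with an optional 'name' and an optional array ref of 'children'.
# This one corresponds to the newick string (B,(A,C,E),D);
my $tree = {
    children => [
        { name => 'B' },
        { children => [ { name => 'A' }, { name => 'C' }, { name => 'E' } ] },
        { name => 'D' },
    ],
};

# Return the lines of an ASCII drawing of $node.
#   $prefix  - indentation inherited from the ancestors
#   $is_last - true if $node is the last child of its parent, which decides
#              whether the vertical bar continues below it
sub ascii_tree_lines {
    my ($node, $prefix, $is_last) = @_;

    # Unnamed (internal) nodes are drawn as '+'.
    my $label = defined $node->{name} ? $node->{name} : '+';
    my @lines = ($prefix . ($is_last ? '`-- ' : '|-- ') . $label);

    my @kids = @{ $node->{children} || [] };
    for my $i (0 .. $#kids) {
        push @lines, ascii_tree_lines(
            $kids[$i],
            $prefix . ($is_last ? '    ' : '|   '),
            $i == $#kids,
        );
    }
    return @lines;
}

print "$_\n" for ascii_tree_lines($tree, '', 1);
```

Each recursion level extends the prefix by four characters, so sibling
branches stay joined by a vertical bar until the last child is drawn. The
same walk could presumably be driven by each node's descendents, but that
part is left out here since it depends on the module internals.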
I checked in some changes on the main trunk which at least connect things with some lines, but it still isn't as nice as I would like to see. If someone is keen to work on this - you basically just need to take bits and pieces of some solutions to the perl quiz of the week #5 and apply it to _write_tree_Helper: [the original question] http://perl.plover.com/qotw/misc/e005/question [some solutions <*.pl>] http://perl.plover.com/qotw/misc/e005/ Ideally we could support vertical and horizontal layout w/ and w/o the boxes as many of the example solutions have done. You would just need to add another option when initializing TreeIO::tabtree to tell it what type of layout you want. If your day involves drawing a lot of trees and want some easy ways to generate overview ASCII pictures of them this would be a good project for you... -jason On Thu, 18 Sep 2003, Simon Chan wrote: > Greetings, > > I've got a question regarding Bio::TreeIO::tabtree > > I want to draw phylogenetic trees in via ASCII and was quite happy to find > tabtree. I inputed a newick file that contained the tree: > > (B,(A,C,E),D); > > In my output file, I got this: > > B > D > > A > C > E > > > I was expecting that there would be branches (ie: " ---" ) linking the > letters together. > What's wrong? > > > Many thanks for your time! Any suggestions/comments would be greatly > appreciated. > > > My code is below: > #!/usr/bin/perl -w > > use Bio::TreeIO; > > my $treeFile = "newickTreeFile.txt"; > my $in = new Bio::TreeIO(-file => "$treeFile", -format => 'newick'); > my $out = new Bio::TreeIO(-file => '>output', -format => 'tabtree'); > while( my $tree = $in->next_tree ) { $out->write_tree($tree); } > > _________________________________________________________________ > Add photos to your messages with MSN 8. Get 2 months FREE*. 
> http://join.msn.com/?page=features/featuredemail
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From letondal at pasteur.fr Sun Sep 21 15:22:57 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Sun Sep 21 15:21:06 2003
Subject: [Bioperl-l] Pise modules: testing for service availability
Message-ID: <20030921212257.A199463@electre.pasteur.fr>

The Pise servers currently have a PBS problem, so if someone tries to
run a Pise client module, s/he won't be able to run it.

This message is a reminder that you can run Pise programs at another
location:

    my $factory = new Bio::Tools::Run::AnalysisFactory::Pise(
        -location => 'http://kun.majorlinux.com:5080/cgi-bin/Pise/5.a/');

(check beforehand at http://kun.homelinux.com/Pise/5.a/ whether the
program is installed by trying it interactively)

Job errors can also be tested:

    my $clustalw = $factory->program('clustalw');
    $clustalw->infile($ARGV[0]);
    my $job = $clustalw->run;
    if ($job->error) {
        print "error: ", $job->error_message(), "\n";
        exit;
    } else {
        # whatever you need ...
        print $job->stdout();
    }

This test is performed by the Pise test installation step, so it should
not be a problem for people doing a make test at installation time.

The service should be back tomorrow, sorry for the inconvenience.
-- Catherine Letondal -- Pasteur Institute Computing Center From Marc.Logghe at devgen.com Wed Sep 17 09:04:27 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Sun Sep 21 17:40:27 2003 Subject: [Bioperl-l] order of sublocations Message-ID: > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Wednesday, September 17, 2003 2:18 PM > To: Marc Logghe > Cc: Bioperl-L (E-mail) > Subject: Re: [Bioperl-l] order of sublocations > > > This alone won't work because of the case when we have remote > locations > the sorting will mess up that order. Besides the order specified does > mean something - we had to go through this with the > spliced_seq method to > generate the spliced out sequence. > > If we're specifying a gene on the complement it would be > incorrect to sort the > exons like you have shown - that puts them in the wrong order. I don't know whether there is a concept of 'wrong order', is there ?. Is it relevant in Bioperl ? Nothing is said about it in the Feature Table Definition. You can only notice that in the examples they give, the sublocations are nicely sorted. I was just reasoning: if it is not relevant for Bioperl why not just sort them, even when they end up 'wrongly sorted' ;-) > Perhaps complement(join(<1..66,129..218)) is what VectorNTI > wants - but > we've been down that road before in getting the code to > output this and > it would require doing something a bit different to distribute the > complementation status to all the subfeatures. > > To please VectorNTI it seems to me we need to split CDSes > into individual > exons - if this is in regard to SequenceDumping in Gbrowse - I have a > 'CDS' track which I use to dump the CDSes as individual > features rather > than joined together. Like I said: it is more a VectorNTI problem. I preferred to have a CDS in stead of exons because the picture looks less messy. 
When you have to deal with exons of different splicing variants, you
can't easily see which exons belong together. It is pic1 versus pic2.

Cheers,
Marc

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pic2.jpg
Type: image/jpeg
Size: 15936 bytes
Desc: pic2.jpg
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20030917/41be4a67/pic2-0001.jpg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pic1.jpg
Type: image/jpeg
Size: 17035 bytes
Desc: pic1.jpg
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20030917/41be4a67/pic1-0001.jpg

From letondal at pasteur.fr Mon Sep 22 05:58:50 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Mon Sep 22 05:57:00 2003
Subject: [Bioperl-l] End-User Development Survey
Message-ID: <200309220958.h8M9wolQ348869@electre.pasteur.fr>

Hi,

A European End-User Development network
(http://giove.cnuce.cnr.it/eud-net.htm) started one year ago. Its aim is
to develop and promote development and tailoring of software tools by
their users, which I think is of major importance in our field.

By "users", they mean not only "end-users" - I don't really like this
term but it's the commonly adopted one - but also people who, while being
professionals of a domain (i.e. bioinformatics) and thus not being
"professional" computer scientists, do participate in the development of
software tools.

I thought that it might be interesting for this network to get some input
from real "practitioners" by asking the bioperlers to answer the survey
questionnaire that is online:

http://www.lri.fr/~letondal/Bib/eud_survey.html

Answers are sent by email to the organizers of the network and are fully
anonymous (no record of the sending machine is made, nobody has access to
the log files etc...).

The questionnaire tests show that it should take about 10 minutes to fill.
Thanks to all of you, -- Catherine Letondal -- Pasteur Institute Computing Center From birney at ebi.ac.uk Mon Sep 22 07:37:38 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon Sep 22 07:35:45 2003 Subject: [Bioperl-l] Re: Exonerate vulgar lines; SearchIO model In-Reply-To: Message-ID: <2AEC62A3-ECF1-11D7-B260-000393CBD5AE@ebi.ac.uk> On Sunday, September 21, 2003, at 02:07 pm, Jason Stajich wrote: > On Sun, 21 Sep 2003, Ewan Birney wrote: > >> >> I have added vulgar line parsing to the exonerate output. I made >> a simpler model than the cigar line parsing of having just the M >> state durations as being HSPs. > > cool! Is there a switch in SearchIO::exonerate to look for '^cigar' > versus '^vulgar' lines? > Yup. The same parser can deal with either ^cigar or ^vulgar. I thought that was going to be more robust in the long term >> >> (this will get checked into the main trunk) >> >> >> This brings up an issue - should HSPs in the SearchIO objects >> be ungapped or gapped? It looks as if gapped cases are allowed >> - is this the case? Should we flag ungapped vs gapped (or is >> this done already somehow?) >> > > Basically both are allowed. That flag would be the gap count in the > HSP > I guess. In general they should be ungapped, but we handle 'small > gaps' in > the sense that FASTA or BLAST HSPs can contain gaps (whose location we > do have access to by virture of the gap charater in homology line). > Ok. I guess I should make this a flag in the parser (ungapped HSPs or gapped HSPs...) > In fact the HitI->gaps call only collects the count of all the gaps for > the contained HSPs - ala exons on genomic DNA we might count gaps in > the > (cDNA) exon alignment but not the overall (intron) gaps introduced by > the > alignment. What do you think should be done? > >> >> I am starting to grok more why this event passing system is useful >> (abstracts out parsing from object creation, more graceful about >> partial information etc) but it does seem... 
quite alot of >> scaffolding... >> > > I agree - now that it is sort gotten out there and we at least have > something to evaluate, I am game for looking at some refactoring. Had > to > make it work first... > > >> >> I guess we should make SeqIO work like this at some point, but >> that's definitely not in my critical path at the moment. >> > Ditto. > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From basm101 at york.ac.uk Mon Sep 22 12:02:21 2003 From: basm101 at york.ac.uk (basm101) Date: Mon Sep 22 11:58:29 2003 Subject: [Bioperl-l] Re: Trees as graphics References: <3F6DACC2.7060103@bms.com> Message-ID: <3F6F1D0D.3040301@york.ac.uk> Thanks for the advice all. Sorry if this follow-up isnt about bioperl - is there another discussion board you guys use for general questions about bioinformatics programs ? Does anyone know if it is possible using PHYLIP to output postscript files that include bootstrap values ? I have been trying using drawgram to output postscript files. However, when I preview the tree (which had been generated via seqboot->protdist->neighbor->consense) the bootstrap values that I can see in TreeView are no longer there. I will have a go at using ImageMagick and the Perl interface to it. Thanks, basm101 Donald G. Jackson wrote: > basm101, > > to follow up on Jason's post - > > Phylip's drawgram program will output postscript. If you just want to > change the format, the 'ps2gif' program (part of Ghostscript - > standard on most linux distros, available for Windows) is the easiest > way. I use ImageMagick's 'convert' program (www.imagemagick.org??) > which lets me add legends and headers too. 
>
> Don Jackson
> BMS Bioinformatics
>

From jason at cgt.duhs.duke.edu Mon Sep 22 12:14:57 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Sep 22 12:13:12 2003
Subject: [Bioperl-l] Re: Trees as graphics
In-Reply-To: <3F6F1D0D.3040301@york.ac.uk>
References: <3F6DACC2.7060103@bms.com> <3F6F1D0D.3040301@york.ac.uk>
Message-ID:

I would really suggest treeplot. You specify the bootstraps as the label
for internal nodes like this.

(((A:0.1,B:0.2)100:0.6),D:0.4),100:0.30);

Treeplot version 0.7
Developpment: Olivier Langella
CNRS UPR9034 Laboratoire PGE
e-mail: langella@pge.cnrs-gif.fr
Web: http://www.cnrs-gif.fr/pge/bioinfo

USAGE:
treeplot SOURCE DESTINATION -[options]

DESTINATION is a postscript file (output.ps) by default.
to produce other format, choose one of these extensions:
  .ps   ==> Postscript
  .ai   ==> Adobe Illustrator
  .svg  ==> Scalable Vector Graphic
  .cgm  ==> Computer Graphic Metafile
  .hpgl ==> Hewlet Packard Graphic Language
  .fig  ==> xfig file
  .gif  ==> gif image file
  .pnm  ==> PBM Portable aNy Map file

OPTIONS for treeplot
  -t rect            drawing rectangular cladogram
  -t phylogram       drawing phylogram
  -nb                no bootstrap values
  -og 'LIST'
  -g 'COLOR' 'LIST'
        where LIST is a list of OTU's (separated by spaces)
        COLOR is a name of color (in english) or an rgb code
        (example: '255 0 0' for red)

Other places to ask could be
bionet.molbio.evolution
bionet.molbio.software
I guess.

On Mon, 22 Sep 2003, basm101 wrote:

> Thanks for the advice all.
> Sorry if this follow-up isnt about bioperl - is there another discussion
> board you guys use for general questions
> about bioinformatics programs ?
>
> Does anyone know if it is possible using PHYLIP to output postscript
> files that include bootstrap values ?
> I have been trying using drawgram to output postscript files. However,
> when I preview the tree (which had been
> generated via seqboot->protdist->neighbor->consense) the bootstrap
> values that I can see in TreeView
> are no longer there.
> > I will have a go at using ImageMagick and the Perl interface to it. > > > Thanks, > basm101 > > > > Donald G. Jackson wrote: > > > basm101, > > > > to follow up on Jason's post - > > > > Phylip's drawgram program will output postscript. If you just want to > > change the format, the 'ps2gif' program (part of Ghostscript - > > standard on most linux distros, available for Windows) is the easiest > > way. I use ImageMagick's 'convert' program (www.imagemagick.org??) > > which lets me add legends and headers too. > > > > Don Jackson > > BMS Bioinformatics > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From Lobvi.Matamoros at crchul.ulaval.ca Mon Sep 22 14:31:11 2003 From: Lobvi.Matamoros at crchul.ulaval.ca (Lobvi Matamoros) Date: Mon Sep 22 12:29:21 2003 Subject: [Bioperl-l] Script Message-ID: <4.2.0.58.20030922122731.00a554b8@drs.crchul.ulaval.ca> Hi to everyone: I am trying to know how many times a particular amino acid motif occur in a protein database, for instance NCCC, in other words count that particular motif. Does anyone have an script to perform that task or something close I can change a little bit?. Thanks for your help in advance Lobvi Matamoros Fern?ndez, Ph.D Post-doctoral fellow Centre de Recherche du CHUL 2705 Boul. 
Laurier, T3-80
Sainte-Foy (Québec)
G1V 4G2 CANADA
Tel: 418-6542261
FAX:418-654-2279

From jason at cgt.duhs.duke.edu Mon Sep 22 12:41:48 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Sep 22 12:39:52 2003
Subject: [Bioperl-l] Script
In-Reply-To: <4.2.0.58.20030922122731.00a554b8@drs.crchul.ulaval.ca>
References: <4.2.0.58.20030922122731.00a554b8@drs.crchul.ulaval.ca>
Message-ID:

You might instead try fuzzpro:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/fuzzpro.html

It would be pretty trivial to write something to do what you ask with
bioperl but you might want to try and read the HOWTOs and the tutorial a
little bit before asking for people to write something for you.

For example - see how to read in sequences here:
http://www.bioperl.org/HOWTOs/html/SeqIO.html

Then you just need to know
$seq->seq() -- get the sequence as a string from the Sequence object
and the perl function index
% perldoc -f index
to see how the Seq object ( $seq->seq() )

An example problem here:
http://jason.open-bio.org/~jason/Bioperl_Tutorials/Duke_2003/problem_sets/perl/02_find_subsequence.txt

With answers of course too:
http://jason.open-bio.org/~jason/Bioperl_Tutorials/Duke_2003/problem_sets/perl/answers/

On Mon, 22 Sep 2003, Lobvi Matamoros wrote:

>
>
> Hi to everyone:
>
> I am trying to know how many times a particular amino acid motif occur in a
> protein database, for instance NCCC, in other words count that particular
> motif. Does anyone have an script to perform that task or something close I
> can change a little bit?.
>
> Thanks for your help in advance
>
> Lobvi Matamoros Fernández, Ph.D
> Post-doctoral fellow
>
> Centre de Recherche du CHUL
> 2705 Boul.
Laurier, T3-80
> Sainte-Foy (Québec)
> G1V 4G2 CANADA
> Tel: 418-6542261
> FAX:418-654-2279
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From brian_osborne at cognia.com Mon Sep 22 13:51:08 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Sep 22 13:53:51 2003
Subject: [NORDNS] [Bioperl-l] Script
In-Reply-To: <4.2.0.58.20030922122731.00a554b8@drs.crchul.ulaval.ca>
Message-ID:

Lobvi,

Bio::Tools::SeqWords does this, but only the non-overlapping occurrences.
You could modify the module and create a new method that found overlapping
occurrences. It's also possible that the motifs you're concerned with can't
overlap.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Lobvi Matamoros
Sent: Monday, September 22, 2003 2:31 PM
To: bioperl-l@bioperl.org
Subject: [NORDNS] [Bioperl-l] Script

Hi to everyone:

I am trying to know how many times a particular amino acid motif occur in a
protein database, for instance NCCC, in other words count that particular
motif. Does anyone have an script to perform that task or something close I
can change a little bit?.

Thanks for your help in advance

Lobvi Matamoros Fernández, Ph.D
Post-doctoral fellow

Centre de Recherche du CHUL
2705 Boul.
Laurier, T3-80
Sainte-Foy (Québec)
G1V 4G2 CANADA
Tel: 418-6542261
FAX:418-654-2279

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From markw at illuminae.com Mon Sep 22 14:23:21 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Mon Sep 22 14:23:21 2003
Subject: [Bioperl-l] Bio::DB::GenBank under mod-perl
Message-ID: <1064255109.2089.224.camel@localhost.localdomain>

Hi all,

Has anyone else experienced a problem using Bio::DB::GenBank under
mod-perl? The symptoms are that after a few hours the server is no
longer able to find the Bio::SeqIO::genbank module (in fact, the entire
bioperl-live folder suddenly disappears from @INC as far as I can tell!)

I've tried hammering on the server to see if I can make it fail, but it
works every time... it is only after a certain amount of time that it
stops working. A server restart solves the problem. My other scripts
(in the same package) all continue working just fine throughout, it is
only the ones that use Bio::DB::GenBank that start to fail.

Any ideas? Is there anything that could obviously time-out in the
sequence retrieval mechanism?

I've put some debugging lines into the script to see if I can watch what
happens when it fails, so I might have more info on the bug(?) later,
but perhaps others have noticed this problem already and have a
solution...

Mark

--
Mark Wilkinson
Illuminae

From proj_mgr2 at yahoo.com Mon Sep 22 21:56:26 2003
From: proj_mgr2 at yahoo.com (E N G I N E E R)
Date: Tue Sep 23 00:56:16 2003
Subject: [Bioperl-l] R e s u m e
Message-ID: <200309230455.h8N4tlMg017313@portal.open-bio.org>

Richard Siek. Tel: (408) 309-7006 proj_mgr4@yahoo.com
POSITION: ENGINEER, DESIGNER, Mfg
1993 - present "Mech-Tronic Design" Santa Clara, CA
Sr.
Mechanical & Design Engineer 10 years of MECHANICAL, Electro-Mechanical, Consumer & Industrial Products & Systems DESIGN from concept to production, Prototyping and Pilot Manufacturing to a low cost fabrication. R&D, Product & Process Engineering & Development, transferring to manufacturing procedures, design & build Fixtures, Tooling & Automation, and Motion systems, Robotics, Machinery & Equipment. CADD automation, Dynamics, balancing, Complex opto-mechanical, electro-optical & micro-electronic systems, assemblies & subsystems incorporating lasers & optics, mechanical packaging, vibration isolation & damping. Dynamics, Kinematics, thermodynamics & heat transfer & FEA modeling & analysis, Military, Metric & ANSI Y-14.5M Geometric Dimensioning & Tolerances, Documentation Standards, Drafting, CADD Operator, Structural Design, analyzing stresses and tolerances. ISO 9000, DFMA implementation, value engineering of existing products and time-to-market project management. Sheet metal, housing, enclosures, plastics, injection molding, extrusion design, evaluating mechanical design concepts, review Drawings, issue ECO / ECN, 3D Solid Modeling, Electro-Mechanical assemblies, EMC / EMI shielding techniques, configurations, layouts, design studies, test planning, process development. Automation equipment and machine control using for machines and robots precision mechanisms motion programming, fluid mechanics, pneumatics systems, mounting and positioning devices, electro-mechanical and vacuum mechanisms, design and analysis of structures, castings, welded frames, mechanical detailing, redesigning mechanical components for manufacturing. Assembly drawings, Customizing Electronic, Mechanical and Electrical Equipment. Evaluate, debug, and resolve complex problems on system and module levels, materials and processes selection & development, innovative mechanical & electrical designs for manufacturability, testability & mass production. 
CAD Management and Operations, METRIC, SOLAR, Physics, Chemistry, Machine Technology & Machine Design. Support scientists & electronic engineers. EDUCATION: 87 Institute for Business & Technology, CA CADD Engineer, Programming, Design, Management 75/78 Electro - Mechanical College, PL Mechanical Engineering, BSME, ASEE, MBA SOFTWARE: Use ACAD 10-20 & LT, Lisp, Script, Solid Works, Pro-E, ZEMAX Design CAD Pro, Nastran, MathCAD, Lotus, dBase, Windows & Net applications: Excel, Word, Outlook, Access, MS Project, DOS, Sun UNIX, MAC, VAX computers, Micro Station 5 & 95 WP, PD, Basic, Quadro, Algor, C, Fortran, Softdesk PRODUCTS: 2.1 GB (93) HARD DISK DRIVE Design & Manufacture Commercial Color Printer, Paper Tension & Heads Motion ICT & PCB Testing Equipment & vacuum tables & valves. IBM 36 GB Hard Disk Drive Test & Assembly Equipment Design, Build & Manufacture. WAFER Rapid Thermo Processing, Chemical Operations, Inspection & Thickness Measurement Equipment for 200 & 300 mm with automatic Door & manual inserting, Scanners, Stages, Sliders, Sensors & Motion Control. Tools for mini-environments and clean room systems. Wafer Handling Cassette opening/rotation LASER MEDICAL DEVICE, Lasers, Optics, Fiber Optics. Collimator lenses automation. Opto-Mechanics, military connector ENVIRONMENTAL ENERGY & STORAGE SYSTEM, Wind & Electro-Solar Installations, Turbo machine. Electric Vehicles and Computerized Transportation System. Rack, enclosures. Fiber Communication & polishing Systems. Network, Security, Electrical Design & Installations; Commercial, Industrial, Residential, Fire Alarms, Smoke Detectors, Lights, Plugs, Panels, Power Supply, Conveyors, Spiral Elevator, Fast Cannery Transportation, Electronic and Electrical Components, Master Control Center, Sheet Metal Oven Rebuilding Project, gas panels, weldments, harness, processing hardware & devices, jigs, assembly tools. 
8 years of MANUFACTURING EXPERIENCE: scheduling & production planning, injection molding, plastic parts and hydraulic & pneumatic equipment, machinery and control systems, mechanisms, robotics & medical device, precision machining. Review Manufacturing Process and Quality Control Inspecting and control standards, selecting materials, supplying BOM & Inventory Systems for Production, Assembly & Operations Management. Participate in ATM and other work team activities. Oversee and approve all applicable instructions, definition, development and qualification of materials & equipment. Directing & Specify operations and order process tooling. Provide training for personnel on correct assembly & systems work & rework techniques, and corrective actions & solutions for product, process & materials related problems. Quality execution & support, systems-level management and technical development. Use lab & shop mechanical, electrical & electronic tools, measurement & testing equipment, torque testers, calipers, micrometers, cameras & microscopes. Project management, statistical analysis, lead design team to meet customer product development milestones. Investigate design, procurement, and assembly process on projects and products. Accuracy of current costs and labor hours. SAFETY Identify and prioritize potential actions for immediate cost reduction. Develop system documentation through engineering cycle, assembly, integration and test. Ensure engineering and manufacturing documentation is completed and released. SKILS: Prepare "Dallas Project" for $400 M Contract Create CAD Department and Library System, Develop Standard for Drawings & transfer with Europe & Asia Repair 1000 drawings in 4 hours. CONSULTING Teach Mechanical Engineering and Descriptive Geometry Computer Graphics, Electro - Mechanical Design, AutoCAD Plotting, DOS, Lectures and practical assignment Software and computer support. Proficient with CADD Systems. 
Electrical Engineering, Industrial Engineering, Manufacturing Engineering, Design and development of proprietary Products, components & assemblies to build or expand your product line. Redesign products to improve performance and costs reduction. Strong communication and project management skills, work closely with internal cross-functional teams, OEM, customers, vendors and suppliers, meeting project deadlines in a fast-paced, customer-specific product development environment. Program Manager, Product Manger, Engineering Manager CADD Manager, Operations & Manufacturing Manager Research & Development Analyst, Idea Technologist Solutions Development, Instructor Product Development Specialist & Consultant Electro-Mechanical & Industrial Designer Motivated, Creative, Independent, Active, Innovative, Team Work PERMANENT preferred, or CONSULTING, US Citizen Open for domestic & international travel (408) 309-7006 proj_mgr4@yahoo.com

From walsh at cenix-bioscience.com Tue Sep 23 03:59:11 2003
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Tue Sep 23 08:18:12 2003
Subject: [Bioperl-l] Script
In-Reply-To: <4.2.0.58.20030922122731.00a554b8@drs.crchul.ulaval.ca>
References: <4.2.0.58.20030922122731.00a554b8@drs.crchul.ulaval.ca>
Message-ID: <3F6FFD4F.2040609@cenix-bioscience.com>

This isn't a pure Bioperl implementation, but it should do the trick:

use Bio::SeqIO;

# assume you have fasta file with your seqs
my $seqio = Bio::SeqIO->new(-file => 'my_file.fasta');
my $count = 0;
while (my $seq = $seqio->next_seq) {
    # copy the sequence into a variable first: the /g match position
    # is remembered per variable, not per method call
    my $str = $seq->seq;
    while ($str =~ /NCCC/g) {
        $count++;
    }
}
print "Found motif $count times\n";

If you need to have 2 or more amino acids possible at 1 position, then
use [] in your regex.

e.g. to match NCCC and NDCC, use /N[CD]CC/g

Maybe someone else out there knows of a Bioperl module that would also
do this.

Cheers,

Andrew

Lobvi Matamoros wrote:

>
>
> Hi to everyone:
>
> I am trying to know how many times a particular amino acid motif occur
> in a protein database, for instance NCCC, in other words count that
> particular motif. Does anyone have an script to perform that task or
> something close I can change a little bit?.
>
> Thanks for your help in advance
>
> Lobvi Matamoros Fernández, Ph.D
> Post-doctoral fellow
>
> Centre de Recherche du CHUL
> 2705 Boul. Laurier, T3-80
> Sainte-Foy (Québec)
> G1V 4G2 CANADA
> Tel: 418-6542261
> FAX:418-654-2279
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Pfotenhauerstr. 108
01307 Dresden, Germany
Tel. +49(351)210-2699
Fax +49(351)210-1309
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From heikki at ebi.ac.uk Tue Sep 23 12:22:21 2003
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Sep 23 12:20:35 2003
Subject: [Bioperl-l] proposed changes to RangeI.pm
In-Reply-To:
References:
Message-ID: <1064334135.2373.63.camel@localhost.dhcp.ebi.ac.uk>

Go ahead, Chris!

-Heikki

On Fri, 2003-09-19 at 18:51, Hilmar Lapp wrote:
> On 9/17/03 6:59 AM, "Chris Mungall" wrote:
>
> >
> > is this a good time to check these changes in?
>
> I think so ... Heikki? -hilmar

--
     ______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
    ___ _/_/_/_/_/________________________________________________________

From wes.barris at csiro.au Wed Sep 24 00:46:03 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Wed Sep 24 00:44:23 2003
Subject: [Bioperl-l] Bio::Seq documentation?
Message-ID: <3F71218B.9070907@csiro.au>

Hi,

According to the bioperl documentation the Bio::SeqIO genbank method
"next_seq" returns a "Bio::Seq" object. Where is the documentation
for a "Bio::Seq" object? In the documentation on the bioperl web site,
if you click on "bioperl-1.2.3::Bio::Seq" I see the following list:

BaseSeqProcessor
EncodedSeq
LargePrimarySeq
LargeSeq
PrimaryQual
PrimedSeq
QualI
RichSeq
RichSeqI
SeqBuilder
SeqFactory
SeqFastaSpeedFactory
SeqWithQuality
SequenceTrace
TraceI

Which one of these do I look under to find out information about what
is returned by "next_seq"?

What I am trying to do is to parse a genbank file and find out the
sequence direction (3 prime or 5 prime). What I have done is this,
but I am trying to find out if there is a better "built-in" way to
do this:

if ($seq->description =~ /3'/) {
    $direction = '3prime';
}
elsif ($seq->description =~ /5'/) {
    $direction = '5prime';
}

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From Marc.Logghe at devgen.com Wed Sep 24 03:53:50 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Sep 24 03:51:55 2003
Subject: [Bioperl-l] Bio::Seq documentation?
Message-ID:

> -----Original Message-----
> From: Wes Barris [mailto:wes.barris@csiro.au]
> Sent: Wednesday, September 24, 2003 6:46 AM
> To: Bioperl Mailing List
> Subject: [Bioperl-l] Bio::Seq documentation?
>
>
> Hi,
>
> According to the bioperl documentation the Bio::SeqIO genbank method
> "next_seq" returns a "Bio::Seq" object. Where is the documentation
> for a "Bio::Seq" object? In the documentation on the bioperl
> web site,
> if you click on "bioperl-1.2.3::Bio::Seq" I see the following list:

You have to click on "bioperl-1.2.3::Bio". Consider it as a directory.
In the Bio directory you should find the Seq.pm module.
Actually, Bio::SeqIO::genbank returns a Bio::Seq::RichSeq object (which
is-a Bio::Seq object).
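
The class relationship described here can be checked from inside a script, which also answers the "which page do I read" question: `ref` names the concrete class of an object, and `isa` confirms which parent class it inherits from. A minimal sketch follows. It assumes stand-in classes, `Demo::Seq` and `Demo::Seq::RichSeq`, which are hypothetical placeholders (not Bioperl modules) so that the code runs without Bioperl installed; `describe_object` is likewise an illustrative helper.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::Seq and Bio::Seq::RichSeq, used only
# so this sketch runs without Bioperl; they are not the real modules.
package Demo::Seq;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub desc {
    my $self = shift;
    return $self->{desc};
}

package Demo::Seq::RichSeq;
our @ISA = ('Demo::Seq');    # RichSeq "is-a" Seq

package main;

# Report the concrete class of an object and whether it inherits
# from the named parent class.
sub describe_object {
    my ($obj, $parent) = @_;
    my $class = ref $obj;
    my $is_a  = $obj->isa($parent) ? 'is-a' : 'is not a';
    return "$class $is_a $parent";
}

my $seq = Demo::Seq::RichSeq->new(desc => "example 5' EST");
print describe_object($seq, 'Demo::Seq'), "\n";
```

With Bioperl itself, calling `ref` on the object returned by `next_seq` should, going by the description above, name Bio::Seq::RichSeq; the documentation for that class and for its parent Bio::Seq would then both apply.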
HTH,
Marc

From brian_osborne at cognia.com Wed Sep 24 08:01:34 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Sep 24 08:03:46 2003
Subject: [NORDNS] [Bioperl-l] Bio::Seq documentation?
In-Reply-To: <3F71218B.9070907@csiro.au>
Message-ID:

Wes,

If you look at the "Docs" page,
http://www.bioperl.org/Core/Latest/modules.html , you'll see the link to the
Complete Module Documentation,
http://doc.bioperl.org/releases/bioperl-1.2.3/. That lower left frame is a
list of all the modules, select "Seq" to get to the Bio::Seq page. One
problem with the software that creates these individual module pages is that
it sometimes sticks a lot of the useful documentation at the bottom of the
page, particularly evident in the case of Seq.pm.

You could also have gotten to this Bio::Seq page from the bptutorial, which
has a brief section on Bio::Seq.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris
Sent: Wednesday, September 24, 2003 12:46 AM
To: Bioperl Mailing List
Subject: [NORDNS] [Bioperl-l] Bio::Seq documentation?

Hi,

According to the bioperl documentation the Bio::SeqIO genbank method
"next_seq" returns a "Bio::Seq" object. Where is the documentation
for a "Bio::Seq" object? In the documentation on the bioperl web site,
if you click on "bioperl-1.2.3::Bio::Seq" I see the following list:

BaseSeqProcessor
EncodedSeq
LargePrimarySeq
LargeSeq
PrimaryQual
PrimedSeq
QualI
RichSeq
RichSeqI
SeqBuilder
SeqFactory
SeqFastaSpeedFactory
SeqWithQuality
SequenceTrace
TraceI

Which one of these do I look under to find out information about what
is returned by "next_seq"?

What I am trying to do is to parse a genbank file and find out the
sequence direction (3 prime or 5 prime).
What I have done is this,
but I am trying to find out if there is a better "built-in" way to
do this:

if ($seq->description =~ /3'/) {
    $direction = '3prime';
}
elsif ($seq->description =~ /5'/) {
    $direction = '5prime';
}

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From lstein at cshl.edu Wed Sep 24 15:02:05 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Sep 24 15:00:11 2003
Subject: [Bioperl-l] Re: [Bug 1526] New: Where in the world is pg.pm?
In-Reply-To: <200309241648.h8OGmMg2022148@portal.open-bio.org>
References: <200309241648.h8OGmMg2022148@portal.open-bio.org>
Message-ID: <200309241502.05231.lstein@cshl.edu>

It doesn't seem to have been added to the 1.2 branch. I will add it now.

Lincoln

On Wednesday 24 September 2003 12:48 pm, bugzilla-daemon@portal.open-bio.org
wrote:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1526
>
> Summary: Where in the world is pg.pm?
> Product: Bioperl
> Version: 1.2 branch
> Platform: All
> OS/Version: All
> Status: NEW
> Severity: normal
> Priority: P2
> Component: Bio::DB::GFF
> AssignedTo: lstein@cshl.org
> ReportedBy: nutello@sweetness.com
>
>
> Module Bio::DB::GFF::Adaptor::dbi::pg seems to exist in CVS and is used
> by scripts/Bio-DB-GFF/pg_bulk_load_gff.pl, but I can't find it in the
> 1.2.2 or 1.2.3 tarballs (haven't checked earlier distributions).
>
>
>
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.

--
========================================================================
Lincoln D.
Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================

From schattner at soe.ucsc.edu Wed Sep 24 15:16:48 2003
From: schattner at soe.ucsc.edu (Peter Schattner)
Date: Wed Sep 24 15:15:24 2003
Subject: [Bioperl-l] Query re installing Bundle::BioPerl packages in a personal directory
Message-ID: <3F71EDA0.9080706@soe.ucsc.edu>

Hi guys

I need to install bioperl on system where I don't have root access. For
installing bioperl itself, the following command worked successfully as
described in INSTALL.

perl Makefile.PL PREFIX=/home/peter/My_Local_Perl_Modules

However installing the bioperl bundle using:

perl -MCPAN -e 'install Bundle::BioPerl'

failed to install because of my lack of su privileges.

Is there some way to pass the equivalent of
"PREFIX=/home/dag/My_Local_Perl_Modules" to the bundle::BioPerl
installation?

Or does one have to install each of the 28 modules separately in
order to install them in a local directory??

Thanks in advance

Peter

From jason at cgt.duhs.duke.edu Wed Sep 24 15:26:25 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Sep 24 15:24:23 2003
Subject: [Bioperl-l] Query re installing Bundle::BioPerl packages in a personal directory
In-Reply-To: <3F71EDA0.9080706@soe.ucsc.edu>
References: <3F71EDA0.9080706@soe.ucsc.edu>
Message-ID:

When you run CPAN for the first time it gives you the option of setting
these parameters.

you want to set makepl_arg to
LIB=/home/peter... PREFIX=/home/peter

cpan> o conf makepl_arg LIB=/home/peter...

You can always re-run the configuration with
cpan> o conf init

http://search.cpan.org/dist/CPAN/lib/CPAN.pm#CONFIGURATION

-jason

On Wed, 24 Sep 2003, Peter Schattner wrote:

> Hi guys
>
> I need to install bioperl on system where I don't have root access. For
> installing bioperl itself, the following command worked successfully as
> described in INSTALL.
>
> perl Makefile.PL PREFIX=/home/peter/My_Local_Perl_Modules
>
> However installing the bioperl bundle using:
>
> perl -MCPAN -e 'install Bundle::BioPerl'
>
> failed to install because of my lack of su privileges.
>
> Is there some way to pass the equivalent of
> "PREFIX=/home/dag/My_Local_Perl_Modules" to the bundle::BioPerl
> installation?
>
> Or does one have to install each of the 28 modules separately in
> order to install them in a local directory??
>
> Thanks in advance
>
> Peter
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From brian_osborne at cognia.com Wed Sep 24 15:48:51 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Sep 24 15:51:21 2003
Subject: [NORDNS] Re: [Bioperl-l] Query re installing Bundle::BioPerl packages in a personal directory
In-Reply-To:
Message-ID:

Peter!

I hope this letter finds you well.

Just repeating what Jason's saying here. Go into the CPAN shell:

>perl -MCPAN -e shell

Then do:

cpan>o conf makepl_arg PREFIX=/home/peters

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich
Sent: Wednesday, September 24, 2003 3:26 PM
To: Peter Schattner
Cc: bioperl-l@bioperl.org
Subject: [NORDNS] Re: [Bioperl-l] Query re installing Bundle::BioPerl packages in a personal directory

When you run CPAN for the first time it gives you the option of setting
these parameters.

you want to set makepl_arg to
LIB=/home/peter... PREFIX=/home/peter

cpan> o conf makepl_arg LIB=/home/peter...

You can always re-run the configuration with
cpan> o conf init

http://search.cpan.org/dist/CPAN/lib/CPAN.pm#CONFIGURATION

-jason

On Wed, 24 Sep 2003, Peter Schattner wrote:

> Hi guys
>
> I need to install bioperl on system where I don't have root access.
For
> installing bioperl itself, the following command worked successfully as
> described in INSTALL.
>
> perl Makefile.PL PREFIX=/home/peter/My_Local_Perl_Modules
>
> However installing the bioperl bundle using:
>
> perl -MCPAN -e 'install Bundle::BioPerl'
>
> failed to install because of my lack of su privileges.
>
> Is there some way to pass the equivalent of
> "PREFIX=/home/dag/My_Local_Perl_Modules" to the bundle::BioPerl
> installation?
>
> Or does one have to install each of the 28 modules separately in
> order to install them in a local directory??
>
> Thanks in advance
>
> Peter
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From Bernhard.Schmalhofer at biomax.de Thu Sep 25 04:29:58 2003
From: Bernhard.Schmalhofer at biomax.de (Bernhard Schmalhofer)
Date: Thu Sep 25 04:28:02 2003
Subject: [Bioperl-l] Is Bioperl 1.2.3 in CPAN?
In-Reply-To:
References:
Message-ID: <3F72A786.10205@biomax.de>

Jason Stajich wrote:

> The release is available as always from
> http://bioperl.org/DIST/
> http://bioperl.org/DIST/bioperl-1.2.3.tar.bz2
> http://bioperl.org/DIST/bioperl-1.2.3.tar.gz
> http://bioperl.org/DIST/bioperl-1.2.3.zip

Hello,

I wanted to install Bioperl 1.2.3 with CPAN.pm, but found that there is
only bioperl-1.2.2 in http://search.cpan.org/~birney/.
Is it still propagating?

CU, Bernhard

--
*************************************************
Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str.
11 82152 Martinsried, Germany
Tel: +49 89 89 55 74 - 839
Fax: +49 89 89 55 74 - 25
PGP: https://ssl.biomax.de/pgp/
Email: mailto:Bernhard.Schmalhofer@biomax.de
Web: http://www.biomax.de
*************************************************

From apark at dyax.com Thu Sep 25 15:35:09 2003
From: apark at dyax.com (Al Park)
Date: Thu Sep 25 15:33:15 2003
Subject: [Bioperl-l] SOAP::Lite and bioperl
Message-ID:

Hello everyone,

I have been stumped for the past two days trying to figure out why I
can't get the following code to work properly. I get an error telling
me that it "Can't call method "run" on undefined value". Any help or
suggestions for what I'm doing wrong?

Thanks in advance.

Al

######Soap::lite server code
#!/usr/bin/perl

use lib '/var/www/MODs';
use SOAP::Transport::HTTP;
use testpms;

my $server = SOAP::Transport::HTTP::CGI
    ->on_action(sub{return})
    ->dispatch_to('testpms::runTool');
$server->serializer->maptype->{SOAPStruct} = 'xsd1';
$server->handle;

######testpms.pm code
package testpms;
use strict;
use lib '/var/www/MODs';
use SoapExSeq;
use EmbossTools;

sub runTool{
    my ($class,$program,$ids,$chain) = @_;
    my ($dbh,$rc) = SoapExSeq::connectDB();
    my $file = SoapExSeq::getSeq($dbh,$rc,$ids,$chain);
    SoapExSeq::disconnectDB($dbh,$rc);
    my $emboss = new EmbossTools;
    my $hashRef = $emboss->eTools($program,$file);
    my @val;
    my %hash = %$hashRef;
    foreach my $x (keys %hash){
        if($x =~ /^\d+/){
            my $value = $hash{$x};
            my $item = SOAP::Data
                ->type("SOAPStruct")
                ->value({"isolate" => $x, "value" => $value});
            push(@val, $item);
        }
    }
    return SOAP::Data->name("results" => \@val);
}
1;

#######EmbossTools.pm code
package EmbossTools;
use strict;
use Bio::Factory::EMBOSS;

BEGIN {
    $ENV{EMBOSSDIR} = '/usr/local/bin/emboss/';
}

sub new{
    my $class = shift;
    my $self = {};
    bless ($self, $class);
    return $self;
}

sub eTools{
    my $self = shift;
    my ($program,$seq_to_test) = @_;
    my (%hash);
    my $outfile = sprintf "/var/www/tmp/out.%s.$program", time;
    my $factory = new Bio::Factory::EMBOSS;
    my $application = $factory->program($program);
    if($program eq "iep"){ #handles fasta file
        $application->run({ '-sequencea' => $seq_to_test,
                            '-graph'     => 'none',
                            '-outfile'   => $outfile});
    }
    my @all;
    if($outfile){
        if($outfile =~ /iep/){
            my($isolate,$iep_value);
            open(FH, "$outfile");
            while(<FH>){
                chomp;
                if($_ =~ /IEP/){
                    ($isolate = $_) =~ s/\w+\s\w+\s(.+)\sfrom.+$/$1/g;
                }
                if($_ =~ /Isoelectric/){
                    ($iep_value = $_) =~ s/\w+\s\w+\s\=\s(.+)/$1/g;
                    $hash{$isolate} = $iep_value;
                    push(@all, $isolate, $iep_value);
                }
            }
            close FH;
        }
    }
    return (\%hash);
}
1;

From apark at dyax.com Thu Sep 25 15:42:03 2003
From: apark at dyax.com (Al Park)
Date: Thu Sep 25 15:40:03 2003
Subject: [Bioperl-l] SOAP::Lite and bioperl
Message-ID:

I forgot to indicate that $program is "iep", so the if statement
($program eq "iep") would be executed properly if an object existed in
$application.

Al

-----Original Message-----
From: Al Park
Sent: Thursday, September 25, 2003 3:35 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] SOAP::Lite and bioperl

Hello everyone,

I have been stumped for the past two days trying to figure out why I
can't get the following code to work properly. I get an error telling
me that it "Can't call method "run" on undefined value". Any help or
suggestions for what I'm doing wrong?

Thanks in advance.

Al

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From djoubert at mail.mcg.edu Thu Sep 25 16:56:53 2003
From: djoubert at mail.mcg.edu (Douglas Joubert)
Date: Thu Sep 25 16:59:36 2003
Subject: [Bioperl-l] BioPerl 1.2.3
Message-ID:

Howdy,

When I run

ppm: repository bioperl http://bioperl.org/DIST
ppm: search bioperl

I only see bioperl version 1.2.1 and bioinformatics toolkit 1.2.2...
am I missing something?

Cheers

P.S. I am using ActiveState 5.8.0

DJJ

Douglas Joubert, M.L.I.S.
Instructor and Digital Information Librarian
Robert B. Greenblatt M.D. Library
Medical College of Georgia
Augusta, GA 30912-4400

From nk_white at yahoo.com Thu Sep 25 20:58:59 2003
From: nk_white at yahoo.com (Neill White)
Date: Thu Sep 25 20:56:59 2003
Subject: [Bioperl-l] Entrez interface for NT contigs
Message-ID: <20030926005859.55387.qmail@web11005.mail.yahoo.com>

Hello,

My problem is that I'm unable to retrieve the NT contig sequences
through the Entrez interface. The error I get is:

-------------------- WARNING ---------------------
MSG: CONTIG found. GenBank get_Stream_by_acc about to run.
---------------------------------------------------
Warning: unable to close filehandle FETCH properly.

------------- EXCEPTION -------------
MSG: in subseq, start [111855] has to be greater than end [106558]
STACK Bio::PrimarySeq::subseq /usr/lib/perl5/site_perl/5.6.0/Bio/PrimarySeq.pm:358
STACK Bio::Seq::subseq /usr/lib/perl5/site_perl/5.6.0/Bio/Seq.pm:635
STACK Bio::DB::NCBIHelper::postprocess_data /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm:329
STACK Bio::DB::WebDBSeqI::_stream_request /usr/lib/perl5/site_perl/5.6.0/Bio/DB/WebDBSeqI.pm:687
STACK Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.6.0/Bio/DB/WebDBSeqI.pm:434
STACK Bio::DB::NCBIHelper::get_Stream_by_query /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm:249
STACK toplevel ./dump_gene_annotation.pl:32

and the offending code is:

24 my $gb = new Bio::DB::GenBank;
25
26 # get a stream via a query string
27 my $query_string = $gene_name . '[gene] AND Human[Organism] AND contig';
28 my $query = Bio::DB::Query::GenBank->new
29     (-query => $query_string,
30      -db    => 'nucleotide');
31
32 my $seqio = $gb->get_Stream_by_query($query);

The documentation recommends, for contigs, putting a
$gb->request_format('fasta') before making the query, but it's really
not the sequence I want - it's the annotation.

Any help would be much appreciated.

-neill

From brian_osborne at cognia.com Fri Sep 26 10:42:27 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Sep 26 10:47:51 2003
Subject: [NORDNS] [Bioperl-l] Is Bioperl 1.2.3 in CPAN?
In-Reply-To: <3F72A786.10205@biomax.de>
Message-ID:

Bernhard,

Yes, I think when Jason announced 1.2.3 he also suggested that it may
take a while for this version to appear in CPAN.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Bernhard
Schmalhofer
Sent: Thursday, September 25, 2003 4:30 AM
To: bioperl-l@bioperl.org
Subject: [NORDNS] [Bioperl-l] Is Bioperl 1.2.3 in CPAN?

Jason Stajich wrote:

> The release is available as always from
> http://bioperl.org/DIST/
> http://bioperl.org/DIST/bioperl-1.2.3.tar.bz2
> http://bioperl.org/DIST/bioperl-1.2.3.tar.gz
> http://bioperl.org/DIST/bioperl-1.2.3.zip

Hello,

I wanted to install Bioperl 1.2.3 with CPAN.pm, but found that there is
only bioperl-1.2.2 in http://search.cpan.org/~birney/. Is it still
propagating?

CU, Bernhard

--
*************************************************
Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 89 55 74 - 839
Fax: +49 89 89 55 74 - 25
PGP: https://ssl.biomax.de/pgp/
Email: mailto:Bernhard.Schmalhofer@biomax.de
Web: http://www.biomax.de
*************************************************

From lstein at cshl.edu Fri Sep 26 11:44:22 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Sep 26 11:42:27 2003
Subject: [Bioperl-l] Bio::Tools::GFF GFF3 parsing
In-Reply-To:
References:
Message-ID: <200309261144.22482.lstein@cshl.edu>

Hi Jason,

This is very good and I'm looking it over now. Does it handle the CIGAR
lines for gapped alignments?

Lincoln

On Thursday 18 September 2003 07:02 pm, Jason Stajich wrote:
> I added support for GFF3 parsing to Bio::Tools::GFF and added some simple
> tests. I'm not 100% I have GFF3 output correct so Chris Mungall, Lincoln
> if you don't mind giving it a look over that would be great. If I've
> duplicated functionality from somewhere else let me know, but I think
> Bio::Tools::GFF needs to be able to parse in GFF3 format at some point.
>
> The GFF3 parsing seems to work fine for processing the BGDP annotations so
> I feel confident it is working correctly, but more testing is welcomed!
>
> There is no real support for Unicode type output; the simplest solution
> would be to rely on HTML::Entities for encoding non-ASCII codes. I wasn't
> sure I wanted to make Tools::GFF depend on this right now so I've just
> implemented a simple encoding for '=,;'. Feel free to fix this if it
> needs to be more aggressive.
>
> -jason

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================

From jason at cgt.duhs.duke.edu Fri Sep 26 12:29:32 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Sep 26 12:27:51 2003
Subject: [Bioperl-l] Bio::Tools::GFF GFF3 parsing
In-Reply-To: <200309261144.22482.lstein@cshl.edu>
References: <200309261144.22482.lstein@cshl.edu>
Message-ID:

heh - not yet.

I need to see whether we can use any of the existing cigar parsing
capabilities in Bioperl - either way I'd like to make this part a
Factory outside of Tools::GFF. Pass in a cigar string and get out the
appropriate feature representation (SimilarityPair or Search::HSP), or
pass in a SimilarityPair and get out the appropriate cigar string.

-jason

On Fri, 26 Sep 2003, Lincoln Stein wrote:

> Hi Jason,
>
> This is very good and I'm looking it over now. Does it handle the CIGAR lines
> for gapped alignments?
>
> Lincoln
>
> On Thursday 18 September 2003 07:02 pm, Jason Stajich wrote:
> > I added support for GFF3 parsing to Bio::Tools::GFF and added some simple
> > tests. I'm not 100% I have GFF3 output correct so Chris Mungall, Lincoln
> > if you don't mind giving it a look over that would be great. If I've
> > duplicated functionality from somewhere else let me know, but I think
> > Bio::Tools::GFF needs to be able to parse in GFF3 format at some point.
> >
> > The GFF3 parsing seems to work fine for processing the BGDP annotations so
> > I feel confident it is working correctly, but more testing is welcomed!
> >
> > There is no real support for Unicode type output; the simplest solution
> > would be to rely on HTML::Entities for encoding non-ASCII codes. I wasn't
> > sure I wanted to make Tools::GFF depend on this right now so I've just
> > implemented a simple encoding for '=,;'. Feel free to fix this if it
> > needs to be more aggressive.
> >
> > -jason
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From hazards at musc.edu Fri Sep 26 14:29:57 2003
From: hazards at musc.edu (Starr Hazard)
Date: Fri Sep 26 14:25:52 2003
Subject: [Bioperl-l] matching miRNAs to one or a lot of mRNAs
Message-ID: <16516968.1064586597@22gdellstarr.library.musc.edu>

Folks,

In a recent paper, Kawasacki et al(pubmed 12808467) report on the
interaction between a specific miRNA (human miRNA23 g.i. 17646028) and
a specific mRNA (human HES1 g.i. 8400709). They suggest they did a
BLAST search and ultimately located the interaction. I cannot
duplicate their data mining and cannot find the association they
describe.

In general, is there a way to take a library of miRNAs and evaluate
their potential interaction with a particular mRNA? Or is there a data
mining tool that could screen a large pool of mRNAs for
significant interactions with a pool miRNAs?

I cannot at present see any BioPerl tools that address this issue
(right now that means I scanned the FAQ for the string RNA and
searched the BioPerl site for RNA but found only some traffic about
Seq.pm).The people I have asked seem divided about whether this is
text matching issue or more of a hybridization issue involving
an energy of interaction evaluation.

Anybody got any pointers to offer?

Starr

From brian_osborne at cognia.com Fri Sep 26 15:29:47 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Sep 26 15:32:13 2003
Subject: [NORDNS] [Bioperl-l] matching miRNAs to one or a lot of mRNAs
In-Reply-To: <16516968.1064586597@22gdellstarr.library.musc.edu>
Message-ID:

Starr,

Bioperl has some good tools for the blastn searches that you're
suggesting. So if you'd like to explore the "text matching" approach
then Bioperl could facilitate this, yes.

Bioperl can also run EMBOSS programs like dan, which calculates or
predicts the Tm for a given DNA or RNA. The only fly in the ointment may
be that you cannot specify a specific pair of RNAs in dan; you seem only
to be able to specify % mismatch given one RNA duplex, and perhaps this
isn't exact enough. Go to www.emboss.org to learn more about the dan
program.

So the idea might be to find related sequences using blastn, find their
percent mismatch, then use dan.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Starr Hazard
Sent: Friday, September 26, 2003 2:30 PM
To: bioperl-l@bioperl.org
Subject: [NORDNS] [Bioperl-l] matching miRNAs to one or a lot of mRNAs

Folks,

In a recent paper, Kawasacki et al(pubmed 12808467) report on the
interaction between a specific miRNA (human miRNA23 g.i. 17646028) and
a specific mRNA (human HES1 g.i. 8400709). They suggest they did a
BLAST search and ultimately located the interaction. I cannot
duplicate their data mining and cannot find the association they
describe.

In general, is there a way to take a library of miRNAs and evaluate
their potential interaction with a particular mRNA? Or is there a data
mining tool that could screen a large pool of mRNAs for
significant interactions with a pool miRNAs?

I cannot at present see any BioPerl tools that address this issue
(right now that means I scanned the FAQ for the string RNA and
searched the BioPerl site for RNA but found only some traffic about
Seq.pm).The people I have asked seem divided about whether this is
text matching issue or more of a hybridization issue involving
an energy of interaction evaluation.

Anybody got any pointers to offer?

Starr

From ik1 at sanger.ac.uk Sat Sep 27 06:45:44 2003
From: ik1 at sanger.ac.uk (Ian Korf)
Date: Sat Sep 27 06:43:46 2003
Subject: [Bioperl-l] matching miRNAs to one or a lot of mRNAs
In-Reply-To: <16516968.1064586597@22gdellstarr.library.musc.edu>
Message-ID:

The human HES1 8400709 is not the sequence from the paper I don't
think. If you align the sequence in figure 1a against 8400709, you'll
find they don't match. There are other HES1 sequences in GenBank
though, for example, 1655593, that contain the sequence in the figure.
But if you try aligning the miRNA to 1655593 with NCBI-BLAST, you won't
find anything.

If you do a S-W alignment (match +1, mismatch -1, gap -2) of the miRNA
complement against 1655593 you get the following, which is the same
alignment reported in the paper.

Stats: score=12
Alignment: Q:855..874 S:1..21 17/3 1,0
Q: TGGAACTCACTGG-AAAGTGA
   ||||| || |||| || ||||
S: TGGAAATCCCTGGAAATGTGA

You'll note that the largest ungapped alignment is 5nt. The authors
did not say they used BLAST, only that they searched GenBank. 5nt is
too short for NCBI-BLAST, which has a minimum word size of 7. WU-BLAST
has no limit of word size, and you can find the alignment with
WU-BLAST. Same scoring system as above used here but note that E2 had
to be raised to at least 11 or the alignment would get pruned before
being subjected to gapped statistics.
Here it is:

  Score = 12 (17.3 bits), Expect = 0.037, P = 0.037
  Identities = 17/21 (80%), Positives = 17/21 (80%), Strand = Plus / Plus

Query:   855 TGGAACTCACTGGAAA-GTGA 874
             ||||| || |||| || ||||
Sbjct:     1 TGGAAATCCCTGGCAATGTGA 21

If you make a habit of such searches, don't be surprised if you run
into a lot of false-positives. I think you might want to use additional
criteria such as overlapping the stop or located in the 3'UTR. I'm not
aware of any software specifically designed for such searches, but
perhaps the authors of the paper have one. The paper was very brief and
had no description of the bioinformatics in the methods section (if I
was one of the referees, I would have found this unacceptable). I
suggest you contact the authors and find out specifically what they did.

-Ian

On Friday, September 26, 2003, at 07:29 PM, Starr Hazard wrote:

> Folks,
>
> In a recent paper, Kawasacki et al(pubmed 12808467) report on the
> interaction between a specific miRNA (human miRNA23 g.i. 17646028) and
> a specific mRNA (human HES1 g.i. 8400709). They suggest they did a
> BLAST search and ultimately located the interaction. I cannot
> duplicate their data mining and cannot find the association they
> describe.
>
> In general, is there a way to take a library of miRNAs and evaluate
> their potential interaction with a particular mRNA? Or is there a data
> mining tool that could screen a large pool of mRNAs for
> significant interactions with a pool miRNAs?
>
> I cannot at present see any BioPerl tools that address this issue
> (right now that means I scanned the FAQ for the string RNA and
> searched the BioPerl site for RNA but found only some traffic about
> Seq.pm).The people I have asked seem divided about whether this is
> text matching issue or more of a hybridization issue involving
> an energy of interaction evaluation.
>
> Anybody got any pointers to offer?
>
> Starr
>

From stoilov at ucla.edu Sat Sep 27 17:17:16 2003
From: stoilov at ucla.edu (Peter Stoilov)
Date: Sat Sep 27 17:26:43 2003
Subject: [Bioperl-l] matching miRNAs to one or a lot of mRNAs
In-Reply-To:
References:
Message-ID: <200309271417.16936.stoilov@ucla.edu>

Hi,

they indeed mixed up the transcripts, but to me it looks like an honest
mistake. There seems to be another unrelated transcript with the same
name (HES1). The accession for this transcript is NM_004649. All of the
experiments in the paper with the exception of the ELISA are done on the
wrong gene (ugly). The funny stuff doesn't end here. Using FASTA search
I was able to find 1 mi-RNA (hsa-miR-221) that matches the transcript
(NM_004649) at exactly the same spot much better than hsa-miR-23(b).

hsa-miR-221 vs Homo sapiens chromosome 21 open reading frame 33 (C21orf33), mRNA
Matches 19 of 23
  23 CUUUGGGUCGUCUGUUACAU-CG 2
     :.::...: ..:: :.:: : :.
 873 GGAACUCACUGGAAAGUG-ACGC 894

As for the real HES1, hsa-miR-205 and hsa-miR-221 are much better
compared to hsa-miR-23b.

hsa-miR-205 vs Homo sapiens hairy and enhancer of split 1, (Drosophila) (HES1), mRNA
Matches 18 of 22
  21 UCUGAGGCCACCUU-ACUUCCU 1
     ::..  .:: :::: :::.::.
1062 AGGCCGUGGCGGAACUGAGGGG 1083

hsa-miR-221 vs Homo sapiens hairy and enhancer of split 1, (Drosophila) (HES1), mRNA
Matches 21 of 23
  23 CUUUGG-GUCGUC-UG-UUACAUCGA 1
     ::.... ..:..: :. .: : .:.:
1061 GAGGCCGUGGCGGAACUGAGGGGGCU 1086

I'll write to the authors to see what they think about this.

Now about searching for mi-RNA targets. The smallest word size that I
can use in BLAST is 7 for nucleic acid (thanks for the WU-BLAST idea!).
So I had to go with FASTA. Now FASTA reports only one hit (should I say
HSP?) per sequence.
The way I go around this is to generate multiple sequences for each
transcript in which the transcript except for 35 nt is masked with Ns.
The unmasked regions are tiled with 5nt step (30nt overlap). The problem
with this is that the database size gets completely out of hand and will
not fit my hard drive;). Searching the database takes forever. But when
I do it for individual transcripts it works pretty well.

Peter

On Saturday 27 September 2003 03:45, Ian Korf wrote:
> The human HES1 8400709 is not the sequence from the paper I don't
> think. If you align the sequence in figure 1a against 8400709, you'll
> find they don't match. There are other HES1 sequences in GenBank
> though, for example, 1655593, that contain the sequence in the figure.
> But if you try aligning the miRNA to 1655593 with NCBI-BLAST, you won't
> find anything.
>
> If you do a S-W alignment (match +1, mismatch -1, gap -2) of the miRNA
> complement against 1655593 you get the following, which is the same
> alignment reported in the paper.
>
> Stats: score=12
> Alignment: Q:855..874 S:1..21 17/3 1,0
> Q: TGGAACTCACTGG-AAAGTGA
>    ||||| || |||| || ||||
> S: TGGAAATCCCTGGAAATGTGA
>
> You'll note that the largest ungapped alignment is 5nt. The authors
> did not say they used BLAST, only that they searched GenBank. 5nt is
> too short for NCBI-BLAST, which has a minimum word size of 7. WU-BLAST
> has no limit of word size, and you can find the alignment with
> WU-BLAST. Same scoring system as above used here but note that E2 had
> to be raised to at least 11 or the alignment would get pruned before
> being subjected to gapped statistics. Here it is:
>
>   Score = 12 (17.3 bits), Expect = 0.037, P = 0.037
>   Identities = 17/21 (80%), Positives = 17/21 (80%), Strand = Plus / Plus
>
> Query:   855 TGGAACTCACTGGAAA-GTGA 874
>              ||||| || |||| || ||||
> Sbjct:     1 TGGAAATCCCTGGCAATGTGA 21
>
> If you make a habit of such searches, don't be surprised if you run
> into a lot of false-positives. I think you might want to use additional
> criteria such as overlapping the stop or located in the 3'UTR. I'm not
> aware of any software specifically designed for such searches, but
> perhaps the authors of the paper have one. The paper was very brief and
> had no description of the bioinformatics in the methods section (if I
> was one of the referees, I would have found this unacceptable). I
> suggest you contact the authors and find out specifically what they did.
>
> -Ian
>
> On Friday, September 26, 2003, at 07:29 PM, Starr Hazard wrote:
> > Folks,
> >
> > In a recent paper, Kawasacki et al(pubmed 12808467) report on the
> > interaction between a specific miRNA (human miRNA23 g.i. 17646028) and
> > a specific mRNA (human HES1 g.i. 8400709). They suggest they did a
> > BLAST search and ultimately located the interaction. I cannot
> > duplicate their data mining and cannot find the association they
> > describe.
> >
> > In general, is there a way to take a library of miRNAs and evaluate
> > their potential interaction with a particular mRNA? Or is there a data
> > mining tool that could screen a large pool of mRNAs for
> > significant interactions with a pool miRNAs?
> >
> > I cannot at present see any BioPerl tools that address this issue
> > (right now that means I scanned the FAQ for the string RNA and
> > searched the BioPerl site for RNA but found only some traffic about
> > Seq.pm).The people I have asked seem divided about whether this is
> > text matching issue or more of a hybridization issue involving
> > an energy of interaction evaluation.
> >
> > Anybody got any pointers to offer?
> >
> > Starr

From ik1 at sanger.ac.uk Sun Sep 28 02:28:39 2003
From: ik1 at sanger.ac.uk (Ian Korf)
Date: Sun Sep 28 02:26:39 2003
Subject: [Bioperl-l] matching miRNAs to one or a lot of mRNAs
In-Reply-To: <200309271417.16936.stoilov@ucla.edu>
Message-ID:

Some more reasons to use WU-BLAST:

(1) You can use a nucleotide scoring matrix rather than simple
match-mismatch values. The main reason for doing this is to make GC and
GU both positive scoring. You might also make the various match and
mismatch values a little different based on observed properties of
matches and mismatches in complementary RNAs.

(2) You get more control over gap costs. In NCBI-BLAST the gap
initiation cost must always be greater than 0. So you can't do a
uniform -1 or -2 for gaps, it must always be -2 for the first and -1
for the rest (or some such variant). You'd have to study RNA alignments
a bit to determine if this is really an advantage or not.

(3) At shorter word lengths, WU-BLAST switches its extension rules and
no longer uses random values for ambiguities. Probably not a big deal
unless you're mining ESTs or you have known RNA modifications like
inosine. You would, of course, want to modify the scoring matrix if you
had inosines in your sequence.

(4) As you pointed out, unlike FASTA or SW, BLAST finds multiple
high-scoring pairs rather than a single maximum scoring pair.

Some things to watch out for:

(a) The default setting for E2 may be too low and must be raised (e.g.
100). You might also want to raise E if your database is large. Make
sure the statistics aren't getting in the way of finding your true
positives.
(b) It might take a long time to do the search. I'd recommend using
hitdist=20 and W=4. W=5 seems a bit of a stretch to me, but W=4 on its own
might be too sensitive. Requiring the second hit is a good idea, and lots
of RNA structures have short symmetric bubbles, so this seems like it
would work.

(c) You don't want Sum statistics because you're really looking for
complete matches. Use the -kap option to turn off combined stats. This
will save some compute time as well.

(d) Don't forget to pick up a copy of the O'Reilly BLAST book which has
quite a bit of info on BLAST in general (sorry for the shameless plug).

-Ian

On Saturday, September 27, 2003, at 10:17 PM, Peter Stoilov wrote:

> Hi,
>
> they indeed mixed up the transcripts, but to me it looks like an honest
> mistake. There seems to be another unrelated transcript with the same
> name (HES1). The accession for this transcript is NM_004649. All of the
> experiments in the paper with the exception of the ELISA are done on the
> wrong gene (ugly). The funny stuff doesn't end here. Using FASTA search
> I was able to find 1 mi-RNA (hsa-miR-221) that matches the transcript
> (NM_004649) at exactly the same spot much better than hsa-miR-23(b).
>
> hsa-miR-221 vs Homo sapiens chromosome 21 open reading frame 33
> (C21orf33), mRNA
> Matches 19 of 23
> 23 CUUUGGGUCGUCUGUUACAU-CG 2
> :.::...: ..:: :.:: : :.
> 873 GGAACUCACUGGAAAGUG-ACGC 894
>
> As for the real HES1, hsa-miR-205 and hsa-miR-221 are much better
> compared to hsa-miR-23b.
>
> hsa-miR-205 vs Homo sapiens hairy and enhancer of split 1, (Drosophila)
> (HES1), mRNA
> Matches 18 of 22
> 21 UCUGAGGCCACCUU-ACUUCCU 1
> ::.. .:: :::: :::.::.
> 1062 AGGCCGUGGCGGAACUGAGGGG 1083
>
> hsa-miR-221 vs Homo sapiens hairy and enhancer of split 1, (Drosophila)
> (HES1), mRNA
> Matches 21 of 23
> 23 CUUUGG-GUCGUC-UG-UUACAUCGA 1
> ::.... ..:..: :. .: : .:.:
> 1061 GAGGCCGUGGCGGAACUGAGGGGGCU 1086
>
> I'll write to the authors to see what they think about this.
>
> Now about searching for mi-RNA targets. The smallest word size that I
> can use in BLAST is 7 for nucleic acid (thanks for the WU-BLAST idea!).
> So I had to go with FASTA. Now FASTA reports only one hit (should I say
> HSP?) per sequence. The way I go around this is to generate multiple
> sequences for each transcript in which the transcript except for 35 nt
> is masked with Ns. The unmasked regions are tiled with 5nt step (30nt
> overlap). The problem with this is that the database size gets
> completely out of hand and will not fit my hard drive;). Searching the
> database takes forever. But when I do it for individual transcripts it
> works pretty well.
>
> Peter
>
> On Saturday 27 September 2003 03:45, Ian Korf wrote:
>> The human HES1 8400709 is not the sequence from the paper I don't
>> think. If you align the sequence in figure 1a against 8400709, you'll
>> find they don't match. There are other HES1 sequences in GenBank
>> though, for example, 1655593, that contain the sequence in the figure.
>> But if you try aligning the miRNA to 1655593 with NCBI-BLAST, you
>> won't find anything.
>>
>> If you do a S-W alignment (match +1, mismatch -1, gap -2) of the miRNA
>> complement against 1655593 you get the following, which is the same
>> alignment reported in the paper.
>>
>> Stats: score=12
>> Alignment: Q:855..874 S:1..21 17/3 1,0
>> Q: TGGAACTCACTGG-AAAGTGA
>>
>> S: TGGAAATCCCTGGAAATGTGA
>>
>> You'll note that the largest ungapped alignment is 5nt. The authors
>> did not say they used BLAST, only that they searched GenBank. 5nt is
>> too short for NCBI-BLAST, which has a minimum word size of 7. WU-BLAST
>> has no limit of word size, and you can find the alignment with
>> WU-BLAST.
>> Same scoring system as above used here but note that E2 had
>> to be raised to at least 11 or the alignment would get pruned before
>> subjected to gapped statistics. Here it is:
>>
>> Score = 12 (17.3 bits), Expect = 0.037, P = 0.037
>> Identities = 17/21 (80%), Positives = 17/21 (80%), Strand = Plus / Plus
>>
>> Query: 855 TGGAACTCACTGGAAA-GTGA 874
>>
>> Sbjct: 1 TGGAAATCCCTGGCAATGTGA 21
>>
>> If you make a habit of such searches, don't be surprised if you run
>> into a lot of false-positives. I think you might want to use additional
>> criteria such as overlapping the stop or located in the 3'UTR. I'm not
>> aware of any software specifically designed for such searches, but
>> perhaps the authors of the paper have one. The paper was very brief and
>> had no description of the bioinformatics in the methods section (if I
>> was one of the referees, I would have found this unacceptable). I
>> suggest you contact the authors and find out specifically what they
>> did.
>>
>> -Ian
>>
>> On Friday, September 26, 2003, at 07:29 PM, Starr Hazard wrote:
>>> Folks,
>>>
>>> In a recent paper, Kawasaki et al. (pubmed 12808467) report on the
>>> interaction between a specific miRNA (human miRNA23 g.i. 17646028) and
>>> a specific mRNA (human HES1 g.i. 8400709). They suggest they did a
>>> BLAST search and ultimately located the interaction. I cannot
>>> duplicate their data mining and cannot find the association they
>>> describe.
>>>
>>> In general, is there a way to take a library of miRNAs and evaluate
>>> their potential interaction with a particular mRNA? Or is there a data
>>> mining tool that could screen a large pool of mRNAs for
>>> significant interactions with a pool of miRNAs?
>>>
>>> I cannot at present see any BioPerl tools that address this issue
>>> (right now that means I scanned the FAQ for the string RNA and
>>> searched the BioPerl site for RNA but found only some traffic about
>>> Seq.pm). The people I have asked seem divided about whether this is
>>> a text matching issue or more of a hybridization issue involving
>>> an energy of interaction evaluation.
>>>
>>> Anybody got any pointers to offer?
>>>
>>> Starr
From skirov at utk.edu Sun Sep 28 14:12:54 2003
From: skirov at utk.edu (Stefan Kirov)
Date: Sun Sep 28 14:10:59 2003
Subject: [Bioperl-l] matching miRNAs to one or a lot of mRNAs
In-Reply-To: <200309271417.16936.stoilov@ucla.edu>
References: <200309271417.16936.stoilov@ucla.edu>
Message-ID: <3F7724A6.70404@utk.edu>

Just a small clarification here, guys: BioPerl is not a collection of
tools, though there are some. It provides you with the means to write your
own tools, or to integrate different tools to fit your needs.

Pepi, have you looked at sim4? Maybe it could also solve your problem. And
why don't you do the masking on the fly, instead of creating a database?
It might be slower (not sure about that, since you'll skip some IO ops),
but your disk won't fill up.

Stefan

Peter Stoilov wrote:

>Now about searching for mi-RNA targets. The smallest word size that I can use
>in BLAST is 7 for nucleic acid (thanks for the WU-BLAST idea!). So I had to
>go with FASTA. Now FASTA reports only one hit (should I say HSP?) per
>sequence. The way I go around this is to generate multiple sequences for
>each transcript in which the transcript except for 35 nt is masked with Ns.
>The unmasked regions are tiled with 5nt step (30nt overlap). The problem
>with this is that the database size gets completely out of hand and will not
>fit my hard drive;). Searching the database takes forever. But when I do it
>for individual transcripts it works pretty well.

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge TN 37830-8026 USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov@utk.edu sao@ornl.gov

From MEC at Stowers-Institute.org Mon Sep 29 13:48:09 2003
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Mon Sep 29 13:46:09 2003
Subject: [Bioperl-l] script installation
Message-ID:

Fellow BioPerlers,

I just installed v 1.2.3 and cannot find any mention of how scripts are,
or are not, installed. Am I missing something?

My best effort at finding a decision on what was going to be done found
this thread:

http://portal.open-bio.org/mailman/htdig/bioperl-l/2003-March/011573.html

It begins as follows, but never seemed to resolve.

Any help appreciated!

Thanks,

Malcolm

Excerpt from thread:

The README in scripts/utilities says:

This directory is for robust scripts which have documentation, cmdline
arguments, and can be used in a production environment.
Their extensions will be renamed .pl and will be installed in the
SCRIPT_INSTALL directory as defined in the Makefile.PL configuration.

We should probably go ahead and implement this, or temporarily remove this
README. If you'd like to proceed with the former I can make sure that the
script installation is documented in the appropriate places, and I'll also
go ahead and test all the scripts in scripts/ as best as I can, as
promised. Your thoughts?

From brian_osborne at cognia.com Mon Sep 29 14:02:41 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Sep 29 14:05:16 2003
Subject: [NORDNS] [65.246.187.176] [Bioperl-l] script installation
In-Reply-To:
Message-ID:

Malcolm,

No, you're not missing anything. Version 1.2.3 is based on version 1.2,
naturally, which didn't have script installation. Script installation
won't formally appear until 1.3, but there's documentation out there that
talks about script installation. This is my fault. I had assumed, wrongly,
that script installation would appear earlier than it has, so I jumped the
gun and wrote documentation that talked about it.

If you would like script installation then install the latest version in
CVS (bioperl-live, for example). I'll revise the documentation.

Brian O.

From vamsi at warta.bio.psu.edu Mon Sep 29 15:26:44 2003
From: vamsi at warta.bio.psu.edu (Vamsi)
Date: Mon Sep 29 15:24:38 2003
Subject: [Bioperl-l] Bio::Graphics segments problem
Message-ID: <200309291926.h8TJQik06123@warta.bio.psu.edu>

Hi,

When I run the script below, the coordinates seem to be off slightly.
They start correctly but end at 41 instead of 40 and 91 instead of 90
with the latest tar.gz bioperl distribution. However, they are correctly
rendered using an older distribution.

Also, I am not able to use the commented-out
$p->add_track(segments => [[1,40],[61,90]]) line, though the documentation
for Bio::Graphics::Panel shows something very similar.

Any suggestions?
Thanks,
Vamsi

------------------
use Bio::Graphics::Panel;
use Bio::SeqFeature::Generic;

my $length = 100;
my $seq = Bio::SeqFeature::Generic->new(-start => 1, -end => $length);

my $p = new Bio::Graphics::Panel(
    -length => $length,
    -width  => 800,
);

$p->add_track($seq,
    -glyph  => 'arrow',
    -tick   => 2,
    -double => 1,
);

#$p->add_track(segments => [[1,40],[61,90]]);
my $g = $p->add_track();
$g->add_group([[1,40],[61,90]]);

open(F, '>b.png');
binmode F;
print F $p->png();
close F;

From chauser at duke.edu Mon Sep 29 17:02:17 2003
From: chauser at duke.edu (Charles Hauser)
Date: Mon Sep 29 17:02:19 2003
Subject: [Bioperl-l] modules for loading HSP data into chado:feature, featureloc,feature_relationship
Message-ID: <1064869464.19334.109.camel@pandorina.biology.duke.edu>

All,

As I ponder loading SearchIO-type data (blast, blat etc.) into chado, I
wanted to check if someone has previously written modules to fill the
tables feature, featureloc and feature_relationship starting from
SeqFeature data?

Charles

From vulnus at ZEDAT.FU-Berlin.DE Mon Sep 29 18:02:56 2003
From: vulnus at ZEDAT.FU-Berlin.DE (1 .. 22...333)
Date: Mon Sep 29 18:00:50 2003
Subject: [Bioperl-l] blastq3 and remove_rid problem
Message-ID:

hi,

after i found out that ncbi changed their notation of the request id to
#-#-#.blastq3, i put the following line at the beginning of my scripts:

$Bio::Tools::Run::RemoteBlast::RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+\.BLASTQ\d)';

but still my script is not working! i think this is because
$blast->remove_rid($rid) is not deleting the $rid from the blast object.

does anybody have the same problem, or can somebody help me?

cheers

From maasha at image.dk Tue Sep 30 07:58:03 2003
From: maasha at image.dk (Martin A. Hansen)
Date: Tue Sep 30 07:57:21 2003
Subject: [Bioperl-l] parsing blast reports
Message-ID: <20030930115803.GK664@image>

hi

using Bio::SearchIO to parse blastn reports, how does one find out if a hit is
on the leading or the complement strand?
and is it only I who have trouble with these sorts of simple questions? i tried
using Class::Inspector to view and test all the methods in
Bio::Search::Hit::GenericHit and Bio::Search::HSP::GenericHSP, but i see
nothing relating to strand ( "Plus / Plus" or "Plus / Minus" as seen in the
blast report).

it is rather difficult, as a bioperl beginner, to pin down the function you need
in order to solve a fairly simple question. tracing the inherited objects from
the documentation in order to locate a specific functionality is awkward. and
tracing the functionality from the code alone is painful. so what does one do?

martin

From brian_osborne at cognia.com Tue Sep 30 08:35:47 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Sep 30 08:37:56 2003
Subject: [Bioperl-l] parsing blast reports
In-Reply-To: <20030930115803.GK664@image>
Message-ID:

Martin,

Take a look at the SearchIO HOWTO,
http://bioperl.org/HOWTOs/html/SearchIO.html; it may help with some of
your SearchIO questions. You're absolutely right about it being awkward to
trace the location of inherited methods, but this is intrinsic to Perl.
The bptutorial.pl script can help out here. Take a look at this:

http://bioperl.org/Core/Latest/bptutorial.html#v.1_appendix:_finding_out_which_methods_are_used_by_which_bioperl_objects

Brian O.

From jason at cgt.duhs.duke.edu Tue Sep 30 09:09:56 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Sep 30 09:08:02 2003
Subject: [Bioperl-l] parsing blast reports
In-Reply-To: <20030930115803.GK664@image>
References: <20030930115803.GK664@image>
Message-ID:

Query sequence orientation
  $hsp->query->strand

Hit sequence orientation
  $hsp->hit->strand

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From Marc.Logghe at devgen.com Tue Sep 30 09:13:32 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Sep 30 09:11:43 2003
Subject: [Bioperl-l] how to pass --noplot option to Bio::Tools::Run::Tmhmm ?
Message-ID:

> -----Original Message-----
> From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu]
> Sent: Tuesday, September 30, 2003 4:13 PM
> To: Marc Logghe
> Cc: Bioperl-L (E-mail)
> Subject: Re: [Bioperl-l] how to pass --noplot option to
> Bio::Tools::Run::Tmhmm ?
>
>
> The object would need a little refactoring to do that. I have made some
> changes to this effect for you but need to test on a system with tmhmm
> first before I check them in.

Cool. Thanks Jason.
If you'd like, I am willing to test.
Regards,
Marc

From brian_osborne at cognia.com Tue Sep 30 11:35:03 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Sep 30 11:37:27 2003
Subject: [Bioperl-l] Windows support?
Message-ID:

Bioperl-l,

We have no PPM files for our latest release. Would someone like to help out
and create them? Jason is not allowed to do this. Seriously. It's simply not
right that we continue to take advantage of his generous nature.

Brian O.

From vesko_baev at abv.bg Tue Sep 30 16:56:57 2003
From: vesko_baev at abv.bg (Vesko Baev)
Date: Tue Sep 30 16:54:52 2003
Subject: [Bioperl-l] bioinformatics(BioPerl)/cell biology
Message-ID: <1263170731.1064955417709.JavaMail.nobody@app1.ni.bg>

Hello All,
Can anyone tell me some themes for using the bioinformatics(and BioPerl) in
Cell biology science? How can I combine these two sciences?
Thanks in advance!

Sincerely yours,
Vesselin Baev

From cjm at fruitfly.org Tue Sep 30 21:42:45 2003
From: cjm at fruitfly.org (Chris Mungall)
Date: Tue Sep 30 21:41:11 2003
Subject: [Bioperl-l] proposed changes to RangeI.pm
In-Reply-To:
Message-ID:

I have committed this.
hopefully it shouldn't break anyone's code

note that CoordinateMapper.t fails, but this seems to fail with or without
this change

cheers
Chris

On Wed, 17 Sep 2003, Chris Mungall wrote:

>
> both intersection() and union() are documented as returning a (start, end,
> strand) triple. in actual fact, intersection returns a RangeI compliant
> object, and union() returns either a RangeI object or a triple depending
> on wantarray()
>
> I have fixed things so that both intersection() and union() return either
> RangeI or triple depending on wantarray() - following the principle of
> least surprise - and documented this. The test suite passes.
>
> This will break code like this:
>
>   $h = { 'range' => $sf->intersection($sf2) }
>
> since wantarray will be true here; however, this code violates the
> previously documented interface anyway.
>
> I have also added a new method disconnected_ranges() to RangeI
>
> I could easily migrate this method somewhere else, but it seems to belong
> with other geometrical methods such as intersection and union
>
> here is the pod docs:
>
>
>  Title   : disconnected_ranges
>  Usage   : my @disc_ranges = Bio::Range->disconnected_ranges(@ranges);
>
>  Function: finds the minimal set of ranges such that each input range
>            is fully contained by at least one output range, and none of
>            the output ranges overlap
>  Args    : a list of ranges
>  Returns : a list of objects of the same type as the input (conforms to
>            RangeI)
>
> =cut
>
>
> is this a good time to check these changes in?
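The behaviour described in the POD for disconnected_ranges() amounts to merging overlapping intervals. The following plain-Perl sketch illustrates that idea only; it is not the committed Bio::RangeI code. The function name merge_ranges is hypothetical, ranges are represented as [start, end] array references with inclusive coordinates, strand is ignored, and ranges that merely touch (e.g. 1-10 and 11-20) are assumed to stay separate because they do not overlap.

```perl
use strict;
use warnings;

# Merge [start, end] pairs into the minimal set of non-overlapping ranges
# such that every input range lies inside exactly one output range.
sub merge_ranges {
    # Sort by start, then by end, so overlapping ranges become neighbours.
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # Overlaps the last output range: extend its end if needed.
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            # No overlap: start a new output range (copy, don't alias input).
            push @merged, [ @$r ];
        }
    }
    return @merged;
}

my @out = merge_ranges([1, 10], [5, 20], [30, 40], [35, 36], [50, 60]);
print join(' ', map { "$_->[0]-$_->[1]" } @out), "\n";
# prints: 1-20 30-40 50-60
```

Sorting first means each input range only has to be compared with the most recent output range, so the merge itself is a single pass.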
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason at cgt.duhs.duke.edu Tue Sep 30 22:29:54 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Sep 30 22:27:47 2003
Subject: [Bioperl-l] palindrome
Message-ID:

I needed it so I wrote it -

Added a parser for EMBOSS's 'palindrome' program as
Bio::Tools::EMBOSS::Palindrome

Others writing EMBOSS parsers should throw them in there as well I guess.
The water/needle/matcher alignment parsing is handled by
Bio::AlignIO::emboss currently - might add a pointer or a fake class which
delegates to this for folks who are expecting to find parsers for that
data in there as well.

-jason
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From 010381P at nyp.edu.sg Mon Sep 29 02:08:36 2003
From: 010381P at nyp.edu.sg (010381P GOH PHUAY CHENG)
Date: Thu Oct 2 08:21:43 2003
Subject: [Bioperl-l] RE: problem connecting to remote blast
Message-ID:

hi,
i have already removed the entrez query part but i still encountered the same
warning.

my codes:

#!/usr/bin/perl -w
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use strict;
my $prog = 'blastn';
my $db = 'ecoli.nt';
my $e_val= '1e-10';
my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
my $v = 1;

my $str = Bio::SeqIO->new(-file=>'test.txt' , '-format' => 'fasta' );
while (my $input = $str->next_seq()){

    #my $r = $factory->submit_blast($input);
    my $r = $factory->submit_blast('test.txt');
    print STDERR "waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                my $result = $rc->next_result();
                #save the output
                my $filename = $result->query_name()."\.out";
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}

-------------------- WARNING ---------------------
MSG:

INFO: [blastsrv4.REAL]: Error: Segmentation violation (or memory usage limit was exceeded) SIGSEGV (11).


BLASTN 2.2.6 [Apr-09-2003]

RID: 1064815728-21972-271666.BLASTQ3
Query= Test
         (560 letters)

No significant similarity found. For reasons why, click here.

---------------------------------------------------

-----Original Message-----
From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu]
Sent: Mon 9/29/2003 9:32 AM
To: 010381P GOH PHUAY CHENG
Cc:
Subject: Re: problem connecting to remote blast

I suppose it might be because you specified an entrez query for homo
sapiens on the ecoli db - that is not going to return any hits.

remove the entrez query part perhaps.

please post your questions to the bioperl list in the future -
bioperl-l@bioperl.org

Thanks,
-jason

On Mon, 29 Sep 2003, 010381P GOH PHUAY CHENG wrote:

> hi Jason,
> i encountered some problems running the remote blast. could u please check for
> me what's wrong with my codes. Thanks.
>
> phuay cheng
>
> -------------------- WARNING ---------------------
> MSG:

> INFO: [blastsrv4.REAL]: Error: Segmentation violation (or memory usage limit was exceeded) SIGSEGV (11).


>

> BLASTN 2.2.6 [Apr-09-2003]
>
> RID: 1064796976-1676-2574126.BLASTQ3
> Query= Test
>          (560 letters)
>
> No significant similarity found. For reasons why, click here.

>
>
> ---------------------------------------------------
>
> #!/usr/bin/perl -w
> use Bio::SeqIO;
> use Bio::Tools::Run::RemoteBlast;
> use strict;
> my $prog = 'blastn';
> my $db = 'ecoli.nt';
> #my $e_val= '1e-10';
> my @params = ( '-prog' => $prog,
>                '-data' => $db,
>                '-readmethod' => 'SearchIO' );
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
>
> delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
> my $v = 1;
>
> my $str = Bio::SeqIO->new(-file=>'test.txt' , '-format' => 'fasta' );
> while (my $input = $str->next_seq()){
>
>     #my $r = $factory->submit_blast($input);
>     my $r = $factory->submit_blast('test.txt');
>     print STDERR "waiting..." if( $v > 0 );
>     while ( my @rids = $factory->each_rid ) {
>         foreach my $rid ( @rids ) {
>             my $rc = $factory->retrieve_blast($rid);
>             if( !ref($rc) ) {
>                 if( $rc < 0 ) {
>                     $factory->remove_rid($rid);
>                 }
>                 print STDERR "." if ( $v > 0 );
>                 sleep 5;
>             } else {
>                 my $result = $rc->next_result();
>                 #save the output
>                 my $filename = $result->query_name()."\.out";
>                 $factory->save_output($filename);
>                 $factory->remove_rid($rid);
>                 print "\nQuery Name: ", $result->query_name(), "\n";
>                 while ( my $hit = $result->next_hit ) {
>                     next unless ( $v > 0);
>                     print "\thit name is ", $hit->name, "\n";
>                     while( my $hsp = $hit->next_hsp ) {
>                         print "\t\tscore is ", $hsp->score, "\n";
>                     }
>                 }
>             }
>         }
>     }
> }
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From 010381P at nyp.edu.sg Mon Sep 29 02:16:45 2003
From: 010381P at nyp.edu.sg (010381P GOH PHUAY CHENG)
Date: Thu Oct 2 08:21:52 2003
Subject: [Bioperl-l] problem with remote blast
Message-ID:

hi,
i am trying to connect to the remote blast. but i encountered the following
warnings. could u pls help me.

-------------------- WARNING ---------------------
MSG:

INFO: [blastsrv4.REAL]: Error: Segmentation violation (or memory usage limit was exceeded) SIGSEGV (11).


BLASTN 2.2.6 [Apr-09-2003]

RID: 1064815996-23369-1254165.BLASTQ3
Query= Test
         (560 letters)

No significant similarity found. For reasons why, click here.

---------------------------------------------------

my codes:

#!/usr/bin/perl -w
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use strict;
my $prog = 'blastn';
my $db = 'ecoli.nt';
my $e_val= '1e-10';
my @params = ( '-prog' => $prog,
               '-data' => $db,
             );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
my $v = 1;

my $str = Bio::SeqIO->new(-file=>'test.txt' , '-format' => 'fasta' );
while (my $input = $str->next_seq()){

    #my $r = $factory->submit_blast($input);
    my $r = $factory->submit_blast('test.txt');
    print STDERR "waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                my $result = $rc->next_result();
                #save the output
                my $filename = $result->query_name()."\.out";
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}
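One thing that stands out in the script above is that it submits the file name 'test.txt' again on every pass of the outer loop instead of the sequence object it has just read, and that the polling loop has no upper bound. A possible tidier variant is sketched below. It only addresses those two client-side points; whether it has any effect on the server-side SIGSEGV message is not known. The database name is copied from the post and not verified, and the poll limit of 60 is an arbitrary assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastn',
    -data       => 'ecoli.nt',   # copied from the post; not verified
    -expect     => '1e-10',
    -readmethod => 'SearchIO',
);

my $seqin     = Bio::SeqIO->new(-file => 'test.txt', -format => 'fasta');
my $max_polls = 60;              # assumption: give up after 60 polling rounds

while (my $seq = $seqin->next_seq) {
    # Submit the sequence object just read, not the whole file again.
    $factory->submit_blast($seq);

    my $polls = 0;
    while (my @rids = $factory->each_rid) {
        if (++$polls > $max_polls) {
            warn "giving up on request(s): @rids\n";
            $factory->remove_rid($_) for @rids;
            last;
        }
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if (!ref $rc) {
                # Negative return: the request failed. Zero: not ready yet.
                $factory->remove_rid($rid) if $rc < 0;
                sleep 5;
                next;
            }
            my $result = $rc->next_result;
            $factory->save_output($result->query_name . '.out');
            $factory->remove_rid($rid);
            print "Query Name: ", $result->query_name, "\n";
            while (my $hit = $result->next_hit) {
                print "\thit name is ", $hit->name, "\n";
                while (my $hsp = $hit->next_hsp) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}
```

Submitting one sequence object per iteration keeps each request ID tied to a single query, which makes the saved .out files easier to match to their inputs.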