From shameer at ncbs.res.in Tue May 1 07:36:31 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Tue, 1 May 2007 17:06:31 +0530 (IST) Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: References: <10259461.post@talk.nabble.com> Message-ID: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> Dear All, I am trying to impliment a bioperl based program to generate a dynamic, clickable image. I have used Dr. Lincoln Steins's code provided in example3 at this URL : http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to be perfect for my purpose. I need to add few modifications to the image. I reffered the Bio::Graphics HOWTO, Creating_Imagemaps documents and other old bio-perl list mails (may be am missing something imp.. ? ) but I couldnt get a quick solution, Thought I will ask about it to the experts for some tips and tricks. This is what I am looking for : 1. I need image of exactly same size and the scale (0.1k .. 0.9k) to be changed according to length of the sequence. My sequence length is usually in a range of 70 - 200. 2. I also need to make the image interactive / clickable on the various blue bar as different hyperlink to NCBI / PDB using ID (This ids will be used instead of name of the blast hits) Many thanks in advance for your inputs, -- Shameer Khadar Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From shameer at ncbs.res.in Tue May 1 12:04:13 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Tue, 1 May 2007 21:34:13 +0530 (IST) Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> Message-ID: <42403.192.168.1.1.1178035453.squirrel@mail.ncbs.res.in> Dear Scot, > There is a fair amount of documentation in the perldoc for > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have > you read that? I agreed, but I couldnt the exact information I needed :( (may be I missed something important). > Also, for changing the scale, that should happen > automatically--have you tried yet? I tried by changing the Lincoln's program eg: blast3.pl my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000); to my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300); But it had given me a smaller scale of length upto 300. I was looking for an option where I need same width and height of given image and a dynamic start and end values depending on length of my sequence. Since I couldnt accomplish, I thought of getting some help from you guys. I think I need to play a little bit with the value for reformat the scale to accomodate my hits as well. Thanks a lot for your inputs, -- Shameer Khadar Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From shameer at ncbs.res.in Tue May 1 12:04:11 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Tue, 1 May 2007 21:34:11 +0530 (IST) Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <1178028249.2644.13.camel@localhost.localdomain> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> Message-ID: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> Dear Scot, > There is a fair amount of documentation in the perldoc for > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have > you read that? I agreed, but I couldnt the exact information I needed :( (may be I missed something important). > Also, for changing the scale, that should happen > automatically--have you tried yet? I tried by changing the Lincoln's program eg: blast3.pl my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000); to my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300); But it had given me a smaller scale of length upto 300. I was looking for an option where I need same width and height of given image and a dynamic start and end values depending on length of my sequence. Since I couldnt accomplish, I thought of getting some help from you guys. I think I need to play a little bit with the value for reformat the scale to accomodate my hits as well. Thanks a lot for your inputs, -- Shameer Khadar Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From cain at cshl.edu Tue May 1 10:04:09 2007 From: cain at cshl.edu (Scott Cain) Date: Tue, 01 May 2007 10:04:09 -0400 Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> Message-ID: <1178028249.2644.13.camel@localhost.localdomain> Hi Shameer, There is a fair amount of documentation in the perldoc for Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have you read that? Also, for changing the scale, that should happen automatically--have you tried yet? Scott On Tue, 2007-05-01 at 17:06 +0530, Shameer Khadar wrote: > Dear All, > > I am trying to impliment a bioperl based program to generate a dynamic, > clickable image. I have used Dr. Lincoln Steins's code provided in > example3 at this URL : > http://stein.cshl.org/genome_informatics/BioGraphics/blast3.html seems to > be perfect for my purpose. > > I need to add few modifications to the image. I reffered the Bio::Graphics > HOWTO, Creating_Imagemaps documents and other old bio-perl list mails > (may be am missing something imp.. ? ) but I couldnt get a quick > solution, Thought I will ask about it to the experts for some tips and > tricks. > > This is what I am looking for : > > 1. I need image of exactly same size and the scale (0.1k .. 0.9k) to be > changed according to length of the sequence. My sequence length is usually > in a range of 70 - 200. > > 2. I also need to make the image interactive / clickable on the various > blue bar as different hyperlink to NCBI / PDB using ID (This ids will be > used instead of name of the blast hits) > > > Many thanks in advance for your inputs, -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070501/f84a3220/attachment.bin From cjfields at uiuc.edu Tue May 1 13:10:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 1 May 2007 12:10:10 -0500 Subject: [Bioperl-l] Pb makefile In-Reply-To: References: Message-ID: Is there any reason you want to install bioperl 1.4 (which is over 3 yrs old)? The latest is v.1.5.2 (Dec. 2006); man page generation has been fixed for that version, which uses Module::Build. The man page generation was turned off prior to 1.4, though I may be wrong. Based on the Extutils::MakeMaker FAQ you should be able to prevent man page generation this way: perl Makefile.PL INSTALLMAN1DIR=none INSTALLMAN3DIR=none chris On Apr 30, 2007, at 5:35 AM, Francoise.LECOMTE at biogemma.com wrote: > Hi > I try to install biopoerl1.4 on Tru64 plateform and I've got a message > "make:line too long" when I run the command make install > How can I solve it ? How disable man pages installaton in > Makefile.PL if > it can sove this problem > > Best regards > > Fran?oise Lecomte From cain.cshl at gmail.com Tue May 1 15:50:42 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 01 May 2007 15:50:42 -0400 Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> Message-ID: <1178049042.2644.36.camel@localhost.localdomain> Perhaps if you provided some code and sample data we might be able to help you better. Scott On Tue, 2007-05-01 at 21:34 +0530, Shameer Khadar wrote: > Dear Scot, > > > There is a fair amount of documentation in the perldoc for > > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have > > you read that? > > I agreed, but I couldnt the exact information I needed :( (may be I missed > something important). > > > Also, for changing the scale, that should happen > > automatically--have you tried yet? > > I tried by changing the Lincoln's program eg: blast3.pl > my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000); > to my > $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300); > > But it had given me a smaller scale of length upto 300. I was looking for > an option where I need same width and height of given image and a dynamic > start and end values depending on length of my sequence. Since I couldnt > accomplish, I thought of getting some help from you guys. I think I need > to play a little bit with the value for reformat the scale to accomodate > my hits as well. > > Thanks a lot for your inputs, -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070501/9c655e4c/attachment.bin From agathman at semo.edu Tue May 1 19:10:20 2007 From: agathman at semo.edu (Gathman, Allen) Date: Tue, 1 May 2007 18:10:20 -0500 Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2 Message-ID: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu> Hi, all -- I've been using BioPerl 1.4 for a while; recently I installed 1.5.2, and found that scripts that had been using spliced_seq are now broken. Any thoughts on what might be going on? Here's a sample script: ********************************************* #!/usr/bin/perl -w use strict; use Bio::DB::GFF; my $db = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql', -dsn => 'dbi:mysql:database=cc;host=localhost', -fasta => '/gbrowse/databases/cc' ); $db->add_aggregator('transcript{CDS/mRNA}'); my $seg=$db->segment('ccin_Contig120'); my @genes=$seg->features(-types=>('transcript:GLEAN_alt')); for my $gene (@genes) { my $gid = $gene->display_id; print STDERR "Gene is $gid\n"; my $splgene = $gene->spliced_seq(); } ******************************************** The line with "spliced_seq" in it crashes the program. Here's the STDERR output: Gene is Jan06m400_GLEAN_11487 -------------------- WARNING --------------------- MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have absolute set to 1 -- be warned you may not be getting things on the correct strand --------------------------------------------------- -------------------- WARNING --------------------- MSG: seq doesn't validate, mismatch is ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::,(0,881935 ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098) --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Attempting to set the sequence to [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170)Bio::Prim arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308)Bio::PrimarySeq=HAS H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH(0x881f4a 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] which does not look healthy STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359 STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258 STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210 STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484 STACK: Bio::SeqFeatureI::spliced_seq /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498 STACK: /transfer/testsplice.pl:20 ----------------------------------------------------------- Allen Gathman http://cstl-csm.semo.edu/gathman From cjfields at uiuc.edu Tue May 1 20:27:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 1 May 2007 19:27:46 -0500 Subject: [Bioperl-l] Problem with spliced_seq in BioPerl 1.5.2 In-Reply-To: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu> References: <2DA21E6CECCDE7419541C7A6DB798F0C080CB704@EXCHANGE.semo.edu> Message-ID: <9F00B020-AFF0-40DB-9694-6061B5A11A73@uiuc.edu> Can you file a bug on this? Attach the script and maybe detail what data is loaded into your local MySQL database (if possible). chris On May 1, 2007, at 6:10 PM, Gathman, Allen wrote: > Hi, all -- > > I've been using BioPerl 1.4 for a while; recently I installed > 1.5.2, and > found that scripts that had been using spliced_seq are now broken. > Any > thoughts on what might be going on? > > Here's a sample script: > > ********************************************* > > #!/usr/bin/perl -w > > use strict; > use Bio::DB::GFF; > > my $db = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql', > -dsn => > 'dbi:mysql:database=cc;host=localhost', > -fasta => '/gbrowse/databases/cc' > ); > $db->add_aggregator('transcript{CDS/mRNA}'); > my $seg=$db->segment('ccin_Contig120'); > my @genes=$seg->features(-types=>('transcript:GLEAN_alt')); > > for my $gene (@genes) { > my $gid = $gene->display_id; > > print STDERR "Gene is $gid\n"; > my $splgene = $gene->spliced_seq(); > } > > ******************************************** > The line with "spliced_seq" in it crashes the program. Here's the > STDERR output: > > Gene is Jan06m400_GLEAN_11487 > > -------------------- WARNING --------------------- > > MSG: Calling spliced_seq with a Bio::Das::SegmentI which does have > absolute set to 1 -- be warned you may not be getting things on the > correct strand > > --------------------------------------------------- > > -------------------- WARNING --------------------- > > MSG: seq doesn't validate, mismatch is > ::,(0,88211,0),::,(0,8821170),::,(0,8821260),::,(0,8821308),::, > (0,881935 > ,),::,(0,881,468),::,(0,881,4,4),::,(0,8818,0),::,(0,8819098) > > --------------------------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Attempting to set the sequence to > [Bio::PrimarySeq=HASH(0x88211d0)Bio::PrimarySeq=HASH(0x8821170) > Bio::Prim > arySeq=HASH(0x8821260)Bio::PrimarySeq=HASH(0x8821308) > Bio::PrimarySeq=HAS > H(0x881935c)Bio::PrimarySeq=HASH(0x881f468)Bio::PrimarySeq=HASH > (0x881f4a > 4)Bio::PrimarySeq=HASH(0x8818ff0)Bio::PrimarySeq=HASH(0x8819098)] > which > does not look healthy > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:359 > > STACK: Bio::PrimarySeq::seq > /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:258 > > STACK: Bio::PrimarySeq::new > /usr/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:210 > > STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.6/Bio/Seq.pm:484 > > STACK: Bio::SeqFeatureI::spliced_seq > /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeatureI.pm:498 > > STACK: /transfer/testsplice.pl:20 > > ----------------------------------------------------------- > > Allen Gathman > > http://cstl-csm.semo.edu/gathman > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From shameer at ncbs.res.in Tue May 1 23:46:59 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Wed, 2 May 2007 09:16:59 +0530 (IST) Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <1178049042.2644.36.camel@localhost.localdomain> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <1178049042.2644.36.camel@localhost.localdomain> Message-ID: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in> Dear Scott, Once thanks a lot for your inputs. I am following same data formats as in http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt Only difference is instead of Hits, I will be using PFAMID/PDBID. Allt he blue boxes (feature) should be clickable like a hot-spot/imagesmap images. The purpose is to display these results in a web page. I am using the program in Stein's Bio::Graphics example http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl I need exactly same image as in http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png only difference is I need the scale (0.1k - 0.9k) in a range of simple 1-XXX , here XXX depends on the length of the sequence input. Many thanks for your help, > Perhaps if you provided some code and sample data we might be able to > help you better. > > Scott > -- Shameer Khadar Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From sdavis2 at mail.nih.gov Wed May 2 06:02:48 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 May 2007 06:02:48 -0400 Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <1178049042.2644.36.camel@localhost.localdomain> <59122.192.168.1.1.1178077619.squirrel@mail.ncbs.res.in> Message-ID: <200705020602.48404.sdavis2@mail.nih.gov> On Tuesday 01 May 2007 23:46, Shameer Khadar wrote: > Dear Scott, > > Once thanks a lot for your inputs. > > I am following same data formats as in > http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast_hits.txt > Only difference is instead of Hits, I will be using PFAMID/PDBID. Allt he > blue boxes (feature) should be clickable like a hot-spot/imagesmap images. > The purpose is to display these results in a web page. Do you have your data loaded into bioperl objects? What code did you use for that (post that code)? > I am using the program in Stein's Bio::Graphics example > http://stein.cshl.org/genome_informatics/BioGraphics/eg/blast3.pl Does this example run on your computer? Have you been able to use the bioperl objects you created in the first step in the creation of a graphic? If not, what have you tried (post the code) and any error messages. > I need exactly same image as in > http://stein.cshl.org/genome_informatics/BioGraphics/fig3.png > only difference is I need the scale (0.1k - 0.9k) in a range of simple > 1-XXX , here XXX depends on the length of the sequence input. Again, what have you tried? Posting code is helpful here, also. I'm not an expert in bioperl graphics, but it does really help those that know to see the code that you have written to know how best to help. Sean From lzlgboy at gmail.com Wed May 2 09:58:14 2007 From: lzlgboy at gmail.com (kenzy ken) Date: Wed, 2 May 2007 21:58:14 +0800 Subject: [Bioperl-l] Extract CDS from CDNA given Protein SEQs Message-ID: Hi ,everyone I got a task to extract cds sequences from cdna , and I have the protein sequence for each cdna, what should I do? Should I try 3_frmae_translate? But how. Thanks. -- ?????? Chen,Kenian =========================== School of Life Science, Sun Yat-Sen University =========================== Xingang Xilu 135 Guangzhou, Guangdong 510275 P. R. China =========================== Phone: (86) 20-84113677; (86) 20-34474683; Fax: (86) 20-34022356 =========================== Email:lzlgboy at gmail.com; chenkn at mail2.sysu.edu.cn From MEC at stowers-institute.org Wed May 2 18:38:31 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Wed, 2 May 2007 17:38:31 -0500 Subject: [Bioperl-l] Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase In-Reply-To: References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com> Message-ID: Lincoln, Here for your comment and review is a very reworked version of Bio::Graphics::FeatureBase->gff3_string. The main difference is to that homogenous children get ALL their attributes except for start/stop from the parent, including the group. I also provide option as to whether or now to "remove extraneous level of parentage" called $preserveHomegenousParent. There is an in-line comment and question for you in the code body. It works well in my hands to my use cases, but, I'm not positive it is in the spirit of your intentions. Cheers, Malcolm sub gff3_string { my ($self, $recurse, $preserveHomegenousParent, # Note: the following parameters, whose name begins with '$_', # are intended for recursive call only. $_parent, $_self_is_hsf, # is $self the child in a homogeneous parent/child relationship? $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent ) = @_; # PURPOSE: Return GFF3 format for the feature $self. Optionally # $recurse to include GFF for any subfeatures of the feature. If # recursing, provide special handling to "remove an extraneous level # of parentage" (unless $preserveHomegenousParent) for features # which have subfeatures all of whose types are the same as the # feature itself (the "homogenous parent/child" case). This usage is # a convention for representing discontiguous features; they may be # created by using the -segment directive without specifying a # distinct -subtype in to `new` when creating a # Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature, # Bio::Graphics::Feature). Such homogenous subfeatures created in # this fashion DO NOT have the parent (GFF column 9) attributes # propogated to them; so, since they are all part of the same # parent, the ONLY difference relevant to GFF production SHOULD be # the $start and $end coordinates for their segment, and ALL THIER # OTHER ATTRIBUTES should be copied down from the parent (including: # strand, score, Name, ID, Parent, etc). my $hparentORself = $_self_is_hsf ? $_parent : $self; # $self's parent, if it is a homogenous child, otherwise $self. if ($recurse && (my @ssf = $self->sub_SeqFeature)) { my $homogenous = ! grep {$_->type ne $self->type} @ssf; # will be TRUE only if all subfeatures are the same type as $self. my $mygroup = # compute $self's group if it is needed to be passed down to # subfeatures, unless it is already being passed down (in which # case there are (at least) 3 levels of homogenous parent child # (will this ever happen in practice???)) ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent); return (join("\n", (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()) , map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homo genous,$mygroup)} @ssf))); } else { my $name = $hparentORself->name; my $class = $hparentORself->class; my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent); my $strand = ('-','.','+')[$self->strand+1]; # TODO: understand conditions under which this could be other than # hparentORself->strand. In particular, why does add_segment flip # the strand when start > stop? I thought this was not allowed! # Lincoln - any ideas? my $p = join("\t", $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->met hod||'.', $self->start||'.',$self->stop||'.', defined($hparentORself->score) ? $hparentORself->score : '.', $strand||'.', defined($hparentORself->phase) ? $hparentORself->phase : '.', $group||''); } } ________________________________ From: Cook, Malcolm Sent: Friday, April 27, 2007 1:45 PM To: 'lincoln.stein at gmail.com' Cc: 'lstein at cshl.org'; 'bioperl list' Subject: RE: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase Hi Lincoln, Cool. The principal of what I figured out I still think holds but the implementation is slightly broke. Improved patch forthoming next week. Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Friday, April 27, 2007 12:45 PM To: Cook, Malcolm Cc: lstein at cshl.org; bioperl list Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase Hi Malcom, This is absolutely ok and you can go ahead and commit. Thanks for figuring this out! Lincoln On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org > wrote: Lincoln, et al, I find that the gff3_string for Bio::DB::SeqFeature objects retreived from a Bio::DB::SeqFeature::Store that were initially created with -seqments (i.e. whose location was discontiguous) does not display any other attributes in column 9 than "Name". What do you think of the following patch to Bio::Graphics::FeatureBase, whose effect is to "contrive to return (duplicated) common group values" (which otherwise get lost when "collapsing" "homogenous" parent/child features) Another approach would be to copy the attributes from the parent to the children when the -seqments are first created. Another approach would be to use Bio::SeqFeature::Generic as the db's -seqfeature_class and save with -location being a Bio::Location::Split, but this was wrougth with other problems. Any other suggestions? Do you want me to commit this patch? Cheers, Malcolm Patch follows: Index: FeatureBase.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v retrieving revision 1.29 diff -c -r1.29 FeatureBase.pm *** FeatureBase.pm 16 Apr 2007 19:55:33 -0000 1.29 --- FeatureBase.pm 26 Apr 2007 16:30:23 -0000 *************** *** 581,587 **** foreach (@children) { s/Parent=/ID=/g; } # replace Parent tag with ID ! return join "\n", at children; } return join("\n",$p, at children); --- 581,589 ---- foreach (@children) { s/Parent=/ID=/g; } # replace Parent tag with ID ! #return join "\n", at children; ! # Instead of above, additionally, contrive to return (duplicated) common group values ! return(join("$group\n", at children) . $group); } return join("\n",$p, at children); -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Thu May 3 12:01:38 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 3 May 2007 12:01:38 -0400 Subject: [Bioperl-l] Help : Imagemaps using Bio::Graphics In-Reply-To: <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> Message-ID: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> The width of the image is determined by the -width attribute and is given in pixels. You cannot control the height of the image as it is computed dynamically based on the number of features and bumping options. Lincoln On 5/1/07, Shameer Khadar wrote: > > Dear Scot, > > > There is a fair amount of documentation in the perldoc for > > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; have > > you read that? > > I agreed, but I couldnt the exact information I needed :( (may be I missed > something important). > > > Also, for changing the scale, that should happen > > automatically--have you tried yet? > > I tried by changing the Lincoln's program eg: blast3.pl > my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000); > to my > $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300); > > But it had given me a smaller scale of length upto 300. I was looking for > an option where I need same width and height of given image and a dynamic > start and end values depending on length of my sequence. Since I couldnt > accomplish, I thought of getting some help from you guys. I think I need > to play a little bit with the value for reformat the scale to accomodate > my hits as well. > > Thanks a lot for your inputs, > -- > Shameer Khadar > Lab (# 25) The Computational Biology Group > National Centre for Biological Sciences (TIFR) > GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India > T - 91-080-23666001 EXT - 6251 > W - http://www.ncbs.res.in > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bioperlanand at yahoo.com Thu May 3 16:09:18 2007 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Thu, 3 May 2007 13:09:18 -0700 (PDT) Subject: [Bioperl-l] a query on Obtaining UniProt sequences Message-ID: <922386.19570.qm@web36808.mail.mud.yahoo.com> Hi I am using Bioperl 1.4 and I am trying to obtain protein sequences for specific Uniprot records. For some records (ROA1_HUMAN), it prints the correct sequence, but it first prints the warning "Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, line 43." For other records (BOLA_HAEIN), it prints the correct sequence (without any warnings). Here is the code: ------------------------------------------------------------------------------------------- #!/usr/bin/perl -w use strict; use Bio::Perl; use Bio::DB::SwissProt; my $sp = new Bio::DB::SwissProt; #my $seq_object = $sp->get_Seq_by_id('ROA1_HUMAN'); my $seq_object = $sp->get_Seq_by_id('BOLA_HAEIN'); my $sequence_as_a_string = $seq_object->seq(); print "$sequence_as_a_string\n"; ------------------------------------------------------------------------------------------- Is there something I need to fix. Thanks in advance for the help. Anand --------------------------------- Ahhh...imagining that irresistible "new car" smell? Check outnew cars at Yahoo! Autos. From MEC at stowers-institute.org Thu May 3 16:19:00 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Thu, 3 May 2007 15:19:00 -0500 Subject: [Bioperl-l] Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase In-Reply-To: <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com> References: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com> <6dce9a0b0705030745u3a1afffew68538f515c6b663b@mail.gmail.com> Message-ID: Lincoln, Ah, yes, round-tripping GFF, the holy grail.... Unfortunately, I don't really have a baseline to go against for an example that roundtrips successfully now. Do you? For example, after loading test data: > bp_seqfeature_load.PLS bioperl-live/t/data/biodbgff/test.gff3 the Contig1 portion of which looks like this: ##gff-version 3 ## sequence-region Contig1 1 37450 Contig1 confirmed transcript 1001 2000 42 + . ID=Transcript:trans-1;Gene=abc-1;Gene=xyz-2;Note=function+unknown Contig1 confirmed exon 1001 1100 . + . ID=Transcript:trans-1 Contig1 confirmed exon 1201 1300 . + . ID=Transcript:trans-1 Contig1 confirmed exon 1401 1450 . + . ID=Transcript:trans-1 Contig1 confirmed CDS 1051 1100 . + 0 ID=Transcript:trans-1 Contig1 confirmed CDS 1201 1300 . + 2 ID=Transcript:trans-1 Contig1 confirmed CDS 1401 1440 . + 0 ID=Transcript:trans-1 Contig1 est similarity 1001 1100 96 . . Target=EST:CEESC13F 1 100 + Contig1 est similarity 1201 1300 99 . . Target=EST:CEESC13F 101 200 + Contig1 est similarity 1401 1450 99 . . Target=EST:CEESC13F 201 250 + Contig1 tc1 transposon 5001 6000 . + . ID=Transposon:c128.1 Contig1 tc1 transposon 8001 9000 . - . ID=Transposon:c128.2 Contig1 confirmed transcript 30001 31000 . - . ID=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting Contig1 confirmed exon 30001 30100 . - . ID=Transcript:trans-2;Gene=abc-1;Note=function+unknown Contig1 confirmed exon 30701 30800 . - . ID=Transcript:trans-2 Contig1 confirmed exon 30801 31000 . - . ID=Transcript:trans-2 and then generating output with >bp_seqfeature_gff3.PLS --gff=1 -- seq_id Contig1 # using a script I just committed - I hope you like it. Note: gff=1 => recurse we get output gff with problems such as: 1 IDs get turned into Aliases 2 the seqid of a Target attributes gets copied into the features Name attribute 3 supression of parents of homogeneous subfeatures doesn't work when the parent has other subfeatures that those with its same type (i.e. the transcript feature also has exon subfeatures) look: Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 + Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 + Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 + Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown Contig1 confirmed transcript 1001 2000 42 + . Parent=2;Alias=Transcript:trans-1;Note=function+unknown;Gene=abc-1,xyz-2 Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1 Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1 Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1 Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1 Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1 Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1 Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1 Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2 Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting Contig1 confirmed transcript 30001 31000 . - . Parent=9;Alias=Transcript:trans-2;Note=Terribly+interesting;Gene=xyz-2 Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2 Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2 Contig1 . region 1 37450 . . . Name=Contig1;ID=1 with my new version of gff3_string (not yet commited), only the 3rd problem is addressed, generating bp_seqfeature_gff3.PLS --gff 1 -- seq_id Contig1 Contig1 est similarity 1001 1100 96 . . Name=EST:CEESC13F;ID=3;Target=EST:CEESC13F 1 100 + Contig1 est similarity 1201 1300 99 . . Name=EST:CEESC13F;ID=4;Target=EST:CEESC13F 101 200 + Contig1 est similarity 1401 1450 99 . . Name=EST:CEESC13F;ID=5;Target=EST:CEESC13F 201 250 + Contig1 confirmed transcript 1001 2000 42 + . ID=2;Alias=Transcript:trans-1;Gene=abc-1,xyz-2;Note=function+unknown Contig1 confirmed exon 1001 1100 . + . Parent=2;Alias=Transcript:trans-1 Contig1 confirmed exon 1201 1300 . + . Parent=2;Alias=Transcript:trans-1 Contig1 confirmed exon 1401 1450 . + . Parent=2;Alias=Transcript:trans-1 Contig1 confirmed CDS 1051 1100 . + 0 Parent=2;Alias=Transcript:trans-1 Contig1 confirmed CDS 1201 1300 . + 2 Parent=2;Alias=Transcript:trans-1 Contig1 confirmed CDS 1401 1440 . + 0 Parent=2;Alias=Transcript:trans-1 Contig1 tc1 transposon 5001 6000 . + . ID=6;Alias=Transposon:c128.1 Contig1 tc1 transposon 8001 9000 . - . ID=7;Alias=Transposon:c128.2 Contig1 confirmed transcript 30001 31000 . - . ID=9;Alias=Transcript:trans-2;Gene=xyz-2;Note=Terribly+interesting Contig1 confirmed exon 30001 30100 . - . Parent=9;Alias=Transcript:trans-2;Gene=abc-1;Note=function+unknown Contig1 confirmed exon 30701 30800 . - . Parent=9;Alias=Transcript:trans-2 Contig1 confirmed exon 30801 31000 . - . Parent=9;Alias=Transcript:trans-2 Contig1 . region 1 37450 . . . Name=Contig1;ID=1 I had to make another change to get this output though, since I had to change the behaviour to # provide special handling to "remove an extraneous level # of parentage" (unless $preserveHomegenousParent) for features # which have at least one subfeature with the same type as the # feature itself (thus redefining Lincoln's "homogenous # parent/child" case, which previously required all children to have # the same type as parent) I think you will agree this is the more desirable behaviour. I would be happy to test any other GFF you suggest might be (more or less) roundtripped. What think you? --Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Thursday, May 03, 2007 9:46 AM To: Cook, Malcolm Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase Hi Malcolm, For me, the major use case is that GFF3 files round-trip correctly through the database. Do any of your use cases cover that? Lincoln On 5/2/07, Cook, Malcolm wrote: Lincoln, Here for your comment and review is a very reworked version of Bio::Graphics::FeatureBase->gff3_string. The main difference is to that homogenous children get ALL their attributes except for start/stop from the parent, including the group. I also provide option as to whether or now to "remove extraneous level of parentage" called $preserveHomegenousParent. There is an in-line comment and question for you in the code body. It works well in my hands to my use cases, but, I'm not positive it is in the spirit of your intentions. Cheers, Malcolm sub gff3_string { my ($self, $recurse, $preserveHomegenousParent, # Note: the following parameters, whose name begins with '$_', # are intended for recursive call only. $_parent, $_self_is_hsf, # is $self the child in a homogeneous parent/child relationship? $_hsf_parentgroup, # if so, what is the group (GFF column 9) of the parent ) = @_; # PURPOSE: Return GFF3 format for the feature $self. Optionally # $recurse to include GFF for any subfeatures of the feature. If # recursing, provide special handling to "remove an extraneous level # of parentage" (unless $preserveHomegenousParent) for features # which have subfeatures all of whose types are the same as the # feature itself (the "homogenous parent/child" case). This usage is # a convention for representing discontiguous features; they may be # created by using the -segment directive without specifying a # distinct -subtype in to `new` when creating a # Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature, # Bio::Graphics::Feature). Such homogenous subfeatures created in # this fashion DO NOT have the parent (GFF column 9) attributes # propogated to them; so, since they are all part of the same # parent, the ONLY difference relevant to GFF production SHOULD be # the $start and $end coordinates for their segment, and ALL THIER # OTHER ATTRIBUTES should be copied down from the parent (including: # strand, score, Name, ID, Parent, etc). my $hparentORself = $_self_is_hsf ? $_parent : $self; # $self's parent, if it is a homogenous child, otherwise $self. if ($recurse && (my @ssf = $self->sub_SeqFeature)) { my $homogenous = ! grep {$_->type ne $self->type} @ssf; # will be TRUE only if all subfeatures are the same type as $self. my $mygroup = # compute $self's group if it is needed to be passed down to # subfeatures, unless it is already being passed down (in which # case there are (at least) 3 levels of homogenous parent child # (will this ever happen in practice???)) ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent); return (join("\n", (($preserveHomegenousParent ? ($self->gff3_string(0)) : ()) , map {$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homo genous,$mygroup)} @ssf))); } else { my $name = $hparentORself->name; my $class = $hparentORself->class; my $group = $_self_is_hsf ? $_hsf_parentgroup : $self->format_attributes($_parent); my $strand = ('-','.','+')[$self->strand+1]; # TODO: understand conditions under which this could be other than # hparentORself->strand. In particular, why does add_segment flip # the strand when start > stop? I thought this was not allowed! # Lincoln - any ideas? my $p = join("\t", $hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->met hod||'.', $self->start||'.',$self->stop||'.', defined($hparentORself->score) ? $hparentORself->score : '.', $strand||'.', defined($hparentORself->phase) ? $hparentORself->phase : '.', $group||''); } } ________________________________ From: Cook, Malcolm Sent: Friday, April 27, 2007 1:45 PM To: 'lincoln.stein at gmail.com' Cc: 'lstein at cshl.org'; 'bioperl list' Subject: RE: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase Hi Lincoln, Cool. The principal of what I figured out I still think holds but the implementation is slightly broke. Improved patch forthoming next week. Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Friday, April 27, 2007 12:45 PM To: Cook, Malcolm Cc: lstein at cshl.org; bioperl list Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase Hi Malcom, This is absolutely ok and you can go ahead and commit. Thanks for figuring this out! Lincoln On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org > wrote: Lincoln, et al, I find that the gff3_string for Bio::DB::SeqFeature objects retreived from a Bio::DB::SeqFeature::Store that were initially created with -seqments (i.e. whose location was discontiguous) does not display any other attributes in column 9 than "Name". What do you think of the following patch to Bio::Graphics::FeatureBase, whose effect is to "contrive to return (duplicated) common group values" (which otherwise get lost when "collapsing" "homogenous" parent/child features) Another approach would be to copy the attributes from the parent to the children when the -seqments are first created. Another approach would be to use Bio::SeqFeature::Generic as the db's -seqfeature_class and save with -location being a Bio::Location::Split, but this was wrougth with other problems. Any other suggestions? Do you want me to commit this patch? Cheers, Malcolm Patch follows: Index: FeatureBase.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v retrieving revision 1.29 diff -c -r1.29 FeatureBase.pm *** FeatureBase.pm 16 Apr 2007 19:55:33 -0000 1.29 --- FeatureBase.pm 26 Apr 2007 16:30:23 -0000 *************** *** 581,587 **** foreach (@children) { s/Parent=/ID=/g; } # replace Parent tag with ID ! return join "\n", at children; } return join("\n",$p, at children); --- 581,589 ---- foreach (@children) { s/Parent=/ID=/g; } # replace Parent tag with ID ! #return join "\n", at children; ! # Instead of above, additionally, contrive to return (duplicated) common group values ! return(join("$group\n", at children) . $group); } return join("\n",$p, at children); -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Thu May 3 16:57:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 3 May 2007 15:57:43 -0500 Subject: [Bioperl-l] a query on Obtaining UniProt sequences In-Reply-To: <922386.19570.qm@web36808.mail.mud.yahoo.com> References: <922386.19570.qm@web36808.mail.mud.yahoo.com> Message-ID: <2930F3F1-2BFB-4320-9A2C-50DFE6F808A1@uiuc.edu> I would update to BioPerl 1.5.2. v.1.4 is 3 yrs old and there have been tons of changes both for sequence retrieval and parsers. We can't predict when a new 'stable' release will be available but 1.5.2 works well for most purposes. chris On May 3, 2007, at 3:09 PM, Anand Venkatraman wrote: > Hi > > I am using Bioperl 1.4 and I am trying to obtain protein sequences > for specific Uniprot records. > ... > Is there something I need to fix. > > Thanks in advance for the help. > > Anand > > > --------------------------------- > Ahhh...imagining that irresistible "new car" smell? > Check outnew cars at Yahoo! Autos. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From thiago.venancio at gmail.com Thu May 3 17:12:35 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Thu, 3 May 2007 18:12:35 -0300 Subject: [Bioperl-l] extracting coding sequence from BLAST In-Reply-To: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org> References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org> <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com> <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org> Message-ID: <44255ea80705031412n7abef247je70d2681bb3cc7ed@mail.gmail.com> Hi all, Just for record. I am getting good results to extract CDS from protein X dna alignments by using the following procedure: - BLASTX to identify the hits for each dna sequence (if you want to process sequences for further multiple sequence alignment, it is important to record the frames); - fastx/y to refine the alignment between the protein and the dna. FASTX/Y is is quite good, because it performs well with frame shifts and a allows better identification of premature stop codons. In addition, the alignment (and the CDS prediction) is better. This is interesting to note, to avoid analysis of "phantom" mRNAs, which are sequences that have stops, so merely looking at the blast can raise misleading results sometimes. Best. Thiago On 4/13/07, Jason Stajich wrote: > > Hi - > There are some tools that do this for you -- I've listed a few from a > google search or from what I remember reading. It would be great If you > (and others!) are willing to contribute a little of the info of what you > find that works for you to the wiki, that would be great as well. A little > HOWTO would be cool - here or on openwetware.org. > > Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml > EST-PAC: doi: http://dx.doi.org/10.1186/1751-0473-1-2 > > Ewan Birney's estwise as part of wise package also can help if you have a > likely protein from BLAST you want to align to the est - estwise can handle > frameshifts, but can be too slow for some people. Exonerate's protein2dna > model may also work here, but I haven't tried it. > > -jason > On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote: > > Thanks Jason. > > I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX > comparisons and want to extract some translated coding regions for further > multiple aligmnent and phylogenetic analysis. > > Best. > > Thiago > > On 4/13/07, Jason Stajich wrote: > > > Depends on how far away the query protein is, but I don't trust BLAST for > the actual alignment. Find the boundaries, add a little slop, and refine > the alignment of protein to genome with a good alignment program designed > to > like genewise or exonerate or even FASTX/Y. > -jason > On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote: > > Hi all. > > What is the best way to extract coding region from a nucleotide sequence > based on a BLASTX or TBLASTX comparisons ? > > Thanks in advance. > > Thiago > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From lstein at cshl.edu Thu May 3 17:35:57 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 3 May 2007 17:35:57 -0400 Subject: [Bioperl-l] CSHL is hiring Message-ID: <6dce9a0b0705031435r3bc2d2ddlfca5ac02844b4ef0@mail.gmail.com> Hi Folks, Sorry for the spam. My group at CSHL is looking for a scientific programmer with good software development credentials and some experience in bioinformatics. Experience in object-oriented Perl programming is a strict requirement. This is to work on user interface development for several projects including: - BioMart (data warehouse) project (www.biomart.org) - GBrowse genome browser (www.gmod.org/GBrowse) - Reactome pathways database (www.reactome.org) I can offer salaries in the 60-80K range, depending on level of experience. Please reply to lstein at cshl.edu. Best, Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From MEC at stowers-institute.org Tue May 8 12:59:10 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 8 May 2007 11:59:10 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? Message-ID: Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates, as in: ($start,$stop) = ($stop,$start) if defined($start) && defined($stop) && $start > $stop; I thought it is not legal for a feature to be so composed. Anyone know? Cheers, Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri From cjfields at uiuc.edu Tue May 8 13:12:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 May 2007 12:12:45 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: Message-ID: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> I believe all seqfeature location coordinates are designed to have start < stop for consistency; in cases where the strand matters (CDS, gene, etc.) then the strand is set to 1 or -1. When start > stop, the two are reversed and the strand is flipped; at least that's the way locations are set up in BioPerl. chris On May 8, 2007, at 11:59 AM, Cook, Malcolm wrote: > Why does Bio::DB::GFF::Feature::gff3_string swap start and stop > coordinates, > > as in: > ($start,$stop) = ($stop,$start) if defined($start) && defined($stop) > && $start > $stop; > > I thought it is not legal for a feature to be so composed. > > Anyone know? > > Cheers, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From juheymann at yahoo.com Tue May 8 14:37:20 2007 From: juheymann at yahoo.com (Bohr) Date: Tue, 8 May 2007 11:37:20 -0700 (PDT) Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); Message-ID: <10381379.post@talk.nabble.com> Hi, I installed bioperl under OSX Tiger via Fink. I tested the installation using the test tutorial via: perl -w bptutorial.pl 5 The script failed indicating that the file to retrieve was missing. To identify the problem, I used a script using 'get_sequence' that will retrieve a file from 'genbank' or 'embl'. Both succeeded. If I replace it with 'swiss' or 'swissprot' and substitute the ID with the identical ID as in the tutorial, I am recreating the problem found with bptutorial.pl. Other ID's do the same. Any pointers on the origin of this finding would be greatly appreciated. -- View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10381379 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Tue May 8 17:53:04 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 May 2007 16:53:04 -0500 Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); In-Reply-To: <10381379.post@talk.nabble.com> References: <10381379.post@talk.nabble.com> Message-ID: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu> The Fink BioPerl distribution is 1.5.1. You'll need to update to v 1.5.2 due to changes on the various remote servers (NCBI, UniProt, etc) accessed via bioperl. As a note, the bptutorial.pl has been moved to the bioperl wiki: http://www.bioperl.org/wiki/Bptutorial chris On May 8, 2007, at 1:37 PM, Bohr wrote: > > Hi, > > I installed bioperl under OSX Tiger via Fink. I tested the > installation > using the test tutorial via: perl -w bptutorial.pl 5 > > The script failed indicating that the file to retrieve was missing. To > identify the problem, I used a script using 'get_sequence' that will > retrieve a file from 'genbank' or 'embl'. Both succeeded. If I > replace it > with 'swiss' or 'swissprot' and substitute the ID with the > identical ID as > in the tutorial, I am recreating the problem found with > bptutorial.pl. Other > ID's do the same. > > Any pointers on the origin of this finding would be greatly > appreciated. > -- > View this message in context: http://www.nabble.com/problem-with- > Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29-- > tf3711391.html#a10381379 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From juheymann at yahoo.com Wed May 9 18:17:27 2007 From: juheymann at yahoo.com (Bohr) Date: Wed, 9 May 2007 15:17:27 -0700 (PDT) Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu> References: <10381379.post@talk.nabble.com> <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu> Message-ID: <10403903.post@talk.nabble.com> Thank you for the feedback and the suggestion. I installed 1.5.2 via Build.pl and the results were the same e.g. embl and genbank worked fine, swissprot failed Here is the output: MSG: acc (CALX_YEAST) does not exist --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Did not provide a valid Bio::PrimarySeqI object STACK: Error::throw STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 STACK: Bio::SeqIO::fasta::write_seq /sw/lib/perl5/5.8.6/Bio/SeqIO/fasta.pm:181 Before contemplating too much: Here my question: how do I verify the update to 1.5.2? (I ran ./Build test and that came back positive.) And what else could have gone wrong here? What might be a clever way to troubleshoot this? --------------------------------------------------------------------------- Chris Fields wrote: > > The Fink BioPerl distribution is 1.5.1. You'll need to update to v > 1.5.2 due to changes on the various remote servers (NCBI, UniProt, > etc) accessed via bioperl. > > As a note, the bptutorial.pl has been moved to the bioperl wiki: > > http://www.bioperl.org/wiki/Bptutorial > > chris > > On May 8, 2007, at 1:37 PM, Bohr wrote: > >> >> Hi, >> >> I installed bioperl under OSX Tiger via Fink. I tested the >> installation >> using the test tutorial via: perl -w bptutorial.pl 5 >> >> The script failed indicating that the file to retrieve was missing. To >> identify the problem, I used a script using 'get_sequence' that will >> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I >> replace it >> with 'swiss' or 'swissprot' and substitute the ID with the >> identical ID as >> in the tutorial, I am recreating the problem found with >> bptutorial.pl. Other >> ID's do the same. >> >> Any pointers on the origin of this finding would be greatly >> appreciated. >> -- >> View this message in context: http://www.nabble.com/problem-with- >> Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29-- >> tf3711391.html#a10381379 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10403903 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From ursula_cox at btinternet.com Wed May 9 18:12:26 2007 From: ursula_cox at btinternet.com (Ursula at BT) Date: Wed, 9 May 2007 23:12:26 +0100 Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection Message-ID: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore> Dear BioPerl List, I'm new to BioPerl (and Perl for that matter). I have an array of enzyme names, and a larger collection of enzymes (guaranteed to be a superset by the way it's constructed). I need to make a new collection containing just the enzymes corresponding to the names I have in the array. I was hoping that something like: my $all_rebase = Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet'); my $all_rebase_collection = $all_rebase->read(); my @enzymes = ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','AccB1I',' AccB7I','AccI'); my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1); foreach $enzyme (all_rebase_collection) { $new_collection($enzyme) if grep $_ eq $enzyme->name, @enzymes; } would work, but I get a syntax error near "$new_collection(". Any clues much appreciated, Ursula Cox From juheymann at yahoo.com Wed May 9 18:38:42 2007 From: juheymann at yahoo.com (Bohr) Date: Wed, 9 May 2007 15:38:42 -0700 (PDT) Subject: [Bioperl-l] problem with Bioperl get_sequence ('swiss', "acc#"); In-Reply-To: <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu> References: <10381379.post@talk.nabble.com> <2B5306D0-0498-47FD-8D57-1B559DC8E838@uiuc.edu> Message-ID: <10404211.post@talk.nabble.com> Thank you for pointing that out! I installed 1.5.2 via Build.pl. The scripts work as expected now. Chris Fields wrote: > > The Fink BioPerl distribution is 1.5.1. You'll need to update to v > 1.5.2 due to changes on the various remote servers (NCBI, UniProt, > etc) accessed via bioperl. > > As a note, the bptutorial.pl has been moved to the bioperl wiki: > > http://www.bioperl.org/wiki/Bptutorial > > chris > > On May 8, 2007, at 1:37 PM, Bohr wrote: > >> >> Hi, >> >> I installed bioperl under OSX Tiger via Fink. I tested the >> installation >> using the test tutorial via: perl -w bptutorial.pl 5 >> >> The script failed indicating that the file to retrieve was missing. To >> identify the problem, I used a script using 'get_sequence' that will >> retrieve a file from 'genbank' or 'embl'. Both succeeded. If I >> replace it >> with 'swiss' or 'swissprot' and substitute the ID with the >> identical ID as >> in the tutorial, I am recreating the problem found with >> bptutorial.pl. Other >> ID's do the same. >> >> Any pointers on the origin of this finding would be greatly >> appreciated. >> -- >> View this message in context: http://www.nabble.com/problem-with- >> Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29-- >> tf3711391.html#a10381379 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/problem-with-Bioperl-get_sequence-%28%27swiss%27%2C-%22acc-%22%29--tf3711391.html#a10404211 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Wed May 9 19:37:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 9 May 2007 18:37:33 -0500 Subject: [Bioperl-l] Getting a Subset of an Existing EnzymeCollection In-Reply-To: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore> References: <002e01c79287$20bbfe60$4101a8c0@AMDDualCore> Message-ID: On May 9, 2007, at 5:12 PM, Ursula at BT wrote: > Dear BioPerl List, > > > > I'm new to BioPerl (and Perl for that matter). I have an array of > enzyme > names, and a larger collection of enzymes (guaranteed to be a > superset by > the way it's constructed). I need to make a new collection > containing just > the enzymes corresponding to the names I have in the array. First, prior to using BioPerl you should really brush up on perl itself (Learning Perl, or James Tisdall's Perl for Bioinformatics books, the former preferred). Though there are several scripts available to get you started with Bioperl, much of the code is written with the expectation that you can write and debug a basic perl script (and there is some expectation that you are somewhat familiar with OO Perl). Saying that, let's see what's wrong... > I was hoping that something like: > > > > my $all_rebase = > Bio::Restriction::IO->new(-file=>'bionet.704',-format=>'bionet'); > > my $all_rebase_collection = $all_rebase->read(); The 'bionet' format is not supported; only 'withrefm', 'itype2', 'bairoch' are (the latter only experimentally). See 'perldoc Bio::Restriction::IO'. > my @enzymes = > ('AasI','AatI','AccII','AatII','AauI','Acc113I','Acc16I','Acc65I','Acc > B1I',' > AccB7I','AccI'); > > > > my $new_collection = Bio::Restriction::EnzymeCollection(-empty => 1); Missing a new() constructor here. > foreach $enzyme (all_rebase_collection) Not sure what this is. No '$' sigil for $all_rebase_collection will make the compiler look for (and fail to find) the sub all_rebase_collection(). > > { > > $new_collection($enzyme) if grep $_ eq $enzyme->name, > @enzymes; > > } > > > > would work, but I get a syntax error near "$new_collection(". Yep. You don't have your grep sub block in brackets {}, hence the error. See 'perldoc -f grep'. > Any clues much appreciated, > > > > Ursula Cox No prob, but again you might want to brush up on perl. chris From darin.london at duke.edu Thu May 10 12:17:38 2007 From: darin.london at duke.edu (darin.london at duke.edu) Date: Thu, 10 May 2007 12:17:38 -0400 Subject: [Bioperl-l] BOSC 2007 Second Call For Papers Message-ID: <200705101617.l4AGHceI002463@tenero.duhs.duke.edu> The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th. The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. From lstein at cshl.edu Thu May 10 13:13:09 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 10 May 2007 13:12:09 -0401 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: Message-ID: <6dce9a0b0705101013w1923c173l5ec5d9288c67c9a2@mail.gmail.com> It's a workaround for some broken data sources. It should "never happen." Lincoln On 5/8/07, Cook, Malcolm wrote: > > Why does Bio::DB::GFF::Feature::gff3_string swap start and stop > coordinates, > > as in: > ($start,$stop) = ($stop,$start) if defined($start) && defined($stop) > && $start > $stop; > > I thought it is not legal for a feature to be so composed. > > Anyone know? > > Cheers, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Bank.Beszteri at awi.de Thu May 10 12:13:00 2007 From: Bank.Beszteri at awi.de (Bank Beszteri) Date: Thu, 10 May 2007 18:13:00 +0200 Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem Message-ID: <4643448C.4000807@awi.de> Dear Bioperl folks, I?m trying to use Bio::Tree::Tree for manipulating phylogenetic trees, but in some things it did not behave as I expected it to, so I had to look inside a bit. In particular, I had problems with mixed up bootstrap values after re-rooting. After looking into the Bio::Tree::Tree data structures, it seems that a) bootstrap values are stored as attributes of nodes of the tree [to my understanding, they should rather be attributes of branches but Bio::Tree::Tree apparently tries to simplify away branches]; each node stores the bootstrap value belonging to the branch that connects it to its ancestor node (I?m reading in trees from Newick strings, and bootstrap values arrive in the id fields of internal branches) b) when re-rooting a tree, bootstrap values stay with the same node where they were before. Because the node that used to be the ancestor of a particular node in the original tree might have become its descendant after re-rooting, the bootstrap values are being mixed up. Can you confirm my conclusion? Whether yes or no, have you got an easy workaround or alternative solution to re-rooting trees (without having to touch the reroot method) or any other hints that could be useful for me to deal with this issue? Cheers, Bank -- Dr. B?nk Beszteri Alfred Wegener Institute for Polar and Marine Research From dmessina at wustl.edu Thu May 10 16:16:48 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 10 May 2007 15:16:48 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object Message-ID: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> Hi everyone, Shin Leong here at the Wash U GSC has written SearchIO-compliant cross_match parsing and result modules. Specifically, Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult. To my knowledge this functionality doesn't exist in BioPerl. Any comments or objections before I commit these to CVS? Thanks, Dave -- Dave Messina Senior Analyst, Assembly Group Genome Sequencing Center Washington University St. Louis, MO From aperezp at uma.es Thu May 10 13:58:32 2007 From: aperezp at uma.es (=?ISO-8859-1?Q?=22Antonio_J=2E_P=E9rez=22?=) Date: Thu, 10 May 2007 19:58:32 +0200 Subject: [Bioperl-l] Get Swiss Entry Message-ID: <46435D48.4020309@uma.es> An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/ca4e893e/attachment.html From jason at bioperl.org Thu May 10 16:53:28 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 10 May 2007 13:53:28 -0700 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> Message-ID: Awesome! On May 10, 2007, at 1:16 PM, David Messina wrote: > Hi everyone, > > Shin Leong here at the Wash U GSC has written SearchIO-compliant > cross_match parsing and result modules. Specifically, > Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult. > > To my knowledge this functionality doesn't exist in BioPerl. Any > comments or objections before I commit these to CVS? > > Thanks, > Dave > > > -- > Dave Messina > Senior Analyst, Assembly Group > Genome Sequencing Center > Washington University > St. Louis, MO > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/b841b428/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2613 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070510/b841b428/attachment.bin From cjfields at uiuc.edu Fri May 11 00:55:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 10 May 2007 23:55:05 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> Message-ID: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> Sounds good to me! Any tests to be added? chris On May 10, 2007, at 3:16 PM, David Messina wrote: > Hi everyone, > > Shin Leong here at the Wash U GSC has written SearchIO-compliant > cross_match parsing and result modules. Specifically, > Bio::SearchIO::cross_match and Bio::Search::Result::CrossMatchResult. > > To my knowledge this functionality doesn't exist in BioPerl. Any > comments or objections before I commit these to CVS? > > Thanks, > Dave > > > -- > Dave Messina > Senior Analyst, Assembly Group > Genome Sequencing Center > Washington University > St. Louis, MO > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Fri May 11 01:42:53 2007 From: dmessina at wustl.edu (David Messina) Date: Fri, 11 May 2007 00:42:53 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> Message-ID: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> > Sounds good to me! Any tests to be added? No tests right now as far as I can tell. I'm swamped personally, but perhaps I can persuade Mark Johnson over here to crank out a few. From cjfields at uiuc.edu Fri May 11 11:25:34 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 11 May 2007 10:25:34 -0500 Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu> Message-ID: Thanks Mark! I don't think you'll need to add a ton of tests; just enough to demo anything that you feel is necessary or specific to the parser. These could go into SearchIO.t or their own test suite. chris On May 11, 2007, at 10:14 AM, Mark Johnson wrote: >>> Sounds good to me! Any tests to be added? >> >> No tests right now as far as I can tell. I'm swamped personally, but >> perhaps I can persuade Mark Johnson over here to crank out a few. > > I'll see what I can do. I just had to open my mouth about getting > this > contributed back after I noticed it, so I suppose this is appropriate > retribution. 8) > > From mjohnson at watson.wustl.edu Fri May 11 11:14:56 2007 From: mjohnson at watson.wustl.edu (Mark Johnson) Date: Fri, 11 May 2007 10:14:56 -0500 (CDT) Subject: [Bioperl-l] Cross_match parser and Search::Result object In-Reply-To: <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> References: <1C6C74AC-9CD2-48E8-8A6A-772D6DEA8C45@wustl.edu> <1E23C374-16B7-4D00-9340-79DBA4B8BABF@uiuc.edu> <9744D96F-D2F4-43B5-B9D3-147A506F3AE7@wustl.edu> Message-ID: <57045.10.0.1.216.1178896496.squirrel@gscmail.wustl.edu> >> Sounds good to me! Any tests to be added? > > No tests right now as far as I can tell. I'm swamped personally, but > perhaps I can persuade Mark Johnson over here to crank out a few. I'll see what I can do. I just had to open my mouth about getting this contributed back after I noticed it, so I suppose this is appropriate retribution. 8) From golharam at umdnj.edu Fri May 11 16:20:41 2007 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 11 May 2007 16:20:41 -0400 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself Message-ID: <000501c79409$d8c03480$f6028a0a@PICO> I'm running a large series of clustalw alignments. After a large number of alignments, my perl script would die indicating too many links were open. I checked my /tmp directory (while the script is running) and noticed that the temp directory created for ClustalW are not removed until after the script exists. How can I force the cleanup of these directories after I am done with the alignment? My code is essentially this; $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); $aa_aln = $aln_factory->align(\@aa_seqs); open(STDOUT, ">&OLDOUT"); $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); Ryan From jason at bioperl.org Fri May 11 16:53:19 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 11 May 2007 13:53:19 -0700 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO> References: <000501c79409$d8c03480$f6028a0a@PICO> Message-ID: Did you try adding this after your calls getting the CDS aln. $aln_factory->cleanup(); -jason On May 11, 2007, at 1:20 PM, Ryan Golhar wrote: > I'm running a large series of clustalw alignments. After a large > number of > alignments, my perl script would die indicating too many links were > open. I > checked my /tmp directory (while the script is running) and noticed > that the > temp directory created for ClustalW are not removed until after the > script > exists. > How can I force the cleanup of these directories after I am done > with the > alignment? > > My code is essentially this; > > $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > $aa_aln = $aln_factory->align(\@aa_seqs); > open(STDOUT, ">&OLDOUT"); > $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); > > > Ryan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Fri May 11 16:57:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 11 May 2007 15:57:23 -0500 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself In-Reply-To: <000501c79409$d8c03480$f6028a0a@PICO> References: <000501c79409$d8c03480$f6028a0a@PICO> Message-ID: <41E91E58-48A5-4E29-B6BA-E9417BF17513@uiuc.edu> cleanup() is supposed to clean up temp directory stuff; it's inherited from Bio::Tools::Run::WrapperBase. chris On May 11, 2007, at 3:20 PM, Ryan Golhar wrote: > I'm running a large series of clustalw alignments. After a large > number of > alignments, my perl script would die indicating too many links were > open. I > checked my /tmp directory (while the script is running) and noticed > that the > temp directory created for ClustalW are not removed until after the > script > exists. > How can I force the cleanup of these directories after I am done > with the > alignment? > > My code is essentially this; > > $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > $aa_aln = $aln_factory->align(\@aa_seqs); > open(STDOUT, ">&OLDOUT"); > $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); > > > Ryan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Fri May 11 18:11:47 2007 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 11 May 2007 18:11:47 -0400 Subject: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself In-Reply-To: Message-ID: <001301c79419$5e794e90$f6028a0a@PICO> No, I didn't, but I will now. Thanks. Interestingly enough ClustalW removes the files from within the temp directory, but not the temp directory itself. -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: Friday, May 11, 2007 4:53 PM To: golharam at umdnj.edu Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio:Tools::Run::Alignment::ClustalW not cleaning up after itself Did you try adding this after your calls getting the CDS aln. $aln_factory->cleanup(); -jason On May 11, 2007, at 1:20 PM, Ryan Golhar wrote: I'm running a large series of clustalw alignments. After a large number of alignments, my perl script would die indicating too many links were open. I checked my /tmp directory (while the script is running) and noticed that the temp directory created for ClustalW are not removed until after the script exists. How can I force the cleanup of these directories after I am done with the alignment? My code is essentially this; $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); $aa_aln = $aln_factory->align(\@aa_seqs); open(STDOUT, ">&OLDOUT"); $dna_aln = &aa_to_dna_aln($aa_aln, \%dna_seqs); Ryan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From goshng at gmail.com Sat May 12 11:21:59 2007 From: goshng at gmail.com (Sang Chul Choi) Date: Sat, 12 May 2007 11:21:59 -0400 Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object without making another object? Message-ID: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com> Hi, One Bio::Seq's sequence is "ACGT" and I want this object to have "ACGA" by changing the fouth letter from T to A. I thought I could do this by reading sequence string through the method of seq(), changing the string by perl's general function, and generating another Bio::Seq object with the new string. This seems to be silly, a little bit. Is there any simple way to do this? Or, is there any method of Bio::Seq to do this: to change one letter at a particular position, or additionally to change letters with some range? Thank you, Sang Chul From jason at bioperl.org Sat May 12 12:50:10 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 12 May 2007 09:50:10 -0700 Subject: [Bioperl-l] How can I change only one letter of Bio::Seq object without making another object? In-Reply-To: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com> References: <33f36270705120821g15c53932wad96d8627ef5b5b7@mail.gmail.com> Message-ID: <22C99635-C22D-4F51-AADD-5CCF595222DF@bioperl.org> You can get/set the seq data via the seq() method. use Bio::Seq; my $seq = Bio::Seq->new(-seq => 'ACGT'); my $str = $seq->seq; print $str, "\n"; substr($str,3,1,'A'); $seq->seq($str); print $seq->seq, "\n"; On May 12, 2007, at 8:21 AM, Sang Chul Choi wrote: > Hi, > > One Bio::Seq's sequence is "ACGT" and I want this object to have > "ACGA" by changing the fouth letter from T to A. I thought I could do > this by reading sequence string through the method of seq(), changing > the string by perl's general function, and generating another Bio::Seq > object with the new string. This seems to be silly, a little bit. > > Is there any simple way to do this? Or, is there any method of > Bio::Seq to do this: to change one letter at a particular position, or > additionally to change letters with some range? > > Thank you, > > Sang Chul > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From jason at bioperl.org Sat May 12 18:12:56 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 12 May 2007 15:12:56 -0700 Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem In-Reply-To: <4643448C.4000807@awi.de> References: <4643448C.4000807@awi.de> Message-ID: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org> On May 10, 2007, at 9:13 AM, Bank Beszteri wrote: > Dear Bioperl folks, > > I?m trying to use Bio::Tree::Tree for manipulating phylogenetic trees, > but in some things it did not behave as I expected it to, so I had to > look inside a bit. > In particular, I had problems with mixed up bootstrap values after > re-rooting. After looking into the Bio::Tree::Tree data structures, it > seems that > > a) bootstrap values are stored as attributes of nodes of the tree > [to my > understanding, they should rather be attributes of branches but > Bio::Tree::Tree apparently tries to simplify away branches]; each node > stores the bootstrap value belonging to the branch that connects it to > its ancestor node (I?m reading in trees from Newick strings, and > bootstrap values arrive in the id fields of internal branches) Please feel free to suggest an alternative implementation if you don't agree with the object model. It has worked quite well in our hands so I'd be all ears for someone wanting to get in an do some more work on it. We have answered the question as to why bootstrap values are internal ids many times on this list and I believe on the wiki -- the parser can't tell the difference between a node id and a bootstrap value because nexus uses the same slot for both. if you know you have bootstrap values in the internal node it is trivial to process your tree and copy the values over. for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) { $node->bootstrap($node->id); $node->id(''); } I just added this as a method to TreeFunctionI so that it can be easily called now to help satisfy everyone who hopes that the toolkit can guess whether the internal nodes are bootstraps or identifiers. > > b) when re-rooting a tree, bootstrap values stay with the same node > where they were before. Because the node that used to be the > ancestor of > a particular node in the original tree might have become its > descendant > after re-rooting, the bootstrap values are being mixed up. > > Can you confirm my conclusion? Whether yes or no, have you got an easy > workaround or alternative solution to re-rooting trees (without having > to touch the reroot method) or any other hints that could be useful > for > me to deal with this issue? > I think you are right, but I am not clear what should be value for the internal node attached to the root now. Note that is always helpful to provide example code illustrating your problem. Here is an example which I think illustrates your problem. use Bio::TreeIO; my $in = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA); my $out = Bio::TreeIO->new(-format => 'newick'); while( my $t = $in->next_tree ){ my ($a) = $t->find_node(-id =>"A"); $out->write_tree($t); $t->reroot($a); $out->write_tree($t); } __DATA__ (((A:5,B:5)90:2,C:4)25:3,D:10); > Cheers, > > Bank > > > > -- > Dr. B?nk Beszteri > Alfred Wegener Institute for Polar and Marine Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From darin.london at duke.edu Mon May 14 10:44:56 2007 From: darin.london at duke.edu (darin.london at duke.edu) Date: Mon, 14 May 2007 10:44:56 -0400 Subject: [Bioperl-l] BOSC 2007 Abstract Submission Deadline Extended Message-ID: <200705141444.l4EEium2026969@tenero.duhs.duke.edu> Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st. The announcement day will remain the same so that it remains before the Early Discount Date. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. From thiago.venancio at gmail.com Mon May 14 14:54:44 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Mon, 14 May 2007 15:54:44 -0300 Subject: [Bioperl-l] get regions Message-ID: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> Hi all, Using Bio::Seq, is there any easy way to get the coordinates where a regular expression matches or should I build a sliding window? For example, looking for a given promoter region in a FASTA file. If the region is found, I would like to recover exactly the coordinates where it matches. Thanks in advance. Thiago -- "Doubt is not a pleasant condition, but certainty is absurd." Voltaire ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From jason at bioperl.org Mon May 14 15:06:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 14 May 2007 12:06:11 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> Message-ID: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> I assume you are doing the matches on the string with =~ so Bio::Seq doesn't really help you here I don't think. See the $` variable in Perl for how to capture the position of where a regexp matches. -jason On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > Hi all, > > Using Bio::Seq, is there any easy way to get the coordinates where a > regular expression matches or should I build a sliding window? > > For example, looking for a given promoter region in a FASTA file. If > the region is found, I would like to recover exactly the coordinates > where it matches. > > Thanks in advance. > > Thiago > -- > "Doubt is not a pleasant condition, but certainty is absurd." > Voltaire > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From Kevin.M.Brown at asu.edu Mon May 14 15:15:09 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 14 May 2007 12:15:09 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> Message-ID: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> I do this in perl with the pos() function. This requires the use of the match operator (m) like if ($gene =~ m/$pattern/gi) { $start = pos($gene) - length($pattern) + 1; } pos() returns the location of the pointer where the regex left off after finding a match. I remove the length of my pattern (which is just a string with a few placeholder (.) wildcards, so I know how long the match will always be). > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Monday, May 14, 2007 12:06 PM > To: Thiago Venancio > Cc: bioperl-l list > Subject: Re: [Bioperl-l] get regions > > I assume you are doing the matches on the string with =~ so > Bio::Seq doesn't really help you here I don't think. > See the $` variable in Perl for how to capture the position > of where a regexp matches. > > -jason > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > > > Hi all, > > > > Using Bio::Seq, is there any easy way to get the > coordinates where a > > regular expression matches or should I build a sliding window? > > > > For example, looking for a given promoter region in a FASTA > file. If > > the region is found, I would like to recover exactly the > coordinates > > where it matches. > > > > Thanks in advance. > > > > Thiago > > -- > > "Doubt is not a pleasant condition, but certainty is absurd." > > Voltaire > > > > ======================== > > Thiago Motta Venancio, MSc > > PhD student in Bioinformatics > > University of Sao Paulo > > ======================== > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Bank.Beszteri at awi.de Mon May 14 09:20:07 2007 From: Bank.Beszteri at awi.de (Bank Beszteri) Date: Mon, 14 May 2007 15:20:07 +0200 Subject: [Bioperl-l] Bio::Tree::Tree -- rerooting & bootstrap problem In-Reply-To: <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org> References: <4643448C.4000807@awi.de> <1369AFDC-2082-4021-8603-55E8ED032D41@bioperl.org> Message-ID: <46486207.60304@awi.de> Dear Jason, thanks for your answer! Sorry about having been ambiguous - it is clear that bootstrap values are parsed as ids from newick files, I had no problem with that, it was only the first step of the explanation of my problem, which was the rerooting issue. Thanks for your example code as well, it is indeed really useful to illustrate the problem. I modified the original tree a bit to make my point clearer: In your example, there are two internal node ids in a four-taxon tree. This is not a realistic situtation for bootstrap values, because bootstrap values are attached to bipartitions of terminal nodes, i.e., edges / branches of a tree (in what proportion of the bootstrap replicates was a particular bipartition recovered - an alternative representation of bootstraps, like produced e.g. by PAUP, is indeed a "taxon bipartition table"). This means that in a four taxon tree, we can have at most one bootstrap value - corresponding to the single non-trivial bipartition (all other bipartitions are trivial, i.e., they separate a terminal node from the rest). So here is an example 4-taxon tree with a bootstrap value: (A:52,(B:46,C:50)68:11,D:70); After rerooting at node B (using your example code) it looks like ((B:46,C:50,(A:52,D:70):11)68); Now there are two problems: 1) this seems to be a small problem with TreeIO rather than with rerooting: there is an extra pair of parentheses around the whole tree; but more importantly: 2) the bootstrap value appears at the root node, which is not sensible according to the convention that "each node stores the bootstrap value belonging to the branch linking it to its ancestor". You would like the bootstrap value appear at the node connecting A & D in this situation, which would look like (B:46,C:50,(A:52,D:70)68:11); because in this new situation, this position would correspond to the same bipartition as in the original tree [which is (A,D)(B,C)]. In the meanwhile, I got a mail showing me the solution (thx Daniel!), which is in fact pretty simple: all that has to be done is go through the nodes on the path from the old to the new root after rerooting, and for each node, take the bootstrap values from its ancestor (and remove it from the ancestor). This leaves the root node without a bootstrap value, which is exactly what you want (because it has no branch connecting it to its ancestor, there is no sensible bootstrap value attached to a root node). So this exercise tells me that bootstraps and "real" node ids should be handled in different manners when rerooting: real ids should of course stay with the nodes, whereas bootstrap values on the path between the new and old root should move over to the other end of the corresponding branch. Best wishes, Bank Jason Stajich wrote: > > On May 10, 2007, at 9:13 AM, Bank Beszteri wrote: > >> Dear Bioperl folks, >> >> I?m trying to use Bio::Tree::Tree for manipulating phylogenetic trees, >> but in some things it did not behave as I expected it to, so I had to >> look inside a bit. >> In particular, I had problems with mixed up bootstrap values after >> re-rooting. After looking into the Bio::Tree::Tree data structures, it >> seems that >> >> a) bootstrap values are stored as attributes of nodes of the tree [to my >> understanding, they should rather be attributes of branches but >> Bio::Tree::Tree apparently tries to simplify away branches]; each node >> stores the bootstrap value belonging to the branch that connects it to >> its ancestor node (I?m reading in trees from Newick strings, and >> bootstrap values arrive in the id fields of internal branches) > > Please feel free to suggest an alternative implementation if you don't > agree with the object model. It has worked quite well in our hands > so I'd be all ears for someone wanting to get in an do some more work > on it. > > We have answered the question as to why bootstrap values are internal > ids many times on this list and I believe on the wiki -- the parser > can't tell the difference between a node id and a bootstrap value > because nexus uses the same slot for both. if you know you have > bootstrap values in the internal node it is trivial to process your > tree and copy the values over. > > > for my $node ( grep { ! $_->is_Leaf } $tree->get_all_nodes ) { > $node->bootstrap($node->id); > $node->id(''); > } > > I just added this as a method to TreeFunctionI so that it can be > easily called now to help satisfy everyone who hopes that the toolkit > can guess whether the internal nodes are bootstraps or identifiers. > > >> >> b) when re-rooting a tree, bootstrap values stay with the same node >> where they were before. Because the node that used to be the ancestor of >> a particular node in the original tree might have become its descendant >> after re-rooting, the bootstrap values are being mixed up. >> >> Can you confirm my conclusion? Whether yes or no, have you got an easy >> workaround or alternative solution to re-rooting trees (without having >> to touch the reroot method) or any other hints that could be useful for >> me to deal with this issue? >> > > I think you are right, but I am not clear what should be value for the > internal node attached to the root now. > > Note that is always helpful to provide example code illustrating your > problem. Here is an example which I think illustrates your problem. > > use Bio::TreeIO; > > my $in = Bio::TreeIO->new(-format => 'newick', > -fh => \*DATA); > my $out = Bio::TreeIO->new(-format => 'newick'); > while( my $t = $in->next_tree ){ > my ($a) = $t->find_node(-id =>"A"); > $out->write_tree($t); > $t->reroot($a); > $out->write_tree($t); > } > __DATA__ > (((A:5,B:5)90:2,C:4)25:3,D:10); > > >> Cheers, >> >> Bank >> >> >> >> -- >> Dr. B?nk Beszteri >> Alfred Wegener Institute for Polar and Marine Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From basu at pharm.sunysb.edu Mon May 14 15:10:33 2007 From: basu at pharm.sunysb.edu (Siddhartha Basu) Date: Mon, 14 May 2007 15:10:33 -0400 Subject: [Bioperl-l] get regions In-Reply-To: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> Message-ID: <4648B429.2030907@pharm.sunysb.edu> Thiago Venancio wrote: > Hi all, > > Using Bio::Seq, is there any easy way to get the coordinates where a > regular expression matches or should I build a sliding window? The perl core function "pos" should help you in this case. Do a 'perldoc -f pos' for details. -sidd > > For example, looking for a given promoter region in a FASTA file. If > the region is found, I would like to recover exactly the coordinates > where it matches. > > Thanks in advance. > > Thiago From cjfields at uiuc.edu Mon May 14 16:48:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 14 May 2007 15:48:36 -0500 Subject: [Bioperl-l] get regions In-Reply-To: <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> Message-ID: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu> I use pos() with m{}g; the quoted globals tend to slow things down for me. Ah, see Kevin's answered that... chris On May 14, 2007, at 2:06 PM, Jason Stajich wrote: > I assume you are doing the matches on the string with =~ so Bio::Seq > doesn't really help you here I don't think. > See the $` variable in Perl for how to capture the position of where > a regexp matches. > > -jason > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > >> Hi all, >> >> Using Bio::Seq, is there any easy way to get the coordinates where a >> regular expression matches or should I build a sliding window? >> >> For example, looking for a given promoter region in a FASTA file. If >> the region is found, I would like to recover exactly the coordinates >> where it matches. >> >> Thanks in advance. >> >> Thiago >> -- >> "Doubt is not a pleasant condition, but certainty is absurd." >> Voltaire >> >> ======================== >> Thiago Motta Venancio, MSc >> PhD student in Bioinformatics >> University of Sao Paulo >> ======================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Mon May 14 17:50:09 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 14 May 2007 14:50:09 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> <1A37A8AB-4C9F-4BB6-BC91-3493C68A84DA@uiuc.edu> Message-ID: yep you are right pos() much better and faster for getting the position. -j On May 14, 2007, at 1:48 PM, Chris Fields wrote: > I use pos() with m{}g; the quoted globals tend to slow things down > for me. > > Ah, see Kevin's answered that... > > chris > > On May 14, 2007, at 2:06 PM, Jason Stajich wrote: > >> I assume you are doing the matches on the string with =~ so Bio::Seq >> doesn't really help you here I don't think. >> See the $` variable in Perl for how to capture the position of where >> a regexp matches. >> >> -jason >> On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: >> >>> Hi all, >>> >>> Using Bio::Seq, is there any easy way to get the coordinates where a >>> regular expression matches or should I build a sliding window? >>> >>> For example, looking for a given promoter region in a FASTA file. If >>> the region is found, I would like to recover exactly the coordinates >>> where it matches. >>> >>> Thanks in advance. >>> >>> Thiago >>> -- >>> "Doubt is not a pleasant condition, but certainty is absurd." >>> Voltaire >>> >>> ======================== >>> Thiago Motta Venancio, MSc >>> PhD student in Bioinformatics >>> University of Sao Paulo >>> ======================== >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason at bioperl.org >> http://jason.open-bio.org/ >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From sac at bioperl.org Mon May 14 21:46:55 2007 From: sac at bioperl.org (Steve Chervitz) Date: Mon, 14 May 2007 18:46:55 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> Message-ID: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com> On 5/14/07, Kevin Brown wrote: > I do this in perl with the pos() function. This requires the use of the > match operator (m) like > > if ($gene =~ m/$pattern/gi) > { > $start = pos($gene) - length($pattern) + 1; > } > > pos() returns the location of the pointer where the regex left off after > finding a match. Cool. I hadn't known that was possible. > I remove the length of my pattern (which is just a > string with a few placeholder (.) wildcards, so I know how long the > match will always be). To generalize your code so that it will work for any pattern, such as one that can match strings of variable length like "A{5,10}", just subtract the length of the actual string that was matched: if ($gene =~ m/$pattern/gi) { $start = pos($gene) - length($&) + 1; } Steve > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Jason Stajich > > Sent: Monday, May 14, 2007 12:06 PM > > To: Thiago Venancio > > Cc: bioperl-l list > > Subject: Re: [Bioperl-l] get regions > > > > I assume you are doing the matches on the string with =~ so > > Bio::Seq doesn't really help you here I don't think. > > See the $` variable in Perl for how to capture the position > > of where a regexp matches. > > > > -jason > > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > > > > > Hi all, > > > > > > Using Bio::Seq, is there any easy way to get the > > coordinates where a > > > regular expression matches or should I build a sliding window? > > > > > > For example, looking for a given promoter region in a FASTA > > file. If > > > the region is found, I would like to recover exactly the > > coordinates > > > where it matches. > > > > > > Thanks in advance. > > > > > > Thiago > > > -- > > > "Doubt is not a pleasant condition, but certainty is absurd." > > > Voltaire > > > > > > ======================== > > > Thiago Motta Venancio, MSc > > > PhD student in Bioinformatics > > > University of Sao Paulo > > > ======================== > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > http://jason.open-bio.org/ > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shameer at ncbs.res.in Mon May 14 23:03:57 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Tue, 15 May 2007 08:33:57 +0530 (IST) Subject: [Bioperl-l] How to produce Bio::Graphics images using PROSITE output ? In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> Message-ID: <49697.192.168.1.1.1179198237.squirrel@mail.ncbs.res.in> Dear All, Thanks a lot for all your inputs [Help : Imagemaps using Bio::Graphics ]. I am still working on the other part of this project. Now, I am sure that I can impliment it using Bio::Graphics. I will come back to imagemaps with in a week or two. Meanwhile, I need to parse a prosite output to present it as a Bio::Graphics image. Any one had tries Bio::Graphics to create images using prosite output ? I tried in the How-to I couldnt find anything related to prosite. My output looks like this : >Sequence : PS00001 ASN_GLYCOSYLATION N-glycosylation site. 75 - 78 NGSM >Sequence : PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site. 41 - 43 SpK >Sequence : PS00008 MYRISTYL N-myristoylation site. 6 - 11 GTitNQ >Sequence : PS00009 AMIDATION Amidation site. 78 - 81 mGKR I need to impliment an image like blast-parser image. Thanks to any inputs/pointers. > The width of the image is determined by the -width attribute and is given > in > pixels. You cannot control the height of the image as it is computed > dynamically based on the number of features and bumping options. > > Lincoln > > On 5/1/07, Shameer Khadar wrote: >> >> Dear Scot, >> >> > There is a fair amount of documentation in the perldoc for >> > Bio::Graphics::Panel under the section called 'Creating Imagemaps'; >> have >> > you read that? >> >> I agreed, but I couldnt the exact information I needed :( (may be I >> missed >> something important). >> >> > Also, for changing the scale, that should happen >> > automatically--have you tried yet? >> >> I tried by changing the Lincoln's program eg: blast3.pl >> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000); >> to my >> $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>300); >> >> But it had given me a smaller scale of length upto 300. I was looking >> for >> an option where I need same width and height of given image and a >> dynamic >> start and end values depending on length of my sequence. Since I couldnt >> accomplish, I thought of getting some help from you guys. I think I need >> to play a little bit with the value for reformat the scale to accomodate >> my hits as well. >> >> Thanks a lot for your inputs, >> -- >> Shameer Khadar -- Shameer Khadar Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From bix at sendu.me.uk Tue May 15 04:23:52 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 15 May 2007 09:23:52 +0100 Subject: [Bioperl-l] New Blast parser Message-ID: <46496E18.1000809@sendu.me.uk> Back in August of last year I introduced Bio::PullParserI, a module that aids in the creation of fast SearchIO and Search modules. I've finally gotten around to implementing a Blast parser using the interface, which I've called Bio::SearchIO::blast_pull. To use it you say: my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file"); or in the near future (when I've committed StandAloneBlast changes): my $sab = Bio::Tools::Run::StandAloneBlast->new(-_READMETHOD => "blast_pull"); Currently the parser is incomplete: I've only tested it with NCBI BLASTN and BLASTP. However, results are promising. In one particular real-world usage-case involving running and parsing multiple Blast jobs via StandAloneBlast (amongst other things), changing only the _READMETHOD from 'blast' to 'blast_pull' in the code dropped run time from 20223s to 951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40% less). Please try it out and feed-back any bugs you discover. Cheers, Sendu. From aaron.j.mackey at gsk.com Tue May 15 10:30:13 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 15 May 2007 10:30:13 -0400 Subject: [Bioperl-l] get regions In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com> Message-ID: Or, use a zero-width, positive look ahead assertion, and don't incur the penalty of either $` or $&: if ($gene =~ m/(?=$pattern)/gi) { $start = pos($gene) + 1; } -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 05/14/2007 09:46:55 PM: > On 5/14/07, Kevin Brown wrote: > > I do this in perl with the pos() function. This requires the use of the > > match operator (m) like > > > > if ($gene =~ m/$pattern/gi) > > { > > $start = pos($gene) - length($pattern) + 1; > > } > > > > pos() returns the location of the pointer where the regex left off after > > finding a match. > > Cool. I hadn't known that was possible. > > > I remove the length of my pattern (which is just a > > string with a few placeholder (.) wildcards, so I know how long the > > match will always be). > > To generalize your code so that it will work for any pattern, such as > one that can match strings of variable length like "A{5,10}", just > subtract the length of the actual string that was matched: > > if ($gene =~ m/$pattern/gi) > { > $start = pos($gene) - length($&) + 1; > } > > Steve > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > > Jason Stajich > > > Sent: Monday, May 14, 2007 12:06 PM > > > To: Thiago Venancio > > > Cc: bioperl-l list > > > Subject: Re: [Bioperl-l] get regions > > > > > > I assume you are doing the matches on the string with =~ so > > > Bio::Seq doesn't really help you here I don't think. > > > See the $` variable in Perl for how to capture the position > > > of where a regexp matches. > > > > > > -jason > > > On May 14, 2007, at 11:54 AM, Thiago Venancio wrote: > > > > > > > Hi all, > > > > > > > > Using Bio::Seq, is there any easy way to get the > > > coordinates where a > > > > regular expression matches or should I build a sliding window? > > > > > > > > For example, looking for a given promoter region in a FASTA > > > file. If > > > > the region is found, I would like to recover exactly the > > > coordinates > > > > where it matches. > > > > > > > > Thanks in advance. > > > > > > > > Thiago > > > > -- > > > > "Doubt is not a pleasant condition, but certainty is absurd." > > > > Voltaire > > > > > > > > ======================== > > > > Thiago Motta Venancio, MSc > > > > PhD student in Bioinformatics > > > > University of Sao Paulo > > > > ======================== > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Jason Stajich > > > jason at bioperl.org > > > http://jason.open-bio.org/ > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From diogoat at gmail.com Tue May 15 18:44:59 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Tue, 15 May 2007 19:44:59 -0300 Subject: [Bioperl-l] Downloading a sequence in genbank format Message-ID: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> Dear All, I need to download a lot of sequence of Leishmania major in genbank format... But i can't download on the page of NCBI, because the downloaded file are corrupted... when i use a browser to download this sequences And them i looking for some script to download that`s file and fink something like that: ######################################################### use strict; use warnings; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; my $query = Bio::DB::Query::GenBank->new (-query =>'Leishmania major [Organism]', -db => 'nucleotide'); my $gb = new Bio::DB::GenBank; my $seqio = $gb->get_Stream_by_query($query); my $out = Bio::SeqIO->new(-format => 'genbank', -file => '>>teste6.gb'); $out->write_seq($seqio); ######################################################### And the system return me this erros [diogo1 at genome perl]$ perl teste6.pl -------------------- WARNING --------------------- MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module. Attempting to dump, but may fail! --------------------------------------------------- Can't locate object method "seq" via package "Bio::SeqIO::genbank" at /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692. Any Ideia? Thank`s Diogo Tschoeke Laboratory of Molecular Biology of Trypanosomatides Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil http:biowebdb.org From diogoat at gmail.com Tue May 15 19:27:05 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Tue, 15 May 2007 20:27:05 -0300 Subject: [Bioperl-l] Downloading a sequence in genbank format In-Reply-To: References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> Message-ID: <638512560705151627t2e25f17cg7f820f3097a67748@mail.gmail.com> Thank for your help Barry!! It`s work very fine and i`'m using the script... like you said... The error was on the print that`s right? I need to use a while to print all sequeces... Thanks a Lot Diogo Tschoeke Laboratory of Molecular Biology of Trypanosomatides Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil http://biowebdb.org 2007/5/15, Barry Moore : > > Diogo- > > write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO > object. Try this > > use strict; > use warnings; > > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > my $query = Bio::DB::Query::GenBank->new > (-query =>'Leishmania major > [Organism]', > -db => 'nucleotide'); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > > my $out = Bio::SeqIO->new(-format => 'genbank', > -file => '>>teste6.gb'); > while (my $seq = $seqio->next_seq) { > $out->write_seq($seq); > } > > Barry > > On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote: > > > Dear All, > > > > I need to download a lot of sequence of Leishmania major in genbank > > format... > > But i can't download on the page of NCBI, because the downloaded > > file are > > corrupted... when i use a browser to download this sequences > > And them i looking for some script to download that`s file and fink > > something like that: > > > > > > ######################################################### > > use strict; > > use warnings; > > > > use Bio::Seq; > > use Bio::SeqIO; > > use Bio::DB::GenBank; > > > > my $query = Bio::DB::Query::GenBank->new > > (-query =>'Leishmania major > > [Organism]', > > -db => 'nucleotide'); > > my $gb = new Bio::DB::GenBank; > > my $seqio = $gb->get_Stream_by_query($query); > > > > my $out = Bio::SeqIO->new(-format => 'genbank', > > -file => '>>teste6.gb'); > > $out->write_seq($seqio); > > ######################################################### > > > > And the system return me this erros > > [diogo1 at genome perl]$ perl teste6.pl > > > > -------------------- WARNING --------------------- > > MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant > > module. > > Attempting to dump, but may fail! > > --------------------------------------------------- > > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at > > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692. > > > > Any Ideia? > > > > Thank`s > > > > Diogo Tschoeke > > Laboratory of Molecular Biology of Trypanosomatides > > Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil > > http://biowebdb.org > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From barry.moore at genetics.utah.edu Tue May 15 19:17:39 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Tue, 15 May 2007 17:17:39 -0600 Subject: [Bioperl-l] Downloading a sequence in genbank format In-Reply-To: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> Message-ID: Diogo- write_seq expects to be given a Bio::Seq object, not a Bio::SeqIO object. Try this use strict; use warnings; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; my $query = Bio::DB::Query::GenBank->new (-query =>'Leishmania major [Organism]', -db => 'nucleotide'); my $gb = new Bio::DB::GenBank; my $seqio = $gb->get_Stream_by_query($query); my $out = Bio::SeqIO->new(-format => 'genbank', -file => '>>teste6.gb'); while (my $seq = $seqio->next_seq) { $out->write_seq($seq); } Barry On May 15, 2007, at 4:44 PM, Diogo Tschoeke wrote: > Dear All, > > I need to download a lot of sequence of Leishmania major in genbank > format... > But i can't download on the page of NCBI, because the downloaded > file are > corrupted... when i use a browser to download this sequences > And them i looking for some script to download that`s file and fink > something like that: > > > ######################################################### > use strict; > use warnings; > > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > my $query = Bio::DB::Query::GenBank->new > (-query =>'Leishmania major > [Organism]', > -db => 'nucleotide'); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > > my $out = Bio::SeqIO->new(-format => 'genbank', > -file => '>>teste6.gb'); > $out->write_seq($seqio); > ######################################################### > > And the system return me this erros > [diogo1 at genome perl]$ perl teste6.pl > > -------------------- WARNING --------------------- > MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant > module. > Attempting to dump, but may fail! > --------------------------------------------------- > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692. > > Any Ideia? > > Thank`s > > Diogo Tschoeke > Laboratory of Molecular Biology of Trypanosomatides > Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil > http:biowebdb.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue May 15 22:44:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 15 May 2007 21:44:43 -0500 Subject: [Bioperl-l] get regions In-Reply-To: <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com> Message-ID: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu> On May 14, 2007, at 8:46 PM, Steve Chervitz wrote: ... > To generalize your code so that it will work for any pattern, such as > one that can match strings of variable length like "A{5,10}", just > subtract the length of the actual string that was matched: > > if ($gene =~ m/$pattern/gi) > { > $start = pos($gene) - length($&) + 1; > } > > Steve Right, but $& (as well as $` and $') inflict a significant penalty for their use, as Aaron alludes to. Their use, even indirectly via a library module, can cause a significant performance hit. chris From sac at bioperl.org Wed May 16 04:16:38 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 16 May 2007 01:16:38 -0700 Subject: [Bioperl-l] get regions In-Reply-To: <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com> <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu> Message-ID: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com> On 5/15/07, Chris Fields wrote: > > On May 14, 2007, at 8:46 PM, Steve Chervitz wrote: > ... > > > To generalize your code so that it will work for any pattern, such as > > one that can match strings of variable length like "A{5,10}", just > > subtract the length of the actual string that was matched: > > > > if ($gene =~ m/$pattern/gi) > > { > > $start = pos($gene) - length($&) + 1; > > } > > > > Steve > > Right, but $& (as well as $` and $') inflict a significant penalty > for their use, as Aaron alludes to. Their use, even indirectly via a > library module, can cause a significant performance hit. > > chris Yes. I had forgotten how poisonous $&, $` and $' were to regex performance. Please forgive me. We might consider regularly auditing the bioperl module tree for use of these in committed code. But regarding the use of the look ahead assertion, there's a problem if you want to find *all* occurrences of the pattern in a target string and the pattern can have variable length hits: it may report overlapping hits because it only collects the starting points of the match, and does not determine how long each match would be. For example: $gene = 'TTTAAAAAAAAGG'; $pattern="A{5,10}"; while ($gene =~ m/(?=$pattern)/gi) { $start = pos($gene) + 1; print ++$hit, " hit starts at $start\n"; } Generates: 1 hit starts at 4 2 hit starts at 5 3 hit starts at 6 4 hit starts at 7 You could get around this by imposing a constraint to avoid trivial overlaps. OK if you know the length of the pattern, but not so good for more complex patterns. If there was I way to get the look ahead to match the longest string possible for a variable length pattern, then this approach could work, but I'm not sure if that is possible. Here's a solution I think does the job of reporting the extent of each match without a performance hit and works for patterns of any complexity, taking advantage of the special arrays containing hit indexes, @- and @+: $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG'; while ($gene =~ m/$pattern/gi){ $hit++; printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0]; } Generates: 1 hit at: 4 - 11 2 hit at: 16 - 21 You can also use this approach to report the locations of any internal back references, if the pattern contains any parentheses, via $-[1], $+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such patterns, but patterns not containing parens won't be penalized. Steve From georg.otto at tuebingen.mpg.de Wed May 16 05:19:06 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Wed, 16 May 2007 11:19:06 +0200 Subject: [Bioperl-l] Downloading a sequence in genbank format - related problem References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> Message-ID: Dear all, I have a problem that has to do with downloading data from GenBank as well, therefor I put it in this thread. I try to get all entries from organism Danio rerio using the something like this: use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; use Bio::DB::Query::GenBank; my $query = "Danio rerio[ORGN]"; my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query); my $gb_obj = Bio::DB::GenBank->new; my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); while (my $seq_obj = $stream_obj->next_seq) { my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>>output.fas'); $out->write_seq($seq_obj); } However, the download process aborts after a few thousand entries. I do not think that this is due to the request itself or problems with specific entries, since the number of transferred sequences varies before the stop. It might rather have to do with GenBank terminating the connection. Has anybody a suggestion of a better strategy to achieve what I want (e.g. a different kind of query, a method to reassume the download at the point where it terminated etc.)? Best, Georg "Diogo Tschoeke" writes: > Dear All, > > I need to download a lot of sequence of Leishmania major in genbank > format... > But i can't download on the page of NCBI, because the downloaded file are > corrupted... when i use a browser to download this sequences > And them i looking for some script to download that`s file and fink > something like that: > > > ######################################################### > use strict; > use warnings; > > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > my $query = Bio::DB::Query::GenBank->new > (-query =>'Leishmania major [Organism]', > -db => 'nucleotide'); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > > my $out = Bio::SeqIO->new(-format => 'genbank', > -file => '>>teste6.gb'); > $out->write_seq($seqio); > ######################################################### > > And the system return me this erros > [diogo1 at genome perl]$ perl teste6.pl > > -------------------- WARNING --------------------- > MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module. > Attempting to dump, but may fail! > --------------------------------------------------- > Can't locate object method "seq" via package "Bio::SeqIO::genbank" at > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692. > > Any Ideia? > > Thank`s > > Diogo Tschoeke > Laboratory of Molecular Biology of Trypanosomatides > Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil > http:biowebdb.org From cjfields at uiuc.edu Wed May 16 09:05:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 May 2007 08:05:59 -0500 Subject: [Bioperl-l] Downloading a sequence in genbank format - related problem In-Reply-To: References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> Message-ID: It's likely from a timeout issue on the remote server. One thing which will speed things up is to retrieve the remote sequences in fasta format to begin with (described in the Bio::DB::GenBank POD): my $gb_obj = Bio::DB::GenBank->new(-retrievaltype => 'tempfile' , -format => 'fasta'); my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); while (my $seq_obj = $stream_obj->next_seq) { $out->write_seq($seq_obj); } I also suggest using the direct ftp downloads if at all possible (i.e. you are downloading WGS or contig sequences). It's much faster. chris On May 16, 2007, at 4:19 AM, Georg Otto wrote: > > Dear all, > > I have a problem that has to do with downloading data from GenBank as > well, therefor I put it in this thread. > > I try to get all entries from organism Danio rerio using the something > like this: > > > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > my $query = "Danio rerio[ORGN]"; > my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', > -query => $query); > my $gb_obj = Bio::DB::GenBank->new; > my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > > while (my $seq_obj = $stream_obj->next_seq) { > my $out = Bio::SeqIO->new(-format => 'fasta', > -file => '>>output.fas'); > $out->write_seq($seq_obj); > } > > > However, the download process aborts after a few thousand entries. I > do not think that this is due to the request itself or problems with > specific entries, since the number of transferred sequences varies > before the stop. It might rather have to do with GenBank terminating > the connection. > > Has anybody a suggestion of a better strategy to achieve what I want > (e.g. a different kind of query, a method to reassume the download at > the point where it terminated etc.)? > > Best, > > Georg > > > "Diogo Tschoeke" writes: > >> Dear All, >> >> I need to download a lot of sequence of Leishmania major in genbank >> format... >> But i can't download on the page of NCBI, because the downloaded >> file are >> corrupted... when i use a browser to download this sequences >> And them i looking for some script to download that`s file and fink >> something like that: >> >> >> ######################################################### >> use strict; >> use warnings; >> >> use Bio::Seq; >> use Bio::SeqIO; >> use Bio::DB::GenBank; >> >> my $query = Bio::DB::Query::GenBank->new >> (-query =>'Leishmania major >> [Organism]', >> -db => 'nucleotide'); >> my $gb = new Bio::DB::GenBank; >> my $seqio = $gb->get_Stream_by_query($query); >> >> my $out = Bio::SeqIO->new(-format => 'genbank', >> -file => '>>teste6.gb'); >> $out->write_seq($seqio); >> ######################################################### >> >> And the system return me this erros >> [diogo1 at genome perl]$ perl teste6.pl >> >> -------------------- WARNING --------------------- >> MSG: Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant >> module. >> Attempting to dump, but may fail! >> --------------------------------------------------- >> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at >> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692. >> >> Any Ideia? >> >> Thank`s >> >> Diogo Tschoeke >> Laboratory of Molecular Biology of Trypanosomatides >> Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil >> http:biowebdb.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From ferraria at gmail.com Wed May 16 10:38:47 2007 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 16 May 2007 16:38:47 +0200 Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure Message-ID: Hi all, I want to do something relatively simple and I want to know how far Bioperl tools could help me because I'm having troubles to get to the point. Here is the pipeline : "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) -----> "GeneStructure" (*) : >From the EntrezGene ID, I want to retrieve the structure of the gene which means having the whole genomic sequence and having the start and end positions of each exons, introns, UTR'.... I thought of 2 ways to accomplish that : - use 'efetch', get raw xml or asn1 and then parse it to obtain the desired positions. this method should work but would take a little time to be ok. - use Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I obtain a Bio::Seq object but I am not able to find any features stored in it. So it doesn't seem that the get_Seq_by_id function get all information contained in a EntrezGene entry (?) . Can somebody help me to make the right choice or show me the right way? I also saw that some packages detinated to deal with gene structure exist but I don't manage to know how to use it properly and even how to create one of those objects ! Are those packages currently usable ? Thanks in advance. Best regards, tony From cjfields at uiuc.edu Wed May 16 12:02:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 16 May 2007 11:02:28 -0500 Subject: [Bioperl-l] get regions In-Reply-To: <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com> References: <44255ea80705141154r2abaf862p73be150b6fd824a1@mail.gmail.com> <13EAE352-2A44-4728-BDA9-B828CFE2DC11@bioperl.org> <1A4207F8295607498283FE9E93B775B403283D5C@EX02.asurite.ad.asu.edu> <8f200b4c0705141846h68146d40nc238bc911d1a4b4d@mail.gmail.com> <6CDAB174-E36A-4D2A-8468-AD980CFCAED6@uiuc.edu> <8f200b4c0705160116j265f9e8eu1174d6e41e6ebbdc@mail.gmail.com> Message-ID: <9C6F4829-4E06-4751-8B10-B2726B5288B9@uiuc.edu> On May 16, 2007, at 3:16 AM, Steve Chervitz wrote: ... >> >> Right, but $& (as well as $` and $') inflict a significant penalty >> for their use, as Aaron alludes to. Their use, even indirectly via a >> library module, can cause a significant performance hit. >> >> chris > > Yes. I had forgotten how poisonous $&, $` and $' were to regex > performance. Please forgive me. We might consider regularly auditing > the bioperl module tree for use of these in committed code. Already done! We have run a few audits for gotchas like that: http://www.bioperl.org/wiki/Auditing http://www.bioperl.org/wiki/Bioperl_Best_Practices If there is anything we should be looking for please feel free to add as needed. There shouldn't be any use of the 'naughty' variables in CVS, but it might be worth a second look... > But regarding the use of the look ahead assertion, there's a problem > if you want to find *all* occurrences of the pattern in a target > string and the pattern can have variable length hits: it may report > overlapping hits because it only collects the starting points of the > match, and does not determine how long each match would be. For > example: > > $gene = 'TTTAAAAAAAAGG'; > $pattern="A{5,10}"; > while ($gene =~ m/(?=$pattern)/gi) { > $start = pos($gene) + 1; > print ++$hit, " hit starts at $start\n"; > } > > Generates: > 1 hit starts at 4 > 2 hit starts at 5 > 3 hit starts at 6 > 4 hit starts at 7 > > You could get around this by imposing a constraint to avoid trivial > overlaps. OK if you know the length of the pattern, but not so good > for more complex patterns. If there was I way to get the look ahead to > match the longest string possible for a variable length pattern, then > this approach could work, but I'm not sure if that is possible. > > Here's a solution I think does the job of reporting the extent of each > match without a performance hit and works for patterns of any > complexity, taking advantage of the special arrays containing hit > indexes, @- and @+: > > $gene = 'TTTAAAAAAAAGGGGAAAAAAGGGGG'; > while ($gene =~ m/$pattern/gi){ > $hit++; > printf "$hit hit at: %2d - %d\n", $-[0]+1, $+[0]; > } > > Generates: > 1 hit at: 4 - 11 > 2 hit at: 16 - 21 > > You can also use this approach to report the locations of any internal > back references, if the pattern contains any parentheses, via $-[1], > $+[1], $-[2], $+[2] etc. You'll pay a performance hit when using such > patterns, but patterns not containing parens won't be penalized. > > Steve Friedl's Regex book has outlined a few ways to get around the 'naughty' variables $`, $&, and $' using substr() and $-[0], $+[0], or both, which makes sense since @+ and @- are arrays of positions instead of actual text. $` substr(target, 0, $-[0]) $& substr(target, $-[0], $+[0] - $-[0]) $' substr(target, $+[0]) Wonderful book! chris From benoit at ebi.ac.uk Wed May 16 12:35:39 2007 From: benoit at ebi.ac.uk (Benoit Ballester) Date: Wed, 16 May 2007 17:35:39 +0100 Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure In-Reply-To: References: Message-ID: <464B32DB.6080607@ebi.ac.uk> Hi Tony, I don't know how simple it is in bioperl, but it is quite simple using the ensembl perl API. Have a look here : API instalation: http://www.ensembl.org/info/software/api_installation.html API tutorial : http://www.ensembl.org/info/software/core/core_tutorial.html API Perl module Documentation : http://www.ensembl.org/info/software/Pdoc/ensembl/index.html so you can do something similar to the example below : # Get the 'COG6' gene from human my $gene = $gene_adaptor->fetch_by_display_label('COG6'); print "GENE ", $gene->stable_id(), "\n"; # here you get gene coordinate foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) { print "TRANSCRIPT ", $transcript->stable_id(), "\n";; #print transcript coordinates foreach my $exon ( @{ $transcript->get_all_exons() } ) { #print the exon coordinates } } } Hope this helps Benoit Anthony Ferrari wrote: > Hi all, > > I want to do something relatively simple and I want to know how far Bioperl > tools could help me because I'm having troubles to get to the point. > Here is the pipeline : > > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) -----> > "GeneStructure" > > (*) : >>From the EntrezGene ID, I want to retrieve the structure of the gene which > means having the whole genomic sequence and having the start and end > positions of each exons, introns, UTR'.... > > I thought of 2 ways to accomplish that : > > - use 'efetch', get raw xml or asn1 and then parse it to obtain the > desired positions. > this method should work but would take a little time to be ok. > > - use Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I > obtain a Bio::Seq object but I am not able to find any features stored in > it. So it doesn't seem that the get_Seq_by_id function get all information > contained in a EntrezGene entry (?) . > > Can somebody help me to make the right choice or show me the right way? > > I also saw that some packages detinated to deal with gene structure exist > but I don't manage to know how to use it properly and even how to create one > of those objects ! > Are those packages currently usable ? > > > Thanks in advance. > Best regards, > tony > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From johnsonm at gmail.com Wed May 16 15:11:18 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 16 May 2007 14:11:18 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> Message-ID: On 5/8/07, Chris Fields wrote: > I believe all seqfeature location coordinates are designed to have > start < stop for consistency; in cases where the strand matters (CDS, > gene, etc.) then the strand is set to 1 or -1. When start > stop, > the two are reversed and the strand is flipped; at least that's the > way locations are set up in BioPerl. > > chris Oh yeah? I always tend to ensure that (start < stop), regardless of strand, when working with sequence features...the other day, I caught Glimmer2 emitting a prediction on the plus strand with start > stop. I was going to work up a patch for the parser, but I wonder, should I just force everything to start < stop? Or only predictions on the plus strand? Should all the parsers for all the ab initio predictors ensure they emit features with coordinates like this? From diogoat at gmail.com Wed May 16 16:02:44 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Wed, 16 May 2007 17:02:44 -0300 Subject: [Bioperl-l] Downloading a sequence in genbank format - related problem In-Reply-To: References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> Message-ID: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com> Dear all, The script wich i wrote with your helps is working very good ( I paste the script in the end of e-mail). But I have another problem now, all the times wich I use the script im every all the file have a diferent size... Any ideia? what is the problem..? My conection? Problem on Ncbi? The script maybe? Diogo Tschoeke Laboratory of Molecular Biology of Trypanosomatides Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil http://biowebdb.org ############################################################# use strict; use warnings; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; my $query = Bio::DB::Query::GenBank->new (-query =>'Trypanosoma cruzi [Organism]', -db => 'nucleotide'); my $gb = new Bio::DB::GenBank; my $seqio = $gb->get_Stream_by_query($query); my $out = Bio::SeqIO->new(-format => 'genbank', -file => '>>Trypanosoma_cruzi1.gb'); while (my $seq = $seqio->next_seq){ $out->write_seq($seq); } ######################################################### From barry.moore at genetics.utah.edu Wed May 16 17:13:27 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 16 May 2007 15:13:27 -0600 Subject: [Bioperl-l] Downloading a sequence in genbank format - related problem In-Reply-To: <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com> References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com> Message-ID: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu> Diogo, I'd guess that this is a result of NCBI terminating the connection as Chris suggested previously. There are a number of approaches you could use: Download only fasta if that's all you need. Download only IDs, and then use SeqHound, Batch Entrez or BioPerl to download those sequences or you could download the genbank files from the ftp site as Chris also suggested, and then run a bioperl script on each of those files. I can see that you are looking at Trypanosomes, so doing this (on linux or Mac OSX): wget ftp://ftp.ncbi.nih.gov/genbank/gbinv*.seq.gz will get you the 10 files in the invertebrate division from GenBank, and you could run a bioperl script on those 10 files. Barry On May 16, 2007, at 2:02 PM, Diogo Tschoeke wrote: > Dear all, > > The script wich i wrote with your helps is working very good ( I > paste the > script in the end of e-mail). > But I have another problem now, all the times wich I use the script > im every > all the file have a diferent size... > Any ideia? what is the problem..? My conection? Problem on Ncbi? > The script > maybe? > > Diogo Tschoeke > Laboratory of Molecular Biology of Trypanosomatides > Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil > http://biowebdb.org > > ############################################################# > use strict; > use warnings; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > my $query = Bio::DB::Query::GenBank->new > (-query =>'Trypanosoma cruzi > [Organism]', > -db => 'nucleotide'); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > my $out = Bio::SeqIO->new(-format => 'genbank', > -file => '>>Trypanosoma_cruzi1.gb'); > while (my $seq = $seqio->next_seq){ > $out->write_seq($seq); > } > ######################################################### > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sac at bioperl.org Wed May 16 18:29:16 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 16 May 2007 15:29:16 -0700 Subject: [Bioperl-l] EUtilities - pipeline - Exonic Structure In-Reply-To: <464B32DB.6080607@ebi.ac.uk> References: <464B32DB.6080607@ebi.ac.uk> Message-ID: <8f200b4c0705161529h26e7c44fk54082a1156201861@mail.gmail.com> Another option is to use DAS ( http://biodas.org ), which was designed precisely to solve this sort of problem. A DAS genome query is a URL that specifies the genome assembly version on which the returned coordinates should be based. For example, get all features and their coordinates associated with the human actin gene on hg17: http://das.biopackages.net/das/genome/human/17/feature?name=ACTA1 Ensembl, UCSC, and other sites also provide DAS servers for genomic features, but these serve up a different XML response format (DAS/1.x) from what biopackages.net is serving (DAS/2). Here's are some links to these servers, both DAS/1 and DAS/2: http://www.biodas.org/wiki/DAS/1#Servers http://www.biodas.org/wiki/DAS/2#Servers By default, a DAS/2 server will return data in DAS2XML format, but you can specify alternative formats if a server supports them. This is one advantage of the DAS/2 retrieval spec, which is stable and is described here: http://biodas.org/documents/das2/das2_get.html You may not be able to user an Entrez gene ID directly in the query. It depends on whether these IDs are available on the given server. Accessions and gene names should be OK. You can always map your Entrez ids to accessions or gene names using this file ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz . Steve On 5/16/07, Benoit Ballester wrote: > Hi Tony, > > I don't know how simple it is in bioperl, but it is quite simple using > the ensembl perl API. > > Have a look here : > > API instalation: > http://www.ensembl.org/info/software/api_installation.html > API tutorial : > http://www.ensembl.org/info/software/core/core_tutorial.html > API Perl module Documentation : > http://www.ensembl.org/info/software/Pdoc/ensembl/index.html > > so you can do something similar to the example below : > > # Get the 'COG6' gene from human > > my $gene = $gene_adaptor->fetch_by_display_label('COG6'); > > print "GENE ", $gene->stable_id(), "\n"; > # here you get gene coordinate > > foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) { > print "TRANSCRIPT ", $transcript->stable_id(), "\n";; > #print transcript coordinates > > foreach my $exon ( @{ $transcript->get_all_exons() } ) { > #print the exon coordinates > > } > } > } > > Hope this helps > > Benoit > > > Anthony Ferrari wrote: > > Hi all, > > > > I want to do something relatively simple and I want to know how far > Bioperl > > tools could help me because I'm having troubles to get to the point. > > Here is the pipeline : > > > > "EntrezGene Query" ----- (esearch) -----> "Gene ID" ------ (*) -----> > > "GeneStructure" > > > > (*) : > >>From the EntrezGene ID, I want to retrieve the structure of the gene > which > > means having the whole genomic sequence and having the start and end > > positions of each exons, introns, UTR'.... > > > > I thought of 2 ways to accomplish that : > > > > - use 'efetch', get raw xml or asn1 and then parse it to obtain the > > desired positions. > > this method should work but would take a little time to be ok. > > > > - use Bio::DB::EntrezGene module with the "get_Seq_by_id" function. I > > obtain a Bio::Seq object but I am not able to find any features stored in > > it. So it doesn't seem that the get_Seq_by_id function get all > information > > contained in a EntrezGene entry (?) . > > > > Can somebody help me to make the right choice or show me the right way? > > > > I also saw that some packages detinated to deal with gene structure > exist > > but I don't manage to know how to use it properly and even how to > create one > > of those objects ! > > Are those packages currently usable ? > > > > > > Thanks in advance. > > Best regards, > > tony > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From heikki at sanbi.ac.za Thu May 17 02:46:44 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 17 May 2007 08:46:44 +0200 Subject: [Bioperl-l] Writing OBO fiies Message-ID: <200705170846.44641.heikki@sanbi.ac.za> I've started putting together Bio::OntologyIO::obo::write_ontology(). The current parser ignores a number of fields in common obo files. If anyone knows any issues regarding adding more information into obo ontology object, shout now. I need to start parsing at least "xref_analog" and "subset" to get a reasonable roundtrip of obo files representing cell ontology and sequence ontology. I am not aiming at extending the existing ontology interfaces but simply patching obo parsing, but I am open to suggestions. -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From bernd.web at gmail.com Thu May 17 06:48:07 2007 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 17 May 2007 12:48:07 +0200 Subject: [Bioperl-l] (Simple)Align Message-ID: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com> Hi, I am playing with alignment and would like to insert strings at certain columns (so in all sequences in the alignment). I know about the slice and remove_columns. Is there already an insert_columns type of functionality? Otherwise I'll just iterate over the sequences similar to remove_columns (and give it a try to implement add_columns like remove_columns). Regards Bernd From Kevin.M.Brown at asu.edu Thu May 17 11:17:04 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 17 May 2007 08:17:04 -0700 Subject: [Bioperl-l] (Simple)Align In-Reply-To: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com> References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B403284273@EX02.asurite.ad.asu.edu> > I am playing with alignment and would like to insert strings > at certain columns (so in all sequences in the alignment). I > know about the slice and remove_columns. > Is there already an insert_columns type of functionality? > Otherwise I'll just iterate over the sequences similar to > remove_columns (and give it a try to implement add_columns > like remove_columns). Try reading the deobfuscator to see all the methods available to the simplealign object. http://bioperl.org/cgi-bin/deob_interface.cgi From diogoat at gmail.com Thu May 17 14:14:14 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Thu, 17 May 2007 15:14:14 -0300 Subject: [Bioperl-l] Downloading a sequence in genbank format - related problem In-Reply-To: <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu> References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com> <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu> Message-ID: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com> Hi Barry thank's for all your help, I choose download the Invertebrates division of NCBI to machine... but the I don't have thus script to get the sequences of the local file and I know how to write... i tried choose change in the script the -db => 'nucleotide' for -db => 'local-gbdi.gb' like I wrote below my $query = Bio::DB::Query::GenBank->new (-query =>'Leishmania major', -db => '>local-gbdi.gb ); my $gb = new Bio::DB::GenBank; my $seqio = $gb->get_Stream_by_query($query); but didn't work because de Bio:DB::Query::GenBank is a perl module wich conect at Ncbi to do my query and my Database is now local. I need the genomes of Trypanosoma cruzi, Trypanosoma brucei, Leishmania major, Entamoeba and Plasmodium falciparum in the genbank format file. Any Sugestion? Somebody have this script? Help! And thank's for the help! Diogo Tschoeke Laboratory of Molecular Biology of Trypanosomatides Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil http://biowebdb.org From barry.moore at genetics.utah.edu Thu May 17 14:19:46 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 17 May 2007 12:19:46 -0600 Subject: [Bioperl-l] Downloading a sequence in genbank format - related problem In-Reply-To: <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com> References: <638512560705151544q27968474tbc5633f74db21083@mail.gmail.com> <638512560705161302gc26c941ye023712d0e80df8a@mail.gmail.com> <2C1732DD-F4F2-4C4B-B942-AE0C6A160FEC@genetics.utah.edu> <638512560705171114n1ee851bg79c599c77fe57ab7@mail.gmail.com> Message-ID: Diogo- Look at the bioperl documentation - there you will find a HowTo on SeqIO. This will help you learn how to write scripts to load genbank flat files and you can then iterate over those files and check the organism to see if it's one that you want. You should be able to find everything that you need in the documentation. B On May 17, 2007, at 12:14 PM, Diogo Tschoeke wrote: > Hi Barry thank's for all your help, > > I choose download the Invertebrates division of NCBI to machine... > but the I don't have thus script to get the sequences of the local > file and I know how to write... > i tried choose change in the script > the -db => 'nucleotide' for -db => 'local-gbdi.gb' > like I wrote below > > my $query = Bio::DB::Query::GenBank->new > (-query =>'Leishmania major', > -db => '>local-gbdi.gb ); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > > but didn't work because de Bio:DB::Query::GenBank is a perl module > wich conect at Ncbi to do my query and my Database is now local. > > I need the genomes of Trypanosoma cruzi, Trypanosoma brucei, > Leishmania major, Entamoeba and Plasmodium falciparum in the > genbank format file. > Any Sugestion? Somebody have this script? > Help! > And thank's for the help! > > Diogo Tschoeke > Laboratory of Molecular Biology of Trypanosomatides > Funda??o Osvaldo Cruz - Fiocruz RJ, Brazil > http://biowebdb.org From torsten.seemann at infotech.monash.edu.au Fri May 18 04:13:38 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 18 May 2007 18:13:38 +1000 Subject: [Bioperl-l] New Blast parser In-Reply-To: <46496E18.1000809@sendu.me.uk> References: <46496E18.1000809@sendu.me.uk> Message-ID: Sendu, > Back in August of last year I introduced Bio::PullParserI, a module that > aids in the creation of fast SearchIO and Search modules. I've finally > gotten around to implementing a Blast parser using the interface, which > I've called Bio::SearchIO::blast_pull. > my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => "file"); > Please try it out and feed-back any bugs you discover. This is very cool! Here's hoping NCBI don't change the default output format too much. You should be able to add "rpsblast -p T" support as this is identical to "blastall -p blastp" except for first line: BLASTP 2.2.16 [Mar-25-2007] RPS-BLAST 2.2.16 [Mar-25-2007] The only problem is the (rarely used) "rpsblast -p F" mode which looks/behaves like a "blastall -p tblastn", ie. has hit summaries with "Frame" Score = 29.6 bits (65), Expect = 0.26 Identities = 10/26 (38%), Positives = 12/26 (46%) Frame = -1 BUT has the same header line, so you can't know -p F was used until you see a "Frame = ??" in a hit (what were they thinking???). TBLASTN 2.2.16 [Mar-25-2007] RPS-BLAST 2.2.16 [Mar-25-2007] # should be RPS-TBLASTN perhaps... Thanks for the good work. Shame I converted most of our systems to blastxml :-( --Torsten From cjfields at uiuc.edu Fri May 18 09:39:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 18 May 2007 08:39:05 -0500 Subject: [Bioperl-l] New Blast parser In-Reply-To: References: <46496E18.1000809@sendu.me.uk> Message-ID: <2219EED8-F721-4586-B029-EF6CD9C32246@uiuc.edu> I'll be looking at cleaning up SearchIO::blastxml soon myself. It needs to be more memory-friendly with large XML files and PSI-BLAST iterations need to be addressed (nope, I haven't forgot about that!). There is a XML::LibXML pull parser interface (XML::LibXML::Reader) we could look into... chris On May 18, 2007, at 3:13 AM, Torsten Seemann wrote: > Sendu, > >> Back in August of last year I introduced Bio::PullParserI, a >> module that >> aids in the creation of fast SearchIO and Search modules. I've >> finally >> gotten around to implementing a Blast parser using the interface, >> which >> I've called Bio::SearchIO::blast_pull. >> my $sio = Bio::SearchIO->new(-format => "blast_pull", -file => >> "file"); >> Please try it out and feed-back any bugs you discover. > > This is very cool! > Here's hoping NCBI don't change the default output format too much. > > You should be able to add "rpsblast -p T" support as this is identical > to "blastall -p blastp" except for first line: > BLASTP 2.2.16 [Mar-25-2007] > RPS-BLAST 2.2.16 [Mar-25-2007] > > The only problem is the (rarely used) "rpsblast -p F" mode which > looks/behaves like a "blastall -p tblastn", ie. has hit summaries with > "Frame" > > Score = 29.6 bits (65), Expect = 0.26 > Identities = 10/26 (38%), Positives = 12/26 (46%) > Frame = -1 > > BUT has the same header line, so you can't know -p F was used until > you see a "Frame = ??" in a hit (what were they thinking???). > > TBLASTN 2.2.16 [Mar-25-2007] > RPS-BLAST 2.2.16 [Mar-25-2007] # should be RPS-TBLASTN perhaps... > > Thanks for the good work. Shame I converted most of our systems to > blastxml :-( > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri May 18 10:00:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 18 May 2007 09:00:38 -0500 Subject: [Bioperl-l] Writing OBO fiies In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za> References: <200705170846.44641.heikki@sanbi.ac.za> Message-ID: <239FDEF1-38D4-47B8-AC71-514B61BDF9E0@uiuc.edu> Sounds great to me! Sohel Merchant might have some ideas... chris On May 17, 2007, at 1:46 AM, Heikki Lehvaslaiho wrote: > > I've started putting together Bio::OntologyIO::obo::write_ontology(). > The current parser ignores a number of fields in common obo files. > If anyone knows any issues regarding adding more information into > obo ontology > object, shout now. > > I need to start parsing at least "xref_analog" and "subset" to get a > reasonable roundtrip of obo files representing cell ontology and > sequence > ontology. > > I am not aiming at extending the existing ontology interfaces but > simply > patching obo parsing, but I am open to suggestions. > > -Heikki > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Sat May 19 20:54:11 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 19 May 2007 20:54:11 -0400 Subject: [Bioperl-l] Writing OBO fiies In-Reply-To: <200705170846.44641.heikki@sanbi.ac.za> References: <200705170846.44641.heikki@sanbi.ac.za> Message-ID: <221DB1CF-2F4E-47D4-80A8-D8D8BD777423@gmx.net> Sounds great to me! -hilmar On May 17, 2007, at 2:46 AM, Heikki Lehvaslaiho wrote: > > I've started putting together Bio::OntologyIO::obo::write_ontology(). > The current parser ignores a number of fields in common obo files. > If anyone knows any issues regarding adding more information into > obo ontology > object, shout now. > > I need to start parsing at least "xref_analog" and "subset" to get a > reasonable roundtrip of obo files representing cell ontology and > sequence > ontology. > > I am not aiming at extending the existing ontology interfaces but > simply > patching obo parsing, but I am open to suggestions. > > -Heikki > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat May 19 21:36:49 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 19 May 2007 21:36:49 -0400 Subject: [Bioperl-l] FW: release of cipres portal for tree inference References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> Message-ID: FYI. Is it worth thinking about implementing a remote access interface to the CIPRES tree inference tools, similar to what we have for RemoteBlast? -hilmar Begin forwarded message: From: "Vision, Todd (Biology)" Date: May 16, 2007 6:48:49 AM EDT Subject: FW: release of cipres portal for tree inference The CIPRES Central Resource team is pleased to announce the first public release of the CIPRES portal for Tree Inference. The portal is based on capabilities exposed by the Cipres software libraries, which were constructed as a Joint Effort between Mark Holder at Florida State University and the SDSC SW engineering team led by Terri Liebowitz. It currently presents Parsimony (PAUP) and Likelihood (GARLI and RAxML) tools with or without boosting from RecIDCM3 created by Usman Roshan and co-workers. Nexus and Phylip files are currently supported. The site is available to all, and is underwritten by the CIPRES cluster at SDSC. The portal is fully supported by the SDSC team, with contributions and new features introduced by the team in collaboration with Mark Holder and Rutger Vos. At present weekly releases are made with improvements and new features. You can visit the portal at the Cipres Web Site. http://www.phylo.org/sub_sections/portal.htm Please forward this information to anyone you feel may find the portal useful. On behalf of the whole CIPRES team, Mark Mark A. Miller, PhD Principal Investigator, Biology San Diego Supercomputer Center University of California, San Diego La Jolla, CA, 92093-0505 Tel: 858-822-0866 Fax: 858-822-3610 -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat May 19 22:10:53 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 19 May 2007 21:10:53 -0500 Subject: [Bioperl-l] FW: release of cipres portal for tree inference In-Reply-To: References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> Message-ID: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> I think it would be worthwhile. Would we place it in bioperl-run? chris On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote: > FYI. Is it worth thinking about implementing a remote access > interface to the CIPRES tree inference tools, similar to what we have > for RemoteBlast? > > -hilmar > > Begin forwarded message: > > From: "Vision, Todd (Biology)" > Date: May 16, 2007 6:48:49 AM EDT > Subject: FW: release of cipres portal for tree inference > > The CIPRES Central Resource team is pleased to announce the first > public > release of the CIPRES portal for Tree Inference. > > The portal is based on capabilities exposed by the Cipres software > libraries, which were constructed as a Joint Effort between Mark > Holder > at Florida State University and the SDSC SW engineering team led by > Terri Liebowitz. > > It currently presents Parsimony (PAUP) and Likelihood (GARLI and > RAxML) > tools with or without boosting from RecIDCM3 created by Usman > Roshan and > co-workers. Nexus and Phylip files are currently supported. > > The site is available to all, and is underwritten by the CIPRES > cluster > at SDSC. > > The portal is fully supported by the SDSC team, with contributions and > new features introduced by the team in collaboration with Mark Holder > and Rutger Vos. At present weekly releases are made with improvements > and new features. > > You can visit the portal at the Cipres Web Site. > > http://www.phylo.org/sub_sections/portal.htm > > Please forward this information to anyone you feel may find the > portal useful. > > On behalf of the whole CIPRES team, > > Mark > > Mark A. Miller, PhD > Principal Investigator, Biology > San Diego Supercomputer Center > University of California, San Diego > La Jolla, CA, 92093-0505 > Tel: 858-822-0866 > Fax: 858-822-3610 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Sat May 19 22:19:47 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 19 May 2007 22:19:47 -0400 Subject: [Bioperl-l] FW: release of cipres portal for tree inference In-Reply-To: <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> Message-ID: I guess so. That's where RemoteBlast is too, if I'm not mistaken? What sucks about the UI from a programming perspective is that it goes through multiple screens. There may be a lot of screen-scraping. -hilmar On May 19, 2007, at 10:10 PM, Chris Fields wrote: > I think it would be worthwhile. Would we place it in bioperl-run? > > chris > > On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote: > >> FYI. Is it worth thinking about implementing a remote access >> interface to the CIPRES tree inference tools, similar to what we have >> for RemoteBlast? >> >> -hilmar >> >> Begin forwarded message: >> >> From: "Vision, Todd (Biology)" >> Date: May 16, 2007 6:48:49 AM EDT >> Subject: FW: release of cipres portal for tree inference >> >> The CIPRES Central Resource team is pleased to announce the first >> public >> release of the CIPRES portal for Tree Inference. >> >> The portal is based on capabilities exposed by the Cipres software >> libraries, which were constructed as a Joint Effort between Mark >> Holder >> at Florida State University and the SDSC SW engineering team led by >> Terri Liebowitz. >> >> It currently presents Parsimony (PAUP) and Likelihood (GARLI and >> RAxML) >> tools with or without boosting from RecIDCM3 created by Usman >> Roshan and >> co-workers. Nexus and Phylip files are currently supported. >> >> The site is available to all, and is underwritten by the CIPRES >> cluster >> at SDSC. >> >> The portal is fully supported by the SDSC team, with contributions >> and >> new features introduced by the team in collaboration with Mark Holder >> and Rutger Vos. At present weekly releases are made with improvements >> and new features. >> >> You can visit the portal at the Cipres Web Site. >> >> http://www.phylo.org/sub_sections/portal.htm >> >> Please forward this information to anyone you feel may find the >> portal useful. >> >> On behalf of the whole CIPRES team, >> >> Mark >> >> Mark A. Miller, PhD >> Principal Investigator, Biology >> San Diego Supercomputer Center >> University of California, San Diego >> La Jolla, CA, 92093-0505 >> Tel: 858-822-0866 >> Fax: 858-822-3610 >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Sun May 20 01:06:53 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 19 May 2007 22:06:53 -0700 Subject: [Bioperl-l] FW: release of cipres portal for tree inference In-Reply-To: References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> Message-ID: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org> technically remoteblast is in bioperl-live, but for historical/ease of user-install purposes (i.e. so many people want to use blast out of the box, we kept it in bioperl-live to not force them to install bioperl-run). I think it would be great to have the interface - can we do it all via HTTP or will it require some installation of client software and/ or CORBA? -jason On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote: > I guess so. That's where RemoteBlast is too, if I'm not mistaken? > > What sucks about the UI from a programming perspective is that it > goes through multiple screens. There may be a lot of screen-scraping. > > -hilmar > > On May 19, 2007, at 10:10 PM, Chris Fields wrote: > >> I think it would be worthwhile. Would we place it in bioperl-run? >> >> chris >> >> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote: >> >>> FYI. Is it worth thinking about implementing a remote access >>> interface to the CIPRES tree inference tools, similar to what we >>> have >>> for RemoteBlast? >>> >>> -hilmar >>> >>> Begin forwarded message: >>> >>> From: "Vision, Todd (Biology)" >>> Date: May 16, 2007 6:48:49 AM EDT >>> Subject: FW: release of cipres portal for tree inference >>> >>> The CIPRES Central Resource team is pleased to announce the first >>> public >>> release of the CIPRES portal for Tree Inference. >>> >>> The portal is based on capabilities exposed by the Cipres software >>> libraries, which were constructed as a Joint Effort between Mark >>> Holder >>> at Florida State University and the SDSC SW engineering team led by >>> Terri Liebowitz. >>> >>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and >>> RAxML) >>> tools with or without boosting from RecIDCM3 created by Usman >>> Roshan and >>> co-workers. Nexus and Phylip files are currently supported. >>> >>> The site is available to all, and is underwritten by the CIPRES >>> cluster >>> at SDSC. >>> >>> The portal is fully supported by the SDSC team, with contributions >>> and >>> new features introduced by the team in collaboration with Mark >>> Holder >>> and Rutger Vos. At present weekly releases are made with >>> improvements >>> and new features. >>> >>> You can visit the portal at the Cipres Web Site. >>> >>> http://www.phylo.org/sub_sections/portal.htm >>> >>> Please forward this information to anyone you feel may find the >>> portal useful. >>> >>> On behalf of the whole CIPRES team, >>> >>> Mark >>> >>> Mark A. Miller, PhD >>> Principal Investigator, Biology >>> San Diego Supercomputer Center >>> University of California, San Diego >>> La Jolla, CA, 92093-0505 >>> Tel: 858-822-0866 >>> Fax: 858-822-3610 >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070519/0afb50df/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2613 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070519/0afb50df/attachment-0001.bin From bernd.web at gmail.com Sun May 20 10:56:07 2007 From: bernd.web at gmail.com (Bernd Web) Date: Sun, 20 May 2007 16:56:07 +0200 Subject: [Bioperl-l] (Simple)Align In-Reply-To: References: <716af09c0705170348x7c48474fu5672ae1de19acee6@mail.gmail.com> Message-ID: <716af09c0705200756h46bf2134x3d6841d2a98744c0@mail.gmail.com> Hi I have made a simple add_columns function in SimpleAlign along the lines of remove_columns. I only need to insert characters that are the same for all sequences: =head2 add_columns Title : add_columns Usage : $aln2 = $aln->add_columns([0, 10, '.'], [12, 15]) Function : Creates an alignment with columns added by specifying the columns by number and supplying the character (optional) to insert in all sequences. Default character is gap_char. Returns : Bio::SimpleAlign object Args : Array ref where the referenced array contains a pair of integers that that specify a column range and optionally the character to insert. The first column is 0. =cut The functionalilty could be extended: - possibility to supply a string to insert (for all sequences) - possibility to define the string to insert on a per sequence basis (although this may be more transparant to do outside SimpleAlign). After some final checks I could supply it (e.g. via bugzilla). Regards, Bernd On 5/17/07, Jason Stajich wrote: > not yet - when I did this to insert intron positions I just manipulated the > sequence strings outside of SimpleAlign, but I think it would be nice to > have an insert function. > > -jason > > On May 17, 2007, at 3:48 AM, Bernd Web wrote: > > Hi, > > I am playing with alignment and would like to insert strings at > certain columns (so in all sequences in the alignment). I know about > the slice and remove_columns. > Is there already an insert_columns type of functionality? > Otherwise I'll just iterate over the sequences similar to > remove_columns (and give it a try to implement add_columns like > remove_columns). > > > Regards > Bernd > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > From hlapp at gmx.net Sun May 20 11:59:03 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 20 May 2007 11:59:03 -0400 Subject: [Bioperl-l] FW: release of cipres portal for tree inference In-Reply-To: <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org> References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org> Message-ID: Just HTTP, no CORBA or other stuff needed client-side. Ultimately it would of course be nice if they offered a more SOA compliant interface too, to obviate the screen-scraping need. However, if I understand the UI correctly the screen scraping is - if at all - only needed for walking through the steps, and for extracting the location of the result. The result itself is in NEXUS format, as a separate file. -hilmar On May 20, 2007, at 1:06 AM, Jason Stajich wrote: > technically remoteblast is in bioperl-live, but for historical/ease > of user-install purposes (i.e. so many people want to use blast out > of the box, we kept it in bioperl-live to not force them to install > bioperl-run). > > I think it would be great to have the interface - can we do it all > via HTTP or will it require some installation of client software > and/or CORBA? > > -jason > On May 19, 2007, at 7:19 PM, Hilmar Lapp wrote: > >> I guess so. That's where RemoteBlast is too, if I'm not mistaken? >> >> What sucks about the UI from a programming perspective is that it >> goes through multiple screens. There may be a lot of screen-scraping. >> >> -hilmar >> >> On May 19, 2007, at 10:10 PM, Chris Fields wrote: >> >>> I think it would be worthwhile. Would we place it in bioperl-run? >>> >>> chris >>> >>> On May 19, 2007, at 8:36 PM, Hilmar Lapp wrote: >>> >>>> FYI. Is it worth thinking about implementing a remote access >>>> interface to the CIPRES tree inference tools, similar to what we >>>> have >>>> for RemoteBlast? >>>> >>>> -hilmar >>>> >>>> Begin forwarded message: >>>> >>>> From: "Vision, Todd (Biology)" >>>> Date: May 16, 2007 6:48:49 AM EDT >>>> Subject: FW: release of cipres portal for tree inference >>>> >>>> The CIPRES Central Resource team is pleased to announce the first >>>> public >>>> release of the CIPRES portal for Tree Inference. >>>> >>>> The portal is based on capabilities exposed by the Cipres software >>>> libraries, which were constructed as a Joint Effort between Mark >>>> Holder >>>> at Florida State University and the SDSC SW engineering team led by >>>> Terri Liebowitz. >>>> >>>> It currently presents Parsimony (PAUP) and Likelihood (GARLI and >>>> RAxML) >>>> tools with or without boosting >from RecIDCM3 created by Usman >>>> Roshan and >>>> co-workers. Nexus and Phylip files are currently supported. >>>> >>>> The site is available to all, and is underwritten by the CIPRES >>>> cluster >>>> at SDSC. >>>> >>>> The portal is fully supported by the SDSC team, with contributions >>>> and >>>> new features introduced by the team in collaboration with Mark >>>> Holder >>>> and Rutger Vos. At present weekly releases are made with >>>> improvements >>>> and new features. >>>> >>>> You can visit the portal at the Cipres Web Site. >>>> >>>> http://www.phylo.org/sub_sections/portal.htm >>>> >>>> Please forward this information to anyone you feel may find the >>>> portal useful. >>>> >>>> On behalf of the whole CIPRES team, >>>> >>>> Mark >>>> >>>> Mark A. Miller, PhD >>>> Principal Investigator, Biology >>>> San Diego Supercomputer Center >>>> University of California, San Diego >>>> La Jolla, CA, 92093-0505 >>>> Tel: 858-822-0866 >>>> Fax: 858-822-3610 >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From johnsonm at gmail.com Mon May 21 11:19:56 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 21 May 2007 10:19:56 -0500 Subject: [Bioperl-l] FW: release of cipres portal for tree inference In-Reply-To: References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org> Message-ID: Sounds like time to bust out WWW::Mechanize. I didn't step through the whole process, but the first screen/step looks okay. Plain HTML form with plain buttons. Looks like the Javascript is only getting involved for client-side sanity checking. Should be easy to automate (Don't look at me, I've bitten off a bit too much as it is). On 5/20/07, Hilmar Lapp wrote: > Just HTTP, no CORBA or other stuff needed client-side. > > Ultimately it would of course be nice if they offered a more SOA > compliant interface too, to obviate the screen-scraping need. > However, if I understand the UI correctly the screen scraping is - if > at all - only needed for walking through the steps, and for > extracting the location of the result. The result itself is in NEXUS > format, as a separate file. > > -hilmar From cjfields at uiuc.edu Mon May 21 16:11:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 15:11:36 -0500 Subject: [Bioperl-l] FW: release of cipres portal for tree inference In-Reply-To: References: <5805338EEBC6DB4AB6F96B9693F2ABDB01B0CCA0@email.bio.unc.edu> <9B50FABD-C9A4-447C-900F-5B937187BC14@uiuc.edu> <5DA6A803-23E8-4D29-8797-DFCFE0F44BD7@bioperl.org> Message-ID: <61E0D74B-77F7-499B-A0B7-B1E5106964E6@uiuc.edu> It would be nice to have a generalized interface (SOAP, CGI, anything), as Hilmar states. I agree WWW::Mechanize is prob. the way to go for now. Don't know who wants to take it up... chris On May 21, 2007, at 10:19 AM, Mark Johnson wrote: > Sounds like time to bust out WWW::Mechanize. I didn't step through > the whole process, but the first screen/step looks okay. Plain HTML > form with plain buttons. Looks like the Javascript is only getting > involved for client-side sanity checking. Should be easy to automate > (Don't look at me, I've bitten off a bit too much as it is). > > On 5/20/07, Hilmar Lapp wrote: >> Just HTTP, no CORBA or other stuff needed client-side. >> >> Ultimately it would of course be nice if they offered a more SOA >> compliant interface too, to obviate the screen-scraping need. >> However, if I understand the UI correctly the screen scraping is - if >> at all - only needed for walking through the steps, and for >> extracting the location of the result. The result itself is in NEXUS >> format, as a separate file. >> >> -hilmar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon May 21 16:35:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 15:35:41 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> Message-ID: On May 16, 2007, at 2:11 PM, Mark Johnson wrote: > On 5/8/07, Chris Fields wrote: >> I believe all seqfeature location coordinates are designed to have >> start < stop for consistency; in cases where the strand matters (CDS, >> gene, etc.) then the strand is set to 1 or -1. When start > stop, >> the two are reversed and the strand is flipped; at least that's the >> way locations are set up in BioPerl. >> >> chris > > Oh yeah? I always tend to ensure that (start < stop), regardless > of strand, when working with sequence features...the other day, I > caught Glimmer2 emitting a prediction on the plus strand with start > > stop. I was going to work up a patch for the parser, but I wonder, > should I just force everything to start < stop? Or only predictions > on the plus strand? Should all the parsers for all the ab initio > predictors ensure they emit features with coordinates like this? Odd that it would predict a start > stop on the plus strand, though it may be corrected in Glimmer3. Does the same prediction show up in Glimmer3? chris From johnsonm at gmail.com Mon May 21 16:48:52 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 21 May 2007 15:48:52 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> Message-ID: Check the test data for Glimmer2 and Glimmer3. They both predict one large gene, I'd guess covering most of the sequence, in frame +1. That's probably a bogus prediction, but that's not up to the parser to decide. I hadn't noticed it until recently. I sent a patch via bugzilla to swap the coordinates if start > end and strand > 0. On 5/21/07, Chris Fields wrote: > On May 16, 2007, at 2:11 PM, Mark Johnson wrote: > > > On 5/8/07, Chris Fields wrote: > >> I believe all seqfeature location coordinates are designed to have > >> start < stop for consistency; in cases where the strand matters (CDS, > >> gene, etc.) then the strand is set to 1 or -1. When start > stop, > >> the two are reversed and the strand is flipped; at least that's the > >> way locations are set up in BioPerl. > >> > >> chris > > > > Oh yeah? I always tend to ensure that (start < stop), regardless > > of strand, when working with sequence features...the other day, I > > caught Glimmer2 emitting a prediction on the plus strand with start > > > stop. I was going to work up a patch for the parser, but I wonder, > > should I just force everything to start < stop? Or only predictions > > on the plus strand? Should all the parsers for all the ab initio > > predictors ensure they emit features with coordinates like this? > > Odd that it would predict a start > stop on the plus strand, though > it may be corrected in Glimmer3. Does the same prediction show up in > Glimmer3? > > chris > From cjfields at uiuc.edu Mon May 21 16:56:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 15:56:50 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> Message-ID: <6186D928-A47E-4EED-B06A-50E25A4893CC@uiuc.edu> On May 21, 2007, at 3:35 PM, Chris Fields wrote: > On May 16, 2007, at 2:11 PM, Mark Johnson wrote: > >> On 5/8/07, Chris Fields wrote: >>> I believe all seqfeature location coordinates are designed to have >>> start < stop for consistency; in cases where the strand matters >>> (CDS, >>> gene, etc.) then the strand is set to 1 or -1. When start > stop, >>> the two are reversed and the strand is flipped; at least that's the >>> way locations are set up in BioPerl. >>> >>> chris >> >> Oh yeah? I always tend to ensure that (start < stop), regardless >> of strand, when working with sequence features...the other day, I >> caught Glimmer2 emitting a prediction on the plus strand with start > >> stop. I was going to work up a patch for the parser, but I wonder, >> should I just force everything to start < stop? Or only predictions >> on the plus strand? Should all the parsers for all the ab initio >> predictors ensure they emit features with coordinates like this? > > Odd that it would predict a start > stop on the plus strand, though > it may be corrected in Glimmer3. Does the same prediction show up in > Glimmer3? > > chris ... and I see that it does (per your bug report). The next thing to ask is how often these odd Glimmer hits occur and whether others have seen the same thing. Maybe there's an explanation (bug, etc) but I can't immediately think of anything that makes sense unless it's running the reverse of the + strand as a control for some reason. chris From cjfields at uiuc.edu Mon May 21 17:17:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 16:17:37 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> Message-ID: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> On May 21, 2007, at 3:48 PM, Mark Johnson wrote: > Check the test data for Glimmer2 and Glimmer3. They both predict one > large gene, I'd guess covering most of the sequence, in frame +1. > That's probably a bogus prediction, but that's not up to the parser to > decide. I hadn't noticed it until recently. > > I sent a patch via bugzilla to swap the coordinates if start > end and > strand > 0. I think I know what it is. If you mean these predictions: Glimmer2: 27 29263 6 [+1 L= 684 r=-1.187] Glimmer3: orf00001 29263 9 +1 9.60 Glimmer2/3 are predicting a gene for a circular chromosome that starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off the stop codon). Note in Glimmer2 detailed output the end is 29946 and the length of the sequence is 29940, so Glimmer2 artificially extends the end of the sequence with part of the start. This is handled as a split location in bioperl and in most GenBank files; the above would be a location string like 'join (29263..29940,1..9)'. If you switched the start and stop the location would be '9..29263' which wouldn't be correct (and would be a huge gene). chris From johnsonm at gmail.com Mon May 21 17:21:52 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 21 May 2007 16:21:52 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> Message-ID: That makes sense. Is that behavior documented anywhere? I'll feel like less of an idiot if it's not. 8) Either way, if you're sure that's whats going on, I'll fix up the parser to handle that as a split location. > I think I know what it is. If you mean these predictions: > > Glimmer2: > > 27 29263 6 [+1 L= 684 r=-1.187] > > Glimmer3: > > orf00001 29263 9 +1 9.60 > > Glimmer2/3 are predicting a gene for a circular chromosome that > starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off > the stop codon). Note in Glimmer2 detailed output the end is 29946 > and the length of the sequence is 29940, so Glimmer2 artificially > extends the end of the sequence with part of the start. > > This is handled as a split location in bioperl and in most GenBank > files; the above would be a location string like 'join > (29263..29940,1..9)'. If you switched the start and stop the > location would be '9..29263' which wouldn't be correct (and would be > a huge gene). > > chris > From cjfields at uiuc.edu Mon May 21 19:13:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 18:13:24 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> Message-ID: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> glimmer2/3 both assume the genome is circular by default (I'm assuming since Glimmer2/3 are used for bacterial genomes). Acc. to the Glimmer3 release notes the detail file has the information in the header; from the Glimmer3 data used for tests: Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA Glimmer3.icm Glimmer3 Sequence file = ../BCTDNA ICM model file = Glimmer3.icm Excluded regions file = none List of orfs file = none Truncated orfs = false Circular genome = true ... There are options available for glimmer3 (-L, -X) that specify a linear sequence or allow ORFs to extend past the end of the sequence analyzed (the latter assumes a linear sequence). chris On May 21, 2007, at 4:21 PM, Mark Johnson wrote: > That makes sense. Is that behavior documented anywhere? I'll > feel like less of an idiot if it's not. 8) Either way, if you're > sure that's whats going on, I'll fix up the parser to handle that as a > split location. > >> I think I know what it is. If you mean these predictions: >> >> Glimmer2: >> >> 27 29263 6 [+1 L= 684 r=-1.187] >> >> Glimmer3: >> >> orf00001 29263 9 +1 9.60 >> >> Glimmer2/3 are predicting a gene for a circular chromosome that >> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off >> the stop codon). Note in Glimmer2 detailed output the end is 29946 >> and the length of the sequence is 29940, so Glimmer2 artificially >> extends the end of the sequence with part of the start. >> >> This is handled as a split location in bioperl and in most GenBank >> files; the above would be a location string like 'join >> (29263..29940,1..9)'. If you switched the start and stop the >> location would be '9..29263' which wouldn't be correct (and would be >> a huge gene). >> >> chris >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnsonm at gmail.com Mon May 21 19:57:03 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 21 May 2007 18:57:03 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> Message-ID: Alrighty then. That's a feature, not a bug. Hmmmm. How about this for a fix? For plus strand predictions with start > end, use a split location. For minus strand predictions with start < end, use a split location. Without knowing the length of the sequence, that's the best that can be done, I think. Unless there are objections, I'll go code that up. Close that bug out as 'requester is an idiot'. 8) On 5/21/07, Chris Fields wrote: > glimmer2/3 both assume the genome is circular by default (I'm > assuming since Glimmer2/3 are used for bacterial genomes). Acc. to > the Glimmer3 release notes the detail file has the information in the > header; from the Glimmer3 data used for tests: > > Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA > Glimmer3.icm Glimmer3 > > Sequence file = ../BCTDNA > ICM model file = Glimmer3.icm > Excluded regions file = none > List of orfs file = none > Truncated orfs = false > Circular genome = true > ... > > There are options available for glimmer3 (-L, -X) that specify a > linear sequence or allow ORFs to extend past the end of the sequence > analyzed (the latter assumes a linear sequence). > > chris > > On May 21, 2007, at 4:21 PM, Mark Johnson wrote: > > > That makes sense. Is that behavior documented anywhere? I'll > > feel like less of an idiot if it's not. 8) Either way, if you're > > sure that's whats going on, I'll fix up the parser to handle that as a > > split location. > > > >> I think I know what it is. If you mean these predictions: > >> > >> Glimmer2: > >> > >> 27 29263 6 [+1 L= 684 r=-1.187] > >> > >> Glimmer3: > >> > >> orf00001 29263 9 +1 9.60 > >> > >> Glimmer2/3 are predicting a gene for a circular chromosome that > >> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off > >> the stop codon). Note in Glimmer2 detailed output the end is 29946 > >> and the length of the sequence is 29940, so Glimmer2 artificially > >> extends the end of the sequence with part of the start. > >> > >> This is handled as a split location in bioperl and in most GenBank > >> files; the above would be a location string like 'join > >> (29263..29940,1..9)'. If you switched the start and stop the > >> location would be '9..29263' which wouldn't be correct (and would be > >> a huge gene). > >> > >> chris > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From torsten.seemann at infotech.monash.edu.au Mon May 21 20:29:47 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 22 May 2007 10:29:47 +1000 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> Message-ID: > glimmer2/3 both assume the genome is circular by default (I'm > assuming since Glimmer2/3 are used for bacterial genomes). Acc. to > the Glimmer3 release notes the detail file has the information in the > header; from the Glimmer3 data used for tests: You beat me to the reply Chris - yes, Glimmer2/3 assume circular chromosome by default. I had forgotten about this in earlier discussions of the new Glimmer parsers as I normally run it in --linear / -L mode (even if I know it is circular) because it is easier to handle, and our sequencer/assembler team usually gets the origin of replication right. > Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA > Glimmer3.icm Glimmer3 I did a double-take here - that's the path to my Glimmer3 installation! It took me a couple of minutes to realise that you got it from the bioperl test data I created. D'oh! :-) > There are options available for glimmer3 (-L, -X) that specify a > linear sequence or allow ORFs to extend past the end of the sequence > analyzed (the latter assumes a linear sequence). If the -L mode should produce Bio::Location::Split objects, I guess if -X is used it should produce Bio::Location::Fuzzy objects too... --Torsten From cjfields at uiuc.edu Mon May 21 20:59:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 19:59:20 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> Message-ID: You can add the necessary patch to the bug report when it's ready; no need to close it out. The most complete file format to parse seems to be the details file; it contains the sequence length: >BCTDNA Sequence length = 29940 which can be used for the split location. As Torsten points out, use of -X could also potentially produce fuzzy locations. Since the parser currently only parses predict files, you could optionally supply the parser with the seq length and emit a warning if seqfeatures requiring it are produced, such as the sporadic ones which wrap around. If one were using the bioperl-run module this could be automated a bit by passing the seq length in to the parser object by adding the seq length to the constructor argument list. chris On May 21, 2007, at 6:57 PM, Mark Johnson wrote: > Alrighty then. That's a feature, not a bug. Hmmmm. How about > this for a fix? For plus strand predictions with start > end, use a > split location. For minus strand predictions with start < end, use a > split location. Without knowing the length of the sequence, that's > the best that can be done, I think. > Unless there are objections, I'll go code that up. Close that bug > out as 'requester is an idiot'. 8) > > On 5/21/07, Chris Fields wrote: >> glimmer2/3 both assume the genome is circular by default (I'm >> assuming since Glimmer2/3 are used for bacterial genomes). Acc. to >> the Glimmer3 release notes the detail file has the information in the >> header; from the Glimmer3 data used for tests: >> >> Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA >> Glimmer3.icm Glimmer3 >> >> Sequence file = ../BCTDNA >> ICM model file = Glimmer3.icm >> Excluded regions file = none >> List of orfs file = none >> Truncated orfs = false >> Circular genome = true >> ... >> >> There are options available for glimmer3 (-L, -X) that specify a >> linear sequence or allow ORFs to extend past the end of the sequence >> analyzed (the latter assumes a linear sequence). >> >> chris >> >> On May 21, 2007, at 4:21 PM, Mark Johnson wrote: >> >>> That makes sense. Is that behavior documented anywhere? I'll >>> feel like less of an idiot if it's not. 8) Either way, if you're >>> sure that's whats going on, I'll fix up the parser to handle that >>> as a >>> split location. >>> >>>> I think I know what it is. If you mean these predictions: >>>> >>>> Glimmer2: >>>> >>>> 27 29263 6 [+1 L= 684 r=-1.187] >>>> >>>> Glimmer3: >>>> >>>> orf00001 29263 9 +1 9.60 >>>> >>>> Glimmer2/3 are predicting a gene for a circular chromosome that >>>> starts at 29263 and ending at +9 (+6 for Glimmer2, which leaves off >>>> the stop codon). Note in Glimmer2 detailed output the end is 29946 >>>> and the length of the sequence is 29940, so Glimmer2 artificially >>>> extends the end of the sequence with part of the start. >>>> >>>> This is handled as a split location in bioperl and in most GenBank >>>> files; the above would be a location string like 'join >>>> (29263..29940,1..9)'. If you switched the start and stop the >>>> location would be '9..29263' which wouldn't be correct (and >>>> would be >>>> a huge gene). >>>> >>>> chris >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon May 21 21:00:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 May 2007 20:00:58 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> Message-ID: On May 21, 2007, at 7:29 PM, Torsten Seemann wrote: >> glimmer2/3 both assume the genome is circular by default (I'm >> assuming since Glimmer2/3 are used for bacterial genomes). Acc. to >> the Glimmer3 release notes the detail file has the information in the >> header; from the Glimmer3 data used for tests: > > You beat me to the reply Chris - yes, Glimmer2/3 assume circular > chromosome by default. I had forgotten about this in earlier > discussions of the new Glimmer parsers as I normally run it in > --linear / -L mode (even if I know it is circular) because it is > easier to handle, and our sequencer/assembler team usually gets the > origin of replication right. > >> Command: /bio/sw/glimmer3/bin/glimmer3 -o 50 -g 110 -t 30 ../BCTDNA >> Glimmer3.icm Glimmer3 > > I did a double-take here - that's the path to my Glimmer3 > installation! It took me a couple of minutes to realise that you got > it from the bioperl test data I created. D'oh! :-) Yep, I forgot about that! >> There are options available for glimmer3 (-L, -X) that specify a >> linear sequence or allow ORFs to extend past the end of the sequence >> analyzed (the latter assumes a linear sequence). > > If the -L mode should produce Bio::Location::Split objects, I guess if > -X is used > it should produce Bio::Location::Fuzzy objects too... > > --Torsten True, didn't think about that one. Def. something to consider adding in. chris From johnsonm at gmail.com Tue May 22 14:04:31 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 22 May 2007 13:04:31 -0500 Subject: [Bioperl-l] Why does Bio::DB::GFF::Feature::gff3_string swap start and stop coordinates?? In-Reply-To: References: <79FDA731-CC37-42B0-8200-0865F52C1CAC@uiuc.edu> <62034FE5-C375-49F3-9A4E-2545F93615F4@uiuc.edu> <9FAD90F3-79B3-4002-9A11-6C11F7D00614@uiuc.edu> Message-ID: Yes, Glimmer3 outputs the length of the input sequence. I don't believe Glimmer2 does. > The most complete file format to parse seems to be the details file; > it contains the sequence length: > > >BCTDNA > Sequence length = 29940 > Since the parser currently only parses predict files, you could > optionally supply the parser with the seq length and emit a warning > if seqfeatures requiring it are produced, such as the sporadic ones > which wrap around. If one were using the bioperl-run module this > could be automated a bit by passing the seq length in to the parser > object by adding the seq length to the constructor argument list. I think we can spot wrap-around genes easily enough without knowing the length of the input sequence. Having it just means we can perform a sanity check or two, such as making sure 'wraparound' genes are within N bases of the end of the input sequence. Any suggestions on a good default value for N? Parsing both output files for glimmer3 will be a little tricky. The constructor for Bio::Tools::Glimmer calls $class->SUPER::new(@args);, which hits the constructor for Bio::Tools::AnalysisResult, which does the same thing. It all ends up in Bio::Root::IO::_initialize_io, which grabs the -file arg and opens it. So, either let, Bio::Root::IO handle -file and have Bio::Tools::Glimmer handle, say -detail file, or have Bio::Tools::Glimmer just implement intialize_io() and hopefully that will fly.. From ClarkeW at AGR.GC.CA Tue May 22 17:10:08 2007 From: ClarkeW at AGR.GC.CA (ClarkeW) Date: Tue, 22 May 2007 15:10:08 -0600 Subject: [Bioperl-l] TextResultWriter Message-ID: Hi, I am interested in becoming a bioperl developer as I have recently found a bug in TextResultWriter. I know that I should submit the bug fixes using the protocol outlined in the How To but I haven't been able to login to the CVS anonymously to check it out. However, I have checked that the bug still exists in the most recent version of the code using the web interface to the CVS repositories. The bug is between lines 433 and 443, and deals with the reporting of the number of letters in the database and the number of entries in the database. My fix would be to change the existing code block: from: Number of letters in database: %s Number of sequences in database: %s Matrix: %s }, $result->database_name(), $result->get_statistic('posted_date') || POSIX::strftime("%b %d, %Y %I:%M %p",localtime), &_numwithcommas($result->database_entries()), &_numwithcommas($result->database_letters()), $result->get_parameter('matrix') || ''); to: Number of letters in database: %s Number of sequences in database: %s Matrix: %s }, $result->database_name(), $result->get_statistic('posted_date') || POSIX::strftime("%b %d, %Y %I:%M %p",localtime), &_numwithcommas($result->database_letters()), &_numwithcommas($result->database_entries()), $result->get_parameter('matrix') || ''); I believe that this is a simple enough modification that it does not require any new test cases. Cheers, Wayne From dmessina at wustl.edu Wed May 23 02:06:52 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 23 May 2007 01:06:52 -0500 Subject: [Bioperl-l] TextResultWriter In-Reply-To: References: Message-ID: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu> Hi Wayne, I submitted the bug report on your behalf http://bugzilla.open-bio.org/show_bug.cgi?id=2300 and committed your patch. Thanks for reporting this, and thanks even more for including a patch! Regarding your trouble checking out the repository via anonymous CVS, could you post the transcript of your attempt so we can get a better look at what's going wrong? Dave From ClarkeW at AGR.GC.CA Wed May 23 10:39:17 2007 From: ClarkeW at AGR.GC.CA (ClarkeW) Date: Wed, 23 May 2007 08:39:17 -0600 Subject: [Bioperl-l] TextResultWriter In-Reply-To: <196BA474-F555-4A12-9A55-42E626C1E8E1@wustl.edu> Message-ID: With regards to not being able to connect, I have discovered that the reason I cannot connect is that our firewall is blocking my access. It appears that I am not the first person to have this problem but that the people in charge are firm in their position to block the anonymous access port. However, if I obtain a developer account I will be able to access the CVS. Cheers, Wayne On 5/23/07 12:06 AM, "David Messina" wrote: > Hi Wayne, > > I submitted the bug report on your behalf > > http://bugzilla.open-bio.org/show_bug.cgi?id=2300 > > and committed your patch. Thanks for reporting this, and thanks even > more for including a patch! > > Regarding your trouble checking out the repository via anonymous CVS, > could you post the transcript of your attempt so we can get a better > look at what's going wrong? > > Dave > > From cjfields at uiuc.edu Wed May 23 12:16:32 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 23 May 2007 11:16:32 -0500 Subject: [Bioperl-l] TextResultWriter In-Reply-To: References: Message-ID: <7077B4AB-A3B5-4EAE-9994-0EF629D2DE2B@uiuc.edu> You can always use the browsable CVS link to download a tarball if that works for you. http://www.bioperl.org/wiki/Using_CVS http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/? cvsroot=bioperl The link to download is at the bottom of the page. chris On May 23, 2007, at 9:39 AM, ClarkeW wrote: > With regards to not being able to connect, I have discovered that > the reason > I cannot connect is that our firewall is blocking my access. It > appears that > I am not the first person to have this problem but that the people > in charge > are firm in their position to block the anonymous access port. > However, if I > obtain a developer account I will be able to access the CVS. > > Cheers, Wayne > > > On 5/23/07 12:06 AM, "David Messina" wrote: > >> Hi Wayne, >> >> I submitted the bug report on your behalf >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2300 >> >> and committed your patch. Thanks for reporting this, and thanks even >> more for including a patch! >> >> Regarding your trouble checking out the repository via anonymous CVS, >> could you post the transcript of your attempt so we can get a better >> look at what's going wrong? >> >> Dave >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Xianjun.Dong at bccs.uib.no Tue May 29 07:57:39 2007 From: Xianjun.Dong at bccs.uib.no (Dong Xianjun) Date: Tue, 29 May 2007 13:57:39 +0200 Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why? In-Reply-To: <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> References: <465AD6E8.3030707@ii.uib.no> <62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com> <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> Message-ID: <465C1533.6070900@ii.uib.no> An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: kaks_methods.pl Type: application/x-perl Size: 2732 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/c0b905c0/attachment.bin From avilella at gmail.com Tue May 29 09:02:44 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 29 May 2007 14:02:44 +0100 Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why? In-Reply-To: <465C1533.6070900@ii.uib.no> References: <465AD6E8.3030707@ii.uib.no> <62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com> <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> <465C1533.6070900@ii.uib.no> Message-ID: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com> codeml in PAML can give different results in cases where the optimization reaches different local maxima depending on the different starting points of each run (seed values). So, at least for some methods and options, this instability is inherent to the underlying algorithm. Even more, for some methods and options, it is even recommended in PAML documentation to run the same data more than once, to see if the results are the same, which would be a good indication that the model is robust given the data. Maybe PAML's author can give a more specific answer for your data at: http://www.rannala.org/gsf/viewforum.php?f=1 Cheers, Albert. On 5/29/07, Dong Xianjun wrote: > > HI, dear all, //sorry for duplicated msg for *Jason* and *Neil* > > I'm bothering by two problems when I use PAML module to calculate Ka/Ks > for my sequences. Could you help me? > > 1. Codeml could produce different Ka/Ks value if I run it twice. I check > it both in command line and in Perl wrapper of > Bio::Tools::Run::Phylo::PAML::Codeml; > > The input sequences are: > >seq1 > TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG > >seq2 > > TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG > > For command-line program, I used Codeml in PAML3.14, with specifications > in codeml.ctl (runmode = -2, seqtype = 1). I tried to run the program four > times. The output are like below (from the output file). We could see that > they are different from each other. they should be same or slightly > different. Right? But they are NOT. Weird! > > ---------------------------------------------------------------------------------------------------------------------------------- > t=11.5447 S= 42.4 N= 122.6 dN/dS= 0.0035 dN= 0.0522 dS=14.8339 > t= 9.4132 S= 41.8 N= 123.2 dN/dS= 0.0041 dN= 0.0507 dS=12.2349 > t=11.6305 S= 42.2 N= 122.8 dN/dS= 0.0034 dN= 0.0510 dS=14.9961 > t= 7.7879 S= 41.4 N= 123.6 dN/dS= 0.0050 dN= 0.0505 dS=10.1852 > > ---------------------------------------------------------------------------------------------------------------------------------- > I found the same problem when I use the Perl Wrapper of > Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script here, > similar to the one in BioPerl HOWTO). > > 2. Another strange thing is, if I switch to use program YN00 in the > package of PAML, the output are stable. However, it's much different from > Codeml. (see below) > > ---------------------------------------------------------------------------------------------------------------------------------- > seq. seq. S N t kappa omega dN +- SE > dS +- SE > 2 1 40.4 124.6 1.7452 1.3163 0.0378 0.0804 +- 0.0265 > 2.1300 +- 1.2272 > > ---------------------------------------------------------------------------------------------------------------------------------- > Why like this? Which one I should believe? > > > Is there any guy who would kindly help me to run the perl script (twice to > check whether they are different)? or help to run the codeml in command > line? > I don't know whether there is anyone noticed this before, or because of > the wrong version of PAML. > > Regards, > > Xianjun > > > > Himanshu Ardawatia wrote: > > #!/usr/bin/perl > > use strict; > use warnings; > > > use Bio::Tools::Run::Phylo::PAML::Codeml; > use Bio::Tools::Run::Alignment::Clustalw; > > # for projecting alignments from protein to R/DNA space > use Bio::Align::Utilities qw(aa_to_dna_aln); > > # for input of the sequence data > use Bio::SeqIO; > use Bio::AlignIO; > > my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw(); > > #my $seqdata = 'chuck.fa'; > my $seqdata = 'xianjun.fa '; > > my $seqIO = new Bio::SeqIO(-file => $seqdata, > -format => 'fasta'); > my %seqs; > my @prots; > > my $output; > # process each sequence > while( my $seq = $seqIO->next_seq ) { > $seqs{$seq->display_id} = $seq; > # translate them into protein > my $protein = $seq->translate(); > my $pseq = $protein->seq(); > if( $pseq =~ /\*/ && > $pseq !~ /\*$/ ) { > warn("provided a cDNA sequence with a stop codon, PAML will choke!"); > exit(0); > } > # Tcoffee can't handle '*' even if it is trailing > $pseq =~ s/\*//g; > $protein->seq($pseq); > push @prots, $protein; > } > > if( @prots < 2 ) { > warn("Need at least 2 cDNA sequences to proceed"); > exit(0); > } > > open(OUT, ">align_output.txt") || > die("cannot open output $output for writing"); > # Align the sequences with clustalw > > my $aa_aln = $aln_factory->align(\@prots); > > # project the protein alignment back to cDNA coordinates > my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs); > > my @each = $dna_aln->each_seq(); > > my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml > ( -params => { 'runmode' => -2, > 'seqtype' => 1, > 'model' => 1, > } > ); > > # set the alignment object > $kaks_factory->alignment($dna_aln); > > # run the KaKs analysis > my ($rc,$parser) = $kaks_factory->run(); > my $result = $parser->next_result; > my $MLmatrix = $result->get_MLmatrix(); > > my @otus = $result->get_seqs(); > # this gives us a mapping from the PAML order of sequences back to > # the input order (since names get truncated) > my @pos = map { > my $c= 1; > foreach my $s ( @each ) { > last if( $s->display_id eq $_->display_id ); > $c++; > } > $c; > } @otus; > > print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID > CDNA_PERCENTID)), "\n"; > for( my $i = 0; $i < (scalar @otus -1) ; $i++) { > for( my $j = $i+1; $j < (scalar @otus); $j++ ) { > my $sub_aa_aln = $aa_aln->select_noncont($pos[$i],$pos[$j]); > my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]); > print OUT join("\t", > $otus[$i]->display_id, > $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'}, > $MLmatrix->[$i]->[$j]->{'dS'}, > $MLmatrix->[$i]->[$j]->{'omega'}, > sprintf("%.2f",$sub_aa_aln->percentage_identity), > sprintf("%.2f",$sub_dna_aln->percentage_identity), > ), "\n"; > } > } > > > On 5/29/07, Himanshu Ardawatia wrote: > > > > Hi Xianjun, > > > > I recognize this script. But it was a bit cumbersom to use this as many > > things are done in the script (like multiple alignment, aa to dna alignment > > and ka/ks calculation) so one does not have real control on these different > > aspect. > > I do not remeber getting different Ka/Ks in different runs though. But I > > remeber that one I ran the script with different versions of clustalw and it > > REALLY gave different results !! So please make sure if the clustalw > > versions are the same in all your runs. Best is to use the latest version. > > > > Finally I wrote my simple script which would generate a codeml.ctl file > > for each set of sequences and run the codeml based on that and then more on. > > Disadvantage of this can be that some files keep getting over-written (like > > the one which have their names hard-coded in codeml program) and if one > > needs those files as well then one needs to run the codeml cycles for each > > set of sequences in different directories. > > > > One advantage of this kind of script is that you can use whichever > > alignment program you want to use and so on....But then its also extra steps > > of yourself doing multiple alignment and aa to dna alignment etc.... > > > > Does it make sense? If you still get different outputs with same version > > of clustalw then I can sit with you and look at things together. Or else try > > the script method which I mentioned. > > > > Cheers and Fu > > Himanshu > > \\ > > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote: > > > > > > HI, Himanshu > > > > > > I am sure you did some work in Ka/Ks calculation. Here I have a > > > question > > > bothering me; the output for Bio::Tools::Run::Phylo::PAML::Codeml is > > > not > > > stable(different for each runtime), and also different from the output > > > > > > with modeul of Bio::Tools::Run::Phylo::PAML::Yn00. > > > > > > Here I attached the script. Could you help to have a look and try to > > > run > > > the script? How is your way to calculate the Kaks ratio? > > > > > > Thanks > > > > > > -- > > > --------------------------- > > > Sterding (Xianjun) Dong > > > PhD student, Boris Lenhard's group > > > Bergen Center of Computational Science > > > Bergen University, Norway > > > Mobile: 0047-47361688 > > > Telephone: 0047-55276381 > > > Skype: xianjun.dong > > > > > > > > > > > > > -- > --------------------------- > Sterding (Xianjun) Dong > PhD student, Boris Lenhard's group > Bergen Center of Computational Science > Bergen University, Norway > Mobile: 0047-47361688 > Telephone: 0047-55276381 > Skype: xianjun.dong > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Xianjun.Dong at bccs.uib.no Tue May 29 09:30:09 2007 From: Xianjun.Dong at bccs.uib.no (Dong Xianjun) Date: Tue, 29 May 2007 15:30:09 +0200 Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why? In-Reply-To: <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com> References: <465AD6E8.3030707@ii.uib.no> <62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com> <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> <465C1533.6070900@ii.uib.no> <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com> Message-ID: <465C2AE1.30101@ii.uib.no> An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070529/532a333d/attachment.html From avilella at gmail.com Tue May 29 09:45:28 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 29 May 2007 14:45:28 +0100 Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why? In-Reply-To: <465C2AE1.30101@ii.uib.no> References: <465AD6E8.3030707@ii.uib.no> <62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com> <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> <465C1533.6070900@ii.uib.no> <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com> <465C2AE1.30101@ii.uib.no> Message-ID: <358f4d650705290645s65f596cbp37715f12064a5ced@mail.gmail.com> On 5/29/07, Dong Xianjun wrote: > > Thanks for information, Albert. > > But still in two questions: > Albert Vilella wrote: > > codeml in PAML can give different results in cases where the optimization > reaches different local maxima depending on the different starting points of > each run (seed values). So, at least for some methods and options, this > instability is inherent to the underlying algorithm. > > 1. How to set the initial value in order to get a reasonable estimation? > Do you have some experience for that? > People usually change the initial omega in the conf. For example, 3 runs with 0.001, 1 and 5. Even more, for some methods and options, it is even recommended in PAML > documentation to run the same data more than once, to see if the results are > the same, which would be a good indication that the model is robust given > the data. > > 2. Is there a recommend way to test the significance if the results are > different? For example, in my case, dS could range from 10.1852 to 14.9961for the four runtime. If that means the model is not robust(how to check > this?), should I change to use another model? > I would prefer PAML's author to answer this question :) How could YN00 reach stable result? (Is it because YN00 does not require > initial value for optimization?) Why could YN00 produce so different result > from Codeml? (for YN00, dS=2.1300 with SE=1.2272; for Codeml, dS= > 10.1852-14.9961) > I think Yn00 is less prone to give different local maxima than some codeml models, but then, codeml is better in giving true positives in cases where yn00 will give false negatives... Maybe PAML's author can give a more specific answer for your data at: > http://www.rannala.org/gsf/viewforum.php?f=1 > > > Actually I already post my question in the author's forum. Let's wait and > see. > Yes, I would wait for his answers, which should be way more reliable than mine :) Cheers, > > Albert. > > On 5/29/07, Dong Xianjun wrote: > > > > HI, dear all, //sorry for duplicated msg for *Jason* and *Neil* > > > > I'm bothering by two problems when I use PAML module to calculate Ka/Ks > > for my sequences. Could you help me? > > > > 1. Codeml could produce different Ka/Ks value if I run it twice. I > > check it both in command line and in Perl wrapper of > > Bio::Tools::Run::Phylo::PAML::Codeml; > > > > The input sequences are: > > >seq1 > > TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG > > >seq2 > > > > TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG > > > > For command-line program, I used Codeml in PAML3.14, with specifications > > in codeml.ctl (runmode = -2, seqtype = 1). I tried to run the program > > four times. The output are like below (from the output file). We could see > > that they are different from each other. they should be same or slightly > > different. Right? But they are NOT. Weird! > > > > ---------------------------------------------------------------------------------------------------------------------------------- > > t=11.5447 S= 42.4 N= 122.6 dN/dS= 0.0035 dN= 0.0522 dS=14.8339 > > t= 9.4132 S= 41.8 N= 123.2 dN/dS= 0.0041 dN= 0.0507 dS=12.2349 > > t=11.6305 S= 42.2 N= 122.8 dN/dS= 0.0034 dN= 0.0510 dS=14.9961 > > t= 7.7879 S= 41.4 N= 123.6 dN/dS= 0.0050 dN= 0.0505 dS=10.1852 > > > > ---------------------------------------------------------------------------------------------------------------------------------- > > I found the same problem when I use the Perl Wrapper of > > Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script here, > > similar to the one in BioPerl HOWTO). > > > > 2. Another strange thing is, if I switch to use program YN00 in the > > package of PAML, the output are stable. However, it's much different from > > Codeml. (see below) > > > > ---------------------------------------------------------------------------------------------------------------------------------- > > seq. seq. S N t kappa omega dN +- SE > > dS +- SE > > 2 1 40.4 124.6 1.7452 1.3163 0.0378 0.0804 +- 0.0265 > > 2.1300 +- 1.2272 > > > > ---------------------------------------------------------------------------------------------------------------------------------- > > Why like this? Which one I should believe? > > > > > > Is there any guy who would kindly help me to run the perl script (twice > > to check whether they are different)? or help to run the codeml in command > > line? > > I don't know whether there is anyone noticed this before, or because of > > the wrong version of PAML. > > > > Regards, > > > > Xianjun > > > > > > > > Himanshu Ardawatia wrote: > > > > #!/usr/bin/perl > > > > use strict; > > use warnings; > > > > > > use Bio::Tools::Run::Phylo::PAML::Codeml; > > use Bio::Tools::Run::Alignment::Clustalw; > > > > # for projecting alignments from protein to R/DNA space > > use Bio::Align::Utilities qw(aa_to_dna_aln); > > > > # for input of the sequence data > > use Bio::SeqIO; > > use Bio::AlignIO; > > > > my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw(); > > > > #my $seqdata = 'chuck.fa'; > > my $seqdata = 'xianjun.fa '; > > > > my $seqIO = new Bio::SeqIO(-file => $seqdata, > > -format => 'fasta'); > > my %seqs; > > my @prots; > > > > my $output; > > # process each sequence > > while( my $seq = $seqIO->next_seq ) { > > $seqs{$seq->display_id} = $seq; > > # translate them into protein > > my $protein = $seq->translate(); > > my $pseq = $protein->seq(); > > if( $pseq =~ /\*/ && > > $pseq !~ /\*$/ ) { > > warn("provided a cDNA sequence with a stop codon, PAML will > > choke!"); > > exit(0); > > } > > # Tcoffee can't handle '*' even if it is trailing > > $pseq =~ s/\*//g; > > $protein->seq($pseq); > > push @prots, $protein; > > } > > > > if( @prots < 2 ) { > > warn("Need at least 2 cDNA sequences to proceed"); > > exit(0); > > } > > > > open(OUT, ">align_output.txt") || > > die("cannot open output $output for writing"); > > # Align the sequences with clustalw > > > > my $aa_aln = $aln_factory->align(\@prots); > > > > # project the protein alignment back to cDNA coordinates > > my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs); > > > > my @each = $dna_aln->each_seq(); > > > > my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml > > ( -params => { 'runmode' => -2, > > 'seqtype' => 1, > > 'model' => 1, > > } > > ); > > > > # set the alignment object > > $kaks_factory->alignment($dna_aln); > > > > # run the KaKs analysis > > my ($rc,$parser) = $kaks_factory->run(); > > my $result = $parser->next_result; > > my $MLmatrix = $result->get_MLmatrix(); > > > > my @otus = $result->get_seqs(); > > # this gives us a mapping from the PAML order of sequences back to > > # the input order (since names get truncated) > > my @pos = map { > > my $c= 1; > > foreach my $s ( @each ) { > > last if( $s->display_id eq $_->display_id ); > > $c++; > > } > > $c; > > } @otus; > > > > print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID > > CDNA_PERCENTID)), "\n"; > > for( my $i = 0; $i < (scalar @otus -1) ; $i++) { > > for( my $j = $i+1; $j < (scalar @otus); $j++ ) { > > my $sub_aa_aln = $aa_aln->select_noncont($pos[$i],$pos[$j]); > > my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]); > > print OUT join("\t", > > $otus[$i]->display_id, > > $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'}, > > $MLmatrix->[$i]->[$j]->{'dS'}, > > $MLmatrix->[$i]->[$j]->{'omega'}, > > sprintf("%.2f",$sub_aa_aln->percentage_identity), > > sprintf("%.2f",$sub_dna_aln->percentage_identity), > > ), "\n"; > > } > > } > > > > > > On 5/29/07, Himanshu Ardawatia wrote: > > > > > > Hi Xianjun, > > > > > > I recognize this script. But it was a bit cumbersom to use this as > > > many things are done in the script (like multiple alignment, aa to dna > > > alignment and ka/ks calculation) so one does not have real control on these > > > different aspect. > > > I do not remeber getting different Ka/Ks in different runs though. But > > > I remeber that one I ran the script with different versions of clustalw and > > > it REALLY gave different results !! So please make sure if the clustalw > > > versions are the same in all your runs. Best is to use the latest version. > > > > > > Finally I wrote my simple script which would generate a codeml.ctlfile for each set of sequences and run the codeml based on that and then > > > more on. Disadvantage of this can be that some files keep getting > > > over-written (like the one which have their names hard-coded in codeml > > > program) and if one needs those files as well then one needs to run the > > > codeml cycles for each set of sequences in different directories. > > > > > > One advantage of this kind of script is that you can use whichever > > > alignment program you want to use and so on....But then its also extra steps > > > of yourself doing multiple alignment and aa to dna alignment etc.... > > > > > > Does it make sense? If you still get different outputs with same > > > version of clustalw then I can sit with you and look at things together. Or > > > else try the script method which I mentioned. > > > > > > Cheers and Fu > > > Himanshu > > > \\ > > > On 5/28/07, Dong Xianjun < Xianjun.Dong at bccs.uib.no> wrote: > > > > > > > > HI, Himanshu > > > > > > > > I am sure you did some work in Ka/Ks calculation. Here I have a > > > > question > > > > bothering me; the output for Bio::Tools::Run::Phylo::PAML::Codeml is > > > > not > > > > stable(different for each runtime), and also different from the > > > > output > > > > with modeul of Bio::Tools::Run::Phylo::PAML::Yn00. > > > > > > > > Here I attached the script. Could you help to have a look and try to > > > > run > > > > the script? How is your way to calculate the Kaks ratio? > > > > > > > > Thanks > > > > > > > > -- > > > > --------------------------- > > > > Sterding (Xianjun) Dong > > > > PhD student, Boris Lenhard's group > > > > Bergen Center of Computational Science > > > > Bergen University, Norway > > > > Mobile: 0047-47361688 > > > > Telephone: 0047-55276381 > > > > Skype: xianjun.dong > > > > > > > > > > > > > > > > > > > -- > > --------------------------- > > Sterding (Xianjun) Dong > > PhD student, Boris Lenhard's group > > Bergen Center of Computational Science > > Bergen University, Norway > > Mobile: 0047-47361688 > > Telephone: 0047-55276381 > > > > Skype: xianjun.dong > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > --------------------------- > Sterding (Xianjun) Dong > PhD student, Boris Lenhard's group > Bergen Center of Computational Science > Bergen University, Norway > Mobile: 0047-47361688 > Telephone: 0047-55276381 > Skype: xianjun.dong > > From roy at colibase.bham.ac.uk Tue May 29 10:05:12 2007 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Tue, 29 May 2007 15:05:12 +0100 Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why? In-Reply-To: <465C1533.6070900@ii.uib.no> References: <465AD6E8.3030707@ii.uib.no> <62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com> <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> <465C1533.6070900@ii.uib.no> Message-ID: <465C3318.5080201@colibase.bham.ac.uk> Hi Xianjun, I'm not sure if it is the cause of your problem, but your sequences seem to be quite short. This paper: http://mbe.oxfordjournals.org/cgi/content/full/21/12/2290 suggests that the codeml method of calculating Ka and Ks may be unreliable for sequences shorter than 300 codons. Roy. -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. From gbr0wn at comcast.net Wed May 30 11:44:13 2007 From: gbr0wn at comcast.net (gbr0wn at comcast.net) Date: Wed, 30 May 2007 15:44:13 +0000 Subject: [Bioperl-l] getting started in windows Message-ID: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070530/2f640e16/attachment.pl From golharam at umdnj.edu Wed May 30 11:40:28 2007 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed, 30 May 2007 11:40:28 -0400 Subject: [Bioperl-l] ClustalW Score? Message-ID: <00c201c7a2d0$d971f550$2d01a8c0@PICO> How do I get the clustalw score from a clustalw alignment? I'm using the following code to align my sequences: $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); $seq[0] = ... $seq[1] = ... $seq[2] = ... $seq[3] = ... $aln = $aln_factory->align(\@seq); I can get the percentage identity from the Bio::SimpleAlign object, but there is no score. I looked into it further and it doesn't look like the score is being captured anywhere. So, how does one get the score from ClustalW using this method? Ryan From barry.moore at genetics.utah.edu Wed May 30 12:21:16 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 30 May 2007 10:21:16 -0600 Subject: [Bioperl-l] getting started in windows In-Reply-To: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net> References: <053020071544.12576.465D9BCD000342B80000312022070210530299CF9D0D09@comcast.net> Message-ID: Try opening up a terminal window (I think you'll find that under accessories). Change to the directory where you code is and run it off the command line. B On May 30, 2007, at 9:44 AM, gbr0wn at comcast.net wrote: > I am a perl novice trying to run perl 5.8.8 on windows xp system. > I have used 'wordpad' to paste tutorial code into an executable > file and when I double click the icon for the file a window opens > up briefly with output and/or error message but closes too fast for > me to read. Any idea why this might be happening? > Thanks, Greg Brown - gbr0wn at comcast.net > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed May 30 13:16:49 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 30 May 2007 10:16:49 -0700 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: <00c201c7a2d0$d971f550$2d01a8c0@PICO> References: <00c201c7a2d0$d971f550$2d01a8c0@PICO> Message-ID: <1A4207F8295607498283FE9E93B775B403349DAB@EX02.asurite.ad.asu.edu> > How do I get the clustalw score from a clustalw alignment? > I'm using the following code to align my sequences: > > $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > > $seq[0] = ... > $seq[1] = ... > $seq[2] = ... > $seq[3] = ... > > $aln = $aln_factory->align(\@seq); > > I can get the percentage identity from the Bio::SimpleAlign > object, but there is no score. I looked into it further and > it doesn't look like the score is being captured anywhere. > So, how does one get the score from ClustalW using this method? open(OUTCOPY, ">&STDOUT") or die "Couldn't dup STDOUT: $!"; open(STDOUT, ">log.test") or die "Couldn't open log.test: $!"; push @aln, $factory->align(\@seq); close STDOUT; open(STDOUT, ">&OUTCOPY"); open(TEMP, "log.test"); while () { if ($_ =~ /Score:(\d+)/) { $aln->score($1); print "Found score of $1\n"; } } close TEMP; unlink("log.test"); From jason at bioperl.org Wed May 30 14:54:20 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 30 May 2007 11:54:20 -0700 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: <00e201c7a2de$91f60f50$2d01a8c0@PICO> References: <00e201c7a2de$91f60f50$2d01a8c0@PICO> Message-ID: You can do it without redirecting STDOUT or creating a new file, just change the system call to: Here is the code for running in _run in the module: my $commandstring = $self->executable."$instring"."$param_string"; $self->debug( "clustal command = $commandstring"); my $status = system($commandstring); unless( $status == 0 ) { $self->warn( "Clustalw call ($commandstring) crashed: $? \n"); return undef; } Do something like: my $fh; open($fh, "$commandstring |"); my $score; while(<$fh>) { $score = $1 if ($_ =~ /Score:(\d+)/); } close($fh); ... then at the bottom after the alignment is created do: $aln->score($score); There may be some more debugging b/c if you invoke the quiet => 1 parameter there may be an automatic ">& /dev/null" appended to the end of the parameter string that you'll need to figure out how to override. Sorry I don't have more time to help; I hope this gets you started. -jason On May 30, 2007, at 10:18 AM, Ryan Golhar wrote: > Did you see Kevin's response? That's one possible solution that > could be > implemented... > > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of > Jason > Stajich > Sent: Wednesday, May 30, 2007 12:05 PM > To: golharam at umdnj.edu > Subject: Re: [Bioperl-l] ClustalW Score? > > > Nope it isn't parsed since it is part of the STDOUT from the > program not the > alignment. If you want to add parsing of the STDOUT from Clustalw > someone > will need to refactor how the program is run and capture and parse the > STDOUT. The score can be added to the score field of the > SimpleAlign object, > but again since there is no where for it to be stored in a clustalw > alignment file it won't be round tripped anywhere. I think > stockholm will > manage it for you though. > > Do you know what the score represents - can it be computed from the > alignment itsself? > > -jason > > On May 30, 2007, at 8:40 AM, Ryan Golhar wrote: > > > How do I get the clustalw score from a clustalw alignment? I'm > using the > following code to align my sequences: > > $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > > $seq[0] = ... > $seq[1] = ... > $seq[2] = ... > $seq[3] = ... > > $aln = $aln_factory->align(\@seq); > > I can get the percentage identity from the Bio::SimpleAlign object, > but > there is no score. I looked into it further and it doesn't look > like the > score is being captured anywhere. So, how does one get the score from > ClustalW using this method? > > Ryan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From Kevin.M.Brown at asu.edu Wed May 30 15:52:01 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 30 May 2007 12:52:01 -0700 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: References: <00e201c7a2de$91f60f50$2d01a8c0@PICO> Message-ID: <1A4207F8295607498283FE9E93B775B403349E4D@EX02.asurite.ad.asu.edu> > You can do it without redirecting STDOUT or creating a new > file, just change the system call to: > > Here is the code for running in _run in the module: > my $commandstring = $self->executable."$instring"."$param_string"; > $self->debug( "clustal command = $commandstring"); > my $status = system($commandstring); > unless( $status == 0 ) { > $self->warn( "Clustalw call ($commandstring) crashed: $? > \n"); > return undef; > } > > Do something like: > > my $fh; > open($fh, "$commandstring |"); > my $score; > while(<$fh>) { > $score = $1 if ($_ =~ /Score:(\d+)/); } close($fh); > > ... then at the bottom after the alignment is created do: > > $aln->score($score); > > > There may be some more debugging b/c if you invoke the quiet > => 1 parameter there may be an automatic ">& /dev/null" > appended to the end of the parameter string that you'll need > to figure out how to override. > > Sorry I don't have more time to help; I hope this gets you started. I did it my way as I was doing it without modifying the Bioperl code (in case I later updated to a new version and forgot about the changes I had put into it). So that code just sits in my perl script where it calls the Bioperl module to create the Clustal alignment object. From Xianjun.Dong at bccs.uib.no Tue May 29 11:02:21 2007 From: Xianjun.Dong at bccs.uib.no (Dong Xianjun) Date: Tue, 29 May 2007 17:02:21 +0200 Subject: [Bioperl-l] PAML::Codeml outputs unstable value, why? In-Reply-To: <465C2F8E.2070309@ed.ac.uk> References: <465AD6E8.3030707@ii.uib.no> <62d36e2b0705290125x32b3fbdascfb1cedaacc8a1a0@mail.gmail.com> <62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com> <465C1533.6070900@ii.uib.no> <358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com> <465C2AE1.30101@ii.uib.no> <465C2F8E.2070309@ed.ac.uk> Message-ID: <465C407D.608@ii.uib.no> HI, Darren The sequences are from Human and zebrafish. I currently use two sequences. I just want to see what's the substitution pattern there is. But your comment remind me whether I should get the other species involved, like mouse, chicken. BTW, what's you mean 'per codon, not per site'? Do you mean the Ds(Ks) of Codeml is for per codon, and Yn00 is for per site? I think there should be a possible/reasonable way to calculate the synonymous substitution, even if the divergence is big enough. If the Codeml is not a good solution for that case, do you have better suggestion? Thanks Xianjun Darren Obbard wrote: > Out of interest, what are the species, and how much sequence are you > using? > > - Estimating Ds when it is >>1 is very hard anyway, since the > substitutions are saturated. i.e. Regardless of the method, there will > be some level of divergence for which Ds can no longer be estimated. A > Ds of ~14 (for PAML I think this is per codon, not per site) sounds > very high to me - higher than I would want to try to estimate Ds. > > Dong Xianjun wrote: >> Thanks for information, Albert. >> >> But still in two questions: >> Albert Vilella wrote: >>> codeml in PAML can give different results in cases where the >>> optimization reaches different local maxima depending on the >>> different starting points of each run (seed values). So, at least >>> for some methods and options, this instability is inherent to the >>> underlying algorithm. >> 1. How to set the initial value in order to get a reasonable >> estimation? Do you have some experience for that? >>> Even more, for some methods and options, it is even recommended in >>> PAML documentation to run the same data more than once, to see if >>> the results are the same, which would be a good indication that the >>> model is robust given the data. >> 2. Is there a recommend way to test the significance if the results >> are different? For example, in my case, dS could range from 10.1852 >> to 14.9961 for the four runtime. If that means the model is not >> robust(how to check this?), should I change to use another model? >> >> How could YN00 reach stable result? (Is it because YN00 does not >> require initial value for optimization?) Why could YN00 produce so >> different result from Codeml? (for YN00, dS=2.1300 with SE=1.2272; >> for Codeml, dS=10.1852-14.9961) >>> Maybe PAML's author can give a more specific answer for your data at: >>> http://www.rannala.org/gsf/viewforum.php?f=1 >> >> Actually I already post my question in the author's forum. Let's wait >> and see. >>> >>> Cheers, >>> >>> Albert. >>> >>> On 5/29/07, *Dong Xianjun* >> > wrote: >>> >>> HI, dear all, //sorry for duplicated msg for /Jason/ and /Neil/ >>> >>> I'm bothering by two problems when I use PAML module to calculate >>> Ka/Ks for my sequences. Could you help me? >>> >>> 1. Codeml could produce different Ka/Ks value if I run it twice. >>> I check it both in command line and in Perl wrapper of >>> Bio::Tools::Run::Phylo::PAML::Codeml; >>> >>> The input sequences are: >>> >seq1 >>> >>> TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG >>> >>> >seq2 >>> >>> TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG >>> >>> >>> For command-line program, I used Codeml in PAML3.14, with >>> specifications in codeml.ctl (runmode = -2, seqtype = 1). I tried >>> to run the program four times. The output are like below (from >>> the output file). We could see that they are different from each >>> other. they should be same or slightly different. Right? But they >>> are NOT. Weird! >>> >>> ---------------------------------------------------------------------------------------------------------------------------------- >>> >>> t=11.5447 S= 42.4 N= 122.6 dN/dS= 0.0035 dN= 0.0522 >>> dS=14.8339 >>> t= 9.4132 S= 41.8 N= 123.2 dN/dS= 0.0041 dN= 0.0507 >>> dS=12.2349 >>> t=11.6305 S= 42.2 N= 122.8 dN/dS= 0.0034 dN= 0.0510 >>> dS=14.9961 >>> t= 7.7879 S= 41.4 N= 123.6 dN/dS= 0.0050 dN= 0.0505 >>> dS=10.1852 >>> >>> ---------------------------------------------------------------------------------------------------------------------------------- >>> >>> I found the same problem when I use the Perl Wrapper of >>> Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script >>> here, similar to the one in BioPerl HOWTO). >>> >>> 2. Another strange thing is, if I switch to use program YN00 in >>> the package of PAML, the output are stable. However, it's much >>> different from Codeml. (see below) >>> >>> ---------------------------------------------------------------------------------------------------------------------------------- >>> >>> seq. seq. S N t kappa omega dN +- SE >>> dS +- SE >>> 2 1 40.4 124.6 1.7452 1.3163 0.0378 0.0804 +- >>> 0.0265 2.1300 +- 1.2272 >>> >>> ---------------------------------------------------------------------------------------------------------------------------------- >>> >>> Why like this? Which one I should believe? >>> >>> >>> Is there any guy who would kindly help me to run the perl script >>> (twice to check whether they are different)? or help to run the >>> codeml in command line? >>> I don't know whether there is anyone noticed this before, or >>> because of the wrong version of PAML. >>> >>> Regards, >>> >>> Xianjun >>> >>> >>> >>> Himanshu Ardawatia wrote: >>>> #!/usr/bin/perl >>>> >>>> use strict; >>>> use warnings; >>>> >>>> >>>> use Bio::Tools::Run::Phylo::PAML::Codeml; >>>> use Bio::Tools::Run::Alignment::Clustalw; >>>> >>>> # for projecting alignments from protein to R/DNA space >>>> use Bio::Align::Utilities qw(aa_to_dna_aln); >>>> >>>> # for input of the sequence data >>>> use Bio::SeqIO; >>>> use Bio::AlignIO; >>>> >>>> my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw(); >>>> >>>> #my $seqdata = 'chuck.fa'; >>>> my $seqdata = 'xianjun.fa '; >>>> >>>> my $seqIO = new Bio::SeqIO(-file => $seqdata, >>>> -format => 'fasta'); >>>> my %seqs; >>>> my @prots; >>>> >>>> my $output; >>>> # process each sequence >>>> while( my $seq = $seqIO->next_seq ) { >>>> $seqs{$seq->display_id} = $seq; >>>> # translate them into protein >>>> my $protein = $seq->translate(); >>>> my $pseq = $protein->seq(); >>>> if( $pseq =~ /\*/ && >>>> $pseq !~ /\*$/ ) { >>>> warn("provided a cDNA sequence with a stop codon, PAML will >>>> choke!"); >>>> exit(0); >>>> } >>>> # Tcoffee can't handle '*' even if it is trailing >>>> $pseq =~ s/\*//g; >>>> $protein->seq($pseq); >>>> push @prots, $protein; >>>> } >>>> >>>> if( @prots < 2 ) { >>>> warn("Need at least 2 cDNA sequences to proceed"); >>>> exit(0); >>>> } >>>> >>>> open(OUT, ">align_output.txt") || >>>> die("cannot open output $output for writing"); >>>> # Align the sequences with clustalw >>>> >>>> my $aa_aln = $aln_factory->align(\@prots); >>>> >>>> # project the protein alignment back to cDNA coordinates >>>> my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs); >>>> >>>> my @each = $dna_aln->each_seq(); >>>> >>>> my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml >>>> ( -params => { 'runmode' => -2, >>>> 'seqtype' => 1, >>>> 'model' => 1, >>>> } >>>> ); >>>> >>>> # set the alignment object >>>> $kaks_factory->alignment($dna_aln); >>>> >>>> # run the KaKs analysis >>>> my ($rc,$parser) = $kaks_factory->run(); >>>> my $result = $parser->next_result; >>>> my $MLmatrix = $result->get_MLmatrix(); >>>> >>>> my @otus = $result->get_seqs(); >>>> # this gives us a mapping from the PAML order of sequences back to >>>> # the input order (since names get truncated) >>>> my @pos = map { >>>> my $c= 1; >>>> foreach my $s ( @each ) { >>>> last if( $s->display_id eq $_->display_id ); >>>> $c++; >>>> } >>>> $c; >>>> } @otus; >>>> >>>> print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID >>>> CDNA_PERCENTID)), "\n"; >>>> for( my $i = 0; $i < (scalar @otus -1) ; $i++) { >>>> for( my $j = $i+1; $j < (scalar @otus); $j++ ) { >>>> my $sub_aa_aln = $aa_aln->select_noncont($pos[$i],$pos[$j]); >>>> my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]); >>>> print OUT join("\t", $otus[$i]->display_id, >>>> >>>> $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'}, >>>> $MLmatrix->[$i]->[$j]->{'dS'}, >>>> $MLmatrix->[$i]->[$j]->{'omega'}, >>>> sprintf("%.2f",$sub_aa_aln->percentage_identity), >>>> sprintf("%.2f",$sub_dna_aln->percentage_identity), >>>> ), "\n"; >>>> } >>>> } >>>> >>>> >>>> On 5/29/07, *Himanshu Ardawatia* >>> > wrote: >>>> >>>> Hi Xianjun, >>>> >>>> I recognize this script. But it was a bit cumbersom to use >>>> this as many things are done in the script (like multiple >>>> alignment, aa to dna alignment and ka/ks calculation) so one >>>> does not have real control on these different aspect. >>>> I do not remeber getting different Ka/Ks in different runs >>>> though. But I remeber that one I ran the script with >>>> different versions of clustalw and it REALLY gave different >>>> results !! So please make sure if the clustalw versions are >>>> the same in all your runs. Best is to use the latest version. >>>> >>>> Finally I wrote my simple script which would generate a >>>> codeml.ctl file for each set of sequences and run the codeml >>>> based on that and then more on. Disadvantage of this can be >>>> that some files keep getting over-written (like the one >>>> which have their names hard-coded in codeml program) and if >>>> one needs those files as well then one needs to run the >>>> codeml cycles for each set of sequences in different >>>> directories. >>>> >>>> One advantage of this kind of script is that you can use >>>> whichever alignment program you want to use and so on....But >>>> then its also extra steps of yourself doing multiple >>>> alignment and aa to dna alignment etc.... >>>> >>>> Does it make sense? If you still get different outputs with >>>> same version of clustalw then I can sit with you and look at >>>> things together. Or else try the script method which I >>>> mentioned. >>>> >>>> Cheers and Fu >>>> Himanshu >>>> \\ >>>> >>>> On 5/28/07, *Dong Xianjun* < Xianjun.Dong at bccs.uib.no >>>> > wrote: >>>> >>>> HI, Himanshu >>>> >>>> I am sure you did some work in Ka/Ks calculation. Here I >>>> have a question >>>> bothering me; the output for >>>> Bio::Tools::Run::Phylo::PAML::Codeml is not >>>> stable(different for each runtime), and also different >>>> from the output >>>> with modeul of Bio::Tools::Run::Phylo::PAML::Yn00. >>>> >>>> Here I attached the script. Could you help to have a >>>> look and try to run >>>> the script? How is your way to calculate the Kaks ratio? >>>> >>>> Thanks >>>> >>>> -- >>>> --------------------------- >>>> Sterding (Xianjun) Dong >>>> PhD student, Boris Lenhard's group >>>> Bergen Center of Computational Science >>>> Bergen University, Norway >>>> Mobile: 0047-47361688 >>>> Telephone: 0047-55276381 >>>> Skype: xianjun.dong >>>> >>>> >>>> >>>> >>> >>> -- --------------------------- >>> Sterding (Xianjun) Dong >>> PhD student, Boris Lenhard's group >>> Bergen Center of Computational Science >>> Bergen University, Norway >>> Mobile: 0047-47361688 >>> Telephone: 0047-55276381 >>> >>> Skype: xianjun.dong >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> --------------------------- >> Sterding (Xianjun) Dong >> PhD student, Boris Lenhard's group >> Bergen Center of Computational Science >> Bergen University, Norway >> Mobile: 0047-47361688 >> Telephone: 0047-55276381 >> Skype: xianjun.dong >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- --------------------------- Sterding (Xianjun) Dong PhD student, Boris Lenhard's group Bergen Center of Computational Science Bergen University, Norway Mobile: 0047-47361688 Telephone: 0047-55276381 Skype: xianjun.dong From bix at sendu.me.uk Thu May 31 04:34:38 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 31 May 2007 09:34:38 +0100 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: References: <00e201c7a2de$91f60f50$2d01a8c0@PICO> Message-ID: <465E889E.3090304@sendu.me.uk> Jason Stajich wrote: > Do something like: > > my $fh; > open($fh, "$commandstring |"); > my $score; > while(<$fh>) { > $score = $1 if ($_ =~ /Score:(\d+)/); > } > close($fh); > > ... then at the bottom after the alignment is created do: > > $aln->score($score); > > > There may be some more debugging b/c if you invoke the quiet => 1 > parameter there may be an automatic ">& /dev/null" appended to the > end of the parameter string that you'll need to figure out how to > override. Is there any particular reason for not having something along these lines committed to the module? Shall I go ahead and implement? From bix at sendu.me.uk Thu May 31 05:54:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 31 May 2007 10:54:32 +0100 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: References: <00e201c7a2de$91f60f50$2d01a8c0@PICO> Message-ID: <465E9B58.1020403@sendu.me.uk> Jason Stajich wrote: > $score = $1 if ($_ =~ /Score:(\d+)/); I see that there are lots of lines in the output that match the above regex, but there is also a single /Alignment Score (\d+)/ line printed at the end. Isn't that the score that should get stored in $aln->score()? From jason at bioperl.org Thu May 31 14:08:19 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 31 May 2007 11:08:19 -0700 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: <465E9B58.1020403@sendu.me.uk> References: <00e201c7a2de$91f60f50$2d01a8c0@PICO> <465E9B58.1020403@sendu.me.uk> Message-ID: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org> you're right --- it is not really my code, I was just elaborating Kevin's example --- it would probably need to be more specific or perhaps the last Score seen is sufficient for what one is trying to capture? -j On May 31, 2007, at 2:54 AM, Sendu Bala wrote: > Jason Stajich wrote: >> $score = $1 if ($_ =~ /Score:(\d+)/); > > I see that there are lots of lines in the output that match the > above regex, but there is also a single /Alignment Score (\d+)/ > line printed at the end. Isn't that the score that should get > stored in $aln->score()? > -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From Kevin.M.Brown at asu.edu Thu May 31 14:15:38 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 31 May 2007 11:15:38 -0700 Subject: [Bioperl-l] ClustalW Score? In-Reply-To: <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org> References: <00e201c7a2de$91f60f50$2d01a8c0@PICO><465E9B58.1020403@sendu.me.uk> <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org> Message-ID: <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu> > you're right --- it is not really my code, I was just > elaborating Kevin's example --- it would probably need to be > more specific or perhaps the last Score seen is sufficient > for what one is trying to capture? I took that code from a pairwise clustal alignment script that I wrote to deal with aligning a bunch of short sequences against a long one to see where they line up at. When all of them were fed to Clustal the short sequences all ended up aligned to each other and not well aligned to the longer sequence. I only saw one score in the output from the pairwise, so that is what I used to find a reasonable value.