From grossman at molgen.mpg.de Mon Nov 1 03:19:11 2004
From: grossman at molgen.mpg.de (Steffen Grossmann)
Date: Mon Nov 1 03:17:49 2004
Subject: [Bioperl-l] Adapt source method in Bio::SeqFeature::Annotated
Message-ID: <4185F17F.7070901@molgen.mpg.de>

Dear all,

as a follow-up to my last message
(http://bioperl.org/pipermail/bioperl-l/2004-October/017178.html), I
propose to adapt the 'source' method in Bio::SeqFeature::Annotated so
that it uses the annotation system. This is mainly for convenience, but
it also avoids the trouble caused by storing a 'source' annotation in
two different places.

I provide a patch below.

Steffen

Bio_SeqFeature_Annotated.diff
---8<---8<---8<---8<---8<---8<---8<---8<---cut here
*** bioperl-live/Bio/SeqFeature/Annotated.pm Mon Oct 25 11:08:25 2004
--- modified_bioperl-live/Bio/SeqFeature/Annotated.pm Mon Nov 1 09:06:09 2004
***************
*** 347,355 ****
  
  sub source {
    my $self = shift;
  
!   return $self->{'source'} = shift if @_;
!   return $self->{'source'};
  }
  
  
--- 347,367 ----
  
  sub source {
    my $self = shift;
+   my $source = shift;
  
!   if ($source) {
!     $self->annotation->remove_Annotations('source');
!     $self->annotation->add_Annotation(Bio::Annotation::SimpleValue->new(-value => $source,
!                                                                         -tagname => 'source')
!                                      );
!   } else {
!     if (my ($sa) = $self->annotation->get_Annotations('source')) {
!       $source = $sa->value;
!     } else {
!       $source = "";
!     }
!   }
!   return $source;
  }
  
  
---8<---8<---8<---8<---8<---8<---8<---8<---cut here

-- 
%---------------------------------------------%
% Steffen Grossmann                           %
%                                             %
% Max Planck Institute for Molecular Genetics %
% Computational Molecular Biology             %
%---------------------------------------------%
% Ihnestrasse 73                              %
% 14195 Berlin                                %
% Germany                                     %
%---------------------------------------------%
% Tel: (++49 +30) 8413-1167                   %
% Fax: (++49 +30) 8413-1152                   %
%---------------------------------------------%

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From nathanhaigh at ukonline.co.uk Mon Nov 1 03:37:22 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Mon Nov 1 03:36:13 2004
Subject: [Bioperl-l] nmake test
In-Reply-To:
Message-ID: <000001c4bfee$085b9d80$9ef4cdd9@Desktop>

Hmm, this seems to be a problem associated with CVSnt, which I used to
obtain the latest BioPerl from CVS - I got fed up with having to
download the entire BioPerl package as a tarball for small updates.

The latest tarball tested without errors; I'm now trying CVSnt again to
see if I can reproduce the problem! On the first attempt I noticed that
several of the BioPerl files contained '?' at the end of every line, so
I suspected some mix-up in the end-of-line characters; but replacing
these files with the relevant ones from the website did not rectify the
problem (unless I missed some).

I'll post again if I can replicate the problem, in case it is any help
to someone else. But as far as I can see the tarball seems to test
without errors!

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh
> Sent: 31 October 2004 11:36
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] nmake test
>
> Hi, just to let you know the latest BioPerl from CVS is now giving the following errors on WinXP, and I am trying to debug them:
>
> Failed Test    Stat Wstat Total Fail  Failed  List of Failed
> -----------------------------------------------------------
> t\Index.t       255 65280    41    0   0.00%  ??
> t\Pictogram.t   255 65280     3    3 100.00%  1-3
> t\ProtPsm.t     255 65280     5    5 100.00%  1-5
> t\flat.t        255 65280    16    3  18.75%  14-16
> t\psm.t         255 65280    48   48 100.00%  1-48
> 2 subtests skipped.
>
> Nathan

From nathanhaigh at ukonline.co.uk Mon Nov 1 05:23:12 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Mon Nov 1 05:21:59 2004
Subject: [Bioperl-l] BioPerl CVS EOL Inconsistency
In-Reply-To: <000001c4bfee$085b9d80$9ef4cdd9@Desktop>
Message-ID:

Ok, here's what I think I've found! If anyone knows more about this
than I do, please feel free to interject!

I have found that the files stored in BioPerl CVS have different
end-of-line chars, and when CVSnt does conversions on these chars, it
results in odd end-of-line chars, which I assume were affecting my
previous "nmake test" using the files downloaded by CVSnt.

As an example, if you look at Bio::SeqIO::Interpro, this file contains
the EOL char "0D 0A" (Hex), which is the DOS EOL char (CR LF). When
CVSnt downloads this file, it converts it to "0D 0D 0A" (Hex), which is
not a standard EOL char (it is actually CR CR LF). Bio::SeqIO::GenBank,
by contrast, contains the EOL char "0A" (Hex), which is the standard
UNIX EOL char (LF). CVSnt converts this correctly into "0D 0A" (Hex),
which is the DOS EOL char (CR LF).

I'm not sure how these different EOL chars occur, because as far as I
understand CVS, a CVS client should convert the EOL chars to those used
by its OS when the files are downloaded, and then convert them back to
those used on the CVS server during a commit. I think the only way this
mixing of EOL chars can happen is if someone uses a different CVS
client or OS to commit the files.

On second thoughts, it is probably due to the fact that people submit
patched files from several OSes to BioPerl for CVS commits, but these
files do not get converted from DOS/Mac to Unix before the commits.

Is there anything that can be done to bring all the EOL chars back into
line, as I'd really like to be able to use CVS on Windows!?

Thanks
Nathan

From nsh104 at york.ac.uk Mon Nov 1 10:32:10 2004
From: nsh104 at york.ac.uk (Nathan Haigh)
Date: Mon Nov 1 10:30:37 2004
Subject: [Bioperl-l] BioPerl CVS EOL Inconsistency
References:
Message-ID: <418656FA.8020603@york.ac.uk>

To clarify what I think the problem is, I've taken snippets from the
following page. See the explanation at
http://www.tortoisecvs.org/faq.html#brokenlineendings

Essentially, the BioPerl CVS server should store files with the UNIX
style end-of-line (EOL) char (LF). When files are checked out/committed
with a Windows-based CVS client (CVSnt), EOL conversion takes place
(LF <-> CR LF) automatically. However, if you try to commit DOS files
with a non-Windows CVS client, this conversion doesn't occur, so the
CVS server now has files that contain CR LF EOL chars. This causes the
following problems:

* If you check out the file on UNIX (where no conversion is
  performed), the line endings will be CR LF and not LF, which is
  wrong.
* If you check out the file on Windows using CVSNT, it will convert
  each LF to a CR LF to set the correct line endings for Windows.
  Unfortunately, the server copy of the file already *had* a CR before
  the LF. As a result, the local line endings will be CR CR LF - the
  file is totally screwed up!

General rules for avoiding this:

* It's highly recommended not to access your sandbox (that was checked
  out with TortoiseCVS, CVSNT or WinCVS) using other CVS clients,
  especially not with UNIX-style CVS clients (like Linux's or
  Cygwin's) because of line ending incompatibilities.
* Furthermore, you have to be careful when you've checked out a module
  using UNIX line endings. You must not edit those files with a
  Windows text editor that overwrites the UNIX line endings with DOS
  line endings, or else you'll eventually get that additional CR at
  the end of each line.

Therefore, the affected files in BioPerl CVS need to be run through
dos2unix and recommitted - I could supply a list of the files that are
affected (up to around 50 files) if someone wanted to commit them!

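The failure mode described above can be reproduced in a few lines of
plain Perl. The sketch below is only an illustration: the helper names
checkout_on_windows, eol_style and to_unix are hypothetical and are not
part of CVS, CVSNT or BioPerl. It assumes a Windows client that blindly
rewrites every LF as CR LF on checkout. Under that assumption a file
stored with LF comes out as CR LF, a file stored with CR LF comes out
as CR CR LF, and a dos2unix-style repair of the server copy makes both
check out as CR LF again.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mimic a Windows CVS client on checkout, which
# (by assumption) rewrites every LF as CR LF without looking at the
# byte that precedes it.
sub checkout_on_windows {
    my ($server_bytes) = @_;
    ( my $local = $server_bytes ) =~ s/\x0A/\x0D\x0A/g;
    return $local;
}

# Hypothetical helper: name the worst line ending found in a string.
sub eol_style {
    my ($bytes) = @_;
    return 'CR CR LF' if $bytes =~ /\x0D\x0D\x0A/;
    return 'CR LF'    if $bytes =~ /\x0D\x0A/;
    return 'LF'       if $bytes =~ /\x0A/;
    return 'none';
}

# Hypothetical helper: dos2unix-style repair, collapsing any run of
# CRs in front of an LF into a bare LF.
sub to_unix {
    my ($bytes) = @_;
    ( my $fixed = $bytes ) =~ s/\x0D+\x0A/\x0A/g;
    return $fixed;
}

my $good_on_server = "line one\x0Aline two\x0A";            # stored with LF
my $bad_on_server  = "line one\x0D\x0Aline two\x0D\x0A";    # committed with CR LF

for my $case ( [ 'LF on server', $good_on_server ],
               [ 'CR LF on server', $bad_on_server ] ) {
    my ( $label, $bytes ) = @$case;
    printf "%-16s checkout: %-9s after repair: %s\n",
        $label,
        eol_style( checkout_on_windows($bytes) ),
        eol_style( checkout_on_windows( to_unix($bytes) ) );
}
```

The repair step is the reason for running the affected files through
dos2unix before recommitting: once the server copy holds bare LF again,
the client-side conversion produces ordinary CR LF.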
Nathan

From cain at cshl.org Mon Nov 1 11:52:41 2004
From: cain at cshl.org (Scott Cain)
Date: Mon Nov 1 11:51:09 2004
Subject: [Bioperl-l] about Bio::Annotation::Collection
Message-ID: <1099327961.1542.18.camel@localhost.localdomain>

Hello all,

I am trying to flesh out Bio::FeatureIO::gff to handle Target strings in
GFF3 files. What I would like to do is to create an array of
Bio::Location::Simple objects to represent the (potentially more than
one) Target strings, and then add them to the annotations for the
line, presumably as a Bio::Annotation::Collection. The thing is, I have
no idea how B::A::C works. Here is the documentation for the method
'add_Annotation', which is what I would want to use.
Note the mention of an archetype without defining it (though I think it
refers to the last line in the Usage section):

  add_Annotation

   Title   : add_Annotation
   Usage   : $self->add_Annotation('reference',$object);
             $self->add_Annotation($object,'Bio::MyInterface::DiseaseI');
             $self->add_Annotation($object);
             $self->add_Annotation('disease',$object,'Bio::MyInterface::DiseaseI');
   Function: Adds an annotation for a specific key.

             If the key is omitted, the object to be added must provide a value
             via its tagname().

             If the archetype is provided, this and future objects added under
             that tag have to comply with the archetype and will be rejected
             otherwise.

   Returns : none
   Args    : annotation key ('disease', 'dblink', ...)
             object to store (must be Bio::AnnotationI compliant)
             [optional] object archetype to map future storage of object
                        of these types to

Here is the section of code from Bio::FeatureIO::gff where I would like
to use the Target string; my approach certainly seems to conflict with
what Bio::Annotation would like, but it is not clear to me how to use it
in this context.

  if($attr{Target}){
    foreach my $target_string (@{ $attr{Target} } ) {
      $target_string =~ s/\+/ /g;
      my ($t_id,$tstart,$tend,$strand,$extra) = split /\s+/, $target_string;
      if (!$tend || $extra) { # too much or too little stuff in the string
        $self->throw("The value in the Target string, $target_string, does not conform to the GFF3 specification");
      }
      my $target_loc = Bio::Location::Simple->new(
          -seq_id => $t_id,
          -start  => $tstart,
          -end    => $tend,
      );

      if ($strand eq '+') {
        $strand = 1;
      } elsif ($strand eq '-') {
        $strand = -1;
      }
      $target_loc->strand($strand) if $strand;
      $target_loc->is_remote(1);

      $self->target($target_loc);
    }
    $ac->add_Annotation('Target',$self->target());
  }

... and ...

=head2 target

 Title   : target
 Usage   : $obj->target($newval)
 Function: Either return an array ref with Bio::LocationI objects
           representing the targets, or to add a target to the
           internal target list
 Example : my @targets = $obj->target();
           $obj->target($newtarget);
 Returns : A list of Bio::LocationI objects
 Args    : On set, a Bio::LocationI object

=cut

sub target {
  my $self = shift;
  # "if @_" rather than "if defined(@_)": defined() on an array is
  # rejected by current versions of Perl
  push @{$self->{'target'}}, shift if @_;
  return \@{$self->{'target'}};
}

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From grossman at molgen.mpg.de Mon Nov 1 12:49:17 2004
From: grossman at molgen.mpg.de (Steffen Grossmann)
Date: Mon Nov 1 12:48:03 2004
Subject: [Bioperl-l] about Bio::Annotation::Collection
In-Reply-To: <1099327961.1542.18.camel@localhost.localdomain>
References: <1099327961.1542.18.camel@localhost.localdomain>
Message-ID: <4186771D.2050609@molgen.mpg.de>

Dear Scott,

to make it work you have to write something like

  my $ta = Bio::Annotation::SimpleValue->new(-value => $target_loc);
  $ac->add_Annotation('Target',$ta);

Do this for every target entry you want to add. You can then retrieve
the targets from the collection by calling

  my @targets = $ac->get_Annotations('Target');

or (from the point of view of the Bio::SeqFeature::Annotated object)

  my @targets = $feature->annotation->get_Annotations('Target');

If you write your own 'target' method (which should be a part of
Bio::SeqFeature::Annotated) the way you do it now, you are outside the
Bio::Annotation concept. But one could also write a 'target' method
that reads from and writes to the feature's
Bio::Annotation::Collection object.

Hope this helps! Actually, I thought it would be more natural to use
something like Bio::SeqFeature::FeaturePair to deal with the target
entries in the GFF3 file.
But obviously this is not compatible with
Bio::SeqFeature::Annotated (cf.
http://bioperl.org/pipermail/bioperl-l/2004-October/017195.html)...

Steffen

From allenday at ucla.edu Mon Nov 1 13:17:45 2004
From: allenday at ucla.edu (Allen Day)
Date: Mon Nov 1 13:16:25 2004
Subject: [Bioperl-l] Adapt source method in Bio::SeqFeature::Annotated
In-Reply-To: <4185F17F.7070901@molgen.mpg.de>
References: <4185F17F.7070901@molgen.mpg.de>
Message-ID:

are you a cvs user now? if so, you can commit this in directly.

-allen

From brian_osborne at cognia.com Mon Nov 1 13:25:03 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Nov 1 13:23:41 2004
Subject: [Bioperl-l] BioPerl CVS EOL Inconsistency
In-Reply-To: <418656FA.8020603@york.ac.uk>
Message-ID:

Nathan,

Sure, send me the file.

Brian O.

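A list of affected files like the one Nathan offered earlier in this
thread could be generated along the following lines. This is a rough
sketch, not the script that was actually used: has_crlf and
find_crlf_files are hypothetical names, and the choice of extensions
(.pm, .pl, .t) is an assumption about which files matter. It relies
only on File::Find from the Perl core and reads every file in binary
mode, so that Perl's own line-ending translation on Windows does not
hide the CR bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: true if the file contains at least one CR LF pair.
sub has_crlf {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    binmode $fh;                    # no EOL translation, we want raw bytes
    local $/;                       # slurp the whole file
    my $bytes = <$fh>;
    close $fh;
    return 0 unless defined $bytes;
    return $bytes =~ /\x0D\x0A/ ? 1 : 0;
}

# Hypothetical helper: all .pm, .pl and .t files below $root that
# contain CR LF, as a sorted list of paths.
sub find_crlf_files {
    my ($root) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,          # $_ then holds the full path
            wanted   => sub {
                return unless -f $_ && /\.(?:pm|pl|t)$/;
                push @hits, $_ if has_crlf($_);
            },
        },
        $root
    );
    my @sorted = sort @hits;
    return @sorted;
}

# Only scan when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for find_crlf_files( $ARGV[0] );
}
```

Run from a UNIX checkout (where the client performs no conversion), for
example as "perl find_crlf.pl bioperl-live", where the script name is
again only an example. On a Windows checkout every text file would be
reported, because the client has already converted LF to CR LF.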
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Nathan Haigh Sent: Monday, November 01, 2004 10:32 AM To: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] BioPerl CVS EOL Inconsistency To clarify what i think the problem is, i've taken snippets from the following page: See the explaination at http://www.tortoisecvs.org/faq.html#brokenlineendings Essentially, the BioPerl CVS server should store files with the UNIX style end-of-line (EOL) char (LF). When files are checked out/commited with a windows based CVS client (CVSnt), EOL conversions takes place (LF <-> CR LF) automatically. However, if you try to commit dos files with a non-windows CVS client, this conversion doesn't occur so the CVS server now has files that contain CR LF EOL chars. This causes the following problems: * If you check out the file on UNIX (where no conversion is performed), the line endings will be and not , which is wrong. * If you check out the file on Windows using CVSNT, it will convert each to a to set the corret line endings for Windows. Unfortunately, the server copy of the file already *had* a before the . As a result, the local line endings will be - the file is totally screwed up! General rules for avoiding this: * It's highly recommended not to access your sandbox (that was checked out with TortoiseCVS, CVSNT or WinCVS) using other CVS clients, especially not with UNIX-style CVS clients (like Linux's or Cygwin's) because of line ending incompatibilities. * Furthermore, you have to be careful when you've checked out a module using UNIX line endings. You must not edit those files with a Windows text editor that overwrites the UNIX line endings with DOS line endings, or else you'll eventually get that additional at the end of each line. 
Therefore, files in BioPerl CVS need to be run through dos2unix and recommited - i could supply a list of files that are affected (up to around 50 files) if someone wanted to commit them! Nathan Nathan Haigh wrote: >Ok, here's what I think I've found! If anyone knows more about this that I do, please feel free to interject! > >I have found that the files stored in BioPerl CVS have different end-of-line chars, and when CVSnt does conversions on these chars, >it results in odd end-of-line chars, that I assume were affecting my previous "nmake test" using the files downloaded by CVSnt. > >As an example, if you look at Bio::SeqIO::Interpro, this file contains the EOL char "0D 0A" (Hex) which is the DOS EOL char (CR LF). >When CVSnt downloads this file; it convert it to "0D 0D 0A" (Hex) which is not a standard EOL char (it is actually CR CR LF). >Whereas Bio::SeqIO::GenBank contains the EOL char "0A" (Hex) which is the standard UNIX EOL char (LF). CVSnt converts this correctly >into "0D 0A" (Hex) which is the DOS EOL char (CR LF). > >I'm not sure how these different EOL chars occur, because as far as I understand CVS, a CVS client should convert the EOL chars to >those used by their OS when the files are downloaded, and then converts them back to those used on the CVS server during a commit. I >think the only way this mixing of EOL chars can happen is if someone uses a different CVS client or OS to commit the files. > >On second thoughts, it is probably due to the fact that people will submit patched files from several OS's to BioPerl for CVS >commits, but these files do not get converted from DOS/Mac to Unix before commits. > >Is there anything that can be done to bring all the EOL chars back into line as I'd really like to be able to use CVS on windows!? 
> >Thanks >Nathan > > > > >>-----Original Message----- >>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh >>Sent: 01 November 2004 08:37 >>To: bioperl-l@bioperl.org >>Subject: RE: [Bioperl-l] nmake test >> >>Hmm, this seems to be a problem associated with CVSnt which I used to obtain the latest BioPerl from CVS - I got fed up with >> >> >having > > >>to download the entire BioPerl package in tarball for small updates. >> >>The latest tarball tested without errors, I'm now trying CVSnt again to see if I can reproduce the problem! On the first attempt I >>noticed that several of the BioPerl files contained '?' at the end of every line, so I suspected some mix up in the end of line >>characters; but replacing these files with the relevant ones from the website did not rectify the problem (unless I missed some). >> >>I'll post again if I can replicate the problem, in case it is any help to someone else. But as far as I can see the tarball seems >> >> >to > > >>test without errors! >> >>Nathan >> >> >> >> >>>-----Original Message----- >>>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh >>>Sent: 31 October 2004 11:36 >>>To: bioperl-l@bioperl.org >>>Subject: [Bioperl-l] nmake test >>> >>>Hi, just to let you know the latest BioPerl from CVS is now giving the following errors on WinXP, and I am trying to debug them: >>> >>>Failed Test Stat Wstat Total Fail Failed List of Failed >>>----------------------------------------------------------- >>>t\Index.t 255 65280 41 0 0.00% ?? >>>t\Pictogram.t 255 65280 3 3 100.00% 1-3 >>>t\ProtPsm.t 255 65280 5 5 100.00% 1-5 >>>t\flat.t 255 65280 16 3 18.75% 14-16 >>>t\psm.t 255 65280 48 48 100.00% 1-48 >>>2 subtests skipped. 
>>> >>>Nathan >>> >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >--- >avast! Antivirus: Outbound message clean. >Virus Database (VPS): 0444-3, 29/10/2004 >Tested on: 01/11/2004 10:21:43 >avast! is copyright (c) 2000-2003 ALWIL Software. >http://www.avast.com > > > >--- >avast! Antivirus: Outbound message clean. >Virus Database (VPS): 0444-3, 29/10/2004 >Tested on: 01/11/2004 10:23:10 >avast! is copyright (c) 2000-2003 ALWIL Software. >http://www.avast.com > > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From allenday at ucla.edu Mon Nov 1 13:39:48 2004 From: allenday at ucla.edu (Allen Day) Date: Mon Nov 1 13:38:21 2004 Subject: [Bioperl-l] about Bio::Annotation::Collection In-Reply-To: <4186771D.2050609@molgen.mpg.de> References: <1099327961.1542.18.camel@localhost.localdomain> <4186771D.2050609@molgen.mpg.de> Message-ID: a couple of points: [1] Bio::Annotation::Collection won't accomodate anything that isn't an annotation. This means it won't accept Bio::Location::Simple or Bio::LocatableSeq objects. We need to write our own accessors into the SeqFeature (or inherit from something that stores multiple locations) [2] Bio::Location::Simple only stores a coordinate pair. It doesn't store the reference sequence identifier -- something that is important for a feature target. I'm about to commit code for handling target objects. 
It requires you instantiate a Bio::LocatableSeq outside the Bio::SeqFeature::Annotated,
and pass it in. We can try to make the method smarter so it can
understand strings like "chr1:+:1..100", but I've left it out for now.

Here's the patch (which has been applied):

--- ./Bio/SeqFeature/Annotated.pm 22 Oct 2004 14:55:15 -0000 1.5
+++ ./Bio/SeqFeature/Annotated.pm 1 Nov 2004 19:04:22 -0000
@@ -5,6 +5,7 @@ use base qw(Bio::Root::Root Bio::SeqFeat
 
 use Bio::Root::Root;
 use Bio::Annotation::Collection;
+use Bio::LocatableSeq;
 use Bio::Location::Simple;
 use Bio::Tools::GFF;
 
@@ -448,21 +449,20 @@ sub score {
   return $self->{'_gsf_score'};
 }
 
-=head2 phase
-
+=head2 phase
+
  Title   : phase
  Usage   : $phase = $feat->phase()
            $feat->phase($phase)
  Function: get/set on phase information
  Returns : 0,1,2, '.'
  Args    : none if get, the new value if set
-
-
+
 =cut
 
 sub phase {
   my $self = shift;
-
+
   if ( @_ ) {
     my $value = shift;
     if ( defined $value &&
@@ -531,6 +531,40 @@ sub location {
   }
   return $self->{'_location'};
 }
+
+=head2 add_target()
+
+ Usage   : $seqfeature->add_target(Bio::LocatableSeq->new(...));
+ Function: adds a target location on another reference sequence for this feature
+ Returns : true on success
+ Args    : a Bio::LocatableSeq object
+
+
+=cut
+
+sub add_target {
+  my ($self,$seq) = @_;
+  $self->throw("$seq is not a Bio::LocatableSeq, bailing out") unless ref($seq) and $seq->isa('Bio::LocatableSeq');
+  push @{ $self->{'targets'} }, $seq;
+  return $seq;
+}
+
+=head2 each_target()
+
+ Usage   : @targets = $seqfeature->each_target();
+ Function: Returns a list of Bio::LocatableSeqs which are the locations of this object.
+           To obtain the "primary" location, see L.
+ Returns : a list of 0..N Bio::LocatableSeq objects
+ Args    : none
+
+
+=cut
+
+sub each_target {
+  my ($self) = @_;
+  return $self->{'targets'} ? @{ $self->{'targets'} } : ();
+}
+
 
 sub _no_tags {
   my $self = shift;

-Allen

On Mon, 1 Nov 2004, Steffen Grossmann wrote:

> Dear Scott,
>
> to make it work you have to write something like
>
> my $ta = Bio::Annotation::SimpleValue->new(-value => $target_loc);
> $ac->add_Annotation('Target',$ta);
>
> Do this for every target entry you want to add. You can then retrieve
> the targets from the collection by calling
>
> my @targets = $ac->get_Annotations('Target');
>
> or (from the point of view of the Bio::SeqFeature::Annotated object)
>
> my @targets = $feature->annotation->get_Annotations('Target');
>
> If you are writing your own 'target'-method (which should be a part of
> Bio::SeqFeature::Annotated) in the way you do it, you are outside the
> Bio::Annotation concept. But one can also think about writing a
> 'target'-method, which writes/reads into the feature's
> Bio::Annotation::Collection object.
>
> Hope this helps! Actually, I thought it to be more natural to use
> something like Bio::SeqFeature::FeaturePair to deal with the target
> entries in the GFF3 file. But obviously this is not compatible with
> Bio::SeqFeature::Annotated (cf.
> http://bioperl.org/pipermail/bioperl-l/2004-October/017195.html)...
>
> Steffen
>
>
> Scott Cain wrote:
>
> >Hello all,
> >
> >I am trying to flesh out Bio::FeatureIO::gff to handle Target strings in
> >GFF3 files. What I would like to do is to create an array of
> >Bio::Location::Simple objects to represent the (potentially more than
> >one) Target strings, and then add them to the annotations for the
> >line, presumably as a Bio::Annotation::Collection. The thing is, I have
> >no idea how B::A::C works. Here is the documentation for the method
> >'add_Annotation', which is what I would want to use.
> >Note the mention
> >of an archetype without defining it (though I think it refers to the
> >last line in the Usage section):
> >
> > add_Annotation
> >
> > Title   : add_Annotation
> > Usage   : $self->add_Annotation('reference',$object);
> >           $self->add_Annotation($object,'Bio::MyInterface::DiseaseI');
> >           $self->add_Annotation($object);
> >           $self->add_Annotation('disease',$object,'Bio::MyInterface::DiseaseI');
> > Function: Adds an annotation for a specific key.
> >
> >           If the key is omitted, the object to be added must provide a value
> >           via its tagname().
> >
> >
> >
> >           If the archetype is provided, this and future objects added under
> >           that tag have to comply with the archetype and will be rejected
> >           otherwise.
> >
> > Returns : none
> > Args    : annotation key ('disease', 'dblink', ...)
> >           object to store (must be Bio::AnnotationI compliant)
> >           [optional] object archetype to map future storage of object
> >           of these types to
> >
> >Here is the section of code from Bio::FeatureIO::gff where I would like
> >to use the Target string; my approach certainly seems to conflict with
> >what Bio::Annotation would like, but it is not clear to me how to use it
> >in this context.
> >
> > if($attr{Target}){
> >   foreach my $target_string (@{ $attr{Target} } ) {
> >     $target_string =~ s/\+/ /g;
> >     my ($t_id,$tstart,$tend,$strand,$extra) = split /\s+/, $target_string;
> >     if (!$tend || $extra) { # too much or too little stuff in the string
> >       $self->throw("The value in the Target string, $target_string, does not conform to the GFF3 specification");
> >     }
> >     my $target_loc = Bio::Location::Simple->new(
> >         -seq_id => $t_id,
> >         -start => $tstart,
> >         -end => $tend,
> >     );
> >
> >     if ($strand eq '+') {
> >       $strand = 1;
> >     } elsif ($strand eq '-') {
> >       $strand = -1;
> >     }
> >     $target_loc->strand($strand) if $strand;
> >     $target_loc->is_remote(1);
> >
> >     $self->target($target_loc);
> >   }
> >   $ac->add_Annotation('Target',$self->target());
> > }
> >
> >... and ...
> >
> >=head2 target
> >
> > Title   : target
> > Usage   : $obj->target($newval)
> > Function: Either return an array ref with Bio::LocationI objects
> >           representing the targets, or to add a target to the
> >           internal target list
> > Example : my @targets = $obj->target();
> >           $obj->target($newtarget);
> > Returns : A list of Bio::LocationI objects
> > Args    : On set, a Bio::LocationI object
> >
> >
> >=cut
> >
> >sub target {
> >  my $self = shift;
> >  push @{$self->{'target'}}, shift if defined(@_);
> >  return \@{$self->{'target'}};
> >}
> >
> >Thanks,
> >Scott
> >
>

From jason.stajich at duke.edu Mon Nov 1 13:55:29 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Nov 1 13:54:00 2004
Subject: [Bioperl-l] about Bio::Annotation::Collection
In-Reply-To:
References: <1099327961.1542.18.camel@localhost.localdomain>
 <4186771D.2050609@molgen.mpg.de>
Message-ID: <99741DDC-2C37-11D9-9FF5-000393C44276@duke.edu>

Locations do have a place to store the reference system -- the seq_id
method. This would hold the ref coordinate system you are looking for,
I think.

I think using a LocationI would be better than LocatableSeq because you
can also use Location::Split to represent the multiple sub-locations,
and LocatableSeq necessarily is contiguous as it only has start/end.

Maybe LocatableSeq could be updated to be more Location friendly and
re-use those objects within?

-jason

On Nov 1, 2004, at 1:39 PM, Allen Day wrote:

> a couple of points:
>
> [1] Bio::Annotation::Collection won't accommodate anything that isn't an
> annotation. This means it won't accept Bio::Location::Simple or
> Bio::LocatableSeq objects. We need to write our own accessors into the
> SeqFeature (or inherit from something that stores multiple locations)
>
> [2] Bio::Location::Simple only stores a coordinate pair. It doesn't
> store the reference sequence identifier -- something that is important
> for
> a feature target.
>
> I'm about to commit code for handling target objects.
> It requires you
> instantiate a Bio::LocatableSeq outside the Bio::SeqFeature::Annotated,
> and pass it in. We can try to make the method smarter so it can
> understand strings like "chr1:+:1..100", but I've left it out for now.
>
> Here's the patch (which has been applied):
>
> --- ./Bio/SeqFeature/Annotated.pm 22 Oct 2004 14:55:15 -0000 1.5
> +++ ./Bio/SeqFeature/Annotated.pm 1 Nov 2004 19:04:22 -0000
> @@ -5,6 +5,7 @@ use base qw(Bio::Root::Root Bio::SeqFeat
>
>  use Bio::Root::Root;
>  use Bio::Annotation::Collection;
> +use Bio::LocatableSeq;
>  use Bio::Location::Simple;
>  use Bio::Tools::GFF;
>
> @@ -448,21 +449,20 @@ sub score {
>    return $self->{'_gsf_score'};
>  }
>
> -=head2 phase
> -
> +=head2 phase
> +
>   Title   : phase
>   Usage   : $phase = $feat->phase()
>             $feat->phase($phase)
>   Function: get/set on phase information
>   Returns : 0,1,2, '.'
>   Args    : none if get, the new value if set
> -
> -
> +
>  =cut
>
>  sub phase {
>    my $self = shift;
> -
> +
>    if ( @_ ) {
>      my $value = shift;
>      if ( defined $value &&
> @@ -531,6 +531,40 @@ sub location {
>    }
>    return $self->{'_location'};
>  }
> +
> +=head2 add_target()
> +
> + Usage   : $seqfeature->add_target(Bio::LocatableSeq->new(...));
> + Function: adds a target location on another reference sequence for this feature
> + Returns : true on success
> + Args    : a Bio::LocatableSeq object
> +
> +
> +=cut
> +
> +sub add_target {
> +  my ($self,$seq) = @_;
> +  $self->throw("$seq is not a Bio::LocatableSeq, bailing out") unless ref($seq) and $seq->isa('Bio::LocatableSeq');
> +  push @{ $self->{'targets'} }, $seq;
> +  return $seq;
> +}
> +
> +=head2 each_target()
> +
> + Usage   : @targets = $seqfeature->each_target();
> + Function: Returns a list of Bio::LocatableSeqs which are the locations of this object.
> +           To obtain the "primary" location, see L.
> + Returns : a list of 0..N Bio::LocatableSeq objects
> + Args    : none
> +
> +
> +=cut
> +
> +sub each_target {
> +  my ($self) = @_;
> +  return $self->{'targets'} ? @{ $self->{'targets'} } : ();
> +}
> +
>
>  sub _no_tags {
>    my $self = shift;
>
>
>
> -Allen
>
>
>
> On Mon, 1 Nov 2004, Steffen Grossmann wrote:
>
>> Dear Scott,
>>
>> to make it work you have to write something like
>>
>> my $ta = Bio::Annotation::SimpleValue->new(-value => $target_loc);
>> $ac->add_Annotation('Target',$ta);
>>
>> Do this for every target entry you want to add. You can then retrieve
>> the targets from the collection by calling
>>
>> my @targets = $ac->get_Annotations('Target');
>>
>> or (from the point of view of the Bio::SeqFeature::Annotated object)
>>
>> my @targets = $feature->annotation->get_Annotations('Target');
>>
>> If you are writing your own 'target'-method (which should be a part of
>> Bio::SeqFeature::Annotated) in the way you do it, you are outside the
>> Bio::Annotation concept. But one can also think about writing a
>> 'target'-method, which writes/reads into the feature's
>> Bio::Annotation::Collection object.
>>
>> Hope this helps! Actually, I thought it to be more natural to use
>> something like Bio::SeqFeature::FeaturePair to deal with the target
>> entries in the GFF3 file. But obviously this is not compatible with
>> Bio::SeqFeature::Annotated (cf.
>> http://bioperl.org/pipermail/bioperl-l/2004-October/017195.html)...
>>
>> Steffen
>>
>>
>> Scott Cain wrote:
>>
>>> Hello all,
>>>
>>> I am trying to flesh out Bio::FeatureIO::gff to handle Target
>>> strings in
>>> GFF3 files. What I would like to do is to create an array of
>>> Bio::Location::Simple objects to represent the (potentially more than
>>> one) Target strings, and then add them to the annotations for the
>>> line, presumably as a Bio::Annotation::Collection. The thing is, I
>>> have
>>> no idea how B::A::C works. Here is the documentation for the method
>>> 'add_Annotation', which is what I would want to use.
>>> Note the mention
>>> of an archetype without defining it (though I think it refers to the
>>> last line in the Usage section):
>>>
>>> add_Annotation
>>>
>>> Title   : add_Annotation
>>> Usage   : $self->add_Annotation('reference',$object);
>>>           $self->add_Annotation($object,'Bio::MyInterface::DiseaseI');
>>>           $self->add_Annotation($object);
>>>           $self->add_Annotation('disease',$object,'Bio::MyInterface::DiseaseI');
>>> Function: Adds an annotation for a specific key.
>>>
>>>           If the key is omitted, the object to be added must
>>>           provide a value via its tagname().
>>>
>>>
>>>
>>>           If the archetype is provided, this and future
>>>           objects added under that tag have to comply with the
>>>           archetype and will be rejected otherwise.
>>>
>>> Returns : none
>>> Args    : annotation key ('disease', 'dblink', ...)
>>>           object to store (must be Bio::AnnotationI compliant)
>>>           [optional] object archetype to map future storage
>>>           of object of these types to
>>>
>>> Here is the section of code from Bio::FeatureIO::gff where I would
>>> like
>>> to use the Target string; my approach certainly seems to conflict
>>> with
>>> what Bio::Annotation would like, but it is not clear to me how to
>>> use it
>>> in this context.
>>>
>>> if($attr{Target}){
>>>   foreach my $target_string (@{ $attr{Target} } ) {
>>>     $target_string =~ s/\+/ /g;
>>>     my ($t_id,$tstart,$tend,$strand,$extra) = split /\s+/, $target_string;
>>>     if (!$tend || $extra) { # too much or too little stuff in the string
>>>       $self->throw("The value in the Target string, $target_string, does not conform to the GFF3 specification");
>>>     }
>>>     my $target_loc = Bio::Location::Simple->new(
>>>         -seq_id => $t_id,
>>>         -start => $tstart,
>>>         -end => $tend,
>>>     );
>>>
>>>     if ($strand eq '+') {
>>>       $strand = 1;
>>>     } elsif ($strand eq '-') {
>>>       $strand = -1;
>>>     }
>>>     $target_loc->strand($strand) if $strand;
>>>     $target_loc->is_remote(1);
>>>
>>>     $self->target($target_loc);
>>>   }
>>>   $ac->add_Annotation('Target',$self->target());
>>> }
>>>
>>> ... and ...
>>>
>>> =head2 target
>>>
>>> Title   : target
>>> Usage   : $obj->target($newval)
>>> Function: Either return an array ref with Bio::LocationI objects
>>>           representing the targets, or to add a target to the
>>>           internal target list
>>> Example : my @targets = $obj->target();
>>>           $obj->target($newtarget);
>>> Returns : A list of Bio::LocationI objects
>>> Args    : On set, a Bio::LocationI object
>>>
>>>
>>> =cut
>>>
>>> sub target {
>>>   my $self = shift;
>>>   push @{$self->{'target'}}, shift if defined(@_);
>>>   return \@{$self->{'target'}};
>>> }
>>>
>>> Thanks,
>>> Scott
>>>
>>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From brian_osborne at cognia.com Mon Nov 1 15:14:56 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Nov 1 15:13:54 2004
Subject: [Bioperl-l] update to Bio::Tools::Run::EMBOSSacd
In-Reply-To: <32789.154.20.41.204.1098761586.squirrel@webx440.bcgsc.bc.ca>
Message-ID:

Stephen,

This is done.

Brian O.
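
One caveat on the version test in the snippet quoted below: `lt` compares strings character by character, so a hypothetical release numbered 2.10.0 would sort before 2.8.0. A minimal sketch of a numeric, component-wise comparison follows. It assumes the version is a plain dotted number such as "2.8.0"; compare_versions is a hypothetical helper written for illustration and is not part of Bio::Tools::Run::EMBOSSacd.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not a BioPerl function): compare two
# dotted version strings numerically, component by component.
# Returns -1, 0 or 1, like the <=> operator.
# Assumes every component is a plain non-negative integer.
sub compare_versions {
    my ($left, $right) = @_;
    s/^\s+|\s+$//g for $left, $right;    # drop the newline that backticks leave behind
    my @l = split /\./, $left;
    my @r = split /\./, $right;
    while (@l || @r) {
        my $x = @l ? shift @l : 0;       # a missing component counts as 0
        my $y = @r ? shift @r : 0;
        my $cmp = $x <=> $y;
        return $cmp if $cmp;
    }
    return 0;
}

# String comparison misorders multi-digit components ...
print "lt says 2.10.0 is older than 2.8.0\n" if "2.10.0" lt "2.8.0";
# ... while the numeric comparison does not.
print "numeric comparison says 2.10.0 is newer\n"
    if compare_versions("2.10.0", "2.8.0") > 0;
```

With a helper like this, the test in the snippet could read `compare_versions($version, "2.8.0") < 0`, provided the output of embossversion really is just the version number.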
-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of smontgom@bcgsc.ca
Sent: Monday, October 25, 2004 11:33 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] update to Bio::Tools::Run::EMBOSSacd

Yo -

Can I pass on this code snippet for a CVS commit?

The newer version of EMBOSS removes the acdc parameter -acdtable. So the
Bio::Tools::Run::EMBOSSacd module needs this snippet added at line 158
(after the reset of the hash %OPT)

my $version = `embossversion`;
my $file;
if ($version lt "2.8.0") {
    # reading from EMBOSS program acdc stdout (versions before 2.8.0)
    $file = `acdc $prog -help -verbose -acdtable 2>&1`;
} else {
    # reading from EMBOSS program acdtable stdout (version 2.8.0 and later)
    $file = `acdtable $prog -help -verbose 2>&1`;
}

Cheers,
Stephen

From grossman at molgen.mpg.de Tue Nov 2 03:42:18 2004
From: grossman at molgen.mpg.de (Steffen Grossmann)
Date: Tue Nov 2 03:40:49 2004
Subject: [Bioperl-l] Adapt source method in Bio::SeqFeature::Annotated
In-Reply-To:
References: <4185F17F.7070901@molgen.mpg.de>
Message-ID: <4187486A.4060405@molgen.mpg.de>

I am not. Still waiting for someone to make the final step...

Steffen

Allen Day wrote:

>are you a cvs user now? if so, you can commit this in directly.
>
>-allen
>
>
>On Mon, 1 Nov 2004, Steffen Grossmann wrote:
>
>
>
>>Dear all,
>>
>>as a follow up to my last message (http://bioperl.org/pipermail/bioperl-l/2004-October/017178.html), I propose to adapt the 'source' method
>>in Bio::SeqFeature::Annotated in order to make it use the annotation system. This is merely for convenience, but also helps avoiding trouble
>>by using different places to store a 'source' annotation.
>>
>>I provide a patch below.
>>
>>Steffen
>>
>>
>>
>>Bio_SeqFeature_Annotated.diff
>>---8<---8<---8<---8<---8<---8<---8<---8<---cut here
>>*** bioperl-live/Bio/SeqFeature/Annotated.pm Mon Oct 25 11:08:25 2004
>>--- modified_bioperl-live/Bio/SeqFeature/Annotated.pm Mon Nov 1 09:06:09 2004
>>***************
>>*** 347,355 ****
>>
>>  sub source {
>>    my $self = shift;
>>
>>!   return $self->{'source'} = shift if @_;
>>!   return $self->{'source'};
>>  }
>>
>>
>>--- 347,367 ----
>>
>>  sub source {
>>    my $self = shift;
>>+   my $source = shift;
>>
>>!   if ($source) {
>>!     $self->annotation->remove_Annotations('source');
>>!     $self->annotation->add_Annotation(Bio::Annotation::SimpleValue->new(-value => $source,
>>!                                                                         -tagname => 'source')
>>!                                      );
>>!   } else {
>>!     if (my ($sa) = $self->annotation->get_Annotations('source')) {
>>!       $source = $sa->value;
>>!     } else {
>>!       $source = "";
>>!     }
>>!   }
>>!   return $source;
>>  }
>>
>>
>>---8<---8<---8<---8<---8<---8<---8<---8<---cut here
>>
>
--
%---------------------------------------------%
% Steffen Grossmann                           %
%                                             %
% Max Planck Institute for Molecular Genetics %
% Computational Molecular Biology             %
%---------------------------------------------%
% Ihnestrasse 73                              %
% 14195 Berlin                                %
% Germany                                     %
%---------------------------------------------%
% Tel: (++49 +30) 8413-1167                   %
% Fax: (++49 +30) 8413-1152                   %
%---------------------------------------------%

From zg105 at york.ac.uk Tue Nov 2 10:32:08 2004
From: zg105 at york.ac.uk (Zara Ghazoui)
Date: Tue Nov 2 10:30:30 2004
Subject: [Bioperl-l] Bioperl help needed!
Message-ID: <4187A878.6040803@york.ac.uk>

Hi all,
I am trying to compare some trees to each other based on their nodes.
So one thing I need to do is to print and compare the nodes.
I have tried the

to_string()

function but no joy.

Your help is much appreciated.

Many thanks,
Zara

From jason.stajich at duke.edu Tue Nov 2 10:38:21 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Nov 2 10:36:50 2004
Subject: [Bioperl-l] Bioperl help needed!
In-Reply-To: <4187A878.6040803@york.ac.uk>
References: <4187A878.6040803@york.ac.uk>
Message-ID: <3A059FA9-2CE5-11D9-BD44-000393C44276@duke.edu>

You need to provide more information if you want help.

'no joy' -- meaning it didn't give you what you wanted?

print $node->id
will print the node Id, although internal nodes won't have labels unless
you had provided them.

print $node->internal_id
will print the unique internal ID assigned by Bioperl.

-jason

On Nov 2, 2004, at 10:32 AM, Zara Ghazoui wrote:

> Hi all,
> I am trying to compare some trees to each other based on their nodes.
> So one thing I need to do it to print and compare the nodes.
> I have tried the
>
> to_string()
>
> function but no joy.
>
> Your help is much appreciated.
>
> Many thanks,
> Zara
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From uridavid at netvision.net.il Tue Nov 2 11:15:47 2004
From: uridavid at netvision.net.il (Uri David Akavia)
Date: Tue Nov 2 11:13:10 2004
Subject: [Bioperl-l] How do I bl2seq remotely?
Message-ID: <4187B2B3.1030805@netvision.net.il>

Hello.

I have a program which runs bl2seq locally using the following commands:

my ($query, $sequence, $p_segments) = (@_);
my @params = ('program' => "blastn", # We are comparing nucleotides
              'S' => "1",            # use only the plus strands
              'F' => "F",            # Do not filter the sequences, compare all of them
              'e' => "0.01"          # Lower e-value, due to behavior of standalone BLAST
             );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $report = $factory->bl2seq($query, $sequence);

I would like to convert this bit of code to run bl2seq remotely, since
the results of remote and local BLAST don't match up. This seems like a
bug in blast, not bioperl.
I haven't found out how to run bl2seq remotely - the documentation seems
to focus on running blastn against a database remotely.

Could someone please send me the appropriate bit of code to do the exact
same thing remotely?

Thanks,
Uri David Akavia

From zg105 at york.ac.uk Tue Nov 2 11:18:39 2004
From: zg105 at york.ac.uk (Zara Ghazoui)
Date: Tue Nov 2 11:17:14 2004
Subject: [Bioperl-l] Bioperl help needed!
Message-ID: <4187B35F.4020006@york.ac.uk>

What I would like to be able to do is the following:

I need to iterate through all the nodes of tree A (with bootstrap value
>70) to see if each node, with its descendents (subtree), is present in
tree B.

Hope this clarifies my first question!

Zara

From jason.stajich at duke.edu Tue Nov 2 11:42:43 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Nov 2 11:41:10 2004
Subject: [Bioperl-l] Bioperl help needed!
In-Reply-To: <4187B35F.4020006@york.ac.uk>
References: <4187B35F.4020006@york.ac.uk>
Message-ID: <382357C2-2CEE-11D9-BD44-000393C44276@duke.edu>

On Nov 2, 2004, at 11:18 AM, Zara Ghazoui wrote:

> What I would like to be able to do is the following:
>
> I need to iterate through all the nodes of tree A (with bootstrap
> value >70) to see if each node, with its descendents (subtree), is
> present in tree B.
>
Are you just checking to see if they are present in tree B or have the
same relationships (in the same clade)? You can use methods like
get_lca (see TreeFunctionsI) to get the least common ancestor of a set
of nodes and see if

The $node->ancestor method lets you walk UP the tree to get ancestors
(each node only has 1 ancestor) or $node->get_all_Descendents to get all
descendents below a node, or just $node->each_Descendent to get just the
Nodes in the level below.

Here is one way to start, but not really sure of the behavior you want.

# assuming you have stored the bootstrap value as an internal ID for the node
# like this where the A-B clade has 75% support
# ((A:0.100,B:0.200)75:0.150,C:0.300);

# get all the nodes, grep out for those which are NOT tips and have bootstrap support >= 70
for my $node ( grep { ! $_->is_Leaf() && $_->id >= 70 } $treeA->get_Nodes ) {
    # note that this will touch the same nodes more than once
    my $all_present = 1;
    # grab all the tips from this clade
    for my $descendent ( grep { $_->is_Leaf } $node->get_all_Descendents ) {
        my $name = $descendent->id;
        unless( grep { $_->is_Leaf && $_->id eq $name } $treeB->get_Nodes ) {
            $all_present = 0;
            warn("Cannot find node $name in treeB\n");
        }
    }
    # do something whether or not we found all the child tips of $node in treeB
    # based on $all_present variable.
}

> Hope this clarifies my first question!
>
> Zara
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Tue Nov 2 13:02:48 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Nov 2 13:01:19 2004
Subject: [Bioperl-l] Bioperl help needed!
In-Reply-To: <4187C73E.8030400@york.ac.uk>
References: <4187B35F.4020006@york.ac.uk>
 <382357C2-2CEE-11D9-BD44-000393C44276@duke.edu>
 <4187C73E.8030400@york.ac.uk>
Message-ID: <67A4CD9A-2CF9-11D9-BD44-000393C44276@duke.edu>

Well I'm only going to help if you try some of this out on your own
first and post what you've tried.

$tree->get_lca(-nodes => \@nodes) will find the least common ancestor
($node) = $tree->find_node($name) will find a particular node based on
its name

So
- Make a list of node names in Tree A which fall into a clade with 70%
  support (based on the code I posted before).
- Then find these nodes in Tree B (find_node method).
- Find their LCA (get_lca method).
- Then walk back down from the LCA (an internal node) and get all the
  tips (get_all_Descendents + the grep code posted below).
- Check if there are any other taxa in the clade, thus violating your
  requirement of identical clades.

Post your code that does parts of what you want and we can help point.
Please post to the list so other people can learn from it too.

-jason

On Nov 2, 2004, at 12:43 PM, Zara Ghazoui wrote:

> Thanks for your reply.
>
> Are you just checking to see if they are present in tree B or have the
> same relationships (in the same clade). You can use methods like
> get_lca (see TreeFunctionsI) to get the least common ancestor of a set
> of nodes and see if
>
> I do need to check that the nodes are present in TreeB and have the
> SAME relationship (in the same clade).
>
> Many thanks,
>
> Zara
>
>
> Jason Stajich wrote:
>
>>
>> On Nov 2, 2004, at 11:18 AM, Zara Ghazoui wrote:
>>
>>> What I would like to be able to do is the following:
>>>
>>> I need to iterate through all the nodes of tree A (with bootstrap
>>> value >70) to see if each node, with its descendents (subtree), is
>>> present in tree B.
>>>
>> Are you just checking to see if they are present in tree B or have
>> the same relationships (in the same clade). You can use methods
>> like get_lca (see TreeFunctionsI) to get the least common ancestor of
>> a set of nodes and see if
>>
>> The $node->ancestor method lets you walk UP the tree to get ancestors
>> (each node only has 1 ancestor) or $node->get_all_Descendents to get
>> all descendents below a node, or just $node->each_Descendent to get
>> just the Nodes in the level below.
>>
>> Here is one way to start, but not really sure of the behavior you
>> want.
>>
>> # assuming you have stored the bootstrap value as an internal ID for the node
>> # like this where the A-B clade has 75% support
>> # ((A:0.100,B:0.200)75:0.150,C:0.300);
>>
>> # get all the nodes, grep out for those which are NOT tips and have bootstrap support >= 70
>> for my $node ( grep { ! $_->is_Leaf() && $_->id >= 70 } $treeA->get_Nodes ) {
>>     # note that this will touch the same nodes more than once
>>     my $all_present = 1;
>>     # grab all the tips from this clade
>>     for my $descendent ( grep { $_->is_Leaf } $node->get_all_Descendents ) {
>>         my $name = $descendent->id;
>>         unless( grep { $_->is_Leaf && $_->id eq $name } $treeB->get_Nodes ) {
>>             $all_present = 0;
>>             warn("Cannot find node $name in treeB\n");
>>         }
>>     }
>>     # do something whether or not we found all the child tips of $node in treeB
>>     # based on $all_present variable.
>> }
>>
>>
>>> Hope this clarifies my first question!
>>>
>>> Zara
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From covitzp at mail.nih.gov Tue Nov 2 15:13:51 2004
From: covitzp at mail.nih.gov (Covitz, Peter (NIH/NCI))
Date: Tue Nov 2 15:12:32 2004
Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl
Message-ID: <27C204BD76CBC142BA1AE46D62A8548E0F69FD74@nihexchange9.nih.gov>

Ewan,

I thought I'd jump in and pick up this thread. I understand and agree
with your point about needing a 'bridge' between caBIO classes and
equivalent existing bioperl classes. Your suggestion on how to go about
implementing such a bridge was helpful, thanks.

Beyond that, I had been thinking that it might be useful to contribute the
entire caBIOperl module to bioperl and make it part of the bioperl core
package.
caBIOperl is really just an object-oriented query interface to
caBIO data servers, so I naively thought it might fit nicely under
Bio::DB::Query, perhaps Bio::DB::Query::caBIO ??

Of course people can use caBIOperl without it being part of bioperl.
However, there are some classes and subject areas in caBIO that are not in
bioperl, so we thought it might be a useful extension to bioperl itself. In
the next major caBIOperl release (~March 2005) we are going to include a
full implementation of the MAGE-OM microarray data standard as part of the
caBIOperl API, so that might be among the subject areas of interest to the
bioperl community.

I'd be interested to hear whether you and others think there might be value
in incorporating caBIOperl itself into bioperl, or if you'd rather just
consider incorporating the 'bridge' module.

Regards,

Peter Covitz

-----Original Message-----
From: Ewan Birney [mailto:birney@ebi.ac.uk]
Sent: Tuesday, October 12, 2004 4:07 AM
To: Jiang, Shan (NIH/NCI)
Cc: bioperl-l@bioperl.org
Subject: Re: Integrating caBIOperl with BIOperl

On Mon, 11 Oct 2004, Jiang, Shan (NIH/NCI) wrote:

>
> Hi Ewan,
>
> I would like to introduce myself. I am a colleague of Gene Levinson at the
> National Cancer Institute in the US. I am the original developer of
> caBIOperl, which Gene presented at BOSC '04. I believe Gene talked to you
> quite extensively during the meeting as well. (Gene asked me to say hi!)
>
> Currently, I am undertaking the task of integrating caBIOperl with
> BIOperl. Gene indicated that you would be a great source to talk to. I am in
> the process of learning BIOperl before deciding how to proceed. So I would
> much appreciate your help in learning BIOperl as well as looking into
> possible ways of integrating caBIOperl with BIOperl.
>

Great - I'm cc'ing this message to the main bioperl list to check I give
you the best advice!

> Let me start asking some questions to start the ball rolling.
>
> 1. Has similar kinds of integration work been done before? If so is there a
> general recommended approach?

The recommendation is definitely to have a caBIOperl "bridge" to Bioperl
objects. The main ones you want to have are

Bio::SeqI, Bio::DB::RandomAccessI and Bio::AnnotationCollectionI and
Bio::SeqFeatureI

The "I" means interface (a bit like Java)

In each case you would have wrapper classes that has-a caBIOPerl object
and is-a Bioperl object, for example, imagining the caBIOPerl sequence
object has methods "human_readable_name" and "sequence_as_string" (of
course, they might have something completely different...)

package Bio::caBIOBridge::Seq;

@ISA = qw( Bio::SeqI );

...

...

# Bio::SeqI isa Bio::PrimarySeqI, and needs to implement
# display_id. this should give back the human readable name

sub display_id {
    my $obj = shift;

    # the caBIOPerl method is "human_readable_name"
    return $obj->{'_cabioperl_object'}->human_readable_name()
}

# Bio::SeqI needs to implement seq

sub seq {
    my $obj = shift;
    return $obj->{'_cabioperl_object'}->sequence_as_string()
}

etc etc

This is, BTW, something I am planning to do with Ensembl as well - make an
Ensembl-Bioperl bridge.

> 2. Do you have a repository where people can just "donate" their code into?

I would suggest that the caBIO-Bioperl bridge was its own cvs module and
donated into CPAN. You could run the cvs module at Bioperl.org or do it in
your own shop - entirely up to you.

> 3. caBIOperl has its own object model, if the end vision is to integrate
> this model with BIOperl, how should I proceed?

see above

> 4. Can I get access to the CVS repository?
>

You shouldn't need access to the bioperl cvs repository to come up with
some working code - if you want to have the caBio-Bioperl bridge
repository hosted at bioperl.org that's feasible, but probably building
some proof-of-concept classes first off would be great.

A great first step would be if someone could write a script like:

use Bio::caBIOBridge::DBAccess;
use Bio::SeqIO;

# default to well known caBio server
$db = Bio::caBIOBridge::DBAccess->new();

$ca_wrapped_seq = $db->get_Seq_by_id('some_id');

# $ca_wrapped_seq is Bio::SeqI object but is actually a thin wrapper over
# caBIO objects

# Bio::SeqIO is a Bioperl object writer that works with Bio::SeqI
# compliant objects

$seqout = Bio::SeqIO->new( -format => 'EMBL');

# Here we see the bridge in action!
$seqout->write_seq($ca_wrapped_seq);

> I am not sure how familiar you are with caBIOperl. So if you have any
> question, please do not hesitate to ask me.
>
> Regards,
> Shan Jiang
> (Contractor)
>
>

From cmlarota at vbi.vt.edu Tue Nov 2 13:22:01 2004
From: cmlarota at vbi.vt.edu (Carlos Mauricio La Rota)
Date: Tue Nov 2 16:59:47 2004
Subject: [Bioperl-l] A mechanism to store quality values for sequences in a
 BioSQL-db
Message-ID: <16D15BDC-2CFC-11D9-AC16-000A95B769B2@vbi.vt.edu>

Hello everybody,

Is there a mechanism in place to store/retrieve sequences with qualities
(Bio::Seq::SeqWithQuality) in the SQL database?

Thanks,

Mauricio.

========================
C. Mauricio La Rota
Lawrence Lab
Virginia Bioinformatics Institute

From ediths at unizh.ch Mon Nov 1 10:08:52 2004
From: ediths at unizh.ch (Edith Schlagenhauf)
Date: Tue Nov 2 16:59:55 2004
Subject: Re [Bioperl-l] Parsing Blast reports: GI number of a hit ?
Message-ID:

Hi,

thanks for the information. Is there a method like

$hit->gi_number()

to get gi numbers from a Blast report? (I haven't found anything so far..)

thanks,
Edith

On Oct 28, 2004, at 9:44 AM, Edith Schlagenhauf wrote:

> b) is there a more convenient way to get gi numbers from accession
> numbers using Bioperl?

Well NCBI provides the resource so would be their call.

you can generate an accession to gi lookup table by grepping through your
blast db and pulling out the gi number and accession, save this in a
file and build a persistent lookup DB with DB_File or something
equivalent. That is what I'd do if I am doing lots of these lookups.

Of course if you are trying to do this for BLAST results you can have
the GI number in the report if you just add the -I T option in
your blastall command -- from the blastall docs:

-I Show GI's in deflines [T/F]

******************************************
Dr Edith Schlagenhauf
Bioinformatics
Institute of Plant Biology
University of Zurich
Zollikerstrasse 107
CH-8008 Zurich
SWITZERLAND

e-mail: ediths AT botinst DOT unizh DOT ch
Tel.: +41 1 634 82 78
Fax : +41 1 634 82 04
******************************************

From pete at osc.edu Mon Nov 1 16:49:30 2004
From: pete at osc.edu (Peter G Carswell)
Date: Tue Nov 2 17:00:05 2004
Subject: [Bioperl-l] BioPerl on Linux....
Message-ID: <4186AF6A.1030404@osc.edu>

Hello,

I am running Fedora core 1 on my laptop. Today I attempted to install
bioperl-1.4, using the CPAN shell. Everything seemed to go well, just a
couple of failed tests. So I did a 'force install' on
B/BI/BIRNEY/bioperl-1.4.tar.gz. It finished and I quit the shell.

I created a simple test:

#!/usr/bin/perl
use Bio::Perl;

and ran the script "perl test.pl". It created the output:

Can't locate Bio/Perl.pm in @INC ....

I looked in the path "/usr/lib/perl5/site_perl/5.8.3", the installation
directory, and could find no Perl.pm in the Bio directory. In ./Bio/ were
the directories 'Factory' and 'Tools'. Under ./Bundle/ there was a
BioPerl.pm, but as I have not used BioPerl extensively, I was unsure as to
where these files should fall. I was looking at some of the tutorial
scripts and followed their lead as to using the perl command "use Bio::Perl".

What I would like to do is reinstall, but is there a CPAN shell command
to uninstall the previous version? Any pointers?

Thanks for your help.
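
One way to check where (or whether) Bio/Perl.pm ended up is to walk @INC directly rather than browsing a single directory. The following is a minimal sketch using only core Perl; find_module_file is a hypothetical helper written for illustration, not a BioPerl or CPAN function.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of BioPerl or CPAN): return
# the path of the first file in the given directories that provides
# $module, or nothing if no directory has it.
sub find_module_file {
    my ($module, @dirs) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;   # Bio::Perl -> Bio/Perl.pm
    for my $dir (@dirs) {
        next if ref $dir;                         # skip code-ref hooks in @INC
        my $candidate = "$dir/$relative";
        return $candidate if -f $candidate;
    }
    return;
}

my $path = find_module_file('Bio::Perl', @INC);
if (defined $path) {
    print "Bio::Perl found at $path\n";
} else {
    print "Bio::Perl not found; directories searched:\n";
    print "  $_\n" for grep { !ref } @INC;
}
```

If the module turns out to live in a directory that is not listed, running the script with `perl -I/that/directory test.pl` is a quick way to confirm that the search path is the only problem.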
pete -- Peter G. Carswell The Ohio Supercomputer Center pete@osc.edu work: 614.292.1091 fax: 614.292.XXXX "DOC NOTE, I DISSENT. A FAST NEVER PREVENTS A FATNESS. I DIET ON COD." -Peter Hilton From ssri12 at yahoo.com Mon Nov 1 08:52:43 2004 From: ssri12 at yahoo.com (Sudha S) Date: Tue Nov 2 17:00:06 2004 Subject: [Bioperl-l] Problems installing Bio::DB In-Reply-To: Message-ID: <20041101135243.45842.qmail@web41205.mail.yahoo.com> Thank you very much! Problem has been fixed. Sudha --- Brian Osborne wrote: > Sudha, > > If you want bioperl-db and you must use Windows then > you might consider > Cygwin, it's a Unix emulator. Very easy to install, > and I've installed and > tested bioperl-db in Cygwin using both Mysql and > Postgres (and Postgres > comes with Cygwin, simplifying matters). > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On > Behalf Of Hilmar Lapp > Sent: Tuesday, October 26, 2004 2:12 AM > To: Sudha S > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] Problems installing Bio::DB > > Sorry I can't help here since I've never tried to > install bioperl-db on > Windows. Can anyone out there who is on the same > platform help? > > -hilmar > > On Wednesday, October 20, 2004, at 08:03 PM, Sudha > S wrote: > > > > > Hi, > > > > I have been trying to install the Bio::DB for a > week > > now and I need some help in this badly. > > I keep getting the error : > > > > 'Error: no suitable installation target found for > > package bioperl-db.' > > > > I remember that i was prompted to give path for > > /usr/lib directory and I do not have this folder > on my > > machine which is running Windows XP. What should > be > > done to get over this problem? > > > > Thank you, > > Sudha > > > > > > > > _______________________________ > > Do you Yahoo!? > > Declare Yourself - Register online to vote today! 
> > http://vote.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp > at gnf.org > GNF, San Diego, Ca. 92121 phone: > +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > __________________________________ Do you Yahoo!? Yahoo! Mail Address AutoComplete - You start. We finish. http://promotions.yahoo.com/new_mail From tex at biosysadmin.com Tue Nov 2 01:08:57 2004 From: tex at biosysadmin.com (James Thompson) Date: Tue Nov 2 17:20:01 2004 Subject: [Bioperl-l] BioPerl on Linux.... Message-ID: Peter, It sounds like something went seriously wrong in your attempted installation, here's what my /usr/local/lib/perl5/site_perl/5.8.5/Bio directory has 41 files in it. I wouldn't worry to much about uninstalling, just repeat the installation and it should overwrite all of the old modules. Bioperl does depend on a few external modules, so try installing Bundle::BioPerl to grab the external dependencies. One thing that you should try is downloading the tarball (or grabbing it from your ~/.cpan/sources directory), unzipping it, running make and then make test. Look at what tests fail, then send e-mail to the list. That extra information will allow us to help you better. Best of luck. :) James Thompson On Mon, 1 Nov 2004, Peter G Carswell wrote: > Hello, > > I am running Fedora core 1 on my laptop. Today I attempted to install > bioperl-1.4, using the CPAN shell. Everything seemed to go well, just a > couple of failed tests. So I did a 'force install' on > B/BI/BIRNEY/bioperl-1.4.tar.gz. It finished and I quit the shell. 
I > created a simple test: > > #!/bin/bin/perl > > use Bio::Perl > > and ran the script "perl test.pl". It created the output: > > Can't locate Bio/Perl.pm in @INC .... > > I looked in the path "/usr/lib/perl5/site_perl/5.8.3", the installation > directory, and could find no Perl.pm in the Bio directory. In ./Bio/ > were the directories 'Factory' and 'Tools'. Under ./Bundle/ there was a > BioPerl.pm, but as I have not used BioPer extensively, I was unsure as > where these files should fall. I was looking at some of the tutorial > scripts and followed there lead as to using the perl command "use > Bio::Perl". > > What I would like to do is reinstall, but is there a CPAN shell command > to unistall the previous version? Any pointers? > > Thanks for your help. > > pete > > From lstein at cshl.edu Tue Nov 2 17:48:03 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Nov 2 17:46:32 2004 Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl In-Reply-To: <27C204BD76CBC142BA1AE46D62A8548E0F69FD74@nihexchange9.nih.gov> References: <27C204BD76CBC142BA1AE46D62A8548E0F69FD74@nihexchange9.nih.gov> Message-ID: <200411021748.03426.lstein@cshl.edu> I think that caBIOperl and BioPerl will both need some work in order to make the APIs coherent with each other. I am not eager to see caBIOperl just dropped in without a more thorough integration. Why not just release caBIOPerl onto CPAN? Lincoln On Tuesday 02 November 2004 03:13 pm, Covitz, Peter (NIH/NCI) wrote: > Ewan, > > I thought jump in and pick up this thread. I understand and agree with > your point about needing a 'bridge' between caBIO classes and equivalent > existing bioperl classes. Your suggestion on how to go about implementing > such a bridge was helpful, thanks. > > Beyond that, I had been thinking that it might be useful to contribute the > entire caBIOperl module to bioperl and make it part of the bioperl core > package. 
caBIOperl is really just an object-oriented query interface to > caBIO data servers, so I naively thought it might fit nicely under > Bio::DB::Query, perhaps Bio::DB::Query::caBIO ?? > > Of course people can use caBIOperl without it being part of bioperl. > However, there are a some classes and subject areas in caBIO that are not > in bioperl, so we thought it might be a useful extension to bioperl itself. > In the next major caBIOperl release (~March 2005) we are going to include > a full implementation of the MAGE-OM microarray data standard as part of > the caBIOperl API, so that might be among the subject areas of interest to > the bioperl community. > > I'd be interested to hear whether you and others think there might be value > in incorporating caBIOperl itself into bioperl, or if you'd rather just > consider incorporating the 'bridge' module. > > Regards, > > Peter Covitz > > -----Original Message----- > From: Ewan Birney [mailto:birney@ebi.ac.uk] > Sent: Tuesday, October 12, 2004 4:07 AM > To: Jiang, Shan (NIH/NCI) > Cc: bioperl-l@bioperl.org > Subject: Re: Integrating caBIOperl with BIOperl > > On Mon, 11 Oct 2004, Jiang, Shan (NIH/NCI) wrote: > > Hi Ewan, > > > > I would like to introduce myself. I am a colleage of Gene Levinson at the > > National Cancer Institue in the US. I am the original developer of > > caBIOperl, which Gene presented at BOSC '04. I believe Gene talked to you > > quite extensively during the meeting as well. (Gene asked me to say hi!) > > > > Currently, I am undertaking the task of integrating caBIOperl with > > BIOperl.Gene indicated that you would be a great source to talk to. I am > > in > > > the process of learning BIOperl before deciding how to proceed. So I > > would much appreciate your help in learning BIOperl as well as looking > > into possible ways of integrating caBIOperl with BIOperl. > > Great - I'm cc'ing this message to the main bioperl list to check I give > you the best advice! 
>
> > Let me start asking some questions to start the ball rolling.
> >
> > 1. Has similar kinds of integration work been done before? If so, is
> > there a general recommended approach?
>
> The recommendation is definitely to have a caBIOperl "bridge" to Bioperl
> objects. The main ones you want to have are Bio::SeqI,
> Bio::DB::RandomAccessI, Bio::AnnotationCollectionI and Bio::SeqFeatureI.
>
> The "I" means interface (a bit like Java).
>
> In each case you would have wrapper classes that has-a caBIOPerl object
> and is-a Bioperl object. For example, imagining the caBIOPerl sequence
> object has methods "human_readable_name" and "sequence_as_string" (of
> course, they might have something completely different...)
>
> package Bio::caBIOBridge::Seq;
>
> @ISA = qw( Bio::SeqI );
>
> ...
> ...
>
> # Bio::SeqI isa Bio::PrimarySeqI, and needs to implement
> # display_id. this should give back the human readable name
> sub display_id {
>     my $obj = shift;
>
>     # the caBIOPerl method is "human_readable_name"
>     return $obj->{'_cabioperl_object'}->human_readable_name()
> }
>
> # Bio::SeqI needs to implement seq
> sub seq {
>     my $obj = shift;
>
>     return $obj->{'_cabioperl_object'}->sequence_as_string()
> }
>
>
> etc etc
>
>
> This is, BTW, something I am planning to do with Ensembl as well - make an
> Ensembl-Bioperl bridge.
>
> > 2. Do you have a repository where people can just "donate" their code
> > into?
>
> I would suggest that the caBIO-Bioperl bridge was its own cvs module and
> donated into CPAN. You could run the cvs module at Bioperl.org or do it in
> your own shop - entirely up to you.
>
> > 3. caBIOperl has its own object model, if the end vision is to integrate
> > this model with BIOperl, how should I proceed?
>
> see above
>
> > 4. Can I get access to the CVS repository?
> > You shouldn't need access to the bioperl cvs repository to come up with > some working code - if you want to have the caBio-Bioperl bridge > repository hosted at bioperl.org that's feasible, but probably building > some proof-of-concept classes first off would be great. > > > A great first step would be if someone could write a script like: > > > use Bio::caBIOBridge::DBAccess; > use Bio::SeqIO; > > # default to well known caBio server > $db = Bio::caBIOBridge::DBAccess->new(); > > $ca_wrapped_seq = $db->get_Seq_by_id('some_id'); > > # $ca_wrapped_seq is Bio::SeqI object but is actually a thin wrapper over > # caBIO objects > > # Bio::SeqIO is a Bioperl object writer that works with Bio::SeqI > # compliant objects > $seqout = Bio::SeqIO->new( -format => 'EMBL'); > > # Here we see the bridge in action! > $seqout->write_seq($ca_wrapped_seq); > > > I am not sure how familiar you are with caBIOperl. So if you have any > > question, please do not hesistate to ask me. > > > > Regards, > > Shan Jiang > > (Contractor) > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From jason.stajich at duke.edu Tue Nov 2 19:36:05 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Nov 2 19:34:36 2004 Subject: Re [Bioperl-l] Parsing Blast reports: GI number of a hit ? In-Reply-To: References: Message-ID: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> my ($gi); if ($hit->name =~ /gi\|(\d+)/) { $gi = $1 } -jason On Nov 1, 2004, at 10:08 AM, Edith Schlagenhauf wrote: > Hi, > > thanks for the information. Is there a method like $hit->gi_number() > to get gi numbers from a Blast report? (I havent found anything so > far..) 
> > thanks, > Edith > > On Oct 28, 2004, at 9:44 AM, Edith Schlagenhauf wrote: > >> b) is there a more convenient way to get gi numbers from accession >> numbers using Bioperl? > > Well NCBI provides the resource so would be their call. you can > generate a accession to gi lookup table by grepping through your blast > db and pulling out the gi number and accession, save this in a file and > build a persistent lookup DB with DB_File or something equivalent. > That is what I'd do if I am doing lots of these lookups. > > Of course if you are trying to do this for BLAST results you can have > the GI number in the report if you specify just add the > -I T option in your blastall command -- from the blastall docs: > > -I Show GI's in deflines [T/F] > > > > > ****************************************** > Dr Edith Schlagenhauf > Bioinformatics > Institute of Plant Biology > University of Zurich > Zollikerstrasse 107 > CH-8008 Zurich > SWITZERLAND > > e-mail: ediths AT botinst DOT unizh DOT ch > Tel.: +41 1 634 82 78 > Fax : +41 1 634 82 04 > ****************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From sutripa at vbi.vt.edu Tue Nov 2 20:35:24 2004 From: sutripa at vbi.vt.edu (Sucheta Tripathy) Date: Tue Nov 2 20:33:55 2004 Subject: [Bioperl-l] mis-match position in HSP Message-ID: <4058.199.3.136.4.1099445724.squirrel@webmail.vbi.vt.edu> Hi, Just a quick question, is there a method to get the mis match position from a blast output using Bio::Search::HSP::HSPI. For example if I have an HSP like this: 5 atgaataggatagggataggtagata 26 ||| |||||||||||||||||||||| 23 atgtataggatagggataggtagata 44 Then the mismatch position is 4. Thanks in advance --Sucheta -- Sucheta Tripathy Virginia Bioinformatics Institute Phase-I Washington street. Virginia Tech. 
Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 From neil.saunders at unsw.edu.au Tue Nov 2 20:49:32 2004 From: neil.saunders at unsw.edu.au (Neil Saunders) Date: Tue Nov 2 20:48:00 2004 Subject: [Bioperl-l] Problems with "tblastx"-like output and GFF Message-ID: <20041103014932.GA17394@psychro> Dear all, I am working on ways to visualise conserved gene clusters (or gene synteny) in microbial genomes. Basically, what I want to do is (1) compare a set of contigs to 1 or more finished genomes using one or all of blast, blat and MUMmer, (2) convert the output to GFF including 'match' lines and (3) visualise in Generic Genome Browser. Along the way I've come across a few problems to share with you - I'm cross-posting to bioperl and gbrowse lists. For comparing genomes in this way a "tblastx-like" approach is best, as proteins are more conserved than DNA. This requires use of: (1) tblastx (2) blat with "t=dnax -q=dnax" (which I'll call "blatx") (3) promer from the MUMmer package So far so good. Problems arise when parsing these outputs to generate GFF. My problems are: (1) Bio::Tools::Blat. When I use this module to parse "blatx" output (just following the example in the module docs), I get errors like so: Argument "++" isn't numeric in numeric ne (!=) at /usr/local/share/perl/5.8.4/Bio/Location/Atomic.pm line 170, line 228. "Blatx" has a strand column containing both query and target strand (e.g. "++") and it seems Blat.pm is not happy with this? This looks like an easy fix. (2) Bio::SearchIO::psl This module gives me errors such as: Use of uninitialized value in pattern match (m//) at /usr/local/share/perl/5.8.4/Bio/SearchIO/psl.pm line 176, line 2503. I'm not entirely sure what to make of this, as the psl file looks OK. (3) search2gff and blast2gff I've used the search2gff script from bioperl and the blast2gff script from gbrowse. 
What I found was: - they both work fine on blastn output - search2gff with the "--match" switch works fine on blastn output, but gives errors on tblastx output - blast2gff gives similar errors to search2gff when trying to construct a match line from tblastx output The errors when trying to create "match" lines look like this: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Undefined sub-sequence (2400764,2400765). Valid range = 2400764 - 2400871 STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.4/Bio/Root/Root.pm:328 STACK: Bio::Search::HSP::HSPI::matches /usr/local/share/perl/5.8.4/Bio/Search/HSP/HSPI.pm:711 STACK: Bio::Search::SearchUtils::_adjust_contigs /usr/local/share/perl/5.8.4/Bio/Search/SearchUtils.pm:389 STACK: Bio::Search::SearchUtils::tile_hsps /usr/local/share/perl/5.8.4/Bio/Search/SearchUtils.pm:182 STACK: Bio::Search::Hit::GenericHit::strand /usr/local/share/perl/5.8.4/Bio/Search/Hit/GenericHit.pm:1440 STACK: /usr/bin/bp_search2gff.pl:177 ----------------------------------------------------------- and the values in parentheses after "Undefined sub-sequence" always differ by one (as above, 2400764,2400765). I had some discussion with Jason S. about this issue some months ago and we concluded there was some complicated problem in the internals of tile_hsps. What all this boils down to is that "tblastx-like" output is quite hard to deal with. I have taken to writing my own scripts where I: 1) store GFF lines for single HSPs in an array 2) loop through and create a hash of hashes of arrays with keys query_id + target_id and an array of starts/ends 3) sort the array and use the first+last values to generate the "match" lines. This works OK (I've done this for promer too), but is a bit basic (e.g. "match" lines are not given scores). 
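
The three steps just described could be sketched roughly as follows. This is a minimal illustration in plain Perl under stated assumptions, not the actual script: the HSP records, their field names (query, target, start, end), the match_lines helper and the 'sketch' source column are all hypothetical, and the score column is left as "." to mirror the limitation mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1 (assumed input): one record per HSP, as it might be collected while
# parsing tblastx/blatx/promer output. Coordinates are on the target sequence.
my @hsps = (
    { query => 'Contig70', target => 'GenomeA', start => 500, end => 800 },
    { query => 'Contig70', target => 'GenomeA', start => 100, end => 300 },
    { query => 'Contig70', target => 'GenomeB', start => 40,  end => 90  },
    { query => 'Contig12', target => 'GenomeA', start => 900, end => 950 },
);

# Build one GFF-style "match" line per query/target pair.
sub match_lines {
    my @records = @_;

    # Step 2: hash of hashes of arrays, keyed by query id then target id,
    # collecting every start and end seen for that pair.
    my %coords;
    for my $h (@records) {
        push @{ $coords{ $h->{query} }{ $h->{target} } },
            $h->{start}, $h->{end};
    }

    # Step 3: sort each coordinate list numerically; the first and last
    # values give the extent of the match line.
    my @lines;
    for my $query (sort keys %coords) {
        for my $target (sort keys %{ $coords{$query} }) {
            my @sorted = sort { $a <=> $b } @{ $coords{$query}{$target} };
            push @lines, join "\t",
                $target, 'sketch', 'match', $sorted[0], $sorted[-1],
                '.', '.', '.', "Target $query";
        }
    }
    return @lines;
}

print "$_\n" for match_lines(@hsps);
```

Because only the smallest and largest coordinates are kept, HSPs that lie far apart on the same target are still merged into one match line; splitting them would need an extra distance cutoff.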
I have an example of how this looks for promer at:

http://psychro.bioinformatics.unsw.edu.au/perl/gbrowse_img/sim?name=Contig70:38893..138892;type=CDS+PROMER;width=1024

If anyone is interested, I can provide output files and sample scripts and
perhaps we can work on this issue.

I use:

Debian Linux (unstable)
Latest CVS bioperl and gbrowse
BLAST 2.2.8
MUMmer 3.15
BLAT v. 17

thanks for your ideas,
Neil
--
School of Biotechnology and Biomolecular Sciences,
The University of New South Wales,
Sydney 2052,
Australia

http://psychro.bioinformatics.unsw.edu.au/neil/index.php

From khufaz83 at yahoo.com Wed Nov 3 01:10:15 2004
From: khufaz83 at yahoo.com (hafiz hafiz)
Date: Wed Nov 3 01:09:33 2004
Subject: [Bioperl-l] CGI Perl and Bioperl
Message-ID: <20041103061015.9921.qmail@web52510.mail.yahoo.com>

I have written a CGI Perl script that searches for a sequence using SeqIO,
but it does not run when I open it in Internet Explorer. It does run in IE
when SeqIO is not used in the source code. Why? For information, I have
installed Apache on our server.
my source code: search.cgi

#!/usr/bin/perl
#use lib '/disk3/local/lib/perl5/site_perl';
#use lib '/home/database/swiss-prot/release';
use CGI qw/:standard :html3/;
use Bio::DB::GenBank;
use File::Temp;
use FileHandle;
use Bio::DB::SwissProt;
use Bio::Root::IO;
use Bio::SeqIO;
use Bio::Seq;
use Bio::PrimarySeq;
use Location;

print header,
      start_html(-title  => 'find subsequence of large SwissProt entries',
                 -author => ' ');
print_form()    unless param;
print_results() if param;

sub print_results {
    $database = 'SwissProt';
    $input    = param('input');
    $fileName = param('fileName');
    $format   = 'swiss';

    #Load module Location.pm into an array
    @filelocation = Location::filelocation ("sprot42.dat", $database);

    #Access directory path that resides in second element of array @filelocation
    my $location = $filelocation[1];

    #Open the directory that is returned by the Location.pm module
    opendir (DIR, $location) || die "Couldn't open directory or directory not found\n";

    #receive input from user
    print "====>>Enter sequence:$input";
    # $input = <STDIN>;
    chomp $input;
    print "\n";

    #Read the directory and store its content in an array
    @file = readdir (DIR);
    foreach $file (@file) {
        if ($file eq "sprot42.dat") {
            #Open sprot42.dat file for reading
            $file  = "$location";
            $file .= "/sprot42.dat";
            open (FILE, $file) || die "Couldn't open file\n";

            #Method from Bioperl module-Bio::SeqIO
            #Create a sequence object and store it in scalar variable
            $in = Bio::SeqIO->new(-file => $file, '-format' => "$format");

            #Loop until the end of the sequence object
            while (my $seq = $in->next_seq()) {
                #Read sequence in file
                my $sequence = $seq->seq;

                #Test if sequence exists in the file
                if ($sequence =~ /$input/) {
                    my $io = Bio::SeqIO->new(-format => "$format",
                                             -file   => ">/var/www/html/infoseq");
                    $io->write_seq($seq);

                    #Access sequence ID and store it in a scalar variable
                    $fileName = $seq->id;
                    print "Matching sequence found in $fileName\n\n";

                    #Store $fileName in @foundfile array
                    push (@foundfile, $fileName);

                    #Increase counter of found
                    $found = $found + 1;
                }# End second if loop
            }#End while loop
        }#End first if loop
    }#End foreach loop
}

sub print_form {
    print start_form,
          table( Tr(td("Enter your sequence"),
                    td(textfield(-name => 'input', -size => 20)))),
          submit ("Find my subsequence");
}

From jason.stajich at duke.edu Wed Nov 3 08:22:47 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Nov 3 08:21:12 2004
Subject: [Bioperl-l] mis-match position in HSP
In-Reply-To: <4058.199.3.136.4.1099445724.squirrel@webmail.vbi.vt.edu>
References: <4058.199.3.136.4.1099445724.squirrel@webmail.vbi.vt.edu>
Message-ID: <74034A94-2D9B-11D9-9167-000393C44276@duke.edu>

Try the seq_inds method -- it allows you to get the positions of a given
class of characters: matches, gaps or mismatches. I can't remember if they
are returned in sequence coordinates or HSP coordinates; I think the latter
though. You can use Bio::Coordinate::Utils to do coordinate mapping from
one system to another based on an alignment.

-jason

On Nov 2, 2004, at 8:35 PM, Sucheta Tripathy wrote:

>
> Hi,
>
> Just a quick question, is there a method to get the mis match position
> from a blast output using Bio::Search::HSP::HSPI.
>
> For example if I have an HSP like this:
>
> 5  atgaataggatagggataggtagata 26
>    ||| ||||||||||||||||||||||
> 23 atgtataggatagggataggtagata 44
>
> Then the mismatch position is 4.
>
> Thanks in advance
>
> --Sucheta
>
>
> --
> Sucheta Tripathy
> Virginia Bioinformatics Institute Phase-I
> Washington street.
> Virginia Tech.
> Blacksburg,VA 24061-0447 > phone:(540)231-8138 > Fax: (540) 231-2606 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From covitzp at mail.nih.gov Wed Nov 3 09:47:35 2004 From: covitzp at mail.nih.gov (Covitz, Peter (NIH/NCI)) Date: Wed Nov 3 09:46:05 2004 Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl Message-ID: <27C204BD76CBC142BA1AE46D62A8548E0F69FD85@nihexchange9.nih.gov> Agreed, getting it into CPAN is the first order of business. caBIOperl is itself a wrapper around a lower-level SOAP-XML API. That gives us some flexibility on how we present the visible API interfaces. Once we get it into CPAN, I'd be interested in continuing the discussion of what the appropriate bridging and interface strategy would be to make it more suitable for use with bioperl. Thanks for the feedback! Regards, Peter -----Original Message----- From: Lincoln Stein [mailto:lstein@cshl.edu] Sent: Tuesday, November 02, 2004 5:48 PM To: Covitz, Peter (NIH/NCI); 'bioperl-l@bioperl.org' Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl I think that caBIOperl and BioPerl will both need some work in order to make the APIs coherent with each other. I am not eager to see caBIOperl just dropped in without a more thorough integration. Why not just release caBIOPerl onto CPAN? Lincoln On Tuesday 02 November 2004 03:13 pm, Covitz, Peter (NIH/NCI) wrote: > Ewan, > > I thought jump in and pick up this thread. I understand and agree with > your point about needing a 'bridge' between caBIO classes and equivalent > existing bioperl classes. Your suggestion on how to go about implementing > such a bridge was helpful, thanks. > > Beyond that, I had been thinking that it might be useful to contribute the > entire caBIOperl module to bioperl and make it part of the bioperl core > package. 
caBIOperl is really just an object-oriented query interface to > caBIO data servers, so I naively thought it might fit nicely under > Bio::DB::Query, perhaps Bio::DB::Query::caBIO ?? > > Of course people can use caBIOperl without it being part of bioperl. > However, there are a some classes and subject areas in caBIO that are not > in bioperl, so we thought it might be a useful extension to bioperl itself. > In the next major caBIOperl release (~March 2005) we are going to include > a full implementation of the MAGE-OM microarray data standard as part of > the caBIOperl API, so that might be among the subject areas of interest to > the bioperl community. > > I'd be interested to hear whether you and others think there might be value > in incorporating caBIOperl itself into bioperl, or if you'd rather just > consider incorporating the 'bridge' module. > > Regards, > > Peter Covitz > > -----Original Message----- > From: Ewan Birney [mailto:birney@ebi.ac.uk] > Sent: Tuesday, October 12, 2004 4:07 AM > To: Jiang, Shan (NIH/NCI) > Cc: bioperl-l@bioperl.org > Subject: Re: Integrating caBIOperl with BIOperl > > On Mon, 11 Oct 2004, Jiang, Shan (NIH/NCI) wrote: > > Hi Ewan, > > > > I would like to introduce myself. I am a colleage of Gene Levinson at the > > National Cancer Institue in the US. I am the original developer of > > caBIOperl, which Gene presented at BOSC '04. I believe Gene talked to you > > quite extensively during the meeting as well. (Gene asked me to say hi!) > > > > Currently, I am undertaking the task of integrating caBIOperl with > > BIOperl.Gene indicated that you would be a great source to talk to. I am > > in > > > the process of learning BIOperl before deciding how to proceed. So I > > would much appreciate your help in learning BIOperl as well as looking > > into possible ways of integrating caBIOperl with BIOperl. > > Great - I'm cc'ing this message to the main bioperl list to check I give > you the best advice! 
>
> > Let me start asking some questions to start the ball rolling.
> >
> > 1. Has similar kinds of integration work been done before? If so, is
> > there a general recommended approach?
>
> The recommendation is definitely to have a caBIOperl "bridge" to Bioperl
> objects. The main ones you want to have are Bio::SeqI,
> Bio::DB::RandomAccessI, Bio::AnnotationCollectionI and Bio::SeqFeatureI.
>
> The "I" means interface (a bit like Java).
>
> In each case you would have wrapper classes that has-a caBIOPerl object
> and is-a Bioperl object. For example, imagining the caBIOPerl sequence
> object has methods "human_readable_name" and "sequence_as_string" (of
> course, they might have something completely different...)
>
> package Bio::caBIOBridge::Seq;
>
> @ISA = qw( Bio::SeqI );
>
> ...
> ...
>
> # Bio::SeqI isa Bio::PrimarySeqI, and needs to implement
> # display_id. this should give back the human readable name
> sub display_id {
>     my $obj = shift;
>
>     # the caBIOPerl method is "human_readable_name"
>     return $obj->{'_cabioperl_object'}->human_readable_name()
> }
>
> # Bio::SeqI needs to implement seq
> sub seq {
>     my $obj = shift;
>
>     return $obj->{'_cabioperl_object'}->sequence_as_string()
> }
>
>
> etc etc
>
>
> This is, BTW, something I am planning to do with Ensembl as well - make an
> Ensembl-Bioperl bridge.
>
> > 2. Do you have a repository where people can just "donate" their code
> > into?
>
> I would suggest that the caBIO-Bioperl bridge was its own cvs module and
> donated into CPAN. You could run the cvs module at Bioperl.org or do it in
> your own shop - entirely up to you.
>
> > 3. caBIOperl has its own object model, if the end vision is to integrate
> > this model with BIOperl, how should I proceed?
>
> see above
>
> > 4. Can I get access to the CVS repository?
> > You shouldn't need access to the bioperl cvs repository to come up with > some working code - if you want to have the caBio-Bioperl bridge > repository hosted at bioperl.org that's feasible, but probably building > some proof-of-concept classes first off would be great. > > > A great first step would be if someone could write a script like: > > > use Bio::caBIOBridge::DBAccess; > use Bio::SeqIO; > > # default to well known caBio server > $db = Bio::caBIOBridge::DBAccess->new(); > > $ca_wrapped_seq = $db->get_Seq_by_id('some_id'); > > # $ca_wrapped_seq is Bio::SeqI object but is actually a thin wrapper over > # caBIO objects > > # Bio::SeqIO is a Bioperl object writer that works with Bio::SeqI > # compliant objects > $seqout = Bio::SeqIO->new( -format => 'EMBL'); > > # Here we see the bridge in action! > $seqout->write_seq($ca_wrapped_seq); > > > I am not sure how familiar you are with caBIOperl. So if you have any > > question, please do not hesistate to ask me. > > > > Regards, > > Shan Jiang > > (Contractor) > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From rousse at ccr.jussieu.fr Wed Nov 3 08:39:09 2004 From: rousse at ccr.jussieu.fr (Guillaume Rousse) Date: Wed Nov 3 10:53:29 2004 Subject: [Bioperl-l] graphing trees Message-ID: <4188DF7D.6050606@ccr.jussieu.fr> Hello. I have large trees to graph, the same way as phylogenetic trees are usually drawn, with edges length expressing distances between nodes. I had a quick look at BioPerl HowTos, but there was nothing about visual output in the Phylogenetic Tree HOWTO, and the Bio::Graphics HOWTO says nothing about trees. So, is there any way to easily draw such kind of visual tree, either in bioperl or outside ? 

From jason.stajich at duke.edu Wed Nov 3 11:57:02 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Nov 3 11:55:31 2004
Subject: [Bioperl-l] graphing trees
In-Reply-To: <4188DF7D.6050606@ccr.jussieu.fr>
References: <4188DF7D.6050606@ccr.jussieu.fr>
Message-ID: <626DB952-2DB9-11D9-BFC6-000393C44276@duke.edu>

On Nov 3, 2004, at 8:39 AM, Guillaume Rousse wrote:

> Hello.
>
> I have large trees to graph, the same way as phylogenetic trees are
> usually drawn, with edges length expressing distances between nodes.
>

Bio::TreeIO::svggraph generates SVG.

You can also look at the Stoltzfus NEXUS nexplot tool
http://camel5.umbi.umd.edu/camel/software/
which can produce postscript versions of trees.

PHYLIP's drawtree and drawgram can be run through the modules
Bio::Tools::Run::Phylo::Phylip::DrawTree and DrawGram. These produce
postscript files.

> I had a quick look at BioPerl HowTos, but there was nothing about
> visual output in the Phylogenetic Tree HOWTO, and the Bio::Graphics
> HOWTO says nothing about trees.
>
> So, is there any way to easily draw such kind of visual tree, either
> in bioperl or outside ?

I assume you are asking about a programmatic solution? There is lots of
tree drawing software for tree-at-a-time analysis, like TreeView, NJPlot
and many more. There is also treeplot (command line, can be scripted to
produce a lot of trees).
-jason > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From rousse at ccr.jussieu.fr Wed Nov 3 12:00:49 2004 From: rousse at ccr.jussieu.fr (Guillaume Rousse) Date: Wed Nov 3 11:59:13 2004 Subject: [Bioperl-l] graphing trees In-Reply-To: <265214E0-2DB9-11D9-92E0-000A95D7BA10@mail.nih.gov> References: <4188DF7D.6050606@ccr.jussieu.fr> <265214E0-2DB9-11D9-92E0-000A95D7BA10@mail.nih.gov> Message-ID: <41890EC1.40500@ccr.jussieu.fr> Sean Davis wrote: > Not a bioperl answer, but you might want to look at: > > http://search.cpan.org/search?query=graphviz&mode=all > > Graphviz is a very nice package for dealing with graph visualization. It's what I used sofar, however it's not that practical for trees: - internal nodes are always visible - edges are always directs - you can't adjust their length easily From mbasu at mail.nih.gov Wed Nov 3 12:08:20 2004 From: mbasu at mail.nih.gov (Malay) Date: Wed Nov 3 12:07:37 2004 Subject: Re [Bioperl-l] Parsing Blast reports: GI number of a hit ? In-Reply-To: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> References: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> Message-ID: <41891084.6070104@mail.nih.gov> Jason Stajich wrote: > > my ($gi); > > if ($hit->name =~ /gi\|(\d+)/) { $gi = $1 } > > -jason > On Nov 1, 2004, at 10:08 AM, Edith Schlagenhauf wrote: > >> Hi, >> >> thanks for the information. Is there a method like $hit->gi_number() >> to get gi numbers from a Blast report? (I havent found anything so far..) >> >> thanks, >> Edith >> >> On Oct 28, 2004, at 9:44 AM, Edith Schlagenhauf wrote: >> >>> b) is there a more convenient way to get gi numbers from accession >>> numbers using Bioperl? >> >> >> Well NCBI provides the resource so would be their call. 
you can
>> generate a accession to gi lookup table by grepping through your blast
>> db and pulling out the gi number and accession, save this in a file and
>> build a persistent lookup DB with DB_File or something equivalent.
>> That is what I'd do if I am doing lots of these lookups.

You don't need to do that. Provided your blast database actually has the
gi in the defline. You can actually do this

open (PIPE, "fastacmd -d $database -s $accession|");
my $line = <PIPE>;
$line =~ /gi\|(\d+)/;
$gi = $1;

Here $accession can be any string accession or whatever. Remember, that
if the string is not unique it will return multiple hits.

Malay

From sdavis2 at mail.nih.gov Wed Nov 3 12:13:43 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Nov 3 12:11:22 2004
Subject: [Bioperl-l] graphing trees
In-Reply-To: <41890EC1.40500@ccr.jussieu.fr>
References: <4188DF7D.6050606@ccr.jussieu.fr> <265214E0-2DB9-11D9-92E0-000A95D7BA10@mail.nih.gov> <41890EC1.40500@ccr.jussieu.fr>
Message-ID:

Did you look at:

http://workshop.molecularevolution.org/resources/fileformats/tree_formats.php

I happen to use R (statistical programming environment), which has some
stuff for plotting trees (which I don't use).
http://pbil.univ-lyon1.fr/ade4html/plot.phylog.html

You might do a web search for "newick tree" to get stuff related to
newick format and drawing it.

On Nov 3, 2004, at 12:00 PM, Guillaume Rousse wrote:

> Sean Davis wrote:
>> Not a bioperl answer, but you might want to look at:
>> http://search.cpan.org/search?query=graphviz&mode=all
>> Graphviz is a very nice package for dealing with graph visualization.
> It's what I used sofar, however it's not that practical for trees: > - internal nodes are always visible > - edges are always directs > - you can't adjust their length easily From hlapp at gmx.net Wed Nov 3 12:35:59 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Nov 3 12:34:24 2004 Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl In-Reply-To: <27C204BD76CBC142BA1AE46D62A8548E0F69FD85@nihexchange9.nih.gov> Message-ID: I very much agree with Lincoln's comment. One of the more frequent comments we have gotten is that expecially to newbies the plethora of modules in Bioperl and the apparent diversity of its APIs are already confusing. A bridge that binds the caBIOperl API to the existing Bioperl object model would be a great addition though. -hilmar On Wednesday, November 3, 2004, at 06:47 AM, Covitz, Peter (NIH/NCI) wrote: > Agreed, getting it into CPAN is the first order of business. > > caBIOperl is itself a wrapper around a lower-level SOAP-XML API. That > gives > us some flexibility on how we present the visible API interfaces. > Once we > get it into CPAN, I'd be interested in continuing the discussion of > what the > appropriate bridging and interface strategy would be to make it more > suitable for use with bioperl. > > Thanks for the feedback! > > Regards, > > Peter > > -----Original Message----- > From: Lincoln Stein [mailto:lstein@cshl.edu] > Sent: Tuesday, November 02, 2004 5:48 PM > To: Covitz, Peter (NIH/NCI); 'bioperl-l@bioperl.org' > Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl > > > I think that caBIOperl and BioPerl will both need some work in order > to make > > the APIs coherent with each other. I am not eager to see caBIOperl > just > dropped in without a more thorough integration. Why not just release > caBIOPerl onto CPAN? > > Lincoln > > On Tuesday 02 November 2004 03:13 pm, Covitz, Peter (NIH/NCI) wrote: >> Ewan, >> >> I thought jump in and pick up this thread. 
I understand and agree >> with >> your point about needing a 'bridge' between caBIO classes and >> equivalent >> existing bioperl classes. Your suggestion on how to go about >> implementing >> such a bridge was helpful, thanks. >> >> Beyond that, I had been thinking that it might be useful to >> contribute the >> entire caBIOperl module to bioperl and make it part of the bioperl >> core >> package. caBIOperl is really just an object-oriented query interface >> to >> caBIO data servers, so I naively thought it might fit nicely under >> Bio::DB::Query, perhaps Bio::DB::Query::caBIO ?? >> >> Of course people can use caBIOperl without it being part of bioperl. >> However, there are a some classes and subject areas in caBIO that are >> not >> in bioperl, so we thought it might be a useful extension to bioperl > itself. >> In the next major caBIOperl release (~March 2005) we are going to >> include >> a full implementation of the MAGE-OM microarray data standard as part >> of >> the caBIOperl API, so that might be among the subject areas of >> interest to >> the bioperl community. >> >> I'd be interested to hear whether you and others think there might be > value >> in incorporating caBIOperl itself into bioperl, or if you'd rather >> just >> consider incorporating the 'bridge' module. >> >> Regards, >> >> Peter Covitz >> >> -----Original Message----- >> From: Ewan Birney [mailto:birney@ebi.ac.uk] >> Sent: Tuesday, October 12, 2004 4:07 AM >> To: Jiang, Shan (NIH/NCI) >> Cc: bioperl-l@bioperl.org >> Subject: Re: Integrating caBIOperl with BIOperl >> >> On Mon, 11 Oct 2004, Jiang, Shan (NIH/NCI) wrote: >>> Hi Ewan, >>> >>> I would like to introduce myself. I am a colleage of Gene Levinson at > the >>> National Cancer Institue in the US. I am the original developer of >>> caBIOperl, which Gene presented at BOSC '04. I believe Gene talked to > you >>> quite extensively during the meeting as well. (Gene asked me to say >>> hi!) 
>>> >>> Currently, I am undertaking the task of integrating caBIOperl with >>> BIOperl.Gene indicated that you would be a great source to talk to. >>> I am >> >> in >> >>> the process of learning BIOperl before deciding how to proceed. So I >>> would much appreciate your help in learning BIOperl as well as >>> looking >>> into possible ways of integrating caBIOperl with BIOperl. >> >> Great - I'm cc'ing this message to the main bioperl list to check I >> give >> you the best advice! >> >>> Let me start asking some questions to start the ball rolling. >>> >>> 1. Has similar kinds of integration work been done before? If so is > there >> >> a >> >>> general recommended approach? >> >> The recommendation is definitely to have an caBIOperl "bridge" to >> Bioperl >> objects. The main ones you want to have are Bio::SeqI, >> Bio::DB::RandomAccessI and Bio::AnnotationCollectionI and >> Bio::SeqFeatureI >> >> The "I" means interface (a bit like Java) >> >> In each case you would have wrapper classes that has-a caBIOPerl >> object >> and is-a Bioperl object, for example, imagining the caBIOPerl sequence >> object has methods "human_readable_name" and "sequence_as_string" (of >> course, they might have something completely different...) >> >> package Bio::caBIOBridge::Seq; >> >> @ISA = qw( Bio::SeqI ); >> >> ... >> ... >> >> # Bio::SeqI isa Bio::PrimarySeqI, and needs to implement >> # display_id. this should give back the human readable name >> sub display_id { >> my $obj = shift; >> >> # the caBIOPerl method is "human_readable_name" >> return $obj->{'_cabioperl_object'}->human_readable_name() >> } >> >> # Bio::SeqI needs to implement seq >> sub seq { >> return $obj->{'_cabioperl_object'}->sequence_as_string() >> } >> >> >> etc etc >> >> >> This is, BTW, something I am planning to do with Ensembl as - make an >> Ensembl-Bioperl bridge. >> >>> 2. Do you have a repository where people can just "donate" their code >> >> into? 
>> >> I would suggest that the caBIO-Bioperl bridge was its own cvs module >> and >> donated into CPAN. You could run the cvs module at Bioperl.org or do >> it in >> your own shop - entirely up to you. >> >>> 3. caBIOperl has its own object model, if the end vision is to >>> integrate >>> this model with BIOperl, how should I proceed? >> >> see above >> >>> 4. Can I get access to the CVS repository? >> >> You shouldn't need access to the bioperl cvs repository to come up >> with >> some working code - if you want to have the caBio-Bioperl bridge >> repository hosted at bioperl.org that's feasible, but probably >> building >> some proof-of-concept classes first off would be great. >> >> >> A great first step would be if someone could write a script like: >> >> >> use Bio::caBIOBridge::DBAccess; >> use Bio::SeqIO; >> >> # default to well known caBio server >> $db = Bio::caBIOBridge::DBAccess->new(); >> >> $ca_wrapped_seq = $db->get_Seq_by_id('some_id'); >> >> # $ca_wrapped_seq is Bio::SeqI object but is actually a thin wrapper >> over >> # caBIO objects >> >> # Bio::SeqIO is a Bioperl object writer that works with Bio::SeqI >> # compliant objects >> $seqout = Bio::SeqIO->new( -format => 'EMBL'); >> >> # Here we see the bridge in action! >> $seqout->write_seq($ca_wrapped_seq); >> >>> I am not sure how familiar you are with caBIOperl. So if you have any >>> question, please do not hesistate to ask me. 
>>> >>> Regards, >>> Shan Jiang >>> (Contractor) >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln Stein > lstein@cshl.edu > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From barry.moore at genetics.utah.edu Wed Nov 3 14:10:13 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed Nov 3 14:09:04 2004 Subject: [Bioperl-l] Blast Results IO to Local RDBMS Message-ID: <41892D15.2000507@genetics.utah.edu> I want to BLAST alot (100,000+) of ORFs and then save the results in a local relational database for periodic access later. I was thinking something like Bio::SearchIO::Writer::HSPTableWriter to build table(s) for a "roll my own" schema and then more code (either write it or find it) which pulls the search results back out of RDBMS and into Bio::Search object. I recall about a month ago Jason replied to someone with a similar goal as follows: >You will need to >build Bio::Search objects however from the DB - something that I am >sure has been done in part somewhere already. >.... >If someone gets this bit of to-from RDMBS code working for would be >really nice to have it in bioperl as just a simple component. I know >lots of people have their own store-similarity-search-results schemas >out there, happy to see a Bio::Search serializiation layer written to >talk to these. > My question is: Has anyone done this already? Barry -- Barry Moore Dept. 
of Human Genetics
University of Utah
Salt Lake City, UT

From khufaz83 at yahoo.com Wed Nov 3 23:19:15 2004
From: khufaz83 at yahoo.com (hafiz hafiz)
Date: Wed Nov 3 23:17:39 2004
Subject: [Bioperl-l] how to insert more than one features?
Message-ID: <20041104041915.54211.qmail@web52508.mail.yahoo.com>

Hi,

please help me: how can I write more than one new feature to a new file,
like this:

FT   SIGNAL        1     24       POTENTIAL.
FT   CHAIN        25     75       10 KDA PROTEIN.

I can only build a single feature, like this:

FT   CHAIN        25     75       10 KDA PROTEIN.

This is my source code; any suggestions?

use Bio::SeqFeature::Generic;
$option=0;
while($option=='0' ) {
print"enter start\n";
$start = <STDIN>;
chomp $start;
print"\nenter end\n";
$end = <STDIN>;
chomp $end;
print"\nenter strand";
$strand = <STDIN>;
chomp $strand;
print"\nenter primary";
$primary = <STDIN>;
chomp $primary;
my $feat = new Bio::SeqFeature::Generic(-start =>$start,-end => $end, -strand =>$strand,-primary =>$primary);
print" insert an other features\n" ;
$option = <STDIN>;
chomp $option;
};

From daniel.lang at biologie.uni-freiburg.de Thu Nov 4 03:38:04 2004
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Thu Nov 4 03:36:36 2004
Subject: [Bioperl-l] Length of ID in EMBL sequence entries
Message-ID: <4189EA6C.7080103@biologie.uni-freiburg.de>

Hi,
I just stumbled over my sequence IDs getting trimmed to 10 characters
when writing with Bio::SeqIO::embl.

line 453:
$temp_line = sprintf("%-11.10sstandard; $mol; $div; %d BP.", $seq->id(), $len);

Is there any reason for the trimming? I can't find any hints for that in
the EMBL manual...
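
For reference, the trimming can be reproduced in plain Perl without
BioPerl. The sketch below applies only the "%-11.10s" part of the format
quoted above to a few made-up IDs; embl_entryname() is a hypothetical
helper written for this illustration, not a function of Bio::SeqIO::embl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::embl): format only the
# entryname field with the conversion from the quoted line.
# "%-11.10s" = left-justify, pad to at least 11 columns, and keep at
# most 10 characters of the string.
sub embl_entryname {
    my ($id) = @_;
    return sprintf('%-11.10s', $id);
}

# Made-up IDs of 5, 10 and 11 characters; the brackets make the
# padding visible.
for my $id ('AB123', 'ABCDEFGHIJ', 'ABCDEFGHIJK') {
    printf "[%s] from %s\n", embl_entryname($id), $id;
}
```

The precision (.10) is what cuts the ID; the width (11) only pads, so
every field comes out 11 columns wide and an 11-character ID loses its
last character.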
Thanks in advance,
Daniel:)

From m.claesson at student.ucc.ie Thu Nov 4 05:49:50 2004
From: m.claesson at student.ucc.ie (Marcus Claesson)
Date: Thu Nov 4 05:48:18 2004
Subject: [Bioperl-l] problem using 'add_sub_SeqFeature' on home-made blast parsed data in mysql table
Message-ID: <1099565390.11383.45.camel@morpheus.ucc.ie>

Hi!

I'm trying to make a Bio::Graphics overview of some blastx outputs that
are parsed and inserted in a Mysql table. It works fine if I treat every
HSP as a separate hit, but I want to use 'add_sub_SeqFeature' to include
additional HSPs as subfeatures in the same track.

This is how the blastx data is structured in the table (sbj_name left
out for clarity):

sbj_hit  hsp  e_value  score  q_begin  q_end  frame
5922     1    4e-31    363    163      532    1
5922     2    5e-31    362    534      864    2
6912     1    6e-27    327    310      828    0
6913     1    6e-27    327    142      828    1
6915     1    6e-27    327    187      864    2

As you can see the first hit has two HSPs so I want to add the hsp=2 row
as a subfeature. But when I run my script below I get this error
message:

Can't call method "isa" on unblessed reference at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 760, line 191.

This only happens when the result of the sql query contains more than
one HSP.

Any ideas of how to make it work?
Regards,
Marcus

Here's the script:

use Bio::Graphics;
use Bio::SeqFeature::Generic;
use DBI;

my $panel = Bio::Graphics::Panel->new(-length => 10000,
                                      -width => 600,
                                      -pad_left => 10,
                                      -pad_right => 10,
                                     );
$panel->add_track(arrow =>Bio::SeqFeature::Generic->new(-start =>0,
                                                        -end =>10000),
                  -glyph => 'arrow',
                  -tick => 2,
                  -fgcolor => 'black',
                  -double => 1);
my $track = $panel->add_track(-glyph => 'graded_segments',
                              -label => 1,
                              -strand_arrow => 1,
                              -connector => 'dashed',
                              -bgcolor => 'blue');

$dbh = DBI->connect($dsn,$user);
$sql = "SELECT sbj_name,sbj_count,hsp_count,e_value,score,query_begin, query_end,frame FROM table";
$sth = $dbh->prepare($sql);
$sth->execute();
$set = $sth->fetchall_arrayref({});

$old_sbj_count = 0;
foreach $row (@{$set}) {
    my $feature = Bio::SeqFeature::Generic->new(
                      -display_name => $row->{sbj_name},
                      -start => $row->{query_begin},
                      -end => $row->{query_end});
    # This is where the error occur. I don't know if $row is the right way to pass the additional HSP data.
    if ($old_sbj_count eq $row->{sbj_count}) {
        $feature->add_sub_SeqFeature($row,'EXPAND');
    }
    $old_sbj_count = $row->{sbj_count};
    $track->add_feature($feature);
}
print $panel->png;

From simon.andrews at bbsrc.ac.uk Thu Nov 4 06:11:36 2004
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu Nov 4 06:15:46 2004
Subject: [Bioperl-l] Length of ID in EMBL sequence entries
Message-ID:

> -----Original Message-----
> From: Daniel Lang [mailto:daniel.lang@biologie.uni-freiburg.de]
> Sent: 04 November 2004 08:38
> To: Bioperl-List
> Subject: [Bioperl-l] Length of ID in EMBL sequence entries
>
>
> Hi,
> I just stumbled over my sequence IDs getting trimmed to 10
> characters when writing with Bio::SeqIO::embl.
line 453:
> $temp_line = sprintf("%-11.10sstandard; $mol; $div; %d BP.",
> $seq->id(), $len);

It's one of the fixes in:

http://bugzilla.bioperl.org/show_bug.cgi?id=1618

The problem was that the original format, if given an 11 character ID
code, would allow that to run directly into the dataclass field on the
ID line, which caused files generated in this way not to be recognised
by a number of analysis programs. There have been a couple of previous
posts on this list which were caused by this formatting issue.

Looking back through the EMBL manual at:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html

I don't see any clear guidance about exact positioning in the ID line.
All their examples show the dataclass (usually "standard") starting 11
characters from the beginning of the entryname, so the fix kept this
distance but limited the entryname to 10 chars to enforce a space
between fields.

The specification seems to include a space between entryname and
dataclass, but that would limit us to 10 char entrynames. We could code
this as:

$temp_line = sprintf("%-12.11sstandard; $mol; $div; %d BP.",

which would still allow an 11 char entryname, but would move the rest of
the line along (which still looks like it conforms to the
specification). Might this break other things?

Does anyone have a definitive answer about the correct way to do this?

Simon.

From boconnor at ucla.edu Wed Nov 3 14:21:04 2004
From: boconnor at ucla.edu (Brian O'Connor)
Date: Thu Nov 4 09:29:16 2004
Subject: [Bioperl-l] graphing trees
In-Reply-To:
References: <4188DF7D.6050606@ccr.jussieu.fr> <265214E0-2DB9-11D9-92E0-000A95D7BA10@mail.nih.gov> <41890EC1.40500@ccr.jussieu.fr>
Message-ID: <41892FA0.9060505@ucla.edu>

Bio::TreeIO::svggraph will let you create an SVG of a tree from a
Newick formatted file.
--Brian Sean Davis wrote: > Did you look at: > > http://workshop.molecularevolution.org/resources/fileformats/ > tree_formats.php > > I happen to use R (statistical programming environment), which has > some stuff for plotting trees (which I don't use). > http://pbil.univ-lyon1.fr/ade4html/plot.phylog.html > > You might do a web search for "newick tree" to get stuff related to > newick format and drawing it. > > > On Nov 3, 2004, at 12:00 PM, Guillaume Rousse wrote: > >> Sean Davis wrote: >> >>> Not a bioperl answer, but you might want to look at: >>> http://search.cpan.org/search?query=graphviz&mode=all >>> Graphviz is a very nice package for dealing with graph visualization. >> >> It's what I used sofar, however it's not that practical for trees: >> - internal nodes are always visible >> - edges are always directs >> - you can't adjust their length easily > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Karolina.Zavisek at zg.htnet.hr Thu Nov 4 14:40:13 2004 From: Karolina.Zavisek at zg.htnet.hr (Karolina Zavisek) Date: Thu Nov 4 09:29:35 2004 Subject: [Bioperl-l] remote blast Message-ID: <200411041340.iA4DeD8X012026@ls413.htnet.hr> Hi, I would like to change the number of descriptions I get in my blast report from default 100 to 1000. I tried with adding a header $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = 1000; and I even changed the value in RemoteBlast.pm module itself and still no change. -Karolina ---------------------- T - c o m - - W e b m a i l ---------------------- Ova poruka poslana je upotrebom T-Com Webmail usluge. http://komunikator.tportal.hr From steve at trutane.net Wed Nov 3 20:35:30 2004 From: steve at trutane.net (Steve Chervitz Trutane) Date: Thu Nov 4 09:29:59 2004 Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl In-Reply-To: Message-ID: Note: * perl MAGEstk - any tie in? 
* Always thought it would be nice to have a bridge into bioperl from MAGEstk. Would caBIO perl act as this bridge? * Are the caBio perl MAGE objects autogenerated from MAGE-OM? Steve > From: Hilmar Lapp > Date: Wed, 3 Nov 2004 09:35:59 -0800 > To: "Covitz, Peter (NIH/NCI)" > Cc: "'bioperl-l@bioperl.org'" , "'lstein@cshl.edu'" > > Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl > > I very much agree with Lincoln's comment. One of the more frequent > comments we have gotten is that expecially to newbies the plethora of > modules in Bioperl and the apparent diversity of its APIs are already > confusing. A bridge that binds the caBIOperl API to the existing > Bioperl object model would be a great addition though. > > -hilmar > > On Wednesday, November 3, 2004, at 06:47 AM, Covitz, Peter (NIH/NCI) > wrote: > >> Agreed, getting it into CPAN is the first order of business. >> >> caBIOperl is itself a wrapper around a lower-level SOAP-XML API. That >> gives >> us some flexibility on how we present the visible API interfaces. >> Once we >> get it into CPAN, I'd be interested in continuing the discussion of >> what the >> appropriate bridging and interface strategy would be to make it more >> suitable for use with bioperl. >> >> Thanks for the feedback! >> >> Regards, >> >> Peter >> >> -----Original Message----- >> From: Lincoln Stein [mailto:lstein@cshl.edu] >> Sent: Tuesday, November 02, 2004 5:48 PM >> To: Covitz, Peter (NIH/NCI); 'bioperl-l@bioperl.org' >> Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl >> >> >> I think that caBIOperl and BioPerl will both need some work in order >> to make >> >> the APIs coherent with each other. I am not eager to see caBIOperl >> just >> dropped in without a more thorough integration. Why not just release >> caBIOPerl onto CPAN? >> >> Lincoln >> >> On Tuesday 02 November 2004 03:13 pm, Covitz, Peter (NIH/NCI) wrote: >>> Ewan, >>> >>> I thought jump in and pick up this thread. 
I understand and agree >>> with >>> your point about needing a 'bridge' between caBIO classes and >>> equivalent >>> existing bioperl classes. Your suggestion on how to go about >>> implementing >>> such a bridge was helpful, thanks. >>> >>> Beyond that, I had been thinking that it might be useful to >>> contribute the >>> entire caBIOperl module to bioperl and make it part of the bioperl >>> core >>> package. caBIOperl is really just an object-oriented query interface >>> to >>> caBIO data servers, so I naively thought it might fit nicely under >>> Bio::DB::Query, perhaps Bio::DB::Query::caBIO ?? >>> >>> Of course people can use caBIOperl without it being part of bioperl. >>> However, there are a some classes and subject areas in caBIO that are >>> not >>> in bioperl, so we thought it might be a useful extension to bioperl >> itself. >>> In the next major caBIOperl release (~March 2005) we are going to >>> include >>> a full implementation of the MAGE-OM microarray data standard as part >>> of >>> the caBIOperl API, so that might be among the subject areas of >>> interest to >>> the bioperl community. >>> >>> I'd be interested to hear whether you and others think there might be >> value >>> in incorporating caBIOperl itself into bioperl, or if you'd rather >>> just >>> consider incorporating the 'bridge' module. >>> >>> Regards, >>> >>> Peter Covitz >>> >>> -----Original Message----- >>> From: Ewan Birney [mailto:birney@ebi.ac.uk] >>> Sent: Tuesday, October 12, 2004 4:07 AM >>> To: Jiang, Shan (NIH/NCI) >>> Cc: bioperl-l@bioperl.org >>> Subject: Re: Integrating caBIOperl with BIOperl >>> >>> On Mon, 11 Oct 2004, Jiang, Shan (NIH/NCI) wrote: >>>> Hi Ewan, >>>> >>>> I would like to introduce myself. I am a colleage of Gene Levinson at >> the >>>> National Cancer Institue in the US. I am the original developer of >>>> caBIOperl, which Gene presented at BOSC '04. I believe Gene talked to >> you >>>> quite extensively during the meeting as well. 
(Gene asked me to say >>>> hi!) >>>> >>>> Currently, I am undertaking the task of integrating caBIOperl with >>>> BIOperl.Gene indicated that you would be a great source to talk to. >>>> I am >>> >>> in >>> >>>> the process of learning BIOperl before deciding how to proceed. So I >>>> would much appreciate your help in learning BIOperl as well as >>>> looking >>>> into possible ways of integrating caBIOperl with BIOperl. >>> >>> Great - I'm cc'ing this message to the main bioperl list to check I >>> give >>> you the best advice! >>> >>>> Let me start asking some questions to start the ball rolling. >>>> >>>> 1. Has similar kinds of integration work been done before? If so is >> there >>> >>> a >>> >>>> general recommended approach? >>> >>> The recommendation is definitely to have an caBIOperl "bridge" to >>> Bioperl >>> objects. The main ones you want to have are Bio::SeqI, >>> Bio::DB::RandomAccessI and Bio::AnnotationCollectionI and >>> Bio::SeqFeatureI >>> >>> The "I" means interface (a bit like Java) >>> >>> In each case you would have wrapper classes that has-a caBIOPerl >>> object >>> and is-a Bioperl object, for example, imagining the caBIOPerl sequence >>> object has methods "human_readable_name" and "sequence_as_string" (of >>> course, they might have something completely different...) >>> >>> package Bio::caBIOBridge::Seq; >>> >>> @ISA = qw( Bio::SeqI ); >>> >>> ... >>> ... >>> >>> # Bio::SeqI isa Bio::PrimarySeqI, and needs to implement >>> # display_id. this should give back the human readable name >>> sub display_id { >>> my $obj = shift; >>> >>> # the caBIOPerl method is "human_readable_name" >>> return $obj->{'_cabioperl_object'}->human_readable_name() >>> } >>> >>> # Bio::SeqI needs to implement seq >>> sub seq { >>> return $obj->{'_cabioperl_object'}->sequence_as_string() >>> } >>> >>> >>> etc etc >>> >>> >>> This is, BTW, something I am planning to do with Ensembl as - make an >>> Ensembl-Bioperl bridge. >>> >>>> 2. 
Do you have a repository where people can just "donate" their code >>> >>> into? >>> >>> I would suggest that the caBIO-Bioperl bridge was its own cvs module >>> and >>> donated into CPAN. You could run the cvs module at Bioperl.org or do >>> it in >>> your own shop - entirely up to you. >>> >>>> 3. caBIOperl has its own object model, if the end vision is to >>>> integrate >>>> this model with BIOperl, how should I proceed? >>> >>> see above >>> >>>> 4. Can I get access to the CVS repository? >>> >>> You shouldn't need access to the bioperl cvs repository to come up >>> with >>> some working code - if you want to have the caBio-Bioperl bridge >>> repository hosted at bioperl.org that's feasible, but probably >>> building >>> some proof-of-concept classes first off would be great. >>> >>> >>> A great first step would be if someone could write a script like: >>> >>> >>> use Bio::caBIOBridge::DBAccess; >>> use Bio::SeqIO; >>> >>> # default to well known caBio server >>> $db = Bio::caBIOBridge::DBAccess->new(); >>> >>> $ca_wrapped_seq = $db->get_Seq_by_id('some_id'); >>> >>> # $ca_wrapped_seq is Bio::SeqI object but is actually a thin wrapper >>> over >>> # caBIO objects >>> >>> # Bio::SeqIO is a Bioperl object writer that works with Bio::SeqI >>> # compliant objects >>> $seqout = Bio::SeqIO->new( -format => 'EMBL'); >>> >>> # Here we see the bridge in action! >>> $seqout->write_seq($ca_wrapped_seq); >>> >>>> I am not sure how familiar you are with caBIOperl. So if you have any >>>> question, please do not hesistate to ask me. 
>>>> >>>> Regards, >>>> Shan Jiang >>>> (Contractor) >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Lincoln Stein >> lstein@cshl.edu >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From zg105 at york.ac.uk Thu Nov 4 10:27:32 2004 From: zg105 at york.ac.uk (Zara Ghazoui) Date: Thu Nov 4 10:25:47 2004 Subject: [Bioperl-l] comparing phylogenies. Message-ID: <418A4A64.2090008@york.ac.uk> Hi Jason, I am not sure if the method below would work in my case for comparing phylogenies. So - Make a list of node names in Tree A which into a clade with 70% support (based on the code I posted before). - Then find these nodes in Tree B (find_node method). - Find their LCA (get_lca method). - Then walk back down from the LCA (an internal node) and get all the tips (get_all_Descendents + the grep code posted below). - Check if there are any other taxa in the clade thus violating your requirement of identical clades. Post your code that does parts of what you want and we can help point. Please post to the list so other people can learn from it too. 

The reason is that when I get all the descendents of the LCA in Tree B
for a particular node, I will just get all the taxa, provided that this
particular node is an edge. Therefore the taxa would be the same, since
I have used the same species to build my 2 single gene phylogenies
(TreeA and TreeB).

So I thought that instead of looking up all the nodes from Tree A in
Tree B, I'll only look up the leafnodes. Also, instead of going all the
way down to the LCA in Tree B, maybe I can just compare the descendents
of the parents of that particular leafnode in Tree A and Tree B.

However, I think this will only work for comparing the groupings that
form the leaves of my tree. So is there any way of doing this process
recursively, starting from the leaves? I.e. find leafnode X in Tree B,
look up its sister groups (the descendents of its parent) and compare
them with the sister groups of leafnode X in Tree A. If they are
similar, move up Tree B doing this same process all the way to the LCA
in Tree B, but if they are different then break, saying these groupings
are different.

Thanks a lot for your help in advance.
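
The comparison step described above (are the taxa under leaf X's parent
the same in both trees?) comes down to comparing two sets of leaf names.
A minimal sketch in plain Perl follows; it does not use Bio::TreeIO,
same_taxa() is a hypothetical helper, and the leaf names are made up. In
a real script the two lists would presumably be built from the ->id of
the leaf descendents of each parent node.

```perl
use strict;
use warnings;

# Hypothetical helper: treat two lists of leaf names as sets and report
# whether they contain exactly the same names, ignoring order.
sub same_taxa {
    my ($names_a, $names_b) = @_;
    my %in_a = map { $_ => 1 } @$names_a;
    my %in_b = map { $_ => 1 } @$names_b;
    # Different numbers of distinct names cannot be the same set.
    return 0 unless keys(%in_a) == keys(%in_b);
    for my $name (keys %in_a) {
        return 0 unless $in_b{$name};
    }
    return 1;
}

# Made-up leaf names standing in for the tips under leaf X's parent.
my @sisters_in_a = qw(Agro Sino Rleg);
my @sisters_in_b = qw(Rleg Agro Sino);
my @other_in_b   = qw(Rleg Agro Brad);

my $first  = same_taxa(\@sisters_in_a, \@sisters_in_b) ? 'same' : 'different';
my $second = same_taxa(\@sisters_in_a, \@other_in_b)   ? 'same' : 'different';
print "first pair: $first\n";
print "second pair: $second\n";
```

Repeating this check at the parent, then the grandparent, and so on,
and stopping at the first mismatch, would give the "move up the tree
until the groupings differ" behaviour asked about.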

Zara

#!/usr/bin/perl
use Bio::TreeIO;
use Array::Compare ;

my $inputA = new Bio::TreeIO(-file => "/biol/people/zg105/ProSeqFasta/AgroBradRpalusMesoSinoBrumelCauloRleg_stuff/aligned_files/test/1.phb",-format => "newick");
my $treeA = $inputA->next_tree;
my $inputB = new Bio::TreeIO (-file => "/biol/people/zg105/ProSeqFasta/AgroBradRpalusMesoSinoBrumelCauloRleg_stuff/aligned_files/test/10.phb",-format => "newick");
my $treeB = $inputB ->next_tree;
my $array_comp = Array::Compare->new;

my @TreeA_leaves = grep {$_-> is_Leaf() && $_->ancestor->bootstrap > 700 } $treeA->get_nodes;
for my $TreeAleaf (@TreeA_leaves) {
    my $TreeAleaf_parent = $TreeAleaf ->ancestor;
    my @TreeA_parent_descendents = $TreeAleaf_parent ->get_all_Descendents();
    my $TreeAleafname = $TreeAleaf ->id;
    print ("Node in TreeA is $TreeAleafname \n");
    for my $TreeA_parent_descendent (@TreeA_parent_descendents){
        $TreeA_DescendentName = $TreeA_parent_descendent ->id ;
        push (@TreeA_DescendentNames ,$TreeA_DescendentName);
        print ("Descendents of this node in TreeA are $TreeA_DescendentName\n");
    }
    my @TreeB_leaves = grep {$_-> is_Leaf() && $_->ancestor->bootstrap > 700 } $treeB->get_nodes;
    for my $TreeBleaf (@TreeB_leaves){
        if ($TreeBleaf ->id eq $TreeAleafname ){
            my $TreeBleaf_parent = $TreeBleaf ->ancestor;
            my @TreeB_parent_descendents = $TreeBleaf_parent->get_all_Descendents();
            for my $TreeB_parent_descendent (@TreeB_parent_descendents){
                $TreeB_DescendentName = $TreeB_parent_descendent ->id ;
                push (@TreeB_DescendentNames ,$TreeA_DescendentName);
                print ("Descendents of this node in TreeB are $TreeB_DescendentName \n");
                unless( grep{ $_->id eq $TreeB_DescendentName} @DescendentANames) {
                    print ("The trees are different\n");
                }
            }
        }
    }
}

Well I'm only going to help if you try some of this out on your own
first and post what you've tried.
$tree->get_lca(-nodes => \@nodes) will find the least common ancestor ($node) = $tree->find_node($name) will find a particular node based on its name So - Make a list of node names in Tree A which into a clade with 70% support (based on the code I posted before). - Then find these nodes in Tree B (find_node method). - Find their LCA (get_lca method). - Then walk back down from the LCA (an internal node) and get all the tips (get_all_Descendents + the grep code posted below). - Check if there are any other taxa in the clade thus violating your requirement of identical clades. Post your code that does parts of what you want and we can help point. Please post to the list so other people can learn from it too. -jason From covitzp at mail.nih.gov Thu Nov 4 12:02:54 2004 From: covitzp at mail.nih.gov (Covitz, Peter (NIH/NCI)) Date: Thu Nov 4 12:01:30 2004 Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl Message-ID: <27C204BD76CBC142BA1AE46D62A8548E0F69FD90@nihexchange9.nih.gov> Steve, What we do is take the MAGE-OM UML model (as an .XMI file) and feed it to a template-driven code generator. Up until now we have generated only our Java code that way, using templates that target the Java language. caBIOperl has up until now been hand-coded, and does not yet include the MAGE package. In the next major release (~March 2005) we expect to start autogenerating caBIOperl as well. That will enable us to include our MAGE API implementation in Perl as well as Java, since it is a fairly complicated model that would be difficult to hand-code. The caBIO APIs (including caBIOperl) provide standardized, object-oriented query interfaces to our model-driven data management systems. That means you will be able to retrieve MAGE-OM objects based on parameterized queries against their attributes. This API is not, however, a direct interface to MAGEstk. It is worth pointing out that we are using the MAGE-OM for a different purpose than it was originally designed for. 
Its original purpose was to provide a foundation for creating a MAGE-ML XML generator. We, however, are using it as a direct data service API specification for our microarray database. It isn't a perfect fit, since the MAGE-OM is quite complex, and therefore the API is complex. But we've found some value in having the API based upon the standard, since it fits well with our adoption of the model-driven architecture paradigm, and plan to support it going forward. Hope this explains it adequately, please let me know if not. Regards, Peter > -----Original Message----- > From: Steve Chervitz Trutane [mailto:steve@trutane.net] > Sent: Wednesday, November 03, 2004 8:35 PM > To: Hilmar Lapp; Covitz, Peter (NIH/NCI) > Cc: 'bioperl-l@bioperl.org'; Lincoln Stein > Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl > > > Note: > * perl MAGEstk - any tie in? > * Always thought it would be nice to have a bridge into bioperl from > MAGEstk. Would caBIO perl act as this bridge? > * Are the caBio perl MAGE objects autogenerated from MAGE-OM? > > Steve > > > From: Hilmar Lapp > > Date: Wed, 3 Nov 2004 09:35:59 -0800 > > To: "Covitz, Peter (NIH/NCI)" > > Cc: "'bioperl-l@bioperl.org'" , > "'lstein@cshl.edu'" > > > > Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl > > > > I very much agree with Lincoln's comment. One of the more frequent > > comments we have gotten is that expecially to newbies the > plethora of > > modules in Bioperl and the apparent diversity of its APIs > are already > > confusing. A bridge that binds the caBIOperl API to the existing > > Bioperl object model would be a great addition though. > > > > -hilmar > > > > On Wednesday, November 3, 2004, at 06:47 AM, Covitz, Peter > (NIH/NCI) > > wrote: > > > >> Agreed, getting it into CPAN is the first order of business. > >> > >> caBIOperl is itself a wrapper around a lower-level > SOAP-XML API. That > >> gives > >> us some flexibility on how we present the visible API interfaces. 
> >> Once we > >> get it into CPAN, I'd be interested in continuing the discussion of > >> what the > >> appropriate bridging and interface strategy would be to > make it more > >> suitable for use with bioperl. > >> > >> Thanks for the feedback! > >> > >> Regards, > >> > >> Peter > >> > >> -----Original Message----- > >> From: Lincoln Stein [mailto:lstein@cshl.edu] > >> Sent: Tuesday, November 02, 2004 5:48 PM > >> To: Covitz, Peter (NIH/NCI); 'bioperl-l@bioperl.org' > >> Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl > >> > >> > >> I think that caBIOperl and BioPerl will both need some > work in order > >> to make > >> > >> the APIs coherent with each other. I am not eager to see caBIOperl > >> just > >> dropped in without a more thorough integration. Why not > just release > >> caBIOPerl onto CPAN? > >> > >> Lincoln > >> > >> On Tuesday 02 November 2004 03:13 pm, Covitz, Peter > (NIH/NCI) wrote: > >>> Ewan, > >>> > >>> I thought jump in and pick up this thread. I understand and agree > >>> with > >>> your point about needing a 'bridge' between caBIO classes and > >>> equivalent > >>> existing bioperl classes. Your suggestion on how to go about > >>> implementing > >>> such a bridge was helpful, thanks. > >>> > >>> Beyond that, I had been thinking that it might be useful to > >>> contribute the > >>> entire caBIOperl module to bioperl and make it part of the bioperl > >>> core > >>> package. caBIOperl is really just an object-oriented > query interface > >>> to > >>> caBIO data servers, so I naively thought it might fit nicely under > >>> Bio::DB::Query, perhaps Bio::DB::Query::caBIO ?? > >>> > >>> Of course people can use caBIOperl without it being part > of bioperl. > >>> However, there are a some classes and subject areas in > caBIO that are > >>> not > >>> in bioperl, so we thought it might be a useful extension > to bioperl > >> itself. 
> >>> In the next major caBIOperl release (~March 2005) we are going to > >>> include > >>> a full implementation of the MAGE-OM microarray data > standard as part > >>> of > >>> the caBIOperl API, so that might be among the subject areas of > >>> interest to > >>> the bioperl community. > >>> > >>> I'd be interested to hear whether you and others think > there might be > >> value > >>> in incorporating caBIOperl itself into bioperl, or if you'd rather > >>> just > >>> consider incorporating the 'bridge' module. > >>> > >>> Regards, > >>> > >>> Peter Covitz > >>> > >>> -----Original Message----- > >>> From: Ewan Birney [mailto:birney@ebi.ac.uk] > >>> Sent: Tuesday, October 12, 2004 4:07 AM > >>> To: Jiang, Shan (NIH/NCI) > >>> Cc: bioperl-l@bioperl.org > >>> Subject: Re: Integrating caBIOperl with BIOperl > >>> > >>> On Mon, 11 Oct 2004, Jiang, Shan (NIH/NCI) wrote: > >>>> Hi Ewan, > >>>> > >>>> I would like to introduce myself. I am a colleage of > Gene Levinson at > >> the > >>>> National Cancer Institue in the US. I am the original > developer of > >>>> caBIOperl, which Gene presented at BOSC '04. I believe > Gene talked to > >> you > >>>> quite extensively during the meeting as well. (Gene > asked me to say > >>>> hi!) > >>>> > >>>> Currently, I am undertaking the task of integrating > caBIOperl with > >>>> BIOperl.Gene indicated that you would be a great source > to talk to. > >>>> I am > >>> > >>> in > >>> > >>>> the process of learning BIOperl before deciding how to > proceed. So I > >>>> would much appreciate your help in learning BIOperl as well as > >>>> looking > >>>> into possible ways of integrating caBIOperl with BIOperl. > >>> > >>> Great - I'm cc'ing this message to the main bioperl list > to check I > >>> give > >>> you the best advice! > >>> > >>>> Let me start asking some questions to start the ball rolling. > >>>> > >>>> 1. Has similar kinds of integration work been done > before? 
If so is > >> there > >>> > >>> a > >>> > >>>> general recommended approach? > >>> > >>> The recommendation is definitely to have an caBIOperl "bridge" to > >>> Bioperl > >>> objects. The main ones you want to have are Bio::SeqI, > >>> Bio::DB::RandomAccessI and Bio::AnnotationCollectionI and > >>> Bio::SeqFeatureI > >>> > >>> The "I" means interface (a bit like Java) > >>> > >>> In each case you would have wrapper classes that has-a caBIOPerl > >>> object > >>> and is-a Bioperl object, for example, imagining the > caBIOPerl sequence > >>> object has methods "human_readable_name" and > "sequence_as_string" (of > >>> course, they might have something completely different...) > >>> > >>> package Bio::caBIOBridge::Seq; > >>> > >>> @ISA = qw( Bio::SeqI ); > >>> > >>> ... > >>> ... > >>> > >>> # Bio::SeqI isa Bio::PrimarySeqI, and needs to implement > >>> # display_id. this should give back the human readable name > >>> sub display_id { > >>> my $obj = shift; > >>> > >>> # the caBIOPerl method is "human_readable_name" > >>> return $obj->{'_cabioperl_object'}->human_readable_name() > >>> } > >>> > >>> # Bio::SeqI needs to implement seq > >>> sub seq { > >>> return $obj->{'_cabioperl_object'}->sequence_as_string() > >>> } > >>> > >>> > >>> etc etc > >>> > >>> > >>> This is, BTW, something I am planning to do with Ensembl > as - make an > >>> Ensembl-Bioperl bridge. > >>> > >>>> 2. Do you have a repository where people can just > "donate" their code > >>> > >>> into? > >>> > >>> I would suggest that the caBIO-Bioperl bridge was its own > cvs module > >>> and > >>> donated into CPAN. You could run the cvs module at > Bioperl.org or do > >>> it in > >>> your own shop - entirely up to you. > >>> > >>>> 3. caBIOperl has its own object model, if the end vision is to > >>>> integrate > >>>> this model with BIOperl, how should I proceed? > >>> > >>> see above > >>> > >>>> 4. Can I get access to the CVS repository? 
> >>> > >>> You shouldn't need access to the bioperl cvs repository to come up > >>> with > >>> some working code - if you want to have the caBio-Bioperl bridge > >>> repository hosted at bioperl.org that's feasible, but probably > >>> building > >>> some proof-of-concept classes first off would be great. > >>> > >>> > >>> A great first step would be if someone could write a script like: > >>> > >>> > >>> use Bio::caBIOBridge::DBAccess; > >>> use Bio::SeqIO; > >>> > >>> # default to well known caBio server > >>> $db = Bio::caBIOBridge::DBAccess->new(); > >>> > >>> $ca_wrapped_seq = $db->get_Seq_by_id('some_id'); > >>> > >>> # $ca_wrapped_seq is Bio::SeqI object but is actually a > thin wrapper > >>> over > >>> # caBIO objects > >>> > >>> # Bio::SeqIO is a Bioperl object writer that works with Bio::SeqI > >>> # compliant objects > >>> $seqout = Bio::SeqIO->new( -format => 'EMBL'); > >>> > >>> # Here we see the bridge in action! > >>> $seqout->write_seq($ca_wrapped_seq); > >>> > >>>> I am not sure how familiar you are with caBIOperl. So if > you have any > >>>> question, please do not hesistate to ask me. > >>>> > >>>> Regards, > >>>> Shan Jiang > >>>> (Contractor) > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Lincoln Stein > >> lstein@cshl.edu > >> Cold Spring Harbor Laboratory > >> 1 Bungtown Road > >> Cold Spring Harbor, NY 11724 > >> (516) 367-8380 (voice) > >> (516) 367-8389 (fax) > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > -- > > ------------------------------------------------------------- > > Hilmar Lapp email: lapp at gnf.org > > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >

From daniel.lang at biologie.uni-freiburg.de Thu Nov 4 12:09:32 2004
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Thu Nov 4 12:07:59 2004
Subject: [Bioperl-l] Length of ID in EMBL sequence entries
In-Reply-To:
References:
Message-ID: <418A624C.5010005@biologie.uni-freiburg.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm wondering why the entryname length should be limited at all...
I just wrote an email to the EBI support. They should be able to solve this...
But I fear that the standard is like "%-11.10s".
I've checked parts of the latest EMBL distribution:

perl -e 'while (<>) {print length $1 ,"\n" if /^ID\s+(\S+\s+)\S+/ && length $1 > 11;}' cum_2.dat

- --> none of the entries was above 11

Since I have accessions >11 chars, I have to maintain my own version of embl.pm :(
I'll post the answer from ebi.

Cheers
Daniel

simon andrews (BI) wrote:
|
|>-----Original Message-----
|>From: Daniel Lang [mailto:daniel.lang@biologie.uni-freiburg.de]
|>Sent: 04 November 2004 08:38
|>To: Bioperl-List
|>Subject: [Bioperl-l] Length of ID in EMBL sequence entries
|>
|>
|>Hi,
|>I just stumbled over my sequence IDs getting trimmed to 10
|>characters when writing with Bio::SeqIO::embl. line 453:
|>$temp_line = sprintf("%-11.10sstandard; $mol; $div; %d BP.",
|>$seq->id(), $len);
|
|
| It's one of the fixes in:
|
| http://bugzilla.bioperl.org/show_bug.cgi?id=1618
|
| ..the problem was that the original format if given an 11 character ID
| code would allow that to run directly into the dataclass field on the ID
| line, which caused files generated in this way to not be recognised by a
| number of analysis programs.
There have been a couple of previous posts | on this list which were caused by this formatting issue. | | Looking back through the EMBL manual at: | | http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html | | I don't see any clear guidance about exact posisionings in the ID line. | All their examples show the dataclass (usually "standard") starting 11 | characters from the beginning of the entryname so the fix included kept | this distance, but limited the entryname to 10 chars to enforce a space | between fields. The specification seems to include a space between | entryname and dataclass, but that would limit us to 10 char entrynames. | | We could code this as: | | $temp_line = sprintf("%-12.11sstandard; $mol; $div; %d BP.", | | ..which would still allow an 11 char entryname, but would move the rest | of the line along (which still looks like it conforms to the | specification), but might this break other things? | | Does anyone have a definitive answer about the correct way to do this? | | Simon. | | | | | | | _______________________________________________ | Bioperl-l mailing list | Bioperl-l@portal.open-bio.org | http://portal.open-bio.org/mailman/listinfo/bioperl-l - -- Daniel Lang University of Freiburg, Plant Biotechnology Sonnenstr. 5, D-79104 Freiburg phone: +49 761 203 6988 homepage: http://www.plant-biotech.net/ e-mail: daniel.lang@biologie.uni-freiburg.de ################################################# My software never has bugs. It just develops random features. 
#################################################
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBimJMmJnbCpJAG3ARAvh5AJ9Zkmt2zP5AJuvgVnoQoQ9yyLEaJgCfZ2kX
vWp6xJ4c+Kua9x8z5G7jkiU=
=jY43
-----END PGP SIGNATURE-----

From lmateiu at ualberta.ca Thu Nov 4 23:12:40 2004
From: lmateiu at ualberta.ca (Ligia Mateiu)
Date: Thu Nov 4 23:11:59 2004
Subject: [Bioperl-l] Bio::DB::Query::GenBank
Message-ID: <1099627959.6085.3.camel@Serenity>

Hi all,

I used a query for which there are >5000 hits in GenBank, but my code retrieved just the very first 500. Any idea why?

Thanks a lot,
Mona

From daniel.lang at biologie.uni-freiburg.de Fri Nov 5 03:41:25 2004
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Nov 5 03:39:50 2004
Subject: [Bioperl-l] Length of ID in EMBL sequence entries
Message-ID: <418B3CB5.5070108@biologie.uni-freiburg.de>

Here is the answer from the EBI helpdesk.
Seems like we could loosen our restrictions for the ID line:)

I propose:

453c453
< $temp_line = sprintf("%-11.10sstandard; $mol; $div; %d BP.", $seq->id(), $len);
---
> $temp_line = sprintf("%-10s standard; $mol; $div; %d BP.", $seq->id(), $len);

Cheers,
Daniel

-------- Original Message --------
Subject: Re: EBI HELP: EMBL (Daniel.Lang@biologie.uni-freiburg.de) (rls) (SUP#213519)
Date: Fri, 5 Nov 2004 01:52:17 GMT
From: support@ebi.ac.uk
To: Daniel.Lang@biologie.uni-freiburg.de

Hi,

The ID line description can be found in section 3.4.1 of the EMBL user manual which is at:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html

The items in that line are not bound by column positions at the moment. Identifiers are not limited to 10 characters. Note that in the WGS data we already have 12 character ID's. It has not been decided what the policy will be in future for cases where ID or basepair number become so large the ID line will exceed its present width.
However, wrapping may be allowed as in the case of AC lines.

Hope this helps,
R:)

>Date: Thu, 4 Nov 2004 16:43:41 GMT
>From: w3nobody@ebi.ac.uk
>Reply-to: Daniel.Lang@biologie.uni-freiburg.de
>To: support@ebi.ac.uk
>Subject: EBI HELP: EMBL (Daniel.Lang@biologie.uni-freiburg.de)
>
> EMAIL: Daniel.Lang@biologie.uni-freiburg.de
> QUERY TYPE: EBI HELP: EMBL (Daniel.Lang@biologie.uni-freiburg.de)
> REFERER URL: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
> IP: 132.230.186.106
> BROWSER: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20040913
> Firefox/0.10
> DATE: Thursday 4 November 2004, 16:43
>
> Hi,
> I've got a question concerning the ID line in an EMBL entry:
> Is there a length restriction for the entryname field in an EMBL entry?
> In the manual, I don't see any clear guidance about exact positionings in the
> ID line.
> All the examples show the dataclass (usually "standard") starting 11
> characters from the beginning of the entryname.
> On the bioperl-list we are wondering how strict the standard is about the length
> of the entryname and the positioning of the dataclass field.
> At the moment the Bioperl SeqIO::embl module keeps this distance, but limited
> the entryname to 10 chars to enforce a space between fields. The specification
> seems to include a space between entryname and dataclass, but that limits to 10
> char entrynames.
>
> Thanks in advance,
> Daniel
>
>
>
>
> #########################################################################

--
Daniel Lang
University of Freiburg, Plant Biotechnology
Sonnenstr. 5, D-79104 Freiburg
phone: +49 761 203 6988
homepage: http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de
#################################################
My software never has bugs. It just develops random features.
#################################################

From peter.robinson at charite.de Fri Nov 5 05:51:26 2004
From: peter.robinson at charite.de (Robinson, Peter)
Date: Fri Nov 5 05:49:52 2004
Subject: [Bioperl-l] Calculating distances from PDB files
Message-ID: <5F7CE35370B6CF429AA3CA960ECC2780ACEC61@EXCHANGE2.charite.de>

Dear bioperlers,

I couldn't seem to find an answer to this anywhere, but forgive me if I should have...

I would like to filter a set of PDB files for a given tetrapeptide and then calculate the distance between the C_alpha(1) and C_alpha(4) atoms. I am relatively new to this, but I have been able to extract the sequence from PDB files using Bio::Structure modules. When I look with Data::Dumper, I am unable to identify anything that looks like positional information for the residues.

Does anybody have an example script or any suggestions?

Thanks,
Peter

From simon.andrews at bbsrc.ac.uk Fri Nov 5 05:49:52 2004
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri Nov 5 05:50:15 2004
Subject: [Bioperl-l] Length of ID in EMBL sequence entries
Message-ID:

> -----Original Message-----
> From: Daniel Lang [mailto:daniel.lang@biologie.uni-freiburg.de]
> Sent: 05 November 2004 08:41
> To: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Length of ID in EMBL sequence entries
>
>
> Here is the answer from the EBI helpdesk.
> Seems like we could loosen our restrictions for the ID line:)
>
> I propose:
> < sprintf("%-11.10sstandard;
> > sprintf("%-10s standard;

Err, don't those two equate to the same thing?

If we're allowing 12 char IDs then that would be:

$temp_line = sprintf("%-13.12sstandard; $mol; $div; %d BP.", $seq->id(), $len);

But if we take what the EMBL said literally then we don't actually need to restrict the id length at all:

$temp_line = sprintf("%s standard; $mol; $div; %d BP.", $seq->id(), $len);

..but that would seem to be asking for trouble if someone were to pass it a really large ID.
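For comparison, here is a small sketch of how the four format strings from this thread behave. The helper name id_line_variants and the sample IDs are made up for illustration and are not part of Bio::SeqIO::embl; the real line also fills in $mol, $div and the sequence length. The sketch relies only on standard sprintf rules: a minimum width pads, and a precision on %s truncates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::embl): build the start of an
# ID line with each of the formats discussed in this thread.
sub id_line_variants {
    my ($id) = @_;
    return (
        current  => sprintf("%-11.10sstandard;", $id),  # pad to 11, truncate to 10
        proposed => sprintf("%-10s standard;",   $id),  # pad to 10, never truncate
        wider    => sprintf("%-13.12sstandard;", $id),  # pad to 13, truncate to 12
        free     => sprintf("%s standard;",      $id),  # no padding, no truncation
    );
}

# One ID that fits in 10 characters and one 12-character ID
for my $id ('AB000123', 'ABCDEFGHIJKL') {
    my %v = id_line_variants($id);
    printf "%-9s|%s|\n", $_, $v{$_} for qw(current proposed wider free);
}
```

With an ID of up to 10 characters the "current" and "proposed" formats give the same string. With a longer ID the first truncates to 10 characters, while the second keeps the whole ID and simply pushes the rest of the line along.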
I suppose the pragmatic limit would be to not allow the ID to be long enough to make the ID line longer than 80 chars, but that would require some extra logic because the sequence length field varies in width, and would impose a non-standard ID length limit.

Simon.

From rousse at ccr.jussieu.fr Fri Nov 5 07:53:35 2004
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Fri Nov 5 07:51:57 2004
Subject: [Bioperl-l] graphing trees
In-Reply-To: <626DB952-2DB9-11D9-BFC6-000393C44276@duke.edu>
References: <4188DF7D.6050606@ccr.jussieu.fr> <626DB952-2DB9-11D9-BFC6-000393C44276@duke.edu>
Message-ID: <418B77CF.7070809@ccr.jussieu.fr>

Jason Stajich wrote:
>
> On Nov 3, 2004, at 8:39 AM, Guillaume Rousse wrote:
>
>> Hello.
>>
>> I have large trees to graph, the same way as phylogenetic trees are
>> usually drawn, with edges length expressing distances between nodes.
>>
> Bio::TreeIO::svggraph generates SVG.

Seems the most interesting right now.

Actually, I currently build my tree by first computing a distance matrix from my raw data, passing it to the Algorithm-Cluster module, and building nodes from its output. So the most logical way would be to implement a new Bio::TreeIO::cluster module, constructing a Bio::Tree object from this output, and using Bio::TreeIO::svggraph to output my graph. Would such a piece of code be of interest to the bioperl maintainers?

Alternatively, if there was a way to construct the node list from the distance matrix directly in BioPerl, so as to bypass Algorithm-Cluster, I would also be interested. I am just afraid that with a 3000 x 3000 matrix, a pure-Perl implementation of the clustering could have performance issues.
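One possible route from clustering output to a Bio::Tree object, sketched under assumptions: write the merges out as Newick text and let Bio::TreeIO read that back (-format => 'newick'), rather than assembling nodes by hand. The [left, right, distance] merge layout and the merges_to_newick name below are hypothetical (they are not Algorithm::Cluster's documented return value), and the merge distance is treated as the height of the new node, so a branch length is the difference between two heights.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input layout (an assumption, not Algorithm::Cluster's
# documented output): one [left, right, distance] triple per merge.
# An index >= 0 names a leaf; -k names the k-th earlier merge (1-based).
sub merges_to_newick {
    my ($names, $merges) = @_;
    my (@subtree, @height);    # Newick text and height of each merged node

    for my $merge (@$merges) {
        my ($left, $right, $dist) = @$merge;
        my @parts;
        for my $child ($left, $right) {
            my ($text, $child_height) = $child >= 0
                ? ($names->[$child], 0)
                : ($subtree[-$child - 1], $height[-$child - 1]);
            # branch length = height of this merge minus height of the child
            push @parts, sprintf('%s:%g', $text, $dist - $child_height);
        }
        push @subtree, '(' . join(',', @parts) . ')';
        push @height,  $dist;
    }
    return $subtree[-1] . ';';    # the last merge is taken as the root
}

my @names  = qw(A B C);
my @merges = ([0, 1, 1], [-1, 2, 3]);    # A+B at height 1, then (A,B)+C at height 3
print merges_to_newick(\@names, \@merges), "\n";
```

Leaf names containing Newick punctuation (parentheses, commas, colons) would need quoting first, which this sketch leaves out.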
From nathanhaigh at ukonline.co.uk Fri Nov 5 08:25:41 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Nov 5 08:24:24 2004
Subject: [Bioperl-l] local blast
Message-ID:

I have a website set up to run local blasts on the server (Windows) using the latest CVS BioPerl, but I keep getting the following error, which seems to just stop Perl in its tracks:

ERROR: Subroutine new redefined at I:/Programming/Perl/Perl5.8.0/site/lib/Bio/Search/Result/GenericResult.pm line 162, <GEN0> line 1.

It seems that when Bio::SearchIO::IteratedSearchResultEventBuilder makes the following call (line 132):

Bio::Factory::ObjectFactory->new(
    -type => 'Bio::Search::Result::BlastResult',
    -interface => 'Bio::Search::Result::ResultI'));

it results in Bio::Root::Root evaluating (line 396) the following (where $load = Bio\Search\Result\BlastResult.pm):

require $load;

and then it just seems to stop, throwing the error shown above! It appears that Bio\Search\Result\GenericResult.pm was loaded the last time this subroutine was called. Could this be ActiveState Perl being overcautious when it is evaluated a second time around?

Does anyone have any ideas for solutions?

Thanks
Nathan
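A guess at the mechanism, not a confirmed diagnosis: Perl's require records loaded files in %INC under the exact path string it was given, so the same file reached under two spellings (for example Bio\Search\Result\GenericResult.pm and Bio/Search/Result/GenericResult.pm) would be compiled twice, and the second pass would report "Subroutine new redefined". The sketch below reproduces that effect with a throwaway module; Demo.pm and the temporary directory are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our $loads = 0;    # incremented by Demo.pm each time it is compiled and run

# Write a throwaway module (hypothetical, for this demonstration only)
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/Demo.pm" or die "cannot write $dir/Demo.pm: $!";
print {$fh} <<'EOM';
package Demo;
use warnings;
$main::loads++;
sub new { return bless {}, shift }
1;
EOM
close $fh or die "cannot close $dir/Demo.pm: $!";

# Collect warnings instead of printing them straight to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

require "$dir/Demo.pm";      # first spelling of the path
require "$dir/./Demo.pm";    # same file, different string, so a new %INC key

print "Demo.pm was compiled $loads time(s)\n";
print "warning: $_" for @warnings;
```

If this is what is happening, making sure every require of a module uses one consistent spelling of its path (forward slashes, as produced by use Some::Module) should stop the second compilation.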
From rousse at ccr.jussieu.fr Fri Nov 5 08:50:51 2004
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Fri Nov 5 08:49:12 2004
Subject: [Bioperl-l] graphing trees
In-Reply-To: <06448BDA-2F31-11D9-979B-000A95D7BA10@mail.nih.gov>
References: <4188DF7D.6050606@ccr.jussieu.fr> <626DB952-2DB9-11D9-BFC6-000393C44276@duke.edu> <418B77CF.7070809@ccr.jussieu.fr> <06448BDA-2F31-11D9-979B-000A95D7BA10@mail.nih.gov>
Message-ID: <418B853B.8050002@ccr.jussieu.fr>

Sean Davis wrote:
>
> On Nov 5, 2004, at 7:53 AM, Guillaume Rousse wrote:
>
>> Jason Stajich wrote:
>>
>>> On Nov 3, 2004, at 8:39 AM, Guillaume Rousse wrote:
>>>
>>>> Hello.
>>>>
>>>> I have large trees to graph, the same way as phylogenetic trees are
>>>> usually drawn, with edges length expressing distances between nodes.
>>>>
>>> Bio::TreeIO::svggraph generates SVG.
>>
>> Seems the most interesting right now.
>>
>> Actually, I currently build my tree by first computing a distance
>> matrix from my raw data, passing it to the Algorithm-Cluster module,
>> and building nodes from its output. So the most logical way would be to
>> implement a new Bio::TreeIO::cluster module, constructing a Bio::Tree
>> object from this output, and using Bio::TreeIO::svggraph to output my
>> graph. Would such piece of code eventually interest bioperl maintainers ?
>>
>> Alternatively, if there was a way to construct the nodes list from the
>> matrix distance directly in BioPerl, so as to bypass
>> Algorithm-Cluster, I could also get interested. I am just afraid that
>> with a 3000 x 3000 matrix, a pure-perl implementation of the
>> clustering could have performance issues.
>>
>
> Have you thought using R for the calculations/display? If you have that
> much data, it might be worthwhile working with it in a computation
> environment.

Is R usable from a Perl program, and not only interactively?
And this would only be a replacement for Algorithm-Cluster, which works quite well (it is a wrapper around a C library), that wouldn't get me rid of the task of building the Bio::Tree Object manually. I was just asking if a clustering algorithm was available from bioperl itself. From sdavis2 at mail.nih.gov Fri Nov 5 09:03:13 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri Nov 5 09:00:38 2004 Subject: [Bioperl-l] graphing trees In-Reply-To: <418B853B.8050002@ccr.jussieu.fr> References: <4188DF7D.6050606@ccr.jussieu.fr> <626DB952-2DB9-11D9-BFC6-000393C44276@duke.edu> <418B77CF.7070809@ccr.jussieu.fr> <06448BDA-2F31-11D9-979B-000A95D7BA10@mail.nih.gov> <418B853B.8050002@ccr.jussieu.fr> Message-ID: <6F34F4A1-2F33-11D9-979B-000A95D7BA10@mail.nih.gov> On Nov 5, 2004, at 8:50 AM, Guillaume Rousse wrote: > Sean Davis wrote: >> On Nov 5, 2004, at 7:53 AM, Guillaume Rousse wrote: >>> Jason Stajich wrote: >>> >>>> On Nov 3, 2004, at 8:39 AM, Guillaume Rousse wrote: >>>> >>>>> Hello. >>>>> >>>>> I have large trees to graph, the same way as phylogenetic trees >>>>> are usually drawn, with edges length expressing distances between >>>>> nodes. >>>>> >>>> Bio::TreeIO::svggraph generates SVG. >>> >>> Seems the most interesting right now. >>> >>> Actually, I currently build my tree by first computing a distance >>> matrix from my raw data, passing it to the Algorithm-Cluster module, >>> and building nodes from its ouput. So the most logical way would be >>> to implement a new Bio::TreeIO::cluster module, constructing a >>> Bio::Tree object from this output, and using Bio::TreeIO::svggraph >>> to output my graph. Would such piece of code eventually interest >>> bioperl maintainers ? >>> >>> Alternatively, if there was a way to construct the nodes list from >>> the matrix distance directly in BioPerl, so as to bypass >>> Algorithm-Cluster, I could also get interested. 
I am just effraid >>> that with a 3000 x 3000 matrix, a pure-perl implementation of the >>> clustering could have perfs issues. >>> >> Have you thought using R for the calculations/display? If you have >> that much data, it might be worthwhile working with it in a >> computation environment. > Is R usable from a perl program, and not in interactive way only ? And > this would only be a replacement for Algorithm-Cluster, which works > quite well (it is a wrapper around a C library), that wouldn't get me > rid of the task of building the Bio::Tree Object manually. I was just > asking if a clustering algorithm was available from bioperl itself. > As for what R can do, you can look at and do a search on the page for "phylo" or something like that.... http://cran.us.r-project.org/src/contrib/PACKAGES.html As for combining with perl, there is RSperl that is fairly complex (I haven't gotten it to build). There is also statistics-R (http://search.cpan.org/~gmpassos/Statistics-R-0.02/). Folks have used R with 2-way pipes. Finally, you can just use system calls to run R in batch mode (from perl, create a script file, run R->save results, read results). Let me say that I have not used R for phylogenetics of any kind, but I do use it for microarray data, etc. and find it a very nice addition to my own toolbox, even for perl-like data manipulation. Sean From barry.moore at genetics.utah.edu Fri Nov 5 11:44:43 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri Nov 5 11:43:06 2004 Subject: [Bioperl-l] Problems installing GD Message-ID: <418BADFB.3030406@genetics.utah.edu> I'm posting this to bioperl because GD is a dependency, and because I know that a lot of folks here use it. Hope that's not too much of a stretch. I'm trying to install the perl GD 2.16 library and make is failing. I have tried all of the suggestions in Linclon's README (except I haven't reinstalled perl). 
This does not appear to be a problem with the libgd version (I get the same error trying to install GD-1.19, which doesn't depend on external libraries) or with dynamic linking. The error I get is about a syntax error in /usr/include/stdlib.h. I've successfully installed GD on other systems - linux and windows - often with difficulty, and I haven't seen this particular type of error before. I've googled various parts of the error message and searched and read perlmonks, comp.lang.perl.modules, the debian list etc. No similar errors out there that I can find.

I'm installing on an Intel box running Debian GNU/Linux 2.4 and Perl 5.6.1. I have installed the latest versions of the following C library dependencies:

zlib-1.2.1
libpng version 1.2.7
libgd version 2.0.33
jpeg-6b
freetype version 1.3.1

My system doesn't have rpm. I tried apt-get install GD, but it couldn't find GD (I probably need to set up apt-get). I tried CPAN shell and manual installation from source. The last two attempts give me the same error. I installed all the above dependencies manually in the order listed; they all installed fine with no errors, and those that had tests passed them. I su to root, move to the GD-2.16 directory and run perl Makefile.PL. I then run make and get the error:

In file included from /usr/lib/perl/5.6.1/CORE/perl.h:493,
                 from GD.xs:5:
/usr/include/stdlib.h:158: syntax error before `long'
make: *** [GD.o] Error 1

I tried installing GD-1.19 just to get something going, but I get the same error. The complete output from the perl Makefile.PL and make commands is listed below. Any suggestions? Thanks.

Barry

--------------------------------------------------------------------------------------------------------------
bmoore@westwater:~/GD-2.16$ su root
Password:
westwater:/home/bmoore/GD-2.16# perl Makefile.PL
NOTICE: This module requires libgd 2.0.12 or higher. it will NOT work with earlier versions.
See www.cpan.org for versions of GD that are compatible with earlier versions of libgd. If you are using Math::Trig 1.01 or lower, it has a bug that causes a "prerequisite not found" warning to be issued. You may safely ignore this warning. Type perl Makefile.PL -h for command-line option summary Configuring for libgd version 2.0.33. Included Features: GD_XPM GD_PNG GD_GIF GD library used from: /usr/local If you experience compile problems, please check the @INC, @LIBPATH and @LIBS arrays defined in Makefile.PL and manually adjust, if necessary. Checking if your kit is complete... Looks good Writing Makefile for GD westwater:/home/bmoore/GD-2.16# make cp qd.pl blib/lib/qd.pl cp GD.pm blib/lib/GD.pm AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD) cp GD/Polyline.pm blib/lib/GD/Polyline.pm /usr/bin/perl /usr/share/perl/5.6.1/ExtUtils/xsubpp -typemap /usr/share/perl/5.6.1/ExtUtils/typemap -typemap typemap GD.xs > GD.xsc && mv GD.xsc GD.c cc -c -I/usr/local/include -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -DVERSION=\"2.16\" -DXS_VERSION=\"2.16\" -fPIC "-I/usr/lib/perl/5.6.1/CORE" -DHAVE_XPM -DHAVE_GIF GD.c In file included from /usr/lib/perl/5.6.1/CORE/perl.h:493, from GD.xs:5: /usr/include/stdlib.h:158: syntax error before `long' make: *** [GD.o] Error 1 westwater:/home/bmoore/GD-2.16# -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From reche at research.dfci.harvard.edu Fri Nov 5 11:50:09 2004 From: reche at research.dfci.harvard.edu (Pedro Antonio Reche) Date: Fri Nov 5 11:48:28 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> References: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> Message-ID: Hi, I am interested in getting all the protein sequences matching a specific GO term and I wonder if someone would know how to do this. Thanks in advance for any help. 
Cheers pdro From dhoworth at mrc-lmb.cam.ac.uk Fri Nov 5 12:18:52 2004 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Fri Nov 5 12:17:30 2004 Subject: [Bioperl-l] Problems installing GD In-Reply-To: <418BADFB.3030406@genetics.utah.edu> References: <418BADFB.3030406@genetics.utah.edu> Message-ID: <418BB5FC.4060605@mrc-lmb.cam.ac.uk> I think you have version problems in the underlying headers and libraries. Check carefully that all the packages are correctly installed and consistent. synaptic is quite good for that. Barry Moore wrote: > I'm installing on an Intel box running Debian GNU/Linux 2.4 Do you mean the kernel is a 2.4 kernel? What release of Debian are you running? woody/stable, sarge/testing, sid? > libgd version 2.0.33 Where did you get this from? It doesn't appear to be a standard Debian package. I believe you need a libgd2 package, not libgd, and the versions for that are here: http://packages.debian.org/cgi-bin/search_packages.pl?keywords=libgd2&searchon=names&subword=1&version=all&release=all which don't match what you have. I use woody and ended up patching it all by hand outside of apt. A real nightmare. Cheers, Dave From skirov at utk.edu Fri Nov 5 12:27:35 2004 From: skirov at utk.edu (Stefan Kirov) Date: Fri Nov 5 12:26:15 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: References: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> Message-ID: <418BB807.6060906@utk.edu> What organism? You can use either EnsMart (for example for human there is a table called hsapiens_gene_ensembl__xref_go__dm) or you can use GeneKeyDB if you install it locally (genereg.ornl.gov/gkdb), there is a table called ll_go, which you can search for the gene identifier(locuslink), associated with a particular GO term and then get the protein accession from another table (something like : "select r.np_accn from ll_go g, ll_refseq_nm r where r.ll_id=g.ll_id and g.go_term=?") and fetch the seq from RefSeq, etc. 
Both Ensembl and GeneKeyDB are restricted to certain eukaryotes. So it all depends on what kind of organisms you are expected to work with. Stefan Pedro Antonio Reche wrote: > Hi, > I am interested in getting all the protein sequences matching a > specific GO term and I wonder if someone would know how to do this. > Thanks in advance for any help. > Cheers > > pdro > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From facemann at yahoo.com Fri Nov 5 13:57:59 2004 From: facemann at yahoo.com (Andy Hammer) Date: Fri Nov 5 13:56:19 2004 Subject: [Bioperl-l] RE: Problems installing GD Solved. Message-ID: <20041105185759.35537.qmail@web13426.mail.yahoo.com> We solved the problem. Well, I guess there was nothing to solve. Just didn't realize apt-get libgd-perl was the module we needed. It loaded without a hitch! Thanks for the response! __________________________________ Do you Yahoo!? Check out the new Yahoo! Front Page. www.yahoo.com From lstein at cshl.edu Fri Nov 5 14:32:24 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Nov 5 14:31:44 2004 Subject: [Bioperl-l] Problems installing GD In-Reply-To: <418BADFB.3030406@genetics.utah.edu> References: <418BADFB.3030406@genetics.utah.edu> Message-ID: <200411051432.24032.lstein@cshl.edu> Sadly this error is coming from your C compiler and/or its includes, and is not really related to GD. 
This is a very low level error and implies that you must: 1) reinstall GCC 2) reinstall libc or 3) reinstall glibc Not very attractive, I'm afraid, and any of these procedures has a chance of royally screwing up your system. Lincoln On Friday 05 November 2004 11:44 am, Barry Moore wrote: > I'm posting this to bioperl because GD is a dependency, and because > I know that a lot of folks here use it. Hope that's not too much > of a stretch. > > I'm trying to install the perl GD 2.16 library and make is failing. > I have tried all of the suggestions in Lincoln's README (except I > haven't reinstalled perl). This does not appear to be a problem > with libgd version (I get the same error trying to install GD-1.19 > which doesn't depend on external libraries) or dynamic linking. > The error I get is about a syntax error in > /usr/include/stdlib.h. I've successfully installed GD on other > systems - linux and windows - often with difficulty and I haven't > seen this particular type of error before. I've googled various > parts of the error message and searched and read perlmonks, > comp.lang.perl.modules, the debian list etc. No similar errors out > there that I can find. > > I'm installing on an Intel box running Debian GNU/Linux 2.4 and > Perl 5.6.1. I have installed the latest versions of the following > C library dependencies: > > zlib-1.2.1 > libpng version 1.2.7 > libgd version 2.0.33 > jpeg-6b > freetype version 1.3.1 > > My system doesn't have rpm. I tried apt-get install GD, but it > couldn't find GD (I probably need to set up apt-get). I tried CPAN > shell and manual installation from source. The last two attempts > give me the same error. I installed all the above dependencies in > the order listed manually and they all installed fine with no > errors and those that had tests passed them. I su to root, move to > the GD-2.16 directory and run perl Makefile.PL.
I then run make > and get the error: > > In file included from /usr/lib/perl/5.6.1/CORE/perl.h:493, > from GD.xs:5: > /usr/include/stdlib.h:158: syntax error before `long' > make: *** [GD.o] Error 1 > > I tried installing GD-1.19 just to get something going, but I get > the same error. > > The complete output from the perl Makefile.PL and make commands is > listed below. Any suggestions? Thanks. > > > Barry > > ------------------------------------------------------------------- >------------------------------------------- > > bmoore@westwater:~/GD-2.16$ su root > Password: > westwater:/home/bmoore/GD-2.16# perl Makefile.PL > NOTICE: This module requires libgd 2.0.12 or higher. > it will NOT work with earlier versions. > See www.cpan.org for versions of GD that are compatible > with earlier versions of libgd. > > If you are using Math::Trig 1.01 or lower, it has a bug > that causes a "prerequisite not found" warning to be issued. You > may safely ignore this warning. > > Type perl Makefile.PL -h for command-line option summary > > Configuring for libgd version 2.0.33. > Included Features: GD_XPM GD_PNG GD_GIF > GD library used from: /usr/local > > If you experience compile problems, please check the @INC, @LIBPATH > and @LIBS > arrays defined in Makefile.PL and manually adjust, if necessary. > > Checking if your kit is complete...
> Looks good > Writing Makefile for GD > westwater:/home/bmoore/GD-2.16# make > cp qd.pl blib/lib/qd.pl > cp GD.pm blib/lib/GD.pm > AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD) > cp GD/Polyline.pm blib/lib/GD/Polyline.pm > /usr/bin/perl /usr/share/perl/5.6.1/ExtUtils/xsubpp -typemap > /usr/share/perl/5.6.1/ExtUtils/typemap -typemap typemap GD.xs > > GD.xsc && mv GD.xsc GD.c > cc -c -I/usr/local/include -DDEBIAN -fno-strict-aliasing > -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 > -DVERSION=\"2.16\" -DXS_VERSION=\"2.16\" -fPIC > "-I/usr/lib/perl/5.6.1/CORE" -DHAVE_XPM -DHAVE_GIF GD.c > In file included from /usr/lib/perl/5.6.1/CORE/perl.h:493, > from GD.xs:5: > /usr/include/stdlib.h:158: syntax error before `long' > make: *** [GD.o] Error 1 > westwater:/home/bmoore/GD-2.16# -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Fri Nov 5 16:31:47 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Nov 5 16:30:15 2004 Subject: [Bioperl-l] Problems installing GD In-Reply-To: <418BB5FC.4060605@mrc-lmb.cam.ac.uk> References: <418BADFB.3030406@genetics.utah.edu> <418BB5FC.4060605@mrc-lmb.cam.ac.uk> Message-ID: <200411051631.47173.lstein@cshl.edu> Hi Guys, For what it's worth, I compile my entire system from source and *NEVER* have version incompatibility problems. I've tried the various packaging systems, and *ALWAYS* run into glitches. The only problem is the initial boot, but I've gotten quite good at keying in hex from the serial port. Lincoln On Friday 05 November 2004 12:18 pm, Dave Howorth wrote: > I think you have version problems in the underlying headers and > libraries. Check carefully that all the packages are correctly > installed and consistent. synaptic is quite good for that. > > Barry Moore wrote: > > I'm installing on an Intel box running Debian GNU/Linux 2.4 > > Do you mean the kernel is a 2.4 kernel? > What release of Debian are you running? 
woody/stable, > sarge/testing, sid? > > > libgd version 2.0.33 > > Where did you get this from? It doesn't appear to be a standard > Debian package. I believe you need a libgd2 package, not libgd, and > the versions for that are here: > http://packages.debian.org/cgi-bin/search_packages.pl?keywords=libg >d2&searchon=names&subword=1&version=all&release=all which don't > match what you have. > > I use woody and ended up patching it all by hand outside of apt. A > real nightmare. > > Cheers, Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From reche at research.dfci.harvard.edu Fri Nov 5 17:25:30 2004 From: reche at research.dfci.harvard.edu (Pedro Antonio Reche) Date: Fri Nov 5 17:23:53 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: <418BB807.6060906@utk.edu> References: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> <418BB807.6060906@utk.edu> Message-ID: <99D4FA60-2F79-11D9-BD80-000393BC20D0@research.dfci.harvard.edu> Dear Stefan, thanks a lot for your e-mail. Actually, I am interested in getting all proteins from all organisms that are tagged with let say the go_process cell signaling. I will try the sites that you indicate to see if they can do the job. Do you know if Bioperl can also do this? Regards, pdro On Nov 5, 2004, at 12:27 PM, Stefan Kirov wrote: > What organism? 
You can use either EnsMart (for example for human there > is a table called hsapiens_gene_ensembl__xref_go__dm) or you can use > GeneKeyDB if you install it locally (genereg.ornl.gov/gkdb), there is > a table called ll_go, which you can search for the gene > identifier(locuslink), associated with a particular GO term and then > get the protein accession from another table (something like : > "select r.np_accn from ll_go g, ll_refseq_nm r where r.ll_id=g.ll_id > and g.go_term=?") and fetch the seq from RefSeq, etc. Both Ensembl and > GeneKeyDB are restricted to certain eukaryotes. So it all depends on > what kind of organisms you are expected to work with. > Stefan > > Pedro Antonio Reche wrote: > >> Hi, >> I am interested in getting all the protein sequences matching a >> specific GO term and I wonder if someone would know how to do this. >> Thanks in advance for any help. >> Cheers >> >> pdro >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Stefan Kirov, Ph.D. > University of Tennessee/Oak Ridge National Laboratory > 5700 bldg, PO BOX 2008 MS6164 > Oak Ridge TN 37831-6164 > USA > tel +865 576 5120 > fax +865-576-5332 > e-mail: skirov@utk.edu > sao@ornl.gov > > "And the wars go on with brainwashed pride > For the love of God and our human rights > And all these things are swept aside" > From sjmiller at email.arizona.edu Fri Nov 5 18:52:59 2004 From: sjmiller at email.arizona.edu (Susan J. Miller) Date: Fri Nov 5 18:51:21 2004 Subject: [Bioperl-l] Error using Bio::DB::SwissProt Message-ID: <418C125B.2000006@email.arizona.edu> When I run this small script: --------------------------------------- #!/usr/local/bin/perl -w use Bio::DB::SwissProt; $sp = Bio::DB::SwissProt->new(); $seqObj = $sp->get_Seq_by_acc('Q10654'); print $seqObj->display_id() . 
"\n"; --------------------------------------- I see the error message: Use of uninitialized value in substitution (s///) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SeqIO/swiss.pm line 855, line 30. --------------------------------------- The get_Seq_by_acc works, but I would like to present this to a class and not have the error message show up. Any suggestions? -- Thanks, -susan Susan J. Miller Biotechnology Computing Facility Arizona Research Laboratories Bio West 228 University of Arizona Tucson, AZ 85721 (520) 626-2597 From skirov at utk.edu Fri Nov 5 20:29:14 2004 From: skirov at utk.edu (Stefan Kirov) Date: Fri Nov 5 20:28:07 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: <99D4FA60-2F79-11D9-BD80-000393BC20D0@research.dfci.harvard.edu> References: <587A7E3C-2D30-11D9-9167-000393C44276@duke.edu> <418BB807.6060906@utk.edu> <99D4FA60-2F79-11D9-BD80-000393BC20D0@research.dfci.harvard.edu> Message-ID: <418C28EA.9020309@utk.edu> Pedro, You may want to check Bio::Ontology and especially Bio::OntologyIO. These are pretty cool modules, but you will have to install bioperl-live or wait for bioperl 1.5 (which as I understand should be released soon). You will have to download the GO DB locally and parse it with Bio::OntologyIO, I am not sure if somebody is working on remote access (not familiar if it is possible at the moment). By the way if you are not familiar with mysql and you are OK with perl, Bio::OntologyIO might be easiest for you. It will also include anything you are able to get from GO website. But you will have to keep local database (or flat file). Hope this helps. Stefan Pedro Antonio Reche wrote: > Dear Stefan, thanks a lot for your e-mail. Actually, I am interested > in getting all proteins from all organisms that are tagged with let > say the go_process cell signaling. I will try the sites that you > indicate to see if they can do the job. Do you know if Bioperl can > also do this? 
> Regards, > > pdro > On Nov 5, 2004, at 12:27 PM, Stefan Kirov wrote: > >> What organism? You can use either EnsMart (for example for human >> there is a table called hsapiens_gene_ensembl__xref_go__dm) or you >> can use GeneKeyDB if you install it locally (genereg.ornl.gov/gkdb), >> there is a table called ll_go, which you can search for the gene >> identifier(locuslink), associated with a particular GO term and then >> get the protein accession from another table (something like : >> "select r.np_accn from ll_go g, ll_refseq_nm r where r.ll_id=g.ll_id >> and g.go_term=?") and fetch the seq from RefSeq, etc. Both Ensembl >> and GeneKeyDB are restricted to certain eukaryotes. So it all depends >> on what kind of organisms you are expected to work with. >> Stefan >> >> Pedro Antonio Reche wrote: >> >>> Hi, >>> I am interested in getting all the protein sequences matching a >>> specific GO term and I wonder if someone would know how to do this. >>> Thanks in advance for any help. >>> Cheers >>> >>> pdro >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> Stefan Kirov, Ph.D. >> University of Tennessee/Oak Ridge National Laboratory >> 5700 bldg, PO BOX 2008 MS6164 >> Oak Ridge TN 37831-6164 >> USA >> tel +865 576 5120 >> fax +865-576-5332 >> e-mail: skirov@utk.edu >> sao@ornl.gov >> >> "And the wars go on with brainwashed pride >> For the love of God and our human rights >> And all these things are swept aside" >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From sdavis2 at mail.nih.gov Fri Nov 5 23:37:36 2004 From: sdavis2 at mail.nih.gov (Davis, Sean (NIH/NHGRI)) Date: Fri Nov 5 23:36:03 2004 Subject: [Bioperl-l] getting proteins matching GO Message-ID: <0E3E7E8F6E23DF4C8127A063568356B50473B117@nihexchange12.nih.gov> If I'm not mistaken, the GO database contains mappings from GOA (http://www.ebi.ac.uk/GOA/). Sean -----Original Message----- From: Stefan Kirov To: Pedro Antonio Reche Cc: Bioperl Sent: 11/5/2004 8:29 PM Subject: Re: [Bioperl-l] getting proteins matching GO Pedro, You may want to check Bio::Ontology and especially Bio::OntologyIO. These are pretty cool modules, but you will have to install bioperl-live or wait for bioperl 1.5 (which as I understand should be released soon). You will have to download the GO DB locally and parse it with Bio::OntologyIO, I am not sure if somebody is working on remote access (not familiar if it is possible at the moment). By the way if you are not familiar with mysql and you are OK with perl, Bio::OntologyIO might be easiest for you. It will also include anything you are able to get from GO website. But you will have to keep local database (or flat file). Hope this helps. Stefan Pedro Antonio Reche wrote: > Dear Stefan, thanks a lot for your e-mail. Actually, I am interested > in getting all proteins from all organisms that are tagged with let > say the go_process cell signaling. I will try the sites that you > indicate to see if they can do the job. Do you know if Bioperl can > also do this? > Regards, > > pdro > On Nov 5, 2004, at 12:27 PM, Stefan Kirov wrote: > >> What organism? 
You can use either EnsMart (for example for human >> there is a table called hsapiens_gene_ensembl__xref_go__dm) or you >> can use GeneKeyDB if you install it locally (genereg.ornl.gov/gkdb), >> there is a table called ll_go, which you can search for the gene >> identifier(locuslink), associated with a particular GO term and then >> get the protein accession from another table (something like : >> "select r.np_accn from ll_go g, ll_refseq_nm r where r.ll_id=g.ll_id >> and g.go_term=?") and fetch the seq from RefSeq, etc. Both Ensembl >> and GeneKeyDB are restricted to certain eukaryotes. So it all depends >> on what kind of organisms you are expected to work with. >> Stefan >> >> Pedro Antonio Reche wrote: >> >>> Hi, >>> I am interested in getting all the protein sequences matching a >>> specific GO term and I wonder if someone would know how to do this. >>> Thanks in advance for any help. >>> Cheers >>> >>> pdro >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> Stefan Kirov, Ph.D. >> University of Tennessee/Oak Ridge National Laboratory >> 5700 bldg, PO BOX 2008 MS6164 >> Oak Ridge TN 37831-6164 >> USA >> tel +865 576 5120 >> fax +865-576-5332 >> e-mail: skirov@utk.edu >> sao@ornl.gov >> >> "And the wars go on with brainwashed pride >> For the love of God and our human rights >> And all these things are swept aside" >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From hlapp at gmx.net Sat Nov 6 02:24:35 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 6 02:23:11 2004 Subject: [Bioperl-l] Error using Bio::DB::SwissProt In-Reply-To: <418C125B.2000006@email.arizona.edu> Message-ID: On Friday, November 5, 2004, at 03:52 PM, Susan J. Miller wrote: > I see the error message: > > Use of uninitialized value in substitution (s///) at > /usr/local/lib/perl5/site_perl/5.6.0/Bio/SeqIO/swiss.pm line 855, > line 30. > This is not an error, it's a (harmless) warning issued by perl. Remove the -w command line switch for perl and it will become silent. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From skirov at utk.edu Sat Nov 6 11:03:59 2004 From: skirov at utk.edu (Stefan A Kirov) Date: Sat Nov 6 11:02:24 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: <0E3E7E8F6E23DF4C8127A063568356B50473B117@nihexchange12.nih.gov> References: <0E3E7E8F6E23DF4C8127A063568356B50473B117@nihexchange12.nih.gov> Message-ID: Sean, Please correct me if I am wrong, but GOA is still a very limited project; just the human data is in it (at least this was the case the last time I saw it last month). And except for the manual annotation, there is little you can do with GOA that you can't do through EnsMART. Maybe I am missing something...
Stefan On Fri, 5 Nov 2004, Davis, Sean (NIH/NHGRI) wrote: > >If I'm not mistaken, the GO database contains mappings from GOA >(http://www.ebi.ac.uk/GOA/). > >Sean > >-----Original Message----- >From: Stefan Kirov >To: Pedro Antonio Reche >Cc: Bioperl >Sent: 11/5/2004 8:29 PM >Subject: Re: [Bioperl-l] getting proteins matching GO > >Pedro, >You may want to check Bio::Ontology and especially Bio::OntologyIO. >These are pretty cool modules, but you will have to install bioperl-live > >or wait for bioperl 1.5 (which as I understand should be released soon). > >You will have to download the GO DB locally and parse it with >Bio::OntologyIO, I am not sure if somebody is working on remote access >(not familiar if it is possible at the moment). By the way if you are >not familiar with mysql and you are OK with perl, Bio::OntologyIO might >be easiest for you. It will also include anything you are able to get >from GO website. But you will have to keep local database (or flat >file). Hope this helps. >Stefan > > >Pedro Antonio Reche wrote: > >> Dear Stefan, thanks a lot for your e-mail. Actually, I am interested >> in getting all proteins from all organisms that are tagged with let >> say the go_process cell signaling. I will try the sites that you >> indicate to see if they can do the job. Do you know if Bioperl can >> also do this? >> Regards, >> >> pdro >> On Nov 5, 2004, at 12:27 PM, Stefan Kirov wrote: >> >>> What organism? You can use either EnsMart (for example for human >>> there is a table called hsapiens_gene_ensembl__xref_go__dm) or you >>> can use GeneKeyDB if you install it locally (genereg.ornl.gov/gkdb), >>> there is a table called ll_go, which you can search for the gene >>> identifier(locuslink), associated with a particular GO term and then >>> get the protein accession from another table (something like : >>> "select r.np_accn from ll_go g, ll_refseq_nm r where r.ll_id=g.ll_id >>> and g.go_term=?") and fetch the seq from RefSeq, etc. 
Both Ensembl >>> and GeneKeyDB are restricted to certain eukaryotes. So it all depends > >>> on what kind of organisms you are expected to work with. >>> Stefan >>> >>> Pedro Antonio Reche wrote: >>> >>>> Hi, >>>> I am interested in getting all the protein sequences matching a >>>> specific GO term and I wonder if someone would know how to do this. >>>> Thanks in advance for any help. >>>> Cheers >>>> >>>> pdro >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> Stefan Kirov, Ph.D. >>> University of Tennessee/Oak Ridge National Laboratory >>> 5700 bldg, PO BOX 2008 MS6164 >>> Oak Ridge TN 37831-6164 >>> USA >>> tel +865 576 5120 >>> fax +865-576-5332 >>> e-mail: skirov@utk.edu >>> sao@ornl.gov >>> >>> "And the wars go on with brainwashed pride >>> For the love of God and our human rights >>> And all these things are swept aside" >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >-- >Stefan Kirov, Ph.D. 
>University of Tennessee/Oak Ridge National Laboratory >5700 bldg, PO BOX 2008 MS6164 >Oak Ridge TN 37831-6164 >USA >tel +865 576 5120 >fax +865-576-5332 >e-mail: skirov@utk.edu >sao@ornl.gov > >"And the wars go on with brainwashed pride >For the love of God and our human rights >And all these things are swept aside" > From natg at shore.net Sun Nov 7 02:12:21 2004 From: natg at shore.net (Nathan (Nat) Goodman) Date: Sun Nov 7 02:12:53 2004 Subject: [Bioperl-l] getting proteins matching GO Message-ID: <200411070711.iA77BAKr009173@portal.open-bio.org> Hi Pedro > Pedro Antonio Reche wrote: > Dear Stefan, thanks a lot for your e-mail. Actually, I am interested > in getting all proteins from all organisms that are tagged with let say > the go_process cell signaling... The tricky part of working with GO annotations is that they are arranged in a hierarchical ontology. When you talk about wanting proteins that are tagged with a particular term, e.g., cell-cell signaling (GO:0007267), you probably also want proteins tagged with terms subordinate to the given term. There happen to be 93 such terms. I don't know if any of the sites mentioned by Stefan have this information at hand, but I have produced a table which I'm happy to share. It has 168,071 rows. If there are just a few terms that you're interested in, like cell-cell signaling, I can do the query for you and send you just that part of the table if that would be easier for you. The next step is to connect proteins to GO terms. I think the file you want is gene_association.goa_uniprot.gz at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/.
Perhaps other readers can comment on whether there are better sources for the protein-GO connections you need. It's a flatfile that's easy to parse. A good way to proceed is to load the data into a relational database and then join with the GO defs from the paragraph above. You can also do the processing in Perl. Good luck, Nat ---------------------------------------------------------------------- Nathan (Nat) Goodman Senior Research Scientist Institute for Systems Biology 1441 North 34th Street Seattle, WA 98103-8904 206-331-0077 206-363-0431 (fax) natg@shore.net http://home.comcast.net/~natgoodman/ From krishnan at i2r.a-star.edu.sg Fri Nov 5 00:07:07 2004 From: krishnan at i2r.a-star.edu.sg (S.P.T.Krishnan) Date: Sun Nov 7 09:59:24 2004 Subject: [Bioperl-l] Bioperl methods and classes diagram ? Message-ID: <418B0A7B.7000608@i2r.a-star.edu.sg> Hi, Just joined the mailing list; Have been using Bioperl for some time now. Just wondering if there is any documentation for bioperl, that lists all the classes and the methods within them, just like MFC class sheet from Microsoft. thanks, Krishnan From sac at portal.open-bio.org Thu Nov 4 17:17:23 2004 From: sac at portal.open-bio.org (Steve Chervitz) Date: Sun Nov 7 10:01:05 2004 Subject: [Bioperl-l] RE: Integrating caBIOperl with BIOperl In-Reply-To: Message-ID: Sorry for the cryptic message. Somehow it got sent prematurely (pesky Entourage). I definitely favor the bridge plan rather than direct assimilation into Bioperl. My comments pertained to Peter's mention of MAGE-OM: >>> In the next major caBIOperl release (~March 2005) we are going to include a >>> full implementation of the MAGE-OM microarray data standard as part of the >>> caBIOperl API Sounds great. MAGE-OM/ML has been lacking a tie-in to Bioperl (and vice versa) for some time. Looks like caBIOperl could provide this link. 
I'm curious if the caBIO implementation will make use of any of the MAGEstk perl work: http://mged.sourceforge.net/software/MAGEstk.php (e.g., object generators, parsers, writers). There's a bioperl CVS repository called "bioperl-microarray" that would be a good location for any MAGE-specific caBIOperl bridge modules. However, a more generic bridge module should go into bioperl-live (where, perhaps Bio::Tools?). Steve > From: Steve Chervitz Trutane > Date: Wed, 03 Nov 2004 17:35:30 -0800 > To: Hilmar Lapp , "Covitz, Peter (NIH/NCI)" > > Cc: "'bioperl-l@bioperl.org'" , Lincoln Stein > > Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl > > Note: > * perl MAGEstk - any tie in? > * Always thought it would be nice to have a bridge into bioperl from MAGEstk. > Would caBIO perl act as this bridge? > * Are the caBio perl MAGE objects autogenerated from MAGE-OM? > > Steve > >> From: Hilmar Lapp >> Date: Wed, 3 Nov 2004 09:35:59 -0800 >> To: "Covitz, Peter (NIH/NCI)" >> Cc: "'bioperl-l@bioperl.org'" , "'lstein@cshl.edu'" >> >> Subject: Re: [Bioperl-l] RE: Integrating caBIOperl with BIOperl >> >> I very much agree with Lincoln's comment. One of the more frequent >> comments we have gotten is that expecially to newbies the plethora of >> modules in Bioperl and the apparent diversity of its APIs are already >> confusing. A bridge that binds the caBIOperl API to the existing >> Bioperl object model would be a great addition though. >> >> -hilmar >> >> On Wednesday, November 3, 2004, at 06:47 AM, Covitz, Peter (NIH/NCI) >> wrote: >> >>> Agreed, getting it into CPAN is the first order of business. >>> >>> caBIOperl is itself a wrapper around a lower-level SOAP-XML API. That >>> gives >>> us some flexibility on how we present the visible API interfaces. >>> Once we >>> get it into CPAN, I'd be interested in continuing the discussion of >>> what the >>> appropriate bridging and interface strategy would be to make it more >>> suitable for use with bioperl. 
>>> >>> Thanks for the feedback! >>> >>> Regards, >>> >>> Peter From nathanhaigh at ukonline.co.uk Fri Nov 5 08:21:41 2004 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Sun Nov 7 10:01:28 2004 Subject: [Bioperl-l] local blast Message-ID: I have a website set up to run local blasts on the server (windows) using the latest CVS BioPerl but I keep getting the following error, which seems to just stop Perl in its tracks: ERROR: Subroutine new redefined at I:/Programming/Perl/Perl5.8.0/site/lib/Bio/Search/Result/GenericResult.pm line 162, <GEN0> line 1. It seems that when Bio::SearchIO::IteratedSearchResultEventBuilder makes the following call (line 132): Bio::Factory::ObjectFactory->new( -type => 'Bio::Search::Result::BlastResult', -interface => 'Bio::Search::Result::ResultI')); It results in Bio::Root::Root evaluating (line 396) the following (where $load = Bio\Search\Result\BlastResult.pm): require $load; and then it just seems to stop, throwing the error shown above! It appears that Bio\Search\Result\GenericResult.pm was loaded the last time this subroutine was called. Could this be ActiveState Perl being overcautious when it is evaluated a second time around? Does anyone have any ideas for solutions? Thanks Nathan -------------- next part -------------- A non-text attachment was scrubbed...
Name: winmail.dat Type: application/ms-tnef Size: 4756 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041105/c44279b5/winmail.bin From aqureshi at cs.odu.edu Sun Nov 7 21:58:43 2004 From: aqureshi at cs.odu.edu (affan qureshi) Date: Sun Nov 7 22:07:57 2004 Subject: [Bioperl-l] RemoteBlast: No Significant Matches (works for web interface) Message-ID: <33672.68.10.90.185.1099882723.squirrel@cartero.cs.odu.edu> Hi, I am trying to use the Remote Blast example for my own Nucleotide sequence but I get a "No Significant Matches found" response from the server. However the same query gives me matches for the web-based interface "Nucleotide-nucleotide BLAST (blastn)" on the NCBI website. Do you know what I could be doing wrong? I searched the archive and saw a similar question, but I couldn't find the answer to it. Also is there a way to see the actual parameters being passed to the server and compare them with the web-based ones? Thanks a lot, Affan Here is my code:

# modules the snippet relies on
use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;

sub doBlast {
    # this is the filename containing nucleotide sequence
    my ($filename) = @_;

    my $prog  = 'blastn';
    my $db    = 'nr';
    my $e_val = '1e-10';

    my @params = ( '-prog'       => $prog,
                   '-data'       => $db,
                   '-expect'     => $e_val,
                   '-readmethod' => 'SearchIO' );

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    # change a parameter
    #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

    # remove a parameter
    #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

    my $v = 1;    # $v is just to turn on and off the messages

    my $seqio = Bio::SeqIO->new( -file => $filename, '-format' => 'fasta' );

    while ( my $input = $seqio->next_seq() ) {
        # Blast a sequence against a database:
        my $r = $factory->submit_blast($input);
        print STDERR "waiting..." if ( $v > 0 );

        # poll until every request ID has been retrieved or dropped
        while ( my @rids = $factory->each_rid ) {
            foreach my $rid (@rids) {
                my $rc = $factory->retrieve_blast($rid);
                if ( !ref($rc) ) {
                    # not ready yet (or failed, if negative)
                    if ( $rc < 0 ) {
                        $factory->remove_rid($rid);
                    }
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                }
                else {
                    my $result = $rc->next_result();

                    # save the output
                    my $filename = 'results/' . $result->query_name() . ".blast";
                    $factory->save_output($filename);
                    $factory->remove_rid($rid);

                    print "\nQuery Name: ", $result->query_name(), "\n";
                    while ( my $hit = $result->next_hit ) {
                        next unless ( $v > 0 );
                        print "\thit name is ", $hit->name, "\n";
                        while ( my $hsp = $hit->next_hsp ) {
                            print "\t\tscore is ", $hsp->score, "\n";
                        }
                    }
                }
            }
        }
    }
}

From reche at research.dfci.harvard.edu Mon Nov 8 08:24:16 2004 From: reche at research.dfci.harvard.edu (Pedro Antonio Reche) Date: Mon Nov 8 08:22:45 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: <200411070711.iA77BAKr009173@portal.open-bio.org> References: <200411070711.iA77BAKr009173@portal.open-bio.org> Message-ID: <7D136634-3189-11D9-B9AF-000393BC20D0@research.dfci.harvard.edu> Dear Nathan, thanks a lot for your help. As you mention, I wish to collect all proteins subordinate to a given term. There are several terms (all related to the immune system) for which I am interested in retrieving the proteins, although I have not defined them entirely. Therefore, I guess that it will be easier if you could send me the file you indicated. I have just retrieved the gene_association.goa_uniprot.gz. Thanks for the tip. I am looking forward to hearing from you. Best, pdro > Hi Pedro > >> Pedro Antonio Reche wrote: >> Dear Stefan, thanks a lot for your e-mail. Actually, I am interested >> in getting all proteins from all organisms that are tagged with let >> say >> the go_process cell signaling... > > The tricky part of working with GO annotations is that they are > arranged in > a hierarchical ontology. When you talk about wanting proteins that are > tagged with a particular term, e.g., cell-cell signaling (GO:0007267), > you > probably also want proteins tagged with terms subordinate to the given > term. > There happen to be 93 such terms.
I don't know if any of the sites > mentioned > by Stephan have this information at hand, but I have produced a table > which > I'm happy to share. It has 168,071 rows. If there are just a few > terms > that you're interested in, like cell-cell signaling, I can do the > query for > you and send you just that part of the table if that would be easier > for > you. > > The next step is to connect proteins to GO terms. I think the file you > want > is gene_association.goa_uniprot.gz at > ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/. Perhaps other > readers > can comment on whether there are better sources for the protein-GO > connections you need. It's a flatfile that's easy to parse. A good > way to > proceed is to load the data into a relational database and then join > with > the GO defs from the paragraph above. You can also do the processing > in > Perl. > > Good luck, > Nat > ---------------------------------------------------------------------- > Nathan (Nat) Goodman > Senior Research Scientist > Institute for Systems Biology > 1441 North 34th Street > Seattle, WA 98103-8904 > 206-331-0077 > 206-363-0431 (fax) > natg@shore.net > http://home.comcast.net/~natgoodman/ > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From natg at shore.net Mon Nov 8 10:29:13 2004 From: natg at shore.net (Nathan (Nat) Goodman) Date: Mon Nov 8 10:28:03 2004 Subject: [Bioperl-l] getting proteins matching GO In-Reply-To: <7D136634-3189-11D9-B9AF-000393BC20D0@research.dfci.harvard.edu> Message-ID: <200411081527.iA8FRxKr031547@portal.open-bio.org> Hi Pedro I'll send the data in a separate message so as not to bombard the mailing list with a large attachment. If any other readers want the data, let me know, and I will be happy to forward. 
The data will come as a gzipped tar file containing two files created from the gene_ontology.obo file downloaded from the EBI GO site on Oct 26, 2004:

  go_defs.txt
  go_descendants.txt

--------------------
go_defs.txt contains one row per GO term. The fields are

  go_id      -- e.g., GO:0007267
  namespace  -- one of process, function, component
  definition -- long description of the term
  name       -- short description of the term, e.g. cell-cell signaling

--------------------
go_descendants.txt contains one row for each term and each 'descendant' of the term, where a descendant is a term which is subordinate to the given one. Note that the term itself is NOT considered a descendant for this purpose. The fields are

  go_id      -- e.g., GO:0007267
  namespace  -- one of process, function, component
  descendant -- the go_id of one descendant

--------------------
These are dumps from a relational database and the format is optimized for that purpose. To process the data in Perl, I would convert go_descendants.txt into a form with one line per go_id, with all of its descendants packed onto the same line, separated by commas or spaces.

Best,
Nat

> -----Original Message-----
> From: Pedro Antonio Reche [mailto:reche@research.dfci.harvard.edu]
> Sent: Monday, November 08, 2004 5:24 AM
> To: natg@shore.net
> Cc: Bioperl
> Subject: Re: [Bioperl-l] getting proteins matching GO
>
> Dear Nathan, thanks a lot for your help. As you mention I
> wish to collect all proteins subordinate to a given term.
> There are several terms I am interested in in retrieving the
> proteins (all related with the immune system) which I have
> not defined entirely. Therefore, I guess that it will be
> easier if you could send me the file you indicated. I have
> just retrieve the gene_association.goa_uniprot.gz.
> Thanks for the tip.
> I am looking forward to hearing from you.
> Best, From jason.stajich at duke.edu Mon Nov 8 10:41:54 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Nov 8 10:40:09 2004 Subject: [Bioperl-l] comparing phylogenies. In-Reply-To: <418A4A64.2090008@york.ac.uk> References: <418A4A64.2090008@york.ac.uk> Message-ID: On Nov 4, 2004, at 10:27 AM, Zara Ghazoui wrote: > Hi Jason, > > I am not sure if the method below would work in my case for comparing > phylogenies. > > So > - Make a list of node names in Tree A which into a clade with 70% > support (based on the code I posted before). > - Then find these nodes in Tree B (find_node method). > - Find their LCA (get_lca method). > - Then walk back down from the LCA (an internal node) and get all the > tips (get_all_Descendents + the grep code posted below). > - Check if there are any other taxa in the clade thus violating your > requirement of identical clades. > > Post your code that does parts of what you want and we can help point. > Please post to the list so other people can learn from it too. > > > The reason being is that when I get all the descendents of the LCA in > Tree B for a particular node, I will just get all the taxa provided > that this particular node is an edge. Therefore the taxa would be the > same since I have used the same species to build my 2 single gene > phylogenies (TreeA and TreeB). > > So I though instead of looking up all the nodes from Tree A in Tree B, > I 'll only look up the leafnodes. Also instead of going all the way > down to the LCA in tree B, > maybe I can just compare the descendents of the parents of that > particular leafnode in Tree A and Tree B. However I think this will > only work for comparing the groupings that form the leaves of my tree. > So is there any way of doing this process recursively starting from > the leaves i.e. Find leafnode X in Tree B, look up its sister groups > (the descendents of its parent) + compare with the sister groups of > leafnode X in TreeA. 
If similar move up Tree B doing this same > process all the way to LCA in Tree B but if different then break > saying these groupings are different. > I think this should work to get the sister group: my @sister_group = grep {$_->is_Leaf && $node->internal_id != $_->internal_id } $node->ancestor->get_all_Descendents internal_id is only relevant for within-tree comparisons. $_->is_Leaf insures you are only building lists which are leaf nodes. > > Thanks lot for your help in advance. > > Zara > > > > > > > > > > #!/usr/bin/perl > use Bio::TreeIO; > use Array::Compare ; > > my $inputA = new Bio::TreeIO(-file => > "/biol/people/zg105/ProSeqFasta/ > AgroBradRpalusMesoSinoBrumelCauloRleg_stuff/aligned_files/test/ > 1.phb",-format => "newick"); > my $treeA = $inputA->next_tree; > > my $inputB = new Bio::TreeIO (-file => > "/biol/people/zg105/ProSeqFasta/ > AgroBradRpalusMesoSinoBrumelCauloRleg_stuff/aligned_files/test/ > 10.phb",-format => "newick"); > my $treeB = $inputB ->next_tree; > > my $array_comp = Array::Compare->new; > my @TreeA_leaves = grep {$_-> is_Leaf() && $_->ancestor->bootstrap > > 700 } $treeA->get_nodes; > > for my $TreeAleaf (@TreeA_leaves) { > my $TreeAleaf_parent = $TreeAleaf ->ancestor; > my @TreeA_parent_descendents = $TreeAleaf_parent > ->get_all_Descendents(); > my $TreeAleafname = $TreeAleaf ->id; > print ("Node in TreeA is $TreeAleafname \n"); > for my $TreeA_parent_descendent (@TreeA_parent_descendents){ > $TreeA_DescendentName = $TreeA_parent_descendent ->id ; > push (@TreeA_DescendentNames ,$TreeA_DescendentName); > print ("Descendents of this node in TreeA are > $TreeA_DescendentName\n"); > } > > my @TreeB_leaves = grep {$_-> is_Leaf() && $_->ancestor->bootstrap > > 700 } $treeB->get_nodes; > for my $TreeBleaf (@TreeB_leaves){ > if ($TreeBleaf ->id eq $TreeAleafname ){ > my $TreeBleaf_parent = $TreeBleaf ->ancestor; > my @TreeB_parent_descendents = > $TreeBleaf_parent->get_all_Descendents(); > for my $TreeB_parent_descendent 
(@TreeB_parent_descendents){ > $TreeB_DescendentName = $TreeB_parent_descendent ->id ; > push (@TreeB_DescendentNames ,$TreeA_DescendentName); > print ("Descendents of this node in TreeB are > $TreeB_DescendentName \n"); > unless( grep{ $_->id eq $TreeB_DescendentName} > @DescendentANames) { > print ("The trees are different\n"); > } > } > } > } > } > > Well I'm only going to help if you try some of this out on your own > first and post what you've tried. > > $tree->get_lca(-nodes => \@nodes) will find the least common ancestor > ($node) = $tree->find_node($name) will find a particular node based > on its name > > So > - Make a list of node names in Tree A which into a clade with 70% > support (based on the code I posted before). > - Then find these nodes in Tree B (find_node method). > - Find their LCA (get_lca method). > - Then walk back down from the LCA (an internal node) and get all the > tips (get_all_Descendents + the grep code posted below). > - Check if there are any other taxa in the clade thus violating your > requirement of identical clades. > > Post your code that does parts of what you want and we can help point. > Please post to the list so other people can learn from it too. > > -jason > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From barry.moore at genetics.utah.edu Mon Nov 8 13:23:43 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Mon Nov 8 13:22:08 2004 Subject: [Bioperl-l] Problems installing GD In-Reply-To: <418BB5FC.4060605@mrc-lmb.cam.ac.uk> References: <418BADFB.3030406@genetics.utah.edu> <418BB5FC.4060605@mrc-lmb.cam.ac.uk> Message-ID: <418FB9AF.5000701@genetics.utah.edu> Thanks Dave and Andreas for your help. I previously had very limited experience with apt and debian packages. 
I read the apt docs over the weekend, and will no longer be trying to install from source things availble as packages. A colleague installed libgd-perl without a hitch over a weekend. One more small step up the linux learning curve! Thanks again. Barry Dave Howorth wrote: > I think you have version problems in the underlying headers and > libraries. Check carefully that all the packages are correctly > installed and consistent. synaptic is quite good for that. > > Barry Moore wrote: > >> I'm installing on an Intel box running Debian GNU/Linux 2.4 > > > Do you mean the kernel is a 2.4 kernel? > What release of Debian are you running? woody/stable, sarge/testing, sid? > >> libgd version 2.0.33 > > > Where did you get this from? It doesn't appear to be a standard Debian > package. I believe you need a libgd2 package, not libgd, and the > versions for that are here: > http://packages.debian.org/cgi-bin/search_packages.pl?keywords=libgd2&searchon=names&subword=1&version=all&release=all > > which don't match what you have. > > I use woody and ended up patching it all by hand outside of apt. A > real nightmare. > > Cheers, Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From bioperlanand at yahoo.com Sun Nov 7 16:34:33 2004 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Mon Nov 8 14:23:43 2004 Subject: [Bioperl-l] Re: Error using Bio::DB::SwissProt In-Reply-To: <200411060439.iA64cfKt022378@portal.open-bio.org> Message-ID: <20041107213433.81353.qmail@web61102.mail.yahoo.com> >From: "Susan J. 
Miller" >Subject: [Bioperl-l] Error using Bio::DB::SwissProt >To: bioperl-l@portal.open-bio.org >Message-ID: <418C125B.2000006@email.arizona.edu> >Content-Type: text/plain; charset=us-ascii; format=flowed >When I run this small script: > --------------------------------------- > #!/usr/local/bin/perl -w > use Bio::DB::SwissProt; > $sp = Bio::DB::SwissProt->new(); > $seqObj = $sp->get_Seq_by_acc('Q10654'); print $seqObj->display_id() . "\n"; > --------------------------------------- > I see the error message: > Use of uninitialized value in substitution (s///) > at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SeqIO/swiss.pm line 855, line 30. > The get_Seq_by_acc works, but I would like to present this to a class and not have > the error message show up. Any suggestions? That is actually not an error....it is just a warning. That warning comes because the given accession "Q10654" is having a line starting with RG (instead of RA) around line 27. The next release of bioperl should fix this... Try a different accesion num (Q15375) in your script & you should not get those warnings. Here is an extract from the Uniprot User manual (on the RA/RG lines) http://us.expasy.org/sprot/userman.html#Ref_line ----------------------------------------------------------------------------------------------------------------------- 3.10.6. The RA line The RA (Reference Author) lines list the authors of the paper (or other work) cited. The RA line is present in most references, but might be missing in references that cite a reference group (see RG line). At least one RG or RA line is mandatory per reference block. ------------------------------------------------------------------------------------------------------------------ Anand > Thanks, > -susan __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From lstein at cshl.edu Mon Nov 8 15:08:33 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Nov 8 15:07:15 2004 Subject: [Bioperl-l] Problems installing GD In-Reply-To: <418FB9AF.5000701@genetics.utah.edu> References: <418BADFB.3030406@genetics.utah.edu> <418BB5FC.4060605@mrc-lmb.cam.ac.uk> <418FB9AF.5000701@genetics.utah.edu> Message-ID: <200411081508.33048.lstein@cshl.edu> Hi Barry, Fair enough, but I'm afraid you're gong to bump up against this problem the next time you're forced to install from source. Just be prepared. Lincoln On Monday 08 November 2004 01:23 pm, Barry Moore wrote: > Thanks Dave and Andreas for your help. I previously had very > limited experience with apt and debian packages. I read the apt > docs over the weekend, and will no longer be trying to install from > source things availble as packages. A colleague installed > libgd-perl without a hitch over a weekend. One more small step up > the linux learning curve! Thanks again. > > Barry > > Dave Howorth wrote: > > I think you have version problems in the underlying headers and > > libraries. Check carefully that all the packages are correctly > > installed and consistent. synaptic is quite good for that. > > > > Barry Moore wrote: > >> I'm installing on an Intel box running Debian GNU/Linux 2.4 > > > > Do you mean the kernel is a 2.4 kernel? > > What release of Debian are you running? woody/stable, > > sarge/testing, sid? > > > >> libgd version 2.0.33 > > > > Where did you get this from? It doesn't appear to be a standard > > Debian package. I believe you need a libgd2 package, not libgd, > > and the versions for that are here: > > http://packages.debian.org/cgi-bin/search_packages.pl?keywords=li > >bgd2&searchon=names&subword=1&version=all&release=all > > > > which don't match what you have. > > > > I use woody and ended up patching it all by hand outside of apt. > > A real nightmare. 
> > > > Cheers, Dave > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. From johnsonm at gmail.com Mon Nov 8 15:59:51 2004 From: johnsonm at gmail.com (Mark Johnson) Date: Mon Nov 8 15:58:12 2004 Subject: [Bioperl-l] Patched (Bugfix) Releases Message-ID: To quote http://www.bioperl.org: "Each official release passes our internal tests. We are committed to promptly fixing any bugs in the current 1.4 branch and making patched releases available. Bioperl releases are also mirrored by the Comprehensive Perl Archive Network (CPAN). CPAN links and info can be found at http://www.perl.com/." Am I blind and / or mentally challenged, or did no such maintenance releases ever occur for bioperl 1.4? They sure would be handy...has there been a concerted effort to merge appropriate fixes onto the 1.4 branch? Should I maybe have posted this to bioperl-guts instead? From amackey at pcbi.upenn.edu Mon Nov 8 17:08:14 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Nov 8 17:07:26 2004 Subject: [Bioperl-l] Patched (Bugfix) Releases In-Reply-To: References: Message-ID: Hi Mark, A 1.5 release is imminent ... we decided to pass on a 1.4 maintenance release with so much good stuff to be had in 1.5 -Aaron On Nov 8, 2004, at 3:59 PM, Mark Johnson wrote: > We are committed to > promptly fixing any bugs in the current 1.4 branch and making > patched releases available -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From aqureshi at cs.odu.edu Tue Nov 9 12:25:13 2004 From: aqureshi at cs.odu.edu (Affan Qureshi) Date: Tue Nov 9 12:34:26 2004 Subject: [Bioperl-l] Remote Blast error - 500 Too many open files In-Reply-To: <200411081527.iA8FRxKr031547@portal.open-bio.org> References: <7D136634-3189-11D9-B9AF-000393BC20D0@research.dfci.harvard.edu> <200411081527.iA8FRxKr031547@portal.open-bio.org> Message-ID: <34348.68.10.90.185.1100021113.squirrel@cartero.cs.odu.edu> I tried a remote blastx search today around 1:00pm and got this error message after a long wait. Also it seemed that the NCBI web interface was taking too long for BLAST searches. MSG: An Error Occurred

500 Cannot write to '/tmp/hLFqVvHO2D': Too many open files Is this a remote server error or am I doing something wrong? Anyone else got this error? Thanks, Affan From aqureshi at cs.odu.edu Tue Nov 9 12:37:03 2004 From: aqureshi at cs.odu.edu (Affan Qureshi) Date: Tue Nov 9 12:46:13 2004 Subject: [Bioperl-l] RemoteBlast: No Significant Matches (works for web interface) Message-ID: <34407.68.10.90.185.1100021823.squirrel@cartero.cs.odu.edu> I have tried blastn as well as blastx searches but they dont give me any result. Also nt database doesnt help either. There must be something wrong with the code I think. In fact the following sequence gives an error: Protein FASTA provided for Nucleotide sequence. The other sequence doesnt give me this error but no results either. This is my sequence which gives me the above error: AGGTCAAGGCAAAGGGCCACAAAGCCATTGGATCAACGAGCTACACCGTCGACATGTTCGTTGAGAACACGAACTTCTTC GTCCAGGTCACAAGCGCCCGATCGGTTCCTGCCACCTTGAAGGCCATCAGCCTGAATTCACTGGAGCTCAAGATACGCGA AAGTACCAAGCTCGGCCTGAACAAGGCACGCAGTAAGAAGTACCACGAGGCGATCCGGACCCGCGTCCAGGACCAACTGG CGGCCCTACTGTACGGAAGTTTCCGAGACGCGCTGAACAATTCCCTTGCTACCGTAACTGCGCCATTCCCCTGATTTACA GTTATGCGAAAAATACGCCCCACGACTTGCAAATTTGCCTCTTCAGAGTCACTGTACAGGTTCCAAAAGATTGTGGATTA AGCATGGAATAACGTGTATAGAACCTTGAAATAAACTGGTGGGGAAATACAAAAAAAAAAAAAAAAAAAA Also this one with no error but no results either: GGGGTGGCATGGCATTAGGCCGGGGATCCTGCAAAGTGAAGAATATTTCCTGGTCTATCCCCTTTCACNAAAGTATTTGC CTGGGCAATCCAACAGGGTGTGTCCAAGGCTGTCTCNCATGGTGTGGGGGCCCTAATTAACAAGGGATAAATCAGCAGTA TTTCCNTCCAACACTGGNACNCNAATAAAGTATGTGNNCCCCACGAAAAAAAAAAAAAAAAAAAA Here is the tail of some output on the screen when running the first sequence which probably shows the parameters passed to Blast: ........NATGGCATTAGGCCGGGGGACGCTTCGAGNTANCCGCATCGCCGANTTCCGCAANTCAGTTGTGNGCTAC TTCCA%00A02_DVMGL_P1DVM-contig_58%00&WORD+SIZE=11&EXPECT=1e-10&SERVICE=plain& FORMAT_OBJECT=Alignment&CMD=Put&CDD_SEARCH=off&PROGRAM=blastn Thanks for your time, Affan > nt is usually used for nucleotide 
BLAST ... that came across backwards > :) > > JO > > > On Sun, 7 Nov 2004 22:41:02 -0600, Joshua Orvis > wrote: >> I get different things with it when I use nr as the database rather >> than nt when doing nucleotide BLAST. Try that. What's your sequence >> (just in case)? >> >> JO >> >> >> >> >> On Sun, 7 Nov 2004 21:58:43 -0500 (EST), affan qureshi >> wrote: >> > Hi, >> > I am trying to use the Remote Blast example for my own Nucleotide >> sequence but I get No Significant Matches found response from the >> server. However the same query gives me matches for the web-based >> interface >> > "Nucleotide-nucleotide BLAST (blastn)" on the NCBI website. >> > >> > Do you know what I could be doing wrong? I searched the archive but >> saw a similar question for which i couldnt find the answer. Also is >> there a way to see the actual parameters being passed to the server >> and compare them with the web-based ones? >> > >> > Thanks a lot, >> > >> > Affan >> > >> > Here is my code: >> > >> > sub doBlast { >> > #this is the filename containing nucleotide sequence >> > my($filename) = @_; >> > >> > my $prog = 'blastn'; >> > my $db = 'nr'; >> > my $e_val= '1e-10'; >> > >> > my @params = ( '-prog' => $prog, >> > '-data' => $db, >> > '-expect' => $e_val, >> > '-readmethod' => 'SearchIO' ); >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > #change a paramter >> > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >> 'Homo sapiens >> > [ORGN]'; >> > >> > #remove a parameter >> > #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >> > >> > my $v = 1; >> > #$v is just to turn on and off the messages >> > >> > my $seqio = Bio::SeqIO->new(-file=>$filename, '-format' => >> 'fasta' ); >> > >> > while (my $input = $seqio->next_seq()) { >> > >> > #Blast a sequence against a database: >> > >> > my $r = $factory->submit_blast($input); >> > >> > print STDERR "waiting..." 
if( $v > 0 ); >> > while ( my @rids = $factory->each_rid ) { >> > foreach my $rid ( @rids ) { >> > my $rc = $factory->retrieve_blast($rid); >> if( >> !ref($rc) ) { >> > if( $rc < 0 ) { >> > $factory->remove_rid($rid); >> > } >> > print STDERR "." if ( $v > 0 ); >> sleep 5; >> > } else { >> > my $result = $rc->next_result(); >> #save the output >> > my $filename = >> 'results/'.$result->query_name().".blast"; >> $factory->save_output($filename); >> $factory->remove_rid($rid); >> > print "\nQuery Name: ", >> $result->query_name(), "\n"; while ( >> my $hit = $result->next_hit ) { >> > next unless ( $v > 0); >> print >> "\thit name is ", >> $hit->name, "\n"; while( my >> $hsp = $hit->next_hsp ) { >> print "\t\tscore is ", >> $hsp->score, "\n"; } >> > } >> > } >> > } >> > } >> > } >> > >> > } >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l@portal.open-bio.org >> > http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > From sdavis2 at mail.nih.gov Tue Nov 9 14:55:04 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Nov 9 14:52:18 2004 Subject: [Bioperl-l] MEME parsing Message-ID: <3FE240D4-3289-11D9-835B-000A95D7BA10@mail.nih.gov> I am using TFBS::PatternGen::MEME to generate motifs based on a set of sequences. I am interested in the e-value and log ratio for each motif, but I don't see that in the object. Is there a simple way to do find that out? How would others approach the problem of running many meme runs (multiple batches of sequences)? Thanks, Sean From sdavis2 at mail.nih.gov Tue Nov 9 15:17:21 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Nov 9 15:14:46 2004 Subject: [Bioperl-l] MEME parsing In-Reply-To: <3FE240D4-3289-11D9-835B-000A95D7BA10@mail.nih.gov> References: <3FE240D4-3289-11D9-835B-000A95D7BA10@mail.nih.gov> Message-ID: <5CB5D09A-328C-11D9-835B-000A95D7BA10@mail.nih.gov> Sorry. I answered my own question--the e-value is stored as a tag. 
Sean On Nov 9, 2004, at 2:55 PM, Sean Davis wrote: > I am using TFBS::PatternGen::MEME to generate motifs based on a set of > sequences. I am interested in the e-value and log ratio for each > motif, but I don't see that in the object. Is there a simple way to > do find that out? How would others approach the problem of running > many meme runs (multiple batches of sequences)? > > Thanks, > Sean > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Tue Nov 9 15:33:13 2004 From: skirov at utk.edu (Stefan Kirov) Date: Tue Nov 9 15:31:37 2004 Subject: [Bioperl-l] MEME parsing In-Reply-To: <3FE240D4-3289-11D9-835B-000A95D7BA10@mail.nih.gov> References: <3FE240D4-3289-11D9-835B-000A95D7BA10@mail.nih.gov> Message-ID: <41912989.5010604@utk.edu> Hi Sean, You can take a look at Bio::Matrix::PSM::IO... Are you looking into DNA or protein? By the way I don't know if you remember that part of my talk, but our pipeline does exactly that- runs meme in batch mode. We have set up a database to store/mine the meme results and it is compatible with gkdb. I can export the schema for you and the mining/storing perl modules if you would like. Stefan Sean Davis wrote: > I am using TFBS::PatternGen::MEME to generate motifs based on a set of > sequences. I am interested in the e-value and log ratio for each > motif, but I don't see that in the object. Is there a simple way to > do find that out? How would others approach the problem of running > many meme runs (multiple batches of sequences)? > > Thanks, > Sean > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164 USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
        sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From popgen23 at mac.com Tue Nov 9 16:05:04 2004
From: popgen23 at mac.com (Michael Robeson)
Date: Tue Nov 9 16:03:38 2004
Subject: [Bioperl-l] dynamically making a matrix
Message-ID: <06E1E12E-3293-11D9-8F8A-000393539070@mac.com>

Well, with your help and that of others I have been able to come up with this working code:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my (%gap, $animal);

# Read one FASTA record at a time by splitting the input on '>'.
$/ = '>';

# <DATA> reads the sequences that follow the __DATA__ marker below.
while (<DATA>) {
    # Strip the header line and keep it as the sequence name.
    next unless s/^\s*(.+)//;
    $animal = $1;
    # Record the start position of every run of '-' under its length.
    while (/(-+)/g) {
        my $gap_length = length $1;
        my $position   = pos() - $gap_length + 1;
        push @{ $gap{$animal}{$gap_length} }, $position;
    }
}
print Dumper \%gap;

__DATA__
>mouse
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAGCCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAGGGGTGCCCC---TCCTC
AACTCTGGT-ACA-TTAACATACTACTTACTACTTAGCATACTCTTTACT
AGGGAGCGATTGGGGACCACTAATATCT----CACTAAGATATCATACTA
>rat
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAACCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAAGGA-GCCCC---TCCTC
AAGTCCGGC-ACA-CTAACGTGCAACTTACTAATTAACATACTGTTTACT
AGGGAGGTATTGGGGGCCTCTAATATCC----CATTAAGATATCATACTA
>Human
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGATTTCTTGAAAGAACTG
CT--CTCTTGTGCTGTGTGAGGCTGTGCCAGGGGGCCAGGCCAGGTTCCC
GCCTCTGGAGACAGTTCATACAGGGTCAGCGACTTATCAA----CTTATC
GGTGATAGAATGGAGACCCTGTACCCCAGAAACACCAGGGTATCGT-CAG
>chimp
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGATTTCTTGAAAGAACTG
CT--CTCTTGTGCTGTGTGAGGCTGTGCCAGGGGGCCAGGCCAGGTTCCC
GCCTCTGGAGACAGTTCATACAGGGTCAGCGACTTATCAA----CTTATC
GGTGATAGAATGGAGACCCTGTACCCCAGAAACACCAGGGTATCGT-CAG

Well, I am trying to make a matrix of 1's (present) and 0's (absent) based on gaps in DNA sequence. I am still having trouble determining if my script above is the way to begin.
I basically would like to take the gaps that I have stored in the hash of a hash of arrays and make a matrix that is output as follows (though I do not need that first row 'gap size (pos):' printed; that was just to show you where the data are coming from / being compared):

gap size (pos): 1(89) 1(113) 1(117) 1(201) 2(55) 2(75) 3(95) 4(144) 4(183)
chimp:          0     0      0      1      1     0     0     1      0
human:          0     0      0      1      1     0     0     1      0
rat:            1     1      1      0      0     1     1     0      1
mouse:          0     1      1      0      0     1     1     0      1

Any suggestions on where to go from here? I just realized that my HoHoA may be too cumbersome to output the data in the above fashion. Any recommendations?

-Mike

From jason.stajich at duke.edu Tue Nov 9 17:34:39 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Nov 9 17:32:56 2004
Subject: [Bioperl-l] RemoteBlast: No Significant Matches (works for web interface)
In-Reply-To: <34407.68.10.90.185.1100021823.squirrel@cartero.cs.odu.edu>
References: <34407.68.10.90.185.1100021823.squirrel@cartero.cs.odu.edu>
Message-ID: <8A969E1E-329F-11D9-A5C2-000393C44276@duke.edu>

You might want to post exactly the script you have run, with the sequence data embedded in it, so that someone else can help with a minimal amount of playing around.

-jason

On Nov 9, 2004, at 12:37 PM, Affan Qureshi wrote:

> I have tried blastn as well as blastx searches but they dont give me
> any
> result. Also nt database doesnt help either. There must be something
> wrong with the code I think.
>
> In fact the following sequence gives an error: Protein FASTA provided
> for Nucleotide sequence. The other sequence doesnt give me this error
> but no results either.
> > This is my sequence which gives me the above error: > > AGGTCAAGGCAAAGGGCCACAAAGCCATTGGATCAACGAGCTACACCGTCGACATGTTCGTTGAGAACACG > AACTTCTTC > GTCCAGGTCACAAGCGCCCGATCGGTTCCTGCCACCTTGAAGGCCATCAGCCTGAATTCACTGGAGCTCAA > GATACGCGA > AAGTACCAAGCTCGGCCTGAACAAGGCACGCAGTAAGAAGTACCACGAGGCGATCCGGACCCGCGTCCAGG > ACCAACTGG > CGGCCCTACTGTACGGAAGTTTCCGAGACGCGCTGAACAATTCCCTTGCTACCGTAACTGCGCCATTCCCC > TGATTTACA > GTTATGCGAAAAATACGCCCCACGACTTGCAAATTTGCCTCTTCAGAGTCACTGTACAGGTTCCAAAAGAT > TGTGGATTA > AGCATGGAATAACGTGTATAGAACCTTGAAATAAACTGGTGGGGAAATACAAAAAAAAAAAAAAAAAAAA > > Also this one with no error but no results either: > GGGGTGGCATGGCATTAGGCCGGGGATCCTGCAAAGTGAAGAATATTTCCTGGTCTATCCCCTTTCACNAA > AGTATTTGC > CTGGGCAATCCAACAGGGTGTGTCCAAGGCTGTCTCNCATGGTGTGGGGGCCCTAATTAACAAGGGATAAA > TCAGCAGTA > TTTCCNTCCAACACTGGNACNCNAATAAAGTATGTGNNCCCCACGAAAAAAAAAAAAAAAAAAAA > > Here is the tail of some output on the screen when running the first > sequence which probably shows the parameters passed to Blast: > > ........NATGGCATTAGGCCGGGGGACGCTTCGAGNTANCCGCATCGCCGANTTCCGCAANTCAGTTGT > GNGCTAC > TTCCA%00A02_DVMGL_P1DVM-contig_58%00&WORD+SIZE=11&EXPECT=1e > -10&SERVICE=plain& > FORMAT_OBJECT=Alignment&CMD=Put&CDD_SEARCH=off&PROGRAM=blastn > > Thanks for your time, > > Affan > >> nt is usually used for nucleotide BLAST ... that came across backwards >> :) >> >> JO >> >> >> On Sun, 7 Nov 2004 22:41:02 -0600, Joshua Orvis >> wrote: >>> I get different things with it when I use nr as the database rather >>> than nt when doing nucleotide BLAST. Try that. What's your sequence >>> (just in case)? >>> >>> JO >>> >>> >>> >>> >>> On Sun, 7 Nov 2004 21:58:43 -0500 (EST), affan qureshi >>> wrote: >>>> Hi, >>>> I am trying to use the Remote Blast example for my own Nucleotide >>> sequence but I get No Significant Matches found response from the >>> server. However the same query gives me matches for the web-based >>> interface >>>> "Nucleotide-nucleotide BLAST (blastn)" on the NCBI website. 
>>>> >>>> Do you know what I could be doing wrong? I searched the archive but >>> saw a similar question for which i couldnt find the answer. Also is >>> there a way to see the actual parameters being passed to the server >>> and compare them with the web-based ones? >>>> >>>> Thanks a lot, >>>> >>>> Affan >>>> >>>> Here is my code: >>>> >>>> sub doBlast { >>>> #this is the filename containing nucleotide sequence >>>> my($filename) = @_; >>>> >>>> my $prog = 'blastn'; >>>> my $db = 'nr'; >>>> my $e_val= '1e-10'; >>>> >>>> my @params = ( '-prog' => $prog, >>>> '-data' => $db, >>>> '-expect' => $e_val, >>>> '-readmethod' => 'SearchIO' ); >>>> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>> >>>> #change a paramter >>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>> 'Homo sapiens >>>> [ORGN]'; >>>> >>>> #remove a parameter >>>> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>>> >>>> my $v = 1; >>>> #$v is just to turn on and off the messages >>>> >>>> my $seqio = Bio::SeqIO->new(-file=>$filename, '-format' => >>> 'fasta' ); >>>> >>>> while (my $input = $seqio->next_seq()) { >>>> >>>> #Blast a sequence against a database: >>>> >>>> my $r = $factory->submit_blast($input); >>>> >>>> print STDERR "waiting..." if( $v > 0 ); >>>> while ( my @rids = $factory->each_rid ) { >>>> foreach my $rid ( @rids ) { >>>> my $rc = $factory->retrieve_blast($rid); >>> if( >>> !ref($rc) ) { >>>> if( $rc < 0 ) { >>>> $factory->remove_rid($rid); >>>> } >>>> print STDERR "." 
if ( $v > 0 ); >>> sleep 5; >>>> } else { >>>> my $result = $rc->next_result(); >>> #save the output >>>> my $filename = >>> 'results/'.$result->query_name().".blast"; >>> $factory->save_output($filename); >>> $factory->remove_rid($rid); >>>> print "\nQuery Name: ", >>> $result->query_name(), "\n"; while ( >>> my $hit = $result->next_hit ) { >>>> next unless ( $v > 0); >>> print >>> "\thit name is ", >>> $hit->name, "\n"; while( my >>> $hsp = $hit->next_hsp ) { >>> print "\t\tscore is ", >>> $hsp->score, "\n"; } >>>> } >>>> } >>>> } >>>> } >>>> } >>>> >>>> } >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From neil.saunders at unsw.edu.au Wed Nov 10 01:16:15 2004 From: neil.saunders at unsw.edu.au (Neil Saunders) Date: Wed Nov 10 01:13:43 2004 Subject: [Bioperl-l] Fix for Bio::Tools::Blat Message-ID: <20041110061615.GA9330@psychro> dear all, I mentioned in a previous post that Bio::Tools::Blat has a problem with "tblastx-like" blat output (i.e. using the -t=dnax -q=dnax options). This generates PSL files with 2 strands in the 'strand' column - one for query, one for target (e.g. ++). If you use the next_result() method of Bio::Tools::Blat.pm on such files, you get errors like: Argument "++" isn't numeric in numeric ne (!=) at /usr/local/share/perl/5.8.4/Bio/Location/Atomic.pm line 170, line 228. I have added the following lines to my Blat.pm: if($strand =~/([+-])[+-]/) { $strand = $1; } which seems to fix it up. Perhaps we could make this an 'official' fix? 
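As a rough illustration of the fix described above, the sketch below reduces a PSL strand field to the query strand before any numeric comparison is attempted. It is only a sketch: query_strand is a hypothetical helper, not a method of Bio::Tools::Blat, and the mapping to +1/-1 is an assumption about what the downstream location code expects. It relies on the first character of a two-character strand field describing the query and the second describing the target.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::Blat): reduce a PSL strand
# field to a numeric query strand.
sub query_strand {
    my ($strand) = @_;

    # Translated searches (-t=dnax -q=dnax) report two characters,
    # query strand first and target strand second, e.g. "+-".
    # Keep only the query strand, as in the patch above.
    if ($strand =~ /^([+-])[+-]$/) {
        $strand = $1;
    }

    # Assumption: downstream code wants +1 / -1 rather than "+" / "-".
    return $strand eq '-' ? -1 : 1;
}

for my $field ('+', '-', '++', '+-', '-+', '--') {
    printf "%-2s => %2d\n", $field, query_strand($field);
}
```

Single-character fields pass through unchanged, so ordinary nucleotide PSL output should be unaffected by the extra check.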
Neil -- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney 2052, Australia http://psychro.bioinformatics.unsw.edu.au/neil/index.php From brian_osborne at cognia.com Wed Nov 10 08:09:49 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Nov 10 08:08:24 2004 Subject: [Bioperl-l] Fix for Bio::Tools::Blat In-Reply-To: <20041110061615.GA9330@psychro> Message-ID: Neil, I've added your patch, please check it to see that I've put it in the proper place. Pardon my ignorance but why do you use Tools::Blat rather than SearchIO::psl? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Neil Saunders Sent: Wednesday, November 10, 2004 1:16 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Fix for Bio::Tools::Blat dear all, I mentioned in a previous post that Bio::Tools::Blat has a problem with "tblastx-like" blat output (i.e. using the -t=dnax -q=dnax options). This generates PSL files with 2 strands in the 'strand' column - one for query, one for target (e.g. ++). If you use the next_result() method of Bio::Tools::Blat.pm on such files, you get errors like: Argument "++" isn't numeric in numeric ne (!=) at /usr/local/share/perl/5.8.4/Bio/Location/Atomic.pm line 170, line 228. I have added the following lines to my Blat.pm: if($strand =~/([+-])[+-]/) { $strand = $1; } which seems to fix it up. Perhaps we could make this an 'official' fix? 
Neil
--
School of Biotechnology and Biomolecular Sciences,
The University of New South Wales,
Sydney 2052,
Australia
http://psychro.bioinformatics.unsw.edu.au/neil/index.php
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From waibhav at gmail.com Tue Nov 9 17:11:20 2004
From: waibhav at gmail.com (Waibhav Tembe)
Date: Wed Nov 10 13:41:14 2004
Subject: [Bioperl-l] BioPerl installation for SGI
Message-ID: <734c67cb0411091411560c4217@mail.gmail.com>

Hello,

I am not an expert in Makefiles etc., so any help will be appreciated. I
downloaded the BioPerl .gz file and, after untarring it, ran

perl Makefile.PL

as mentioned in the docs. I have IRIX64 on SGI with perl version 5.004_05
for irix_n32. The installation gave the following error:

Not enough arguments for mkdir at Makefile.PL line 119, near "$dest_dir) "
BEGIN not safe after errors--compilation aborted at Makefile.PL line 273.

How do I fix this? I was successful in installing and running BioPerl on my
Linux machine w/o any problems.

Thanks to all!!

From cjm at fruitfly.org Wed Nov 10 20:03:47 2004
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed Nov 10 20:01:59 2004
Subject: [Bioperl-l] getting proteins matching GO
In-Reply-To:
References:
Message-ID:

Pedro,

There are a number of approaches you can take.

You can download the GO MySQL database and either query that directly via
SQL or query it via the GO DB perl API. You can find the database build,
API code and example queries here:

http://www.godatabase.org/dev

Click on "example SQL" in the navigation bar to see both SQL queries and
API calls for doing what you want. Note that you probably want to do
queries over the transitive closure, so that you get both "cell signaling"
and all terms beneath this one in the GO graph.

You can do something similar with the EnsMart, as Stefan mentions below.
However, I am unsure if this takes into account the transitive closure, and how many organisms EnsMart covers, and how regularly they update their copy of GO. If you don't want to download a database and want to do this in pure perl, then you can either use bioperl, or the go-perl classes (the latter also available from the site above). You will also have to download the gene_association files for the organism you are interested in from here: ftp://ftp.geneontology.org/pub/go/gene-associations/ (This is the fileset used to build the GO DB, so you don't gain anything by this approach[*]) You can then use either bioperl or go-perl to parse the GO file to obtain a graph object. You'll then want to parse through all the association files and check every entry against the graph to see if your term of interest subsumes the term in the association file. This approach will be slow, there are several million associations in total! I don't believe bioperl has an association file parser - this is easy to write yourself, but there are a few gotchas, so you should read the format documentation carefully, or use the go-perl parser. If you're not comfortable with OO programming, you can also use the go2path script (part of go-perl) to generate the path-to-list route for all GO terms and use this when filtering the association files. Yet another way is to use the AmiGO browser, which queries GO DB - however, I assume you are after a programmatic solution. I would recommend the database solution. Let me know if you have any problems with the GO DB or the go-perl code. Cheers Chris [*] the GO DB actually has some of the older redundant associations filtered out, so you do gain something by going straight to the associations file, but not much Stefan Kirov skirov wrote: > Pedro, > You may want to check Bio::Ontology and especially Bio::OntologyIO. 
> These are pretty cool modules, but you will have to install bioperl-live > or wait for bioperl 1.5 (which as I understand should be released soon). > You will have to download the GO DB locally and parse it with > Bio::OntologyIO, I am not sure if somebody is working on remote access > (not familiar if it is possible at the moment). By the way if you are > not familiar with mysql and you are OK with perl, Bio::OntologyIO might > be easiest for you. It will also include anything you are able to get > from GO website. But you will have to keep local database (or flat > file). Hope this helps. > Stefan > > > Pedro Antonio Reche wrote: > > > Dear Stefan, thanks a lot for your e-mail. Actually, I am interested > > in getting all proteins from all organisms that are tagged with let > > say the go_process cell signaling. I will try the sites that you > > indicate to see if they can do the job. Do you know if Bioperl can > > also do this? > > Regards, > > > > pdro > > On Nov 5, 2004, at 12:27 PM, Stefan Kirov wrote: > > > >> What organism? You can use either EnsMart (for example for human > >> there is a table called hsapiens_gene_ensembl__xref_go__dm) or you > >> can use GeneKeyDB if you install it locally (genereg.ornl.gov/gkdb), > >> there is a table called ll_go, which you can search for the gene > >> identifier(locuslink), associated with a particular GO term and then > >> get the protein accession from another table (something like : > >> "select r.np_accn from ll_go g, ll_refseq_nm r where r.ll_id=g.ll_id > >> and g.go_term=?") and fetch the seq from RefSeq, etc. Both Ensembl > >> and GeneKeyDB are restricted to certain eukaryotes. So it all depends > >> on what kind of organisms you are expected to work with. > >> Stefan > >> > >> Pedro Antonio Reche wrote: > >> > >>> Hi, > >>> I am interested in getting all the protein sequences matching a > >>> specific GO term and I wonder if someone would know how to do this. > >>> Thanks in advance for any help. 
> >>> Cheers > >>> > >>> pdro > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> > >> -- > >> Stefan Kirov, Ph.D. > >> University of Tennessee/Oak Ridge National Laboratory > >> 5700 bldg, PO BOX 2008 MS6164 > >> Oak Ridge TN 37831-6164 > >> USA > >> tel +865 576 5120 > >> fax +865-576-5332 > >> e-mail: skirov at utk.edu > >> sao at ornl.gov > >> > >> "And the wars go on with brainwashed pride > >> For the love of God and our human rights > >> And all these things are swept aside" > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From amackey at pcbi.upenn.edu Thu Nov 11 09:16:54 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu Nov 11 09:15:05 2004 Subject: [Bioperl-l] 1.5.0-RC1 available for download Message-ID: <56B88C1E-33EC-11D9-A392-000D93392082@pcbi.upenn.edu> The somewhat-anxiously anticipated 1.5.0 release candidates are starting to roll off the shelves. Since I don't seem to have access to the download directory at bioperl.org, they are currently available via FTP (active, not passive) at: ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-release-1.5.0- RC1.tar.gz ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-run-release-1.5.0- RC1.tar.gz ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-ext-release-1.5.0- RC1.tar.gz Regarding -run and -ext versioning, I've bumped them all to be in sync with bioperl-live; I remember there was some discussion of this long ago, but please remind me if there's some critical reason for this not to occur. Thanks, enjoy, and gimme feedback (I already know that Unflattener2.t will fail 2 tests; these have already been fixed, and will be in the next RC, and/or final) -Aaron -- Aaron J. Mackey, Ph.D. Dept. 
of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From amackey at pcbi.upenn.edu Thu Nov 11 09:20:36 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu Nov 11 09:18:47 2004 Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download In-Reply-To: <56B88C1E-33EC-11D9-A392-000D93392082@pcbi.upenn.edu> References: <56B88C1E-33EC-11D9-A392-000D93392082@pcbi.upenn.edu> Message-ID: It seems our previous problems with passive FTP mode have gone away; FTP as you wish. -Aaron On Nov 11, 2004, at 9:16 AM, Aaron J. Mackey wrote: > > The somewhat-anxiously anticipated 1.5.0 release candidates are > starting to roll off the shelves. Since I don't seem to have access > to the download directory at bioperl.org, they are currently available > via FTP (active, not passive) at: > > > ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-release-1.5.0- > RC1.tar.gz > > ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-run-release-1.5.0- > RC1.tar.gz > > ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-ext-release-1.5.0- > RC1.tar.gz > > Regarding -run and -ext versioning, I've bumped them all to be in sync > with bioperl-live; I remember there was some discussion of this long > ago, but please remind me if there's some critical reason for this not > to occur. > > Thanks, enjoy, and gimme feedback (I already know that Unflattener2.t > will fail 2 tests; these have already been fixed, and will be in the > next RC, and/or final) > > -Aaron > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From amackey at pcbi.upenn.edu Thu Nov 11 09:33:25 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu Nov 11 09:31:33 2004 Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download In-Reply-To: <56B88C1E-33EC-11D9-A392-000D93392082@pcbi.upenn.edu> References: <56B88C1E-33EC-11D9-A392-000D93392082@pcbi.upenn.edu> Message-ID: And you can now get them via http, directly from the bioperl.org/DIST/ directory: http://bioperl.org/DIST/bioperl-1.5.0-RC1.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.0-RC1.tar.gz http://bioperl.org/DIST/bioperl-ext-1.5.0-RC1.tar.gz -Aaron On Nov 11, 2004, at 9:16 AM, Aaron J. Mackey wrote: > > The somewhat-anxiously anticipated 1.5.0 release candidates are > starting to roll off the shelves. Since I don't seem to have access > to the download directory at bioperl.org, they are currently available > via FTP (active, not passive) at: > > > ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-release-1.5.0- > RC1.tar.gz > > ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-run-release-1.5.0- > RC1.tar.gz > > ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-ext-release-1.5.0- > RC1.tar.gz > > Regarding -run and -ext versioning, I've bumped them all to be in sync > with bioperl-live; I remember there was some discussion of this long > ago, but please remind me if there's some critical reason for this not > to occur. > > Thanks, enjoy, and gimme feedback (I already know that Unflattener2.t > will fail 2 tests; these have already been fixed, and will be in the > next RC, and/or final) > > -Aaron > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From barry.moore at genetics.utah.edu Thu Nov 11 18:34:49 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu Nov 11 18:33:00 2004 Subject: [Bioperl-l] BioSeq method errors with BioSQL sequences Message-ID: <4193F719.5010300@genetics.utah.edu> I have code that I have used to grab details from a Bio::Seq object like this: my $locus = $seq->display_name; my $length = $seq->length; my $mol_type = $seq->molecule; my $division = $seq->division; This works great when called on Bio::Seq created like this: use Bio::DB::RefSeq; $db = new Bio::DB::RefSeq; $seq = $db->get_Seq_by_id('NM_006732'); But I recently switched the to using a local database and getting my sequence objects like this: my $db_adaptor = Bio::DB::BioDB->new( -database => 'biosql', -user => 'postgres', -dbname => 'some_db', -host => 'localhost', -driver => 'Pg', ); my $tmp_seq = Bio::Seq->new(-accession_number => "$NM_006732", -namespace => "$ncbi"); my $seqfact = Bio::Seq::SeqFactory->new(-type => "Bio::Seq"); my $adp = $db_adaptor->get_object_adaptor($tmp_seq); my $seq = $adp->find_by_unique_key($tmp_seq, -obj_factory => $seqfact); The new seq object seems to behave very nicely under most circumstances but it will not execute some methods like these: my $mol_type = $seq->molecule; my $division = $seq->division; I get an error like this: Can't locate object method "division" via package "Bio::Seq" at /usr/local/share/perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm line 541, line 1. I tried to track down where the problem occured and found that in AUTOLOAD for Bio/DB/Persistent/PersistentObject.pm on line 534 and error was being thrown. 534: $self->throw("Can't locate object method \"$meth\" via package ". 535: ref($self)) 536: unless $obj && ($obj ne $self); Checking ref $obj and ref $self gives Bio::Seq and Bio::DB::Persistent::Seq respectively. 
That seems to me to indicate that the 'unless' above should be satisfied - $obj exist and is not equal to $self therefore don't throw. Running $seq->display_name generates the same values for $obj and $self, but it works. Methods molecule and division throw errors. Any suggestions? Barry -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From Marc.Logghe at devgen.com Thu Nov 11 18:41:30 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Nov 11 18:40:08 2004 Subject: [Bioperl-l] BioSeq method errors with BioSQL sequences Message-ID: Hi Barry, Have you tried setting the seqfactory type to 'Bio::Seq::RichSeq' ? Did not check it, but guess Bio::Seq has no division method, a richseq does. HTH, Marc > -----Original Message----- > From: Barry Moore [mailto:barry.moore@genetics.utah.edu] > Sent: Friday, November 12, 2004 12:35 AM > To: bioperl > Subject: [Bioperl-l] BioSeq method errors with BioSQL sequences > > > I have code that I have used to grab details from a Bio::Seq > object like > this: > > my $locus = $seq->display_name; > my $length = $seq->length; > my $mol_type = $seq->molecule; > my $division = $seq->division; > > This works great when called on Bio::Seq created like this: > > use Bio::DB::RefSeq; > $db = new Bio::DB::RefSeq; > $seq = $db->get_Seq_by_id('NM_006732'); > > But I recently switched the to using a local database and getting my > sequence objects like this: > > my $db_adaptor = Bio::DB::BioDB->new( > -database => 'biosql', > -user => 'postgres', > -dbname => 'some_db', > -host => 'localhost', > -driver => 'Pg', > ); > my $tmp_seq = Bio::Seq->new(-accession_number => "$NM_006732", > -namespace => "$ncbi"); > my $seqfact = Bio::Seq::SeqFactory->new(-type => "Bio::Seq"); > my $adp = $db_adaptor->get_object_adaptor($tmp_seq); > my $seq = $adp->find_by_unique_key($tmp_seq, -obj_factory => > $seqfact); > > The new seq object seems to behave very nicely under most > circumstances > but it will not execute some methods 
like these: > > my $mol_type = $seq->molecule; > my $division = $seq->division; > > I get an error like this: > > Can't locate object method "division" via package "Bio::Seq" at > /usr/local/share/perl/5.6.1/Bio/DB/Persistent/PersistentObject > .pm line > 541, line 1. > > > I tried to track down where the problem occured and found that in > AUTOLOAD for Bio/DB/Persistent/PersistentObject.pm on line > 534 and error > was being thrown. > > 534: $self->throw("Can't locate object method \"$meth\" via > package ". > 535: ref($self)) > 536: unless $obj && ($obj ne $self); > > Checking ref $obj and ref $self gives Bio::Seq and > Bio::DB::Persistent::Seq respectively. That seems to me to indicate > that the 'unless' above should be satisfied - $obj exist and is not > equal to $self therefore don't throw. Running $seq->display_name > generates the same values for $obj and $self, but it works. Methods > molecule and division throw errors. Any suggestions? > > Barry > > -- > > Barry Moore > Dept. of Human Genetics > University of Utah > Salt Lake City, UT > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From mlemieux at bioinfo.ca Fri Nov 12 00:40:27 2004 From: mlemieux at bioinfo.ca (Madeleine Lemieux) Date: Fri Nov 12 00:38:42 2004 Subject: [Bioperl-l] RemoteBlast with same e-val (10 vs 1e-10) works Message-ID: <5B9EC6A0-346D-11D9-B72C-000A95B139D2@bioinfo.ca> Affan, I ran your code with the expected value set to 10, as it is by default on the NCBI Blast form, and got exactly the same results as when I blasted your sequence on the NCBI server. 
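To make the effect of the expect setting concrete, here is a small sketch that filters the five E-values NCBI lists for this query (0.28 and four at 4.4) against the two thresholds discussed in this thread. The hits_kept function is a hypothetical helper written for this illustration; it is not part of Bio::Tools::Run::RemoteBlast, and it assumes only that a hit is reported when its E-value is at or below the expect threshold.

```perl
use strict;
use warnings;

# E-values from the NCBI Blast output for this query (see this thread).
my @evalues = (0.28, 4.4, 4.4, 4.4, 4.4);

# Hypothetical helper: count the hits an expect threshold would keep,
# assuming a hit is reported when its E-value is <= the threshold.
sub hits_kept {
    my ($threshold, @e) = @_;
    return scalar grep { $_ <= $threshold } @e;
}

for my $threshold ('1e-10', '10') {
    printf "expect=%-5s keeps %d of %d hits\n",
        $threshold, hits_kept($threshold, @evalues), scalar @evalues;
}
```

With '-expect' => '1e-10' none of these hits would pass, which is consistent with the "No Significant Matches" response, while the web form's default of 10 keeps all of them.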
Madeleine ----- Output from your code calling RemoteBlast with e-val = 10 Query Name: affan hit name is gb|AE016803.1| score is 22 hit name is gb|AC124687.4| score is 20 hit name is gb|AC112701.6| score is 20 hit name is gb|AC016932.17| score is 20 hit name is gb|AE008564.1|AE008564 score is 20 hit name is gb|AC073964.3|AC073964 score is 20 ----- NCBI Blast output Score E Sequences producing significant alignments: (bits) Value gi|27361305|gb|AE016803.1| Vibrio vulnificus CMCP6 chromoso... 44 0.28 gi|27356737|gb|AC124687.4| Mus musculus BAC clone RP24-447D... 40 4.4 gi|22296769|gb|AC112701.6| Mus musculus BAC clone RP23-171A... 40 4.4 gi|15789223|gb|AC016932.17| Homo sapiens 3 BAC RP11-166C10 ... 40 4.4 gi|15459696|gb|AE008564.1|AE008564 Streptococcus pneumoniae... 40 4.4 From dthomp at resp-sci.arizona.edu Fri Nov 12 13:33:07 2004 From: dthomp at resp-sci.arizona.edu (Dave Thompson) Date: Fri Nov 12 13:31:11 2004 Subject: [Bioperl-l] SeqIO File locking Message-ID: <419501E3.4000107@resp-sci.arizona.edu> Hi, I'm writing a cgi web interface that can alter GenBank files via SeqIO. Is there a way to implement file locking with SeqIO so that multiple users cannot write to the same GenBank file at the same time? thanks From tex at biosysadmin.com Thu Nov 11 21:33:19 2004 From: tex at biosysadmin.com (James Thompson) Date: Fri Nov 12 13:42:52 2004 Subject: [Bioperl-l] SeqIO File locking In-Reply-To: <419501E3.4000107@resp-sci.arizona.edu> Message-ID: Dave, If your program is the only way that these files can be modified, try using flock (perldoc -f flock). File locking is something that can vary depending on your platform of choice, so be sure to read the documentation thoroughly. Cheers, James Thompson On Fri, 12 Nov 2004, Dave Thompson wrote: > Hi, > I'm writing a cgi web interface that can alter GenBank files via SeqIO. > Is there a way to implement file locking with SeqIO so that multiple > users cannot write to the same GenBank file at the same time? 
> thanks
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu Fri Nov 12 13:47:13 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Nov 12 13:45:18 2004
Subject: [Bioperl-l] SeqIO File locking
In-Reply-To: <419501E3.4000107@resp-sci.arizona.edu>
References: <419501E3.4000107@resp-sci.arizona.edu>
Message-ID: <44A31150-34DB-11D9-AA93-000393C44276@duke.edu>

Perl's flock() should take care of it - not sure how it performs with
shared filesystems (NFS).

perldoc -f flock
http://www.perlmonks.org/index.pl?node_id=7058

You might want to do something using CVS or RCS to check in changes rather
than just overwriting the file if you want to ensure changes are
compatible. It lets you roll back if someone does a boo-boo.... It will
also save you from having to do file locking at the perl layer (write to a
temporary directory, do your 'cvs commit').

-j

On Nov 12, 2004, at 1:33 PM, Dave Thompson wrote:

> Hi,
> I'm writing a cgi web interface that can alter GenBank files via
> SeqIO. Is there a way to implement file locking with SeqIO so that
> multiple users cannot write to the same GenBank file at the same time?
> thanks
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From popgen23 at mac.com Fri Nov 12 13:48:47 2004
From: popgen23 at mac.com (Michael Robeson)
Date: Fri Nov 12 13:46:55 2004
Subject: [Bioperl-l] gap position script
Message-ID: <7C8799CA-34DB-11D9-93AE-000393539070@mac.com>

Hi all,

Below is the full code that will tally up the gap sizes and positions for
an aligned data set in FASTA format. More descriptions are in the script
below. As always, feel free to send me comments!

-Cheers!
-Mike
--------
http://homepage.mac.com/popgen23

<-- begin code -->

#!/usr/bin/perl
#
# By Mike Robeson November 12, 2004. This script could not
# have been done w/o the help of the wonderful folks at
# bioperl.org and learnperl.org.
#
# This script will take an aligned sequence file in FASTA
# format and tally up the sizes and positions of variously
# sized gaps. The output is in the form of a matrix that
# denotes the presence or absence of each gap type by using
# 1s (present) and 0s (absent). File is output as a csv file
# for easy viewing in a spreadsheet (e.g. indel_list_$file.csv).
#
# Example output:
#
# Species:,mouse,rat,human
# 1_20,1,0,1
# 2_34,1,0,1
# 3_65,0,1,0
#
# Note: The first row denotes the species for each column of
# data below it. After that, the info before the first ',' is
# the gap size followed by its position (i.e. 1_20 means that
# it is a 1 bp gap at the 20th position in the sequence),
# followed by its presence or absence in each species.
#
# For an easier-to-read output file just change some of the
# ',' to '\t' in the script.
#

use warnings;
use strict;

my (@animals, $animal, %gap);

##########################
# Request file for input
##########################

print "Enter in the name of the DNA sequence file:\n";
chomp (my $dna_seq = <STDIN>);
open(DNA_SEQ, $dna_seq)
    or die "Can't open file: $!\n";

# Strip the file extension from the input name.
$dna_seq =~ s/\..+//g;

# The outfile name is made automatically by
# adding "indel_list_" to the front of
# the name of your input file.
open(OUTFILE, ">indel_list_" . $dna_seq . ".csv")
    or die "Can't open outfile: $!\n";

##########################
# Read Data into Hash
##########################

# Read one FASTA record at a time.
$/ = '>';

while (<DNA_SEQ>) {
    chomp;
    # The first line of each record is the species name.
    next unless s/^\s*(.+)//;
    $animal = $1;
    push @animals, $animal;
    $_ =~ s/\s//g;
    # Record every run of '-' as size_position (1-based start).
    while (/(-+)/g) {
        my $gap_length = length $1;
        my $position = pos() - $gap_length + 1;
        $gap{$gap_length.'_'.$position}{$animal} = 1;
    }
}

########################################
# Print out comma-delimited data matrix
########################################

print OUTFILE "Species:";
foreach my $animal (@animals) {
    print "Processing: $animal\n";
    print OUTFILE ",$animal";
}
print OUTFILE "\n";

foreach my $state (sort keys %gap) {
    print OUTFILE "$state:";
    foreach my $animal (@animals) {
        if (defined $gap{$state}{$animal} ) {
            print OUTFILE ",1";
        } else {
            print OUTFILE ",0";
        }
    }
    print OUTFILE "\n";
}

print "\n***** Finished *****\n\n";

<-- end code -->

From erobinso at uga.edu Fri Nov 12 16:05:02 2004
From: erobinso at uga.edu (Ed Robinson)
Date: Fri Nov 12 16:03:37 2004
Subject: [Bioperl-l] TIGR parser for SeqIO
Message-ID: <62f15ed.9e428375.81b3c00@punts5.cc.uga.edu>

I am trying to get the tigr parser to work on our files. My script is
otherwise ok, because when I call a genbank file with the genbank parser, I
am able to load data. I have downloaded the most recent tigr.pm from the
CVS. When I run it, I get this:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: [46]Required missing
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.6.1//Bio/Root/Root.pm:328
STACK: Bio::SeqIO::tigr::throw /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:1343
STACK: Bio::SeqIO::tigr::_process_model /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:1006
STACK: Bio::SeqIO::tigr::_process_tu /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:730
STACK: Bio::SeqIO::tigr::_process_protein_coding /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:541
STACK: Bio::SeqIO::tigr::_process_gene_list /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:507
STACK: Bio::SeqIO::tigr::_process_assembly /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:315
STACK: Bio::SeqIO::tigr::_process_tigr /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:220
STACK: Bio::SeqIO::tigr::_process /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:188
STACK: Bio::SeqIO::tigr::_initialize
/usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigr.pm:88 STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO.pm:358 STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO.pm:378 When I use the tigr.pm that comes with the bioperl release we have installed, I get the following: A whole bunch of warnings such as: -------------------- WARNING --------------------- MSG: Unknown element GENE_LIST, ignored --------------------------------------------------- -------------------- WARNING --------------------- MSG: Unknown element PROTEIN_CODING, ignored --------------------------------------------------- followed by this: Died at /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigrxml.pm line 150. at /usr/lib/perl5/site_perl/5.6.1/i386-linux//XML/LibXML/SAX.pm line 63 Indicating it died for some reason once it got to the feature_type. The XML I have is valid XML, although it is from August of this year. It is also a full XML file, it is not a coordset file. Any suggestions would be appreciated. I've even updated a number of our Perl modules to rule those problems out. -Ed R ----------------- Ed Robinson Program Specialist Center for Tropical and Emerging Global Diseases and Dept. of Genetics University of Georgia, Athens, GA 30602 erobinso@uga.edu/(706) 542.1447 From laurichj at bioinfo.ucr.edu Fri Nov 12 16:27:03 2004 From: laurichj at bioinfo.ucr.edu (Josh Lauricha) Date: Fri Nov 12 16:25:11 2004 Subject: [Bioperl-l] TIGR parser for SeqIO In-Reply-To: <62f15ed.9e428375.81b3c00@punts5.cc.uga.edu> References: <62f15ed.9e428375.81b3c00@punts5.cc.uga.edu> Message-ID: <20041112212703.GC18664@batch107a> On Fri 11/12/04 16:05, Ed Robinson wrote: > I am trying to get the tigr parser to work on our files. My > script is otherwise ok, because when I call a genbank file > with the genbank parser, I am able to load data. I have > downloaded the most recent tigr.pm from the CVS. 
When I run > it, I get this: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: [46]Required missing Could you privately e-mail me lines around 46 from your input file (10 before, 10 after should be enough) or give me a URL to your file. > When I use the tigr.pm that comes with the bioperl release we > have installed, I get the following: > Died at /usr/lib/perl5/site_perl/5.6.1//Bio/SeqIO/tigrxml.pm > line 150. > at > /usr/lib/perl5/site_perl/5.6.1/i386-linux//XML/LibXML/SAX.pm > line 63 The "tigrxml" module is not for FULL TIGR files. It is for the tigr coordset files. The tigr.pm module is for the full TIGR files. -- ------------------------------------------------------ | Josh Lauricha | Ford, you're turning | | laurichj@bioinfo.ucr.edu | into a penguin. Stop | | Bioinformatics, UCR | it | |----------------------------------------------------| | OpenPG: | | 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 | |----------------------------------------------------| From idonalds at blueprint.org Fri Nov 12 17:09:24 2004 From: idonalds at blueprint.org (Ian Donaldson) Date: Fri Nov 12 17:07:42 2004 Subject: [Bioperl-l] getting proteins matching GO Message-ID: Hi Pedro One other solution that may be useful if you are interested in obtaining a list of proteins in one numbering space (i.e. GenBank GI's): SeqHound maps all of the GO Annotations from the GO ftp site to GI's using multiple sequence database cross-reference files. I have attached details below. 
These data are available via a remote programming API (in Perl/Java/C/C++)
using the following calls:

SHoundGiFromGOID
SHoundGiFromGOIDAndECode
SHoundGiFromGOIDList
SHoundGiFromGOIDListAndECode
SHoundGOECodeFromGiAndGOID
SHoundGOIDFromGi
SHoundGOIDFromGiList
SHoundGOIDFromLLID
SHoundGOIDFromRedundantGi
SHoundGOIDFromRedundantGiList
SHoundGOPMIDFromGiAndGOID

Details on the use of these functions are available at
http://www.blueprint.org/seqhound/seqhound_documentation.html

You can try out some of the http calls underneath the API calls to get a
feel for results before you use the API in a program. Try
http://seqhound.blueprint.org/cgi-bin/seqrem?fnct=SeqHoundGiFromGOID&goid=50778
for proteins involved in positive regulation of the immune response.

You can use the returned GI's to retrieve other data from SeqHound like
sequence, sequence neighbours and conserved domains. There are also
functions to help you traverse the GO tree like SHoundGODBGetChildrenOf and
SHoundGODBGetParentOf.

The compiled GO Annotation tables are also available from our ftp site in
MySQL or text format:
ftp://ftp.blueprint.org/pub/SeqHound/Data/goa/
and
ftp://ftp.blueprint.org/pub/SeqHound/Data/dbxref/

Details on these tables are available in the SeqHound manual
http://www.blueprint.org/seqhound/api_help/docs/The_SeqHound_Manual.pdf

These data were just updated as of November 11 by Renan Cavero in our group.

Best regards
Ian

************************
GOA Data Release V Notes
************************

No code changes from the previous data build.

Number of Records
DBXref:   11,866,278
Goa_gigo:  4,471,483

This release contains files:

DBXref files parsed from:

Uniprot 3.0
ftp://expasy.org/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz
ftp://expasy.org/databases/uniprot/knowledgebase/uniprot_trembl.dat.gz
Locus Link Oct 29
ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2ref
FlyBase.
ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.fb.gz
WormBase.
ftp://ftp.sanger.ac.uk/pub/databases/wormpep/wormpep.table
Mouse Genome Informatics.
ftp://ftp.informatics.jax.org/pub/reports/MRK_Sequence.rpt
ftp://ftp.informatics.jax.org/pub/reports/MRK_SwissProt_TrEMBL.rpt
Saccharomyces Genome Database.
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/dbxref.tab
Alliance for Cellular Signaling.
ftp://ftp.afcs.org/pub/mpdata/afcsflat.txt
The Institute for Genomic Research, Arabidopsis thaliana database.
ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/DATA_RELEASE_SUPPLEMENT/release_5.genbank_accessions.txt.gz
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.tigr_ath
DictyBase.
ftp://ftp.blueprint.org/pub/SeqHound/Private/DDB/dictybaseid_gb_accession.txt.gz
Rat Genome Database.
ftp://rgd.mcw.edu/pub/data_release/genbank_to_gene_ids.txt
The Zebrafish Information Network.
http://zfin.org/data_transfer/Downloads/genbank.txt
http://zfin.org/data_transfer/Downloads/refseq.txt
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.zfin
GeneDB_Spombe.
ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Mappings/gp2swiss.txt
The Arabidopsis Information Resource.
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.tigr_cmr
The Institute for Genomic Research, Comprehensive Microbial Resource.
NCBI UniGene is an experimental system for automatically partitioning
GenBank sequences.
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.unigene
Virus Database at University College London.
ftp://ftp.geneontology.org/pub/go/gp2protein/gp2protein.vida
IPI Cross Reference Files (human.xrefs, mouse.xrefs, rat.xrefs)
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/human.xrefs.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/MOUSE/mouse.xrefs.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/RAT/rat.xrefs.gz

GOA files parsed from:
ftp://ftp.geneontology.org/pub/go/gene-associations/

gene_association.Compugen_GenBank.gz
gene_association.Compugen_UniProt.gz
gene_association.GeneDB_Lmajor.gz
gene_association.GeneDB_Pfalciparum.gz
gene_association.GeneDB_Spombe.gz
gene_association.GeneDB_Tbrucei.gz
gene_association.GeneDB_tsetse.gz
gene_association.ddb.gz
gene_association.fb.gz
gene_association.goa_human.gz
gene_association.goa_mouse.gz
gene_association.goa_pdb.gz
gene_association.goa_rat.gz
gene_association.goa_uniprot.gz
gene_association.gramene_oryza.gz
gene_association.mgi.gz
gene_association.rgd.gz
gene_association.sgd.gz
gene_association.tair.gz
gene_association.tigr_Athaliana.gz
gene_association.tigr_Banthracis.gz
gene_association.tigr_Cburnetii.gz
gene_association.tigr_Gsulfurreducens.gz
gene_association.tigr_Lmonocytogenes.gz
gene_association.tigr_Psyringae.gz
gene_association.tigr_Soneidensis.gz
gene_association.tigr_Tbrucei_chr2.gz
gene_association.tigr_Vcholerae.gz
gene_association.tigr_gene_index.gz
gene_association.wb.gz
gene_association.zfin.gz

The following GIs were spot checked.

select a.* from goa_gigo a, seqhound.redund b, seqhound.redund c
where b.rgroup= c.rgroup AND a.gi=b.gi AND c.gi IN
(3641615, 17647231, 6321311, 6323259, 6322304, 6324701, 6324058, 10383763,
14318479, 6379287, 4758116, 28559088, 6678656, 9755336, 6323421, 30923724);

_______________________________________________

From hlapp at gmx.net Sat Nov 13 17:11:16 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Nov 13 17:09:35 2004
Subject: [Bioperl-l] BioSeq method errors with BioSQL sequences
In-Reply-To:
Message-ID:

Good hit Marc.
Barry, the reason you're getting the exception is that the method isn't
implemented by the seq object (a Bio::Seq). Only a Bio::Seq::RichSeq
instance will implement it.

    -hilmar

On Thursday, November 11, 2004, at 03:41 PM, Marc Logghe wrote:

> Hi Barry,
> Have you tried setting the seqfactory type to 'Bio::Seq::RichSeq'?
> I did not check it, but I guess Bio::Seq has no division method; a
> RichSeq does.
> HTH,
> Marc
>
>> -----Original Message-----
>> From: Barry Moore [mailto:barry.moore@genetics.utah.edu]
>> Sent: Friday, November 12, 2004 12:35 AM
>> To: bioperl
>> Subject: [Bioperl-l] BioSeq method errors with BioSQL sequences
>>
>> I have code that I have used to grab details from a Bio::Seq object
>> like this:
>>
>> my $locus = $seq->display_name;
>> my $length = $seq->length;
>> my $mol_type = $seq->molecule;
>> my $division = $seq->division;
>>
>> This works great when called on a Bio::Seq created like this:
>>
>> use Bio::DB::RefSeq;
>> $db = new Bio::DB::RefSeq;
>> $seq = $db->get_Seq_by_id('NM_006732');
>>
>> But I recently switched to using a local database and getting my
>> sequence objects like this:
>>
>> my $db_adaptor = Bio::DB::BioDB->new(
>>     -database => 'biosql',
>>     -user     => 'postgres',
>>     -dbname   => 'some_db',
>>     -host     => 'localhost',
>>     -driver   => 'Pg',
>> );
>> my $tmp_seq = Bio::Seq->new(-accession_number => "$NM_006732",
>>                             -namespace        => "$ncbi");
>> my $seqfact = Bio::Seq::SeqFactory->new(-type => "Bio::Seq");
>> my $adp = $db_adaptor->get_object_adaptor($tmp_seq);
>> my $seq = $adp->find_by_unique_key($tmp_seq, -obj_factory => $seqfact);
>>
>> The new seq object seems to behave very nicely under most
>> circumstances, but it will not execute some methods like these:
>>
>> my $mol_type = $seq->molecule;
>> my $division = $seq->division;
>>
>> I get an error like this:
>>
>> Can't locate object method "division" via package "Bio::Seq" at
>> /usr/local/share/perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>> line 541, line 1.
>>
>> I tried to track down where the problem occurred and found that, in
>> AUTOLOAD for Bio/DB/Persistent/PersistentObject.pm, an error was being
>> thrown on line 534.
>>
>> 534: $self->throw("Can't locate object method \"$meth\" via package ".
>> 535:     ref($self))
>> 536:     unless $obj && ($obj ne $self);
>>
>> Checking ref $obj and ref $self gives Bio::Seq and
>> Bio::DB::Persistent::Seq respectively. That seems to me to indicate
>> that the 'unless' above should be satisfied: $obj exists and is not
>> equal to $self, therefore don't throw. Running $seq->display_name
>> generates the same values for $obj and $self, but it works. Methods
>> molecule and division throw errors. Any suggestions?
>>
>> Barry
>>
>> --
>>
>> Barry Moore
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT
>>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From brian_osborne at cognia.com Sat Nov 13 22:06:13 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sat Nov 13 22:04:33 2004
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download
In-Reply-To:
Message-ID:

Aaron,

On Cygwin I see a whole slew of new failed tests:

Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Biblio.t           255 65280    24   38 158.33%  4-24
t/Biblio_biofetch.t  255 65280    11    0   0.00%  ??
t/Biblio_eutils.t    255 65280     5    0   0.00%  ??
t/DBCUTG.t             2   512    24    0   0.00%  ??
t/Domcut.t           255 65280    25    0   0.00%  ??
t/ELM.t              255 65280    14    0   0.00%  ??
t/GOR4.t             255 65280    13    0   0.00%  ??
t/HNN.t              255 65280    13    0   0.00%  ??
t/MeSH.t             255 65280    26    0   0.00%  ??
t/MitoProt.t         255 65280     8    0   0.00%  ??
t/NetPhos.t          255 65280    14    0   0.00%  ??
t/Scansite.t         255 65280    12    0   0.00%  ??
t/Sopma.t            255 65280    15    0   0.00%  ??
t/Unflattener2.t                  11    2  18.18%  7 10
24 subtests skipped.
make: *** [test_dynamic] Error 14

Most of them seem to be due to the same problem, blocks like this:

95 BEGIN {
96     $Revision = qSimpleAnalysisI.pm,v 1.4 2003/06/04 08:36:35 heikki Exp;
97 }

129 ~/tmp/bioperl-1.5.0-RC1>perl -I. -w t/Scansite.t
1..12
Number found where operator expected at Bio/SimpleAnalysisI.pm line 96, near "v 1.4"
        (Do you need to predeclare v?)
Number found where operator expected at Bio/SimpleAnalysisI.pm line 96, near "1.4 2003"
        (Missing operator before 2003?)
Number found where operator expected at Bio/SimpleAnalysisI.pm line 96, near "04 08"
        (Missing operator before 08?)
Bareword found where operator expected at Bio/SimpleAnalysisI.pm line 96, near "35 heikki"
        (Missing operator before heikki?)
syntax error at Bio/SimpleAnalysisI.pm line 96, near "v 1.4"
Illegal octal digit '8' at Bio/SimpleAnalysisI.pm line 96, at end of line
BEGIN not safe after errors--compilation aborted at Bio/SimpleAnalysisI.pm line 97.
Compilation failed in require at Bio/Tools/Analysis/SimpleAnalysisBase.pm line 77.
BEGIN failed--compilation aborted at Bio/Tools/Analysis/SimpleAnalysisBase.pm line 77.
Compilation failed in require at Bio/Tools/Analysis/Protein/Scansite.pm line 138.
BEGIN failed--compilation aborted at Bio/Tools/Analysis/Protein/Scansite.pm line 138.
Compilation failed in require at t/Scansite.t line 50.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Aaron J. Mackey
Sent: Thursday, November 11, 2004 9:33 AM
To: Bioperl
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download

And you can now get them via http, directly from the bioperl.org/DIST/
directory:

http://bioperl.org/DIST/bioperl-1.5.0-RC1.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.0-RC1.tar.gz
http://bioperl.org/DIST/bioperl-ext-1.5.0-RC1.tar.gz

-Aaron

On Nov 11, 2004, at 9:16 AM, Aaron J. Mackey wrote:

>
> The somewhat-anxiously anticipated 1.5.0 release candidates are
> starting to roll off the shelves. Since I don't seem to have access
> to the download directory at bioperl.org, they are currently available
> via FTP (active, not passive) at:
>
> ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-release-1.5.0-RC1.tar.gz
> ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-run-release-1.5.0-RC1.tar.gz
> ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-ext-release-1.5.0-RC1.tar.gz
>
> Regarding -run and -ext versioning, I've bumped them all to be in sync
> with bioperl-live; I remember there was some discussion of this long
> ago, but please remind me if there's some critical reason for this not
> to occur.
>
> Thanks, enjoy, and gimme feedback (I already know that Unflattener2.t
> will fail 2 tests; these have already been fixed, and will be in the
> next RC, and/or final)
>
> -Aaron
>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania email: amackey@pcbi.upenn.edu
> 415 S. University Avenue office: 215-898-1205
> Philadelphia, PA 19104-6017 fax: 215-746-6697
>
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania email: amackey@pcbi.upenn.edu
415 S.
University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697

From Robeson at colorado.edu Fri Nov 12 13:45:34 2004
From: Robeson at colorado.edu (Michael S. Robeson II)
Date: Sun Nov 14 21:58:04 2004
Subject: [Bioperl-l] gap position script
Message-ID: <09AFB9B9-34DB-11D9-93AE-000393539070@colorado.edu>

Hi all,

Below is the full code that will tally up the gap sizes and positions for
an aligned data set in FASTA format. More descriptions are in the script
below. As always, feel free to send me comments!

-Cheers!
-Mike

--------
http://homepage.mac.com/popgen23

<-- begin code -->

#!/usr/bin/perl
#
# By Mike Robeson November 12, 2004. This script could not
# have been done w/o the help of the wonderful folks at
# bioperl.org and learnperl.org.
#
# This script will take an aligned sequence file in FASTA
# format and tally up the sizes and positions of variously
# sized gaps. The output is in the form of a matrix that
# denotes the presence or absence of each gap type by using
# 1s (present) and 0s (absent). File is output as a csv file
# for easy viewing in a spread sheet (e.g. indel_list_$file.csv).
#
# Example output:
#
# Species:,mouse,rat,human
# 1_20,1,0,1
# 2_34,1,0,1
# 3_65,0,1,0
#
# Note: The first row denotes the species for each column of
# data below it. After that, the info before the first ',' is
# the gap size followed by its position (i.e. 1_20 means a
# 1 bp gap at the 20th position in the sequence), followed
# by its presence or absence in each species.
#
# For an easier reading output file just change some of the
# ',' to '\t' in the script.
#

use warnings;
use strict;

my (@animals, $animal, %gap);

##########################
# Request file for input
##########################

print "Enter in the name of the DNA sequence file:\n";
chomp (my $dna_seq = <STDIN>);
open(DNA_SEQ, $dna_seq)
    or die "Can't open file: $!\n";

# Strip the file extension from the input name.
$dna_seq =~ s/\..+//g;

# The outfile name is made automatically by
# adding "indel_list_" to the front of
# the name of your input file.
open(OUTFILE, ">indel_list_" . $dna_seq . ".csv")
    or die "Can't open outfile: $!\n";

##########################
# Read Data into Hash
##########################

# Read one FASTA record at a time ('>' separates records).
$/ = '>';

while (<DNA_SEQ>) {
    chomp;
    # The first line of each record is the species name.
    next unless s/^\s*(.+)//;
    $animal = $1;
    push @animals, $animal;
    $_ =~ s/\s//g;
    # Record every run of '-' as <length>_<1-based start position>.
    while (/(-+)/g) {
        my $gap_length = length $1;
        my $position = pos() - $gap_length + 1;
        $gap{$gap_length . '_' . $position}{$animal} = 1;
    }
}

########################################
# Print out comma delimited data matrix
########################################

print OUTFILE "Species:";
foreach my $animal (@animals) {
    print "Processing: $animal\n";
    print OUTFILE ",$animal";
}
print OUTFILE "\n";

foreach my $state (sort keys %gap) {
    print OUTFILE "$state:";
    foreach my $animal (@animals) {
        if (defined $gap{$state}{$animal}) {
            print OUTFILE ",1";
        } else {
            print OUTFILE ",0";
        }
    }
    print OUTFILE "\n";
}

print "\n***** Finished *****\n\n";

<-- end code -->

From nathanhaigh at ukonline.co.uk Mon Nov 15 02:44:49 2004
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Mon Nov 15 02:43:13 2004
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download
In-Reply-To:
Message-ID:

On WinXP I see these failed tests:

Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t\Biblio.t           255 65280    24   22  91.67%  2 4-24
t\Biblio_biofetch.t  255 65280    11   11 100.00%  1-11
t\Biblio_eutils.t    255 65280     5    5 100.00%  1-5
t\DBCUTG.t           255 65280    24    0   0.00%  ??
t\Domcut.t           255 65280    25    0   0.00%  ??
t\ELM.t              255 65280    14    0   0.00%  ??
t\GOR4.t             255 65280    13    0   0.00%  ??
t\HNN.t              255 65280    13    0   0.00%  ??
t\MeSH.t             255 65280    26    0   0.00%  ??
t\MitoProt.t         255 65280     8    0   0.00%  ??
t\NetPhos.t          255 65280    14    0   0.00%  ??
t\OntologyStore.t                  6    4  66.67%  3-6
t\Scansite.t         255 65280    12    0   0.00%  ??
t\SeqFeature.t                   192    1   0.52%  74
t\Sopma.t            255 65280    15    0   0.00%  ??
t\Taxonomy.t                       8    3  37.50%  2-3 8
t\Unflattener2.t                  11    2  18.18%  7 10
16 subtests skipped.
Failed 17/193 test scripts, 91.19% okay. 48/8889 subtests failed, 99.46% okay.

Similar to those errors already mentioned by Brian.

Nathan

From mlemieux at bioinfo.ca Tue Nov 16 01:38:10 2004
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Tue Nov 16 01:36:37 2004
Subject: [Bioperl-l] remote blast put HITLIST_SIZE = 1000
Message-ID: <15304290-379A-11D9-8313-000A95B139D2@bioinfo.ca>

Karolina,

You also have to add the following key/value pair to %HEADER in
RemoteBlast.pm:
    HITLIST_SIZE => 1000

or add the following line to Perl.pm:
    $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 1000;

See http://www.ncbi.nlm.nih.gov/BLAST/Doc/node3.html for a list of
BLAST Put and Get command options.

Regards,
Madeleine

> Hi,
>
> I would like to change the number of descriptions I get in my blast
> report from
> default 100 to 1000. I tried with adding a header
>
> $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = 1000;
>
> and I even changed the value in RemoteBlast.pm module itself and
> still no change.
>
> -Karolina

From mlemieux at bioinfo.ca Tue Nov 16 01:44:05 2004
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Tue Nov 16 01:42:19 2004
Subject: [Bioperl-l] remote blast DESCRIPTIONS
Message-ID:

Karolina,

I also believe that the key/value pair
    DESCRIPTIONS => 1000
goes more appropriately in the %RETRIEVALHEADER hash, not %HEADER.

So in Perl.pm you would write
    $Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'DESCRIPTIONS'} = 1000;

Cheers,
Madeleine

> Karolina,
>
> You also have to add the following key/value pair to %HEADER in
> RemoteBlast.pm:
> HITLIST_SIZE => 1000
>
> or add the following line to Perl.pm:
> $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 1000;
>
> See http://www.ncbi.nlm.nih.gov/BLAST/Doc/node3.html for a list of
> BLAST Put and Get command options.
>
> Regards,
> Madeleine
>
>> Hi,
>>
>> I would like to change the number of descriptions I get in my blast
>> report from
>> default 100 to 1000.
I tried with adding a header >> >> $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = 1000; >> >> and I even changed the value in RemoteBlast.pm module itself and >> still no change. >> >> -Karolina From Alicia.Amadoz at uv.es Tue Nov 16 07:03:05 2004 From: Alicia.Amadoz at uv.es (Alicia Amadoz) Date: Tue Nov 16 07:01:25 2004 Subject: [Bioperl-l] problem installing bioperl-ext Message-ID: <7246683199amadoz@uv.es> Hi all, I finally managed to install Bioperl in Fedora Core 2 with Perl 5.8.5. It runs ok and passes the test of James Tisdall's book. But when I try to install (with 'cpan>force install B/BI/BIRNEY/bioperl-ext-1.4.tar.gz) I have some error: CPAN.pm: Going to build B/BI/BIRNEY/bioperl-ext-1.4.tar.gz Checking if your kit is complete... Looks good Writing Makefile for Bio::Ext::Align ERROR from evaluation of /root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: Can't locate Inline/MakeMaker.pm in @INC (@INC contains: /home/amadoz/.cpan/build/bioperl-1.4/Bio/Perl.pm /home/amadoz/.cpan/build/bioperl-1.4/blib/lib/Bio/Perl.pm /usr/lib/perl5/5.8.5/i686-linux /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/i686-linux /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /root/.cpan/build/bioperl-ext-1.4 .) at ./Makefile.PL line 1. BEGIN failed--compilation aborted at ./Makefile.PL line 1. Running make test Make had some problems, maybe interrupted? Won't test Running make install Make had some problems, maybe interrupted? Won't install I have been searching on the mailing-list archive but i couldn't find anything useful. Thanks for your help. 
Regards,
Alicia

************************************************
Alicia Amadoz
Evolutionary Genetics Unit
Cavanilles Institute for Biodiversity and Evolutionary Biology
University of Valencia
Apartado Oficial 22085
E-46071 Valencia
SPAIN
Phone: (+34) 96 354 3687
FAX: (+34) 96 354 3670
e-mail: alicia.amadoz@uv.es
http://www.uv.es/~amadoz
***********************************************
NOTE! For shipments by EXPRESS COURIER use "Instituto Cavanilles de
Biodiversidad y Biología Evolutiva, Polígono de la Coma s/n, 46980 Paterna
(Valencia), Spain" instead of P.O. Box no. and Post Code/City above.

From rousse at ccr.jussieu.fr Tue Nov 16 07:17:36 2004
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Tue Nov 16 07:16:12 2004
Subject: [Bioperl-l] problem installing bioperl-ext
In-Reply-To: <7246683199amadoz@uv.es>
References: <7246683199amadoz@uv.es>
Message-ID: <4199EFE0.70604@ccr.jussieu.fr>

Alicia Amadoz wrote:
> Hi all,
>
> I finally managed to install Bioperl in Fedora Core 2 with Perl 5.8.5.
> It runs ok and passes the test of James Tisdall's book. But when I try
> to install (with 'cpan>force install B/BI/BIRNEY/bioperl-ext-1.4.tar.gz)
> I have some error:
>
> CPAN.pm: Going to build B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
>
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Bio::Ext::Align
> ERROR from evaluation of
> /root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: Can't
> locate Inline/MakeMaker.pm in @INC (@INC contains:
> /home/amadoz/.cpan/build/bioperl-1.4/Bio/Perl.pm
> /home/amadoz/.cpan/build/bioperl-1.4/blib/lib/Bio/Perl.pm
> /usr/lib/perl5/5.8.5/i686-linux /usr/lib/perl5/5.8.5
> /usr/lib/perl5/site_perl/5.8.5/i686-linux /usr/lib/perl5/site_perl/5.8.5
> /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2
> /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0
> /usr/lib/perl5/site_perl /root/.cpan/build/bioperl-ext-1.4 .) at
> ./Makefile.PL line 1.
> BEGIN failed--compilation aborted at ./Makefile.PL line 1.
> Running make test
> Make had some problems, maybe interrupted? Won't test
> Running make install
> Make had some problems, maybe interrupted? Won't install
>
>
> I have been searching on the mailing-list archive but i couldn't find
> anything useful. Thanks for your help.

Here is your error:
Can't locate Inline/MakeMaker.pm in @INC

Search your distribution for the package that provides
Inline/MakeMaker.pm.

Alternatively, you could use the packages provided by www.biolinux.org if
you're not familiar with manual installation of Perl modules.
--
If you really need an officer in a hurry, take a nap
	-- Murphy's Military Laws n°5

From Ying.Sun at ebc.uu.se Tue Nov 16 09:41:46 2004
From: Ying.Sun at ebc.uu.se (Ying Sun)
Date: Tue Nov 16 09:55:12 2004
Subject: [Bioperl-l] (no subject)
Message-ID: <1100616106.419a11aa94650@webmail.anst.uu.se>

Dear all,

I am a beginner with the Bio::DB::GFF module, so I have some questions
about it. I used bp_genbank2gff.pl to convert a GenBank file to a GFF
file, and then I tried to load the GFF file into a database. My script is
as follows:

#!/usr/local/bin/perl

use Bio::DB::GFF;

$file="/db/user/external/sying/project_test/NC002489_2.gff";

my $db=Bio::DB::GFF->new(-adaptor=>'dbi::mysqlopt',
                         -dsn=>'dbi:mysql:ying');

$db->initialize(-erase=>1);
$db->load_gff($file);

First I created my database, ying, and then I tried to load the GFF file
into it, but it doesn't work. So I wonder whether something is wrong with
my script. I hope someone can help me.

Thanks a lot!

From amackey at pcbi.upenn.edu Tue Nov 16 12:37:16 2004
From: amackey at pcbi.upenn.edu (Aaron J.
Mackey)
Date: Tue Nov 16 12:35:30 2004
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download
In-Reply-To:
References:
Message-ID: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu>

Whoever thought this was cute was wrong:

% grep -r 'q\$Id' live ext run
live/Bio/AnalysisI.pm: $Revision = q$Id: AnalysisI.pm,v 1.5 2003/06/04 08:36:35 heikki Exp $;
live/Bio/Biblio/IO/medline2ref.pm: $Revision = q$Id: medline2ref.pm,v 1.11 2003/06/04 08:36:36 heikki Exp $;
live/Bio/Biblio/IO/medlinexml.pm: $Revision = q$Id: medlinexml.pm,v 1.6 2003/06/04 08:36:36 heikki Exp $;
live/Bio/Biblio/IO/pubmed2ref.pm: $Revision = q$Id: pubmed2ref.pm,v 1.3 2003/06/04 08:36:36 heikki Exp $;
live/Bio/Biblio/IO/pubmedxml.pm: $Revision = q$Id: pubmedxml.pm,v 1.5 2003/06/04 08:36:36 heikki Exp $;
live/Bio/Biblio.pm: $Revision = q$Id: Biblio.pm,v 1.11 2004/07/13 11:44:37 bosborne Exp $;
live/Bio/DB/Biblio/biofetch.pm: $Revision = q$Id: biofetch.pm,v 1.6 2003/06/04 08:36:37 heikki Exp $;
live/Bio/DB/Biblio/soap.pm: $Revision = q$Id: soap.pm,v 1.6 2003/06/04 08:36:37 heikki Exp $;
live/Bio/DB/BiblioI.pm: $Revision = q$Id: BiblioI.pm,v 1.6 2003/06/04 08:36:37 heikki Exp $;
live/Bio/Factory/AnalysisI.pm: $Revision = q$Id: AnalysisI.pm,v 1.2 2003/06/04 08:36:39 heikki Exp $;
live/Bio/SimpleAnalysisI.pm: $Revision = q$Id: SimpleAnalysisI.pm,v 1.4 2003/06/04 08:36:35 heikki Exp $;
live/Bio/WebAgent.pm: $Revision = q$Id: WebAgent.pm,v 1.4 2003/06/04 08:36:35 heikki Exp $;
run/Bio/Tools/Run/Analysis/soap.pm: $Revision = q$Id: soap.pm,v 1.8 2003/06/04 08:48:26 heikki Exp $;
run/Bio/Tools/Run/Analysis.pm: $Revision = q$Id: Analysis.pm,v 1.6 2003/06/04 08:48:26 heikki Exp $;
run/Bio/Tools/Run/AnalysisFactory/soap.pm: $Revision = q$Id: soap.pm,v 1.3 2003/06/04 08:48:26 heikki Exp $;
run/Bio/Tools/Run/AnalysisFactory.pm: $Revision = q$Id: AnalysisFactory.pm,v 1.6 2003/06/04 08:48:26 heikki Exp $;
run/blib/lib/Bio/Tools/Run/Analysis/soap.pm: $Revision = q$Id: soap.pm,v 1.8 2003/06/04 08:48:26 heikki Exp $;
run/blib/lib/Bio/Tools/Run/Analysis.pm: $Revision = q$Id: Analysis.pm,v 1.6 2003/06/04 08:48:26 heikki Exp $;
run/blib/lib/Bio/Tools/Run/AnalysisFactory/soap.pm: $Revision = q$Id: soap.pm,v 1.3 2003/06/04 08:48:26 heikki Exp $;
run/blib/lib/Bio/Tools/Run/AnalysisFactory.pm: $Revision = q$Id: AnalysisFactory.pm,v 1.6 2003/06/04 08:48:26 heikki Exp $;
run/scripts/panalysis.PLS: $Revision = q$Id: panalysis.PLS,v 1.5 2003/11/25 18:09:56 bosborne Exp $;
run/scripts/papplmaker.PLS: $Revision = q$Id: papplmaker.PLS,v 1.3 2003/05/30 15:43:22 jason Exp $;

Can someone please replace these q$Id ... $; with q[$Id .. $] or
something similar?

Thanks,

-Aaron

On Nov 13, 2004, at 10:06 PM, Brian Osborne wrote:

> Aaron,
>
> On Cygwin I see a whole slew of new failed tests:
>
> Failed Test         Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/Biblio.t           255 65280    24   38 158.33%  4-24
> t/Biblio_biofetch.t  255 65280    11    0   0.00%  ??
> t/Biblio_eutils.t    255 65280     5    0   0.00%  ??
> t/DBCUTG.t             2   512    24    0   0.00%  ??
> t/Domcut.t           255 65280    25    0   0.00%  ??
> t/ELM.t              255 65280    14    0   0.00%  ??
> t/GOR4.t             255 65280    13    0   0.00%  ??
> t/HNN.t              255 65280    13    0   0.00%  ??
> t/MeSH.t             255 65280    26    0   0.00%  ??
> t/MitoProt.t         255 65280     8    0   0.00%  ??
> t/NetPhos.t          255 65280    14    0   0.00%  ??
> t/Scansite.t         255 65280    12    0   0.00%  ??
> t/Sopma.t            255 65280    15    0   0.00%  ??
> t/Unflattener2.t                  11    2  18.18%  7 10
> 24 subtests skipped.
> make: *** [test_dynamic] Error 14
>
>
> Most of them seem to be due to the same problem, blocks like this:
>
> 95 BEGIN {
> 96     $Revision = qSimpleAnalysisI.pm,v 1.4 2003/06/04 08:36:35 heikki Exp;
> 97 }
>
>
> 129 ~/tmp/bioperl-1.5.0-RC1>perl -I. -w t/Scansite.t
> 1..12
> Number found where operator expected at Bio/SimpleAnalysisI.pm line 96, near "v 1.4"
>         (Do you need to predeclare v?)
> Number found where operator expected at Bio/SimpleAnalysisI.pm line 96, near "1.4 2003"
>         (Missing operator before 2003?)
> Number found where operator expected at Bio/SimpleAnalysisI.pm line > 96, near > "04 > 08" > (Missing operator before 08?) > Bareword found where operator expected at Bio/SimpleAnalysisI.pm line > 96, > near " > 35 heikki" > (Missing operator before heikki?) > syntax error at Bio/SimpleAnalysisI.pm line 96, near "v 1.4" > Illegal octal digit '8' at Bio/SimpleAnalysisI.pm line 96, at end of > line > BEGIN not safe after errors--compilation aborted at > Bio/SimpleAnalysisI.pm > line > 97. > Compilation failed in require at > Bio/Tools/Analysis/SimpleAnalysisBase.pm > line 7 > 7. > BEGIN failed--compilation aborted at > Bio/Tools/Analysis/SimpleAnalysisBase.pm li > ne 77. > Compilation failed in require at > Bio/Tools/Analysis/Protein/Scansite.pm line > 138 > . > BEGIN failed--compilation aborted at > Bio/Tools/Analysis/Protein/Scansite.pm > line > 138. > Compilation failed in require at t/Scansite.t line 50. > > Brian O. > > > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Aaron J. > Mackey > Sent: Thursday, November 11, 2004 9:33 AM > To: Bioperl > Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download > > > And you can now get them via http, directly from the bioperl.org/DIST/ > directory: > > http://bioperl.org/DIST/bioperl-1.5.0-RC1.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.0-RC1.tar.gz > http://bioperl.org/DIST/bioperl-ext-1.5.0-RC1.tar.gz > > -Aaron > > On Nov 11, 2004, at 9:16 AM, Aaron J. Mackey wrote: > >> >> The somewhat-anxiously anticipated 1.5.0 release candidates are >> starting to roll off the shelves. 
Since I don't seem to have access >> to the download directory at bioperl.org, they are currently available >> via FTP (active, not passive) at: >> >> >> ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-release-1.5.0- >> RC1.tar.gz >> >> ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-run-release-1.5.0- >> RC1.tar.gz >> >> ftp://roos-compbio.bio.upenn.edu/bioperl/bioperl-ext-release-1.5.0- >> RC1.tar.gz >> >> Regarding -run and -ext versioning, I've bumped them all to be in sync >> with bioperl-live; I remember there was some discussion of this long >> ago, but please remind me if there's some critical reason for this not >> to occur. >> >> Thanks, enjoy, and gimme feedback (I already know that Unflattener2.t >> will fail 2 tests; these have already been fixed, and will be in the >> next RC, and/or final) >> >> -Aaron >> >> -- >> Aaron J. Mackey, Ph.D. >> Dept. of Biology, Goddard 212 >> University of Pennsylvania email: amackey@pcbi.upenn.edu >> 415 S. University Avenue office: 215-898-1205 >> Philadelphia, PA 19104-6017 fax: 215-746-6697 >> >> > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697

From laurichj at bioinfo.ucr.edu Tue Nov 16 13:22:05 2004
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Tue Nov 16 13:20:24 2004
Subject: [Bioperl-l] TIGR parser for SeqIO
In-Reply-To: <62f15ed.9e428375.81b3c00@punts5.cc.uga.edu>
References: <62f15ed.9e428375.81b3c00@punts5.cc.uga.edu>
Message-ID: <20041116182205.GA1467@bioinfo.ucr.edu>

On Fri 11/12/04 16:05, Ed Robinson wrote:
> I am trying to get the tigr parser to work on our files. My
> script is otherwise ok, because when I call a genbank file
> with the genbank parser, I am able to load data. I have
> downloaded the most recent tigr.pm from the CVS. When I run
> it, I get this:
...

TIGR hasn't actually changed the format, but they are starting to use a
tag that wasn't used previously and that I had overlooked:
MODEL_EVIDENCE. As I get the time, I will add support and commit to CVS.
If you need this, check CVS; if it's not there yet, smack me and I might
hurry up.

--
------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
| 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184  |
|----------------------------------------------------|

From barry.moore at genetics.utah.edu Tue Nov 16 12:10:05 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Nov 16 15:06:29 2004
Subject: [Bioperl-l] problem installing bioperl-ext
In-Reply-To: <7246683199amadoz@uv.es>
References: <7246683199amadoz@uv.es>
Message-ID: <419A346D.7040109@genetics.utah.edu>

Alicia-

You'll need both the Perl Inline package and the Staden package
installed to do a complete install of bioperl-ext. Do you really want
Staden (tools typically used by sequencing centers)? If not, don't use
the Makefile.PL at the base of bioperl-ext, but rather the Makefile.PL
at the level you want to install (e.g. if you only want the alignment
stuff, use the Makefile.PL at that level, found here:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-ext/Bio/Ext/Align/?cvsroot=bioperl).

Barry

Alicia Amadoz wrote:

>Hi all,
>
>I finally managed to install Bioperl in Fedora Core 2 with Perl 5.8.5.
>It runs ok and passes the test of James Tisdall's book. But when I try
>to install (with 'cpan>force install B/BI/BIRNEY/bioperl-ext-1.4.tar.gz')
>I get an error:
>
> CPAN.pm: Going to build B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
>
>Checking if your kit is complete...
>Looks good
>Writing Makefile for Bio::Ext::Align
>ERROR from evaluation of
>/root/.cpan/build/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: Can't
>locate Inline/MakeMaker.pm in @INC (@INC contains:
>/home/amadoz/.cpan/build/bioperl-1.4/Bio/Perl.pm
>/home/amadoz/.cpan/build/bioperl-1.4/blib/lib/Bio/Perl.pm
>/usr/lib/perl5/5.8.5/i686-linux /usr/lib/perl5/5.8.5
>/usr/lib/perl5/site_perl/5.8.5/i686-linux /usr/lib/perl5/site_perl/5.8.5
>/usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2
>/usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0
>/usr/lib/perl5/site_perl /root/.cpan/build/bioperl-ext-1.4 .) at
>./Makefile.PL line 1.
>BEGIN failed--compilation aborted at ./Makefile.PL line 1.
>Running make test
> Make had some problems, maybe interrupted? Won't test
>Running make install
> Make had some problems, maybe interrupted? Won't install
>
>
>I have been searching on the mailing-list archive but I couldn't find
>anything useful. Thanks for your help.
>
>Regards,
>Alicia
>
>************************************************
>Alicia Amadoz
>Evolutionary Genetics Unit
>Cavanilles Institute for Biodiversity and Evolutionary
>Biology
>University of Valencia
>Apartado Oficial 22085
>E-46071 Valencia SPAIN
>Phone: (+34) 96 354 3687
>FAX: (+34) 96 354 3670
>e-mail: alicia.amadoz@uv.es
>http://www.uv.es/~amadoz
>***********************************************
>NOTE! For shipments by EXPRESS COURIER use "Instituto
>Cavanilles de Biodiversidad y Biología Evolutiva,
>Polígono de la Coma s/n, 46980 Paterna (Valencia),
>Spain" instead of P.O. Box no. and Post Code/City above.
>

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From hlapp at gnf.org Tue Nov 16 15:43:22 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Nov 16 15:41:29 2004
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download
In-Reply-To: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu>
References: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu>
Message-ID: <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org>

Aaron,

are you sure that you cannot use $ as the quote character?

In Brian's message, the '$Revision' line was missing the closing $
which may have been the only error.

-hilmar

On Nov 16, 2004, at 9:37 AM, Aaron J.
Mackey wrote:

> ...
> can someone please replace these q$Id ... $; with q[$Id .. $] or
> something similar?
> ...
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania email: amackey@pcbi.upenn.edu
> 415 S.
University Avenue office: 215-898-1205
> Philadelphia, PA 19104-6017 fax: 215-746-6697
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From jason.stajich at duke.edu Tue Nov 16 15:52:54 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Nov 16 15:50:57 2004
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download
In-Reply-To: <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org>
References: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu> <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org>
Message-ID: <7CBE2C5A-3811-11D9-A6CB-000393C44276@duke.edu>

The cvs export command is dropping the $ ... $ pair that bounds the CVS
keyword (the '$Id: ... $' text), not the $ of the $Revision variable.
Without that pair the q operator has no delimiter, so you need to
provide one. I've fixed it by putting square brackets [] around the
Revision declaration.

bioperl-run needs some more testing and fixing before the release. It
shouldn't take too long, but we need people to run the tests and fix the
warnings. Some tests seem to be lacking code which handles the cases
when the application being used is not present.

-jason

On Nov 16, 2004, at 3:43 PM, Hilmar Lapp wrote:

> Aaron,
>
> are you sure that you cannot use $ as the quote character?
>
> In Brian's message, the '$Revision' line was missing the closing $
> which may have been the only error.
>
> -hilmar
>
> On Nov 16, 2004, at 9:37 AM, Aaron J.
Mackey wrote:
>
>> ...
>> can someone please replace these q$Id ... $; with q[$Id .. $] or
>> something similar?
>> ...
>> --
>> Aaron J. Mackey, Ph.D.
>> Dept. of Biology, Goddard 212
>> University of Pennsylvania email: amackey@pcbi.upenn.edu
>> 415 S.
University Avenue office: 215-898-1205
>> Philadelphia, PA 19104-6017 fax: 215-746-6697
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From amackey at pcbi.upenn.edu Tue Nov 16 15:55:18 2004
From: amackey at pcbi.upenn.edu (amackey@pcbi.upenn.edu)
Date: Tue Nov 16 15:53:19 2004
Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download
In-Reply-To: <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org>
References: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu> <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org>
Message-ID: <1100638518.419a693680bfe@webmail.pcbi.upenn.edu>

The issue is that on CVS export I'm using -kv to strip out the CVS
keywords (to be friendly to others who may wish to import the code into
their own CVS repositories). So "reusing" the CVS keyword-associated $'s
for quoted delimiters is cute, but (as evidenced below) causes problems.

-Aaron

Quoting Hilmar Lapp:

> Aaron,
>
> are you sure that you cannot use $ as the quote character?
>
> In Brian's message, the '$Revision' line was missing the closing $
> which may have been the only error.
>
> -hilmar
>
> On Nov 16, 2004, at 9:37 AM, Aaron J.
Mackey wrote:
> >
> > ...
> > can someone please replace these q$Id ... $; with q[$Id .. $] or
> > something similar?
> > ...
> >> --
> >> Aaron J. Mackey, Ph.D.
> >> Dept. of Biology, Goddard 212
> >> University of Pennsylvania email: amackey@pcbi.upenn.edu
> >> 415 S.
University Avenue office: 215-898-1205 > >> Philadelphia, PA 19104-6017 fax: 215-746-6697 > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > -- > > Aaron J. Mackey, Ph.D. > > Dept. of Biology, Goddard 212 > > University of Pennsylvania email: amackey@pcbi.upenn.edu > > 415 S. University Avenue office: 215-898-1205 > > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > From allenday at ucla.edu Tue Nov 16 17:20:01 2004 From: allenday at ucla.edu (Allen Day) Date: Tue Nov 16 17:18:02 2004 Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download In-Reply-To: <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org> References: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu> <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org> Message-ID: this is speculative, but maybe it is a perl version/platform-specific problem. more importantly though, why would you want to use $ as a quote delimiter when there are other, less potentially confusing, options available? i agree with aaron that is is not good. at the very least it is bad style. -allen On Tue, 16 Nov 2004, Hilmar Lapp wrote: > Aaron, > > are you sure that you cannot use $ as the quote character? > > In Brian's message, the '$Revision' line was missing the closing $ > which may have been the only error. 
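For readers following the delimiter question, here is a minimal sketch of why q$...$ is legal Perl yet stops compiling once the CVS keyword markers are stripped, as in the line Brian reported. The version strings are copied from the thread; the compiles() helper and the variable names are illustrative assumptions, not BioPerl code.

```perl
use strict;
use warnings;

# '$' is a legal q// delimiter, so this compiles; the string is the text
# between the two dollar signs.
my $dollar  = q$Id: SimpleAnalysisI.pm,v 1.4 2003/06/04 08:36:35 heikki Exp $;

# With brackets as delimiters the whole keyword, dollars included, is data.
my $bracket = q[$Id: SimpleAnalysisI.pm,v 1.4 2003/06/04 08:36:35 heikki Exp $];

# What remains when the '$Id: ... $' markers are stripped and only the value
# is left (the form shown in Brian's failing block).
my $value = 'SimpleAnalysisI.pm,v 1.4 2003/06/04 08:36:35 heikki Exp';

# Hypothetical helper: true if the given source text compiles and runs.
sub compiles {
    my ($code) = @_;
    local $@;
    local $SIG{__WARN__} = sub { };    # keep parser diagnostics quiet
    return eval "$code; 1" ? 1 : 0;
}

# The dollar form loses its delimiters; the bracket form keeps them.
my $dollar_ok  = compiles("my \$Revision = q$value");
my $bracket_ok = compiles("my \$Revision = q[$value]");

print "dollar form:  '$dollar'\n";
print "bracket form: '$bracket'\n";
printf "stripped, dollar delimiters compile:  %s\n", $dollar_ok  ? 'yes' : 'no';
printf "stripped, bracket delimiters compile: %s\n", $bracket_ok ? 'yes' : 'no';
```

So the unstripped q$...$ line is valid on its own; the trouble only appears when the delimiters double as keyword markers that another tool is allowed to remove.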
> > -hilmar
From hlapp at gnf.org Tue Nov 16 17:37:12 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Nov 16 17:35:17 2004 Subject: [Bioperl-l] Re: 1.5.0-RC1 available for download In-Reply-To: References: <28B10ECC-37F6-11D9-81F0-000A9577009E@pcbi.upenn.edu> <28235647-3810-11D9-AF99-000A95AE92B0@gnf.org> Message-ID: <0EAAC49A-3820-11D9-AF99-000A95AE92B0@gnf.org> On Nov 16, 2004, at 2:20 PM, Allen Day wrote: > more importantly though, why would you want to use $ as a quote > delimiter > when there are other, less potentially confusing, options available? i > agree with aaron that is is not good. at the very least it is bad > style. > I do agree too - I didn't think of the export. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu Tue Nov 16 19:00:18 2004
From: allenday at ucla.edu (Allen Day)
Date: Tue Nov 16 18:58:18 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/FeatureIO gff.pm, 1.16, 1.17
In-Reply-To: <200411161935.iAGJZBDT005226@pub.open-bio.org>
References: <200411161935.iAGJZBDT005226@pub.open-bio.org>
Message-ID:

there should be a next_sequence method. i wrote this into Bio::Tools::GFF,
we should pretty much be able to just copy/paste it over.

-allen

On Tue, 16 Nov 2004, Scott Cain wrote:

> Update of /home/repository/bioperl/bioperl-live/Bio/FeatureIO
> In directory pub.open-bio.org:/tmp/cvs-serv5204
>
> Modified Files:
>     gff.pm
> Log Message:
> added stuff to support fasta and target processing. The question remains what to
> do with this data once you have it--particularly the fasta data. Should there be
> (or is there) a next_sequence() method?
>
>
> Index: gff.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO/gff.pm,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -C2 -d -r1.16 -r1.17
> *** gff.pm 16 Nov 2004 16:22:53 -0000 1.16
> --- gff.pm 16 Nov 2004 19:35:09 -0000 1.17
> ***************
> *** 211,215 ****
>     return undef unless $gff_string;
>
> !   if($gff_string =~ /^##/){
>       $self->_handle_directive($gff_string);
>       return $self->next_feature();
> --- 211,215 ----
>     return undef unless $gff_string;
>
> !   if($gff_string =~ /^##/ or $gff_string =~ /^>/){
>       $self->_handle_directive($gff_string);
>       return $self->next_feature();
> ***************
> *** 248,255 ****
>     }
>
> !   elsif($directive eq 'FASTA'){
>       $self->warn("'##$directive' directive handling not yet implemented");
> !     while($self->_readline()){
> !       #suck up the rest of the file
>       }
>     }
> --- 248,266 ----
>     }
>
> !   elsif($directive eq 'FASTA' or $directive =~ /^>(.+)/){
> !     my $fasta_directive_id = $1 if $1;
>       $self->warn("'##$directive' directive handling not yet implemented");
> !     local $/ = '>';
> !     while(my $read = $self->_readline()){
> !       chomp $read;
> !       my $fasta_id;
> !       my @seqarray = split /\n/, $read;
> !       if ($fasta_directive_id) {
> !         $fasta_id = $fasta_directive_id;
> !         $fasta_directive_id = '';
> !       } else {
> !         $fasta_id = shift @seqarray;
> !       }
> !       my $seq = join '', @seqarray;
>       }
>     }
> ***************
> *** 357,363 ****
>     );
>
> !   if ($strand eq '+') {
>       $strand = 1;
> !   } elsif ($strand eq '-') {
>       $strand = -1;
>     }
> --- 368,374 ----
>     );
>
> !   if ($strand && $strand eq '+') {
>       $strand = 1;
> !   } elsif ($strand && $strand eq '-') {
>       $strand = -1;
>     }
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>

From michael.watson at bbsrc.ac.uk Wed Nov 17 04:19:24 2004
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Nov 17 04:20:20 2004
Subject: [Bioperl-l] Divide and Assemble
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89856@iahce2knas1.iah.bbsrc.reserved>

Hi

I want to take a sequence, divide it up into smaller pieces, run various
algorithms on it that create sequence features on those smaller pieces,
and then re-assemble the smaller pieces (with the newly created features
on them) into the original, large sequence.

I'm going to try and do this in BioPerl. My question is how do I
transfer a sequence feature, and all its co-ordinates, from one sequence
to another? Is it as simple as modifying the start and end and just
using add_SeqFeature()?
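On the coordinate part of the question, a minimal sketch of the arithmetic involved, assuming 1-based inclusive coordinates and pieces cut from the forward strand of the parent. It uses plain hashes rather than BioPerl objects; the field names and the to_parent_coords() helper are made up for illustration.

```perl
use strict;
use warnings;

# Map a feature located on a piece back onto the parent sequence.
# $piece_start is the 1-based parent position of the piece's first base.
sub to_parent_coords {
    my ($feature, $piece_start) = @_;
    my $offset = $piece_start - 1;
    my %copy   = %$feature;          # leave the original feature untouched
    $copy{start} += $offset;
    $copy{end}   += $offset;
    return \%copy;
}

# A parent cut into 1000 bp pieces; the second piece starts at base 1001.
my $piece_start = 1001;
my @found_on_piece = (
    { tag => 'exon', start => 10,  end => 120, strand =>  1 },
    { tag => 'exon', start => 300, end => 450, strand => -1 },
);

my @on_parent = map { to_parent_coords($_, $piece_start) } @found_on_piece;

printf "%s %d..%d (strand %d)\n", @{$_}{qw(tag start end strand)}
    for @on_parent;
```

With BioPerl feature objects the same shift would presumably be applied through the start() and end() accessors before calling add_SeqFeature() on the full-length sequence; whether sub-features or split locations need the same treatment depends on the feature class in use.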
Mick

From michael.watson at bbsrc.ac.uk Wed Nov 17 04:28:40 2004
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Nov 17 04:27:24 2004
Subject: [Bioperl-l] Subseq that transfers features too
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89857@iahce2knas1.iah.bbsrc.reserved>

Hi

A quick question - I know trunc() will get me a sub sequence object
without transferring features across, but is there a function that gets
me a sub sequence that DOES transfer them?

Thanks

Mick

From pvh at egenetics.com Wed Nov 17 07:24:15 2004
From: pvh at egenetics.com (Peter van Heusden)
Date: Wed Nov 17 07:22:41 2004
Subject: [Bioperl-l] Kazusa Codon Table DB : Bio::DB::DBCUTG
Message-ID: <419B42EF.1010203@egenetics.com>

I noticed that the Kazusa Codon Table database at
http://www.kazusa.or.jp/codon seems to have gone down as of today. Does
anyone know if this is just maintenance or a more permanent state of
affairs?

Peter

From jason.stajich at duke.edu Wed Nov 17 10:12:04 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Nov 17 10:10:52 2004
Subject: [Bioperl-l] Subseq that transfers features too
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E89857@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E95E89857@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <0A577AFF-38AB-11D9-A792-000393C44276@duke.edu>

Off the top of my head, the easiest way to go about this is to load the
data into Bio::DB::GFF; then you can grab slices of the sequence and the
features will have the right coordinates.

Doing this within Bioperl natively will require doing something with the
coordinate mapping, but I don't know if the current implementations
there try to solve this.

-jason

On Nov 17, 2004, at 4:28 AM, michael watson (IAH-C) wrote:

> Hi
>
> A quick question - I know trunc() will get me a sub sequence object
> without transferring features across, but is there a function that gets
> me a sub sequence that DOES transfer them?
>
> Thanks
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From nakao-mitsuteru at aist.go.jp Wed Nov 17 08:36:54 2004
From: nakao-mitsuteru at aist.go.jp (Mitsuteru NAKAO)
Date: Wed Nov 17 14:24:52 2004
Subject: [Bioperl-l] Kazusa Codon Table DB : Bio::DB::DBCUTG
In-Reply-To: <419B42EF.1010203@egenetics.com>
References: <419B42EF.1010203@egenetics.com>
Message-ID: <20041117.223654.226808764.cortel@machina.localizome.net>

Hi Peter,

You can see information about the server maintenance at the top page of
Kazusa DNA Research Institute, .

> *Server Maintenance Information
> It will be unavailable some part of pages during the dates. Apologies for inconvenience.
> Server maintenance schedule due on 17-18 Nov. 2004. (GMT+09:00)

From: Peter van Heusden
Subject: [Bioperl-l] Kazusa Codon Table DB : Bio::DB::DBCUTG
Date: Wed, 17 Nov 2004 14:24:15 +0200

> I noticed that the Kazusa Codon Table database at
> http://www.kazusa.or.jp/codon seems to have gone down as of today. Does
> anyone know if this is just maintainance or a more permanent state of
> affairs?

Best,
-
Mitsuteru Nakao
Sequence Analysis Team
Computational Biology Research Center (CBRC)
National Institute of Advanced Industrial Science and Technology (AIST)

From szhang at compoundtherapeutics.com Wed Nov 17 12:55:16 2004
From: szhang at compoundtherapeutics.com (Shengsheng Zhang)
Date: Wed Nov 17 14:24:55 2004
Subject: [Bioperl-l] bioperl-1.4 make error
Message-ID:

I cannot get bioperl-1.4 installed on my linux machine. After generating
the Makefile and typing:

    make

I get the following error:

    "Makefile:4763: *** multiple target patterns. Stop."

Any clue what is going on?

Thanks!
-SS

From Mikko.Arvas at vtt.fi Thu Nov 18 06:53:47 2004
From: Mikko.Arvas at vtt.fi (Mikko Arvas)
Date: Thu Nov 18 06:51:49 2004
Subject: [Bioperl-l] Bio::SeqIO and bad entries in uniprot and interpro
In-Reply-To: <200411112339.iABNcIKu022873@portal.open-bio.org>
Message-ID: <4.3.2.7.2.20041118085838.00c8c868@vttmail.vtt.fi>

Hi,

I want to get all available Interpro matches for S. cerevisiae and some
other species. So I need to parse Uniprot files to find a set of IDs for
a given species and then get the Interpro matches from them.

But the Uniprot release uniprot_trembl.dat gives an error towards the
end of the file in the next_seq call:

    my $inseq = Bio::SeqIO->new('-file' => '<...', '-format' => 'swiss');
    while (my $seq = $inseq->next_seq) {
        # check species etc. in here
    }

After happily processing a lot of sequences it gives:

    Invalid [] range "6-1" in regex; marked by <-- HERE in m/^Tomato severe leaf curl virus-[Guatemala 96-1 <-- HERE ]$/

The same goes for interpro:

    my $infeat = Bio::SeqIO->new('-file' => '<...', '-format' => 'interpro');
    while (my $feat = $infeat->next_seq) {
        # store features etc. in here
    }

After happily processing a lot of features it gives:

    not well-formed (invalid token) at line 2, column 53, byte 131 at /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm line 187

I guess it's no wonder that such big DBs have errors or are out of sync
with perl modules etc., and I don't mind losing one seq or feature here
or there. The files are rather big, so fixing them manually is a bit
painful. But I need to somehow get most things processed. Is there a way
to skip these bad entries, or would you have some other smart ideas?

I have bioperl 1.4 and the latest Bio::SeqIO (for swiss.pm to work
correctly) from CVS on SuSE 8.1.

Thanks a million for any help!

Cheers,
mikko

Mikko Arvas
VTT Biotechnology
e-mail: mikko.arvas@vtt.fi
tel: +358-(0)9-456 5827
mobile: +358-(0)44-381 0502
fax: +358-(0)9-455 2103
mail: Tietotie 2, Espoo P.O.
Box 1500 FIN-02044 VTT, Finland From michael.watson at bbsrc.ac.uk Thu Nov 18 10:50:56 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Nov 18 10:50:10 2004 Subject: [Bioperl-l] Bio::Tools::Genewise.pm problems Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89894@iahce2knas1.iah.bbsrc.reserved> Hi I'm on Bioperl 1.4 on linux. Are there any particular output options I need to specify when using genewise and the BioPerl parser? I have a genewise output, using default options. When I use the sample code in Bio::Tools::Genewise, nothing is printed out. When I use the sample code with an example output from the wise2 package (basic_genomic.out) I get a error trace: EXCEPTION MSG: Need a start STACK Bio::Tools::Genewise::_get_strand Genewise.pm::116 Is anyone using this parser without any errors? If so, any tips on how to get it to work? Thanks Mick From jason.stajich at duke.edu Thu Nov 18 11:01:23 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Nov 18 10:59:22 2004 Subject: [Bioperl-l] Bio::Tools::Genewise.pm problems In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E89894@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E95E89894@iahce2knas1.iah.bbsrc.reserved> Message-ID: <18874E4D-397B-11D9-A792-000393C44276@duke.edu> You need -genesf. If you want the protein and DNA sequence names parsed out add -para as well. So -genesf -para Bio::SearchIO::wise may be more forgiving, I don't remember - we probably shouldn't be throwing errors with the parser as you saw, but we basically get all the data from the -genesf option so without it I don't think the parser can do much. We don't make any attempt to parse the wise alignment block. You can always look at the tests in t/ to get a sense of how an object is used (there will be a reference to a datafile in the test script usually) -jason On Nov 18, 2004, at 10:50 AM, michael watson (IAH-C) wrote: > Hi > > I'm on Bioperl 1.4 on linux. 
> > Are there any particular output options I need to specify when using > genewise and the BioPerl parser? > > I have a genewise output, using default options. When I use the sample > code in Bio::Tools::Genewise, nothing is printed out. When I use the > sample code with an example output from the wise2 package > (basic_genomic.out) I get a error trace: > > EXCEPTION > MSG: Need a start > STACK Bio::Tools::Genewise::_get_strand Genewise.pm::116 > > Is anyone using this parser without any errors? If so, any tips on how > to get it to work? > > Thanks > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From michael.watson at bbsrc.ac.uk Thu Nov 18 11:22:15 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu Nov 18 11:22:30 2004 Subject: [Bioperl-l] Bio::Tools::Genewise.pm problems Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89896@iahce2knas1.iah.bbsrc.reserved> Hi I would use SearchIO but the exon phase information seems to get lost... Thanks for the tip, Genewise.pm now works fine. Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 18 November 2004 16:01 To: michael watson (IAH-C) Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Genewise.pm problems You need -genesf. If you want the protein and DNA sequence names parsed out add -para as well. So -genesf -para Bio::SearchIO::wise may be more forgiving, I don't remember - we probably shouldn't be throwing errors with the parser as you saw, but we basically get all the data from the -genesf option so without it I don't think the parser can do much. We don't make any attempt to parse the wise alignment block. 
You can always look at the tests in t/ to get a sense of how an object
is used (there will be a reference to a datafile in the test script
usually)

-jason

On Nov 18, 2004, at 10:50 AM, michael watson (IAH-C) wrote:

> Hi
>
> I'm on Bioperl 1.4 on linux.
>
> Are there any particular output options I need to specify when using
> genewise and the BioPerl parser?
>
> I have a genewise output, using default options. When I use the
> sample code in Bio::Tools::Genewise, nothing is printed out. When I
> use the sample code with an example output from the wise2 package
> (basic_genomic.out) I get a error trace:
>
> EXCEPTION
> MSG: Need a start
> STACK Bio::Tools::Genewise::_get_strand Genewise.pm::116
>
> Is anyone using this parser without any errors? If so, any tips on
> how to get it to work?
>
> Thanks
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu Thu Nov 18 18:43:54 2004
From: allenday at ucla.edu (Allen Day)
Date: Thu Nov 18 18:41:50 2004
Subject: [Bioperl-l] make test errors
Message-ID:

FAILED:
-------
t/LiveSeq.t                    48    1   2.08%  19
t/SeqFeature.t       9  2304  192    0   0.00%  ??
t/Unflattener.t      9  2304    6    4  66.67%  5-6
t/Unflattener2.t               11    2  18.18%  7 10
t/hmmer.t                     136   16  11.76%  8 13 30 33-40 137-141
-------

i'm also getting *a lot* of "isn't numeric" warning messages that look
like this:

Argument ">81297" isn't numeric in numeric gt (>) at
.../bioperl-live/blib/lib/Bio/Factory/FTLocationFactory.pm line 217,
line 860.

[15:41]aday@asti:~/cvsroot/bioperl-live> uname -a
Linux asti.ev.affymetrix.com 2.6.8-1.521 #1 Mon Aug 16 09:01:18 EDT 2004 i686 i686 i386 GNU/Linux
[15:42]aday@asti:~/cvsroot/bioperl-live> perl -v

This is perl, v5.6.1 built for i686-linux
...
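The "isn't numeric" warning quoted above is what Perl emits when a string such as ">81297" (a fuzzy feature-table position) is used in a numeric comparison. A minimal sketch of the effect and of one way to avoid it follows; parse_fuzzy_pos() is a hypothetical helper for illustration and is not the change that went into FTLocationFactory.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: split a position like ">81297" or "<1" into its
# number and its optional fuzziness marker.
sub parse_fuzzy_pos {
    my ($raw) = @_;
    my ($marker, $number) = $raw =~ /^([<>]?)(\d+)$/
        or die "unrecognised position '$raw'\n";
    return ($number, $marker);
}

my $raw = '>81297';

# Naive comparison: Perl warns, and ">81297" counts as 0 in numeric context.
my @warnings;
my $naive;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $naive = ($raw > 100) ? 1 : 0;
}

# Strip the marker first, then compare the bare number.
my ($pos, $marker) = parse_fuzzy_pos($raw);
my $clean = ($pos > 100) ? 1 : 0;

print "naive result: $naive, warnings caught: ", scalar(@warnings), "\n";
print "clean result: $clean, position $pos, marker '$marker'\n";
```

The point of keeping the marker separately is that the fuzziness information is not thrown away, only kept out of the arithmetic.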
-Allen From jason.stajich at duke.edu Fri Nov 19 09:51:21 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Nov 19 09:49:17 2004 Subject: [Bioperl-l] make test errors In-Reply-To: References: Message-ID: <7A041328-3A3A-11D9-8957-000393C44276@duke.edu> On Nov 18, 2004, at 6:43 PM, Allen Day wrote: > FAILED: > ------- > t/LiveSeq.t 48 1 2.08% 19 > t/SeqFeature.t 9 2304 192 0 0.00% ?? > t/Unflattener.t 9 2304 6 4 66.67% 5-6 > t/Unflattener2.t 11 2 18.18% 7 10 > t/hmmer.t 136 16 11.76% 8 13 30 33-40 137-141 > ------- > > i'm also getting *a lot* of "isn't numeric warning messages that look > like > this: > > Argument ">81297" isn't numeric in numeric gt (>) at > .../bioperl-live/blib/lib/Bio/Factory/FTLocationFactory.pm line 217, > line 860. > this is fixed now. my bad. Can't reproduce any of the failed tests with CVS live now. > [15:41]aday@asti:~/cvsroot/bioperl-live> uname -a > Linux asti.ev.affymetrix.com 2.6.8-1.521 #1 Mon Aug 16 09:01:18 EDT > 2004 i686 i686 i386 GNU/Linux > [15:42]aday@asti:~/cvsroot/bioperl-live> perl -v > > This is perl, v5.6.1 built for i686-linux > ... > > -Allen > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Fri Nov 19 09:59:04 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Nov 19 09:57:00 2004 Subject: [Bioperl-l] make test errors In-Reply-To: <7A041328-3A3A-11D9-8957-000393C44276@duke.edu> References: <7A041328-3A3A-11D9-8957-000393C44276@duke.edu> Message-ID: <8E1BCAC4-3A3B-11D9-8957-000393C44276@duke.edu> RestrictionIO.t still fails on perl 5.6.0 and darwin. I'm not sure if this a broken perl on OSX problem or 5.6.0 bug. If anyone has 5.6.0 on a linux system would be nice to know if the test still fails. 
We need to either put a workaround in the test or figure out why the parser isn't getting everything it is supposed to get (failing on test 9 the withrefm.pm parser). -jason On Nov 19, 2004, at 9:51 AM, Jason Stajich wrote: > > On Nov 18, 2004, at 6:43 PM, Allen Day wrote: > >> FAILED: >> ------- >> t/LiveSeq.t 48 1 2.08% 19 >> t/SeqFeature.t 9 2304 192 0 0.00% ?? >> t/Unflattener.t 9 2304 6 4 66.67% 5-6 >> t/Unflattener2.t 11 2 18.18% 7 10 >> t/hmmer.t 136 16 11.76% 8 13 30 33-40 137-141 >> ------- >> >> i'm also getting *a lot* of "isn't numeric warning messages that look >> like >> this: >> >> Argument ">81297" isn't numeric in numeric gt (>) at >> .../bioperl-live/blib/lib/Bio/Factory/FTLocationFactory.pm line 217, >> line 860. >> > this is fixed now. my bad. > > Can't reproduce any of the failed tests with CVS live now. > >> [15:41]aday@asti:~/cvsroot/bioperl-live> uname -a >> Linux asti.ev.affymetrix.com 2.6.8-1.521 #1 Mon Aug 16 09:01:18 EDT >> 2004 i686 i686 i386 GNU/Linux >> [15:42]aday@asti:~/cvsroot/bioperl-live> perl -v >> >> This is perl, v5.6.1 built for i686-linux >> ... >> >> -Allen >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From ewijaya at singnet.com.sg Fri Nov 19 10:24:27 2004 From: ewijaya at singnet.com.sg (Edward WIJAYA) Date: Fri Nov 19 10:22:05 2004 Subject: [Bioperl-l] Module for finding consensus In-Reply-To: <7A041328-3A3A-11D9-8957-000393C44276@duke.edu> References: <7A041328-3A3A-11D9-8957-000393C44276@duke.edu> Message-ID: Hi, I am new in Bioperl. 
I would like to know if there is a module to find a consensus string (motif)? Hope to hear from you again, thanks. Regards, Edward WIJAYA SINGAPORE From amackey at pcbi.upenn.edu Fri Nov 19 10:31:13 2004 From: amackey at pcbi.upenn.edu (amackey@pcbi.upenn.edu) Date: Fri Nov 19 10:29:10 2004 Subject: [Bioperl-l] make test errors In-Reply-To: <8E1BCAC4-3A3B-11D9-8957-000393C44276@duke.edu> References: <7A041328-3A3A-11D9-8957-000393C44276@duke.edu> <8E1BCAC4-3A3B-11D9-8957-000393C44276@duke.edu> Message-ID: <1100878273.419e11c1aa587@webmail.pcbi.upenn.edu> I thought I'd fixed this; the refm data file was corrupt somehow (had newlines in the middle of journal entries) ... -Aaron Quoting Jason Stajich : > RestrictionIO.t still fails on perl 5.6.0 and darwin. I'm not sure if > this a broken perl on OSX problem or 5.6.0 bug. If anyone has 5.6.0 on > a linux system would be nice to know if the test still fails. > > We need to either put a workaround in the test or figure out why the > parser isn't getting everything it is supposed to get (failing on test > 9 the withrefm.pm parser). > > -jason > On Nov 19, 2004, at 9:51 AM, Jason Stajich wrote: > > > > > On Nov 18, 2004, at 6:43 PM, Allen Day wrote: > > > >> FAILED: > >> ------- > >> t/LiveSeq.t 48 1 2.08% 19 > >> t/SeqFeature.t 9 2304 192 0 0.00% ?? > >> t/Unflattener.t 9 2304 6 4 66.67% 5-6 > >> t/Unflattener2.t 11 2 18.18% 7 10 > >> t/hmmer.t 136 16 11.76% 8 13 30 33-40 137-141 > >> ------- > >> > >> i'm also getting *a lot* of "isn't numeric warning messages that look > >> like > >> this: > >> > >> Argument ">81297" isn't numeric in numeric gt (>) at > >> .../bioperl-live/blib/lib/Bio/Factory/FTLocationFactory.pm line 217, > >> line 860. > >> > > this is fixed now. my bad. > > > > Can't reproduce any of the failed tests with CVS live now. 
> > > >> [15:41]aday@asti:~/cvsroot/bioperl-live> uname -a > >> Linux asti.ev.affymetrix.com 2.6.8-1.521 #1 Mon Aug 16 09:01:18 EDT > >> 2004 i686 i686 i386 GNU/Linux > >> [15:42]aday@asti:~/cvsroot/bioperl-live> perl -v > >> > >> This is perl, v5.6.1 built for i686-linux > >> ... > >> > >> -Allen > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > -- > > Jason Stajich > > jason.stajich at duke.edu > > http://www.duke.edu/~jes12/ > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From cain at cshl.org Fri Nov 19 16:05:03 2004 From: cain at cshl.org (Scott Cain) Date: Fri Nov 19 16:03:09 2004 Subject: [Bioperl-l] SeqFeature::Annotated broken Message-ID: <1100898303.3251.29.camel@localhost.localdomain> Allen, I think Bio::SeqFeature::Annotated is broken, though I am not really sure. 
When I run this test script:

#!/usr/bin/perl -w
use strict;

use lib '/home/scott/bioperl-live';
use Bio::FeatureIO;

my $in = Bio::FeatureIO->new(-file=> "test.gff", -format=>"gff");

while (my $feature = $in->next_feature()) {
    print join ("\t",($feature->display_name,
                      $feature->start,
                      $feature->end)),"\n";
    my $ac = $feature->annotation;
    foreach my $key ($ac->get_all_annotation_keys() ) {
        my @values = $ac->get_Annotations($key);
        foreach my $value (@values) {
            print "Annotation ",$key," stringified value ",$value->as_text,"\n";
        }
    }
}

and a small test GFF file, I get the following output:

-------------------- WARNING ---------------------
MSG: '##sequence-region' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: seq_id() is deprecated, use id()
---------------------------------------------------
Can't locate object method "remove_Annotations" via package "Bio::SeqFeature::Annotated" at /home/scott/bioperl-live/Bio/SeqFeature/Annotated.pm line 174, line 4.

Which is weird, because the remove_Annotations method is there. Do you have any insights as to what the problem might be?

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                 216-392-3087
Cold Spring Harbor Laboratory

From allenday at ucla.edu Fri Nov 19 16:28:19 2004
From: allenday at ucla.edu (Allen Day)
Date: Fri Nov 19 16:26:15 2004
Subject: [Bioperl-l] Re: SeqFeature::Annotated broken
In-Reply-To: <1100898303.3251.29.camel@localhost.localdomain>
References: <1100898303.3251.29.camel@localhost.localdomain>
Message-ID:

i was hacking on this module yesterday. i'll try to reproduce your error and get back to you.

-allen

On Fri, 19 Nov 2004, Scott Cain wrote:

> Allen,
>
> I think Bio::SeqFeature::Annotated is broken, though I am not really
> sure.
> When I run this test script:
>
> #!/usr/bin/perl -w
> use strict;
>
> use lib '/home/scott/bioperl-live';
> use Bio::FeatureIO;
>
> my $in = Bio::FeatureIO->new(-file=> "test.gff", -format=>"gff");
>
> while (my $feature = $in->next_feature()) {
>     print join ("\t",($feature->display_name,
>                       $feature->start,
>                       $feature->end)),"\n";
>     my $ac = $feature->annotation;
>     foreach my $key ($ac->get_all_annotation_keys() ) {
>         my @values = $ac->get_Annotations($key);
>         foreach my $value (@values) {
>             print "Annotation ",$key," stringified value ",$value->as_text,"\n";
>         }
>     }
> }
>
> and a small test GFF file, I get the following output:
>
> -------------------- WARNING ---------------------
> MSG: '##sequence-region' directive handling not yet implemented
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: seq_id() is deprecated, use id()
> ---------------------------------------------------
> Can't locate object method "remove_Annotations" via package "Bio::SeqFeature::Annotated" at /home/scott/bioperl-live/Bio/SeqFeature/Annotated.pm line 174, line 4.
>
> Which is weird, because the remove_Annotations method is there. Do you
> have any insights as to what the problem might be?
>
> Thanks,
> Scott

From allenday at ucla.edu Fri Nov 19 17:50:33 2004
From: allenday at ucla.edu (Allen Day)
Date: Fri Nov 19 17:48:50 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
Message-ID:

For those of you not on the guts list and interested in the reorganization of Bio::SeqFeatureI and Bio::AnnotationCollectionI that is happening, here are the CVS commit notes from today.

------

updated Bio::AnnotationCollection to implement *_tag_* methods with deprecation warning. these were taken from Bio::SeqFeatureI and Bio::SeqFeature::Generic.

*_tag_* methods in Bio::SeqFeature::Annotated are now implemented by explicit passthrough to the contained Bio::Annotation::Collection instance.
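
A minimal sketch of what such a passthrough with a deprecation warning could look like. The packages My::Collection and My::Feature, and the two-argument add_Annotation used here, are hypothetical stand-ins and not the actual BioPerl classes or signatures. The warning is emitted only once per deprecated method, which is one possible way to keep old code from flooding the screen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation collection; not BioPerl's class.
package My::Collection;

sub new { return bless { ann => {} }, shift }

sub add_Annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{ann}{$key} }, $value;
    return;
}

sub get_Annotations {
    my ($self, $key) = @_;
    return @{ $self->{ann}{$key} || [] };
}

# Hypothetical feature class that contains a collection and forwards to it.
package My::Feature;

my %already_warned;    # deprecated method name => count of calls seen

sub new { return bless { collection => My::Collection->new }, shift }

sub annotation { return $_[0]{collection} }

# Warn the first time a deprecated method is used, then stay quiet.
sub _deprecated {
    my ($old, $new) = @_;
    return if $already_warned{$old}++;
    warn "$old() is deprecated, use $new()\n";
    return;
}

# Old-style tag methods kept as thin wrappers around the collection.
sub add_tag_value {
    my ($self, $tag, $value) = @_;
    _deprecated('add_tag_value', 'add_Annotation');
    return $self->annotation->add_Annotation($tag, $value);
}

sub get_tag_values {
    my ($self, $tag) = @_;
    _deprecated('get_tag_values', 'get_Annotations');
    return $self->annotation->get_Annotations($tag);
}

package main;

my $feature = My::Feature->new;
$feature->add_tag_value(source => 'test');
$feature->add_tag_value(note   => 'second call, no second warning');
print join(',', $feature->get_tag_values('source')), "\n";
```

Both the old and the new method names end up reading and writing the same storage, so callers of either API see the same data.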
------

changes to Bio::AnnotationCollectionI and Bio::SeqFeatureI.

  * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI
  * All Bio::SeqFeatureI *_tag_* methods have been moved to
    Bio::AnnotationCollectionI, marked as deprecated, and mapped to their
    analogous and mostly pre-existing Bio::AnnotationCollectionI methods.

    Methods which were not in Bio::AnnotationCollectionI, but were in
    Bio::Annotation::Collection and were necessary for *_tag_* method
    remapping were created in Bio::AnnotationCollectionI.
  * Bio::RangeI and Bio::AnnotationCollectionI method documentation removed
    from Bio::SeqFeatureI, and replaced with a link to the interface class
    inherited from. This reduces documentation maintenance overhead.

------

-allen

From hlapp at gmx.net Sat Nov 20 01:39:19 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Nov 20 01:37:15 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
Message-ID:

On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote:

> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI
> * All Bio::SeqFeatureI *_tag_* methods have been moved to
>   Bio::AnnotationCollectionI, marked as deprecated, and mapped to their
>   analogous and mostly pre-existing Bio::AnnotationCollectionI methods.
>
>   Methods which were not in Bio::AnnotationCollectionI, but were in
>   Bio::Annotation::Collection and were necessary for *_tag_* method
>   remapping were created in Bio::AnnotationCollectionI.
>

This is a fairly substantial if not huge change, and this is happening on the main trunk.

Basically, with this change the 1.5 release has moved far far away from a drop-in replacement (it's not tagged yet or is it?). Bioperl-db for instance is incompatible with this, and anybody using bioperl-db will then need solid 1.4 support for some time to come. It'd be interesting to see to which degree GBrowse is fine with these changes.

I think the deprecation warnings are a really *bad* idea.
The effect will be that anyone who has written code against a version older than today's will have their screen cluttered over and over with warnings.

My suggestion is to either do this on a branch first, or if on the main trunk then in a way that is completely transparent to the API programmer for some time to come. You can think about cluttering people's screens after 1.6.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jason.stajich at duke.edu Sun Nov 21 15:35:03 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Nov 21 15:33:01 2004
Subject: [Bioperl-l] bioperl-1.4 make error
In-Reply-To:
References:
Message-ID:

You made the makefile in the first place with:
perl Makefile.PL

More information about your linux distro and perl version is needed I guess.

-jason

On Nov 17, 2004, at 12:55 PM, Shengsheng Zhang wrote:

> I can not get bioperl-1.4 installed on my linux machine,
>
> After getting the Makefile and type:
> make
>
> I get the following error:
>
> "Makefile:4763: *** multiple target patterns. Stop."
>
> Any clue what is going on?
>
> Thanks!
> > -SS > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Mon Nov 22 15:58:20 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Nov 22 15:56:19 2004 Subject: [Bioperl-l] Bio::SeqIO and bad entries in uniprot and interpro In-Reply-To: <4.3.2.7.2.20041118085838.00c8c868@vttmail.vtt.fi> References: <4.3.2.7.2.20041118085838.00c8c868@vttmail.vtt.fi> Message-ID: <3D675A22-3CC9-11D9-86C1-000393C44276@duke.edu> On Nov 18, 2004, at 6:53 AM, Mikko Arvas wrote: > Hi, > > I want to get all available Interpro matches for S. cerevisiae and > some other species. So I need to parse Uniprot files to find a set of > IDs for a given species and then get the Interpro matches from them. > But the Uniprot release uniprot_trembl.dat gives an error towards the > end of the file in next_seq call: > > my $inseq = Bio::SeqIO->new('-file' => ' '-format' => 'swiss'); > while (my $seq = $inseq->next_seq) { check species etc. in here} > > After happily processing a lot of sequences it gives: > Invalid [] range "6-1" in regex; marked by <-- HERE in m/^Tomato > severe leaf curl virus-[Guatemala 96-1 <-- HERE ]$/ > > Same goes for interpro: > > my $infeat = Bio::SeqIO->new('-file' => ' '-format' => 'interpro' ); > while (my $feat = $infeat->next_seq) { store features etc. in here} > > After happily processing a lot of features it gives: > not well-formed (invalid token) at line 2, column 53, byte 131 at > /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm > line 187 > > I guess its no wonder that such big DBs have errors or are out of sync > with perl modules etc. and I don't mind losing one seq or feature here > or there. The files are rather big so fixing them manually is a bit > painful. 
> But I need to somehow get most things processed, is there a
> way to skip these bad entries or would you have some other smart
> ideas?

I think this has to do with some unsafe code in the swiss.pm module which compares the species name against a list of Unknown species name values and is trying to interpret the 96-1 as a range in a regexp. Putting a \Q in front of the variable where this is being compared should be enough to fix it. This is the grep on line 986.

-    return if grep { /^$binomial$/ } @Unknown_names;
+    return if grep { /^\Q$binomial$/ } @Unknown_names;

There was one more place in the code that did this as well which I think I have fixed. I'm checking this in to CVS so do a cvs update and see if your problem persists. I've tested it against the uniprot_trembl.dat.

Not sure what the problem is with the interpro parser, someone else will need to look into that.

>
> I have bioperl 1.4. and latest Bio::SeqIO (for swiss.pm to work
> correctly) from CVS on SuSe8.1.
>
> Thanks a million for any help!
> Cheers,
> mikko
> Mikko Arvas
> VTT Biotechnology
>
> e-mail: mikko.arvas@vtt.fi
> tel: +358-(0)9-456 5827
> mobile: +358-(0)44-381 0502
> fax: +358-(0)9-455 2103
> mail: Tietotie 2, Espoo
> P.O. Box 1500
> FIN-02044 VTT, Finland
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From tomso at niehs.nih.gov Mon Nov 22 11:00:48 2004
From: tomso at niehs.nih.gov (Tomso, Dan (NIH/NIEHS))
Date: Mon Nov 22 20:07:13 2004
Subject: [Bioperl-l] Point me in the right direction--chromosomal positions
Message-ID: <97A32A5EB720C84F90D76AC8BA06C1E3070E0B1E@nihexchange21.nih.gov>

Greetings all.

I need a pointer - I'd like to retrieve gene names, start positions, end positions, orientations (strand), names, and descriptions via BioPerl.
Can someone suggest the correct module, so I can digest the documentation?

Thanks,
Dan T.

From sdavis2 at mail.nih.gov Mon Nov 22 21:08:43 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Nov 22 21:07:28 2004
Subject: [Bioperl-l] Point me in the right direction--chromosomal positions
References: <97A32A5EB720C84F90D76AC8BA06C1E3070E0B1E@nihexchange21.nih.gov>
Message-ID: <001101c4d101$5bfaebc0$7d75f345@WATSON>

Dan,

What organism? What are you looking up--refseq, genbank, gene names, locuslink? I would suggest you use the table browser at UCSC (http://genome.ucsc.edu) or EnsMART (http://www.ensembl.org) unless there is some reason those won't work. Bioperl doesn't have a mechanism to retrieve locations directly, at least that I know of (aside from DAS). You can build databases to help with that, but that is a bit of work for a problem with an existing solution. You could also look into the EnsEMBL API.

Let me know if I can help with using those web interfaces.

Sean

----- Original Message -----
From: "Tomso, Dan (NIH/NIEHS)"
To:
Sent: Monday, November 22, 2004 11:00 AM
Subject: [Bioperl-l] Point me in the right direction--chromosomal positions

> Greetings all.
>
> I need a pointer - I'd like to retrieve gene names, start positions, end
> positions, orientations (strand), names, and descriptions via BioPerl.
> Can someone suggest the correct module, so I can digest the documentation?
>
> Thanks,
>
> Dan T.
>

From hlapp at gmx.net Tue Nov 23 01:29:27 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Nov 23 01:27:37 2004
Subject: [Bioperl-l] Bio::SeqIO and bad entries in uniprot and interpro
In-Reply-To: <3D675A22-3CC9-11D9-86C1-000393C44276@duke.edu>
Message-ID: <066CB804-3D19-11D9-8C1A-000A959EB4C4@gmx.net>

On Monday, November 22, 2004, at 12:58 PM, Jason Stajich wrote:

>> Same goes for interpro:
>>
>> my $infeat = Bio::SeqIO->new('-file' => '> '-format' => 'interpro' );
>> while (my $feat = $infeat->next_seq) { store features etc. in here}
>>
>> After happily processing a lot of features it gives:
>> not well-formed (invalid token) at line 2, column 53, byte 131 at
>> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
>> line 187

Can you locate the position that raises the error? I have seen errors like this thrown on non-ASCII characters.

>>
>> I guess its no wonder that such big DBs have errors or are out of
>> sync with perl modules etc. and I don't mind losing one seq or
>> feature here or there. The files are rather big so fixing them
>> manually is a bit painful. But I need to somehow get most things
>> processed, is there a way to skip these bad entries or would you have
>> some other smart ideas?
>>

Since XML::Parser is built on top of expat, there is really no way of recovering from an XML violation that would let you resume parsing of the document.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From grossman at molgen.mpg.de Tue Nov 23 11:59:08 2004 From: grossman at molgen.mpg.de (Steffen Grossmann) Date: Tue Nov 23 12:08:01 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/FeatureIO gff.pm, 1.16, 1.17 In-Reply-To: References: <200411161935.iAGJZBDT005226@pub.open-bio.org> Message-ID: <41A36C5C.5060104@molgen.mpg.de> Dear Allen, dear Scott, before we write a next_sequence method, we should have something which is able to reconstruct the a set of hierarchically nested features. Any suggestions for method names? How about next_group? next_group gives back an array of features (which represent the top-level features, the lower features appear as subfeatures). A group is ended by a ### directive (or by the EOF). A next_sequence method could then also use this nesting... I have ideas how to realize the implementation. Tell me what you think about it and I can start doing it. Steffen Allen Day wrote: >there should be a next_sequence method. i wrote this into >Bio::Tools::GFF, we should pretty much be able to just copy/paste it over. > >-allen > > >On Tue, 16 Nov 2004, Scott Cain wrote: > > > >>Update of /home/repository/bioperl/bioperl-live/Bio/FeatureIO >>In directory pub.open-bio.org:/tmp/cvs-serv5204 >> >>Modified Files: >> gff.pm >>Log Message: >>added stuff to support fasta and target processing. The quesion remains what to >>do with this data once you have it--particularly the fasta data. Should there be >>(or is there) a next_sequence() method? 
>> >> >>Index: gff.pm >>=================================================================== >>RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO/gff.pm,v >>retrieving revision 1.16 >>retrieving revision 1.17 >>diff -C2 -d -r1.16 -r1.17 >>*** gff.pm 16 Nov 2004 16:22:53 -0000 1.16 >>--- gff.pm 16 Nov 2004 19:35:09 -0000 1.17 >>*************** >>*** 211,215 **** >> return undef unless $gff_string; >> >>! if($gff_string =~ /^##/){ >> $self->_handle_directive($gff_string); >> return $self->next_feature(); >>--- 211,215 ---- >> return undef unless $gff_string; >> >>! if($gff_string =~ /^##/ or $gff_string =~ /^>/){ >> $self->_handle_directive($gff_string); >> return $self->next_feature(); >>*************** >>*** 248,255 **** >> } >> >>! elsif($directive eq 'FASTA'){ >> $self->warn("'##$directive' directive handling not yet implemented"); >>! while($self->_readline()){ >>! #suck up the rest of the file >> } >> } >>--- 248,266 ---- >> } >> >>! elsif($directive eq 'FASTA' or $directive =~ /^>(.+)/){ >>! my $fasta_directive_id = $1 if $1; >> $self->warn("'##$directive' directive handling not yet implemented"); >>! local $/ = '>'; >>! while(my $read = $self->_readline()){ >>! chomp $read; >>! my $fasta_id; >>! my @seqarray = split /\n/, $read; >>! if ($fasta_directive_id) { >>! $fasta_id = $fasta_directive_id; >>! $fasta_directive_id = ''; >>! } else { >>! $fasta_id = shift @seqarray; >>! } >>! my $seq = join '', @seqarray; >> } >> } >>*************** >>*** 357,363 **** >> ); >> >>! if ($strand eq '+') { >> $strand = 1; >>! } elsif ($strand eq '-') { >> $strand = -1; >> } >>--- 368,374 ---- >> ); >> >>! if ($strand && $strand eq '+') { >> $strand = 1; >>! 
} elsif ($strand && $strand eq '-') { >> $strand = -1; >> } >> >>_______________________________________________ >>Bioperl-guts-l mailing list >>Bioperl-guts-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l >> >> >> >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- %---------------------------------------------% % Steffen Grossmann % % % % Max Planck Institute for Molecular Genetics % % Computational Molecular Biology % %---------------------------------------------% % Ihnestrasse 73 % % 14195 Berlin % % Germany % %---------------------------------------------% % Tel: (++49 +30) 8413-1167 % % Fax: (++49 +30) 8413-1152 % %---------------------------------------------% From cjm at fruitfly.org Tue Nov 23 13:16:34 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Nov 23 13:14:30 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/FeatureIO gff.pm, 1.16, 1.17 In-Reply-To: <41A36C5C.5060104@molgen.mpg.de> References: <200411161935.iAGJZBDT005226@pub.open-bio.org> <41A36C5C.5060104@molgen.mpg.de> Message-ID: Is a group defined as a set of connected features? Is the group required to end with a ### directive? This could be checked for automatically by testing whether each feature is connected to the current feature graph. Or do we want to allow the data producer to define their own concept of grouping (if so this probably wouldn't round trip). What about singleton features such as SNPs - is a SNP in an intergenic area a group unto itself? 
(if so, we shouldn't require the ### directive after each one) Note that there's already code for reconstituting the SeqFeature hierarchy from the ID/Parent tags in Bio::SeqFeature::Tools Cheers Chris On Tue, 23 Nov 2004, Steffen Grossmann wrote: > Dear Allen, dear Scott, > > before we write a next_sequence method, we should have something which > is able to reconstruct the a set of hierarchically nested features. Any > suggestions for method names? How about next_group? next_group gives > back an array of features (which represent the top-level features, the > lower features appear as subfeatures). A group is ended by a ### > directive (or by the EOF). A next_sequence method could then also use > this nesting... > > I have ideas how to realize the implementation. Tell me what you think > about it and I can start doing it. > > Steffen > > Allen Day wrote: > > >there should be a next_sequence method. i wrote this into > >Bio::Tools::GFF, we should pretty much be able to just copy/paste it over. > > > >-allen > > > > > >On Tue, 16 Nov 2004, Scott Cain wrote: > > > > > > > >>Update of /home/repository/bioperl/bioperl-live/Bio/FeatureIO > >>In directory pub.open-bio.org:/tmp/cvs-serv5204 > >> > >>Modified Files: > >> gff.pm > >>Log Message: > >>added stuff to support fasta and target processing. The quesion remains what to > >>do with this data once you have it--particularly the fasta data. Should there be > >>(or is there) a next_sequence() method? > >> > >> > >>Index: gff.pm > >>=================================================================== > >>RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO/gff.pm,v > >>retrieving revision 1.16 > >>retrieving revision 1.17 > >>diff -C2 -d -r1.16 -r1.17 > >>*** gff.pm 16 Nov 2004 16:22:53 -0000 1.16 > >>--- gff.pm 16 Nov 2004 19:35:09 -0000 1.17 > >>*************** > >>*** 211,215 **** > >> return undef unless $gff_string; > >> > >>! 
if($gff_string =~ /^##/){ > >> $self->_handle_directive($gff_string); > >> return $self->next_feature(); > >>--- 211,215 ---- > >> return undef unless $gff_string; > >> > >>! if($gff_string =~ /^##/ or $gff_string =~ /^>/){ > >> $self->_handle_directive($gff_string); > >> return $self->next_feature(); > >>*************** > >>*** 248,255 **** > >> } > >> > >>! elsif($directive eq 'FASTA'){ > >> $self->warn("'##$directive' directive handling not yet implemented"); > >>! while($self->_readline()){ > >>! #suck up the rest of the file > >> } > >> } > >>--- 248,266 ---- > >> } > >> > >>! elsif($directive eq 'FASTA' or $directive =~ /^>(.+)/){ > >>! my $fasta_directive_id = $1 if $1; > >> $self->warn("'##$directive' directive handling not yet implemented"); > >>! local $/ = '>'; > >>! while(my $read = $self->_readline()){ > >>! chomp $read; > >>! my $fasta_id; > >>! my @seqarray = split /\n/, $read; > >>! if ($fasta_directive_id) { > >>! $fasta_id = $fasta_directive_id; > >>! $fasta_directive_id = ''; > >>! } else { > >>! $fasta_id = shift @seqarray; > >>! } > >>! my $seq = join '', @seqarray; > >> } > >> } > >>*************** > >>*** 357,363 **** > >> ); > >> > >>! if ($strand eq '+') { > >> $strand = 1; > >>! } elsif ($strand eq '-') { > >> $strand = -1; > >> } > >>--- 368,374 ---- > >> ); > >> > >>! if ($strand && $strand eq '+') { > >> $strand = 1; > >>! 
} elsif ($strand && $strand eq '-') {
> >>       $strand = -1;
> >>     }
> >>
> >

From lairdm at sfu.ca Tue Nov 23 13:26:50 2004
From: lairdm at sfu.ca (Matthew Laird)
Date: Tue Nov 23 13:24:46 2004
Subject: [Bioperl-l] Blast return codes (fwd)
Message-ID:

About a year ago I raised a question about bioperl (standaloneblast) throwing an error claiming Blast was dying with a return code of -1. After a bit of back and forth which I hope is readable in the thread below, we never really found an answer....

Looking through the bioperl list archives I find another reference to this error in July (http://bioperl.org/pipermail/bioperl-l/2004-July/016330.html), which makes me feel better that I'm not alone in this problem.

Anyhow, to this date the only solution we've found to this problem is in the install documentation for our package, instructing users to comment out the line in bioperl that checks the return code from blastall, since Blast actually runs fine and returns a result; perl is just getting this odd return code.

We've checked on a few platforms and see it consistently on Solaris (both SPARC & x86), OS X, and some linux distributions. Actually, with linux, even of two different machines with the same distribution, one can have the problem and the other might not.

Is there any known reason and solution for this issue? Any assistance would be greatly appreciated.

Thanks.
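
A minimal sketch of how the status of a system() call can be taken apart, so that a -1 (the program could not be started, or wait(2) failed) is reported separately from a real non-zero exit of the child. The helper name describe_status is made up for illustration, and the child command is simply the running perl standing in for blastall. One possibility worth ruling out, offered as an assumption rather than a diagnosis of the problem above, is a SIGCHLD disposition of 'IGNORE' somewhere in the calling program, which on some systems lets the child be reaped automatically so that system() reports -1 with "No child processes".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the value system() leaves in $? into a readable description.
# $os_error should be a copy of $! taken right after the call.
sub describe_status {
    my ($status, $os_error) = @_;
    if ($status == -1) {
        return "could not start or wait for the program: $os_error";
    }
    if ($status & 127) {
        return sprintf 'died with signal %d', $status & 127;
    }
    return sprintf 'exited with value %d', $status >> 8;
}

# Run the current perl ($^X) as a stand-in for an external tool.
my @cmd = ($^X, '-e', 'exit 3');
system(@cmd);
my ($status, $os_error) = ($?, "$!");
print describe_status($status, $os_error), "\n";
```

If a SIGCHLD setting does turn out to be involved, running the call inside a block with `local $SIG{CHLD} = 'DEFAULT';` is a common remedy; whether it applies to the machines described here is untested.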
---------- Forwarded message ----------
Date: Fri, 12 Dec 2003 15:23:56 -0800 (PST)
From: Matthew Laird
To: Keith James
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Blast return codes

I did some more investigating and it's beginning to look a lot more bizarre. I changed the bioperl code to see if the blast command was actually run regardless of the -1 return code. And yes, blast did run and the result file is there - it's only that perl is returning a -1 to bioperl for some reason.

I was actually just speaking with someone in another lab and he said he used to experience this problem too; his solution was to just write his own modules. :) So this certainly seems like some deep down perl/OS voodoo.

I'd still be interested in hearing what environment variables to set to run t/StandAloneBlast.t.

Thanks.

On Thu, 11 Dec 2003, Matthew Laird wrote:

> Thanks for your assistance so far, I've been trying to find the difference
> between the machines that do work and the ones that don't. The two most
> similar machines which some do and some don't work are a group of Red Hat
> 9 machines.
>
> The machines are running Red Hat 9 and Perl 5.8.0. I've tried both
> bioperl 1.2.1 and 1.2.3. The only other difference I could find was that
> perl was built from source on one of the machines that worked. I tried
> doing that on one of the non-working machines and had no success.
>
> I've tried running the StandAloneBlast.t, what environment variables do I
> need set? I receive:
> [root@ssb7121-5 t]# perl StandAloneBlast.t
> 1..10
> ok 1
> ok 2
> ok 3
> ok 4
> Blast Database ecoli.nt not found at StandAloneBlast.t line 67.
> Blast Database swissprot not found at StandAloneBlast.t line 72.
> Blast databases(s) not found, skipping remaining tests at
> StandAloneBlast.t line 76.
> ok 5 # skip Blast or env variables not installed correctly > ok 6 # skip Blast or env variables not installed correctly > ok 7 # skip Blast or env variables not installed correctly > ok 8 # skip Blast or env variables not installed correctly > ok 9 # skip Blast or env variables not installed correctly > ok 10 # skip Blast or env variables not installed correctly > > Obviously I need to set some variable so it can find the blast database. > > Thanks again. > > On 11 Dec 2003, Keith James wrote: > > > >>>>> "Matthew" == Matthew Laird writes: > > > > Matthew> Well, that's a step in the right direction, I have a > > Matthew> little more information now. I added a $! before the $? > > Matthew> and received: > > > > Matthew> ------------- EXCEPTION ------------- MSG: blastall call > > Matthew> crashed: -1 No child processes /usr/local/blast/blastall > > Matthew> -p blastp -d > > Matthew> /usr/local/psort/conf/analysis/sclblast/sclblast -i > > Matthew> /tmp/8Dt6zF1U59 -e 1e-09 -o /tmp/ojP9n04LZh > > > > Matthew> STACK Bio::Tools::Run::StandAloneBlast::_runblast > > Matthew> /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:640 > > > > Matthew> This is where we get into Perl voodoo beyond my league, > > Matthew> "No child processes" - does that ring bells for anyone? > > Matthew> Thanks again. > > > > That's interesting. I think we need to know your OS platform and Perl > > version to get any further. > > > > I think that the value left by a system call in $? is the same as if a > > wait system call were made. No child processes is a Unix error code > > (ECHILD) which can be caused by a wait, being reported by Perl. > > > > What happens if you run the test for StandAloneBlast? (t/StandAloneBlast.t) > > > > Keith > > > > > > -- Matthew Laird SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept. 
Simon Fraser University

From allenday at ucla.edu Tue Nov 23 13:51:55 2004
From: allenday at ucla.edu (Allen Day)
Date: Tue Nov 23 13:49:42 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] SeqFeature::Annotated->add_SeqFeature with 'EXPAND' option.
In-Reply-To: <41A35B41.7000906@molgen.mpg.de>
References: <41A35B41.7000906@molgen.mpg.de>
Message-ID:

It must have slipped through the cracks when I was porting SeqFeature::Generic functionality to SeqFeature::Annotated. Yes, please add docs to FeatureHolderI and implementation to SeqFeature::Annotated.

Thanks.

-Allen

On Tue, 23 Nov 2004, Steffen Grossmann wrote:

> Hi,
>
> is there a reason why the 'add_SeqFeature' method in Bio::SeqFeature::Annotated doesn't support the old 'EXPAND' option? I always considered
> this a nice feature! If there is no reason, I could add it (the usual behaviour, when not explicitly stating 'EXPAND' wouldn't be
> harmed...). We could also add it to the description in FeatureHolderI.
>
> Steffen

From allenday at ucla.edu Tue Nov 23 13:59:35 2004
From: allenday at ucla.edu (Allen Day)
Date: Tue Nov 23 13:57:23 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/FeatureIO gff.pm, 1.16, 1.17
In-Reply-To:
References: <200411161935.iAGJZBDT005226@pub.open-bio.org> <41A36C5C.5060104@molgen.mpg.de>
Message-ID:

On Tue, 23 Nov 2004, Chris Mungall wrote:

>
> Is a group defined as a set of connected features?

i interpret group to mean a set of items, each of which has 0..N connections to other members of the set, and 0 connections to members in other sets.

> Is the group required to end with a ### directive? This could be checked
> for automatically by testing whether each feature is connected to the
> current feature graph. Or do we want to allow the data producer to define
> their own concept of grouping (if so this probably wouldn't round trip).
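
The notion of a group as a set of connected features can be sketched as a connected-components computation over ID/Parent links. This is only an illustration under stated assumptions: features are plain hashes with an 'id' and at most one optional 'parent' key, and group_features is a made-up name, not part of Bio::FeatureIO or Bio::SeqFeature::Tools. A singleton such as an intergenic SNP naturally comes out as a group of its own, without needing a ### directive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partition feature records into groups of connected features.
# Each record is a plain hash with an 'id' and an optional 'parent';
# these keys are illustrative, not the Bio::SeqFeatureI API.
sub group_features {
    my @features = @_;
    my %root;    # id => id of another member of the same group (union-find)

    # Follow links until reaching the representative of the group.
    my $find = sub {
        my $id = shift;
        $root{$id} = $id unless exists $root{$id};
        while ($root{$id} ne $id) {
            $root{$id} = $root{ $root{$id} };    # path halving
            $id = $root{$id};
        }
        return $id;
    };

    # Merge every feature's group with the group of its parent.
    for my $f (@features) {
        my $child_root = $find->($f->{id});
        next unless defined $f->{parent};
        my $parent_root = $find->($f->{parent});
        $root{$child_root} = $parent_root if $child_root ne $parent_root;
    }

    # Collect ids per group, keeping the order of first appearance.
    my (%group, @order);
    for my $f (@features) {
        my $r = $find->($f->{id});
        push @order, $r unless $group{$r};
        push @{ $group{$r} }, $f->{id};
    }
    return map { $group{$_} } @order;
}

my @groups = group_features(
    { id => 'gene1' },
    { id => 'mRNA1', parent => 'gene1' },
    { id => 'exon1', parent => 'mRNA1' },
    { id => 'snp1' },
    { id => 'gene2' },
    { id => 'mRNA2', parent => 'gene2' },
);
print join(',', @$_), "\n" for @groups;
```

With the sample data this yields three groups: the gene1 hierarchy, the lone snp1, and the gene2 hierarchy. Features with several parents would need the 'parent' value to be a list, which this sketch leaves out.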
on a related note, maybe Bio::FeatureIO::GFF should (optionally) write a '###' into the filehandle after each time write_feature() is called. > What about singleton features such as SNPs - is a SNP in an intergenic > area a group unto itself? (if so, we shouldn't require the ### directive > after each one) how does an intergenic snp differ from a genic snp? is there an easy way via SO to identify features that should never be connected to others? can we rely on Bio::SeqFeature::Tools to do this for us? > Note that there's already code for reconstituting the SeqFeature hierarchy > from the ID/Parent tags in Bio::SeqFeature::Tools -allen > Cheers > Chris > > On Tue, 23 Nov 2004, Steffen Grossmann wrote: > > > Dear Allen, dear Scott, > > > > before we write a next_sequence method, we should have something which > > is able to reconstruct the a set of hierarchically nested features. Any > > suggestions for method names? How about next_group? next_group gives > > back an array of features (which represent the top-level features, the > > lower features appear as subfeatures). A group is ended by a ### > > directive (or by the EOF). A next_sequence method could then also use > > this nesting... > > > > I have ideas how to realize the implementation. Tell me what you think > > about it and I can start doing it. > > > > Steffen > > > > Allen Day wrote: > > > > >there should be a next_sequence method. i wrote this into > > >Bio::Tools::GFF, we should pretty much be able to just copy/paste it over. > > > > > >-allen > > > > > > > > >On Tue, 16 Nov 2004, Scott Cain wrote: > > > > > > > > > > > >>Update of /home/repository/bioperl/bioperl-live/Bio/FeatureIO > > >>In directory pub.open-bio.org:/tmp/cvs-serv5204 > > >> > > >>Modified Files: > > >> gff.pm > > >>Log Message: > > >>added stuff to support fasta and target processing. The quesion remains what to > > >>do with this data once you have it--particularly the fasta data. 
Should there be > > >>(or is there) a next_sequence() method? > > >> > > >> > > >>Index: gff.pm > > >>=================================================================== > > >>RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO/gff.pm,v > > >>retrieving revision 1.16 > > >>retrieving revision 1.17 > > >>diff -C2 -d -r1.16 -r1.17 > > >>*** gff.pm 16 Nov 2004 16:22:53 -0000 1.16 > > >>--- gff.pm 16 Nov 2004 19:35:09 -0000 1.17 > > >>*************** > > >>*** 211,215 **** > > >> return undef unless $gff_string; > > >> > > >>! if($gff_string =~ /^##/){ > > >> $self->_handle_directive($gff_string); > > >> return $self->next_feature(); > > >>--- 211,215 ---- > > >> return undef unless $gff_string; > > >> > > >>! if($gff_string =~ /^##/ or $gff_string =~ /^>/){ > > >> $self->_handle_directive($gff_string); > > >> return $self->next_feature(); > > >>*************** > > >>*** 248,255 **** > > >> } > > >> > > >>! elsif($directive eq 'FASTA'){ > > >> $self->warn("'##$directive' directive handling not yet implemented"); > > >>! while($self->_readline()){ > > >>! #suck up the rest of the file > > >> } > > >> } > > >>--- 248,266 ---- > > >> } > > >> > > >>! elsif($directive eq 'FASTA' or $directive =~ /^>(.+)/){ > > >>! my $fasta_directive_id = $1 if $1; > > >> $self->warn("'##$directive' directive handling not yet implemented"); > > >>! local $/ = '>'; > > >>! while(my $read = $self->_readline()){ > > >>! chomp $read; > > >>! my $fasta_id; > > >>! my @seqarray = split /\n/, $read; > > >>! if ($fasta_directive_id) { > > >>! $fasta_id = $fasta_directive_id; > > >>! $fasta_directive_id = ''; > > >>! } else { > > >>! $fasta_id = shift @seqarray; > > >>! } > > >>! my $seq = join '', @seqarray; > > >> } > > >> } > > >>*************** > > >>*** 357,363 **** > > >> ); > > >> > > >>! if ($strand eq '+') { > > >> $strand = 1; > > >>! } elsif ($strand eq '-') { > > >> $strand = -1; > > >> } > > >>--- 368,374 ---- > > >> ); > > >> > > >>! 
if ($strand && $strand eq '+') { > > >> $strand = 1; > > >>! } elsif ($strand && $strand eq '-') { > > >> $strand = -1; > > >> } > > >> > > >>_______________________________________________ > > >>Bioperl-guts-l mailing list > > >>Bioperl-guts-l@portal.open-bio.org > > >>http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > > >> > > >> > > >> > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > From jason.stajich at duke.edu Tue Nov 23 14:12:07 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Nov 23 14:10:02 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: Message-ID: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> On Nov 20, 2004, at 1:39 AM, Hilmar Lapp wrote: > > On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > >> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI >> * All Bio::SeqFeatureI *_tag_* methods have been moved to >> Bio::AnnotationCollectionI, marked as deprecated, and mapped to >> their >> analogous and mostly pre-existing Bio::AnnotationCollectionI >> methods. >> >> Methods which were not in Bio::AnnotationCollectionI, but were i >> Bio::Annotation::Collection and were necessary for *_tag_* method >> remapping were created in Bio::AnnotationCollecitonI. >> > > This is a fairly substantial if not huge change, and this is happening > on the main trunk. > > Basically, with this change the 1.5 release has moved far far away > from a drop-in replacement (it's not tagged yet or is it?). Bioperl-db > for instance is incompatible with this, and anybody using bioperl-db > will then need a solid 1.4 support for some time to come. It'd > interesting to which degree GBrowse is fine with these changes. it has not been tagged yet. I think Aaron is just really busy on this front. 
But I agree all this new stuff probably should not be part of the 1.5 release. I think a "grand plan" view is probably called for on what the architecture will be for Features. A lot of stuff is being rolled out, but I am not sure many people know how this is going to accommodate the difficult interface between GFF3 "everything is a feature and identifiable" and the current bioperl "features are attached to sequences" model. This is in fact something that many of us discussed offline and had ideas about but not sure what direction was really chosen. I admit to having my head down and not paying a lot of attention, and am glad for you guys to be working on it, but I think if we are having a serious departure from the current object structure and changing function names with deprecation warnings, it needs to be on a different release version since 1.5 is really just a 1.4+ release and not a new. Otherwise people are not going to adopt this new release and the baby (all the bugfixes that have gone in since 1.4) will get thrown out with the bathwater (lots of changes that some people may not want because they mean changing their already working code). > I think the deprecation warnings are a really *bad* idea. The effect > will be that anyone who has written code against a version that is > older than today the screen will be cluttered over and over with > warnings. > > My suggestion is to either do this on a branch first, or if on the > main trunk then in a way that is completely transparent to the API > programmer for some time to come. You can think about cluttering > people's screen after 1.6. I totally agree. Allen, et al, not sure if you are aware, but there is also a method called "deprecated" which is part of Bio::Root::you should be using instead of 'warn' which will only warn when verbose >= 0. It probably should be only printed when verbose > 0... 
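A rough sketch of the idea in the paragraph above: keep the old method name as a thin wrapper around the new one, and only print the deprecation message when verbosity reaches a chosen level. Everything here is an assumption for illustration (the Sketch::Root and Sketch::Feature packages and the -deprecation_level argument are made up); it is not BioPerl's own implementation of 'deprecated'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a root class. NOT BioPerl's own code.
package Sketch::Root;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {}, $class;
    $self->{verbose} = defined $args{-verbose} ? $args{-verbose} : 0;
    # Minimum verbosity at which deprecation messages are shown.
    # 0 mimics "warn when verbose >= 0"; 1 mimics "only when verbose > 0".
    $self->{deprecation_level} =
      defined $args{-deprecation_level} ? $args{-deprecation_level} : 0;
    return $self;
}

sub verbose {
    my $self = shift;
    $self->{verbose} = shift if @_;
    return $self->{verbose};
}

# Warn only when verbosity reaches the configured level.
# Returns 1 if a warning was issued, 0 otherwise.
sub deprecated {
    my ( $self, $msg ) = @_;
    return 0 if $self->verbose < $self->{deprecation_level};
    warn "WARNING: $msg\n";
    return 1;
}

package Sketch::Feature;
our @ISA = ('Sketch::Root');

sub add_Annotation {
    my ( $self, $tag, $value ) = @_;
    push @{ $self->{annotations}{$tag} }, $value;
    return;
}

sub get_Annotations {
    my ( $self, $tag ) = @_;
    return @{ $self->{annotations}{$tag} || [] };
}

# Old name kept as a thin wrapper around the new method.
sub get_tag_values {
    my ( $self, $tag ) = @_;
    $self->deprecated('get_tag_values() is deprecated. use get_Annotations()');
    return $self->get_Annotations($tag);
}

package main;

my $feat = Sketch::Feature->new( -verbose => 0, -deprecation_level => 1 );
$feat->add_Annotation( source => 'curated' );

# Silent at verbose 0: old code keeps working without screen clutter.
print join( ',', $feat->get_tag_values('source') ), "\n";

# At verbose 1 the same call also prints the warning on STDERR.
$feat->verbose(1);
print join( ',', $feat->get_tag_values('source') ), "\n";
```

With the level set to 1, existing scripts written against the old names stay quiet by default, and developers who raise verbosity still see what needs migrating.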
> > -hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From amackey at pcbi.upenn.edu Tue Nov 23 14:28:49 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Nov 23 14:26:45 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> Message-ID: > On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > >> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI >> * All Bio::SeqFeatureI *_tag_* methods have been moved to >> Bio::AnnotationCollectionI, marked as deprecated, and mapped to >> their >> analogous and mostly pre-existing Bio::AnnotationCollectionI >> methods. >> >> Methods which were not in Bio::AnnotationCollectionI, but were i >> Bio::Annotation::Collection and were necessary for *_tag_* method >> remapping were created in Bio::AnnotationCollecitonI. I've been paying some attention to this, but thought that the changes were only those required to get Bio::FeatureIO working (i.e. recapitulate GFF3 logic) without hampering object usage; do our tests pass with these changes in place? On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > it has not been tagged yet. I think Aaron is just really busy on this > front. I did tag the HEAD at RC1, so we could branch from there if we needed to; if this is really the big bug-bear that Hilmar and Jason are claiming, then I'd ask Allen to retract his patches that alter interface definitions, and branch. 
And I was so hoping to get RC2 packaged up later today ... -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From allenday at ucla.edu Tue Nov 23 15:10:42 2004 From: allenday at ucla.edu (Allen Day) Date: Tue Nov 23 15:08:54 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> Message-ID: On Tue, 23 Nov 2004, Aaron J. Mackey wrote: > > > On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > > > >> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > >> * All Bio::SeqFeatureI *_tag_* methods have been moved to > >> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > >> their > >> analogous and mostly pre-existing Bio::AnnotationCollectionI > >> methods. > >> > >> Methods which were not in Bio::AnnotationCollectionI, but were i > >> Bio::Annotation::Collection and were necessary for *_tag_* method > >> remapping were created in Bio::AnnotationCollecitonI. > > I've been paying some attention to this, but thought that the changes > were only those required to get Bio::FeatureIO working (i.e. > recapitulate GFF3 logic) without hampering object usage; do our tests > pass with these changes in place? all tests pass here. > On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > > it has not been tagged yet. I think Aaron is just really busy on this > > front. > > I did tag the HEAD at RC1, so we could branch from there if we needed > to; if this is really the big bug-bear that Hilmar and Jason are > claiming, then I'd ask Allen to retract his patches that alter > interface definitions, and branch. there hasn't really been any functionality change. just reorganization of interfaces. i put the deprecation warnings in where interface methods moved or mapped from one interface class to another. 
the changes are directly related to getting FeatureIO working, because Bio::SeqFeature::Annotated needs to fully implement Bio::SeqFeatureI. I didn't want to write the *_tag_* methods into Bio::SeqFeature::Annotated, which precipitated their deprecation and mapping to Bio::AnnotationCollectionI. > And I was so hoping to get RC2 packaged up later today ... Let me know if you want it branched. -Allen > -Aaron > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Tue Nov 23 15:11:11 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Nov 23 15:09:04 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> Message-ID: I think if we just don't issue deprecation warnings it will be fine by me -- even if we are just calling the new subroutine under the hood. Tests seem to pass although Unflattner.t is falling over today not sure what is problem. -jason On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > >> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: >> >>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI >>> * All Bio::SeqFeatureI *_tag_* methods have been moved to >>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to >>> their >>> analogous and mostly pre-existing Bio::AnnotationCollectionI >>> methods. >>> >>> Methods which were not in Bio::AnnotationCollectionI, but were i >>> Bio::Annotation::Collection and were necessary for *_tag_* method >>> remapping were created in Bio::AnnotationCollecitonI. 
> > I've been paying some attention to this, but thought that the changes > were only those required to get Bio::FeatureIO working (i.e. > recapitulate GFF3 logic) without hampering object usage; do our tests > pass with these changes in place? > > On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > >> it has not been tagged yet. I think Aaron is just really busy on >> this front. > > I did tag the HEAD at RC1, so we could branch from there if we needed > to; if this is really the big bug-bear that Hilmar and Jason are > claiming, then I'd ask Allen to retract his patches that alter > interface definitions, and branch. > > And I was so hoping to get RC2 packaged up later today ... > > -Aaron > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From amackey at pcbi.upenn.edu Tue Nov 23 15:30:47 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Nov 23 15:28:37 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> Message-ID: <8E9219E6-3D8E-11D9-B1B7-000D93392082@pcbi.upenn.edu> > I didn't want to write the *_tag_* methods into > Bio::SeqFeature::Annotated, > which precipitated their deprecation and mapping to > Bio::AnnotationCollectionI. I read and re-read this sentence a few times, and it took awhile to understand what it means. 
Now that I've looked at the code, it seems that Bio::SeqFeature::Annotated implements the Bio::SeqFeatureI interface; Bio::SeqFeatureI calls for get_tag_values() to be implemented; so instead of implementing get_tag_values for Bio::SeqFeature::Annotated objects, you made Bio::SeqFeatureI inherit from Bio::AnnotationCollectionI, and then reimplemented get_tag_values() in AnnotationCollectionI to simply call get_Annotations() (deprecating it in the process)? While it "works", I don't see why it had to be done this way; I'd prefer a solution that didn't involve changing the definition of Bio::SeqFeatureI (at least not yet). Further, your "deprecations" seem to indicate that you simply want Bio::SeqFeatureI to go away entirely, and have us treat everything as an annotation collection (which may or may not have a location on a sequence). Is this the agreed-upon way of the future? What are you gaining, besides forcing my fingers to learn get_Annotations($tag) instead of get_tag_values($tag)? I'm all for sweeping change for consistency and logic, and 1.5 is meant to be a developer's release on the road to 1.6, so this *might* be the right time for it, as long as there is general agreement, and it doesn't (significantly) break existing tools without great reason. Thanks, -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From allenday at ucla.edu Tue Nov 23 16:47:26 2004 From: allenday at ucla.edu (Allen Day) Date: Tue Nov 23 16:45:15 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> Message-ID: On Tue, 23 Nov 2004, Jason Stajich wrote: > I think if we just don't issue deprecation warnings it will be fine by > me -- even if we are just calling the new subroutine under the hood. 
> Tests seem to pass although Unflattner.t is falling over today not sure > what is problem. that fails for me too, in addition to spewing out lots of diagnotistics. however, if you run 'make test_Unflattener2', it passes. strange. -allen > > -jason > On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > > > > >> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > >> > >>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > >>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > >>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > >>> their > >>> analogous and mostly pre-existing Bio::AnnotationCollectionI > >>> methods. > >>> > >>> Methods which were not in Bio::AnnotationCollectionI, but were i > >>> Bio::Annotation::Collection and were necessary for *_tag_* method > >>> remapping were created in Bio::AnnotationCollecitonI. > > > > I've been paying some attention to this, but thought that the changes > > were only those required to get Bio::FeatureIO working (i.e. > > recapitulate GFF3 logic) without hampering object usage; do our tests > > pass with these changes in place? > > > > On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > > >> it has not been tagged yet. I think Aaron is just really busy on > >> this front. > > > > I did tag the HEAD at RC1, so we could branch from there if we needed > > to; if this is really the big bug-bear that Hilmar and Jason are > > claiming, then I'd ask Allen to retract his patches that alter > > interface definitions, and branch. > > > > And I was so hoping to get RC2 packaged up later today ... > > > > -Aaron > > > > -- > > Aaron J. Mackey, Ph.D. > > Dept. of Biology, Goddard 212 > > University of Pennsylvania email: amackey@pcbi.upenn.edu > > 415 S. 
University Avenue office: 215-898-1205 > > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > > > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From allenday at ucla.edu Tue Nov 23 16:54:04 2004 From: allenday at ucla.edu (Allen Day) Date: Tue Nov 23 16:51:50 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: <8E9219E6-3D8E-11D9-B1B7-000D93392082@pcbi.upenn.edu> References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <8E9219E6-3D8E-11D9-B1B7-000D93392082@pcbi.upenn.edu> Message-ID: On Tue, 23 Nov 2004, Aaron J. Mackey wrote: > > > I didn't want to write the *_tag_* methods into > > Bio::SeqFeature::Annotated, > > which precipitated their deprecation and mapping to > > Bio::AnnotationCollectionI. > > I read and re-read this sentence a few times, and it took awhile to > understand what it means. Now that I've looked at the code, it seems > that Bio::SeqFeature::Annotated implements the Bio::SeqFeatureI > interface; Bio::SeqFeatureI calls for get_tag_values() to be > implemented; so instead of implementing get_tag_values for > Bio::SeqFeature::Annotated objects, you made Bio::SeqFeatureI inherit > from Bio::AnnotationCollectionI, and then reimplemented > get_tag_values() in AnnotationCollectionI to simply call > get_Annotations() (deprecating it in the process)? yes. > While it "works", I don't see why it had to be done this way; I'd > prefer a solution that didn't involve changing the definition of > Bio::SeqFeatureI (at least not yet). Further, your "deprecations" seem > to indicate that you simply want Bio::SeqFeatureI to go away entirely, > and have us treat everything as an annotation collection (which may or > may not have a location on a sequence). Is this the agreed-upon way of > the future? 
What are you gaining, besides forcing my fingers to learn
> get_Annotations($tag) instead of get_tag_values($tag)?

you've got the gist of it. i want features to be AnnotatableI, not be
AnnotationCollectionI themselves. the main payoffs i see here are:

[1] consistency amongst annotated objects. why should SeqFeatureI do
things differently?

[2] possibility of strong annotation typing (e.g.
Bio::Annotation::OntologyTerm instead of a plaintext string). this is the
main reason driving the changes.

> I'm all for sweeping change for consistency and logic, and 1.5 is meant
> to be a developer's release on the road to 1.6, so this *might* be the
> right time for it, as long as there is general agreement, and it
> doesn't (significantly) break existing tools without great reason.

the current interface method shuffling has the higher goal of consistency,
and was the best i could come up with for shoehorning SeqFeatureI into
being AnnotatableI. i'm open to suggestions if someone sees a better way.

-Allen

>
> Thanks,
>
> -Aaron
>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania email: amackey@pcbi.upenn.edu
> 415 S. University Avenue office: 215-898-1205
> Philadelphia, PA 19104-6017 fax: 215-746-6697
>

From jason.stajich at duke.edu Tue Nov 23 17:08:09 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Nov 23 17:06:11 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu>
Message-ID: <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu>

On Nov 23, 2004, at 4:47 PM, Allen Day wrote:

> On Tue, 23 Nov 2004, Jason Stajich wrote:
>
>> I think if we just don't issue deprecation warnings it will be fine by
>> me -- even if we are just calling the new subroutine under the hood.
>> Tests seem to pass although Unflattener.t is falling over today not
>> sure
>> what is problem.
>
> that fails for me too, in addition to spewing out lots of
> diagnostics.
> however, if you run 'make test_Unflattener2', it passes. strange. > no it is Unflattner not Unflattner2 % make test_Unflattener [SNIP OUT SOME STUFF] -------------------- WARNING --------------------- MSG: get_tagset_values() is deprecated. use get_Annotations() --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is not implemented by package Bio::SeqFeature::Generic. > -allen > >> >> -jason >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: >> >>> >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: >>>> >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to >>>>> their >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI >>>>> methods. >>>>> >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method >>>>> remapping were created in Bio::AnnotationCollecitonI. >>> >>> I've been paying some attention to this, but thought that the changes >>> were only those required to get Bio::FeatureIO working (i.e. >>> recapitulate GFF3 logic) without hampering object usage; do our tests >>> pass with these changes in place? >>> >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: >>> >>>> it has not been tagged yet. I think Aaron is just really busy on >>>> this front. >>> >>> I did tag the HEAD at RC1, so we could branch from there if we needed >>> to; if this is really the big bug-bear that Hilmar and Jason are >>> claiming, then I'd ask Allen to retract his patches that alter >>> interface definitions, and branch. >>> >>> And I was so hoping to get RC2 packaged up later today ... >>> >>> -Aaron >>> >>> -- >>> Aaron J. Mackey, Ph.D. >>> Dept. 
of Biology, Goddard 212 >>> University of Pennsylvania email: amackey@pcbi.upenn.edu >>> 415 S. University Avenue office: 215-898-1205 >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 >>> >>> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From cjm at fruitfly.org Tue Nov 23 17:31:47 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Nov 23 17:29:33 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> Message-ID: Unflattener.t is failing because someone has messed up get_tagset_values() - this is a convenience method I originally added to SeqFeatureI. I'm not familiar enough with the new changes and AnnotationCollections to fix this. Surely the onus has always been on the person making changes to make sure the test suite passes before committing their changes? In which case, how did these changes make it in in the first place? On Tue, 23 Nov 2004, Jason Stajich wrote: > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > >> I think if we just don't issue deprecation warnings it will be fine by > >> me -- even if we are just calling the new subroutine under the hood. > >> Tests seem to pass although Unflattner.t is falling over today not > >> sure > >> what is problem. > > > > that fails for me too, in addition to spewing out lots of > > diagnotistics. 
> > however, if you run 'make test_Unflattener2', it passes. strange. > > > no it is Unflattner not Unflattner2 > > % make test_Unflattener > [SNIP OUT SOME STUFF] > > -------------------- WARNING --------------------- > MSG: get_tagset_values() is deprecated. use get_Annotations() > --------------------------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is > not implemented by package Bio::SeqFeature::Generic. > > > > -allen > > > >> > >> -jason > >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > >> > >>> > >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > >>>> > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > >>>>> their > >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > >>>>> methods. > >>>>> > >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i > >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method > >>>>> remapping were created in Bio::AnnotationCollecitonI. > >>> > >>> I've been paying some attention to this, but thought that the changes > >>> were only those required to get Bio::FeatureIO working (i.e. > >>> recapitulate GFF3 logic) without hampering object usage; do our tests > >>> pass with these changes in place? > >>> > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > >>> > >>>> it has not been tagged yet. I think Aaron is just really busy on > >>>> this front. > >>> > >>> I did tag the HEAD at RC1, so we could branch from there if we needed > >>> to; if this is really the big bug-bear that Hilmar and Jason are > >>> claiming, then I'd ask Allen to retract his patches that alter > >>> interface definitions, and branch. > >>> > >>> And I was so hoping to get RC2 packaged up later today ... 
> >>> > >>> -Aaron > >>> > >>> -- > >>> Aaron J. Mackey, Ph.D. > >>> Dept. of Biology, Goddard 212 > >>> University of Pennsylvania email: amackey@pcbi.upenn.edu > >>> 415 S. University Avenue office: 215-898-1205 > >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > >>> > >>> > >> -- > >> Jason Stajich > >> jason.stajich at duke.edu > >> http://www.duke.edu/~jes12/ > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From cjm at fruitfly.org Tue Nov 23 17:41:10 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Nov 23 17:38:56 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <8E9219E6-3D8E-11D9-B1B7-000D93392082@pcbi.upenn.edu> Message-ID: On Tue, 23 Nov 2004, Allen Day wrote: [snip] > > While it "works", I don't see why it had to be done this way; I'd > > prefer a solution that didn't involve changing the definition of > > Bio::SeqFeatureI (at least not yet). Further, your "deprecations" seem > > to indicate that you simply want Bio::SeqFeatureI to go away entirely, > > and have us treat everything as an annotation collection (which may or > > may not have a location on a sequence). Is this the agreed-upon way of > > the future? What are you gaining, besides forcing my fingers to learn > > get_Annotations($tag) instead of get_tag_values($tag)? > > you've got the gist of it. 
i want features to be AnnotatableI, not be
> AnnotationCollectionI themselves. the main payoffs i see here are:
>
> [1] consistency amongst annotated objects. why should SeqFeatureI do
> things differently?
>
> [2] possibility of strong annotation typing (e.g.
> Bio::Annotation::OntologyTerm instead of a plaintext string). this is the
> main reason driving the changes.

Allen,

I'm all for using ontologies for metadata and strong typing within
bioperl, but I'm concerned that this is done right, and in such a way
that it has minimal impact on the complexity and memory footprint of
bioperl.

I think that if you are edging us in the direction of [2] you should come
up with a document describing your plan. I know this goes against the
bioperl extreme programming ethos of "code first, ask questions later"
but I'm quite concerned about this.

Cheers
Chris

From allenday at ucla.edu Tue Nov 23 18:24:24 2004
From: allenday at ucla.edu (Allen Day)
Date: Tue Nov 23 18:22:19 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu>
Message-ID:

okay, i will look into this. Unflattener.t was passing for me, but
Unflattener2.t was not. Jason, is Unflattener2.t passing for you?

-Allen

On Tue, 23 Nov 2004, Chris Mungall wrote:

>
> Unflattener.t is failing because someone has messed up get_tagset_values()
> - this is a convenience method I originally added to SeqFeatureI. I'm not
> familiar enough with the new changes and AnnotationCollections to fix
> this.
>
> Surely the onus has always been on the person making changes to make sure
> the test suite passes before committing their changes? In which case, how
> did these changes make it in in the first place?
> > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > >> I think if we just don't issue deprecation warnings it will be fine by > > >> me -- even if we are just calling the new subroutine under the hood. > > >> Tests seem to pass although Unflattner.t is falling over today not > > >> sure > > >> what is problem. > > > > > > that fails for me too, in addition to spewing out lots of > > > diagnotistics. > > > however, if you run 'make test_Unflattener2', it passes. strange. > > > > > no it is Unflattner not Unflattner2 > > > > % make test_Unflattener > > [SNIP OUT SOME STUFF] > > > > -------------------- WARNING --------------------- > > MSG: get_tagset_values() is deprecated. use get_Annotations() > > --------------------------------------------------- > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is > > not implemented by package Bio::SeqFeature::Generic. > > > > > > > -allen > > > > > >> > > >> -jason > > >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > > >> > > >>> > > >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > > >>>> > > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > > >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > > >>>>> their > > >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > > >>>>> methods. > > >>>>> > > >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i > > >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method > > >>>>> remapping were created in Bio::AnnotationCollecitonI. > > >>> > > >>> I've been paying some attention to this, but thought that the changes > > >>> were only those required to get Bio::FeatureIO working (i.e. 
> > >>> recapitulate GFF3 logic) without hampering object usage; do our tests > > >>> pass with these changes in place? > > >>> > > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > >>> > > >>>> it has not been tagged yet. I think Aaron is just really busy on > > >>>> this front. > > >>> > > >>> I did tag the HEAD at RC1, so we could branch from there if we needed > > >>> to; if this is really the big bug-bear that Hilmar and Jason are > > >>> claiming, then I'd ask Allen to retract his patches that alter > > >>> interface definitions, and branch. > > >>> > > >>> And I was so hoping to get RC2 packaged up later today ... > > >>> > > >>> -Aaron > > >>> > > >>> -- > > >>> Aaron J. Mackey, Ph.D. > > >>> Dept. of Biology, Goddard 212 > > >>> University of Pennsylvania email: amackey@pcbi.upenn.edu > > >>> 415 S. University Avenue office: 215-898-1205 > > >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > > >>> > > >>> > > >> -- > > >> Jason Stajich > > >> jason.stajich at duke.edu > > >> http://www.duke.edu/~jes12/ > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l@portal.open-bio.org > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > Jason Stajich > > jason.stajich at duke.edu > > http://www.duke.edu/~jes12/ > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From jason.stajich at duke.edu Tue Nov 23 20:48:54 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Nov 23 20:46:41 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> 
<28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> Message-ID:

On Nov 23, 2004, at 6:24 PM, Allen Day wrote:

> okay, i will look into this. Unflattener.t was passing for me, but
> Unflattener2.t was not. Jason, is Unflattener2.t passing for you?
>
> -Allen

[jes12@head2 core]$ make test_Unflattener2
PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib -I/usr/lib/perl5/5.8.0/i386-linux-thread-multi -I/usr/lib/perl5/5.8.0 -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' t/Unflattener2.t
t/Unflattener2....ok
All tests successful.
Files=1, Tests=11, 2 wallclock secs ( 1.40 cusr + 0.07 csys = 1.47 CPU)

Tests failing for t/Unflattener.t as I reported before.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu Tue Nov 23 21:16:09 2004
From: allenday at ucla.edu (Allen Day)
Date: Tue Nov 23 21:13:58 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu>
Message-ID:

Fixed. Here is a summary of what I did to make this happen. I went ahead and did the work necessary to make Bio::SeqFeatureI AnnotatableI instead of being itself an AnnotationCollectionI.

 . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT Bio::AnnotationCollectionI
 . *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function()
 . method behavior is now more similar to original *_tag_* method behavior ; tag "values" are now instantiated as Bio::Annotation::SimpleValue objects by default, unless their name indicates they should be otherwise (e.g. tag name "comment" or "dblink")
 . deprecation warnings commented until 1.6
 . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor).
 . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance.
 . Unflattener and Unflattener2 tests pass with no changes.
 . All tests pass.

-Allen

On Tue, 23 Nov 2004, Chris Mungall wrote: > > Unflattener.t is failing because someone has messed up get_tagset_values() > - this is a convenience method I originally added to SeqFeatureI. I'm not > familiar enough with the new changes and AnnotationCollections to fix > this. > > Surely the onus has always been on the person making changes to make sure > the test suite passes before committing their changes? In which case, how > did these changes make it in in the first place? > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > >> I think if we just don't issue deprecation warnings it will be fine by > > >> me -- even if we are just calling the new subroutine under the hood. > > >> Tests seem to pass although Unflattner.t is falling over today not > > >> sure > > >> what is problem. > > > > > > that fails for me too, in addition to spewing out lots of > > > diagnotistics. > > > however, if you run 'make test_Unflattener2', it passes. strange. > > > > > no it is Unflattner not Unflattner2 > > > > % make test_Unflattener > > [SNIP OUT SOME STUFF] > > > > -------------------- WARNING --------------------- > > MSG: get_tagset_values() is deprecated. use get_Annotations() > > --------------------------------------------------- > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is > > not implemented by package Bio::SeqFeature::Generic. > > > > > > > -allen > > > > > >> > > >> -jason > > >> On Nov 23, 2004, at 2:28 PM, Aaron J. 
Mackey wrote: > > >> > > >>> > > >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > > >>>> > > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > > >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > > >>>>> their > > >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > > >>>>> methods. > > >>>>> > > >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i > > >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method > > >>>>> remapping were created in Bio::AnnotationCollecitonI. > > >>> > > >>> I've been paying some attention to this, but thought that the changes > > >>> were only those required to get Bio::FeatureIO working (i.e. > > >>> recapitulate GFF3 logic) without hampering object usage; do our tests > > >>> pass with these changes in place? > > >>> > > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > >>> > > >>>> it has not been tagged yet. I think Aaron is just really busy on > > >>>> this front. > > >>> > > >>> I did tag the HEAD at RC1, so we could branch from there if we needed > > >>> to; if this is really the big bug-bear that Hilmar and Jason are > > >>> claiming, then I'd ask Allen to retract his patches that alter > > >>> interface definitions, and branch. > > >>> > > >>> And I was so hoping to get RC2 packaged up later today ... > > >>> > > >>> -Aaron > > >>> > > >>> -- > > >>> Aaron J. Mackey, Ph.D. > > >>> Dept. of Biology, Goddard 212 > > >>> University of Pennsylvania email: amackey@pcbi.upenn.edu > > >>> 415 S. 
University Avenue office: 215-898-1205 > > >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > > >>> > > >>> > > >> -- > > >> Jason Stajich > > >> jason.stajich at duke.edu > > >> http://www.duke.edu/~jes12/ > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l@portal.open-bio.org > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > Jason Stajich > > jason.stajich at duke.edu > > http://www.duke.edu/~jes12/ > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From cjm at fruitfly.org Tue Nov 23 22:07:33 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Nov 23 22:05:35 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> Message-ID: So let's see: AnnotatableI->annotation returns AnnotationCollectionI (*not* an AnnotationI) AnnotationCollectionI->get_Annotations returns list-of AnnotationI why can't accessor methods be named after the class of objects they return, rather than a different class? It makes things a lot easier for the easily confused like myself. More seriously: is the plan to move everyone across from SeqFeatureI->get_tag_values to SeqFeatureI->annotation? Have you considered the impact on the memory footprint and speed? Especially for large genbank files. Cheers Chris On Tue, 23 Nov 2004, Allen Day wrote: > Fixed. Here is a summary of what I did to make this happen. I went ahead > and did the work necessary to make Bio::SeqFeatureI AnnotatableI instead > of being itself an AnnotationCollectionI. > > . 
Bio::SeqFeatureI inherits Bio::AnnotatableI NOT > Bio::AnnotationCollectionI > . *_tag_* methods are in Bio::AnnotatableI, and internally defer to > Bio::AnnotatableI->annotation->some_analagous_mapped_function() > . method behavior is now more similar to original *_tag_* method > behavior ; tag "values" are now instantiated as > Bio::Annotation::SimpleValue objects by default, unless their name > indicates they should be otherwise (e.g. tag name "comment" or > "dblink") > . deprecation warnings commented until 1.6 > . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow > new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). > . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* > methods map directly onto Bio::AnnotationI's > Bio::AnnotationCollectionI instance. > . Unflattener and Unflattener2 tests pass with no changes. > . All tests pass. > > -Allen > > > On Tue, 23 Nov 2004, Chris Mungall wrote: > > > > > Unflattener.t is failing because someone has messed up get_tagset_values() > > - this is a convenience method I originally added to SeqFeatureI. I'm not > > familiar enough with the new changes and AnnotationCollections to fix > > this. > > > > Surely the onus has always been on the person making changes to make sure > > the test suite passes before committing their changes? In which case, how > > did these changes make it in in the first place? > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > > > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > > > > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > > > >> I think if we just don't issue deprecation warnings it will be fine by > > > >> me -- even if we are just calling the new subroutine under the hood. > > > >> Tests seem to pass although Unflattner.t is falling over today not > > > >> sure > > > >> what is problem. > > > > > > > > that fails for me too, in addition to spewing out lots of > > > > diagnotistics. 
> > > > however, if you run 'make test_Unflattener2', it passes. strange. > > > > > > > no it is Unflattner not Unflattner2 > > > > > > % make test_Unflattener > > > [SNIP OUT SOME STUFF] > > > > > > -------------------- WARNING --------------------- > > > MSG: get_tagset_values() is deprecated. use get_Annotations() > > > --------------------------------------------------- > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is > > > not implemented by package Bio::SeqFeature::Generic. > > > > > > > > > > -allen > > > > > > > >> > > > >> -jason > > > >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > > > >> > > > >>> > > > >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > > > >>>> > > > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > > > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > > > >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > > > >>>>> their > > > >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > > > >>>>> methods. > > > >>>>> > > > >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i > > > >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method > > > >>>>> remapping were created in Bio::AnnotationCollecitonI. > > > >>> > > > >>> I've been paying some attention to this, but thought that the changes > > > >>> were only those required to get Bio::FeatureIO working (i.e. > > > >>> recapitulate GFF3 logic) without hampering object usage; do our tests > > > >>> pass with these changes in place? > > > >>> > > > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > > >>> > > > >>>> it has not been tagged yet. I think Aaron is just really busy on > > > >>>> this front. 
> > > >>> > > > >>> I did tag the HEAD at RC1, so we could branch from there if we needed > > > >>> to; if this is really the big bug-bear that Hilmar and Jason are > > > >>> claiming, then I'd ask Allen to retract his patches that alter > > > >>> interface definitions, and branch. > > > >>> > > > >>> And I was so hoping to get RC2 packaged up later today ... > > > >>> > > > >>> -Aaron > > > >>> > > > >>> -- > > > >>> Aaron J. Mackey, Ph.D. > > > >>> Dept. of Biology, Goddard 212 > > > >>> University of Pennsylvania email: amackey@pcbi.upenn.edu > > > >>> 415 S. University Avenue office: 215-898-1205 > > > >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > > > >>> > > > >>> > > > >> -- > > > >> Jason Stajich > > > >> jason.stajich at duke.edu > > > >> http://www.duke.edu/~jes12/ > > > >> > > > >> _______________________________________________ > > > >> Bioperl-l mailing list > > > >> Bioperl-l@portal.open-bio.org > > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >> > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > -- > > > Jason Stajich > > > jason.stajich at duke.edu > > > http://www.duke.edu/~jes12/ > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From cjm at fruitfly.org Tue Nov 23 22:24:21 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Nov 23 22:22:12 2004 Subject: XML vs AnnotationCollectionI [was Re: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes] In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> Message-ID: You know, it strikes me that the whole AnnotationCollectionI framework is really just a recreation of xml. 
I'm not sure what the advantages of AC over xml are - but I can see plenty of advantages of xml over AC. with xml under the hood, you can still implement the exact same OO methods for accessing tags and values. But you potentially get more for free: potentially faster lookup times and smaller memory footprints; various validation choices - RNG/XML-Schema/DTDs (right now ACs are weakly typed which is not good for some s/w engineering tasks); potentially more interoperation between bio* projects; powerful querying and transformation choices; using standards; auto-serialization One idiom I have used in a project was to attach rich annotations to features using the existing tag-value system, but only using a single tag of type 'xml', and this served rather nicely. What I'm talking about is something more radical in that the entire tag-value hash would be replaced by an xml structure. of course, convenient get_tag_values style accessors would remain in place, and would query this structure. I may play around with something like this on a clean branch if there's anyone else who doesn't think this is a mad idea and may actually use the final results.... The only hindrance I can see is that sometimes AC is used to hold perl objects rather than recursive tag-value hashes; there would need to be a way of auto-reconstituting these - possibly from IDs.... Cheers Chris On Tue, 23 Nov 2004, Allen Day wrote: > Fixed. Here is a summary of what I did to make this happen. I went ahead > and did the work necessary to make Bio::SeqFeatureI AnnotatableI instead > of being itself an AnnotationCollectionI. > > . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT > Bio::AnnotationCollectionI > . *_tag_* methods are in Bio::AnnotatableI, and internally defer to > Bio::AnnotatableI->annotation->some_analagous_mapped_function() > . 
method behavior is now more similar to original *_tag_* method > behavior ; tag "values" are now instantiated as > Bio::Annotation::SimpleValue objects by default, unless their name > indicates they should be otherwise (e.g. tag name "comment" or > "dblink") > . deprecation warnings commented until 1.6 > . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow > new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). > . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* > methods map directly onto Bio::AnnotationI's > Bio::AnnotationCollectionI instance. > . Unflattener and Unflattener2 tests pass with no changes. > . All tests pass. > > -Allen > > > On Tue, 23 Nov 2004, Chris Mungall wrote: > > > > > Unflattener.t is failing because someone has messed up get_tagset_values() > > - this is a convenience method I originally added to SeqFeatureI. I'm not > > familiar enough with the new changes and AnnotationCollections to fix > > this. > > > > Surely the onus has always been on the person making changes to make sure > > the test suite passes before committing their changes? In which case, how > > did these changes make it in in the first place? > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > > > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > > > > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > > > >> I think if we just don't issue deprecation warnings it will be fine by > > > >> me -- even if we are just calling the new subroutine under the hood. > > > >> Tests seem to pass although Unflattner.t is falling over today not > > > >> sure > > > >> what is problem. > > > > > > > > that fails for me too, in addition to spewing out lots of > > > > diagnotistics. > > > > however, if you run 'make test_Unflattener2', it passes. strange. 
> > > > > > > no it is Unflattner not Unflattner2 > > > > > > % make test_Unflattener > > > [SNIP OUT SOME STUFF] > > > > > > -------------------- WARNING --------------------- > > > MSG: get_tagset_values() is deprecated. use get_Annotations() > > > --------------------------------------------------- > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is > > > not implemented by package Bio::SeqFeature::Generic. > > > > > > > > > > -allen > > > > > > > >> > > > >> -jason > > > >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > > > >> > > > >>> > > > >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > > > >>>> > > > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > > > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > > > >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > > > >>>>> their > > > >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > > > >>>>> methods. > > > >>>>> > > > >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i > > > >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method > > > >>>>> remapping were created in Bio::AnnotationCollecitonI. > > > >>> > > > >>> I've been paying some attention to this, but thought that the changes > > > >>> were only those required to get Bio::FeatureIO working (i.e. > > > >>> recapitulate GFF3 logic) without hampering object usage; do our tests > > > >>> pass with these changes in place? > > > >>> > > > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > > >>> > > > >>>> it has not been tagged yet. I think Aaron is just really busy on > > > >>>> this front. 
> > > >>> > > > >>> I did tag the HEAD at RC1, so we could branch from there if we needed > > > >>> to; if this is really the big bug-bear that Hilmar and Jason are > > > >>> claiming, then I'd ask Allen to retract his patches that alter > > > >>> interface definitions, and branch. > > > >>> > > > >>> And I was so hoping to get RC2 packaged up later today ... > > > >>> > > > >>> -Aaron > > > >>> > > > >>> -- > > > >>> Aaron J. Mackey, Ph.D. > > > >>> Dept. of Biology, Goddard 212 > > > >>> University of Pennsylvania email: amackey@pcbi.upenn.edu > > > >>> 415 S. University Avenue office: 215-898-1205 > > > >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > > > >>> > > > >>> > > > >> -- > > > >> Jason Stajich > > > >> jason.stajich at duke.edu > > > >> http://www.duke.edu/~jes12/ > > > >> > > > >> _______________________________________________ > > > >> Bioperl-l mailing list > > > >> Bioperl-l@portal.open-bio.org > > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >> > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > -- > > > Jason Stajich > > > jason.stajich at duke.edu > > > http://www.duke.edu/~jes12/ > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From allenday at ucla.edu Tue Nov 23 22:48:40 2004 From: allenday at ucla.edu (Allen Day) Date: Tue Nov 23 22:46:26 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> Message-ID: On Tue, 23 Nov 2004, Chris Mungall wrote: > > So let's see: > > AnnotatableI->annotation returns AnnotationCollectionI > (*not* an AnnotationI) > > 
AnnotationCollectionI->get_Annotations returns list-of AnnotationI
>
> why can't accessor methods be named after the class of objects they
> return, rather than a different class? It makes things a lot easier for
> the easily confused like myself.

I agree, but this is how it was when I found it.

> More seriously: is the plan to move everyone across from
> SeqFeatureI->get_tag_values to SeqFeatureI->annotation?

Yes.

> Have you considered the impact on the memory footprint and speed?
> Especially for large genbank files.

Yes. For heavy lifting I think we'll need to have alternate AnnotationI/AnnotationCollectionI lightweight implementations that are stripped down and optimized for memory and speed. I imagine they'll be array-based and the AnnotationCollectionI will use a flyweight pattern. whatever is generating the SeqFeatureIs will need to be responsible for which type of AnnotationCollection is used.

-Allen

> Cheers > Chris > > > On Tue, 23 Nov 2004, Allen Day wrote: > > > Fixed. Here is a summary of what I did to make this happen. I went ahead > > and did the work necessary to make Bio::SeqFeatureI AnnotatableI instead > > of being itself an AnnotationCollectionI. > > > > . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT > > Bio::AnnotationCollectionI > > . *_tag_* methods are in Bio::AnnotatableI, and internally defer to > > Bio::AnnotatableI->annotation->some_analagous_mapped_function() > > . method behavior is now more similar to original *_tag_* method > > behavior ; tag "values" are now instantiated as > > Bio::Annotation::SimpleValue objects by default, unless their name > > indicates they should be otherwise (e.g. tag name "comment" or > > "dblink") > > . deprecation warnings commented until 1.6 > > . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow > > new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). > > . 
Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* > > methods map directly onto Bio::AnnotationI's > > Bio::AnnotationCollectionI instance. > > . Unflattener and Unflattener2 tests pass with no changes. > > . All tests pass. > > > > -Allen > > > > > > On Tue, 23 Nov 2004, Chris Mungall wrote: > > > > > > > > Unflattener.t is failing because someone has messed up get_tagset_values() > > > - this is a convenience method I originally added to SeqFeatureI. I'm not > > > familiar enough with the new changes and AnnotationCollections to fix > > > this. > > > > > > Surely the onus has always been on the person making changes to make sure > > > the test suite passes before committing their changes? In which case, how > > > did these changes make it in in the first place? > > > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > > > > > > > On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > > > > > > > > > On Tue, 23 Nov 2004, Jason Stajich wrote: > > > > > > > > > >> I think if we just don't issue deprecation warnings it will be fine by > > > > >> me -- even if we are just calling the new subroutine under the hood. > > > > >> Tests seem to pass although Unflattner.t is falling over today not > > > > >> sure > > > > >> what is problem. > > > > > > > > > > that fails for me too, in addition to spewing out lots of > > > > > diagnotistics. > > > > > however, if you run 'make test_Unflattener2', it passes. strange. > > > > > > > > > no it is Unflattner not Unflattner2 > > > > > > > > % make test_Unflattener > > > > [SNIP OUT SOME STUFF] > > > > > > > > -------------------- WARNING --------------------- > > > > MSG: get_tagset_values() is deprecated. use get_Annotations() > > > > --------------------------------------------------- > > > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is > > > > not implemented by package Bio::SeqFeature::Generic. 
> > > > > > > > > > > > > -allen > > > > > > > > > >> > > > > >> -jason > > > > >> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > > > > >> > > > > >>> > > > > >>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > > > > >>>> > > > > >>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > > > > >>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > > > > >>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped to > > > > >>>>> their > > > > >>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > > > > >>>>> methods. > > > > >>>>> > > > > >>>>> Methods which were not in Bio::AnnotationCollectionI, but were i > > > > >>>>> Bio::Annotation::Collection and were necessary for *_tag_* method > > > > >>>>> remapping were created in Bio::AnnotationCollecitonI. > > > > >>> > > > > >>> I've been paying some attention to this, but thought that the changes > > > > >>> were only those required to get Bio::FeatureIO working (i.e. > > > > >>> recapitulate GFF3 logic) without hampering object usage; do our tests > > > > >>> pass with these changes in place? > > > > >>> > > > > >>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > > > > >>> > > > > >>>> it has not been tagged yet. I think Aaron is just really busy on > > > > >>>> this front. > > > > >>> > > > > >>> I did tag the HEAD at RC1, so we could branch from there if we needed > > > > >>> to; if this is really the big bug-bear that Hilmar and Jason are > > > > >>> claiming, then I'd ask Allen to retract his patches that alter > > > > >>> interface definitions, and branch. > > > > >>> > > > > >>> And I was so hoping to get RC2 packaged up later today ... > > > > >>> > > > > >>> -Aaron > > > > >>> > > > > >>> -- > > > > >>> Aaron J. Mackey, Ph.D. > > > > >>> Dept. of Biology, Goddard 212 > > > > >>> University of Pennsylvania email: amackey@pcbi.upenn.edu > > > > >>> 415 S. 
University Avenue office: 215-898-1205 > > > > >>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > > > > >>> > > > > >>> > > > > >> -- > > > > >> Jason Stajich > > > > >> jason.stajich at duke.edu > > > > >> http://www.duke.edu/~jes12/ > > > > >> > > > > >> _______________________________________________ > > > > >> Bioperl-l mailing list > > > > >> Bioperl-l@portal.open-bio.org > > > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > >> > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@portal.open-bio.org > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > -- > > > > Jason Stajich > > > > jason.stajich at duke.edu > > > > http://www.duke.edu/~jes12/ > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > >

From mlemieux at bioinfo.ca Wed Nov 24 04:22:26 2004
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Wed Nov 24 04:24:36 2004
Subject: [Bioperl-l] Easy switching from wwwBlast to QBlast
Message-ID: <5B03F9DE-3DFA-11D9-AF99-000A95B139D2@bioinfo.ca>

I've just recently started exploring BioPerl (v.1.4). So far it's been fun if a little daunting.

As an exercise, I decided to try changing the blast_sequence subroutine in Perl.pm so that it would let me send the query either to my local wwwBlast server or out over my slow, flaky internet connection to the QBlast server. I did this by adding a parameter LOCALSERVER which, if set to a URL, redirects the query to that server (e.g. LOCALSERVER => http://localhost/blast/blast.cgi); otherwise, it defaults to the server at the NCBI.
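The LOCALSERVER fallback described above could be sketched roughly as follows. This is only an illustration, not the code from my modified Perl.pm: the helper name choose_blast_url and the variable $DEFAULT_QBLAST_URL are hypothetical, and the NCBI address is a placeholder assumption for whatever endpoint RemoteBlast.pm actually uses.

```perl
use strict;
use warnings;

# Assumed placeholder for the default NCBI QBlast endpoint (not taken from
# RemoteBlast.pm; substitute the real URL used by your BioPerl version).
my $DEFAULT_QBLAST_URL = 'http://www.ncbi.nlm.nih.gov/blast/Blast.cgi';

# Hypothetical helper: pick the server URL from named parameters.
# A defined, non-empty LOCALSERVER wins; otherwise fall back to the default.
sub choose_blast_url {
    my (%args) = @_;
    my $local = $args{LOCALSERVER};
    return (defined $local && length $local) ? $local : $DEFAULT_QBLAST_URL;
}

# Query goes to the local wwwBlast server.
print choose_blast_url(LOCALSERVER => 'http://localhost/blast/blast.cgi'), "\n";

# No LOCALSERVER given: query goes to the default server.
print choose_blast_url(), "\n";
```

Keeping the choice in one small function means the rest of the submission code does not need to know which server it is talking to.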
I've also added support for querying by accession or GI number (QBlast only, since wwwBlast doesn't support such queries), submission of multiple sequences (either in a file or string or string variable), as well as passing any of the QBlast Put and Get options as parameters. Unlike the original one, my blast_sequence returns an array of results, not a single result, so code calling my version of blast_sequence in a scalar context would incorrectly get the size of the array.

Apart from Perl.pm, the only other file that I had to change was Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed in ways that overlap with the changes I've made while maintaining backwards compatibility, which my version does not, since I was only working for myself at the time.

So my question is: is anyone interested in getting the code I've developed? If so, a corollary question is: how do I go about contributing the code? I can pretty easily forward port my changes to RemoteBlast.pm to the 1.5.RC1 version in order to use the nice "validate by regexp" trick introduced there and to provide backwards compatibility. I'm not sure what to do about the Perl.pm module, though. I guess that the easiest would be to change the name of my blast_sequence subroutine and add it to Perl.pm since there is no object interface being altered.

As I was working on this, I noticed that the HTML stripping that gets done on the response from the QBlast server fails on wwwBlast output since the format of the HTML is a little different (manifests as a "can't find mid-line data" error when processing the alignments). So I wrote a generic stripper which removes all HTML tags except those that contain an end-of-line within the tag itself or an internal, un-escaped closing angle bracket (>), which wouldn't be valid HTML anyway, I think. It doesn't touch single angle brackets (>) such as those found at the beginning of descriptions (>gi ...).
    # html stripper
    # remove simple and closing tags first and then leftover tags
    $str =~ s/<(\/)?\w+>//g;
    $str =~ s/<\D+([^>]*\n*)*>//g;

Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test for completion relies on the size of the file containing the reply. This has failed at least once for me. Since there is a status line near the top of the file in the response, it seems to me that something along the lines of the following might be more robust:

    # read file until QBlastInfoEnd to pull out status
    my $status = '';
    my $junk = '';
    open(TMP, $tempfile) or $self->throw("cannot open $tempfile");
    while( defined (my $line = <TMP>) ) {
        last if ($line =~ /QBlastInfoEnd/);
        ($junk, $status) = (split /=/, $line) if ($line =~ /waiting|ready/i);
    }
    close TMP;
    if( $response->is_success ) {
        if ( $status =~ /waiting/i ) {
            return 0;
        } elsif ( $status =~ /ready/i ) {
            ...
        } else { # failed
            ...
        }
    }
    ...

Finally, let me end by thanking all the BioPerl contributors for their fine work.

Regards,
Madeleine

From amackey at pcbi.upenn.edu Wed Nov 24 08:55:42 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Nov 24 08:53:30 2004
Subject: XML vs AnnotationCollectionI [was Re: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes]
In-Reply-To:
References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu>
Message-ID: <87BE1AFA-3E20-11D9-A02C-000D93392082@pcbi.upenn.edu>

I think this is a grand idea for what I've called BioPerl "nouveau", in which we may use any and all existing technologies at our disposal, regardless of how convoluted and difficult the installation may be, runtime-instantiated methods (even classes) are allowed, and where we only care marginally about backwards compatibility, and instead focus on doing it "right" (of course there are large arguments about this).
In the "nouveau" project, interfaces are hidden from the naive end-user, documentation is highly structured (and inheritable), parsing is block- and event-based, and there are flexible API "adaptors" to provide ultra-simple, light-weight or heavy-weight access to underlying objects. Sounds great, but still a pipe-dream, unfortunately. I've been (slowly) writing some use cases for bioperl-nouveau which are really just example end-user code making use of simple, light-weight and heavy-weight APIs. Since this is still just a hobby for me, I haven't made them public yet; is there enough interest (yet) to talk about what a next-generation BioPerl might look like? -Aaron P.S. But Chris, if you actually wanted to get some real work done, you're welcome to just make a branch and go crazy ... whether we end up using it in the existing BioPerl depends on whether it ends up being more useful than the current implementation, and whether the audience seems willing to install XML tools for such a "core" functionality. On Nov 23, 2004, at 10:24 PM, Chris Mungall wrote: > I may play around with something like this on a clean branch if there's > anyone else who doesn't think this is a mad idea and may actually use > the > final results.... -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From amackey at pcbi.upenn.edu Wed Nov 24 08:58:51 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Nov 24 08:56:40 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: <919BED67-3D83-11D9-A3F1-000393C44276@duke.edu> <28C09BAC-3D9C-11D9-A3F1-000393C44276@duke.edu> Message-ID: Allen, thanks a ton for going the extra mile. Hilmar, does this solution satisfy your worries a bit? Thanks again to everyone, -Aaron On Nov 23, 2004, at 9:16 PM, Allen Day wrote: > Fixed. 
Here is a summary of what I did to make this happen. I went > ahead > and did the work necessary to make Bio::SeqFeatureI AnnotatableI > instead > of being itself an AnnotationCollectionI. > > . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT > Bio::AnnotationCollectionI > . *_tag_* methods are in Bio::AnnotatableI, and internally defer to > Bio::AnnotatableI->annotation->some_analagous_mapped_function() > . method behavior is now more similar to original *_tag_* method > behavior ; tag "values" are now instantiated as > Bio::Annotation::SimpleValue objects by default, unless their name > indicates they should be otherwise (e.g. tag name "comment" or > "dblink") > . deprecation warnings commented until 1.6 > . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow > new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). > . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as > *_tag_* > methods map directly onto Bio::AnnotationI's > Bio::AnnotationCollectionI instance. > . Unflattener and Unflattener2 tests pass with no changes. > . All tests pass. > > -Allen > > > On Tue, 23 Nov 2004, Chris Mungall wrote: > >> >> Unflattener.t is failing because someone has messed up >> get_tagset_values() >> - this is a convenience method I originally added to SeqFeatureI. I'm >> not >> familiar enough with the new changes and AnnotationCollections to fix >> this. >> >> Surely the onus has always been on the person making changes to make >> sure >> the test suite passes before committing their changes? In which case, >> how >> did these changes make it in in the first place? >> >> On Tue, 23 Nov 2004, Jason Stajich wrote: >> >>> >>> On Nov 23, 2004, at 4:47 PM, Allen Day wrote: >>> >>>> On Tue, 23 Nov 2004, Jason Stajich wrote: >>>> >>>>> I think if we just don't issue deprecation warnings it will be >>>>> fine by >>>>> me -- even if we are just calling the new subroutine under the >>>>> hood. 
>>>>> Tests seem to pass although Unflattner.t is falling over today not >>>>> sure >>>>> what is problem. >>>> >>>> that fails for me too, in addition to spewing out lots of >>>> diagnotistics. >>>> however, if you run 'make test_Unflattener2', it passes. strange. >>>> >>> no it is Unflattner not Unflattner2 >>> >>> % make test_Unflattener >>> [SNIP OUT SOME STUFF] >>> >>> -------------------- WARNING --------------------- >>> MSG: get_tagset_values() is deprecated. use get_Annotations() >>> --------------------------------------------------- >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" is >>> not implemented by package Bio::SeqFeature::Generic. >>> >>> >>>> -allen >>>> >>>>> >>>>> -jason >>>>> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: >>>>> >>>>>> >>>>>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: >>>>>>> >>>>>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI >>>>>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to >>>>>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped >>>>>>>> to >>>>>>>> their >>>>>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI >>>>>>>> methods. >>>>>>>> >>>>>>>> Methods which were not in Bio::AnnotationCollectionI, but >>>>>>>> were i >>>>>>>> Bio::Annotation::Collection and were necessary for *_tag_* >>>>>>>> method >>>>>>>> remapping were created in Bio::AnnotationCollecitonI. >>>>>> >>>>>> I've been paying some attention to this, but thought that the >>>>>> changes >>>>>> were only those required to get Bio::FeatureIO working (i.e. >>>>>> recapitulate GFF3 logic) without hampering object usage; do our >>>>>> tests >>>>>> pass with these changes in place? >>>>>> >>>>>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: >>>>>> >>>>>>> it has not been tagged yet. I think Aaron is just really busy on >>>>>>> this front. 
>>>>>> >>>>>> I did tag the HEAD at RC1, so we could branch from there if we >>>>>> needed >>>>>> to; if this is really the big bug-bear that Hilmar and Jason are >>>>>> claiming, then I'd ask Allen to retract his patches that alter >>>>>> interface definitions, and branch. >>>>>> >>>>>> And I was so hoping to get RC2 packaged up later today ... >>>>>> >>>>>> -Aaron >>>>>> >>>>>> -- >>>>>> Aaron J. Mackey, Ph.D. >>>>>> Dept. of Biology, Goddard 212 >>>>>> University of Pennsylvania email: amackey@pcbi.upenn.edu >>>>>> 415 S. University Avenue office: 215-898-1205 >>>>>> Philadelphia, PA 19104-6017 fax: 215-746-6697 >>>>>> >>>>>> >>>>> -- >>>>> Jason Stajich >>>>> jason.stajich at duke.edu >>>>> http://www.duke.edu/~jes12/ >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> -- >>> Jason Stajich >>> jason.stajich at duke.edu >>> http://www.duke.edu/~jes12/ >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697

From grossman at molgen.mpg.de Wed Nov 24 09:23:18 2004
From: grossman at molgen.mpg.de (Steffen Grossmann)
Date: Wed Nov 24 09:22:29 2004
Subject: [Bioperl-l] More on bioperl-live/Bio/FeatureIO gff.pm
In-Reply-To: <200411231357.02088.lstein@cshl.edu>
References: <200411161935.iAGJZBDT005226@pub.open-bio.org> <41A36C5C.5060104@molgen.mpg.de> <200411231357.02088.lstein@cshl.edu>
Message-ID: <41A49956.9070903@molgen.mpg.de>

Here are some comments on some parts of your last emails:

Lincoln Stein wrote:

>A group is absolutely not required to end with a ### directive. It is
>just a hint to the GFF parser that it no longer has to keep track of
>previously-loaded features in case a child appears somewhere toward
>the end of the file.
>
>Lincoln
>
>

That's exactly how I understand it.

From Allen:

>i interpret group to mean a set of items, each of which has 0..N
>conections to other members of the set, and 0 connections to members in
>other sets.
>

This is another true definition, but

>on a related note, maybe Bio::FeatureIO::GFF should (optionally) write a
>'###' into the filehandle after each time write_feature() is called.
>

this is not true, because it might happen that a feature appears as a subfeature of more than one higher-level feature. E.g. an exon can appear as part of two different transcripts. When writing out those two transcripts, you aren't allowed to put a '###' in between.

Here are some proposals for the concrete implementation:

1) Apart from 'next_feature' we implement two further methods 'next_feature_group' and 'next_seq'. As discussed, a group either ends with '###' or with the EOF. 'next_seq' of course only makes sense when there is a '##FASTA' directive at the end of the file.
2a) To be able to deal with large gff-files we introduce two switches 'track_feature_groups' and 'track_seqs' which default to 0 and can be set to 1 when creating the Bio::FeatureIO object. Only when those switches are set, users are allowed to call 'next_feature_group' or 'next_seq', respectively. The reason for this is that group or seq tracking can be very memory consuming in large gff-files without '###'s (because you never know what is to come...).

Alternatively:

2b) As soon as one of the three methods to access the data in the gff-file has been used for the first time after creating the Bio::FeatureIO object, the other two don't work any longer...

3) On the writing side implement methods like 'write_feature_group' (can be ended with '###') and 'write_seq'. In 'write_seq' we would have to internally remember all written sequences until the file is closed (to be realized with DESTROY).

Tell me your opinions and whether you have other ideas. I then start coding...

Steffen

P.S. I see that some of this functionality is already available in Bio::Tools::GFF. But as it seems there is a tendency away from it. I have no preference, I just would like to have full GFF3 functionality somewhere in bioperl...

--
%---------------------------------------------%
% Steffen Grossmann                           %
%                                             %
% Max Planck Institute for Molecular Genetics %
% Computational Molecular Biology             %
%---------------------------------------------%
% Ihnestrasse 73                              %
% 14195 Berlin                                %
% Germany                                     %
%---------------------------------------------%
% Tel: (++49 +30) 8413-1167                   %
% Fax: (++49 +30) 8413-1152                   %
%---------------------------------------------%

From ramiro.barrantes at uvm.edu Wed Nov 24 09:30:52 2004
From: ramiro.barrantes at uvm.edu (Ramiro Barrantes)
Date: Wed Nov 24 09:24:15 2004
Subject: [Bioperl-l] question on abi module
Message-ID:

I have the following simple program

----
#! /usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $filename = shift @ARGV;
my $in = Bio::SeqIO->new(-file => $filename, '-format' => 'abi');
my $bioseq = $in->next_seq();
my $seq = $bioseq->seq();
print $seq."\n";
----

on certain abi files, when it comes to

my $bioseq = $in->next_seq();

it prints an error

fread_abi(): Base positions are not in order. Fixing
fread_abi(): Base positions are not in order. Fixing
fread_abi(): Base positions are not in order. Fixing
...ad infinitum

and fills the whole error log and takes over the computer. What can I do?

Thanks,
Ramiro

From m.claesson at student.ucc.ie Wed Nov 24 12:43:51 2004
From: m.claesson at student.ucc.ie (Marcus Claesson)
Date: Wed Nov 24 12:41:44 2004
Subject: [Bioperl-l] How can I get add_sub_SeqFeature to work as I want?
Message-ID: <1101318231.14533.20.camel@morpheus.ucc.ie>

Hi!

I've had a problem for some time getting add_sub_SeqFeature to work for my own parsed blast data. Last time I posted it to the list nobody was able to help me, but I have now simplified the problem a bit.

I have blast parsed data like this in a text file (the columns are sbj_count, hsp_count, score, query_begin, query_end, strand; I left out sbj_name for simplicity):

1 1 445 1148 375 -1
1 2 341 1717 1151 -1
2 1 364 1148 378 -1
2 2 344 1690 1151 -1
3 1 283 1145 381 -1
3 2 233 1714 1151 -1
4 1 182 1154 375 -1
4 2 124 1702 1160 -1

The only way I know the hit has more than one hsp is if the next line has the same sbj_count. Based on this I'd like to construct a Graphics Panel as a graphical overview of the blast hits.
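The grouping rule described above (rows that share a sbj_count belong to one hit) can be sketched in plain Perl, independent of Bio::Graphics. This is only an illustration under stated assumptions: the columns are whitespace-separated in the order listed, and the helper name group_hsps is hypothetical, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): collect the rows of the
# parsed blast table into one entry per sbj_count, keeping input order.
sub group_hsps {
    my @lines = @_;
    my (%hits, @order);
    for my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($sbj_count, $hsp_count, $score, $qbegin, $qend, $strand)
            = split ' ', $line;
        # remember the order in which subjects first appear
        push @order, $sbj_count unless exists $hits{$sbj_count};
        push @{ $hits{$sbj_count} }, {
            hsp    => $hsp_count,
            score  => $score,
            start  => $qbegin,
            end    => $qend,
            strand => $strand,
        };
    }
    # one [ sbj_count, \@hsps ] pair per hit
    return map { [ $_, $hits{$_} ] } @order;
}

my @rows = (
    "1 1 445 1148 375 -1",
    "1 2 341 1717 1151 -1",
    "2 1 364 1148 378 -1",
);
for my $hit (group_hsps(@rows)) {
    my ($sbj, $hsps) = @$hit;
    printf "hit %s: %d hsp(s)\n", $sbj, scalar @$hsps;
}
```

Each group could then be turned into a single top-level feature with one subfeature per hsp, so that the track receives one feature per hit rather than one per row.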
Below is the script that shows every hsp as separate hits (I want hsps belonging to the same hit on one line):

#!/usr/bin/perl
use Bio::Graphics;
use Bio::SeqFeature::Generic;
my $panel = Bio::Graphics::Panel->new(-length    => 2000,
                                      -width     => 500,
                                      -pad_left  => 10,
                                      -pad_right => 10);

my $track = $panel->add_track(-glyph        => 'graded_segments',
                              -label        => 1,
                              -strand_arrow => 1,
                              -connector    => 'dashed',
                              -bgcolor      => 'blue',
                              -font2color   => 'red',
                              -sort_order   => 'score',
                              -description  => sub {
                                  my $feature = shift;
                                  my ($score) = $feature->score;
                                  "Score=$score"});

$old_sbj_count = 0;
while (<>) {
    ($sbj_count,$hsp_count,$score,$qbegin,$qend,$strand) = split;
    my $feature = Bio::SeqFeature::Generic->new(-start  => $qbegin,
                                                -end    => $qend,
                                                -strand => $strand,
                                                -score  => $score);
    if ($old_sbj_count eq $sbj_count) {
        my $subfeature = Bio::SeqFeature::Generic->new(
                                                -start  => $qbegin,
                                                -end    => $qend,
                                                -strand => $strand,
                                                -score  => $score);
        $feature->add_sub_SeqFeature($subfeature,"EXPAND");
    }
    $old_sbj_count = $sbj_count;
    $track->add_feature($feature);
}
print $panel->png;

I would be extremely grateful for any help on this since I've been struggling for a long time...

Best regards,
Marcus

From allenday at ucla.edu Wed Nov 24 13:49:38 2004
From: allenday at ucla.edu (Allen Day)
Date: Wed Nov 24 13:47:24 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/SeqFeature Annotated.pm, 1.16, 1.17
In-Reply-To: <200411241632.iAOGW1DT019552@pub.open-bio.org>
References: <200411241632.iAOGW1DT019552@pub.open-bio.org>
Message-ID:

i'd like a warning issued if the range expands. i see expansion as a convenience feature, and want to know when it happens because i don't generally expect it.

On Wed, 24 Nov 2004, Steffen Grossman wrote:

> Update of /home/repository/bioperl/bioperl-live/Bio/SeqFeature
> In directory pub.open-bio.org:/tmp/cvs-serv19535
>
> Modified Files:
> 	Annotated.pm
> Log Message:
> Added support for the 'EXPAND' option in 'add_SeqFeature'.
> > > Index: Annotated.pm > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v > retrieving revision 1.16 > retrieving revision 1.17 > diff -C2 -d -r1.16 -r1.17 > *** Annotated.pm 24 Nov 2004 02:14:06 -0000 1.16 > --- Annotated.pm 24 Nov 2004 16:31:59 -0000 1.17 > *************** > *** 604,617 **** > =head2 add_SeqFeature() > > ! Usage : $obj->add_SeqFeature($feat); > ! Function: Returns : nothing > ! Args : A Bio::SeqFeatureI object. Objects not implementing Bio::SeqFeatureI > ! and those whose bounds are not within those of the called object are > ! ignored with a warning. > > =cut > > sub add_SeqFeature { > ! my ($self,$val) = @_; > > return undef unless $val; > --- 604,625 ---- > =head2 add_SeqFeature() > > ! Usage : $feat->add_SeqFeature($subfeat); > ! $feat->add_SeqFeature($subfeat,'EXPAND') > ! Function: adds a SeqFeature into the subSeqFeature array. > ! with no 'EXPAND' qualifer, subfeat will be tested > ! as to whether it lies inside the parent, and throw > ! an exception if not. > ! > ! If EXPAND is used, the parent''s start/end/strand will > ! be adjusted so that it grows to accommodate the new > ! subFeature > ! Example : > ! Returns : nothing > ! Args : a Bio::SeqFeatureI object > > =cut > > sub add_SeqFeature { > ! my ($self,$val, $expand) = @_; > > return undef unless $val; > *************** > *** 619,626 **** > if ( !$val->isa('Bio::SeqFeatureI') ) { > $self->warn("$val does not implement Bio::SeqFeatureI, ignoring."); > } > > ! if ( !$self->contains($val) ) { > ! $self->warn("$val is not contained within parent feature, ignoring."); > } > > --- 627,640 ---- > if ( !$val->isa('Bio::SeqFeatureI') ) { > $self->warn("$val does not implement Bio::SeqFeatureI, ignoring."); > + return undef; > } > > ! if($expand && ($expand eq 'EXPAND')) { > ! $self->_expand_region($val); > ! } else { > ! if ( !$self->contains($val) ) { > ! 
$self->warn("$val is not contained within parent feature, and expansion is not valid, ignoring."); > ! return undef; > ! } > } > > *************** > *** 734,737 **** > --- 748,785 ---- > my ($self) = @_; > return $self->{'targets'} ? @{ $self->{'targets'} } : (); > + } > + > + =head2 _expand_region > + > + Title : _expand_region > + Usage : $self->_expand_region($feature); > + Function: Expand the total region covered by this feature to > + accomodate for the given feature. > + > + May be called whenever any kind of subfeature is added to this > + feature. add_SeqFeature() already does this. > + Returns : > + Args : A Bio::SeqFeatureI implementing object. > + > + > + =cut > + > + sub _expand_region { > + my ($self, $feat) = @_; > + if(! $feat->isa('Bio::SeqFeatureI')) { > + $self->warn("$feat does not implement Bio::SeqFeatureI"); > + } > + # if this doesn't have start/end set - forget it! > + if((! defined($self->start())) && (! defined $self->end())) { > + $self->start($feat->start()); > + $self->end($feat->end()); > + $self->strand($feat->strand) unless defined($self->strand()); > + # $self->strand($feat->strand) unless $self->strand(); > + } else { > + my $range = $self->union($feat); > + $self->start($range->start); > + $self->end($range->end); > + $self->strand($range->strand); > + } > } > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > From allenday at ucla.edu Wed Nov 24 13:53:23 2004 From: allenday at ucla.edu (Allen Day) Date: Wed Nov 24 13:51:10 2004 Subject: [Bioperl-l] How can I get add_sub_SeqFeature to work as I want? In-Reply-To: <1101318231.14533.20.camel@morpheus.ucc.ie> References: <1101318231.14533.20.camel@morpheus.ucc.ie> Message-ID: someone correct me if i'm wrong, but i think you need to add all the features for a track at once. 
maybe you should load this file into a database (even SQLite) so you don't have to keep track of the previous_row/next_row-ness of the text file. -allen On Wed, 24 Nov 2004, Marcus Claesson wrote: > Hi! > > I've had a problem some time with getting the add_sub_SeqFeature to work > for my own parsed blast data. Last time I posted it to the list nobody > was be able to help me, but I have now simplified the problem a bit. > > I have blast parsed data like this in a text file (the columns are > sbj_count,hsp_count,score,query_begin,query_end,strand),(I left out > sbj_name for simplicity): > 1 1 445 1148 375 -1 > 1 2 341 1717 1151 -1 > 2 1 364 1148 378 -1 > 2 2 344 1690 1151 -1 > 3 1 283 1145 381 -1 > 3 2 233 1714 1151 -1 > 4 1 182 1154 375 -1 > 4 2 124 1702 1160 -1 > > The only way I know the hit has more than one hsp is if the next line > has the same sbj_count. Based on this I'd like to construct a Graphics > Panel as an graphical overview of the blast hits. Below is the script > that shows every hsp as separate hits (I want hsps belonging to the same > hit on one line): > > #!/usr/bin/perl > use Bio::Graphics; > use Bio::SeqFeature::Generic; > my $panel = Bio::Graphics::Panel->new(-length => 2000, > -width => 500, > -pad_left => 10, > -pad_right => 10); > > my $track = $panel->add_track(-glyph => 'graded_segments', > -label => 1, > -strand_arrow => 1, > -connector => 'dashed', > -bgcolor => 'blue', > -font2color => 'red', > -sort_order => 'score', > -description => sub { > my $feature = shift; > my ($score) = $feature->score; > "Score=$score"}); > > $old_sbj_count = 0; > while (<>) { > ($sbj_count,$hsp_count,$score,$qbegin,$qend,$strand) = split; > my $feature = Bio::SeqFeature::Generic->new(-start => $qbegin, > -end => $qend, > -strand => $strand, > -score => $score); > if ($old_sbj_count eq $sbj_count) { > my $subfeature = Bio::SeqFeature::Generic->new( > -start => $qbegin, > -end => $qend, > -strand=> $strand, > -score => $score); > 
$feature->add_sub_SeqFeature($subfeature,"EXPAND"); > } > $old_sbj_count = $sbj_count; > $track->add_feature($feature); > } > print $panel->png; > > > I would be extremely grateful for any help on this since I've been > struggling for a long time... > > Best regards, > Marcus > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From amackey at pcbi.upenn.edu Wed Nov 24 14:10:07 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Nov 24 14:08:23 2004 Subject: [Bioperl-l] How can I get add_sub_SeqFeature to work as I want? In-Reply-To: References: <1101318231.14533.20.camel@morpheus.ucc.ie> Message-ID: <74999FD8-3E4C-11D9-A02C-000D93392082@pcbi.upenn.edu> Yep; if you want 4 hits drawn, you should only be calling $track->add_feature() 4 times, not once for every row in your file ... -Aaron On Nov 24, 2004, at 1:53 PM, Allen Day wrote: > someone correct me if i'm wrong, but i think you need to add all the > features for a track at once. maybe you should load this file into a > database (even SQLite) so you don't have to keep track of the > previous_row/next_row-ness of the text file. > > -allen > > On Wed, 24 Nov 2004, Marcus Claesson wrote: > >> Hi! >> >> I've had a problem some time with getting the add_sub_SeqFeature to >> work >> for my own parsed blast data. Last time I posted it to the list nobody >> was be able to help me, but I have now simplified the problem a bit. >> >> I have blast parsed data like this in a text file (the columns are >> sbj_count,hsp_count,score,query_begin,query_end,strand),(I left out >> sbj_name for simplicity): >> 1 1 445 1148 375 -1 >> 1 2 341 1717 1151 -1 >> 2 1 364 1148 378 -1 >> 2 2 344 1690 1151 -1 >> 3 1 283 1145 381 -1 >> 3 2 233 1714 1151 -1 >> 4 1 182 1154 375 -1 >> 4 2 124 1702 1160 -1 >> >> The only way I know the hit has more than one hsp is if the next line >> has the same sbj_count. 
Based on this I'd like to construct a Graphics >> Panel as an graphical overview of the blast hits. Below is the script >> that shows every hsp as separate hits (I want hsps belonging to the >> same >> hit on one line): >> >> #!/usr/bin/perl >> use Bio::Graphics; >> use Bio::SeqFeature::Generic; >> my $panel = Bio::Graphics::Panel->new(-length => 2000, >> -width => 500, >> -pad_left => 10, >> -pad_right => 10); >> >> my $track = $panel->add_track(-glyph => 'graded_segments', >> -label => 1, >> -strand_arrow => 1, >> -connector => 'dashed', >> -bgcolor => 'blue', >> -font2color => 'red', >> -sort_order => 'score', >> -description => sub { >> my $feature = shift; >> my ($score) = $feature->score; >> "Score=$score"}); >> >> $old_sbj_count = 0; >> while (<>) { >> ($sbj_count,$hsp_count,$score,$qbegin,$qend,$strand) = split; >> my $feature = Bio::SeqFeature::Generic->new(-start => $qbegin, >> -end => $qend, >> -strand => $strand, >> -score => $score); >> if ($old_sbj_count eq $sbj_count) { >> my $subfeature = Bio::SeqFeature::Generic->new( >> -start => $qbegin, >> -end => $qend, >> -strand=> $strand, >> -score => $score); >> $feature->add_sub_SeqFeature($subfeature,"EXPAND"); >> } >> $old_sbj_count = $sbj_count; >> $track->add_feature($feature); >> } >> print $panel->png; >> >> >> I would be extremely grateful for any help on this since I've been >> struggling for a long time... >> >> Best regards, >> Marcus >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697

From jason.stajich at duke.edu Wed Nov 24 14:20:40 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Nov 24 14:20:12 2004
Subject: [Bioperl-l] How can I get add_sub_SeqFeature to work as I want?
In-Reply-To:
References: <1101318231.14533.20.camel@morpheus.ucc.ie>
Message-ID:

The problem is that subfeatures are never added to the same top-level feature.

Try this (note that the 'score' and 'description' [if you were to assign description] will be whatever the 1st instance of the subject is). See also Lincoln's example here http://bioperl.org/HOWTOs/Graphics-HOWTO/parsing-blast.html if you haven't seen this already.

We're using a hash (%sbj) to figure out how to group things by a key ($sbj_count).

my %sbj;
while(<>) {
    my ($sbj_count,$hsp_count,$score,$qbegin,$qend,$strand) = split;
    my $feature = Bio::SeqFeature::Generic->new(-start  => $qbegin,
                                                -end    => $qend,
                                                -strand => $strand,
                                                -score  => $score);
    if( defined $sbj{$sbj_count} ) {
        # add this feature as a subfeature instead since we've already
        # seen one for this subject
        $sbj{$sbj_count}->add_sub_SeqFeature($feature,"EXPAND");
    } else {
        # first time we've seen this subject so we'll just store this feature
        $sbj{$sbj_count} = $feature;
    }
}
# now lets add all those features to the track
for my $f ( values %sbj ) {
    $track->add_feature($f);
}

On Nov 24, 2004, at 1:53 PM, Allen Day wrote:

> someone correct me if i'm wrong, but i think you need to add all the
> features for a track at once. maybe you should load this file into a
> database (even SQLite) so you don't have to keep track of the
> previous_row/next_row-ness of the text file.
>
> -allen
>
> On Wed, 24 Nov 2004, Marcus Claesson wrote:
>
>> Hi!
>>
>> I've had a problem some time with getting the add_sub_SeqFeature to
>> work
>> for my own parsed blast data. Last time I posted it to the list nobody
>> was be able to help me, but I have now simplified the problem a bit.
>> >> I have blast parsed data like this in a text file (the columns are >> sbj_count,hsp_count,score,query_begin,query_end,strand),(I left out >> sbj_name for simplicity): >> 1 1 445 1148 375 -1 >> 1 2 341 1717 1151 -1 >> 2 1 364 1148 378 -1 >> 2 2 344 1690 1151 -1 >> 3 1 283 1145 381 -1 >> 3 2 233 1714 1151 -1 >> 4 1 182 1154 375 -1 >> 4 2 124 1702 1160 -1 >> >> The only way I know the hit has more than one hsp is if the next line >> has the same sbj_count. Based on this I'd like to construct a Graphics >> Panel as an graphical overview of the blast hits. Below is the script >> that shows every hsp as separate hits (I want hsps belonging to the >> same >> hit on one line): >> >> #!/usr/bin/perl >> use Bio::Graphics; >> use Bio::SeqFeature::Generic; >> my $panel = Bio::Graphics::Panel->new(-length => 2000, >> -width => 500, >> -pad_left => 10, >> -pad_right => 10); >> >> my $track = $panel->add_track(-glyph => 'graded_segments', >> -label => 1, >> -strand_arrow => 1, >> -connector => 'dashed', >> -bgcolor => 'blue', >> -font2color => 'red', >> -sort_order => 'score', >> -description => sub { >> my $feature = shift; >> my ($score) = $feature->score; >> "Score=$score"}); >> >> $old_sbj_count = 0; >> while (<>) { >> ($sbj_count,$hsp_count,$score,$qbegin,$qend,$strand) = split; >> my $feature = Bio::SeqFeature::Generic->new(-start => $qbegin, >> -end => $qend, >> -strand => $strand, >> -score => $score); >> if ($old_sbj_count eq $sbj_count) { >> my $subfeature = Bio::SeqFeature::Generic->new( >> -start => $qbegin, >> -end => $qend, >> -strand=> $strand, >> -score => $score); >> $feature->add_sub_SeqFeature($subfeature,"EXPAND"); >> } >> $old_sbj_count = $sbj_count; >> $track->add_feature($feature); >> } >> print $panel->png; >> >> >> I would be extremely grateful for any help on this since I've been >> struggling for a long time... 
>> >> Best regards, >> Marcus >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From grossman at molgen.mpg.de Wed Nov 24 13:54:35 2004 From: grossman at molgen.mpg.de (Steffen Grossmann) Date: Wed Nov 24 14:55:49 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/SeqFeature Annotated.pm, 1.16, 1.17 In-Reply-To: References: <200411241632.iAOGW1DT019552@pub.open-bio.org> Message-ID: <41A4D8EB.9000203@molgen.mpg.de> As it is now, the expansion only takes place, when you explicitly say 'EXPAND'. If you don't do this and try to add a feature, which is not contained, you get a warning ("$val is not contained within parent feature, ignoring.") and nothing is added. Allen Day wrote: >i'd like a warning issued if the range expands. i see expansion as a >convenience feature, and want to know when it happens because i don't >generally expect it. > >On Wed, 24 Nov 2004, Steffen Grossman wrote: > > > >>Update of /home/repository/bioperl/bioperl-live/Bio/SeqFeature >>In directory pub.open-bio.org:/tmp/cvs-serv19535 >> >>Modified Files: >> Annotated.pm >>Log Message: >>Added support for the 'EXPAND' option in 'add_SeqFeature'. >> >> >>Index: Annotated.pm >>=================================================================== >>RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v >>retrieving revision 1.16 >>retrieving revision 1.17 >>diff -C2 -d -r1.16 -r1.17 >>*** Annotated.pm 24 Nov 2004 02:14:06 -0000 1.16 >>--- Annotated.pm 24 Nov 2004 16:31:59 -0000 1.17 >>*************** >>*** 604,617 **** >> =head2 add_SeqFeature() >> >>! Usage : $obj->add_SeqFeature($feat); >>! 
Function: Returns : nothing >>! Args : A Bio::SeqFeatureI object. Objects not implementing Bio::SeqFeatureI >>! and those whose bounds are not within those of the called object are >>! ignored with a warning. >> >> =cut >> >> sub add_SeqFeature { >>! my ($self,$val) = @_; >> >> return undef unless $val; >>--- 604,625 ---- >> =head2 add_SeqFeature() >> >>! Usage : $feat->add_SeqFeature($subfeat); >>! $feat->add_SeqFeature($subfeat,'EXPAND') >>! Function: adds a SeqFeature into the subSeqFeature array. >>! with no 'EXPAND' qualifer, subfeat will be tested >>! as to whether it lies inside the parent, and throw >>! an exception if not. >>! >>! If EXPAND is used, the parent''s start/end/strand will >>! be adjusted so that it grows to accommodate the new >>! subFeature >>! Example : >>! Returns : nothing >>! Args : a Bio::SeqFeatureI object >> >> =cut >> >> sub add_SeqFeature { >>! my ($self,$val, $expand) = @_; >> >> return undef unless $val; >>*************** >>*** 619,626 **** >> if ( !$val->isa('Bio::SeqFeatureI') ) { >> $self->warn("$val does not implement Bio::SeqFeatureI, ignoring."); >> } >> >>! if ( !$self->contains($val) ) { >>! $self->warn("$val is not contained within parent feature, ignoring."); >> } >> >>--- 627,640 ---- >> if ( !$val->isa('Bio::SeqFeatureI') ) { >> $self->warn("$val does not implement Bio::SeqFeatureI, ignoring."); >>+ return undef; >> } >> >>! if($expand && ($expand eq 'EXPAND')) { >>! $self->_expand_region($val); >>! } else { >>! if ( !$self->contains($val) ) { >>! $self->warn("$val is not contained within parent feature, and expansion is not valid, ignoring."); >>! return undef; >>! } >> } >> >>*************** >>*** 734,737 **** >>--- 748,785 ---- >> my ($self) = @_; >> return $self->{'targets'} ? 
@{ $self->{'targets'} } : (); >>+ } >>+ >>+ =head2 _expand_region >>+ >>+ Title : _expand_region >>+ Usage : $self->_expand_region($feature); >>+ Function: Expand the total region covered by this feature to >>+ accomodate for the given feature. >>+ >>+ May be called whenever any kind of subfeature is added to this >>+ feature. add_SeqFeature() already does this. >>+ Returns : >>+ Args : A Bio::SeqFeatureI implementing object. >>+ >>+ >>+ =cut >>+ >>+ sub _expand_region { >>+ my ($self, $feat) = @_; >>+ if(! $feat->isa('Bio::SeqFeatureI')) { >>+ $self->warn("$feat does not implement Bio::SeqFeatureI"); >>+ } >>+ # if this doesn't have start/end set - forget it! >>+ if((! defined($self->start())) && (! defined $self->end())) { >>+ $self->start($feat->start()); >>+ $self->end($feat->end()); >>+ $self->strand($feat->strand) unless defined($self->strand()); >>+ # $self->strand($feat->strand) unless $self->strand(); >>+ } else { >>+ my $range = $self->union($feat); >>+ $self->start($range->start); >>+ $self->end($range->end); >>+ $self->strand($range->strand); >>+ } >> } >> >> >>_______________________________________________ >>Bioperl-guts-l mailing list >>Bioperl-guts-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l >> >> >> >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- %---------------------------------------------% % Steffen Grossmann % % % % Max Planck Institute for Molecular Genetics % % Computational Molecular Biology % %---------------------------------------------% % Ihnestrasse 73 % % 14195 Berlin % % Germany % %---------------------------------------------% % Tel: (++49 +30) 8413-1167 % % Fax: (++49 +30) 8413-1152 % %---------------------------------------------% From rfsouza at citri.iq.usp.br Tue Nov 23 19:30:21 2004 From: rfsouza at citri.iq.usp.br (Robson Francisco de Souza {S}) Date: Wed Nov 24 21:16:57 
2004
Subject: [Bioperl-l] bad entries in interpro
Message-ID: <20041124003021.GA12104@cecm.usp.br>

Hi everyone,

A few days ago, Mikko Arvas sent an e-mail to this list asking how to
ignore bad entries in the matches.xml file from the InterPro database.
Hilmar Lapp answered asking him to locate the position in the file that
raises the error message

>> not well-formed (invalid token) at line 2, column 53, byte 131 at
>> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
>> line 187

Well, I saw no answers on the list, therefore I'm sending the problematic
entry below:

The problem seems to be the "'" annotation at the second line.

I also tested if an eval clause could be used to bypass such entries
without crashing a script. The example script below worked fine and
reported a problem with the entry above without crashing.

Would it be too difficult to make interpro.pm able to parse names like
the one above?

Robson

##################################################
#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-file=>$ARGV[0],
                         -format=>"interpro");

my $i=1;
while (1) {
    my $seq;
    eval {
        $seq = $in->next_seq;
    };
    last if (!defined $seq);
    if ($@) { print STDERR "Problem parsing sequence $i..."; next };
    print STDERR $seq->id,"\n";
    print "<=== ",$seq->id,"===>\n";
    foreach my $f ($seq->get_all_SeqFeatures) {
        print $f->gff_string,"\n";
        foreach my $key ($f->annotation->get_all_annotation_keys) {
            foreach my $value ($f->annotation->get_Annotations($key)) {
                print $key,":",$value->as_text,"\n";
            }
        }
    }
    $i++;
}

exit 0;

From lstein at cshl.edu Tue Nov 23 13:57:01 2004
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Nov 24 21:17:34 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/FeatureIO gff.pm, 1.16, 1.17
In-Reply-To:
References: <200411161935.iAGJZBDT005226@pub.open-bio.org> <41A36C5C.5060104@molgen.mpg.de>
Message-ID: <200411231357.02088.lstein@cshl.edu>

A group is absolutely not required to end with a ### directive.
It is just a hint to the GFF parser that it no longer has to keep track of previously-loaded features in case a child appears somewhere toward the end of the file. Lincoln On Tuesday 23 November 2004 01:16 pm, Chris Mungall wrote: > Is a group defined as a set of connected features? > > Is the group required to end with a ### directive? This could be > checked for automatically by testing whether each feature is > connected to the current feature graph. Or do we want to allow the > data producer to define their own concept of grouping (if so this > probably wouldn't round trip). > > What about singleton features such as SNPs - is a SNP in an > intergenic area a group unto itself? (if so, we shouldn't require > the ### directive after each one) > > Note that there's already code for reconstituting the SeqFeature > hierarchy from the ID/Parent tags in Bio::SeqFeature::Tools > > Cheers > Chris > > On Tue, 23 Nov 2004, Steffen Grossmann wrote: > > Dear Allen, dear Scott, > > > > before we write a next_sequence method, we should have something > > which is able to reconstruct the a set of hierarchically nested > > features. Any suggestions for method names? How about next_group? > > next_group gives back an array of features (which represent the > > top-level features, the lower features appear as subfeatures). A > > group is ended by a ### directive (or by the EOF). A > > next_sequence method could then also use this nesting... > > > > I have ideas how to realize the implementation. Tell me what you > > think about it and I can start doing it. > > > > Steffen > > > > Allen Day wrote: > > >there should be a next_sequence method. i wrote this into > > >Bio::Tools::GFF, we should pretty much be able to just > > > copy/paste it over. 
> > > > > >-allen > > > > > >On Tue, 16 Nov 2004, Scott Cain wrote: > > >>Update of /home/repository/bioperl/bioperl-live/Bio/FeatureIO > > >>In directory pub.open-bio.org:/tmp/cvs-serv5204 > > >> > > >>Modified Files: > > >> gff.pm > > >>Log Message: > > >>added stuff to support fasta and target processing. The > > >> quesion remains what to do with this data once you have > > >> it--particularly the fasta data. Should there be (or is > > >> there) a next_sequence() method? > > >> > > >> > > >>Index: gff.pm > > >>=============================================================== > > >>==== RCS file: > > >> /home/repository/bioperl/bioperl-live/Bio/FeatureIO/gff.pm,v > > >> retrieving revision 1.16 > > >>retrieving revision 1.17 > > >>diff -C2 -d -r1.16 -r1.17 > > >>*** gff.pm 16 Nov 2004 16:22:53 -0000 1.16 > > >>--- gff.pm 16 Nov 2004 19:35:09 -0000 1.17 > > >>*************** > > >>*** 211,215 **** > > >> return undef unless $gff_string; > > >> > > >>! if($gff_string =~ /^##/){ > > >> $self->_handle_directive($gff_string); > > >> return $self->next_feature(); > > >>--- 211,215 ---- > > >> return undef unless $gff_string; > > >> > > >>! if($gff_string =~ /^##/ or $gff_string =~ /^>/){ > > >> $self->_handle_directive($gff_string); > > >> return $self->next_feature(); > > >>*************** > > >>*** 248,255 **** > > >> } > > >> > > >>! elsif($directive eq 'FASTA'){ > > >> $self->warn("'##$directive' directive handling not yet > > >> implemented"); ! while($self->_readline()){ > > >>! #suck up the rest of the file > > >> } > > >> } > > >>--- 248,266 ---- > > >> } > > >> > > >>! elsif($directive eq 'FASTA' or $directive =~ /^>(.+)/){ > > >>! my $fasta_directive_id = $1 if $1; > > >> $self->warn("'##$directive' directive handling not yet > > >> implemented"); ! local $/ = '>'; > > >>! while(my $read = $self->_readline()){ > > >>! chomp $read; > > >>! my $fasta_id; > > >>! my @seqarray = split /\n/, $read; > > >>! if ($fasta_directive_id) { > > >>! 
$fasta_id = $fasta_directive_id; > > >>! $fasta_directive_id = ''; > > >>! } else { > > >>! $fasta_id = shift @seqarray; > > >>! } > > >>! my $seq = join '', @seqarray; > > >> } > > >> } > > >>*************** > > >>*** 357,363 **** > > >> ); > > >> > > >>! if ($strand eq '+') { > > >> $strand = 1; > > >>! } elsif ($strand eq '-') { > > >> $strand = -1; > > >> } > > >>--- 368,374 ---- > > >> ); > > >> > > >>! if ($strand && $strand eq '+') { > > >> $strand = 1; > > >>! } elsif ($strand && $strand eq '-') { > > >> $strand = -1; > > >> } > > >> > > >>_______________________________________________ > > >>Bioperl-guts-l mailing list > > >>Bioperl-guts-l@portal.open-bio.org > > >>http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > > > > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041123/165697db/attachment.bin From dna88880 at yahoo.com Tue Nov 23 12:05:20 2004 From: dna88880 at yahoo.com (ted Huang) Date: Wed Nov 24 21:18:05 2004 Subject: [Bioperl-l] How to setup a local database of genbank? Message-ID: <20041123170520.72827.qmail@web54402.mail.yahoo.com> Hi, there, This is not a pure bioperl issue. But could anyone give me hand to set up a local database of genbank using bioperl packages? Thanks. Ted __________________________________ Do you Yahoo!? 
Meet the all-new My Yahoo! - Try it today! http://my.yahoo.com From pvh at egenetics.com Thu Nov 25 04:43:34 2004 From: pvh at egenetics.com (Peter van Heusden) Date: Thu Nov 25 04:41:31 2004 Subject: [Bioperl-l] Core dump in t/protgraph on FreeBSD Message-ID: <41A5A946.9040408@egenetics.com> Hi I'm seeing a core dump in the t/protgraph test with perl 5.8.5 on FreeBSD 4.10-STABLE. The core dump is coming from Bio::Graph:::ProteinGraph, line 414, where Clone's clone method is being called. Since clone() is implemented in XS, I imagine some kind of pointer trouble is causing this problem. Is anyone else seeing this problem? Its the only thing that stops me from having a 'green light' on my bioperl tests page. :( Peter From allenday at ucla.edu Thu Nov 25 05:44:09 2004 From: allenday at ucla.edu (Allen Day) Date: Thu Nov 25 05:42:13 2004 Subject: [Bioperl-l] bad entries in interpro In-Reply-To: <20041124003021.GA12104@cecm.usp.br> References: <20041124003021.GA12104@cecm.usp.br> Message-ID: i'm not ignoring you -- i'll get back to you on this soon. -allen On Tue, 23 Nov 2004, Robson Francisco de Souza {S} wrote: > Hi everyone, > > A few days ago, Mikko Arvas sent an e-mail to this list asking how to > ignore bad entries in the matches.xml file from the InterPro database. > Hilmar Lapp answered asking him to locate the position in the file that > raises the error message > > >> not well-formed (invalid token) at line 2, column 53, byte 131 at > >> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm > >> line 187 > > Well, I saw no answers on the list, therefore I'm sending the problemtic > entry below: > > crc64="9797609B487FD64E"> > > > The problem seems to be the "'" annotation at the second line. > > I also tested if an eval clause could be used to bypass such entries > without crashing a script. The example script below worked fine and > reported a problem with the entry above without crashing. 
> > Would it be too dificult to make interpro.pm able to parse names like > the one above? > > Robson > > ################################################## > #!/usr/bin/perl -w > > use strict; > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-file=>$ARGV[0], > -format=>"interpro"); > > my $i=1; > while (1) { > my $seq; > eval { > $seq = $in->next_seq; > }; > last if (!defined $seq); > if ($@) { print STDERR "Problem parsing sequence $i..."; next }; > print STDERR $seq->id,"\n"; > print "<=== ",$seq->id,"===>\n"; > foreach my $f ($seq->get_all_SeqFeatures) { > print $f->gff_string,"\n"; > foreach my $key ($f->annotation->get_all_annotation_keys) { > foreach my $value ($f->annotation->get_Annotations($key)) { > print $key,":",$value->as_text,"\n"; > } > } > } > $i++; > } > > exit 0; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From allenday at ucla.edu Thu Nov 25 05:45:58 2004 From: allenday at ucla.edu (Allen Day) Date: Thu Nov 25 05:43:48 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/SeqFeature Annotated.pm, 1.17, 1.18 In-Reply-To: <200411251002.iAPA2qDT022885@pub.open-bio.org> References: <200411251002.iAPA2qDT022885@pub.open-bio.org> Message-ID: i don't think it should do this, the whole point of using an AC is to retain typing. if you want to have a friendly stringification of the source attribute, use overloading in SimpleValue. there is an as_text() method already in there that is currently commented out. -allen On Thu, 25 Nov 2004, Steffen Grossman wrote: > Update of /home/repository/bioperl/bioperl-live/Bio/SeqFeature > In directory pub.open-bio.org:/tmp/cvs-serv22868 > > Modified Files: > Annotated.pm > Log Message: > 'source' now gives back its value instead of a > Bio::Annotation::SimpleValue object. (This is consisitent with > 'type'). 
> > > Index: Annotated.pm > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v > retrieving revision 1.17 > retrieving revision 1.18 > diff -C2 -d -r1.17 -r1.18 > *** Annotated.pm 24 Nov 2004 16:31:59 -0000 1.17 > --- Annotated.pm 25 Nov 2004 10:02:50 -0000 1.18 > *************** > *** 181,185 **** > } > > ! return $self->get_Annotations('source'); > } > > --- 181,188 ---- > } > > ! my $source_anno = $self->get_Annotations('source'); > ! > ! return $source_anno->value if ($source_anno); > ! return undef; > } > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > From fernan at iib.unsam.edu.ar Thu Nov 25 07:38:30 2004 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Thu Nov 25 07:36:51 2004 Subject: [Bioperl-l] Core dump in t/protgraph on FreeBSD In-Reply-To: <41A5A946.9040408@egenetics.com> References: <41A5A946.9040408@egenetics.com> Message-ID: <20041125123830.GD86543@iib.unsam.edu.ar> +----[ Peter van Heusden (25.Nov.2004 07:03): | | Hi Hi Peter! I've been trying to get some spare time to test the 1.5 release candidates on FreeBSD, but have not yet managed to do it. Are you using 1.5 from CVS or from the rc1 tarball? Fernan | I'm seeing a core dump in the t/protgraph test with perl 5.8.5 on | FreeBSD 4.10-STABLE. The core dump is coming from | Bio::Graph:::ProteinGraph, line 414, where Clone's clone method is being | called. Since clone() is implemented in XS, I imagine some kind of | pointer trouble is causing this problem. | | Is anyone else seeing this problem? Its the only thing that stops me | from having a 'green light' on my bioperl tests page. 
:(
|
| Peter
|
+----]

--
Fernán Agüero | Instituto de Investigaciones Biotecnológicas
        email | fernan at { iib.unsam.edu.ar , mail.retina.ar }
         wwww | http://genoma.unsam.edu.ar/~fernan
   phone, fax | +54 11 { 4580-7255 ext 310, 4752-9639 }

From amackey at pcbi.upenn.edu Thu Nov 25 07:45:04 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Nov 25 07:46:03 2004
Subject: [Bioperl-l] Core dump in t/protgraph on FreeBSD
In-Reply-To: <20041125123830.GD86543@iib.unsam.edu.ar>
References: <41A5A946.9040408@egenetics.com> <20041125123830.GD86543@iib.unsam.edu.ar>
Message-ID: <41A5D3D0.3040500@pcbi.upenn.edu>

Please use the CVS HEAD ...

-Aaron

Fernan Aguero wrote:
> +----[ Peter van Heusden (25.Nov.2004 07:03):
> |
> | Hi
>
> Hi Peter!
>
> I've been trying to get some spare time to test the 1.5
> release candidates on FreeBSD, but have not yet managed to
> do it.
>
> Are you using 1.5 from CVS or from the rc1 tarball?
>
> Fernan
>
> | I'm seeing a core dump in the t/protgraph test with perl 5.8.5 on
> | FreeBSD 4.10-STABLE. The core dump is coming from
> | Bio::Graph:::ProteinGraph, line 414, where Clone's clone method is being
> | called. Since clone() is implemented in XS, I imagine some kind of
> | pointer trouble is causing this problem.
> |
> | Is anyone else seeing this problem? Its the only thing that stops me
> | from having a 'green light' on my bioperl tests page. :(
> |
> | Peter
> |
> +----]
>

From pvh at egenetics.com Thu Nov 25 09:03:22 2004
From: pvh at egenetics.com (Peter van Heusden)
Date: Thu Nov 25 09:01:35 2004
Subject: [Bioperl-l] Core dump in t/protgraph on FreeBSD
In-Reply-To: <41A5D3D0.3040500@pcbi.upenn.edu>
References: <41A5A946.9040408@egenetics.com> <20041125123830.GD86543@iib.unsam.edu.ar> <41A5D3D0.3040500@pcbi.upenn.edu>
Message-ID: <41A5E62A.2090302@egenetics.com>

Aaron J. Mackey wrote:

> Please use the CVS HEAD ...
> I've just reconfirmed - coredump on FreeBSD 4-STABLE with perl 5.8.5 on protgraph.t, bioperl from clean checkout of CVS head. Peter > -Aaron > > Fernan Aguero wrote: > >> +----[ Peter van Heusden (25.Nov.2004 07:03): >> | >> | Hi >> >> Hi Peter! >> >> I've been trying to get some spare time to test the 1.5 >> release candidates on FreeBSD, but have not yet managed to >> do it. >> >> Are you using 1.5 from CVS or from the rc1 tarball? >> >> Fernan >> >> | I'm seeing a core dump in the t/protgraph test with perl 5.8.5 on | >> FreeBSD 4.10-STABLE. The core dump is coming from | >> Bio::Graph:::ProteinGraph, line 414, where Clone's clone method is >> being | called. Since clone() is implemented in XS, I imagine some >> kind of | pointer trouble is causing this problem. >> | | Is anyone else seeing this problem? Its the only thing that stops >> me | from having a 'green light' on my bioperl tests page. :( >> | | Peter >> | >> +----] >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From pvh at egenetics.com Thu Nov 25 09:50:32 2004 From: pvh at egenetics.com (Peter van Heusden) Date: Thu Nov 25 09:48:50 2004 Subject: [Bioperl-l] Core dump in t/protgraph on FreeBSD In-Reply-To: <41A5E62A.2090302@egenetics.com> References: <41A5A946.9040408@egenetics.com> <20041125123830.GD86543@iib.unsam.edu.ar> <41A5D3D0.3040500@pcbi.upenn.edu> <41A5E62A.2090302@egenetics.com> Message-ID: <41A5F138.1030403@egenetics.com> And now with a fresh compile of Clone-0.15 this problem goes away. All bioperl tests are now ok on this version of FreeBSD (4-STABLE). Peter Peter van Heusden wrote: > Aaron J. Mackey wrote: > >> Please use the CVS HEAD ... >> > I've just reconfirmed - coredump on FreeBSD 4-STABLE with perl 5.8.5 > on protgraph.t, bioperl from clean checkout of CVS head. 
> > Peter > >> -Aaron >> >> Fernan Aguero wrote: >> >>> +----[ Peter van Heusden (25.Nov.2004 07:03): >>> | >>> | Hi >>> >>> Hi Peter! >>> >>> I've been trying to get some spare time to test the 1.5 >>> release candidates on FreeBSD, but have not yet managed to >>> do it. >>> >>> Are you using 1.5 from CVS or from the rc1 tarball? >>> >>> Fernan >>> >>> | I'm seeing a core dump in the t/protgraph test with perl 5.8.5 on >>> | FreeBSD 4.10-STABLE. The core dump is coming from | >>> Bio::Graph:::ProteinGraph, line 414, where Clone's clone method is >>> being | called. Since clone() is implemented in XS, I imagine some >>> kind of | pointer trouble is causing this problem. >>> | | Is anyone else seeing this problem? Its the only thing that >>> stops me | from having a 'green light' on my bioperl tests page. :( >>> | | Peter >>> | >>> +----] >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From Mikko.Arvas at vtt.fi Fri Nov 26 08:32:46 2004 From: Mikko.Arvas at vtt.fi (Mikko Arvas) Date: Fri Nov 26 08:30:15 2004 Subject: [Bioperl-l] Bio::SeqIO and bad entries in uniprot and interpro In-Reply-To: <3D675A22-3CC9-11D9-86C1-000393C44276@duke.edu> References: <4.3.2.7.2.20041118085838.00c8c868@vttmail.vtt.fi> <4.3.2.7.2.20041118085838.00c8c868@vttmail.vtt.fi> Message-ID: <4.3.2.7.2.20041126133111.00c9b550@vttmail.vtt.fi> Thanks a lot! Its fine now. Mikko At 15:58 22.11.2004 -0500, Jason Stajich wrote: >On Nov 18, 2004, at 6:53 AM, Mikko Arvas wrote: > >>Hi, >> >>I want to get all available Interpro matches for S. cerevisiae and some >>other species. So I need to parse Uniprot files to find a set of IDs for >>a given species and then get the Interpro matches from them. 
But the >>Uniprot release uniprot_trembl.dat gives an error towards the end of the >>file in next_seq call: >> >>my $inseq = Bio::SeqIO->new('-file' => '> '-format' => 'swiss'); >>while (my $seq = $inseq->next_seq) { check species etc. in here} >> >>After happily processing a lot of sequences it gives: >>Invalid [] range "6-1" in regex; marked by <-- HERE in m/^Tomato severe >>leaf curl virus-[Guatemala 96-1 <-- HERE ]$/ >> >>Same goes for interpro: >> >>my $infeat = Bio::SeqIO->new('-file' => '> '-format' => 'interpro' ); >>while (my $feat = $infeat->next_seq) { store features etc. in here} >> >>After happily processing a lot of features it gives: >>not well-formed (invalid token) at line 2, column 53, byte 131 at >>/usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm line 187 >> >>I guess its no wonder that such big DBs have errors or are out of sync >>with perl modules etc. and I don't mind losing one seq or feature here or >>there. The files are rather big so fixing them manually is a bit painful. >>But I need to somehow get most things processed, is there a way to skip >>these bad entries or would you have some other smart ideas? > >I think this has to do with some unsafe code the swiss.pm module which >compares the species name against a list of Unknown species name values >and is trying to interpret the 96-1 as a range in a regexp. Putting a \Q >in front of the variable where this is being compared should be enough to >fix it. This is the grep on line 986. > >- return if grep { /^$binomial$/ } @Unknown_names; >+ return if grep { /^\Q$binomial$/ } @Unknown_names; > >There was one more place in the code that did this as well which I think I >have fixed. > >I'm checking this in to CVS so do a cvs update and see if you problem >persists. I've tested it against the uniprot_trembl.dat. > >Not sure what the problem is with the interpro parser, someone else will >need to look into that. > >> >>I have bioperl 1.4. 
and latest Bio::SeqIO (for swiss.pm to work >>correctly) from CVS on SuSe8.1. >> >>Thanks a milloin for any help! >>Cheers, >>mikko >>Mikko Arvas >>VTT Biotechnology >> >>e-mail: mikko.arvas@vtt.fi >>tel: +358-(0)9-456 5827 >>mobile: +358-(0)44-381 0502 >>fax: +358-(0)9-455 2103 >>mail: Tietotie 2, Espoo >> P.O. Box 1500 >> FIN-02044 VTT, Finland >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >-- >Jason Stajich >jason.stajich at duke.edu >http://www.duke.edu/~jes12/ > Mikko Arvas VTT Biotechnology e-mail: mikko.arvas@vtt.fi tel: +358-(0)9-456 5827 mobile: +358-(0)44-381 0502 fax: +358-(0)9-455 2103 mail: Tietotie 2, Espoo P.O. Box 1500 FIN-02044 VTT, Finland From Mikko.Arvas at vtt.fi Fri Nov 26 08:44:26 2004 From: Mikko.Arvas at vtt.fi (Mikko Arvas) Date: Fri Nov 26 08:42:03 2004 Subject: [Bioperl-l] Bio::SeqIO and bad entries in uniprot and interpro In-Reply-To: <066CB804-3D19-11D9-8C1A-000A959EB4C4@gmx.net> References: <3D675A22-3CC9-11D9-86C1-000393C44276@duke.edu> Message-ID: <4.3.2.7.2.20041126133734.00ca5518@vttmail.vtt.fi> Hi, here is the first entry in match.xml that gives an error: At 22:29 22.11.2004 -0800, Hilmar Lapp wrote: >On Monday, November 22, 2004, at 12:58 PM, Jason Stajich wrote: > >>>Same goes for interpro: >>> >>>my $infeat = Bio::SeqIO->new('-file' => '>> '-format' => 'interpro' ); >>>while (my $feat = $infeat->next_seq) { store features etc. in here} >>> >>>After happily processing a lot of features it gives: >>>not well-formed (invalid token) at line 2, column 53, byte 131 at >>>/usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm line 187 > >Can you locate the position that raises the error? I have seen error like >this thrown on non-ASCII characters. > >>> >>>I guess its no wonder that such big DBs have errors or are out of sync >>>with perl modules etc. 
and I don't mind losing one seq or feature here >>>or there. The files are rather big so fixing them manually is a bit >>>painful. But I need to somehow get most things processed, is there a way >>>to skip these bad entries or would you have some other smart ideas? > >XML::Parser being built on top of expat, there is really no way of >recovering from an XML violation that would let you resume parsing of the >document. > > -hilmar > >-- >------------------------------------------------------------- >Hilmar Lapp email: lapp at gnf.org >GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >------------------------------------------------------------- > > Mikko Arvas VTT Biotechnology e-mail: mikko.arvas@vtt.fi tel: +358-(0)9-456 5827 mobile: +358-(0)44-381 0502 fax: +358-(0)9-455 2103 mail: Tietotie 2, Espoo P.O. Box 1500 FIN-02044 VTT, Finland From davila at ioc.fiocruz.br Fri Nov 26 13:52:50 2004 From: davila at ioc.fiocruz.br (davila) Date: Fri Nov 26 13:54:49 2004 Subject: [Bioperl-l] Parsing Hit/Query Frames from Blastx Message-ID: <8D44604203DAF9438BF9123B4A08C779575FAB@alpha.ioc.fiocruz.br> Hi, Trying to parse a Blasxt output file, I realized it is not catching the real values of Hit_Frame and Query_Frame such as showed in the Bioperl Howtos: http://bioperl.org/HOWTOs/SearchIO/use.html HSP frame 0 $hsp->query->frame,$hsp->hit->frame My code (listed below) is returning wrong (Hit and Query) Frame values, maybe I am doing something wrong. Any help would be greatly appreciated. 
Thanks, Alberto

*******

Code:

use lib "/usr/local/bioperl14";
use Bio::SearchIO;


$searchio = new Bio::SearchIO ('-format' => 'blast',
                               '-file' => 'clusters.blast');

while ($result = $searchio->next_result) {
    $query_name = $result->query_name();
    $cluster_id = $query_name;
    print "$cluster_id\n";
    $rank = 1;
    while ($hit = $result->next_hit) {
        ($gi) = $hit->name =~ /gi\|(\d+)\|/;
        $hsp = $hit->next_hsp;
        $hit_length=$hit->length;
        # $query_frame = $hsp->query->frame,$hsp->hit->frame;
        $query_frame = $hsp->query->frame;
        print "$query_frame\n";
        $hit_frame = $hsp->hit->frame;
        print "$hit_frame\n";
        $hsp_query_string = $hsp->query_string;
        #print "$hsp_query_string\n\n";
        $hsp_homology_string = $hsp->homology_string;
        #print "$hsp_homology_string\n\n";
        $hsp_hit_string = $hsp->hit_string;
        #print "$hsp_hit_string\n\n";
        $hsp_frac_identical =$hsp->frac_identical*100;
        #print "$hsp_frac_identical%\n\n";
        $hsp_frac_conserved= $hsp->frac_conserved*100;
        #print "$hsp_frac_conserved%\n\n";
        $hsp_align="$hsp_query_string\n$hsp_homology_string\n$hsp_hit_string";

        print "$hsp_align\n\n\n\n";
    }
}

Results:

[root@genome blast]# perl align-teste1.pl
Name "main::gi" used only once: possible typo at align-teste1.pl line 17.
Name "main::hsp_frac_conserved" used only once: possible typo at align-teste1.pl line 33.
Name "main::hit_length" used only once: possible typo at align-teste1.pl line 19.
Name "main::hsp_frac_identical" used only once: possible typo at align-teste1.pl line 31.
Name "main::rank" used only once: possible typo at align-teste1.pl line 15.
333 334 335 336 337 1 (should be +2) 0 (should be -1) YLTPTPIEPHL Y+TPTPIEPHL YITPTPIEPHL 338 339 340 341 342 343 0 (should be +1) 0 (should be +1) IHCEELKQLGRASEKCVL*LFNYSLDTGQVPAKWRHGIIVPQLKPNKSANSMASFRPAPKHSKLNRLGVPLLA ++ E L+ LG + VL LFN SL TG VP W+ G+I+P LK K A + S+RP S L ++ ++A LYNEALQHLGITALNVVLRLFNESLRTGVVPPAWKTGVIIPILKAGKKAEDLDSYRPVTLTSCLCKVMERIIA From jason.stajich at duke.edu Fri Nov 26 15:24:01 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Nov 26 15:22:02 2004 Subject: [Bioperl-l] Parsing Hit/Query Frames from Blastx In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FAB@alpha.ioc.fiocruz.br> References: <8D44604203DAF9438BF9123B4A08C779575FAB@alpha.ioc.fiocruz.br> Message-ID: <1BD3220E-3FE9-11D9-B611-000393C44276@duke.edu> See the FAQ. http://bioperl.org/Core/Latest/faq.html#Q3.5 On Nov 26, 2004, at 1:52 PM, davila wrote: > Hi, > > Trying to parse a Blasxt output file, I realized it is not catching > the real values of Hit_Frame and Query_Frame such as showed in the > Bioperl Howtos: > > http://bioperl.org/HOWTOs/SearchIO/use.html > > HSP frame 0 $hsp->query->frame,$hsp->hit->frame > > My code (listed below) is returning wrong (Hit and Query) Frame > values, maybe I am doing something wrong. Any help would be greatly > appreciated. 
> > Thanks, Alberto > > ******* > > Code: > > use lib "/usr/local/bioperl14"; > use Bio::SearchIO; > > > $searchio = new Bio::SearchIO ('-format' => 'blast', > '-file' => 'clusters.blast'); > > while ($result = $searchio->next_result) { > $query_name = $result->query_name(); > $cluster_id = $query_name; > print "$cluster_id\n"; > $rank = 1; > while ($hit = $result->next_hit) { > ($gi) = $hit->name =~ /gi\|(\d+)\|/; > $hsp = $hit->next_hsp; > $hit_length=$hit->length; > # $query_frame = $hsp->query->frame,$hsp->hit->frame; > $query_frame = $hsp->query->frame; > print "$query_frame\n"; > $hit_frame = $hsp->hit->frame; > print "$hit_frame\n"; > $hsp_query_string = $hsp->query_string; > #print "$hsp_query_string\n\n"; > $hsp_homology_string = $hsp->homology_string; > #print "$hsp_homology_string\n\n"; > $hsp_hit_string = $hsp->hit_string; > #print "$hsp_hit_string\n\n"; > $hsp_frac_identical =$hsp->frac_identical*100; > #print "$hsp_frac_identical%\n\n"; > $hsp_frac_conserved= $hsp->frac_conserved*100; > #print "$hsp_frac_conserved%\n\n"; > $hsp_align="$hsp_query_string\n$hsp_homology_string\n$hsp_hit_string"; > > print "$hsp_align\n\n\n\n"; > } > } > > Results: > > [root@genome blast]# perl align-teste1.pl > Name "main::gi" used only once: possible typo at align-teste1.pl line > 17. > Name "main::hsp_frac_conserved" used only once: possible typo at > align-teste1.pl line 33. > Name "main::hit_length" used only once: possible typo at > align-teste1.pl line 19. > Name "main::hsp_frac_identical" used only once: possible typo at > align-teste1.pl line 31. > Name "main::rank" used only once: possible typo at align-teste1.pl > line 15. 
> 333 > 334 > 335 > 336 > 337 > 1 (should be +2) > 0 (should be -1) > YLTPTPIEPHL > Y+TPTPIEPHL > YITPTPIEPHL > > > > 338 > 339 > 340 > 341 > 342 > 343 > 0 (should be +1) > 0 (should be +1) > IHCEELKQLGRASEKCVL*LFNYSLDTGQVPAKWRHGIIVPQLKPNKSANSMASFRPAPKHSKLNRLGVPL > LA > ++ E L+ LG + VL LFN SL TG VP W+ G+I+P LK K A + S+RP S L ++ > ++A > LYNEALQHLGITALNVVLRLFNESLRTGVVPPAWKTGVIIPILKAGKKAEDLDSYRPVTLTSCLCKVMERI > IA > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Fri Nov 26 15:55:09 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Nov 26 15:52:50 2004 Subject: [Bioperl-l] Easy switching from wwwBlast to QBlast In-Reply-To: <5B03F9DE-3DFA-11D9-AF99-000A95B139D2@bioinfo.ca> References: <5B03F9DE-3DFA-11D9-AF99-000A95B139D2@bioinfo.ca> Message-ID: <75616934-3FED-11D9-B611-000393C44276@duke.edu> Dear Madeleine - Great. Would love for someone to be a maintainer and keeper of this module. All your changes sound great. I think a new function in Bio::Perl would be the best way to allow providing of a new localserver. Note that Bio::Perl is supposed to really just be a convenience of just having a list of functions for new users - so there is room for new *well named* functions to be added there. As for applying the changes - you can submit a patch of differences for your new code versus the current CVS HEAD by making changes and then running "cvs diff -aur " to get the changes in a patch format. You'll want to checkout the code via CVS first - http://cvs.open-bio.org/. We have to give you an authorized account to be able to apply changes back to the repository though. Once you've submitted a few fixes to show you understand the toolkit and the coding practices we can see about getting you that account. 
-jason

On Nov 24, 2004, at 4:22 AM, Madeleine Lemieux wrote:

> I've just recently started exploring BioPerl (v.1.4). So far it's been
> fun if a little daunting.
>
> As an exercise, I decided to try change the blast_sequence subroutine
> in Perl.pm so that it would let me send the query to either my local
> wwwBlast server or out over my slow, flakey internet connection to the
> QBlast server. I did this by adding a parameter LOCALSERVER which, if
> set to a URL, redirects the query to that server (e.g. LOCALSERVER =>
> http://localhost/blast/blast.cgi); otherwise, it defaults to the
> server at the NCBI.
>
> I've also added support for query by accession or gi # (QBlast only
> since wwwBlast doesn't support such queries), submission of multiple
> sequences (either in a file or string or string variable), as well as
> passing any of the QBlast Put and Get options as parameters. Unlike
> the original one, my blast_sequence returns an array of results, not a
> single result, so that code calling my version of blast_sequence in a
> scalar context would incorrectly get the size of the array.
>
> Apart from Perl.pm, the only other file that I had to change was
> Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release
> candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed
> in ways that overlap with the changes I've made while maintaining
> backwards compatibility which my version does not since I was only
> working for myself at the time.
>
> So my question is: is anyone interested in getting the code I've
> developed? If so, a corollary question is: how do I go about
> contributing the code? I can pretty easily forward port my changes to
> RemoteBlast.pm to the 1.5.RC1 version in order to use the nice
> "validate by regexp" trick introduced there and to provide backwards
> compatibility. I'm not sure what to do about the Perl.pm module,
> though.
> I guess that the easiest would be to change the name of my
> blast_sequence subroutine and add it to Perl.pm since there is no
> object interface being altered.
>
> As I was working on this, I noticed that the HTML stripping that gets
> done on the response from the QBlast server fails on wwwBlast output
> since the format of the HTML is a little different (manifests as a
> "can't find mid-line data" error when processing the alignments). So I
> wrote a generic stripper which removes all HTML tags except those that
> contain an end-of-line within the tag itself or an internal,
> un-escaped closing angle bracket (>) which wouldn't be valid HTML
> anyway, I think. It doesn't touch single angle brackets (>) such as
> those found at the beginning of descriptions (>gi ...).
>     # html stripper
>     # remove simple and closing tags first and then leftover tags
>     $str =~ s/<(\/)?\w+>//g;
>     $str =~ s/<\D+([^>]*\n*)*>//g;
>
> Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test
> for completion relies on the size of the file containing the reply.
> This has failed at least once for me. Since there is a status line
> near the top of the file in the response, it seems to me that
> something along the lines of the following might be more robust:
>     # read file until QBlastInfoEnd to pull out status
>     my $status = '';
>     my $junk = '';
>     open(TMP, $tempfile) or $self->throw("cannot open $tempfile");
>     while( defined (my $line = <TMP>) ) {
>         last if ($line =~ /QBlastInfoEnd/);
>         ($junk, $status) = (split /=/, $line) if ($line =~ /waiting|ready/i);
>     }
>     close TMP;
>
>     if( $response->is_success ) {
>         if ( $status =~ /waiting/i ) {
>             return 0;
>         } elsif ( $status =~ /ready/i ) {
>             ...
>         } else { # failed
>             ...
>         }
>     } ...
>
> Finally, let me end by thanking all the BioPerl contributors for their
> fine work.
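
[Editor's sketch] The status check proposed in the message above can be tried outside RemoteBlast.pm with a small stand-alone script. This is only a sketch under stated assumptions: the helper name qblast_status is hypothetical (it is not a BioPerl function), and the sample reply text, including the "Status=WAITING" line, is made up to match the description of "a status line near the top of the file" that ends at QBlastInfoEnd.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Run::RemoteBlast.
# Reads a reply line by line and returns the status word (upper-cased)
# found before the QBlastInfoEnd marker, or '' if none is found.
sub qblast_status {
    my ($fh) = @_;
    while ( defined( my $line = <$fh> ) ) {
        # Stop at the end marker: anything after it is report body.
        last if $line =~ /QBlastInfoEnd/;
        # Assumed line shape: "Status=WAITING" or "Status=READY".
        if ( $line =~ /=\s*(waiting|ready)\b/i ) {
            return uc $1;
        }
    }
    return '';
}

# Made-up reply text; an in-memory filehandle stands in for the temp file.
my $reply = "QBlastInfoBegin\n\tStatus=WAITING\nQBlastInfoEnd\nreport body\n";
open my $fh, '<', \$reply or die "cannot open in-memory reply: $!";
print qblast_status($fh), "\n";
close $fh;
```

Returning an empty string when no status line is seen keeps the "failed" branch distinct from "waiting" and "ready", which mirrors the three-way test in the quoted snippet.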
>
> Regards,
> Madeleine
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu Fri Nov 26 21:58:35 2004
From: allenday at ucla.edu (Allen Day)
Date: Fri Nov 26 21:56:21 2004
Subject: [Bioperl-l] bad entries in interpro
In-Reply-To: <20041124003021.GA12104@cecm.usp.br>
References: <20041124003021.GA12104@cecm.usp.br>
Message-ID:

the problem is that iprscan (the program that produces the interpro xml
files) does not properly xml-escape some characters. there is an if-block
in the module that tries to catch things like quotes and ampersands, but
it's by no means exhaustive.

the preferred solutions to this, in order of descending difficulty for
you are to:

A) complain to the interpro authors/maintainers and get them to make
   valid xml.
B) write an if-block that will exhaustively escape characters that
   should be escaped.
C) hack the current if-block to support your special character.

I'll be happy to merge in a patch for you for case B or C. Go ahead and
modify the module, and run:

% diff -Bbup interpro.pm interpro.pm.new > interpro.patch

and post the file to the list.

-allen

On Tue, 23 Nov 2004, Robson Francisco de Souza {S} wrote:

> Hi everyone,
>
> A few days ago, Mikko Arvas sent an e-mail to this list asking how to
> ignore bad entries in the matches.xml file from the InterPro database.
> Hilmar Lapp answered asking him to locate the position in the file that
> raises the error message
>
> >> not well-formed (invalid token) at line 2, column 53, byte 131 at
> >> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
> >> line 187
>
> Well, I saw no answers on the list, therefore I'm sending the problematic
> entry below:
>
> crc64="9797609B487FD64E">
>
>
> The problem seems to be the "'" annotation at the second line.
>
> I also tested if an eval clause could be used to bypass such entries
> without crashing a script. The example script below worked fine and
> reported a problem with the entry above without crashing.
>
> Would it be too difficult to make interpro.pm able to parse names like
> the one above?
>
> Robson
>
> ##################################################
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-file=>$ARGV[0],
>                          -format=>"interpro");
>
> my $i=1;
> while (1) {
>     my $seq;
>     eval {
>         $seq = $in->next_seq;
>     };
>     last if (!defined $seq);
>     if ($@) { print STDERR "Problem parsing sequence $i..."; next };
>     print STDERR $seq->id,"\n";
>     print "<=== ",$seq->id,"===>\n";
>     foreach my $f ($seq->get_all_SeqFeatures) {
>         print $f->gff_string,"\n";
>         foreach my $key ($f->annotation->get_all_annotation_keys) {
>             foreach my $value ($f->annotation->get_Annotations($key)) {
>                 print $key,":",$value->as_text,"\n";
>             }
>         }
>     }
>     $i++;
> }
>
> exit 0;
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From hlapp at gmx.net Sat Nov 27 00:49:34 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Nov 27 00:47:18 2004
Subject: [Bioperl-l] bad entries in interpro
In-Reply-To:
Message-ID: <1DC72628-4038-11D9-A28A-000A959EB4C4@gmx.net>

On Friday, November 26, 2004, at 06:58 PM, Allen Day wrote:

> the problem is that iprscan (the program that produces the interpro xml
> files) does not properly xml-escape some characters.

Printing prime as ' does seem properly escaped though. I'm not sure
whether I missed something but I couldn't find any other characters that
should have been escaped but weren't in this entry.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Nov 27 01:03:05 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Nov 27 01:00:49 2004
Subject: [Bioperl-l] bad entries in interpro
In-Reply-To:
Message-ID: <0159256F-403A-11D9-A28A-000A959EB4C4@gmx.net>

On Friday, November 26, 2004, at 06:58 PM, Allen Day wrote:

> the problem is that iprscan (the program that produces the interpro xml
> files) does not properly xml-escape some characters. there is an
> if-block
> in the module that tries to catch things like quotes and ampersands,
> but
> it's by no means exhaustive.
>

I looked at the module and now I'm confused. Did I miss something or can
you explain why

	my $zinc = "(\"zincins\")";
	my $wing = "\"Winged helix\"";
	my $finger = "\"zinc finger\"";

had to be hard-coded? I would have thought you wanted to encode (and
catch!) the quotes, not the phrase Winged helix?

Also, did you check whether CGI.pm or a LWP module or some such doesn't
solve this problem generically enough?

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Nov 27 01:06:57 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Nov 27 01:04:44 2004
Subject: [Bioperl-l] bad entries in interpro
In-Reply-To: <20041124003021.GA12104@cecm.usp.br>
Message-ID: <8BA3431E-403A-11D9-A28A-000A959EB4C4@gmx.net>

On Tuesday, November 23, 2004, at 04:30 PM, Robson Francisco de Souza
{S} wrote:

>
>>> not well-formed (invalid token) at line 2, column 53, byte 131 at
>>> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
>>> line 187
>
> Well, I saw no answers on the list, therefore I'm sending the
> problematic
> entry below:
>
> crc64="9797609B487FD64E">
>
>
> The problem seems to be the "'" annotation at the second line.
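
[Editor's sketch] For reference, the kind of generic escaping pass discussed in this thread (escaping every reserved character instead of hard-coding particular phrases) could look roughly like the following. The helper name escape_xml is hypothetical and is not part of Bio::SeqIO::interpro; it covers only the five characters for which XML predefines entities, and it assumes its input is raw text that has not been escaped yet (running it on already-escaped text would escape the ampersands a second time). The sample string is made up from phrases mentioned in the thread.

```perl
use strict;
use warnings;

# The five characters XML reserves, mapped to their predefined entities.
my %XML_ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper: escape raw text so it can sit inside an XML
# attribute value or element body.
sub escape_xml {
    my ($text) = @_;
    # One pass over the string, so an '&' produced by a replacement is
    # never looked at again.
    $text =~ s/([&<>"'])/$XML_ENTITY{$1}/g;
    return $text;
}

# Made-up sample input, for illustration only.
print escape_xml(q{"Winged helix" & 5' end}), "\n";
```

A table-driven substitution like this avoids the need to list individual phrases, but it only helps where the raw value is available before it is written into the XML; it cannot repair a file that is already not well-formed.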
Did you try and delete the two ' from the entry and then it passed fine?
Otherwise, the ' is not the problem.

>
> I also tested if an eval clause could be used to bypass such entries
> without crashing a script. The example script below worked fine and
> reported a problem with the entry above without crashing.

This will work as long as you don't need to resume parsing of the block
of text that raised the exception, and if the file pointer is properly
advanced. The way SeqIO::interpro.pm works neither seems to be a problem.

>
> Would it be too difficult to make interpro.pm able to parse names like
> the one above?

What throws up is the XML parser (expat). There's nothing interpro.pm can
do about this to mitigate it, once it happened. The only course of help
is to prepare the text block to be parsed such that it won't raise
exceptions.

	-hilmar

>
> Robson
>
> ##################################################
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-file=>$ARGV[0],
>                          -format=>"interpro");
>
> my $i=1;
> while (1) {
>     my $seq;
>     eval {
>         $seq = $in->next_seq;
>     };
>     last if (!defined $seq);
>     if ($@) { print STDERR "Problem parsing sequence $i..."; next };
>     print STDERR $seq->id,"\n";
>     print "<=== ",$seq->id,"===>\n";
>     foreach my $f ($seq->get_all_SeqFeatures) {
>         print $f->gff_string,"\n";
>         foreach my $key ($f->annotation->get_all_annotation_keys) {
>             foreach my $value ($f->annotation->get_Annotations($key)) {
>                 print $key,":",$value->as_text,"\n";
>             }
>         }
>     }
>     $i++;
> }
>
> exit 0;
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Nov 27 01:08:55 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 01:06:45 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/SeqFeature Annotated.pm, 1.16, 1.17 In-Reply-To: Message-ID: What's the point of issuing a warning to the caller that what she just specifically asked for now indeed is going to happen? Remember, the default is not to allow expansion - you have to specifically ask for it. -hilmar On Wednesday, November 24, 2004, at 10:49 AM, Allen Day wrote: > i'd like a warning issued if the range expands. i see expansion as a > convenience feature, and want to know when it happens because i don't > generally expect it. > > On Wed, 24 Nov 2004, Steffen Grossman wrote: > >> Update of /home/repository/bioperl/bioperl-live/Bio/SeqFeature >> In directory pub.open-bio.org:/tmp/cvs-serv19535 >> >> Modified Files: >> Annotated.pm >> Log Message: >> Added support for the 'EXPAND' option in 'add_SeqFeature'. >> >> >> Index: Annotated.pm >> =================================================================== >> RCS file: >> /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v >> retrieving revision 1.16 >> retrieving revision 1.17 >> diff -C2 -d -r1.16 -r1.17 >> *** Annotated.pm 24 Nov 2004 02:14:06 -0000 1.16 >> --- Annotated.pm 24 Nov 2004 16:31:59 -0000 1.17 >> *************** >> *** 604,617 **** >> =head2 add_SeqFeature() >> >> ! Usage : $obj->add_SeqFeature($feat); >> ! Function: Returns : nothing >> ! Args : A Bio::SeqFeatureI object. Objects not implementing >> Bio::SeqFeatureI >> ! and those whose bounds are not within those of the >> called object are >> ! ignored with a warning. >> >> =cut >> >> sub add_SeqFeature { >> ! my ($self,$val) = @_; >> >> return undef unless $val; >> --- 604,625 ---- >> =head2 add_SeqFeature() >> >> ! Usage : $feat->add_SeqFeature($subfeat); >> ! 
$feat->add_SeqFeature($subfeat,'EXPAND') >> ! Function: adds a SeqFeature into the subSeqFeature array. >> ! with no 'EXPAND' qualifer, subfeat will be tested >> ! as to whether it lies inside the parent, and throw >> ! an exception if not. >> ! >> ! If EXPAND is used, the parent''s start/end/strand will >> ! be adjusted so that it grows to accommodate the new >> ! subFeature >> ! Example : >> ! Returns : nothing >> ! Args : a Bio::SeqFeatureI object >> >> =cut >> >> sub add_SeqFeature { >> ! my ($self,$val, $expand) = @_; >> >> return undef unless $val; >> *************** >> *** 619,626 **** >> if ( !$val->isa('Bio::SeqFeatureI') ) { >> $self->warn("$val does not implement Bio::SeqFeatureI, >> ignoring."); >> } >> >> ! if ( !$self->contains($val) ) { >> ! $self->warn("$val is not contained within parent feature, >> ignoring."); >> } >> >> --- 627,640 ---- >> if ( !$val->isa('Bio::SeqFeatureI') ) { >> $self->warn("$val does not implement Bio::SeqFeatureI, >> ignoring."); >> + return undef; >> } >> >> ! if($expand && ($expand eq 'EXPAND')) { >> ! $self->_expand_region($val); >> ! } else { >> ! if ( !$self->contains($val) ) { >> ! $self->warn("$val is not contained within parent feature, and >> expansion is not valid, ignoring."); >> ! return undef; >> ! } >> } >> >> *************** >> *** 734,737 **** >> --- 748,785 ---- >> my ($self) = @_; >> return $self->{'targets'} ? @{ $self->{'targets'} } : (); >> + } >> + >> + =head2 _expand_region >> + >> + Title : _expand_region >> + Usage : $self->_expand_region($feature); >> + Function: Expand the total region covered by this feature to >> + accomodate for the given feature. >> + >> + May be called whenever any kind of subfeature is added >> to this >> + feature. add_SeqFeature() already does this. >> + Returns : >> + Args : A Bio::SeqFeatureI implementing object. >> + >> + >> + =cut >> + >> + sub _expand_region { >> + my ($self, $feat) = @_; >> + if(! 
$feat->isa('Bio::SeqFeatureI')) { >> + $self->warn("$feat does not implement Bio::SeqFeatureI"); >> + } >> + # if this doesn't have start/end set - forget it! >> + if((! defined($self->start())) && (! defined $self->end())) { >> + $self->start($feat->start()); >> + $self->end($feat->end()); >> + $self->strand($feat->strand) unless >> defined($self->strand()); >> + # $self->strand($feat->strand) unless $self->strand(); >> + } else { >> + my $range = $self->union($feat); >> + $self->start($range->start); >> + $self->end($range->end); >> + $self->strand($range->strand); >> + } >> } >> >> >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From allenday at ucla.edu Sat Nov 27 01:20:39 2004 From: allenday at ucla.edu (Allen Day) Date: Sat Nov 27 01:18:36 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl-live/Bio/SeqFeature Annotated.pm, 1.16, 1.17 In-Reply-To: References: Message-ID: yeah, you're right. i was thinking this was the default behaviour. -allen On Fri, 26 Nov 2004, Hilmar Lapp wrote: > What's the point of issuing a warning to the caller that what she just > specifically asked for now indeed is going to happen? > > Remember, the default is not to allow expansion - you have to > specifically ask for it. > > -hilmar > > On Wednesday, November 24, 2004, at 10:49 AM, Allen Day wrote: > > > i'd like a warning issued if the range expands. i see expansion as a > > convenience feature, and want to know when it happens because i don't > > generally expect it. 
> > > > On Wed, 24 Nov 2004, Steffen Grossman wrote: > > > >> Update of /home/repository/bioperl/bioperl-live/Bio/SeqFeature > >> In directory pub.open-bio.org:/tmp/cvs-serv19535 > >> > >> Modified Files: > >> Annotated.pm > >> Log Message: > >> Added support for the 'EXPAND' option in 'add_SeqFeature'. > >> > >> > >> Index: Annotated.pm > >> =================================================================== > >> RCS file: > >> /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v > >> retrieving revision 1.16 > >> retrieving revision 1.17 > >> diff -C2 -d -r1.16 -r1.17 > >> *** Annotated.pm 24 Nov 2004 02:14:06 -0000 1.16 > >> --- Annotated.pm 24 Nov 2004 16:31:59 -0000 1.17 > >> *************** > >> *** 604,617 **** > >> =head2 add_SeqFeature() > >> > >> ! Usage : $obj->add_SeqFeature($feat); > >> ! Function: Returns : nothing > >> ! Args : A Bio::SeqFeatureI object. Objects not implementing > >> Bio::SeqFeatureI > >> ! and those whose bounds are not within those of the > >> called object are > >> ! ignored with a warning. > >> > >> =cut > >> > >> sub add_SeqFeature { > >> ! my ($self,$val) = @_; > >> > >> return undef unless $val; > >> --- 604,625 ---- > >> =head2 add_SeqFeature() > >> > >> ! Usage : $feat->add_SeqFeature($subfeat); > >> ! $feat->add_SeqFeature($subfeat,'EXPAND') > >> ! Function: adds a SeqFeature into the subSeqFeature array. > >> ! with no 'EXPAND' qualifer, subfeat will be tested > >> ! as to whether it lies inside the parent, and throw > >> ! an exception if not. > >> ! > >> ! If EXPAND is used, the parent''s start/end/strand will > >> ! be adjusted so that it grows to accommodate the new > >> ! subFeature > >> ! Example : > >> ! Returns : nothing > >> ! Args : a Bio::SeqFeatureI object > >> > >> =cut > >> > >> sub add_SeqFeature { > >> ! 
my ($self,$val, $expand) = @_; > >> > >> return undef unless $val; > >> *************** > >> *** 619,626 **** > >> if ( !$val->isa('Bio::SeqFeatureI') ) { > >> $self->warn("$val does not implement Bio::SeqFeatureI, > >> ignoring."); > >> } > >> > >> ! if ( !$self->contains($val) ) { > >> ! $self->warn("$val is not contained within parent feature, > >> ignoring."); > >> } > >> > >> --- 627,640 ---- > >> if ( !$val->isa('Bio::SeqFeatureI') ) { > >> $self->warn("$val does not implement Bio::SeqFeatureI, > >> ignoring."); > >> + return undef; > >> } > >> > >> ! if($expand && ($expand eq 'EXPAND')) { > >> ! $self->_expand_region($val); > >> ! } else { > >> ! if ( !$self->contains($val) ) { > >> ! $self->warn("$val is not contained within parent feature, and > >> expansion is not valid, ignoring."); > >> ! return undef; > >> ! } > >> } > >> > >> *************** > >> *** 734,737 **** > >> --- 748,785 ---- > >> my ($self) = @_; > >> return $self->{'targets'} ? @{ $self->{'targets'} } : (); > >> + } > >> + > >> + =head2 _expand_region > >> + > >> + Title : _expand_region > >> + Usage : $self->_expand_region($feature); > >> + Function: Expand the total region covered by this feature to > >> + accomodate for the given feature. > >> + > >> + May be called whenever any kind of subfeature is added > >> to this > >> + feature. add_SeqFeature() already does this. > >> + Returns : > >> + Args : A Bio::SeqFeatureI implementing object. > >> + > >> + > >> + =cut > >> + > >> + sub _expand_region { > >> + my ($self, $feat) = @_; > >> + if(! $feat->isa('Bio::SeqFeatureI')) { > >> + $self->warn("$feat does not implement Bio::SeqFeatureI"); > >> + } > >> + # if this doesn't have start/end set - forget it! > >> + if((! defined($self->start())) && (! 
defined $self->end())) { > >> + $self->start($feat->start()); > >> + $self->end($feat->end()); > >> + $self->strand($feat->strand) unless > >> defined($self->strand()); > >> + # $self->strand($feat->strand) unless $self->strand(); > >> + } else { > >> + my $range = $self->union($feat); > >> + $self->start($range->start); > >> + $self->end($range->end); > >> + $self->strand($range->strand); > >> + } > >> } > >> > >> > >> _______________________________________________ > >> Bioperl-guts-l mailing list > >> Bioperl-guts-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > From allenday at ucla.edu Sat Nov 27 01:22:58 2004 From: allenday at ucla.edu (Allen Day) Date: Sat Nov 27 01:20:38 2004 Subject: [Bioperl-l] bad entries in interpro In-Reply-To: <0159256F-403A-11D9-A28A-000A959EB4C4@gmx.net> References: <0159256F-403A-11D9-A28A-000A959EB4C4@gmx.net> Message-ID: ooh, that's bad. sorry, i didn't write this but (i think) the burden has fallen on me to maintain it. -allen On Fri, 26 Nov 2004, Hilmar Lapp wrote: > > On Friday, November 26, 2004, at 06:58 PM, Allen Day wrote: > > > the problem is that iprscan (the program that produces the interpro xml > > files) does not properly xml-escape some characters. there is an > > if-block > > in the module that tries to catch things like quotes and ampersands, > > but > > it's by no means exhaustive. > > > > I looked at the module and now I'm confused. Did I miss something or > can you explain why > > my $zinc = "(\"zincins\")"; > my $wing = "\"Winged helix\""; > my $finger = "\"zinc finger\""; > > had to be hard-coded? I would have thought you wanted to encode (and > catch!) the quotes, not the phrase Winged helix? 
> > Also, did you check whether CGI.pm or a LWP module or some such doesn't > solve this problem generically enough? > > -hilmar > From hlapp at gmx.net Sat Nov 27 01:38:03 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 01:35:46 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: My worry was that to anybody who 1) treated objects returned by $seq->get_SeqFeatures() as implementing SeqFeatureI, 2) assumed that instantiating SeqFeature::Generic would give him a SeqFeatureI compliant object, 3) wrote her own SeqFeatureI compliant object and treated it as such these changes should be fully transparent. By fully transparent I mean that first the respective code should be no less correct than it was before, and that the code should produce the exact same results, including what is sent to standard error. By a SeqFeatureI compliant object I mean an object that IS-A Bio::SeqFeatureI as it was defined before those changes, with all calls to the previously defined functions being perfectly legal. Also, a class that I wrote before those changes to implement SeqFeatureI should now still implement SeqFeatureI without me having to make any changes. If this is satisfied then I have no worries to this end. If this is not satisfied then I think it needs to go on a branch or at least stay out of 1.5. I don't understand the implications of Allen's comments below enough to make the judgement as to whether transparency is provided; Allen will need to make this call himself I'm afraid. Unfortunately, I think we've never written tests thorough enough that would let us infer this from everything passing or not. For instance, if you change SeqFeatureI and SeqFeatureI::Generic simultaneously, there is no way of telling whether your SeqFeatureI changes are transparent. Beyond this, I do agree with Chris that there could be a performance issue with richly annotated databanks. 
Tag values were simple strings; using objects for all of them instead at feature creation and population time may be too heavy. Also Chris, I thought the XML idea sounded very interesting; did you have Data::Stag in mind to manage it? It would still require a working XML parser installation, no? -hilmar On Wednesday, November 24, 2004, at 05:58 AM, Aaron J. Mackey wrote: > > Allen, thanks a ton for going the extra mile. Hilmar, does this > solution satisfy your worries a bit? > > Thanks again to everyone, > > -Aaron > > On Nov 23, 2004, at 9:16 PM, Allen Day wrote: > >> Fixed. Here is a summary of what I did to make this happen. I went >> ahead >> and did the work necessary to make Bio::SeqFeatureI AnnotatableI >> instead >> of being itself an AnnotationCollectionI. >> >> . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT >> Bio::AnnotationCollectionI >> . *_tag_* methods are in Bio::AnnotatableI, and internally defer to >> Bio::AnnotatableI->annotation->some_analagous_mapped_function() >> . method behavior is now more similar to original *_tag_* method >> behavior ; tag "values" are now instantiated as >> Bio::Annotation::SimpleValue objects by default, unless their name >> indicates they should be otherwise (e.g. tag name "comment" or >> "dblink") >> . deprecation warnings commented until 1.6 >> . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow >> new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). >> . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as >> *_tag_* >> methods map directly onto Bio::AnnotationI's >> Bio::AnnotationCollectionI instance. >> . Unflattener and Unflattener2 tests pass with no changes. >> . All tests pass. >> >> -Allen >> >> >> On Tue, 23 Nov 2004, Chris Mungall wrote: >> >>> >>> Unflattener.t is failing because someone has messed up >>> get_tagset_values() >>> - this is a convenience method I originally added to SeqFeatureI. 
>>> I'm not >>> familiar enough with the new changes and AnnotationCollections to fix >>> this. >>> >>> Surely the onus has always been on the person making changes to make >>> sure >>> the test suite passes before committing their changes? In which >>> case, how >>> did these changes make it in in the first place? >>> >>> On Tue, 23 Nov 2004, Jason Stajich wrote: >>> >>>> >>>> On Nov 23, 2004, at 4:47 PM, Allen Day wrote: >>>> >>>>> On Tue, 23 Nov 2004, Jason Stajich wrote: >>>>> >>>>>> I think if we just don't issue deprecation warnings it will be >>>>>> fine by >>>>>> me -- even if we are just calling the new subroutine under the >>>>>> hood. >>>>>> Tests seem to pass although Unflattner.t is falling over today not >>>>>> sure >>>>>> what is problem. >>>>> >>>>> that fails for me too, in addition to spewing out lots of >>>>> diagnotistics. >>>>> however, if you run 'make test_Unflattener2', it passes. strange. >>>>> >>>> no it is Unflattner not Unflattner2 >>>> >>>> % make test_Unflattener >>>> [SNIP OUT SOME STUFF] >>>> >>>> -------------------- WARNING --------------------- >>>> MSG: get_tagset_values() is deprecated. use get_Annotations() >>>> --------------------------------------------------- >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" >>>> is >>>> not implemented by package Bio::SeqFeature::Generic. >>>> >>>> >>>>> -allen >>>>> >>>>>> >>>>>> -jason >>>>>> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: >>>>>> >>>>>>> >>>>>>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: >>>>>>>> >>>>>>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI >>>>>>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to >>>>>>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped >>>>>>>>> to >>>>>>>>> their >>>>>>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI >>>>>>>>> methods. 
>>>>>>>>> >>>>>>>>> Methods which were not in Bio::AnnotationCollectionI, but >>>>>>>>> were i >>>>>>>>> Bio::Annotation::Collection and were necessary for *_tag_* >>>>>>>>> method >>>>>>>>> remapping were created in Bio::AnnotationCollecitonI. >>>>>>> >>>>>>> I've been paying some attention to this, but thought that the >>>>>>> changes >>>>>>> were only those required to get Bio::FeatureIO working (i.e. >>>>>>> recapitulate GFF3 logic) without hampering object usage; do our >>>>>>> tests >>>>>>> pass with these changes in place? >>>>>>> >>>>>>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: >>>>>>> >>>>>>>> it has not been tagged yet. I think Aaron is just really busy >>>>>>>> on >>>>>>>> this front. >>>>>>> >>>>>>> I did tag the HEAD at RC1, so we could branch from there if we >>>>>>> needed >>>>>>> to; if this is really the big bug-bear that Hilmar and Jason are >>>>>>> claiming, then I'd ask Allen to retract his patches that alter >>>>>>> interface definitions, and branch. >>>>>>> >>>>>>> And I was so hoping to get RC2 packaged up later today ... >>>>>>> >>>>>>> -Aaron >>>>>>> >>>>>>> -- >>>>>>> Aaron J. Mackey, Ph.D. >>>>>>> Dept. of Biology, Goddard 212 >>>>>>> University of Pennsylvania email: amackey@pcbi.upenn.edu >>>>>>> 415 S. 
University Avenue office: 215-898-1205 >>>>>>> Philadelphia, PA 19104-6017 fax: 215-746-6697 >>>>>>> >>>>>>> >>>>>> -- >>>>>> Jason Stajich >>>>>> jason.stajich at duke.edu >>>>>> http://www.duke.edu/~jes12/ >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@portal.open-bio.org >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at duke.edu >>>> http://www.duke.edu/~jes12/ >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> >> > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Nov 27 01:54:50 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 01:52:34 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: Message-ID: <3B9D02E8-4041-11D9-A28A-000A959EB4C4@gmx.net> Well, if you return an object from source_tag() instead of a string your SeqFeatureI compliance goes out the window right there. Friendly stringification through overloading double quotes does not mitigate this. Same goes BTW for primary_tag(), seq_id() etc. 
Looking at the code, you're returning an object for these. Not SeqFeatureI contract-compliant. As an aside, your get_Annotations() short-cut is brittle. If someone happens to add a second 'source' annotation (or any other tag for that matter), it will break and return the length of the array instead of the first element. Furthermore, I wouldn't test for IS-A Bio::Annotation::OntologyTerm - this is only an implementation class and one day there may be better ones. What you really care about is that the object IS-A Bio::AnnotationI (so that you can add it to the collection) and IS-A Bio::Ontology::TermI (so that you have your ontology-enforced typing). -hilmar On Thursday, November 25, 2004, at 02:45 AM, Allen Day wrote: > i don't think it should do this, the whole point of using an AC is to > retain typing. if you want to have a friendly stringification of the > source attribute, use overloading in SimpleValue. there is an > as_text() > method already in there that is currently commented out. > > -allen > > > On Thu, 25 Nov 2004, Steffen Grossman wrote: > >> Update of /home/repository/bioperl/bioperl-live/Bio/SeqFeature >> In directory pub.open-bio.org:/tmp/cvs-serv22868 >> >> Modified Files: >> Annotated.pm >> Log Message: >> 'source' now gives back its value instead of a >> Bio::Annotation::SimpleValue object. (This is consisitent with >> 'type'). >> >> >> Index: Annotated.pm >> =================================================================== >> RCS file: >> /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v >> retrieving revision 1.17 >> retrieving revision 1.18 >> diff -C2 -d -r1.17 -r1.18 >> *** Annotated.pm 24 Nov 2004 16:31:59 -0000 1.17 >> --- Annotated.pm 25 Nov 2004 10:02:50 -0000 1.18 >> *************** >> *** 181,185 **** >> } >> >> ! return $self->get_Annotations('source'); >> } >> >> --- 181,188 ---- >> } >> >> ! my $source_anno = $self->get_Annotations('source'); >> ! >> ! return $source_anno->value if ($source_anno); >> ! 
return undef; >> } >> >> >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Nov 27 02:03:06 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 02:01:16 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: <63C1A952-4042-11D9-A28A-000A959EB4C4@gmx.net> On Tuesday, November 23, 2004, at 07:07 PM, Chris Mungall wrote: > AnnotatableI->annotation returns AnnotationCollectionI > (*not* an AnnotationI) > > AnnotationCollectionI->get_Annotations returns list-of AnnotationI > > why can't accessor methods be named after the class of objects they > return, rather than a different class? It makes things a lot easier for > the easily confused like myself. True, and should certainly be refactored at some point. Historically, $seq->annotation preceded $seq->isa("Bio::AnnotatableI") by several years, so when I added the interface definition I just took the name as it was. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From allenday at ucla.edu Sat Nov 27 02:06:41 2004 From: allenday at ucla.edu (Allen Day) Date: Sat Nov 27 02:05:21 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: <3B9D02E8-4041-11D9-A28A-000A959EB4C4@gmx.net> References: <3B9D02E8-4041-11D9-A28A-000A959EB4C4@gmx.net> Message-ID: On Fri, 26 Nov 2004, Hilmar Lapp wrote: > Well, if you return an object from source_tag() instead of a string > your SeqFeatureI compliance goes out the window right there. Friendly > stringification through overloading double quotes does not mitigate > this. > > Same goes BTW for primary_tag(), seq_id() etc. Looking at the code, > you're returning an object for these. Not SeqFeatureI > contract-compliant. stringified reference ( e.g. REF(0x804d584) ) is a string in my book. this is a backward compatible change. if you're worried that the user will see the stringified reference if treating the returned value as a string and not a data structure, that's what the overloading is for. i don't see a problem here. > As an aside, your get_Annotations() short-cut is brittle. If someone > happens to add a second 'source' annotation (or any other tag for that > matter), it will break and return the length of the array instead of this is intentional and documented. > the first element. Furthermore, I wouldn't test for IS-A > Bio::Annotation::OntologyTerm - this is only an implementation class > and one day there may be better ones. What you really care about is > that the object IS-A Bio::AnnotationI (so that you can add it to the > collection) and IS-A Bio::Ontology::TermI (so that you have your > ontology-enforced typing). what class are you referring to here? From hlapp at gmx.net Sat Nov 27 02:31:08 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 02:29:02 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: Message-ID: <4E243FC0-4046-11D9-A28A-000A959EB4C4@gmx.net> On Friday, November 26, 2004, at 11:06 PM, Allen Day wrote: > On Fri, 26 Nov 2004, Hilmar Lapp wrote: > >> Well, if you return an object from source_tag() instead of a string >> your SeqFeatureI compliance goes out the window right there. Friendly >> stringification through overloading double quotes does not mitigate >> this. >> >> Same goes BTW for primary_tag(), seq_id() etc. Looking at the code, >> you're returning an object for these. Not SeqFeatureI >> contract-compliant. > > stringified reference ( e.g. REF(0x804d584) ) is a string in my book. > this is a backward compatible change. if you're worried that the user > will see the stringified reference if treating the returned value as a > string and not a data structure, that's what the overloading is for. > > i don't see a problem here. Well, what 'string' as return type basically means in perl speak is a scalar that when evaluated yields the value of the string. I.e., when I call $feature->primary_tag("my primary tag") and later ask for $tag = $feature->primary_tag() then ref($tag) eq ""; # is true print $tag; # prints the string my primary tag length($tag) == 14; # is true join("",split(//,$tag)) eq $tag; # is true and other things expecting a string will work as expected, like my $stringio = IO::String->new($tag); $dbi_statement_handle->bind_param(1, $tag); to name a few.
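A minimal sketch of the four checks listed above, assuming a hypothetical class My::StringyValue that overloads double-quote stringification. It is not the real Bio::Annotation::SimpleValue, and the helper string_contract_report is invented for illustration. It only shows which of the checks such an object passes; how IO::String or DBI would treat the object is not exercised here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object with "friendly"
# stringification; NOT the real Bio::Annotation::SimpleValue.
package My::StringyValue;
use overload '""' => sub { $_[0]->{value} }, fallback => 1;

sub new {
    my ($class, $value) = @_;
    return bless { value => $value }, $class;
}

package main;

# Hypothetical helper: apply the checks from the message to any
# candidate return value and record which ones hold.
sub string_contract_report {
    my ($tag) = @_;
    return {
        plain_scalar => (ref($tag) eq '')                   ? 1 : 0,
        prints_ok    => ("$tag" eq 'my primary tag')        ? 1 : 0,
        length_ok    => (length($tag) == 14)                ? 1 : 0,
        roundtrip_ok => (join('', split(//, $tag)) eq $tag) ? 1 : 0,
    };
}

my $plain  = string_contract_report('my primary tag');
my $object = string_contract_report(My::StringyValue->new('my primary tag'));

# One row per check, comparing the plain string with the object.
for my $check (sort keys %$plain) {
    printf "%-13s plain=%d object=%d\n",
        $check, $plain->{$check}, $object->{$check};
}
```

Run as written, the plain string passes all four checks, while the overloaded object passes every check except ref($tag) eq "".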
If you can achieve this by returning a data structure instead of a string, fine with me. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Nov 27 02:33:26 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 02:31:07 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: Message-ID: On Friday, November 26, 2004, at 11:06 PM, Allen Day wrote: >> As an aside, your get_Annotations() short-cut is brittle. If someone >> happens to add a second 'source' annotation (or any other tag for that >> matter), it will break and return the length of the array instead of > > this is intentional and documented. Maybe I'm missing something but I'm not sure why you would want code that is brittle on purpose. Why would that help the end user? > >> the first element. Furthermore, I wouldn't test for IS-A >> Bio::Annotation::OntologyTerm - this is only an implementation class >> and one day there may be better ones. What you really care about is >> that the object IS-A Bio::AnnotationI (so that you can add it to the >> collection) and IS-A Bio::Ontology::TermI (so that you have your >> ontology-enforced typing).
> > what class are you referring to here? The one in the subject line. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From allenday at ucla.edu Sat Nov 27 02:45:44 2004 From: allenday at ucla.edu (Allen Day) Date: Sat Nov 27 02:43:25 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: References: Message-ID: On Fri, 26 Nov 2004, Hilmar Lapp wrote: > > On Friday, November 26, 2004, at 11:06 PM, Allen Day wrote: > > >> As an aside, your get_Annotations() short-cut is brittle. If someone > >> happens to add a second 'source' annotation (or any other tag for that > >> matter), it will break and return the length of the array instead of > > > > this is intentional and documented. > > Maybe I'm missing something but I'm not sure why you would want code > that is brittle on purpose. Why would that help the end user? if you're not sure how many annotations there are, you'd better call it in list context unless you want the count: my @value = $feature->get_Annotations('key1'); my $count = $feature->get_Annotations('key1'); but if you're sure there is only one, you can call it either way: my @value = $feature->get_Annotations('key2'); my $value = $feature->get_Annotations('key2'); > >> the first element. Furthermore, I wouldn't test for IS-A > >> Bio::Annotation::OntologyTerm - this is only an implementation class > >> and one day there may be better ones. What you really care about is > >> that the object IS-A Bio::AnnotationI (so that you can add it to the > >> collection) and IS-A Bio::Ontology::TermI (so that you have your > >> ontology-enforced typing). > > > > what class are you referring to here? > > The one in the subject line. I don't see any lines with 'isa' that also contain Bio::Annotation::OntologyTerm. Please be more specific. 
> > -hilmar > From hlapp at gmx.net Sat Nov 27 03:15:51 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 03:13:46 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: Message-ID: <8D60A196-404C-11D9-A28A-000A959EB4C4@gmx.net> So, after checking out some pieces of code, AnnotatableI in particular, I'm not convinced that this transparency is achieved already; some more thoughts will have to be spent. E.g., get_all_tags() will now return all annotation keys ever added, regardless of whether through $feat->primary_tag, $feat->source_tag, $feat->annotation->add_Annotation, or $feat->add_tag_value, unless any of these methods is overridden. This is very different from what it used to be, namely only return those keys added through $feat->add_tag_value. Anybody who made any assumption that what you didn't add through add_tag_value you'll not get back through get_all_tags will be hosed. I.e., anybody who assumed that $feat->get_all_tags and $feat->annotation->get_all_annotation_keys hold distinct sets of values will be hosed. This includes bioperl-db, for those who care. It is also not what SeqFeature::Generic will do now, namely not return the primary_tag and source_tag keys. Allen, you changed the t/AnnotationAdaptor.t test in order to make it pass after the changes, and what you changed wasn't fixing a bug. This is cheating, quite frankly - you removed the evidence that the changes are not transparent. I suggest you roll back AnnotationAdaptor.t to the previous version and see that it passes the way it was. Or this entire change is rolled back from the release trunk and ironed out on a branch first. I'm really not convinced that releasing a one-week old substantial change to some of the core modules to the general public is the best way of minimizing avoidable frustration on the users' end. 
Call me too conservative, but given previous fiascos with prematurely released code in bioperl I just can't help but feel this may be too risky an undertaking for a release that was originally fancied to supplant 1.4. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Nov 27 03:32:22 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 03:30:05 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: Message-ID: On Friday, November 26, 2004, at 11:45 PM, Allen Day wrote: > On Fri, 26 Nov 2004, Hilmar Lapp wrote: > >> >> On Friday, November 26, 2004, at 11:06 PM, Allen Day wrote: >> >>>> As an aside, your get_Annotations() short-cut is brittle. If someone >>>> happens to add a second 'source' annotation (or any other tag for >>>> that >>>> matter), it will break and return the length of the array instead of >>> >>> this is intentional and documented. >> >> Maybe I'm missing something but I'm not sure why you would want code >> that is brittle on purpose. Why would that help the end user? > > if you're not sure how many annotations there are, you'd better call > it in > list context unless you want the count: > > my @value = $feature->get_Annotations('key1'); > my $count = $feature->get_Annotations('key1'); Well, as an implementation agnostic client $feature IS-A AnnotatableI and therefore you can't call $feature->get_Annotations but need $feature->annotation->get_Annotations('key1'). But this is not what I meant. As an example, $annotated->source_tag() will delegate to source(): sub source_tag { return shift->source(@_); } Annotated::source returns what the get_Annotations() short-cut returns: return $self->get_Annotations('source'); If somebody accidentally added another annotation with tag 'source', not knowing that it is being used internally, the next call to $annotated->source_tag() will return a number, not the source tag, and not an array of source tags. This is what I mean by brittle. I mean that it is easy to hang yourself as a user and you don't even get a warning before you die.
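The failure mode can be reproduced without BioPerl. This is a sketch under assumptions: get_annotations below is a hypothetical stand-in written to match the behaviour described in this thread (all values in list context; in scalar context the single value if there is exactly one, otherwise the count). It is not the actual Annotated.pm code, and the keys and values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical store standing in for an annotation collection.
my %annotations = (
    key1 => [ 'a', 'b' ],   # two values under one key
    key2 => [ 'only' ],     # exactly one value
);

# Assumed behaviour, modelled on the description in the thread:
# list context returns every value; scalar context returns the
# value itself when there is exactly one, and the count otherwise.
sub get_annotations {
    my ($key) = @_;
    my @values = @{ $annotations{$key} || [] };
    return @values if wantarray;
    return @values == 1 ? $values[0] : scalar @values;
}

my @all    = get_annotations('key1');   # ('a', 'b')
my $count  = get_annotations('key1');   # 2
my $single = get_annotations('key2');   # 'only'

print "list: @all\n";
print "scalar, two values: $count\n";
print "scalar, one value: $single\n";

# Adding a second value under key2 silently changes what the very
# same scalar-context call returns, with no warning.
push @{ $annotations{key2} }, 'extra';
my $surprise = get_annotations('key2'); # now 2, not 'only'
print "scalar after second add: $surprise\n";
```

The last call is the one a caller expecting a source string would trip over: it still returns a defined scalar, just not the one that was stored.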
In this case it will even be a very slow death since a number is still a scalar, and in order to realize the problem you need to actually see that it is a number and not a meaningful string. > > but if you're sure there is only one, you can call it either way: > > my @value = $feature->get_Annotations('key2'); > my $value = $feature->get_Annotations('key2'); > >>>> the first element. Furthermore, I wouldn't test for IS-A >>>> Bio::Annotation::OntologyTerm - this is only an implementation class >>>> and one day there may be better ones. What you really care about is >>>> that the object IS-A Bio::AnnotationI (so that you can add it to the >>>> collection) and IS-A Bio::Ontology::TermI (so that you have your >>>> ontology-enforced typing). >>> >>> what class are you referring to here? >> >> The one in the subject line. > > I don't see any lines with 'isa' that also contain > Bio::Annotation::OntologyTerm. Please be more specific. The implementation of Annotated::type() has the following piece of code in it: if(!ref($val)){ # blah ... } elsif(ref($val) && $val->isa('Bio::Annotation::OntologyTerm')){ $term = $val; } else { #we have the wrong type of object $self->throw('give type() a SOFA term name, identifier, or Bio::Annotation::OntologyTerm object, not '.$val); } This means that if I came up with another implementation class that adapts a TermI to an AnnotationI I'd be rejected by this code, even though my implementation semantically would perfectly fit the bill. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Nov 27 03:41:39 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Nov 27 03:39:26 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: Message-ID: <280C342B-4050-11D9-A28A-000A959EB4C4@gmx.net> I should add that even Annotated::source() is documented as returning a single object: Returns : a Bio::Annotation::SimpleValue object representing the source. In this case it is easier to detect the problem because a number is not a reference, so the first attempt to de-reference it will cause a script to die. (Although I wouldn't want to be the person having to debug the original cause ...) See my point? -hilmar On Saturday, November 27, 2004, at 12:32 AM, Hilmar Lapp wrote: > > But this is not what I meant. As an example, $annotated->source_tag() > will delegate to source(): > > sub source_tag { > return shift->source(@_); > } > > Annotated::source returns what the get_Annotations() short-cut returns: > > return $self->get_Annotations('source'); > > If somebody accidentally added another annotation with tag 'source', > not knowing that it is being used internally, the next call to > $annotated->source_tag() will return a number, not the source tag, and > not an array of source tags. > > This is what I mean by brittle. I mean that it is easy to hang > yourself as a user and you don't even get a warning before you die. In > this case it will even be a very slow death since a number is still a > scalar, and in order to realize the problem you need to actually see > that it is a number and not a meaningful string. > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From khufaz83 at yahoo.com Sat Nov 27 08:41:40 2004 From: khufaz83 at yahoo.com (hafiz hafiz) Date: Sat Nov 27 08:39:21 2004 Subject: [Bioperl-l] print out species from swiss prot? Message-ID: <20041127134140.30444.qmail@web52504.mail.yahoo.com> Hello, can somebody help me: which BioPerl module should we use to print out the species from Swiss-Prot? I can print annotation, features, id and seq only. From jason.stajich at duke.edu Sat Nov 27 08:53:36 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Nov 27 08:51:14 2004 Subject: [Bioperl-l] print out species from swiss prot? In-Reply-To: <20041127134140.30444.qmail@web52504.mail.yahoo.com> References: <20041127134140.30444.qmail@web52504.mail.yahoo.com> Message-ID: See the Feature-Annotation HOWTO http://bioperl.org/HOWTOs/Feature-Annotation/other_objects.html my $sp = $seq->species; print $sp->genus, " ", $sp->species,"\n"; -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From amackey at pcbi.upenn.edu Sun Nov 28 11:22:11 2004 From: amackey at pcbi.upenn.edu (Aaron J.
Mackey) Date: Sun Nov 28 11:19:27 2004 Subject: [Bioperl-l] Annotated.pm In-Reply-To: References: Message-ID: <41A9FB33.3070900@pcbi.upenn.edu> Allen Day wrote: > > if you're not sure how many annotations there are, you'd better call it in > list context unless you want the count: > > my @value = $feature->get_Annotations('key1'); > my $count = $feature->get_Annotations('key1'); > > but if you're sure there is only one, you can call it either way: > > my @value = $feature->get_Annotations('key2'); > my $value = $feature->get_Annotations('key2'); This is the kind of thing that, while cute, makes BioPerl really hard to teach to newcomers who expect things to be consistently one way or the other. To use DBI as a (formidable) example, when I call $sth->fetchrow(), I'm going to get back an array, even if there was only one column in that row. I didn't see this "short cut" slip by, but I'm really against things like this when they're not implemented project-wide, sorry. Something else for bioperl-nouveau ... -Aaron From amackey at pcbi.upenn.edu Sun Nov 28 11:28:53 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Sun Nov 28 11:26:09 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: <8D60A196-404C-11D9-A28A-000A959EB4C4@gmx.net> References: <8D60A196-404C-11D9-A28A-000A959EB4C4@gmx.net> Message-ID: <41A9FCC5.60402@pcbi.upenn.edu> Hilmar Lapp wrote: > So, after checking out some pieces of code, AnnotatableI in particular, > I'm not convinced that this transparency is achieved already; some more > thoughts will have to be spent. > > E.g., get_all_tags() will now return all annotation keys ever added, > regardless of whether through $feat->primary_tag, $feat->source_tag, > $feat->annotation->add_Annotation, or $feat->add_tag_value, unless any > of these methods is overridden. Thanks for taking a deeper look into things, Hilmar. Is this true for all SeqFeatureI-implementing classes, or just Bio::SeqFeature::Annotated? 
> This is very different from what it used to be, namely only return those > keys added through $feat->add_tag_value. Anybody who made any assumption > that what you didn't add through add_tag_value you'll not get back > through get_all_tags will be hosed. I could see an argument that "all_tag_values" should really retrieve just that: all tags, not just any extra tags from add_tag_value. But I agree that it's potentially a significant change. > I.e., anybody who assumed that > $feat->get_all_tags and $feat->annotation->get_all_annotation_keys hold > distinct sets of values will be hosed. This includes bioperl-db, for > those who care. But surely bioperl-db could be fixed to no longer make this assumption? > It is also not what SeqFeature::Generic will do now, namely not return > the primary_tag and source_tag keys. Ahh, so we are talking only about Bio::SeqFeature::Annotated? -Aaron From Karolina.Zavisek at zg.htnet.hr Fri Nov 26 15:56:56 2004 From: Karolina.Zavisek at zg.htnet.hr (Karolina Zavisek) Date: Sun Nov 28 12:10:51 2004 Subject: [Bioperl-l] bl2seq Message-ID: <200411261457.iAQEuu8X017196@ls413.htnet.hr> Hi , I want to run Bio::Tools::Run::StandAloneBlast (bl2seq) on one sequence which is standard, KS, against a list of sequences I got from EMBL retrieved by Bio::DB::EMBL. Problem is I cannot make program to read through a list of sequences and to blast2 each of them against KS sequence. Any suggestions? 
Thank you, Karolina - here is a part of program whic executes blast: #!/usr/bin/perl -w use strict; use Bio::Seq; use Bio::SeqIO; use Math::BigFloat; use Bio::SearchIO; use Bio::Tools::Run::StandAloneBlast; # input sequence of KS domain my $query1_in = Bio::SeqIO->newFh ( -file => 'KS.fasta', -format => 'fasta' ); my $query1 = <$query1_in>; # input a secnd sequence to run a bl2seq program my $query2_in = Bio::SeqIO->newFh ( -file => 'list.fasta', -format => 'fasta' ); my $query2 = <$query2_in>; while (my $input = $query2->next_seq()) { # start the bl2seq and save the result in the mybl2seq.bls file my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn', 'outfile' => 'my_bl2seq.bls'); my $report = $factory->bl2seq($query1, $query2); } # and now comes parser ---------------------- T - c o m - - W e b m a i l ---------------------- Ova poruka poslana je upotrebom T-Com Webmail usluge. http://komunikator.tportal.hr From jterol at ivia.es Thu Nov 25 06:57:18 2004 From: jterol at ivia.es (Javier Terol) Date: Sun Nov 28 12:11:24 2004 Subject: [Bioperl-l] getting teh NO HITS FOUND with SearchIO Message-ID: <6.1.2.0.0.20041125125356.01a0c1f0@master.ivia.es> An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041125/583dd6b7/attachment-0001.htm From jterol at ivia.es Fri Nov 26 04:58:57 2004 From: jterol at ivia.es (Javier Terol) Date: Sun Nov 28 12:11:33 2004 Subject: [Bioperl-l] more questins about SearchIO Message-ID: <6.1.2.0.0.20041126105427.01a0be90@master.ivia.es> An HTML attachment was scrubbed... 
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041126/50452c2d/attachment-0001.htm From fernan at iib.unsam.edu.ar Thu Nov 25 13:56:16 2004 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sun Nov 28 12:12:04 2004 Subject: [Bioperl-l] Core dump in t/protgraph on FreeBSD In-Reply-To: <41A5F138.1030403@egenetics.com> References: <41A5A946.9040408@egenetics.com> <20041125123830.GD86543@iib.unsam.edu.ar> <41A5D3D0.3040500@pcbi.upenn.edu> <41A5E62A.2090302@egenetics.com> <41A5F138.1030403@egenetics.com> Message-ID: <20041125185616.GI86543@iib.unsam.edu.ar> +----[ Peter van Heusden (25.Nov.2004 12:06): | | And now with a fresh compile of Clone-0.15 this problem goes away. All | bioperl tests are now ok on this version of FreeBSD (4-STABLE). | | Peter | Peter van Heusden wrote: | +----] Things do not look so pretty here. Some tests are failing, attached is the output (make.out). My Perl is 5.6.1, maybe some tests are failing because of this? Other than that, perhaps it is because of some outdated dependency? I have not seen any warning from bioperl about missing dependencies. Just in case I have also attached is the list of my installed perl modules, with their versions, in case someone with a sharp vision can spot something. Peter, can you send me the output of 'ls /var/db/pkg | grep p5'? 
Thanks, Fernan -- Fern?n Ag?ero | Instituto de Investigaciones Biotecnol?gicas email | fernan at { iib.unsam.edu.ar , mail.retina.ar } wwww | http://genoma.unsam.edu.ar/~fernan phone, fax | +54 11 { 4580-7255 ext 310, 4752-9639 } -------------- next part -------------- p5-AcePerl-1.83 p5-Apache-Test-1.10 p5-AppConfig-1.56 p5-Archive-Tar-1.09 p5-Attribute-Handlers-0.78 p5-Authen-SASL-2.08 p5-Cache-Cache-1.02 p5-Carp-Assert-0.18 p5-Class-AutoClass-0.09 p5-Class-Container-0.10 p5-Class-Data-Inheritable-0.02_1 p5-Class-Fields-0.201 p5-Clone-0.15 p5-Compress-Zlib-1.33 p5-Crypt-SSLeay-0.51 p5-DBD-Pg-1.32 p5-DBD-mysql-2.9003 p5-DBI-1.42_1 p5-Data-Dumper-2.121 p5-Date-Manip-5.42a p5-Devel-StackTrace-1.11 p5-Devel-ptkdb-1.1086 p5-Digest-1.08 p5-Digest-HMAC-1.01 p5-Digest-MD5-2.33 p5-Digest-Nilsimsa-0.06 p5-Digest-SHA1-2.10 p5-Error-0.15 p5-Exception-Class-1.19 p5-ExtUtils-CBuilder-0.03 p5-ExtUtils-ParseXS-2.07 p5-File-Spec-0.86 p5-File-Temp-0.12_1 p5-Finance-Quote-1.08 p5-Finance-QuoteHist-0.31 p5-GD-2.16 p5-GD-SVG-0.25 p5-Graph-0.20105 p5-GraphViz-1.8 p5-HTML-0.6 p5-HTML-Mason-1.26_1 p5-HTML-Parser-3.36 p5-HTML-TableExtract-1.08 p5-HTML-Tagset-3.03 p5-Heap-0.70 p5-IO-Socket-SSL-0.96 p5-IO-String-1.05 p5-IO-Tty-1.02 p5-IO-Zlib-1.01 p5-IO-stringy-2.108 p5-IPC-Run-0.78 p5-IPC-ShareLite-0.09 p5-Inline-0.44 p5-Inline-Java-0.33 p5-MIME-Base64-3.01 p5-MIME-Lite-3.01 p5-MIME-Tools-5.411a_3,1 p5-Mail-SpamAssassin-3.0.1_1 p5-Mail-Tools-1.62 p5-Math-Bezier-0.01 p5-Math-Derivative-0.01 p5-Math-Random-0.67 p5-Math-Spline-0.01 p5-Module-Build-0.25.01 p5-Net-1.18,1 p5-Net-DNS-0.47 p5-Net-Daemon-0.38 p5-Net-SSLeay-1.25 p5-Params-Validate-0.74 p5-Parse-RecDescent-1.94 p5-Parse-Yapp-1.05 p5-PlRPC-0.2017 p5-PodParser-1.28 p5-SOAP-Lite-0.60a p5-SVG-2.28 p5-SVG-Graph-0.01 p5-Scalar-List-Utils-1.14,1 p5-Statistics-Descriptive-2.6 p5-Storable-2.12 p5-Template-Toolkit-2.13 p5-Test-Harness-2.42 p5-Test-Simple-0.47_1 p5-Text-Balanced-1.95 p5-Text-Iconv-1.3 p5-Text-Shellwords-1.02 p5-Tie-IxHash-1.21 
p5-Time-HiRes-1.59,1 p5-TimeDate-1.16,1 p5-Tk-800.023 p5-Tree-DAG_Node-1.04 p5-URI-1.31 p5-VCG-0.5 p5-XML-DOM-1.43 p5-XML-Filter-BufferText-1.01 p5-XML-Handler-YAWriter-0.23 p5-XML-NamespaceSupport-1.08 p5-XML-Node-0.11 p5-XML-Parser-2.34_1 p5-XML-RegExp-0.03 p5-XML-SAX-0.12 p5-XML-SAX-Writer-0.44 p5-XML-Simple-2.12 p5-XML-Twig-3.15 p5-XML-Writer-0.500 p5-XML-XPath-1.13 p5-XML-XQL-0.68 p5-YAML-0.35 p5-base-2.03 p5-bioperl-1.4 p5-libapreq-1.3 p5-libwww-5.79 p5-libxml-0.07 p5-podlators-1.27 -------------- next part -------------- A non-text attachment was scrubbed... Name: make.out.gz Type: application/x-gunzip Size: 33635 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041125/ce6b3591/make.out.bin From jason.stajich at duke.edu Sun Nov 28 12:31:34 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Nov 28 12:32:10 2004 Subject: [Bioperl-l] more questins about SearchIO In-Reply-To: <6.1.2.0.0.20041126105427.01a0be90@master.ivia.es> References: <6.1.2.0.0.20041126105427.01a0be90@master.ivia.es> Message-ID: <59B83404-4163-11D9-850B-000393C44276@duke.edu> It is not part of the report so you need to get that information from another source, GenBank and EMBL for example. The handy Bio::DB::EMBL and Bio::DB::GenBank would let you retrieve a sequence given an accession number. If you are doing this a lot and only want species information you can download a mapping of just GI to taxa ID which is part of the genbank taxonomy database download from NCBI. -jason On Nov 26, 2004, at 4:58 AM, Javier Terol wrote: > Hi! > > One more question about SearchIO: > > Is it possible to get the species of the hit sequence? I could not > find such module described in the SearchIO HOWTO and I wonder if > anyone has solved the problem. > > Thank you very much in advance > > Javierf > > > ?? O@@@@@??????? > ? @@@O@@O@??????Dr. Javier Terol Alcayde > ? @O@@@@O@??????Instituto Valenciano de Investigaciones Agrarias > (IVIA) > ? 
@@@O@@@@??????Carretera Moncada - N?quera, Km. 4,5 > ?? @@@@O@???????46113 Moncada (Valencia) > ???? ||??? ?????Tel.???? 96 342 4000 ext. 70160 > ???? ||??? ?????Fax.??? 96 342 4001 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2643 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20041128/be44d878/attachment.bin From brian_osborne at cognia.com Sun Nov 28 12:48:00 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Sun Nov 28 12:45:39 2004 Subject: [Bioperl-l] bl2seq In-Reply-To: <200411261457.iAQEuu8X017196@ls413.htnet.hr> Message-ID: Karolina, You're missing the "database" parameter to the new() call, for you this would be KS.fasta file. See the bptutorial for an example. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Karolina Zavisek Sent: Friday, November 26, 2004 10:57 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] bl2seq Hi , I want to run Bio::Tools::Run::StandAloneBlast (bl2seq) on one sequence which is standard, KS, against a list of sequences I got from EMBL retrieved by Bio::DB::EMBL. Problem is I cannot make program to read through a list of sequences and to blast2 each of them against KS sequence. Any suggestions? 
Thank you, Karolina - here is a part of program whic executes blast: #!/usr/bin/perl -w use strict; use Bio::Seq; use Bio::SeqIO; use Math::BigFloat; use Bio::SearchIO; use Bio::Tools::Run::StandAloneBlast; # input sequence of KS domain my $query1_in = Bio::SeqIO->newFh ( -file => 'KS.fasta', -format => 'fasta' ); my $query1 = <$query1_in>; # input a secnd sequence to run a bl2seq program my $query2_in = Bio::SeqIO->newFh ( -file => 'list.fasta', -format => 'fasta' ); my $query2 = <$query2_in>; while (my $input = $query2->next_seq()) { # start the bl2seq and save the result in the mybl2seq.bls file my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastn', 'outfile' => 'my_bl2seq.bls'); my $report = $factory->bl2seq($query1, $query2); } # and now comes parser ---------------------- T - c o m - - W e b m a i l ---------------------- Ova poruka poslana je upotrebom T-Com Webmail usluge. http://komunikator.tportal.hr _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Sun Nov 28 22:08:40 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Nov 28 22:06:59 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: <41A9FCC5.60402@pcbi.upenn.edu> Message-ID: On Sunday, November 28, 2004, at 08:28 AM, Aaron J. Mackey wrote: >> E.g., get_all_tags() will now return all annotation keys ever added, >> regardless of whether through $feat->primary_tag, $feat->source_tag, >> $feat->annotation->add_Annotation, or $feat->add_tag_value, unless >> any of these methods is overridden. > > Thanks for taking a deeper look into things, Hilmar. Is this true for > all SeqFeatureI-implementing classes, or just > Bio::SeqFeature::Annotated? Including primary_tag and source_tag in the annotation collection is what SeqFeature::Annotated does. 
Lumping together tags and annotation is the default decorating implementation on the AnnotatableI interface for the *_tag_* methods, and hence will be used for all SeqFeatureI implementing classes that don't override it. SeqFeature::Generic, for instance, doesn't override it (anymore), but it does keep primary_tag and source_tag separate from the annotation bundle. So, evidently there are subtle aspects about the various tags and the annotation that have not been spelled out when documenting the contract. Specifically, this is whether an implementation may or must keep the different tags and annotations in sets that at least behave as semantically distinct sets. E.g., if you set a primary_tag, may an implementation treat it as annotation and still be SeqFeatureI compliant (meaning, as a SeqFeatureI consumer, you'd have to be prepared for that). Or, if you add a tag 'blah' via add_tag_value('blah',$whatever), may an implementation or may it not enforce the same class on $whatever that it enforces for annotations added to the collection under the same key 'blah'. In the absence of this having been spelled out beforehand, I would argue that the most frequently used implementations' behavior will probably constitute what other programmers took as an answer. In this case, that'd be SeqFeature::Generic ... > >> This is very different from what it used to be, namely only return >> those keys added through $feat->add_tag_value. Anybody who made any >> assumption that what you didn't add through add_tag_value you'll not >> get back through get_all_tags will be hosed. > > I could see an argument that "all_tag_values" should really retrieve > just that: all tags, not just any extra tags from add_tag_value. But > I agree that it's potentially a significant change. Yes, and yes. It's a drastic change in behavior, but would be sensible if the target is to eventually obsolete the concepts of tags and annotations being different animals. 
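To make the behaviour change concrete, here is a toy model in plain Perl. It is not BioPerl code: the packages SeparateFeature and LumpedFeature and all of their methods are invented for illustration. The first keeps tags and annotations in distinct sets (the older behaviour described above), the second routes add_tag_value into the same collection as the annotations (the new default described above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only: these classes are hypothetical and do not reproduce
# the real Bio::SeqFeatureI / Bio::AnnotatableI implementations.

package SeparateFeature;   # tags and annotations kept in distinct sets
sub new             { return bless { tags => {}, annot => {} }, shift }
sub add_tag_value   { my ($s, $k, $v) = @_; push @{ $s->{tags}{$k}  }, $v }
sub add_annotation  { my ($s, $k, $v) = @_; push @{ $s->{annot}{$k} }, $v }
sub get_all_tags    { return sort keys %{ $_[0]{tags} } }
sub annotation_keys { return sort keys %{ $_[0]{annot} } }

package LumpedFeature;     # one shared collection behind both APIs
sub new             { return bless { annot => {} }, shift }
sub add_tag_value   { my ($s, $k, $v) = @_; push @{ $s->{annot}{$k} }, $v }
sub add_annotation  { my ($s, $k, $v) = @_; push @{ $s->{annot}{$k} }, $v }
sub get_all_tags    { return sort keys %{ $_[0]{annot} } }
sub annotation_keys { return sort keys %{ $_[0]{annot} } }

package main;

# Add one tag and one annotation to each model and compare the key sets.
for my $class (qw(SeparateFeature LumpedFeature)) {
    my $feat = $class->new;
    $feat->add_tag_value('note', 'added as a tag');
    $feat->add_annotation('dblink', 'added as an annotation');
    printf "%-16s tags: [%s]  annotation keys: [%s]\n",
        $class,
        join(', ', $feat->get_all_tags),
        join(', ', $feat->annotation_keys);
}
```

In the separate model get_all_tags returns only 'note'. In the lumped model it returns both 'dblink' and 'note', and the two key sets are identical, which is exactly the overlap that code assuming distinct sets does not expect.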
> >> I.e., anybody who assumed that $feat->get_all_tags and >> $feat->annotation->get_all_annotation_keys hold distinct sets of >> values will be hosed. This includes bioperl-db, for those who care. > > But surely bioperl-db could be fixed to no longer make this assumption? It certainly could, and I certainly will once this is the direction that bioperl 'officially' adopts. Right now, the change is two weeks old and I'm not sure anyone, including Allen, is ready to vouch that this is a real good way of doing things. [And as aside, fixing bioperl-db to work with the new API may make bioperl-db backwardly incompatible as well.] I'm not saying this change of direction may be a show-stopper for any dependent package like bioperl-db. All I'm suggesting is let's be clear that this *is* a change of direction for a core interface, and let's give it some time to phase it in and to iron out wrinkles, both on the end of bioperl itself as well as the end of people who write software against bioperl. Let's give it some time to see how it works, and how it works under stress, before letting it lose on the general public who just wanted to get some bugfixes on the 1.4.0 release or some additional parsers. > >> It is also not what SeqFeature::Generic will do now, namely not >> return the primary_tag and source_tag keys. > > Ahh, so we are talking only about Bio::SeqFeature::Annotated? Well, no, see above. SeqFeature::Generic does override primary_tag and source_tag, which is why here those tags won't make it into the annotation bundle. It doesn't override the *_tag_* methods, so for those you get the behavior from AnnotatableI. -hilmar > > -Aaron > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From grossman at molgen.mpg.de Mon Nov 29 04:12:31 2004 From: grossman at molgen.mpg.de (Steffen Grossmann) Date: Mon Nov 29 04:10:39 2004 Subject: tags and annotations, was Re: [Bioperl-l] Annotated.pm In-Reply-To: <280C342B-4050-11D9-A28A-000A959EB4C4@gmx.net> References: <280C342B-4050-11D9-A28A-000A959EB4C4@gmx.net> Message-ID: <41AAE7FF.1000207@molgen.mpg.de> I see this problem now. I wrote it like this to make sure that the $feat->source always gives back something, so that my scripts don't die, when I call $feat->source->value. Of course, getting back too many objects is also a problem and should be fixed... But! I also am more concerned with the fact that it is not yet clear how tags and annotations should be handled in the future. Of course, typed annotations are a good thing, especially when it comes to more complex objects like 'OntologyTerms' or 'DBLinks'. But, frankly, I sometimes think that it is an overkill for 'simple values': For example in the Bio::SeqFeature::Annotated::score method all the checking for appropriateness of the value is not provided by using Bio::Annotation::SimpleValue. We could, of course, start to implement all kinds of other Bio::AnnotationI classes like 'Bio::Annotation::SimpleScore' or 'Bio::AnnotationSimpleString', but do we really want this? I, personally, would a tag/annotation scheme like to fulfil the following (I am guided by GFF3): 1) An important thing, but which is missing up to now, is a mechanism which makes sure that there is only _one single_ entry under a certain tag. This is, e.g., important when setting the ID of a feature in the sense of GFF3. 2) In some cases we want to have several values stored under a tag, but we want them to be unique. This is, e.g., the case when giving the parents of a feature in the sense of GFF3. 
(By the way, getting/setting the parents should be connected to the get_SeqFeature and remove_SeqFeature methods directly) 3) Sometimes typing is important, e.g., when (again GFF3) setting/getting 'Dbxref' attributes and, of course, also when talking about the 'type' of a feature, which has to come from Sequence Ontology in GFF3. I think that we should distinguish between 'standard' annotation and 'custom' annotation. On the use and typing of 'standard' annotations we should agree community-wide, whereas for 'custom' annotation this is left to the user (although bioperl should help in dealing with it). Methods like 'seqid', 'source', 'type', etc. etc. are all concerned with standard notation (although I bet we don't have a well defined way of thinking about them...), but currently they are implemented using mechanism which have been written to deal with custom annotation! This, I think, is the main problem. When we want annotation which is completely compatible with, e.g., GFF3, we want to be very strict (since GFF3 is quite well-defined) and we don't want it to interfere with other kinds of annotation. This we haven't solved yet (although, when writing this email I start to get ideas about how to do it...). Steffen Hilmar Lapp wrote: > I should add that even Annotated::source() is documented as returning > a single object: > > Returns : a Bio::Annotation::SimpleValue object representing the source. > > In this it is easier to detect the problem because a number is not a > reference, so the first attempt to de-reference it will cause a script > to die. (Although I wouldn't want to be the person having to debug the > original cause ...) > > See my point? > > -hilmar > > On Saturday, November 27, 2004, at 12:32 AM, Hilmar Lapp wrote: > >> >> But this is not what I meant. 
As an example, $annotated->source_tag() >> will delegate to source(): >> >> sub source_tag { >> return $shift->source(@_); >> } >> >> Annotated::source returns what the get_Annotations() short-cut returns: >> >> return $self->get_Annotations('source'); >> >> If somebody accidentally added another annotation with tag 'source', >> not knowing that it is being used internally, the next call to >> $annotated->source_tag() will return a number, not the source tag, >> and not an array of source tags. >> >> This is what I mean by brittle. I mean that it is easy to hang >> yourself as a user and you don't even get a warning before you die. >> In this case it will even be a very slow death since a number is >> still a scalar, and in order to realize the problem you need to >> actually see that it is a number and not a meaningful string. >> -- %---------------------------------------------% % Steffen Grossmann % % % % Max Planck Institute for Molecular Genetics % % Computational Molecular Biology % %---------------------------------------------% % Ihnestrasse 73 % % 14195 Berlin % % Germany % %---------------------------------------------% % Tel: (++49 +30) 8413-1167 % % Fax: (++49 +30) 8413-1152 % %---------------------------------------------% From gongwuming at gmail.com Mon Nov 29 04:26:12 2004 From: gongwuming at gmail.com (Wuming Gong) Date: Mon Nov 29 04:23:48 2004 Subject: [Bioperl-l] Bio::DB::Query::GenBank In-Reply-To: <1099627959.6085.3.camel@Serenity> References: <1099627959.6085.3.camel@Serenity> Message-ID: <24d6fd0504112901265d9a06fe@mail.gmail.com> Hi Mona, I have met the same kind of problem. You may pull down the sequences once by less than 500 and It works. Wuming On Thu, 04 Nov 2004 21:12:40 -0700, Ligia Mateiu wrote: > Hi all, > I used a query for which exists >5000 hits in Genbank, but my code > retrieved just the very fist 500. > > Any idea why? 
>
> Thanks a lot,
> Mona
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From davila at ioc.fiocruz.br Mon Nov 29 06:22:06 2004
From: davila at ioc.fiocruz.br (davila)
Date: Mon Nov 29 06:25:12 2004
Subject: [Bioperl-l] Help with BioGraphics
Message-ID: <8D44604203DAF9438BF9123B4A08C779575FAC@alpha.ioc.fiocruz.br>

Hi,

We are using the "Parsing Real BLAST Output" example (from the HOWTOs) to build a panel that shows the 6 frames and the hits matching any frame. The problem is how to parse and show hits from several BLAST reports (we have 4 different BLAST output files, all with the same queries but different target databases) in a single 6-frame panel. Our code and preliminary results are listed below. Any tip or help would be greatly appreciated.

Thanks,
Alberto

******* Code:

#!/usr/bin/perl
# This is code example 4 in the Graphics-HOWTO
#use strict;
use lib "/usr/local/bioperl14";
use GD;
use Bio::Graphics;
use Bio::SearchIO;

$directory = $ARGV[0] or die;
chdir $directory;
chomp (@outs_blast = `ls`);
$panel_control = 0;

for (-3..+3) {
    $frame = "$_";
    $print_line_frame = 1;
    print "$frame\n";
    foreach $file (@outs_blast){
        $searchio = Bio::SearchIO->new(-file   => $file,
                                       -format => 'blast') or die "parse failed";
        print "\n$file\n";
        while ( $result = $searchio->next_result ) {
            $query_name = $result->query_name;
            print "Open file a Imagem da $query_name na frame $frame \n";
            open (PNG, ">>$query_name.png");
            if ($panel_control < 2 ){
                print "Make the Painel of $query_name\n";
                $panel = Bio::Graphics::Panel->new(-length    => $result->query_length,
                                                   -width     => 1000,
                                                   -pad_left  => 10,
                                                   -pad_right => 10,
                                                  );
            }
            if ($print_line_frame == 1){
                print "Building line of frame $frame:\n";
                $full_length = Bio::SeqFeature::Generic->new(-start=>1,
                                                             -end=> $result->query_length,
                                                             -display_name=>$result->query_name."FRAME:$frame"
                                                            );
                $panel->add_track($full_length,
                                  -glyph   => 'arrow',
                                  -tick    => 2,
                                  -fgcolor => 'black',
                                  -double  => 1,
                                  -label   => 1,
                                 );
            }
            print "Buscando query na frame $frame no resultado $file \n";
            $track = $panel->add_track(-glyph       => 'graded_segments',
                                       -label       => 1,
                                       -connector   => 'dashed',
                                       -bgcolor     => 'blue',
                                       -font2color  => 'red',
                                       -sort_order  => 'high_score',
                                       -description => sub {
                                           $feature = shift;
                                           return unless $feature->has_tag('description');
                                           ($description) = $feature->each_tag_value('description');
                                           $score = $feature->score;
                                           "$description, score= $score";
                                       }
                                      );
            $count =0;
            while( $hit = $result->next_hit ) {
                next unless $count < 1;
                $feature_hit = Bio::SeqFeature::Generic->new(-score        => $hit->raw_score,
                                                             -display_name => $hit->name,
                                                             -tag          => { description => $hit->description }
                                                            );
                while( $hsp = $hit->next_hsp ) {
                    $query_frame = ($hsp->query->frame + 1) * $hsp->query->strand;
                    $hit_frame = ($hsp->hit->frame + 1) * $hsp->hit->strand;
                    $feature_hit->add_sub_SeqFeature ($hsp,'EXPAND');
                    $count++;
                }
                if ($query_frame == $frame) {
                    $track->add_feature($feature_hit);
                }
            }
            print PNG $panel->png;
            $panel = "";
            close PNG;
        }
        $print_line_frame++;
        $panel_control++;
    }
}

This is part of the error output:

Make the Painel of 335
Buscando query na frame -3 no resultado blast.db.repbase.out
gd-png: fatal libpng error: Image width or height is zero in IHDR
gd-png error: setjmp returns error condition
Segmentation fault

From amackey at pcbi.upenn.edu Mon Nov 29 08:58:51 2004
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Nov 29 08:56:29 2004
Subject: [Bioperl-l] Bio::DB::Query::GenBank
In-Reply-To: <24d6fd0504112901265d9a06fe@mail.gmail.com>
References: <1099627959.6085.3.camel@Serenity> <24d6fd0504112901265d9a06fe@mail.gmail.com>
Message-ID:

If you try again late at night (meaning late at night EST), you may get all 5000 hits; NCBI seems to have implemented a limit of 500 entries in batch retrieval when network load is already high, but you may be successful during non-peak hours ...
-Aaron On Nov 29, 2004, at 4:26 AM, Wuming Gong wrote: > Hi Mona, > > I have met the same kind of problem. You may pull down the sequences > once by less than 500 and It works. > > Wuming > > > On Thu, 04 Nov 2004 21:12:40 -0700, Ligia Mateiu > wrote: >> Hi all, >> I used a query for which exists >5000 hits in Genbank, but my code >> retrieved just the very fist 500. >> >> Any idea why? >> >> Thanks a lot, >> Mona >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From amackey at pcbi.upenn.edu Mon Nov 29 09:07:31 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Nov 29 09:05:11 2004 Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes In-Reply-To: References: Message-ID: <02C65092-4210-11D9-8243-000D93392082@pcbi.upenn.edu> Yep, OK, I hear you. I really thought all this was going to be contained to Bio::SeqFeature::Annotated, but I see now that with all sorts of implementation happening in the interfaces (ugh!), this can't happen. Woe is me. Here's what I'm willing to do to keep Allen from pulling his hair out: there have been very few changes on the development trunk since RC1 that aren't Annotated.pm-related; therefore, (if this makes sense to everyone) I will branch 1.5.0 off of RC1 and merge only those patches that are Annotated.pm-*unrelated* to the 1.5.0 branch. I will then tag the branch at RC2 (and similarly tag the HEAD, so that any later merging can be done relative to those tags). Make sense? 
Then, the rest of you (Allen, Hilmar, Steffen, etc) need to figure out the cleanest path for 1.6.0, in which all things may change (with an eye towards at least some backwards compatibility); my vote would be that there remain some separation between "heavy" and "light" feature types. I don't expect/need my Bio::SeqFeature::Simple to implement AnnotationCollection! Thanks again to everyone; let me know if the CVS plan above sounds reasonable ... -Aaron On Nov 28, 2004, at 10:08 PM, Hilmar Lapp wrote: > I'm not saying this change of direction may be a show-stopper for any > dependent package like bioperl-db. All I'm suggesting is let's be > clear that this *is* a change of direction for a core interface, and > let's give it some time to phase it in and to iron out wrinkles, both > on the end of bioperl itself as well as the end of people who write > software against bioperl. Let's give it some time to see how it works, > and how it works under stress, before letting it lose on the general > public who just wanted to get some bugfixes on the 1.4.0 release or > some additional parsers. -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From Marc.Logghe at devgen.com Mon Nov 29 09:17:55 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Mon Nov 29 09:17:04 2004 Subject: [Bioperl-l] Bio::DB::Query::GenBank Message-ID: Hi, I think you will always bump into that limit; it is the limit ncbi is using with efetch. I don't know how it is internally done by Bio::DB::Query::GenBank but it should go via a 2 step process: 1) you perform a query and you get a webenv and query key back 2) you fetch your sequences by passing your webenv and query key and explicitely requesting your record numbers in chunks of 500. I also never succeeded in fetching more that 500 sequences with Bio::DB::Query::GenBank. 
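The chunking in step 2 can be sketched without any network code. The snippet below only computes the request windows for a result set; it does not talk to NCBI. The helper name batch_windows is made up for illustration, the total of 5000 and the chunk size of 500 are the figures from this thread, and the variable names assume the E-utilities paging parameters retstart and retmax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $total records into windows of at most $chunk,
# returned as [retstart, retmax] pairs (retstart is zero-based).
sub batch_windows {
    my ($total, $chunk) = @_;
    die "chunk size must be positive\n" unless $chunk > 0;
    my @windows;
    for (my $start = 0; $start < $total; $start += $chunk) {
        # The last window may be shorter than $chunk.
        my $size = $total - $start < $chunk ? $total - $start : $chunk;
        push @windows, [ $start, $size ];
    }
    return @windows;
}

my @windows = batch_windows(5000, 500);
printf "%d requests needed\n", scalar @windows;

for my $w (@windows) {
    my ($retstart, $retmax) = @$w;
    print "fetch retstart=$retstart retmax=$retmax\n";
    # A real script would issue the fetch here (passing the WebEnv and
    # query key obtained in step 1) and then pause between requests, e.g.:
    # sleep 5;
}
```

Keeping the window arithmetic in its own sub makes it easy to check the edge cases (a total that is not a multiple of the chunk size, or an empty result) before any request is sent.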
I am currently using a non bioperl script based on http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl. NCBI also asks to run these kind of queries at night EST, in the weekend and with a sleep of at least 5 sec between every fetch of 500 records.

HTH,
Marc

From jason.stajich at duke.edu Mon Nov 29 09:43:28 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Nov 29 09:41:04 2004
Subject: [Bioperl-l] Bio::DB::Query::GenBank
In-Reply-To:
References:
Message-ID: <0862067A-4215-11D9-B3BD-000393C44276@duke.edu>

Lincoln did some fixes this summer which I think did this 2 step process for you in Bio::DB::GenBank (another reason we need to get 1.5.0 out there for people to use). Any chance you can try the RC1 or CVS live code as well to see if you are hitting the same problems.

-jason

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From grossman at molgen.mpg.de Mon Nov 29 10:32:50 2004
From: grossman at molgen.mpg.de (Steffen Grossmann)
Date: Mon Nov 29 10:30:46 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To: <02C65092-4210-11D9-8243-000D93392082@pcbi.upenn.edu>
References: <02C65092-4210-11D9-8243-000D93392082@pcbi.upenn.edu>
Message-ID: <41AB4122.6000805@molgen.mpg.de>

I definitely support this solution! I have some gff3 related code which got broken by the Annotated.pm related changes made after 1.5.0RC1. E.g. Chris Mungall's functioning Bio::Tools::GFF and Bio::SeqFeature::Tools::IDHandler stuff is not in 1.4. and doesn't seem to work with the changes. But still, I think we can do much better with the annotation business as proposed up to now, so let's go for this when heading towards 1.6.0.

Steffen

Aaron J. Mackey wrote:

>
> Yep, OK, I hear you. I really thought all this was going to be
> contained to Bio::SeqFeature::Annotated, but I see now that with all
> sorts of implementation happening in the interfaces (ugh!), this can't
> happen. Woe is me.
>
> Here's what I'm willing to do to keep Allen from pulling his hair out:
> there have been very few changes on the development trunk since RC1
> that aren't Annotated.pm-related; therefore, (if this makes sense to
> everyone) I will branch 1.5.0 off of RC1 and merge only those patches
> that are Annotated.pm-*unrelated* to the 1.5.0 branch.
> I will then tag the branch at RC2 (and similarly tag the HEAD, so that
> any later merging can be done relative to those tags).  Make sense?
>
> Then, the rest of you (Allen, Hilmar, Steffen, etc) need to figure out
> the cleanest path for 1.6.0, in which all things may change (with an
> eye towards at least some backwards compatibility); my vote would be
> that there remain some separation between "heavy" and "light" feature
> types.  I don't expect/need my Bio::SeqFeature::Simple to implement
> AnnotationCollection!
>
> Thanks again to everyone; let me know if the CVS plan above sounds
> reasonable ...
>
> -Aaron
>
> On Nov 28, 2004, at 10:08 PM, Hilmar Lapp wrote:
>
>> I'm not saying this change of direction may be a show-stopper for any
>> dependent package like bioperl-db. All I'm suggesting is let's be
>> clear that this *is* a change of direction for a core interface, and
>> let's give it some time to phase it in and to iron out wrinkles, both
>> on the end of bioperl itself as well as the end of people who write
>> software against bioperl. Let's give it some time to see how it
>> works, and how it works under stress, before letting it lose on the
>> general public who just wanted to get some bugfixes on the 1.4.0
>> release or some additional parsers.
>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey@pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA 19104-6017      fax:    215-746-6697

--
%---------------------------------------------%
% Steffen Grossmann                           %
%                                             %
% Max Planck Institute for Molecular Genetics %
% Computational Molecular Biology             %
%---------------------------------------------%
% Ihnestrasse 73                              %
% 14195 Berlin                                %
% Germany                                     %
%---------------------------------------------%
% Tel: (++49 +30) 8413-1167                   %
% Fax: (++49 +30) 8413-1152                   %
%---------------------------------------------%

From barry.moore at genetics.utah.edu Mon Nov 29 12:04:05 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Nov 29 12:02:00 2004
Subject: [Bioperl-l] Re: Bioperl
In-Reply-To:
References:
Message-ID: <41AB5685.2050204@genetics.utah.edu>

Reddy,

I'm running your e-mail back through the bioperl list. It's a good idea
to keep these discussions on the list so that others can contribute to
and benefit from the answers as well. I'm also afraid that I can't help
you much because I'm not sure what code I sent you or what your original
question was. I looked back through my sent mail box, and it doesn't
look like I've ever sent you e-mail before. Could you help jog my memory
and provide a bit more background on your question, some code, etc.?

Barry

haraneesh chintalapalli wrote:

>Hi Barry,
>
>   Happy thanks giving.
>
>   Thanks for the code you had provided. This will be extremely helpful
>to me. But however when I run the program from the website I get a
>weird error . It would be great if you can help me out here. Sorry
>for burdening u.
>
>
>undefined subroutine &IO::String in c:/Perl/site/lib/DB/WebDBSeqI.pm
>
>IO::String is installed and works when invoked in an onother program.
>I am running bioperl on windows.
>
>thanks ,
>
>cheers,
>Reddy
>

--
Barry Moore
Dept.
of Human Genetics
University of Utah
Salt Lake City, UT

From Marc.Logghe at devgen.com Mon Nov 29 14:43:16 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Nov 29 14:42:09 2004
Subject: [Bioperl-l] Bio::DB::Query::GenBank
Message-ID:

> Lincoln did some fixes this summer which I think did this 2 step
> process for you in Bio::DB::GenBank (another reason we need to get
> 1.5.0 out there for people to use). Any chance you can try the RC1 or
> CVS live code as well to see if you are hitting the same problems.

No prob, Jason.
Everything goes fine. The query in the test script returned 14565
records, as it should. The perl-live release 1.4.0 only returned 50 (!)
with the exact same script, shown below.

#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

# Entrez query restricted to a modification-date (MDAT) range
my $query_string = '"Oryza"[Organism] AND EST[Keyword] AND "2004/10/30 15.03"[MDAT] : "2004/11/29 15.03"[MDAT]';

my $query = Bio::DB::Query::GenBank->new(
    -db    => 'nucleotide',
    -query => $query_string,
);

my $gb     = Bio::DB::GenBank->new;
my $stream = $gb->get_Stream_by_query($query);

while ( my $seq = $stream->next_seq ) {
    # do something with the sequence object
    print $seq->accession_number, "\n";
}

HTH,
Marc

From ediths at unizh.ch Mon Nov 29 09:33:24 2004
From: ediths at unizh.ch (Edith Schlagenhauf)
Date: Mon Nov 29 15:07:03 2004
Subject: [Bioperl-l] getting teh NO HITS FOUND with SearchIO
In-Reply-To: <6.1.2.0.0.20041125125356.01a0c1f0@master.ivia.es>
References: <6.1.2.0.0.20041125125356.01a0c1f0@master.ivia.es>
Message-ID:

Hi,

to deal with "No Hits Found" you might just add a check for whether the
object returned by $result->next_hit() is defined, i.e.

    my $best_hit = $result->next_hit();
    if (!defined $best_hit) {
        print "No hits found \n";
    }

HTH,
Edith

On Thu, 25 Nov 2004, Javier Terol wrote:

> Hi!
>
> I have been using the SearchIO scripts very happily, but I have noticed that the parsing does not qive
> any information about the sequences that do not produce any hit.
> How coul I get the NO HITS FOUND sequences? Could I modify the example script to do this?
>
> Thank you very muc in advance
>
>    O@@@@@
>   @@@O@@O@      Javier Terol Alcayde
>   @O@@@@O@      Instituto Valenciano de Investigaciones Agrarias (IVIA)
>   @@@O@@@@      Carretera Moncada - Náquera, Km. 4,5
>    @@@@O@       46113 Moncada (Valencia)
>      ||         Tel.     96 342 4000 ext. 70160
>      ||         Fax.     96 342 4001
>

******************************************
Dr Edith Schlagenhauf
Bioinformatics
Institute of Plant Biology
University of Zurich
Zollikerstrasse 107
CH-8008 Zurich
SWITZERLAND
e-mail: ediths AT botinst DOT unizh DOT ch
Tel.: +41 1 634 82 78
Fax : +41 1 634 82 04
******************************************

From soojin.yi at biology.gatech.edu Mon Nov 29 11:44:09 2004
From: soojin.yi at biology.gatech.edu (Soojin Yi)
Date: Mon Nov 29 15:07:11 2004
Subject: [Bioperl-l] LWP.pm
Message-ID:

Hello,

I am trying to use Bioperl, and I am getting an error message saying it
cannot locate LWP.pm. Where can I get it?

Thanks,
Soojin

--
******************************************
Soojin Yi, PhD
Assistant Professor
School of Biology
Georgia Institute of Technology
310 Ferst Drive
Atlanta, GA 30332
(404) 385-6084 (tel)
(404) 894-0519 (fax)
*******************************************

From barry.moore at genetics.utah.edu Mon Nov 29 12:47:34 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Nov 29 15:07:14 2004
Subject: [Bioperl-l] getting teh NO HITS FOUND with SearchIO
In-Reply-To: <6.1.2.0.0.20041125125356.01a0c1f0@master.ivia.es>
References: <6.1.2.0.0.20041125125356.01a0c1f0@master.ivia.es>
Message-ID: <41AB60B6.1030204@genetics.utah.edu>

Javier,

Which SearchIO script are you using? You are parsing blast reports with
SearchIO, right? If so, how are you generating those blast reports:
local blast, remote blast...? Running the blast is the point at which
you should be detecting whether or not you got a hit. Look at the
synopsis code for Bio::Tools::Run::RemoteBlast for an example of how to
do this.

Barry

Javier Terol wrote:

> Hi!
>
> I have been using the SearchIO scripts very happily, but I have
> noticed that the parsing does not qive any information about the
> sequences that do not produce any hit.
> How coul I get the NO HITS FOUND sequences? Could I modify the example
> script to do this?
>
> Thank you very muc in advance
>
>    O@@@@@
>   @@@O@@O@      Javier Terol Alcayde
>   @O@@@@O@      Instituto Valenciano de Investigaciones Agrarias (IVIA)
>   @@@O@@@@      Carretera Moncada - Náquera, Km. 4,5
>    @@@@O@       46113 Moncada (Valencia)
>      ||         Tel.     96 342 4000 ext. 70160
>      ||         Fax.     96 342 4001
>

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From barry.moore at genetics.utah.edu Mon Nov 29 15:56:19 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Nov 29 15:53:55 2004
Subject: [Bioperl-l] LWP.pm
In-Reply-To:
References:
Message-ID: <41AB8CF3.9000208@genetics.utah.edu>

You can get LWP, along with all other Perl modules, from CPAN
(http://www.cpan.org/).

Barry

Soojin Yi wrote:

> Hello,
>
> I am trying to use Bioperl, and having a error message saying it
> cannot locate the LWP.pm. where can I get it?
>
> Thanks,
> Soojin

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From tex at biocompute.net Sun Nov 28 23:39:17 2004
From: tex at biocompute.net (James Thompson)
Date: Mon Nov 29 15:54:23 2004
Subject: [Bioperl-l] LWP.pm
In-Reply-To:
Message-ID:

You can install it via CPAN as part of Bundle::BioPerl. Try this:

$ perl -MCPAN -e shell
cpan> install LWP

LWP is a module on which BioPerl depends. There are a few more of these;
you can install them from the CPAN prompt by doing this:

cpan> install Bundle::BioPerl

Cheers,

James Thompson

On Mon, 29 Nov 2004, Soojin Yi wrote:

> Hello,
>
> I am trying to use Bioperl, and having a error message saying it
> cannot locate the LWP.pm. where can I get it?
>
> Thanks,
> Soojin
>

From cjm at fruitfly.org Mon Nov 29 16:22:56 2004
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon Nov 29 16:20:35 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
References:
Message-ID:

On Fri, 26 Nov 2004, Hilmar Lapp wrote:

[snip]

> Beyond this, I do agree with Chris that there could be a performance
> issue with richly annotated databanks. Tag values were simple strings;
> using objects for all of them instead at feature creation and
> population time may be too heavy. Also Chris, I thought the XML idea
> sounded very interesting; did you have Data::Stag in mind to manage it?

Not necessarily - though it may borrow some ideas from Data::Stag - the
interface would have to be more bioperl-y.

(The current Data::Stag implementation trades speed for convenience,
though later versions may be made more efficient.)

> It would still require a working XML parser installation, no?

yes - it would require some kind of third-party XML module or modules. I
guess this may be slightly problematic as these are currently all
optional for bioperl, yep?

> -hilmar
>
> On Wednesday, November 24, 2004, at 05:58 AM, Aaron J. Mackey wrote:
>
> >
> > Allen, thanks a ton for going the extra mile.  Hilmar, does this
> > solution satisfy your worries a bit?
> >
> > Thanks again to everyone,
> >
> > -Aaron
> >
> > On Nov 23, 2004, at 9:16 PM, Allen Day wrote:
> >
> >> Fixed.  Here is a summary of what I did to make this happen.  I went
> >> ahead and did the work necessary to make Bio::SeqFeatureI AnnotatableI
> >> instead of being itself an AnnotationCollectionI.
> >>
> >> . Bio::SeqFeatureI inherits Bio::AnnotatableI NOT
> >>   Bio::AnnotationCollectionI
> >> . *_tag_* methods are in Bio::AnnotatableI, and internally defer to
> >>   Bio::AnnotatableI->annotation->some_analagous_mapped_function()
> >> .
method behavior is now more similar to original *_tag_* method > >> behavior ; tag "values" are now instantiated as > >> Bio::Annotation::SimpleValue objects by default, unless their name > >> indicates they should be otherwise (e.g. tag name "comment" or > >> "dblink") > >> . deprecation warnings commented until 1.6 > >> . Bio::AnnotatableI now keeps a tag->annotation_type registry to allow > >> new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). > >> . Bio::SeqFeature::AnnotationAdaptor is now not very useful, as > >> *_tag_* > >> methods map directly onto Bio::AnnotationI's > >> Bio::AnnotationCollectionI instance. > >> . Unflattener and Unflattener2 tests pass with no changes. > >> . All tests pass. > >> > >> -Allen > >> > >> > >> On Tue, 23 Nov 2004, Chris Mungall wrote: > >> > >>> > >>> Unflattener.t is failing because someone has messed up > >>> get_tagset_values() > >>> - this is a convenience method I originally added to SeqFeatureI. > >>> I'm not > >>> familiar enough with the new changes and AnnotationCollections to fix > >>> this. > >>> > >>> Surely the onus has always been on the person making changes to make > >>> sure > >>> the test suite passes before committing their changes? In which > >>> case, how > >>> did these changes make it in in the first place? > >>> > >>> On Tue, 23 Nov 2004, Jason Stajich wrote: > >>> > >>>> > >>>> On Nov 23, 2004, at 4:47 PM, Allen Day wrote: > >>>> > >>>>> On Tue, 23 Nov 2004, Jason Stajich wrote: > >>>>> > >>>>>> I think if we just don't issue deprecation warnings it will be > >>>>>> fine by > >>>>>> me -- even if we are just calling the new subroutine under the > >>>>>> hood. > >>>>>> Tests seem to pass although Unflattner.t is falling over today not > >>>>>> sure > >>>>>> what is problem. > >>>>> > >>>>> that fails for me too, in addition to spewing out lots of > >>>>> diagnotistics. > >>>>> however, if you run 'make test_Unflattener2', it passes. strange. 
> >>>>> > >>>> no it is Unflattner not Unflattner2 > >>>> > >>>> % make test_Unflattener > >>>> [SNIP OUT SOME STUFF] > >>>> > >>>> -------------------- WARNING --------------------- > >>>> MSG: get_tagset_values() is deprecated. use get_Annotations() > >>>> --------------------------------------------------- > >>>> > >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>> MSG: Abstract method "Bio::AnnotationCollectionI::get_Annotations" > >>>> is > >>>> not implemented by package Bio::SeqFeature::Generic. > >>>> > >>>> > >>>>> -allen > >>>>> > >>>>>> > >>>>>> -jason > >>>>>> On Nov 23, 2004, at 2:28 PM, Aaron J. Mackey wrote: > >>>>>> > >>>>>>> > >>>>>>>> On Friday, November 19, 2004, at 02:50 PM, Allen Day wrote: > >>>>>>>> > >>>>>>>>> * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI > >>>>>>>>> * All Bio::SeqFeatureI *_tag_* methods have been moved to > >>>>>>>>> Bio::AnnotationCollectionI, marked as deprecated, and mapped > >>>>>>>>> to > >>>>>>>>> their > >>>>>>>>> analogous and mostly pre-existing Bio::AnnotationCollectionI > >>>>>>>>> methods. > >>>>>>>>> > >>>>>>>>> Methods which were not in Bio::AnnotationCollectionI, but > >>>>>>>>> were i > >>>>>>>>> Bio::Annotation::Collection and were necessary for *_tag_* > >>>>>>>>> method > >>>>>>>>> remapping were created in Bio::AnnotationCollecitonI. > >>>>>>> > >>>>>>> I've been paying some attention to this, but thought that the > >>>>>>> changes > >>>>>>> were only those required to get Bio::FeatureIO working (i.e. > >>>>>>> recapitulate GFF3 logic) without hampering object usage; do our > >>>>>>> tests > >>>>>>> pass with these changes in place? > >>>>>>> > >>>>>>> On Nov 23, 2004, at 2:12 PM, Jason Stajich wrote: > >>>>>>> > >>>>>>>> it has not been tagged yet. I think Aaron is just really busy > >>>>>>>> on > >>>>>>>> this front. 
> >>>>>>> > >>>>>>> I did tag the HEAD at RC1, so we could branch from there if we > >>>>>>> needed > >>>>>>> to; if this is really the big bug-bear that Hilmar and Jason are > >>>>>>> claiming, then I'd ask Allen to retract his patches that alter > >>>>>>> interface definitions, and branch. > >>>>>>> > >>>>>>> And I was so hoping to get RC2 packaged up later today ... > >>>>>>> > >>>>>>> -Aaron > >>>>>>> > >>>>>>> -- > >>>>>>> Aaron J. Mackey, Ph.D. > >>>>>>> Dept. of Biology, Goddard 212 > >>>>>>> University of Pennsylvania email: amackey@pcbi.upenn.edu > >>>>>>> 415 S. University Avenue office: 215-898-1205 > >>>>>>> Philadelphia, PA 19104-6017 fax: 215-746-6697 > >>>>>>> > >>>>>>> > >>>>>> -- > >>>>>> Jason Stajich > >>>>>> jason.stajich at duke.edu > >>>>>> http://www.duke.edu/~jes12/ > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l@portal.open-bio.org > >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l@portal.open-bio.org > >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>>> > >>>> -- > >>>> Jason Stajich > >>>> jason.stajich at duke.edu > >>>> http://www.duke.edu/~jes12/ > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l@portal.open-bio.org > >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >> > >> > > -- > > Aaron J. Mackey, Ph.D. > > Dept. of Biology, Goddard 212 > > University of Pennsylvania email: amackey@pcbi.upenn.edu > > 415 S. 
> > University Avenue         office: 215-898-1205
> > Philadelphia, PA 19104-6017      fax:    215-746-6697
> >

From allenday at ucla.edu Mon Nov 29 17:24:36 2004
From: allenday at ucla.edu (Allen Day)
Date: Mon Nov 29 17:22:18 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To:
References:
Message-ID:

On Sun, 28 Nov 2004, Hilmar Lapp wrote:

>
> On Sunday, November 28, 2004, at 08:28 AM, Aaron J. Mackey wrote:
>
> >> E.g., get_all_tags() will now return all annotation keys ever added,
> >> regardless of whether through $feat->primary_tag, $feat->source_tag,
> >> $feat->annotation->add_Annotation, or $feat->add_tag_value, unless
> >> any of these methods is overridden.
> >
> > Thanks for taking a deeper look into things, Hilmar.  Is this true for
> > all SeqFeatureI-implementing classes, or just
> > Bio::SeqFeature::Annotated?
>
> Including primary_tag and source_tag in the annotation collection is
> what SeqFeature::Annotated does. Lumping together tags and annotation
> is the default decorating implementation on the AnnotatableI interface
> for the *_tag_* methods, and hence will be used for all SeqFeatureI
> implementing classes that don't override it. SeqFeature::Generic, for
> instance, doesn't override it (anymore), but it does keep primary_tag
> and source_tag separate from the annotation bundle.

primary_tag() and source_tag() being separate from the AC was an
oversight on my part.  the intention was to move all of the feature's
tag attributes into the collection.

> So, evidently there are subtle aspects about the various tags and the
> annotation that have not been spelled out when documenting the contract.

that's right.  it's still not well defined.

> Specifically, this is whether an implementation may or must keep the
> different tags and annotations in sets that at least behave as
> semantically distinct sets. E.g., if you set a primary_tag, may an
> implementation treat it as annotation and still be SeqFeatureI
> compliant (meaning, as a SeqFeatureI consumer, you'd have to be
> prepared for that). Or, if you add a tag 'blah' via
> add_tag_value('blah',$whatever), may an implementation or may it not
> enforce the same class on $whatever that it enforces for annotations
> added to the collection under the same key 'blah'.
>
> In the absence of this having been spelled out beforehand, I would
> argue that the most frequently used implementations' behavior will
> probably constitute what other programmers took as an answer. In this
> case, that'd be SeqFeature::Generic ...

i agree with this for the 1.5 release.  however, i also feel that
SeqFeatureI is due for some heavy revision.

> >> This is very different from what it used to be, namely only return
> >> those keys added through $feat->add_tag_value. Anybody who made any
> >> assumption that what you didn't add through add_tag_value you'll not
> >> get back through get_all_tags will be hosed.
> >
> > I could see an argument that "all_tag_values" should really retrieve
> > just that: all tags, not just any extra tags from add_tag_value.  But
> > I agree that it's potentially a significant change.
>
> Yes, and yes. It's a drastic change in behavior, but would be sensible
> if the target is to eventually obsolete the concepts of tags and
> annotations being different animals.

which it is.  but for the 1.5 release in the interest of stability i
think the best thing to do will be to put the SeqFeature::Generic
*_tag_* implementation back into that class (i.e. override the
SeqFeatureI default dependence on AnnotatableI) so
get_all_annotation_keys() and friends will continue to return results
as previously expected.
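
A minimal sketch of the override idea discussed here, in plain Perl with
no BioPerl dependency. The package names (Toy::Collection, Toy::Feature,
Toy::GenericFeature) are invented for illustration and are not BioPerl
classes; only the method names mirror the ones mentioned in this thread,
and the real implementations may well differ. The base class lets the
*_tag_* methods defer to the annotation collection, while the subclass
overrides them with a private hash so that tags and annotation keys stay
in separate sets:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation collection: key => list of values.
package Toy::Collection;
sub new { return bless { store => {} }, shift }
sub add_Annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{store}{$key} }, $value;
    return;
}
sub get_all_annotation_keys {
    my $self = shift;
    return sort keys %{ $self->{store} };
}

# Hypothetical base class: the *_tag_* methods defer to the collection,
# so tags and annotations end up sharing one key space.
package Toy::Feature;
sub new {
    return bless { annotation => Toy::Collection->new, tags => {} }, shift;
}
sub annotation { return $_[0]{annotation} }
sub add_tag_value {
    my ($self, $tag, $value) = @_;
    $self->annotation->add_Annotation($tag, $value);
    return;
}
sub get_all_tags { return $_[0]->annotation->get_all_annotation_keys }

# Hypothetical subclass: overrides the *_tag_* methods with a private hash,
# which keeps tags out of the annotation collection.
package Toy::GenericFeature;
our @ISA = ('Toy::Feature');
sub add_tag_value {
    my ($self, $tag, $value) = @_;
    push @{ $self->{tags}{$tag} }, $value;
    return;
}
sub get_all_tags { return sort keys %{ $_[0]{tags} } }

package main;

# Add one tag and one annotation, then report what each accessor returns.
sub describe {
    my ($class) = @_;
    my $feat = $class->new;
    $feat->add_tag_value(note => 'added as a tag');
    $feat->annotation->add_Annotation(comment => 'added as an annotation');
    my $tags = join ',', $feat->get_all_tags;
    my $keys = join ',', $feat->annotation->get_all_annotation_keys;
    return "tags=[$tags] annotation_keys=[$keys]";
}

print "merged:   ", describe('Toy::Feature'), "\n";
print "separate: ", describe('Toy::GenericFeature'), "\n";
```

With the base class, a tag added through add_tag_value shows up among the
annotation keys as well; with the overriding subclass the two sets stay
distinct, which is the behaviour older callers are said to rely on.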

> >> I.e., anybody who assumed that $feat->get_all_tags and
> >> $feat->annotation->get_all_annotation_keys hold distinct sets of
> >> values will be hosed. This includes bioperl-db, for those who care.
> >
> > But surely bioperl-db could be fixed to no longer make this assumption?
>
> It certainly could, and I certainly will once this is the direction
> that bioperl 'officially' adopts. Right now, the change is two weeks
> old and I'm not sure anyone, including Allen, is ready to vouch that
> this is a real good way of doing things. [And as aside, fixing
> bioperl-db to work with the new API may make bioperl-db backwardly
> incompatible as well.]

right, it's still not fully clear.  i will continue to advocate the
merging of the two feature annotation systems (hash based and object
based) into some common form.  the SeqI annotation mechanisms should
follow suit, but i'm not ready to touch them yet.

> I'm not saying this change of direction may be a show-stopper for any
> dependent package like bioperl-db. All I'm suggesting is let's be clear
> that this *is* a change of direction for a core interface, and let's
> give it some time to phase it in and to iron out wrinkles, both on the
> end of bioperl itself as well as the end of people who write software
> against bioperl. Let's give it some time to see how it works, and how
> it works under stress, before letting it lose on the general public who
> just wanted to get some bugfixes on the 1.4.0 release or some
> additional parsers.
>
> >
> >> It is also not what SeqFeature::Generic will do now, namely not
> >> return the primary_tag and source_tag keys.
> >
> > Ahh, so we are talking only about Bio::SeqFeature::Annotated?
>
> Well, no, see above. SeqFeature::Generic does override primary_tag and
> source_tag, which is why here those tags won't make it into the
> annotation bundle. It doesn't override the *_tag_* methods, so for
> those you get the behavior from AnnotatableI.

again this wasn't my intention, i missed something in the code shuffle.
all the tags, including primary_tag() and source_tag() should have been
moved into the AC.

-Allen

>
> 	-hilmar
>
> >
> > -Aaron
> >
>

From allenday at ucla.edu Mon Nov 29 17:26:45 2004
From: allenday at ucla.edu (Allen Day)
Date: Mon Nov 29 17:24:21 2004
Subject: [Bioperl-l] Annotated.pm
In-Reply-To: <41A9FB33.3070900@pcbi.upenn.edu>
References: <41A9FB33.3070900@pcbi.upenn.edu>
Message-ID:

I'll take this out.

On Sun, 28 Nov 2004, Aaron J. Mackey wrote:

>
>
> Allen Day wrote:
> >
> > if you're not sure how many annotations there are, you'd better call it in
> > list context unless you want the count:
> >
> >   my @value = $feature->get_Annotations('key1');
> >   my $count = $feature->get_Annotations('key1');
> >
> > but if you're sure there is only one, you can call it either way:
> >
> >   my @value = $feature->get_Annotations('key2');
> >   my $value = $feature->get_Annotations('key2');
>
> This is the kind of thing that, while cute, makes BioPerl really hard to
> teach to newcomers who expect things to be consistently one way or the
> other.  To use DBI as a (formidable) example, when I call
> $sth->fetchrow(), I'm going to get back an array, even if there was only
> one column in that row.  I didn't see this "short cut" slip by, but I'm
> really against things like this when they're not implemented
> project-wide, sorry.  Something else for bioperl-nouveau ...
>
> -Aaron
>

From allenday at ucla.edu Mon Nov 29 17:30:40 2004
From: allenday at ucla.edu (Allen Day)
Date: Mon Nov 29 17:28:16 2004
Subject: [Bioperl-l] AnnotationCollectionI and SeqFeatureI changes
In-Reply-To: <02C65092-4210-11D9-8243-000D93392082@pcbi.upenn.edu>
References: <02C65092-4210-11D9-8243-000D93392082@pcbi.upenn.edu>
Message-ID:

On Mon, 29 Nov 2004, Aaron J. Mackey wrote:

>
> Yep, OK, I hear you.
> I really thought all this was going to be
> contained to Bio::SeqFeature::Annotated, but I see now that with all
> sorts of implementation happening in the interfaces (ugh!), this can't
> happen.  Woe is me.
>
> Here's what I'm willing to do to keep Allen from pulling his hair out:
> there have been very few changes on the development trunk since RC1
> that aren't Annotated.pm-related; therefore, (if this makes sense to
> everyone) I will branch 1.5.0 off of RC1 and merge only those patches
> that are Annotated.pm-*unrelated* to the 1.5.0 branch.  I will then tag
> the branch at RC2 (and similarly tag the HEAD, so that any later
> merging can be done relative to those tags).  Make sense?

yeah.  i think the easiest way to restore old functionality is going to
be to roll back SeqFeature::Generic to before i removed the current
SeqFeatureI overriding methods (get_tag_values(), add_tag_value(), etc),
or to re-add these methods.  this also necessitates rolling back
SeqFeature::AnnotationAdaptor and its unit test, as it assumes the
*_tag_* hash and the attached AC don't overlap.

that's pretty much it.  the interface classes can stay the same -- it
won't affect the "heavy" lifter (SeqFeature::Generic), and it will allow
SeqFeature::Annotated to continue to depend on the revised interface.

-allen

>
> Then, the rest of you (Allen, Hilmar, Steffen, etc) need to figure out
> the cleanest path for 1.6.0, in which all things may change (with an
> eye towards at least some backwards compatibility); my vote would be
> that there remain some separation between "heavy" and "light" feature
> types.  I don't expect/need my Bio::SeqFeature::Simple to implement
> AnnotationCollection!
>
> Thanks again to everyone; let me know if the CVS plan above sounds
> reasonable ...
>
> -Aaron
>
> On Nov 28, 2004, at 10:08 PM, Hilmar Lapp wrote:
>
> > I'm not saying this change of direction may be a show-stopper for any
> > dependent package like bioperl-db.
> > All I'm suggesting is let's be
> > clear that this *is* a change of direction for a core interface, and
> > let's give it some time to phase it in and to iron out wrinkles, both
> > on the end of bioperl itself as well as the end of people who write
> > software against bioperl. Let's give it some time to see how it works,
> > and how it works under stress, before letting it loose on the general
> > public who just wanted to get some bugfixes on the 1.4.0 release or
> > some additional parsers.
>
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey@pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA 19104-6017      fax:    215-746-6697
>

From cariaso at yahoo.com Mon Nov 29 18:03:57 2004
From: cariaso at yahoo.com (Mike Cariaso)
Date: Mon Nov 29 18:31:41 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To: <41AB7773.5040703@mail.nih.gov>
Message-ID: <20041129230357.21060.qmail@web52704.mail.yahoo.com>

This message is being cross posted from bioclusters to bioperl. I'd appreciate a clarification from anyone in bioperl who can speak more authoritatively than my semi-speculation.

Perl does have a garbage collector. It is not wildly sophisticated. As you've suggested it uses simple reference counting. This means that circular references will cause memory to be held until program termination.

However, I think you are overstating the inefficiency in the system. While the perl GC *may* not release memory to the system, it does at least allow memory to be reused within the process.

If the system instead behaved as you describe, I think perl would hemorrhage memory and would be unsuitable for any long running processes.

However, I can say with considerable certainty that BPLite is able to handle blast reports which cause SearchIO to thrash. I've attributed this to BPLite being a true stream processor, while SearchIO seems to slurp the whole file and object hierarchy into memory.

I know that SearchIO is the preferred blast parser, but it seems that BPLite is not quite dead, for the reasons above. If this is in fact the unique benefit of BPLite, perhaps the documentation should be clearer about this, as I suspect I'm not the only person to have had to reengineer a substantial piece of code to adjust between their different models. Had I known of this difference early on I would have chosen BPLite.

So, bioperlers (especially Jason Stajich), can you shed any light on this vestigial bioperl organ?

--- Malay wrote:

> Michael Cariaso wrote:
> > Michael Maibaum wrote:
> >> On 10 Nov 2004, at 18:25, Al Tucker wrote:
> >>> Hi everybody.
> >>>
> >>> We're new to the Inquiry Xserve scientific cluster and trying to
> >>> iron out a few things.
> >>>
> >>> One thing we seem to be coming up against is an out of memory
> >>> error when getting large sequence analysis results (5,000 seq - at
> >>> least - and above) back from BTblastall. The problem seems to be
> >>> with BioPerl.
> >>>
> >>> Might anyone here know if BioPerl knows enough not to try and
> >>> access more than 4gb of RAM in a single process (an OS X limit)? I'm
> >>> told Blastall and BTblastall are and will chunk problems accordingly,
> >>> but we're not certain if BioPerl is when called to merge large Blast
> >>> results back together. It's the default version 1.2.3 that's supplied
> >>> btw, and OS X 10.3.5 with all current updates just short of the
> >>> latest 10.3.6 update.
> >>
> >> BioPerl tries to slurp up the entire results set from a BLAST query,
> >> and build objects for each little bit of the result set and uses lots
> >> of memory. It doesn't have anything smart at all about breaking up the
> >> job within the result set, afaik.
>
> This is not really true.
> SearchIO module as far as I know works on stream.
>
> >> I ended up stripping out results that hit a certain threshold size to
> >> run on a different, large memory opteron/linux box and I'm
> >> experimenting with replacing BioPerl with BioPython etc.
> >>
> >> Michael
> >
> > You may find that the BPLite parser works better when dealing with
> > large blast result files. It's not as clean or maintained, but it does
> > the job nicely for my current needs, which overloaded the usual parser.
>
> There is basically no difference between BPLite and other BLAST parser
> interfaces in Bioperl.
>
> The problem lies in the core of Perl itself. Perl does not release
> memory to the system even after the reference count of an object created
> in the memory goes to 0, unless the program is actually over. Perl's
> object system is highly inefficient at handling large numbers of objects
> created in the memory.
>
> -Malay
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters

=====
Mike Cariaso

From ifkorf at ucdavis.edu Mon Nov 29 18:32:50 2004
From: ifkorf at ucdavis.edu (Ian Korf)
Date: Mon Nov 29 18:31:44 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To: <20041129230357.21060.qmail@web52704.mail.yahoo.com>
References: <20041129230357.21060.qmail@web52704.mail.yahoo.com>
Message-ID:

After a recent conversation about memory in Perl, I decided to do some actual experiments. Here's the email I composed on the subject.

I looked into the Perl memory issue. It's true that if you allocate a huge amount of memory, Perl doesn't like to give it back. But it's not as bad a situation as you might think. Let's say you do something like

    $FOO = 'N' x 100000000;

That will allocate a chunk of about 192 Mb on my system. It doesn't matter if this is a package variable or lexical.

    our $FOO = 'N' x 100000000; # 192 Mb
    my $FOO = 'N' x 100000000; # 192 Mb

If you put this in a subroutine

    sub foo {my $FOO = 'N' x 100000000}

and you call this a bunch of times

    foo(); foo(); foo(); foo(); foo(); foo(); foo();

the memory footprint stays at 192 Mb. So Perl's garbage collection works just fine. Perl doesn't let go of the memory it has taken from the OS, but it is happy to reassign the memory it has reserved.

Here's something odd. The following labeled block looks like it should use no memory.

    BLOCK: {
        my $FOO = 'N' x 100000000;
    }

The weird thing is that after executing the block, the memory footprint is still 192 Mb as if it hadn't been garbage collected.

Now look at this:

    my $foo = 'X' x 100000000;
    undef $foo;

This has a memory footprint of 96 Mb. After some more experimentation, I have come up with the following interpretation of memory allocation and garbage collection in Perl. Perl will reuse memory for a variable of a given name (either package or lexical scope). There is no fear of memory leaks in loops for example. But each different named variable will retain its own minimum memory. That minimum memory is the size of the largest memory allocated to that variable, or half that amount if other variables have taken some of that space already. You can get any variable to automatically give up half its memory with undef. But this takes a little more CPU time. Here's some test code that shows this behavior.

    sub foo {my $FOO = 'N' x 100000000}
    for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s

    sub bar {my $BAR = 'N' x 100000000; undef $BAR}
    for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s

The increase from 1 sec to 21 sec system CPU time is all the extra memory allocation and freeing associated with the undef statement. Why the user time is less in the undef example is a mystery to me.

OK, to make a hideously long story short, use undef to save memory and use the same variable name over and over if you can.
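
A minimal sketch of how footprint figures like the ones above could be checked, assuming a Unix-like system where "ps -o rss= -p PID" prints the resident set size in kB. The helper names rss_kb() and report() are made up for illustration and are not part of Perl or BioPerl. The strings are 10 MB rather than 100 MB, and the repeat count is held in a variable so that the string is built at run time. The numbers printed will depend on the perl build and the system's malloc, so none are shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: resident set size of this process in kB, as
# reported by ps. Returns undef if ps is missing or prints nothing.
sub rss_kb {
    my $out = `ps -o rss= -p $$ 2>/dev/null`;
    return undef unless defined $out && $out =~ /(\d+)/;
    return $1;
}

# Hypothetical helper: print a label next to the current footprint.
sub report {
    my ($label) = @_;
    my $kb = rss_kb();
    printf "%-28s %s\n", $label, defined $kb ? "$kb kB" : "n/a";
}

my $n = 10_000_000;    # 10 MB strings, smaller than in the text

# Same shape as foo() in the text: a large lexical that goes out of scope.
sub foo { my $FOO = 'N' x $n; return length $FOO }

report('start');
foo() for 1 .. 20;
report('after 20 calls to foo()');

my $big = 'X' x $n;
report('after allocating $big');
undef $big;
report('after undef $big');
```

Run it a few times with different values of $n and compare the last two lines to see how much, if anything, undef hands back on a given system.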
---

But this email thread has gone to BPlite, of which I am the original author. BPlite is designed to parse a stream and only reads a minimal amount of information at a time. The disadvantage of this is that if you want to know something about statistics, you can't get this until the end of the report (the original BPlite ignored statistics entirely). I like the new SearchIO interface better than BPlite, but for my own uses I generally use a table format most of the time and don't really use a BLAST parser very often.

-Ian

From billk at iinet.net.au Mon Nov 29 20:04:11 2004
From: billk at iinet.net.au (Bill Kenworthy)
Date: Mon Nov 29 20:02:41 2004
Subject: [Bioperl-l] SearchIO and bl2seq blast reports
Message-ID: <1101776651.15431.62.camel@cbbcbitl303c.murdoch.edu.au>

Hi, what is the best way to detect strandedness using SearchIO parsing bl2seq reports? Or should I go back to BPLite which I think works.

There were bugs posted about this in the past, but SearchIO in 1.4 is still returning 0 for frame and strand on both hit and query with any form of bl2seq I have tried.

BillK

From jason.stajich at duke.edu Mon Nov 29 22:58:11 2004
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Nov 29 22:55:44 2004
Subject: [Bioperl-l] SearchIO and bl2seq blast reports
In-Reply-To: <1101776651.15431.62.camel@cbbcbitl303c.murdoch.edu.au>
References: <1101776651.15431.62.camel@cbbcbitl303c.murdoch.edu.au>
Message-ID: <0DC1218A-4284-11D9-B39E-000393C44276@duke.edu>

Should work - we have tests in the bioperl-live code for strandedness and bl2seq reports for blastn and tblastn.

in t/SearchIO.t

    ok($hsp->query->start, 94);
    ok($hsp->query->end, 180);
    ok($hsp->query->strand, 1);
    ok($hsp->hit->strand, -1);
    ok($hsp->hit->start, 1);

I don't remember if this bug was fixed before 1.4 went out but SearchIO::blast parsing was definitely updated for several bugs since 1.4.0 release.

Can you post a report and exact code that you are testing as a bug report to http://bugzilla.open-bio.org and someone can have a looksie.

-jason

On Nov 29, 2004, at 8:04 PM, Bill Kenworthy wrote:

> Hi, what is the best way to detect strandedness using SearchIO parsing
> bl2seq reports? Or should I go back to BPLite which I think works.
>
> There were bugs posted about this in the past, but SearchIO in 1.4 is
> still returning 0 for frame and strand on both hit and query with any
> form of bl2seq I have tried.
>
> BillK
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From mbasu at mail.nih.gov Mon Nov 29 23:12:10 2004
From: mbasu at mail.nih.gov (Malay)
Date: Mon Nov 29 23:10:01 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To:
References: <20041129230357.21060.qmail@web52704.mail.yahoo.com>
Message-ID: <41ABF31A.4080101@mail.nih.gov>

Thanks Ian for your mail. But you have missed a major point of the original discussion. What happens to object?
So I did the same test that you did using an object. Here is the result.

    use strict;
    package Test;

    sub new {
        my $class = shift;
        my $self = {};
        bless $self, $class;
        $self->{'foo'} = 'N' x 100000000;
        return $self;
    }

    package main;

    my $ob = Test->new(); # uses 197 MB as you said.

    undef $ob; ## still uses 197 MB ???!!!!

This was the original point. Perl never releases memory for the initial object creation. In fact, try doing this in whatever way possible, reusing references or undefing it: the memory usage will never go down below 197 MB for the duration of the program's execution.

So I humbly differ in my opinion on any elaborate in-memory object hierarchy in Perl. The language is not meant for that. But I am nobody; stalwarts will differ in opinion.

-Malay

From billk at iinet.net.au Tue Nov 30 02:33:47 2004
From: billk at iinet.net.au (Bill Kenworthy)
Date: Tue Nov 30 02:31:52 2004
Subject: [Bioperl-l] SearchIO and bl2seq blast reports
In-Reply-To: <0DC1218A-4284-11D9-B39E-000393C44276@duke.edu>
References: <1101776651.15431.62.camel@cbbcbitl303c.murdoch.edu.au> <0DC1218A-4284-11D9-B39E-000393C44276@duke.edu>
Message-ID: <1101800027.16199.13.camel@cbbcbitl303c.murdoch.edu.au>

Filed as bug 1713 with the bl2seq blastn report as an attachment

BillK

On Mon, 2004-11-29 at 22:58 -0500, Jason Stajich wrote:
> Should work - we have tests in the bioperl-live code for strandedness
> and bl2seq reports for blastn and tblastn.
>
> in t/SearchIO.t
> ok($hsp->query->start, 94);
> ok($hsp->query->end, 180);
> ok($hsp->query->strand, 1);
> ok($hsp->hit->strand, -1);
> ok($hsp->hit->start, 1);
>
> I don't remember if this bug was fixed before 1.4 went out but
> SearchIO::blast parsing was definitely updated for several bugs since
> 1.4.0 release.
>
> Can you post a report and exact code that you are testing as a bug
> report to http://bugzilla.open-bio.org and someone can have a looksie.
>
> -jason
> On Nov 29, 2004, at 8:04 PM, Bill Kenworthy wrote:
>
> > Hi, what is the best way to detect strandedness using SearchIO parsing
> > bl2seq reports? Or should I go back to BPLite which I think works.
> >
> > There were bugs posted about this in the past, but SearchIO in 1.4 is
> > still returning 0 for frame and strand on both hit and query with any
> > form of bl2seq I have tried.
> >
> > BillK
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From sac at portal.open-bio.org Tue Nov 30 04:24:24 2004
From: sac at portal.open-bio.org (Steve Chervitz)
Date: Tue Nov 30 08:13:57 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To: <41ABF31A.4080101@mail.nih.gov>
Message-ID:

My perl behaves like Ian's: undefing the object leads to about half of the memory being reclaimed by the OS. Repeatedly creating and undefing never leads to greater than ~190Mb or less than ~95Mb of allocation.
(My perl= 5.8.1-RC3 built for darwin-thread-multi-2level, Mac OS X 10.3.6)

I think that Perl, in attempting to improve performance, doesn't want to give back memory resulting from deletions within a running process, but keeps some or all of it in a pool for future allocation. This behavior can probably be controlled by how you build perl. For example, not including any compiler optimizations might make perl more forthcoming with memory, but performance will suffer. For a definitive answer, I'd recommend checking with the perl porters: http://www.gossamer-threads.com/lists/perl/porters/ .

Regarding SearchIO memory usage, I don't think this has been an issue before, so I wonder if there is something about the installation or specific usage of it that is leading to memory hogging. I've run it over large numbers of reports without noticing troubles. It would be useful to see a sample report + script using SearchIO that leads to the memory troubles, so we can try to reproduce it.

Steve

> From: Malay
> Date: Mon, 29 Nov 2004 23:12:10 -0500
> To: "Clustering, compute farming & distributed computing in life science
> informatics"
> Cc: ,
> Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
>
> Thanks Ian for your mail. But you have missed a major point of the
> original discussion. What happens to object? So I did the same test that
> you did using an object. Here is the result.
>
> use strict;
> package Test;
>
> sub new {
>     my $class = shift;
>     my $self = {};
>     bless $self, $class;
>     $self->{'foo'} = 'N' x 100000000;
>     return $self;
> }
>
> package main;
>
> my $ob = Test->new(); # uses 197 MB as you said.
>
> undef $ob; ## still uses 197 MB ???!!!!
>
> This was the original point. Perl never releases memory for the initial
> object creation. In fact, try doing this in whatever way possible,
> reusing references or undefing it: the memory usage will never go down
> below 197 MB for the duration of the program's execution.
> > So I humbly differ in my opinion in any elaborate in-memory object > hierarchy in Perl. The language is not meant for that. But I am nobody, > stallwarts will differ in opinion. > > -Malay > > > > > > > > > Ian Korf wrote: > >> After a recent conversation about memory in Perl, I decided to do some >> actual experiments. Here's the email I composed on the subject. >> >> >> I looked into the Perl memory issue. It's true that if you allocate a >> huge amount of memory that Perl doesn't like to give it back. But it's >> not as bad a situation as you might think. Let's say you do something >> like >> >> $FOO = 'N' x 100000000; >> >> That will allocate a chunk of about 192 Mb on my system. It doesn't >> matter if this is a package variable or lexical. >> >> our $FOO = 'N' x 100000000; # 192 Mb >> my $FOO = 'N' x 100000000; # 192 Mb >> >> If you put this in a subroutine >> >> sub foo {my $FOO = 'N' x 100000000} >> >> and you call this a bunch of times >> >> foo(); foo(); foo(); foo(); foo(); foo(); foo(); >> >> the memory footprint stays at 192 Mb. So Perl's garbage collection >> works just fine. Perl doesn't let go of the memory it has taken from >> the OS, but it is happy to reassign the memory it has reserved. >> >> Here's something odd. The following labeled block looks like it should >> use no memory. >> >> BLOCK: { >> my $FOO = 'N' x 100000000; >> } >> >> The weird thing is that after executing the block, the memory >> footprint is still 192 Mb as if it hadn't been garbage collected. >> >> Now look at this: >> >> my $foo = 'X' x 100000000; >> undef $foo; >> >> This has a memory footprint of 96 Mb. After some more experimentation, >> I have come up with the following interpretation of memory allocation >> and garbage collection in Perl. Perl will reuse memory for a variable >> of a given name (either package or lexical scope). There is no fear of >> memory leaks in loops for example. But each different named variable >> will retain its own minimum memory. 
That minimum memory is the size of >> the largest memory allocated to that variable, or half that amount if >> other variables have taken some of that space already. You can get any >> variable to automatically give up half its memory with undef. But this >> takes a little more CPU time. Here's some test code that shows this >> behavior. >> >> sub foo {my $FOO = 'N' x 100000000} >> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s >> >> sub bar {my $BAR = 'N' x 100000000; undef $BAR} >> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s >> >> The increase from 1 sec to 21 sec system CPU time is all the extra >> memory allocation and freeing associated with the undef statement. Why >> the user time is less in the undef example is a mystery to me. >> >> OK, to make a hideously long story short, use undef to save memory and >> use the same variable name over and over if you can. >> >> --- >> >> But this email thread has gone to BPlite, of which I am the original >> author. BPlite is designed to parse a stream and only reads a minimal >> amount of information at a time. The disadvantage of this is that if >> you want to know something about statistics, you can't get this until >> the end of the report (the original BPlite ignored statistics >> entirely). I like the new SearchIO interface better than BPlite, but >> for my own uses I generally use a table format most of the time and >> don't really use a BLAST parser very often. >> >> -Ian >> >> On Nov 29, 2004, at 3:03 PM, Mike Cariaso wrote: >> >>> This message is being cross posted from bioclusters to >>> bioperl. I'd appreciate a clarification from anyone in >>> bioperl who can speak more authoritatively than my >>> semi-speculation. >>> >>> >>> Perl does have a garbage collector. It is not wildly >>> sophisticated. As you've suggested it uses simple >>> reference counting. This means that circular >>> references will cause memory to be held until program >>> termination. 
>>> >>> However I think you are overstating the inefficiency >>> in the system. While the perl GC *may* not release >>> memory to the system, it does at least allow memory to >>> be reused within the process. >>> >>> If the system instead behaved as you describe, I think >>> perl would hemorrhage memory and would be unsuitable >>> for any long running processes. >>> >>> However I can say with considerable certainty that >>> that BPLite is able to handle blast reports which >>> cause SearchIO to thrash. I've attributed this to >>> BPLite being a true stream processor, while SearchIO >>> seems to slurp the whole file and object heirarchy >>> into memory. >>> >>> I know that SearchIO is the prefered blast parser, but >>> it seems that BPLite is not quite dead, for the >>> reasons above. If this is infact the unique benefit of >>> BPLite, perhaps the documentation should be clearer >>> about this, as I suspect I'm not the only person to >>> have had to reengineer a substantial piece of code to >>> adjust between their different models. Had I known of >>> this difference early on I would have chosen BPLite. >>> >>> So, bioperlers (especially Jason Stajich) can you shed >>> any light on this vestigial bioperl organ? >>> >>> >>> >>> --- Malay wrote: >>> >>>> Michael Cariaso wrote: >>>> >>>>> Michael Maibaum wrote: >>>>> >>>>>> >>>>>> On 10 Nov 2004, at 18:25, Al Tucker wrote: >>>>>> >>>>>>> Hi everybody. >>>>>>> >>>>>>> We're new to the Inquiry Xserve scientific >>>>>>> >>>> cluster and trying to iron >>>> >>>>>>> out a few things. >>>>>>> >>>>>>> One thing is we seem to be coming up against is >>>>>>> >>>> an out of memory >>>> >>>>>>> error when getting large sequence analysis >>>>>>> >>>> results (5,000 seq - at >>>> >>>>>>> least- and above) back from BTblastall. The >>>>>>> >>>> problem seems to be with >>>> >>>>>>> BioPerl. 
>>>>>>> >>>>>>> Might anyone here know if BioPerl is knows >>>>>>> >>>> enough not to try and >>>> >>>>>>> access more than 4gb of RAM in a single process >>>>>>> >>>> (an OS X limit)? I'm >>>> >>>>>>> told Blastall and BTblastall are and will chunk >>>>>>> >>>> problems accordingly, >>>> >>>>>>> but we're not certain if BioPerl is when called >>>>>>> >>>> to merge large Blast >>>> >>>>>>> results back together. It's the default version >>>>>>> >>>> 1.2.3 that's supplied >>>> >>>>>>> btw, and OS X 10.3.5 with all current updates >>>>>>> >>>> just short of the >>>> >>>>>>> latest 10.3.6 update. >>>>>>> >>>>>> >>>>>> >>>> >>>> >>>>>> BioPerl tries to slurp up the entire results set >>>>>> >>>> from a BLAST query, >>>> >>>>>> and build objects for each little bit of the >>>>>> >>>> result set and uses lots >>>> >>>>>> of memory. It doesn't have anything smart at all >>>>>> >>>> about breaking up the >>>> >>>>>> job within the result set, afaik. >>>>>> >>>> >>>> This is not really true. SearchIO module as far as I know works on stream. >>>> >>>>>> I ended up stripping out results that hit a >>>>>> >>>> certain threshold size to >>>> >>>>>> run on a different, large memory opteron/linux >>>>>> >>>> box and I'm >>>> >>>>>> experimenting with replacing BioPerl with >>>>>> >>>> BioPython etc. >>>> >>>>>> >>>>>> Michael >>>>>> >>>>> >>>>> >>>>> You may find hthat the BPLite parser works better >>>>> >>>> when dealing with >>>> >>>>> large blast result files. Its not as clean or >>>>> >>>> maintained, but it does >>>> >>>>> the job nicely for my current needs, which >>>>> >>>> overloaded the usual parser. >>>> >>>> There is basically no difference between BPLite and other BLAST parser >>>> interfaces in Bioperl. >>>> >>>> >>>> The problem lies in the core of Perl iteself. Perl does not release memory >>>> to the system even after the reference count of an object created in the >>>> memory goes to 0, unless the program in actually over. 
Perl object system
>>>> is highly inefficient to handle large number of objects created in the
>>>> memory.
>>>>
>>>> -Malay
>>>> _______________________________________________
>>>> Bioclusters maillist - Bioclusters@bioinformatics.org
>>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>>
>>> =====
>>> Mike Cariaso
>>> _______________________________________________
>>> Bioclusters maillist - Bioclusters@bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>> _______________________________________________
>> Bioclusters maillist - Bioclusters@bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From mike at maibaum.org Tue Nov 30 05:59:13 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Tue Nov 30 08:14:07 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To:
References: <41ABF31A.4080101@mail.nih.gov>
Message-ID: <20041130105913.GE30404@remote.gene-hacker.net>

On Tue, Nov 30, 2004 at 01:24:24AM -0800, Steve Chervitz wrote:
>Regarding SearchIO memory usage, I don't think this has been an issue
>before, so I wonder if there is something about the installation or specific
>usage of it that is leading to memory hogging. I've run it over large
>numbers of reports without noticing troubles. It would be useful to see a
>sample report + script using SearchIO that leads to the memory troubles, so
>we can try to reproduce it.

FWIW - I at least didn't have a problem parsing many thousands of results in a stream with SearchIO. I had a problem with parsing certain specific result sets: essentially anything with about 2000 hits and alignments (or more) for a single query would kill a Linux box with 1 gig of RAM (it would thrash VM to death).
These would run on an Opteron 16 gig box and used >8 gig of RAM in some cases. As far as I can see the majority of the memory was then returned when BioPerl moved on to the next record. The issue is that it takes a rather large amount of RAM for an individual record, and I assumed (rightly or wrongly) that BioPerl slurps up the entire record and builds the objects representing it as a whole, hence the large RAM usage. It may be that the objects to represent 2000+ hits are just very (unreasonably?) large.

Michael

From iankorf at mac.com Tue Nov 30 02:42:23 2004
From: iankorf at mac.com (Ian Korf)
Date: Tue Nov 30 08:14:14 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To: <41ABF31A.4080101@mail.nih.gov>
References: <20041129230357.21060.qmail@web52704.mail.yahoo.com> <41ABF31A.4080101@mail.nih.gov>
Message-ID: <5F8D9F26-42A3-11D9-BEC7-000D93B2B83E@mac.com>

I tried your example (fixing the syntax error of $sel->{'foo'} to $self->{'foo'}) and I find that I get back half the memory after undef, which is exactly the behavior I described. This could be down to differences in Perl versions. What version of Perl are you using? perl -v gives me:

This is perl, v5.8.1-RC3 built for darwin-thread-multi-2level

On Nov 29, 2004, at 8:12 PM, Malay wrote:

> Thanks Ian for your mail. But you have missed a major point of the
> original discussion. What happens to object? So I did the same test
> that you did using object. Here is the result.
>
> use strict;
> package Test;
>
> sub new {
>     my $class = shift;
>     my $self = {};
>     bless $self, $class;
>     $sel->{'foo'} = 'N' x 100000000;
>     return $self;
> }
>
> package main;
>
> my $ob = Test->new(); #uses 197 MB as you said.
>
> undef $ob; ## still uses 197 MB ???!!!!
>
> This was the original point. Perl never releases memory for the
> initial object creation.
Infact try doing this in whatever way > possible, reusing references or undeffing it, the memory usage will > never go down below 197 MB, till the executaion duration of the > program. > > So I humbly differ in my opinion in any elaborate in-memory object > hierarchy in Perl. The language is not meant for that. But I am > nobody, stallwarts will differ in opinion. > > -Malay > > > > > > > > Ian Korf wrote: > >> After a recent conversation about memory in Perl, I decided to do >> some actual experiments. Here's the email I composed on the subject. >> >> >> I looked into the Perl memory issue. It's true that if you allocate a >> huge amount of memory that Perl doesn't like to give it back. But >> it's not as bad a situation as you might think. Let's say you do >> something like >> >> $FOO = 'N' x 100000000; >> >> That will allocate a chunk of about 192 Mb on my system. It doesn't >> matter if this is a package variable or lexical. >> >> our $FOO = 'N' x 100000000; # 192 Mb >> my $FOO = 'N' x 100000000; # 192 Mb >> >> If you put this in a subroutine >> >> sub foo {my $FOO = 'N' x 100000000} >> >> and you call this a bunch of times >> >> foo(); foo(); foo(); foo(); foo(); foo(); foo(); >> >> the memory footprint stays at 192 Mb. So Perl's garbage collection >> works just fine. Perl doesn't let go of the memory it has taken from >> the OS, but it is happy to reassign the memory it has reserved. >> >> Here's something odd. The following labeled block looks like it >> should use no memory. >> >> BLOCK: { >> my $FOO = 'N' x 100000000; >> } >> >> The weird thing is that after executing the block, the memory >> footprint is still 192 Mb as if it hadn't been garbage collected. >> >> Now look at this: >> >> my $foo = 'X' x 100000000; >> undef $foo; >> >> This has a memory footprint of 96 Mb. After some more >> experimentation, I have come up with the following interpretation of >> memory allocation and garbage collection in Perl. 
Perl will reuse >> memory for a variable of a given name (either package or lexical >> scope). There is no fear of memory leaks in loops for example. But >> each different named variable will retain its own minimum memory. >> That minimum memory is the size of the largest memory allocated to >> that variable, or half that amount if other variables have taken some >> of that space already. You can get any variable to automatically give >> up half its memory with undef. But this takes a little more CPU time. >> Here's some test code that shows this behavior. >> >> sub foo {my $FOO = 'N' x 100000000} >> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s >> >> sub bar {my $BAR = 'N' x 100000000; undef $BAR} >> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s >> >> The increase from 1 sec to 21 sec system CPU time is all the extra >> memory allocation and freeing associated with the undef statement. >> Why the user time is less in the undef example is a mystery to me. >> >> OK, to make a hideously long story short, use undef to save memory >> and use the same variable name over and over if you can. >> >> --- >> >> But this email thread has gone to BPlite, of which I am the original >> author. BPlite is designed to parse a stream and only reads a minimal >> amount of information at a time. The disadvantage of this is that if >> you want to know something about statistics, you can't get this until >> the end of the report (the original BPlite ignored statistics >> entirely). I like the new SearchIO interface better than BPlite, but >> for my own uses I generally use a table format most of the time and >> don't really use a BLAST parser very often. >> >> -Ian >> >> On Nov 29, 2004, at 3:03 PM, Mike Cariaso wrote: >> >>> This message is being cross posted from bioclusters to >>> bioperl. I'd appreciate a clarification from anyone in >>> bioperl who can speak more authoritatively than my >>> semi-speculation. >>> >>> >>> Perl does have a garbage collector. 
It is not wildly >>> sophisticated. As you've suggested it uses simple >>> reference counting. This means that circular >>> references will cause memory to be held until program >>> termination. >>> >>> However I think you are overstating the inefficiency >>> in the system. While the perl GC *may* not release >>> memory to the system, it does at least allow memory to >>> be reused within the process. >>> >>> If the system instead behaved as you describe, I think >>> perl would hemorrhage memory and would be unsuitable >>> for any long running processes. >>> >>> However I can say with considerable certainty that >>> that BPLite is able to handle blast reports which >>> cause SearchIO to thrash. I've attributed this to >>> BPLite being a true stream processor, while SearchIO >>> seems to slurp the whole file and object heirarchy >>> into memory. >>> >>> I know that SearchIO is the prefered blast parser, but >>> it seems that BPLite is not quite dead, for the >>> reasons above. If this is infact the unique benefit of >>> BPLite, perhaps the documentation should be clearer >>> about this, as I suspect I'm not the only person to >>> have had to reengineer a substantial piece of code to >>> adjust between their different models. Had I known of >>> this difference early on I would have chosen BPLite. >>> >>> So, bioperlers (especially Jason Stajich) can you shed >>> any light on this vestigial bioperl organ? >>> >>> >>> >>> --- Malay wrote: >>> >>>> Michael Cariaso wrote: >>>> >>>>> Michael Maibaum wrote: >>>>> >>>>>> >>>>>> On 10 Nov 2004, at 18:25, Al Tucker wrote: >>>>>> >>>>>>> Hi everybody. >>>>>>> >>>>>>> We're new to the Inquiry Xserve scientific >>>>>> >>>> cluster and trying to iron >>>> >>>>>>> out a few things. >>>>>>> >>>>>>> One thing is we seem to be coming up against is >>>>>> >>>> an out of memory >>>> >>>>>>> error when getting large sequence analysis >>>>>> >>>> results (5,000 seq - at >>>> >>>>>>> least- and above) back from BTblastall. 
The >>>>>> >>>> problem seems to be with >>>> >>>>>>> BioPerl. >>>>>>> >>>>>>> Might anyone here know if BioPerl is knows >>>>>> >>>> enough not to try and >>>> >>>>>>> access more than 4gb of RAM in a single process >>>>>> >>>> (an OS X limit)? I'm >>>> >>>>>>> told Blastall and BTblastall are and will chunk >>>>>> >>>> problems accordingly, >>>> >>>>>>> but we're not certain if BioPerl is when called >>>>>> >>>> to merge large Blast >>>> >>>>>>> results back together. It's the default version >>>>>> >>>> 1.2.3 that's supplied >>>> >>>>>>> btw, and OS X 10.3.5 with all current updates >>>>>> >>>> just short of the >>>> >>>>>>> latest 10.3.6 update. >>>>>> >>>>>> >>>>>> >>>> >>>> >>>>>> BioPerl tries to slurp up the entire results set >>>>> >>>> from a BLAST query, >>>> >>>>>> and build objects for each little bit of the >>>>> >>>> result set and uses lots >>>> >>>>>> of memory. It doesn't have anything smart at all >>>>> >>>> about breaking up the >>>> >>>>>> job within the result set, afaik. >>>>>> >>>> >>>> This is not really true. SearchIO module as far as I >>>> know works on stream. >>>> >>>>>> I ended up stripping out results that hit a >>>>> >>>> certain threshold size to >>>> >>>>>> run on a different, large memory opteron/linux >>>>> >>>> box and I'm >>>> >>>>>> experimenting with replacing BioPerl with >>>>> >>>> BioPython etc. >>>> >>>>>> >>>>>> Michael >>>>> >>>>> >>>>> >>>>> You may find hthat the BPLite parser works better >>>> >>>> when dealing with >>>> >>>>> large blast result files. Its not as clean or >>>> >>>> maintained, but it does >>>> >>>>> the job nicely for my current needs, which >>>> >>>> overloaded the usual parser. >>>> >>>> There is basically no difference between BPLite and >>>> other BLAST parser >>>> interfaces in Bioperl. >>>> >>>> >>>> The problem lies in the core of Perl iteself. 
Perl >>>> does not release >>>> memory to the system even after the reference count >>>> of an object created >>>> in the memory goes to 0, unless the program in >>>> actually over. Perl >>>> object system in highly inefficient to handle large >>>> number of objects >>>> created in the memory. >>>> >>>> -Malay >>>> _______________________________________________ >>>> Bioclusters maillist - >>>> Bioclusters@bioinformatics.org >>>> >>> https://bioinformatics.org/mailman/listinfo/bioclusters >>> >>>> >>> >>> >>> ===== >>> Mike Cariaso >>> _______________________________________________ >>> Bioclusters maillist - Bioclusters@bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/bioclusters >>> >> >> _______________________________________________ >> Bioclusters maillist - Bioclusters@bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bioclusters > > > _______________________________________________ > Bioclusters maillist - Bioclusters@bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bioclusters > From tjrc at sanger.ac.uk Tue Nov 30 04:57:17 2004 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Tue Nov 30 08:14:24 2004 Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling In-Reply-To: References: <20041129230357.21060.qmail@web52704.mail.yahoo.com> Message-ID: <383F9130-42B6-11D9-909B-000A95B2B140@sanger.ac.uk> On 29 Nov 2004, at 11:32 pm, Ian Korf wrote: > Here's something odd. The following labeled block looks like it should > use no memory. > > BLOCK: { > my $FOO = 'N' x 100000000; > } > > The weird thing is that after executing the block, the memory > footprint is still 192 Mb as if it hadn't been garbage collected. Perl's garbage collection does not give the memory back to the OS; it just marks the allocated memory for internal reuse by subsequent allocations within perl. This is actually true of most UNIX programs; this is not unique to perl. 
free() does not necessarily give the memory back to the operating system, it just marks it for re-use by the current process the next time it calls malloc(). The memory doesn't become available to the OS until the program exits.

This is one reason why garbage collecting languages like perl and java should not be relied on to keep memory under control; GC does *not* absolve the programmer from the need to keep their memory usage tight.

Consider the following C program (which you need to run on an OS which actually populates all the contents of the rusage struct - Linux does not, and neither does MacOS X, but Tru64 does):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    /* Print the resource usage counters, labelled with the macro argument */
    #define PRINT_RESOURCES(x) getrusage(RUSAGE_SELF, &r);\
      printf(#x "\n\nShared: %ld\nUnshared: %ld\nStack: %ld\n\n",\
             r.ru_ixrss, r.ru_idrss, r.ru_isrss)

    int main(void) {

      char *p;
      struct rusage r;
      int i;

      PRINT_RESOURCES("Program start");

      p = malloc(100000000);
      if (p == NULL) {
        perror("malloc");
        return 1;
      }

      /* Use the memory */
      for (i = 0; i<100000000; i++)
        p[i] = 'N';

      PRINT_RESOURCES("After malloc");

      free(p);

      PRINT_RESOURCES("After free");

      return 0;

    }

The output on this Tru64 machine is:

    09:46:26 tjrc@ecs2d:~$ ./memtest
    "Program start"

    Shared: 0
    Unshared: 0
    Stack: 0

    "After malloc"

    Shared: 19
    Unshared: 116577
    Stack: 19

    "After free"

    Shared: 19
    Unshared: 116577
    Stack: 19

As you can see, free() does not actually release the memory from the process back to the operating system.

> sub foo {my $FOO = 'N' x 100000000}
> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s
>
> sub bar {my $BAR = 'N' x 100000000; undef $BAR}
> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s
>
> The increase from 1 sec to 21 sec system CPU time is all the extra
> memory allocation and freeing associated with the undef statement. Why
> the user time is less in the undef example is a mystery to me.

I can explain this.
It's because you're forgetting that the final statement in a perl subroutine is always its return value, even if you don't specify 'return', so if you allocate 100MB of Ns, as in the first case, and then return it (which you do because the allocation is the last statement in the subroutine) you actually force perl to *copy* that lexically scoped variable each time the routine is called. That's why the program uses 200MB of memory, not 100MB. In the second version, by explicitly freeing the memory, perl never has to copy the return value, so its memory footprint is half. Using undef has not actually freed any memory at all, it's just changed the return value from the function and stopped perl doubling its memory use. The lesson here is therefore to be very careful in perl subroutines where you don't care about the return value to make sure the return value is something tiny. Perl has no equivalent to a C void function. Tim -- Dr Tim Cutts Informatics Systems Group, Wellcome Trust Sanger Institute GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233 From jason.stajich at duke.edu Tue Nov 30 08:46:19 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Nov 30 08:43:54 2004 Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling In-Reply-To: <20041130105913.GE30404@remote.gene-hacker.net> References: <41ABF31A.4080101@mail.nih.gov> <20041130105913.GE30404@remote.gene-hacker.net> Message-ID: <36FA3644-42D6-11D9-B39E-000393C44276@duke.edu> That's true - it does create a lot of objects for all the compnents of the report. When you have 2000 hits it needs to build quite a few objects. It does build them all for a single result. Steve had a lazy parser implementation in at one point, but that was more for speed when you didn't want to actually see the HSP details for every hit. I second Ian's comment that I use the tabular output from BLAST when dealing with large datasets. 
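A minimal sketch of the tabular approach, assuming the report was written with "blastall -m 8" and has the usual twelve tab-separated columns. The column labels, the parse_hsp_line helper and the sample line are made up for illustration and are not part of Bioperl:

```perl
use strict;
use warnings;

# Assumed labels for the twelve columns of "blastall -m 8" output.
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash reference keyed by column label.
# Returns undef for comment lines or lines with the wrong number of fields.
sub parse_hsp_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return unless @fields == @COLS;
    my %hsp;
    @hsp{@COLS} = @fields;
    return \%hsp;
}

# Invented sample line, for illustration only.
my $sample = join("\t", qw(q1 s1 98.50 200 3 0 1 200 51 250 1e-100 380)) . "\n";
my $hsp = parse_hsp_line($sample);
print "$hsp->{query} vs $hsp->{subject}: e-value $hsp->{evalue}\n";
```

In real use the same helper would be called inside a while (my $line = <>) loop, so only one HSP is held in memory at a time regardless of how many hits a query has.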
SearchIO is intended to give you access to the entire data in the report, so there is an overhead in that. There are a couple of workarounds depending on what kind of data you want. We designed SearchIO to be a modular system which separates parsing the data from instantiating objects by throwing events (like SAX) and having a listener build objects from these events. One can instantiate a different listener which builds simpler objects or throws away the data you don't want. At some point I hope we can build some light-weight Result/Hit/HSP objects and a listener which creates these instead of full-fledged bioperl objects. You can build your own listener object - SearchResultEventBuilder and FastHitEventBuilder are 2 implementations and you can specify the type of Result/Hit/HSP objects that are created by the listeners. It might be easiest to create some lightweight Hit and HSP objects and have SearchResultEventBuilder create these instead of the default full-fledged ones. At some point though, if you are getting 5-10k hits I don't think the parser is going to play nice as it wasn't really engineered with this extreme case in mind. Now the whole parser/listener design assumes that you want to process all the data for a result before moving on to the next one - at least from the listener's standpoint this means you have to store all the data you just got from the parser - whether this is in memory, or potentially stored in a tempfile/temp dbfile would be up to the implementation. Here is an example of how you can provide a different listener - FastHitEventBuilder just throws away the HSPs and only builds Result and Hit objects. 
    use Bio::SearchIO;
    use Bio::SearchIO::FastHitEventBuilder;

    # $format and $file stand for your report format and file name
    my $searchio = Bio::SearchIO->new(-format => $format, -file => $file);
    $searchio->attach_EventHandler(Bio::SearchIO::FastHitEventBuilder->new);
    while( my $r = $searchio->next_result ) {
        while( my $h = $r->next_hit ) {
            # note that Hits will NOT have HSPs
        }
    }

On Nov 30, 2004, at 5:59 AM, Michael Maibaum wrote:

> On Tue, Nov 30, 2004 at 01:24:24AM -0800, Steve Chervitz wrote:
>> Regarding SearchIO memory usage, I don't think this has been an issue
>> before, so I wonder if there is something about the installation or
>> specific usage of it that is leading to memory hogging. I've run it
>> over large numbers of reports without noticing troubles. It would be
>> useful to see a sample report + script using SearchIO that leads to
>> the memory troubles, so we can try to reproduce it.
>
> FWIW - I at least didn't have a problem parsing many thousands of
> results in a stream with SearchIO. I had a problem with parsing
> certain specific result sets: essentially anything with about 2000
> hits and alignments (or more) for a single query would kill a Linux
> box with 1 gig of RAM (it would thrash VM to death). These would run
> on an Opteron 16 gig box and used >8 gig of RAM in some cases.
> As far as I can see the majority of the memory was then returned when
> BioPerl moved on to the next record. The issue is that it takes a
> rather large amount of RAM for an individual record, and I assumed
> (rightly or wrongly) that BioPerl slurps up the entire record and
> builds the objects representing it as a whole, hence the large RAM
> usage. It may be that the objects to represent 2000+ hits are just
> very (unreasonably?) large.
> > Michael > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From brian_osborne at cognia.com Tue Nov 30 09:08:28 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Nov 30 09:06:42 2004 Subject: [Bioperl-l] SearchIO broken Message-ID: bioperl-l, I'm not sure what's been decided with respect to backing out changes in Annotation and SeqFeature for possible release in 1.6 but SearchIO is broken now in bioperl-live, probably due to these changes. 149 ~/bioperl-live>perl -I. -w t/SearchIO.t 1..1216 ok 1 ok 2 ok 3 ok 4 ok 5 ok 6 ok 7 ok 8 ok 9 ok 10 ok 11 ok 12 ok 13 -------------------- WARNING --------------------- MSG: error in parsing a report: Can't locate object method "has_tag" via package "Bio::SeqFeature::Similarity" at Bio/SeqFeature/Generic.pm line 889. --------------------------------------------------- Can't call method "next_hit" on an undefined value at t/SearchIO.t line 84. Brian O. From jason.stajich at duke.edu Tue Nov 30 16:22:18 2004 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Nov 30 16:20:03 2004 Subject: [Bioperl-l] Re: [Bioclusters] BioPerl 1.2.3 and memory handling In-Reply-To: <20041130211002.49241.qmail@web52704.mail.yahoo.com> References: <20041130211002.49241.qmail@web52704.mail.yahoo.com> Message-ID: On Nov 30, 2004, at 4:10 PM, Mike Cariaso wrote: > Al, > > While I'm certainly learning a bit from the > bioperlers, we seem to have strayed a bit from your > original question. > > If you don't need to see the alignments, you might > wish to investigate if your software can be made to > use blast's table output ("blastall -m 8" I believe). > Perhaps the bioperl parser will recognize the format, > and will be able to complete since it will have no > alignments to eat up memory. If its not automatically > recognized writing a parser for this might be pretty > simple. 
>

This is the 'blasttable' format - it will be more efficient since there is less data to store, but may still suffer from the memory overhead of creating Result/Hit/HSP objects even if they don't contain the alignment information. Bioperl 1.2.3 is pretty old so it might not have this - upgrading to 1.4 or the upcoming 1.5 release is suggested if you want to take advantage of bugfixes and new functionality.

If you are running WU-BLAST you can specify the -noseqs option to not see the alignment data and the modern SearchIO::blast (I think only since the 1.4 bioperl release) will properly construct HSPs for you with the start/end information but no alignment sequences.

My feeling is you should use SearchIO if you want the flexibility of changing algorithms, versions of programs, or output options and not have to change your script code which expects an API for the objects. If speed is what you want then convert things down to tab delimited format and the parser is as follows and you get to do whatever you want with the columns.

    while(<>) {
        chomp;
        my @fields = split(/\t/,$_);
        # do something with an HSP
    }

I personally use a combination of approaches, trying to find the right tool for the job.

> If you need the alignments but don't need all the
> statistics, you might wish to use the BPLite parser,
> which manages to handle some reports that the SearchIO
> parser cannot.
>
> If you need both, you can probably still use BPLite,
> but you'll need to do a bit more work.
>
> Sadly, I don't believe that the XML (-m 7) format is
> handled by bioperl yet. That would probably solve all
> of these issues.

The format parser is called blastxml and it has been supported since Bio::SearchIO was written as it was in fact the first one I wrote because I wanted to write a SAX-like environment from the outset.
[jason@lugano SearchIO]$ cvs log blastxml.pm | grep -A2 -P 'revision 1\.1\s+' revision 1.1 date: 2001/10/22 02:56:32; author: jason; state: Exp; initial commit of SearchIO modules and new Search objects > > > That'll teach you to ask a question! ;) > Mike Cariaso > > > > > --- Al Tucker wrote: > >> Hi everybody. >> >> We're new to the Inquiry Xserve scientific cluster >> and trying to iron >> out a few things. >> >> One thing is we seem to be coming up against is an >> out of memory >> error when getting large sequence analysis results >> (5,000 seq - at >> least- and above) back from BTblastall. The problem >> seems to be with >> BioPerl. >> >> Might anyone here know if BioPerl is knows enough >> not to try and >> access more than 4gb of RAM in a single process (an >> OS X limit)? I'm >> told Blastall and BTblastall are and will chunk >> problems accordingly, >> but we're not certain if BioPerl is when called to >> merge large Blast >> results back together. It's the default version >> 1.2.3 that's supplied >> btw, and OS X 10.3.5 with all current updates just >> short of the >> latest 10.3.6 update. >> >> - Al Tucker > > > ===== > Mike Cariaso > _______________________________________________ > Bioclusters maillist - Bioclusters@bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bioclusters > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From iankorf at mac.com Tue Nov 30 10:57:45 2004 From: iankorf at mac.com (Ian Korf) Date: Tue Nov 30 16:54:18 2004 Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling In-Reply-To: <383F9130-42B6-11D9-909B-000A95B2B140@sanger.ac.uk> References: <20041129230357.21060.qmail@web52704.mail.yahoo.com> <383F9130-42B6-11D9-909B-000A95B2B140@sanger.ac.uk> Message-ID: <934F34AC-42E8-11D9-BEC7-000D93B2B83E@mac.com> Perl does give memory back to the OS. If I do my $dna = 'N' x 100000000; the memory footprint is 192 MB. undef $dna; restores half the memory. 
This is not within a subroutine, but within the main program. On Nov 30, 2004, at 1:57 AM, Tim Cutts wrote: > > On 29 Nov 2004, at 11:32 pm, Ian Korf wrote: >> Here's something odd. The following labeled block looks like it >> should use no memory. >> >> BLOCK: { >> my $FOO = 'N' x 100000000; >> } >> >> The weird thing is that after executing the block, the memory >> footprint is still 192 Mb as if it hadn't been garbage collected. > > Perl's garbage collection does not give the memory back to the OS; it > just marks the allocated memory for internal reuse by subsequent > allocations within perl. > > This is actually true of most UNIX programs; this is not unique to > perl. free() does not necessarily give the memory back to the > operating system, it just marks it for re-use by the current process > the next time it calls malloc(). The memory doesn't become available > to the OS until the program exits. > > This is one reason why garbage collecting languages like perl and java > should not be relied on to keep memory under control; GC does *not* > absolve the programmer from the need to keep their memory usage tight. 
> > Consider the following C program (which you need to run on an OS which > actually populates all the contents of the rusage struct - Linux does > not, and neither does MacOS X, but Tru64 does): > > #include > #include > #include > #include > > #define PRINT_RESOURCES(x) getrusage(RUSAGE_SELF, &r);\ > printf(#x "\n\nShared: %lu\nUnshared: %lu\nStack: %lu\n\n",\ > r.ru_ixrss, r.ru_idrss, r.ru_isrss) > > int main(void) { > > char *p; > struct rusage r; > int i; > > PRINT_RESOURCES("Program start"); > > p = malloc(100000000); > > /* Use the memory */ > for (i = 0; i<100000000; i++) > p[i] = 'N'; > > PRINT_RESOURCES("After malloc"); > > free(p); > > PRINT_RESOURCES("After free"); > > return 0; > > } > > The output on this Tru64 machine is: > > 09:46:26 tjrc@ecs2d:~$ ./memtest > "Program start" > > Shared: 0 > Unshared: 0 > Stack: 0 > > "After malloc" > > Shared: 19 > Unshared: 116577 > Stack: 19 > > "After free" > > Shared: 19 > Unshared: 116577 > Stack: 19 > > As you can see, free() does not actually release the memory from the > process back to the operating system. >> > >> sub foo {my $FOO = 'N' x 100000000} >> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s >> >> sub bar {my $BAR = 'N' x 100000000; undef $BAR} >> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s >> >> The increase from 1 sec to 21 sec system CPU time is all the extra >> memory allocation and freeing associated with the undef statement. >> Why the user time is less in the undef example is a mystery to me. > > I can explain this. It's because you're forgetting that the final > statement in a perl subroutine is always its return value, even if you > don't specify 'return', so if you allocate 100MB of Ns, as in the > first case, and then return it (which you do because the allocation is > the last statement in the subroutine) you actually force perl to > *copy* that lexically scoped variable each time the routine is called. > That's why the program uses 200MB of memory, not 100MB. 
>
> In the second version, by explicitly freeing the memory, perl never
> has to copy the return value, so its memory footprint is half.
>
> Using undef has not actually freed any memory at all, it's just
> changed the return value from the function and stopped perl doubling
> its memory use.
>
> The lesson here is therefore to be very careful in perl subroutines
> where you don't care about the return value to make sure the return
> value is something tiny. Perl has no equivalent to a C void
> function.
>
> Tim
>
> --
> Dr Tim Cutts
> Informatics Systems Group, Wellcome Trust Sanger Institute
> GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>

From jcuff at broad.mit.edu Tue Nov 30 15:17:55 2004
From: jcuff at broad.mit.edu (James Cuff)
Date: Tue Nov 30 16:54:20 2004
Subject: [Bioperl-l] Re: [Bioclusters] BioPerl and memory handling
In-Reply-To: <934F34AC-42E8-11D9-BEC7-000D93B2B83E@mac.com>
References: <20041129230357.21060.qmail@web52704.mail.yahoo.com>
 <383F9130-42B6-11D9-909B-000A95B2B140@sanger.ac.uk>
 <934F34AC-42E8-11D9-BEC7-000D93B2B83E@mac.com>
Message-ID:

Sigh. All depends on your OS.

Here are two versions of the same code that Ian posted, the first w/o
the undef and the other with. Both with a while(1==1) to make it hang
around, to see with ps.

bink:~ jcuff$ uname -a
Darwin bink 7.6.0 Darwin Kernel Version 7.6.0: Sun Oct 10 12:05:27 PDT 2004; root:xnu/xnu-517.9.4.obj~1/RELEASE_PPC Power Macintosh powerpc
bink:~ jcuff$ ps aux | grep perl
jcuff 12609 98.3 18.8 223704 196832 std R 3:04PM 0:02.43 perl ./test.
jcuff 12603 98.1 9.5 126044 99180 std R 3:04PM 0:22.05 perl ./test.

3:09pm jcuff@lead ~ > uname -a
Linux lead 2.4.18-24.7.xsmp #1 SMP Fri Jan 31 06:10:55 EST 2003 i686 unknown
3:09pm jcuff@lead ~ > ps wwaux | grep test.pl
jcuff 18612 95.2 4.9 197952 196420 pts/42 R 15:09 0:12 /util/bin/perl ./test.pl
jcuff 18619 52.8 2.5 100288 98760 pts/42 R 15:09 0:04 /util/bin/perl ./test.pl

tryptophan# uname -a
FreeBSD tryptophan 4.10-RELEASE-p3 FreeBSD 4.10-RELEASE-p3 #0: Wed Sep 29 14:08:46 EDT 2004 root@tryptophan:/usr/obj/usr/src/sys/TRYPTOPHAN i386
tryptophan# ps aux | grep perl
root 41538 96.3 4.9 197404 197156 p1 R 3:11PM 0:07.22 perl test.pl
root 41540 91.2 2.5 99808 99468 p1 R 3:11PM 0:02.44 perl test.pl

# uname -a
SunOS sun 5.10 s10_69 sun4u sparc SUNW,Ultra-5_10
# ps -eadfl | grep test
0 R root 3003 2996 49 69 20 ? 24820 15:12:07 pts/3 1:00 perl ./test.pl
0 R root 3005 2996 49 79 20 ? 24820 15:12:14 pts/3 0:52 perl ./test.pl

3:05pm jcuff@bismuth ~ > uname -a
OSF1 bismuth.broad.mit.edu V5.1 2650 alpha alpha
3:05pm jcuff@bismuth ~ > ps aux | grep test.pl
jcuff 220249 97.8 4.7 260M 191M pts/3 R 15:05:37 0:08.85 ./test.pl
jcuff 220252 68.7 4.7 260M 191M pts/3 R 15:05:42 0:03.16 ./test.pl

So basically Tru64 and Solaris 10 wire in the memory until the process
ends. Linux, OSX and FreeBSD let it go, well as Ian said, half of it :-)

j.

On Nov 30, 2004, at 10:57 AM, Ian Korf wrote:

> Perl does give memory back to the OS. If I do
>
> my $dna = 'N' x 100000000;
>
> the memory footprint is 192 MB.
>
> undef $dna;
>
> restores half the memory. This is not within a subroutine, but within
> the main program.
>
>

--
James Cuff, D. Phil.
Group Leader, Applied Production Systems
The Broad Institute of MIT and Harvard. 320 Charles Street,
Cambridge, MA. 02141. Tel: 617-252-1925 Fax: 617-258-0903