From er at xs4all.nl Mon Jan 1 15:17:42 2007
From: er at xs4all.nl (Erik)
Date: Mon, 1 Jan 2007 21:17:42 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
 <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
Message-ID: <21156.156.83.1.215.1167682662.squirrel@webmail.xs4all.nl>

> Agree with Hilmar, in that we need examples.

Another problematic one was NC_004822 - however, most other problems I
referred to were ones that did *not* stop the DBD indexing. (Of course,
I do not know how many error-throwing entries there still are in the
files that are not yet indexed: ca. 75%.)

The most common error was pollution of the 'binomial' name with
classification lines as a result of faulty parsing. If Bio::Species is
deprecated (I hadn't noticed that before) then these problems are of
course correspondingly less important.

>> If you are referring to your submitted bug:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2167

Yes, the above was one that stopped refseq indexing (there is one more
that I will stick into bugzilla in a minute). Thanks for the commit.

>
> we could add this in as long as it passes (I'll try giving it a
> workout with my local bacterial seqs tonight or tomorrow). However,
> in the not-too-distant future your patch would likely be rendered
> obsolete, as any parsing in Bio::SeqIO modules pertaining to
> Bio::Species-related matters will be deprecated in favor of simple
> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
> optional db lookups using NCBI Taxonomy). Bio::Species and anything
> related to it are considered marked for deprecation. Fair warning...

What does simple parsing mean? Just returning the whole ORGANISM string,
and leaving further parsing to application side?

I shall look a bit closer at the Bio::Taxon and its relation to the
parser modules, assuming there still *is* a relation. :)

Maybe someone could elaborate just a little bit to get me started on how
to get taxonomic data from a RefSeq id or a GenBank entry?

thanks,

Erik

> On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:
>
>> Can you send examples and the resulting error messages? Also, I'm
>> assuming you are running the 1.5.2 release of Bioperl; if not that's
>> what I would try first.
>>
>> -hilmar
>>
>> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>>
>>> Hi all,
>>>
>>> I downloaded the refseq files (.gbff) and want to index the lot with
>>> Bio::DB::Flat.
>>>
>>> It turns out that there are many cases where the SOURCE and ORGANISM
>>> lines are messed up, sometimes to a degree where the indexing fails
>>> on a Bio::SeqIO::genbank error.
>>>
>>> I'd like to change Bio::SeqIO::genbank to let this parsing go at
>>> least so far as to make the indexing of the refseq files possible,
>>> and hopefully improving the taxonomic output
>>> ($seq->species->binomial is often mutilated at the moment).
>>>
>>> Is it still worthwhile to change parsing modules like
>>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? Because
>>> if this is the case I may be better off writing my own indexing
>>> scheme?
>>>
>>> Below is (outline of) my indexing program, which uses
>>> Bio::DB::Flat::DBD. If anyone knows of a better way to get a locally
>>> searchable refseq flat file index, I would be very interested.
>>>
>>> Thanks for your help,
>>>
>>> Erikjan
>>>
>>>
>>> -------------
>>> use Bio::DB::Flat;
>>>
>>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>> my $db = Bio::DB::Flat->new(
>>>     -directory  => $refseq_dir,
>>>     -dbname     => 'refseq',
>>>     -format     => 'genbank',
>>>     -index      => 'bdb',
>>>     -write_flag => 1,
>>> );
>>> my @files = getfiles($refseq_dir);   # getfiles() is not shown in this outline
>>> for my $f (@files) {
>>>     $db->build_index($f);
>>> }
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================

From cjfields at uiuc.edu Mon Jan 1 18:19:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 Jan 2007 17:19:12 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <21156.156.83.1.215.1167682662.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
 <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
 <21156.156.83.1.215.1167682662.squirrel@webmail.xs4all.nl>
Message-ID: <535CDEEC-EE32-40A9-BEF5-646981AB208B@uiuc.edu>

On Jan 1, 2007, at 2:17 PM, Erik wrote:

>> we could add this in as long as it passes (I'll try giving it a
>> workout with my local bacterial seqs tonight or tomorrow). However,
>> in the not-too-distant future your patch would likely be rendered
>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>> Bio::Species-related matters will be deprecated in favor of simple
>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>> optional db lookups using NCBI Taxonomy). Bio::Species and anything
>> related to it are considered marked for deprecation. Fair warning...
>
> What does simple parsing mean?
> Just returning the whole ORGANISM string,
> and leaving further parsing to application side?

Current behavior with parsing tries to determine genus/species from the
data in the sequence record alone, which has become increasingly
difficult and unreliable over the years. Since a perfectly valid source
for taxonomic information exists (NCBI Taxonomy), and each GenBank/EMBL
sequence record is tagged with a relevant TaxID, it makes more sense to
base reliable parsing of taxonomic data on that resource.

Sendu has essentially set up Bio::Taxon for that reason; Bio::Species
has been changed to inherit Bio::Taxon (which is also a
Bio::Tree::Node) but still exhibit older behavior (i.e. retain the old
API). It will gradually be shifted out in favor of Bio::Taxon by
rel 1.8. We hope.

> I shall look a bit closer at the Bio::Taxon and its relation to the
> parser modules, assuming there still *is* a relation. :)
>
> Maybe someone could elaborate just a little bit to get me started
> on how to get taxonomic data from a RefSeq id or a GenBank entry?

I'm assuming you could use code similar to that found in
taxonomy2tree.pl (in the scripts/taxa directory in CVS). I believe the
NCBI taxid is accessible via:

$seq->species->ncbi_taxid

The script above should help somewhat, and the HOWTO on Trees I think
also has some more. Maybe some of the newer Bio::Taxon behavior needs
to be added at some point?

chris

From Karen.Buysse at UGent.be Tue Jan 2 09:59:46 2007
From: Karen.Buysse at UGent.be (Karen Buysse)
Date: Tue, 02 Jan 2007 15:59:46 +0100
Subject: [Bioperl-l] repeatmasker
Message-ID: <459A7362.7060405@UGent.be>

Dear all,

I want to use the BioPerl repeatmasker program.
However, when I run the following program (=first lines of the synopsis):

use Bio::Tools::Run::RepeatMasker;

my @params=("mam" => 1,"noint"=>1);
my $factory = Bio::Tools::Run::RepeatMasker->new(@params);

I get the error message: *RepeatMasker program not found as or not
executable.*
I have the file *RepeatMasker.pm* in the following directories:
C:/Perl/lib/Bio/Tools/Run
C:/Perl/site/lib/Bio/Tools/Run

Can anyone please help me with this?

Many thanks in advance and a happy new year,
Karen

--
ir. Karen Buysse
Center for Medical Genetics Ghent (CMGG)
Ghent University Hospital
Medical Research Building (MRB), 2nd floor, room 120.050
De Pintelaan 185, B-9000 Ghent, Belgium
+32 9 240 39 46 (phone)
+32 9 240 65 49 (fax)
http://medgen.ugent.be
Karen.Buysse at UGent.be

From arareko at campus.iztacala.unam.mx Tue Jan 2 11:12:34 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 Jan 2007 10:12:34 -0600
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A7362.7060405@UGent.be>
References: <459A7362.7060405@UGent.be>
Message-ID: <459A8472.6070706@campus.iztacala.unam.mx>

Hi Karen,

It seems like you don't have the RepeatMasker executable installed on
your machine and available for BioPerl to make use of it. You'll need to
download and install it first. You can do it from here:

http://www.repeatmasker.org/

I don't know if there's a compiled version of RM available for Windows.
Does anyone know about this? Chris, Nathan?

Regards,
Mauricio.

Karen Buysse wrote:
> Dear all,
>
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the synopsis):
>
> use Bio::Tools::Run::RepeatMasker;
>
> my @params=("mam" => 1,"noint"=>1);
> my $factory = Bio::Tools::Run::RepeatMasker->new(@params);
>
> I get the error message: *RepeatMasker program not found as or not
> executable.*
> I have the file *RepeatMasker.pm* in the following directories:
> C:/Perl/lib/Bio/Tools/Run
> C:/Perl/site/lib/Bio/Tools/Run
>
> Can anyone please help me with this?
>
> Many thanks in advance and a happy new year,
> Karen
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From bix at sendu.me.uk Tue Jan 2 11:16:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 02 Jan 2007 16:16:57 +0000
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A7362.7060405@UGent.be>
References: <459A7362.7060405@UGent.be>
Message-ID: <459A8579.80408@sendu.me.uk>

Karen Buysse wrote:
> Dear all,
>
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the synopsis):
>
> use Bio::Tools::Run::RepeatMasker;
>
> my @params=("mam" => 1,"noint"=>1);
> my $factory = Bio::Tools::Run::RepeatMasker->new(@params);
>
> I get the error message: *RepeatMasker program not found as or not
> executable.*
> I have the file *RepeatMasker.pm* in the following directories:
> C:/Perl/lib/Bio/Tools/Run
> C:/Perl/site/lib/Bio/Tools/Run
>
> Can anyone please help me with this?

Bioperl run modules are not programs but front-ends ('wrappers') to
external programs. You need to install the program that corresponds to
the module before it will work.

Visit http://www.repeatmasker.org/ to get the program and database.
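
Since a wrapper module only works when the external program can be
found, it can help to check for the program before constructing the
factory. The following is a minimal sketch in plain Perl, using core
modules only, of looking for a program on PATH. The helper name
find_executable is made up for this example and is not part of BioPerl;
the wrapper modules may locate the program in a different way.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return the full path of
# the first executable file called $name found in a PATH directory, or
# undef if there is none. On Windows the PATHEXT suffixes are tried too.
sub find_executable {
    my ($name) = @_;
    my @suffixes = ('');
    if ($^O eq 'MSWin32') {
        push @suffixes, split /;/, ($ENV{PATHEXT} // '.exe;.bat;.cmd');
    }
    for my $dir (File::Spec->path) {
        for my $suffix (@suffixes) {
            my $candidate = File::Spec->catfile($dir, $name . $suffix);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

my $program = 'RepeatMasker';
if (my $path = find_executable($program)) {
    print "$program found at $path\n";
}
else {
    print "$program not found on PATH; install it before using the wrapper\n";
}
```

If the program is reported as missing here, the wrapper is likely to
fail in the same way as in the error message quoted above.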

From er at xs4all.nl Tue Jan 2 11:23:12 2007
From: er at xs4all.nl (Erik)
Date: Tue, 2 Jan 2007 17:23:12 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <535CDEEC-EE32-40A9-BEF5-646981AB208B@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
 <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
 <21156.156.83.1.215.1167682662.squirrel@webmail.xs4all.nl>
 <535CDEEC-EE32-40A9-BEF5-646981AB208B@uiuc.edu>
Message-ID: <13789.156.83.1.215.1167754992.squirrel@webmail.xs4all.nl>

That seems like a real improvement over parsing the name out of the
text-entry. I'll use taxid = $seq->species->ncbi_taxid from now on.

Thanks for that elucidation. :)

That leaves the error-throwing problem in Bio::DB::Flat, which I
encountered while making a local RefSeq BerkeleyDB index.

I suppose it remains worthwhile to prevent the indexing from breaking on
Bio::SeqIO instantiation (at least for the RefSeq entry set), so I have
put a simple fix on bugzilla that prevents one more problem entry
(NC_004822) from breaking the indexing process.

Thanks,

Erikjan

> On Jan 1, 2007, at 2:17 PM, Erik wrote:
>
>>> we could add this in as long as it passes (I'll try giving it a
>>> workout with my local bacterial seqs tonight or tomorrow). However,
>>> in the not-too-distant future your patch would likely be rendered
>>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>>> Bio::Species-related matters will be deprecated in favor of simple
>>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>>> optional db lookups using NCBI Taxonomy). Bio::Species and anything
>>> related to it are considered marked for deprecation. Fair warning...
>>
>> What does simple parsing mean? Just returning the whole ORGANISM
>> string, and leaving further parsing to application side?
>
> Current behavior with parsing tries to determine genus/species from
> the data in the sequence record alone, which has become
> increasingly difficult and unreliable over the years. Since a
> perfectly valid source for taxonomic information exists (NCBI
> Taxonomy), and each GenBank/EMBL sequence record is tagged with a
> relevant TaxID, it makes more sense to base reliable parsing of
> taxonomic data on that resource.
>
> Sendu has essentially set up Bio::Taxon for that reason; Bio::Species
> has been changed to inherit Bio::Taxon (which is also a
> Bio::Tree::Node) but still exhibit older behavior (i.e. retain the
> old API). It will gradually be shifted out in favor of Bio::Taxon by
> rel 1.8. We hope.
>
>> I shall look a bit closer at the Bio::Taxon and its relation to the
>> parser modules, assuming there still *is* a relation. :)
>>
>> Maybe someone could elaborate just a little bit to get me started
>> on how to get taxonomic data from a RefSeq id or a GenBank entry?
>
> I'm assuming you could use code similar to that found in
> taxonomy2tree.pl (in the scripts/taxa directory in CVS). I believe
> the NCBI taxid is accessible via:
>
> $seq->species->ncbi_taxid
>
> The script above should help somewhat, and the HOWTO on Trees I think
> also has some more. Maybe some of the newer Bio::Taxon behavior
> needs to be added at some point?
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Tue Jan 2 11:21:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Jan 2007 10:21:26 -0600
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A7362.7060405@UGent.be>
References: <459A7362.7060405@UGent.be>
Message-ID:

RepeatMasker.pm is a wrapper for the RepeatMasker program.
This is from the POD:

DESCRIPTION
    RepeatMasker is a program that screens DNA sequences for
    interspersed repeats known to exist in mammalian genomes as well as
    for low complexity DNA sequences. For more information, on the
    program and its usage, please refer to
    http://repeatmasker.genome.washington.edu/.

Newer versions are available here:

http://www.repeatmasker.org/

You need to install RepeatMasker in your PATH in order to use the
wrapper.

chris

On Jan 2, 2007, at 8:59 AM, Karen Buysse wrote:

> Dear all,
>
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the
> synopsis):
>
> use Bio::Tools::Run::RepeatMasker;
>
> my @params=("mam" => 1,"noint"=>1);
> my $factory = Bio::Tools::Run::RepeatMasker->new(@params);
>
> I get the error message: *RepeatMasker program not found as or not
> executable.*
> I have the file *RepeatMasker.pm* in the following directories:
> C:/Perl/lib/Bio/Tools/Run
> C:/Perl/site/lib/Bio/Tools/Run
>
> Can anyone please help me with this?
>
> Many thanks in advance and a happy new year,
> Karen
>
> --
> ir. Karen Buysse
> Center for Medical Genetics Ghent (CMGG)
> Ghent University Hospital
> Medical Research Building (MRB), 2nd floor, room 120.050
> De Pintelaan 185, B-9000 Ghent, Belgium
> +32 9 240 39 46 (phone)
> +32 9 240 65 49 (fax)
> http://medgen.ugent.be
> Karen.Buysse at UGent.be
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Derek.Fairley at bll.n-i.nhs.uk Tue Jan 2 11:13:54 2007
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Tue, 2 Jan 2007 16:13:54 -0000
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A7362.7060405@UGent.be>
Message-ID:

Hi Karen,

RepeatMasker isn't a BioPerl program - although the
Bio::Tools::Run::RepeatMasker module can call it if it's properly
installed on your system. RepeatMasker also requires either Cross_Match
or WUBlast to be installed, in addition to a local database containing
repeat data.

Can you run RepeatMasker from the command line okay?

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Karen Buysse
Sent: 02 January 2007 15:00
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] repeatmasker

Dear all,

I want to use the BioPerl repeatmasker program.
However, when I run the following program (=first lines of the synopsis):

use Bio::Tools::Run::RepeatMasker;

my @params=("mam" => 1,"noint"=>1);
my $factory = Bio::Tools::Run::RepeatMasker->new(@params);

I get the error message: *RepeatMasker program not found as or not
executable.*
I have the file *RepeatMasker.pm* in the following directories:
C:/Perl/lib/Bio/Tools/Run
C:/Perl/site/lib/Bio/Tools/Run

Can anyone please help me with this?

Many thanks in advance and a happy new year,
Karen

--
ir. Karen Buysse
Center for Medical Genetics Ghent (CMGG)
Ghent University Hospital
Medical Research Building (MRB), 2nd floor, room 120.050
De Pintelaan 185, B-9000 Ghent, Belgium
+32 9 240 39 46 (phone)
+32 9 240 65 49 (fax)
http://medgen.ugent.be
Karen.Buysse at UGent.be

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From n.haigh at sheffield.ac.uk Tue Jan 2 11:44:58 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 02 Jan 2007 16:44:58 +0000
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A8472.6070706@campus.iztacala.unam.mx>
References: <459A7362.7060405@UGent.be>
 <459A8472.6070706@campus.iztacala.unam.mx>
Message-ID: <459A8C0A.1010600@sheffield.ac.uk>

Mauricio Herrera Cuadra wrote:
> Hi Karen,
>
> It seems like you don't have the RepeatMasker executable installed on
> your machine and available for BioPerl to make use of it. You'll need to
> download and install it first. You can do it from here:
>
> http://www.repeatmasker.org/
>
> I don't know if there's a compiled version of RM available for Windows.
> Does anyone know about this? Chris, Nathan?
>
> Regards,
> Mauricio.
>
>

Ah yes, I didn't see the Windows file paths! It doesn't look like there
is an executable available from their website. Karen - it might be worth
contacting them to check!

Nath

From marian.thieme at klinik.uni-regensburg.de Tue Jan 2 11:48:18 2007
From: marian.thieme at klinik.uni-regensburg.de (Marian Thieme)
Date: Tue, 02 Jan 2007 17:48:18 +0100
Subject: [Bioperl-l] store variations, generate sequences
Message-ID: <459A8CD2.9060805@klinik.uni-regensburg.de>

Hi all,

I am quite new to bioperl and I have a question about sequence data: I
am working on a resequencing project. Here we have resequenced 1000
genes of a certain gene. My question: What is the easiest way to store
each discovered variation of each individual and get a FASTA sequence
for an arbitrary individual?

I would expect that there is some way to set up a reference sequence and
store all variations relative to this reference sequence. Afterward it
should be possible to generate sequences for each individual in
question, right?

My approach was the following:

I have created a SeqDiff object:

$seqDiff = Bio::Variation::SeqDiff->new (...)

and I have assigned the reference sequence to that object via:

$seqDiff->dna_ori('atgcgtatatg');

Now I thought, I can create some variations via DNAMutation object:

my $dnamut = Bio::Variation::DNAMutation->new (
    -start       => 6,
    -end         => 6,
    -length      => 1,
    -isMutation  => 1,
    -upStreamSeq => 'atgcg',
    -dnStreamSeq => 'atatg'
);

my $a1 = Bio::Variation::Allele->new;
$a1->seq('t');
$dnamut->allele_ori($a1);

my $a2 = Bio::Variation::Allele->new;
$a2->seq('a');
$dnamut->add_Allele($a2);

Is that the correct way to describe the reference sequence, describe a
variation and attach this to the SeqDiff object?
Probably I didn't understand the API right. (I did assume start/end
means start/end position of the mutation.) Is it possible to get a
complete sequence print (FASTA format) of each variation/individual?

Regards,
Marian

--
Marian Thieme
University Regensburg
Institute of Functional Genomics
Josef-Engert-Str. 9
93053 Regensburg
Germany
P: 0049 (0)941 943 5055
F: 0049 (0)941 943 5020
E: marian.thieme at klinik.uni-regensburg.de
W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik

From Karen.Buysse at UGent.be Tue Jan 2 12:02:41 2007
From: Karen.Buysse at UGent.be (Karen Buysse)
Date: Tue, 02 Jan 2007 18:02:41 +0100
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A8C0A.1010600@sheffield.ac.uk>
References: <459A7362.7060405@UGent.be>
 <459A8472.6070706@campus.iztacala.unam.mx>
 <459A8C0A.1010600@sheffield.ac.uk>
Message-ID: <459A9031.6060802@UGent.be>

Nathan S. Haigh wrote:

>Mauricio Herrera Cuadra wrote:
>
>>Hi Karen,
>>
>>It seems like you don't have the RepeatMasker executable installed on
>>your machine and available for BioPerl to make use of it. You'll need to
>>download and install it first. You can do it from here:
>>
>>http://www.repeatmasker.org/
>>
>>I don't know if there's a compiled version of RM available for Windows.
>>Does anyone know about this? Chris, Nathan?
>>
>>Regards,
>>Mauricio.
>>
>
>Ah yes, I didn't see the Windows file paths! It doesn't look like there
>is an executable available from their website. Karen - it might be worth
>contacting them to check!
>
>Nath
>

OK, I will, thanks!

Karen

From marian.thieme at lycos.de Tue Jan 2 11:57:04 2007
From: marian.thieme at lycos.de (Marian thieme)
Date: Tue, 02 Jan 2007 17:57:04 +0100
Subject: [Bioperl-l] store variations, generate sequences
Message-ID: <459A8EE0.3000504@lycos.de>

sorry if you get this email twice, prob. the first time I have used the
wrong sender address

Hi all,

I am quite new to bioperl and I have a question about sequence data: I
am working on a resequencing project. Here we have resequenced 1000
genes of a certain gene. My question: What is the easiest way to store
each discovered variation of each individual and get a FASTA sequence
for an arbitrary individual?

I would expect that there is some way to set up a reference sequence and
store all variations relative to this reference sequence. Afterward it
should be possible to generate sequences for each individual in
question, right?

My approach was the following:

I have created a SeqDiff object:

$seqDiff = Bio::Variation::SeqDiff->new (...)

and I have assigned the reference sequence to that object via:

$seqDiff->dna_ori('atgcgtatatg');

Now I thought, I can create some variations via DNAMutation object:

my $dnamut = Bio::Variation::DNAMutation->new (
    -start       => 6,
    -end         => 6,
    -length      => 1,
    -isMutation  => 1,
    -upStreamSeq => 'atgcg',
    -dnStreamSeq => 'atatg'
);

my $a1 = Bio::Variation::Allele->new;
$a1->seq('t');
$dnamut->allele_ori($a1);

my $a2 = Bio::Variation::Allele->new;
$a2->seq('a');
$dnamut->add_Allele($a2);

Is that the correct way to describe the reference sequence, describe a
variation and attach this to the SeqDiff object?
Probably I didn't understand the API right. (I did assume start/end
means start/end position of the mutation.) Is it possible to get a
complete sequence print (FASTA format) of each variation/individual?

Regards,
Marian

From bix at sendu.me.uk Tue Jan 2 11:52:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 02 Jan 2007 16:52:53 +0000
Subject: [Bioperl-l] quiet() proposed for WrapperBase
Message-ID: <459A8DE5.6060207@sendu.me.uk>

Hi,

I propose a quiet() method in Bio::Tools::Run::WrapperBase. A number of
wrapper modules 'implement' their own quiet() via AUTOLOAD; this would
be for use by wrappers not making use of AUTOLOAD.

The code would be:

=head2 quiet

 Title   : quiet
 Usage   : $factory->quiet(1);
           if ($factory->quiet()) { ... }
 Function: Get/set the quiet state. Can be used by wrappers to control if
           program output is printed to the console or not.
 Returns : boolean
 Args    : none to get, boolean to set

=cut

sub quiet {
    my $self = shift;
    if (@_) { $self->{quiet} = shift }
    return $self->{quiet} || 0;
}

This method would get used instead of the AUTOLOAD version in most
wrappers, but I don't think any different behaviour will result.

Any objections?

From n.haigh at sheffield.ac.uk Tue Jan 2 11:18:53 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 02 Jan 2007 16:18:53 +0000
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A7362.7060405@UGent.be>
References: <459A7362.7060405@UGent.be>
Message-ID: <459A85ED.5020806@sheffield.ac.uk>

Karen Buysse wrote:
> Dear all,
>
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the synopsis):
>
> use Bio::Tools::Run::RepeatMasker;
>
> my @params=("mam" => 1,"noint"=>1);
> my $factory = Bio::Tools::Run::RepeatMasker->new(@params);
>
> I get the error message: *RepeatMasker program not found as or not
> executable.*
> I have the file *RepeatMasker.pm* in the following directories:
> C:/Perl/lib/Bio/Tools/Run
> C:/Perl/site/lib/Bio/Tools/Run
>
> Can anyone please help me with this?
>
> Many thanks in advance and a happy new year,
> Karen
>
>

Hi Karen,

Bioperl only supplies a wrapper for the repeatmasker software. You need
to download and install the RepeatMasker software from:

http://www.repeatmasker.org/RMDownload.html

Have another bash at it from here, and post again if you have problems.

Cheers
Nath

From cjfields at uiuc.edu Tue Jan 2 12:32:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Jan 2007 11:32:02 -0600
Subject: [Bioperl-l] repeatmasker
In-Reply-To: <459A9031.6060802@UGent.be>
Message-ID: <002c01c72e93$ea692ce0$15327e82@pyrimidine>

> Nathan S. Haigh wrote:
>
> >Mauricio Herrera Cuadra wrote:
> >
> >>Hi Karen,
> >>
> >>It seems like you don't have the RepeatMasker executable installed on
> >>your machine and available for BioPerl to make use of it. You'll need
> >>to download and install it first. You can do it from here:
> >>
> >>http://www.repeatmasker.org/
> >>
> >>I don't know if there's a compiled version of RM available for Windows.
> >>Does anyone know about this? Chris, Nathan?
> >>
> >>Regards,
> >>Mauricio.
> >>
> >
> >Ah yes, I didn't see the Windows file paths! It doesn't look like there
> >is an executable available from their website. Karen - it might be
> >worth contacting them to check!
> >
> >Nath
> >
> OK, I will, thanks!
>
> Karen

Nice to see everybody is alive and kicking after the break!

chris

From cjfields at uiuc.edu Tue Jan 2 12:45:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Jan 2007 11:45:41 -0600
Subject: [Bioperl-l] quiet() proposed for WrapperBase
In-Reply-To: <459A8DE5.6060207@sendu.me.uk>
Message-ID: <002d01c72e95$d21206b0$15327e82@pyrimidine>

> Hi,
>
> I propose a quiet() method in Bio::Tools::Run::WrapperBase. A
> number of wrapper modules 'implement' their own quiet() via
> AUTOLOAD; this would be for use by wrappers not making use of
> AUTOLOAD.
>
> The code would be:
>
> =head2 quiet
>
>  Title   : quiet
>  Usage   : $factory->quiet(1);
>            if ($factory->quiet()) { ... }
>  Function: Get/set the quiet state.
>            Can be used by wrappers to control if
>            program output is printed to the console or not.
>  Returns : boolean
>  Args    : none to get, boolean to set
>
> =cut
>
> sub quiet {
>     my $self = shift;
>     if (@_) { $self->{quiet} = shift }
>     return $self->{quiet} || 0;
> }
>
>
> This method would get used instead of the AUTOLOAD version in
> most wrappers, but I don't think any different behaviour will result.
>
> Any objections?

No problem with me. One less AUTOLOAD'ed sub to worry about!

chris

From avilella at gmail.com Tue Jan 2 12:35:58 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 2 Jan 2007 18:35:58 +0100
Subject: [Bioperl-l] store variations, generate sequences
In-Reply-To: <459A8CD2.9060805@klinik.uni-regensburg.de>
References: <459A8CD2.9060805@klinik.uni-regensburg.de>
Message-ID: <358f4d650701020935s45a76d0fm208912e60893bf52@mail.gmail.com>

The Bio::PopGen modules contain Individual, population and genotype
objects, among other utilities. There are some input/output formats in
Bio::PopGen::IO and also some methods to go from an aln to a population.

That said, I am not entirely sure about how much of that overlaps with
Bio::Variation.

If you think anything is missing that you would like to have implemented
in bioperl, we would greatly appreciate your feedback,

Cheers,

Albert.

On 1/2/07, Marian Thieme wrote:
> Hi all,
>
> I am quite new to bioperl and I have a question about sequence data: I
> am working on a resequencing project. Here we have resequenced 1000
> genes of a certain gene. My question: What is the easiest way to store
> each discovered variation of each individual and get a FASTA sequence
> for an arbitrary individual?
>
> I would expect that there is some way to set up a reference sequence and
> store all variations relative to this reference sequence. Afterward it
> should be possible to generate sequences for each individual in
> question, right?
>
> My approach was the following:
>
> I have created a SeqDiff object:
>
> $seqDiff = Bio::Variation::SeqDiff->new (...)
>
>
> and I have assigned the reference sequence to that object via:
>
> $seqDiff->dna_ori('atgcgtatatg');
>
>
> Now I thought, I can create some variations via DNAMutation object:
>
> my $dnamut = Bio::Variation::DNAMutation->new (
>     -start       => 6,
>     -end         => 6,
>     -length      => 1,
>     -isMutation  => 1,
>     -upStreamSeq => 'atgcg',
>     -dnStreamSeq => 'atatg'
> );
>
> my $a1 = Bio::Variation::Allele->new;
> $a1->seq('t');
> $dnamut->allele_ori($a1);
>
> my $a2 = Bio::Variation::Allele->new;
> $a2->seq('a');
> $dnamut->add_Allele($a2);
>
>
>
> Is that the correct way to describe the reference sequence, describe a
> variation and attach this to the SeqDiff object?
> Probably I didn't understand the API right. (I did assume start/end
> means start/end position of the mutation.) Is it possible to get a
> complete sequence print (FASTA format) of each variation/individual?
>
> Regards,
> Marian
>
> --
> Marian Thieme
> University Regensburg
> Institute of Functional Genomics
> Josef-Engert-Str. 9
> 93053 Regensburg
> Germany
> P: 0049 (0)941 943 5055
> F: 0049 (0)941 943 5020
> E: marian.thieme at klinik.uni-regensburg.de
> W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Tue Jan 2 14:04:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 Jan 2007 13:04:40 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <13789.156.83.1.215.1167754992.squirrel@webmail.xs4all.nl>
Message-ID: <002e01c72ea0$daf060a0$15327e82@pyrimidine>

> That seems like a real improvement over parsing the name out
> of the text-entry. I'll use taxid = $seq->species->ncbi_taxid
> from now on.
>
> Thanks for that elucidation. :)

No problem.

> That leaves the error-throwing problem in Bio::DB::Flat,
> which I encountered while making a local RefSeq BerkeleyDB index.
>
> I suppose it remains worthwhile to prevent the indexing from
> breaking on Bio::SeqIO instantiation (at least for the RefSeq
> entry set), so I have put a simple fix on bugzilla that
> prevents one more problem entry
> (NC_004822) from breaking the indexing process.
>
>
> Thanks,
>
> Erikjan

I'll look into the bug fix; that particular record has an unusual
taxonomic name which may change at some point (Candidatus
something-or-other, likely). Best that we don't rely on that
supposition though.

The way I see it we can go down two roads:

1) Continue on with working in Bio::Species-related parsing (which I do
   not support)
2) Work towards Bio::Taxon-related parsing (which I do support).

Note that both the classification issue (first bug, now resolved) and
the SOURCE line issue (second bug, unresolved) are related to the older
way of parsing that we are trying to shift away from, namely reliance on
record data alone for taxonomic analyses. I think we need to shift more
towards simpler, cleaner parsing and away from the tendency to add fixes
based on one sequence record failing, which is due to the overly complex
parsing scheme currently present. As past fixes attest, there will
always be another sequence record with a weird name down the road that
will break parsing again!

For instance, the first bug could be solved by splitting the complete
classification array on ';' alone, since that is the delimiter used for
the classification array; there is a substitution of the '.' which
causes an extra split and the parsing error.

The second bug could be solved by simply assigning the SOURCE name to
scientific_name (or node_name), and any data in parentheses to
common_name(); organelles would be parsed out as well. No more
subparsing fixes based on trying to work out genus/species/subsp/etc,
which is where this bug occurs.
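
As an illustration of the two simplifications suggested above, here is a
rough sketch in plain Perl. The helpers split_classification and
parse_source are hypothetical names, not existing Bio::SeqIO::genbank
code, and the input strings are made-up examples; organelle prefixes are
not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a classification string on ';' only.
# Only the final terminating '.' is dropped, so a '.' inside a name
# does not cause an extra split.
sub split_classification {
    my ($text) = @_;
    $text =~ s/\s+/ /g;       # join wrapped lines into one
    $text =~ s/\.\s*$//;      # drop the terminating period
    my @taxa;
    for my $piece (split /;/, $text) {
        $piece =~ s/^\s+|\s+$//g;
        push @taxa, $piece if length $piece;
    }
    return @taxa;
}

# Hypothetical helper: take the SOURCE text as the scientific name and
# any trailing text in parentheses as the common name (undef if absent).
sub parse_source {
    my ($source) = @_;
    $source =~ s/^\s+|\s+$//g;
    my ($scientific, $common) = ($source, undef);
    if ($source =~ /^(.*?)\s*\(([^()]*)\)\.?$/) {
        ($scientific, $common) = ($1, $2);
    }
    return ($scientific, $common);
}

# Made-up inputs for illustration only.
my @taxa = split_classification('Bacteria; Firmicutes; Bacillus sp. X1.');
print join(' | ', @taxa), "\n";

my ($sci, $common) = parse_source('Homo sapiens (human)');
print "scientific: $sci\n";
print "common: ", (defined $common ? $common : '(none)'), "\n";
```

The point of the sketch is that neither helper tries to work out
genus, species or subspecies from the text; that would be left to a
lookup by TaxID.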

Maybe I'm alone in that. Sendu? Any thoughts?

chris

From bosborne11 at verizon.net Tue Jan 2 14:06:22 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 02 Jan 2007 14:06:22 -0500
Subject: [Bioperl-l] store variations, generate sequences
In-Reply-To: <459A8EE0.3000504@lycos.de>
Message-ID:

Marian,

I can't answer your questions as I'm not familiar with these modules, but in
Bioperl one place to look for example code is in the t/ directory. So I see
the following interesting files:

t/Variation_IO.t
t/SeqDiff.t
t/DNAMutation.t
t/Allele.t
t/SNP.t

This is not intended to be exhaustive; there may be other useful files in
there.

Brian O.

On 1/2/07 11:57 AM, "Marian thieme" wrote:

> sorry if you get this email twice, prob. the first time I have used the
> wrong sender address
>
> Hi all,
>
> I am quite new to bioperl and I have a question about sequence data: I
> am working on a resequencing project. Here we have resequenced 1000
> genes of a certain gene. My question: What is easiest way to store each
> discovered variation of each individual and get a fasta sequence for an
> arbitrary individual.
>
> I would expect that there is some way to set up a reference sequence and
> store all variations relative to this reference sequence. Afterward it
> should be possible to generate sequences for each individual in
> question, right ?
>
> My approach was the following:
>
> I have created a seqdiff object:
>
> $seqDiff = Bio::Variation::SeqDiff->new (...)
>
>
> and I have assigned the reference sequence to that object via:
>
> $seqDiff->dna_ori('atgcgtatatg');
>
>
> Now I thought, I can create some variations via DNAMutation object:
>
> $dnamut = Bio::Variation::DNAMutation->new (
>     -start => 6,
>     -end => 6,
>     -length => 1,
>     -isMutation => 1,
>     -upStreamSeq => 'atgcg',
>     -dnStreamSeq => 'atatg'
> );
>
> $a1 = Bio::Variation::Allele->new;
> $a1->seq('t');
> $dnamut->allele_ori($a1);
>
> my $a2 = Bio::Variation::Allele->new;
> $a2->seq('a');
> $dnamut->add_Allele($a2);
>
>
>
> Is that the correct way to describe the reference sequence, describe a
> variation and attach this to seqdiff object ?
> Probably I didn't understand the api right. (I did assume start/end means
> start/endposition of the mutation). Is it possible to get a complete
> sequence print (FASTA format) of each variation/individual ?
>
> Regards,
> Marian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From avilella at gmail.com Wed Jan 3 05:24:34 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 3 Jan 2007 11:24:34 +0100
Subject: [Bioperl-l] PopGen
In-Reply-To: <358f4d650701030145y167a2406hb27dffa16ffb06d1@mail.gmail.com>
References: <459A8CD2.9060805@klinik.uni-regensburg.de>
 <358f4d650701020935s45a76d0fm208912e60893bf52@mail.gmail.com>
 <459B741B.7070504@klinik.uni-regensburg.de>
 <358f4d650701030145y167a2406hb27dffa16ffb06d1@mail.gmail.com>
Message-ID: <358f4d650701030224h370114ablf1beb1cb076071c4@mail.gmail.com>

To add a bit more info.
Using the example.hap file in the t/data dir of bioperl, you can see that
the alleles correspond to the nucleotides, and the marker name corresponds
to the dbSNP rs id (I guess in your case it can be something that relates
to the coords of the sequence):

#!/usr/local/bin/perl

use strict;
use warnings;
use Bio::PopGen::IO;

my $io = Bio::PopGen::IO->new(-format => 'hapmap',
                              -file   => '../../t/data/example.hap');

# Some IO might support reading in a population at a time

my @population;
while ( my $ind = $io->next_individual ) {
    push @population, $ind;
}

foreach my $individual (@population) {
    my @genotypes = $individual->get_Genotypes;
    foreach my $genotype (@genotypes) {
        print "individual_id ", $genotype->individual_id, "\n";
        # get_Alleles returns a list; join it so the alleles stay readable
        print "alleles ", join(' ', $genotype->get_Alleles), "\n";
        print "marker_name ", $genotype->marker_name, "\n";
    }
}

1;

On 1/3/07, Albert Vilella wrote:
> Well, in that case the alleles are numerical ids instead of
> nucleotides... but in your case you will have the nucleotide
> corresponding to the coordinate with polymorphism...
>
> On 1/3/07, Marian Thieme wrote:
> > Albert,
> >
> > thank you very much for this hint. I did completely overlook the PopGen
> > package. But at least one question remains, because I didn't fully
> > understand the allele attribute of the Bio::PopGen::Genotype object,
> > perhaps you can help me:
> >
> > in the HOWTO (http://www.bioperl.org/wiki/HOWTO:PopGen) there is a
> > Genotype created by:
> >
> > my $genotype = Bio::PopGen::Genotype->new(-marker_name => 'D7S123',
> >                                           -individual_id => '1001',
> >                                           -alleles =>
> > ['104','107'] );
> >
> > Can you explain me what the numbers mean (-alleles=> ['104','107'] );) ?
> > I would expect that an allele is specified by a position AND the bases
> > which are different to the bases in the original (reference) sequence.
> >
> > Regards,
> > Marian
> >
> > Albert Vilella wrote:
> >
> > > The Bio::PopGen modules contain Individual, population and genotype
> > > objects, among other utilities.
> > > There are some input/output formats in
> > > Bio::PopGen::IO and also some methods to go from an aln to a
> > > population.
> > >
> > > That said, I am not entirely sure about how much of that overlaps with
> > > Bio::Variation.
> > >
> > > If you think anything missing that you would like to have implemented
> > > in bioperl, we would greatly appreciate your feedback,
> > >
> > > Cheers,
> > >
> > > Albert.
> > >
> > > On 1/2/07, Marian Thieme wrote:
> > >
> > >> Hi all,
> > >>
> > >> I am quite new to bioperl and I have a question about sequence data: I
> > >> am working on a resequencing project. Here we have resequenced 1000
> > >> genes of a certain gene. My question: What is easiest way to store each
> > >> discovered variation of each individual and get a fasta sequence for an
> > >> arbitrary individual.
> > >>
> > >> I would expect that there is some way to set up a reference sequence and
> > >> store all variations relative to this reference sequence. Afterward it
> > >> should be possible to generate sequences for each individual in
> > >> question, right ?
> > >>
> > >> My approach was the following:
> > >>
> > >> I have created a seqdiff object:
> > >>
> > >> $seqDiff = Bio::Variation::SeqDiff->new (...)
> > >>
> > >>
> > >> and I have assigned the reference sequence to that object via:
> > >>
> > >> $seqDiff->dna_ori('atgcgtatatg');
> > >>
> > >>
> > >> Now I thought, I can create some variations via DNAMutation object:
> > >>
> > >> $dnamut = Bio::Variation::DNAMutation->new (
> > >>     -start => 6,
> > >>     -end => 6,
> > >>     -length => 1,
> > >>     -isMutation => 1,
> > >>     -upStreamSeq => 'atgcg',
> > >>     -dnStreamSeq => 'atatg'
> > >> );
> > >>
> > >> $a1 = Bio::Variation::Allele->new;
> > >> $a1->seq('t');
> > >> $dnamut->allele_ori($a1);
> > >>
> > >> my $a2 = Bio::Variation::Allele->new;
> > >> $a2->seq('a');
> > >> $dnamut->add_Allele($a2);
> > >>
> > >>
> > >>
> > >> Is that the correct way to describe the reference sequence, describe a
> > >> variation and attach this to seqdiff object ?
> > >> Probably I didn't understand the api right. (I did assume start/end means
> > >> start/endposition of the mutation). Is it possible to get a complete
> > >> sequence print (FASTA format) of each variation/individual ?
> > >>
> > >> Regards,
> > >> Marian
> > >>
> > >> --
> > >> Marian Thieme
> > >> University Regensburg
> > >> Institute of Functional Genomics
> > >> Josef-Engert-Str. 9
> > >> 93053
> > >> Regensburg
> > >> Germany
> > >> P: 0049 (0)941 943 5055
> > >> F: 0049 (0)941 943 5020
> > >> E: marian.thieme at klinik.uni-regensburg.de
> > >> W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> >
> >
> > --
> > Marian Thieme
> > University Regensburg
> > Institute of Functional Genomics
> > Josef-Engert-Str. 9
> > 93053
> > Regensburg
> > Germany
> > P: 0049 (0)941 943 5055
> > F: 0049 (0)941 943 5020
> > E: marian.thieme at klinik.uni-regensburg.de
> > W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
> >
> >
>

From enrique_rulz at yahoo.com Wed Jan 3 09:42:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Wed, 3 Jan 2007 06:42:59 -0800 (PST)
Subject: [Bioperl-l] Assignment problem :(
Message-ID: <8141866.post@talk.nabble.com>

Hi guys...I ve got this assignment to submit ...I m not asking n e 1 to do
it for me...But if n e can gimme n e direction will be really glad,,,There
are four .embl files which I have attached...We need to do the following
things
1) Read 3 arguments from the user at command line.
2) Read in the set of files in a directory.
3) for each sequence reformat the sequence in to fasta format.
4) for each sequence in fasta format search for close homologues using the
   local copy of BLAST.
5) for each sequence print the swissprot name and code of its close
   homologues to an HTML formatted file that can be viewed in a web
   browser. Is it possible to extract the evalues for each hit & record
   them in the HTML file?

I m done with first two But havin problem in 3,4 & 5...I ve jus got basic
knowledge of PERL... I ve been really tryin hard to get solution...But not
able to solve them...Plzz..If n e 1 can help me in giving direction will be
really great full... Thanx in advance..

File 1 : P41391.embl
=========================================================================================
ID   sp|P41391|RNA1_SCHPOstandard; AA; UNK; 386 BP.
XX
AC   unknown;
XX
DE   Ran GTPase-activating protein 1 (Protein rna1) - Schizosaccharomyces pombe
DE   (Fission yeast).
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 386 BP; 31 A; 4 C; 20 G; 18 T; 313 other;
     msrfsiegks lkldaitted eksvfavlle ddsvkeivls gntigteaar wlseniaskk        60
     dleiaefsdi ftgrvkdeip ealrlllqal lkcpklhtvr lsdnafgpta qeplidflsk       120
     htplehlylh nnglgpqaga kiaralqela vnkkaknapp lrsiicgrnr lengsmkewa       180
     ktfqshrllh tvkmvqngir pegiehllle glaycqelkv ldlqdntfth lgssalaial       240
     kswpnlrelg lndcllsarg aaavvdafsk leniglqtlr lqyneielda vrtlktvide       300
     kmpdllflel ngnrfseedd vvdeirevfs trgrgeldel ddmeeltdee eedeeeeaes       360
     qspepetsee ekedkelade lskahi                                            386
//
===========================================================================================

File 2: P43994.embl
===========================================================================================
ID   sp|P43994|Y395_HAEINstandard; AA; UNK; 102 BP.
XX
AC   unknown;
XX
DE   UPF0125 protein HI0395 - Haemophilus influenzae.
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 102 BP; 10 A; 0 C; 5 G; 5 T; 82 other;
     mnqinieiay afperyylks fqvdegitvq taitqsgils qfpeidlstn kigifsrpik        60
     ltdvlkegdr ieiyrpllad pkeirrkraa eqaaakdkek ga                          102
//
==========================================================================================

File 3: Q9UJ38.embl
==========================================================================================
ID   sp|Q9UJ37|SI7B_HUMANstandard; AA; UNK; 374 BP.
XX
AC   unknown;
XX
DE   Alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferase II (EC 2.4.99.-)
DE   (Gal-beta-1,3-GalNAc alpha-2,6-sialyltransferase) (ST6GalNAc II)
DE   (Sialyltransferase 7B) (SThM) - Homo sapiens (Human).
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 374 BP; 30 A; 5 C; 31 G; 18 T; 290 other;
     mglprgsffw llllltaacs gllfalyfsa vqrypgpaag ardttsfeaf fqskasnswt        60
     gkgqacrhll hlaiqrhphf rglfnlsipv llwgdlftpa lwdrlsqhka pygwrglshq       120
     viastlslln gsesaklfap prdtppkcir cavvgnggil ngsrqgpnid ahdyvfrlng       180
     avikgferdv gtktsfygft vntmknslvs ywnlgftsvp qgqdlqyifi psdirdyvml       240
     rsailgvpvp egldkgdrph ayfgpeasas kfkllhpdfi sylterflks klinthfgdl       300
     ympstgalml ltalhtcdqv saygfitsny wkfsdhyfer kmkplifyan hdlsleaalw       360
     rdlhkagilq lyqr                                                         374
//
===========================================================================================

File 4: Q9WZY7.embl
==========================================================================================
ID   tr|Q9WZY7 standard; AA; UNK; 185 BP.
XX
AC   unknown;
XX
DE   Hypothetical protein - Thermotoga maritima.
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 185 BP; 17 A; 2 C; 13 G; 12 T; 141 other;
     mvlfekpgke ntrktleiai qkaselsskk lliasatgys armalemipe dmklvvvthh        60
     agfeepdtqe fdeelrkllk ekghdvltat halsagersl rrkfggiypl eiiantlrmf       120
     segvkvgvei tlmaadaglv ktselvvacg gtesgldsai vvkpanspnl fdlkiteilc       180
     kplis                                                                   185
//
===========================================================================================

I ve even Copy pasted the file if u guys not able to download

http://maxupload.com/0FD1F2F4 <-- P41391.embl
http://maxupload.com/E62811F4 <-- P43994.embl
http://maxupload.com/88F8A43E <-- Q9UJ38.embl
http://maxupload.com/B687CE7D <-- Q9WZY7.embl

--
View this message in context: http://www.nabble.com/Assignment-problem-%3A%28-tf2913859.html#a8141866
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From sdavis2 at mail.nih.gov Wed Jan 3 10:53:00 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 3 Jan 2007 10:53:00 -0500
Subject: [Bioperl-l] Assignment problem :(
In-Reply-To: <8141866.post@talk.nabble.com>
References: <8141866.post@talk.nabble.com>
Message-ID: <200701031053.00753.sdavis2@mail.nih.gov>

On Wednesday 03 January 2007 09:42, Kurt Gobain wrote:
> Hi guys...I ve got this assignment to submit ...I m not asking n e 1 to do
> it for me...But if n e can gimme n e direction will be really glad,,,There
> are four .embl files which I have attached...We need to do the following
> things 1) Read 3 arguments from the user at command line.
> 2) Read in the set of files in a directory.
> 3)for each sequence reformat the sequence in to fasta format.
> 4)for each sequence in fasta format search for close homologues using the
> local copy of BLAST.
> 5) for each sequence print the swissprot name and code of its close
> homologues to an HTML formatted file that can be viewed in a web browser.Is
> it possible to extract the evalues for each hit & record them in the HTML
> file?
>
> I m done with first two But havin problem in 3,4 & 5...I ve jus got basic
> knowledge of PERL... I ve been really tryin hard to get solution...But not
> able to solve them...Plzz..If n e 1 can help me in giving direction will be
> really great full...

For 3, 4, and 5, I would probably look at the howtos here:

http://www.bioperl.org/wiki/HOWTOs

I think you will find the answers you need in the first 3.

You will also need to have some understanding of HTML to produce your HTML
file. There are dozens of tutorials on HTML online. Simply googling will
give you MANY hits.

Good luck.
Sean

From florin at iucha.net Wed Jan 3 10:57:20 2007
From: florin at iucha.net (Florin Iucha)
Date: Wed, 3 Jan 2007 09:57:20 -0600
Subject: [Bioperl-l] Assignment problem :(
In-Reply-To: <8141866.post@talk.nabble.com>
References: <8141866.post@talk.nabble.com>
Message-ID: <20070103155720.GS22307@iucha.net>

On Wed, Jan 03, 2007 at 06:42:59AM -0800, Kurt Gobain wrote:
>
> Hi guys...I ve got this assignment to submit ...I m not asking n e 1 to do it
> for me...But if n e can gimme n e direction will be really glad,,,There are
> four .embl files which I have attached...We need to do the following things
> 1) Read 3 arguments from the user at command line.
> 2) Read in the set of files in a directory.
> 3)for each sequence reformat the sequence in to fasta format.
> 4)for each sequence in fasta format search for close homologues using the
> local copy of BLAST.
> 5) for each sequence print the swissprot name and code of its close
> homologues to an HTML formatted file that can be viewed in a web browser.Is
> it possible to extract the evalues for each hit & record them in the HTML
> file?
>
> I m done with first two But havin problem in 3,4 & 5...I ve jus got basic
> knowledge of PERL... I ve been really tryin hard to get solution...But not
> able to solve them...Plzz..If n e 1 can help me in giving direction will be
> really great full...

"Kurt",

Please read this first: http://catb.org/~esr/faqs/smart-questions.html

You are more likely to get help if you show what you've done and where
you got stuck. Randomly asking people to do parts of your homework
won't get you anywhere.

A good example is:

   I have read chapter 7 of the documentation at http://whatever
   and modified example 3 in this way

      $baz{"embl"} = 9;
      $foo->bar(15, \%baz);

   to convert embl to fasta and I get this error "unable to divide
   by 0" at line 32.

If you don't put any work in researching and asking the question,
don't expect anybody else to put any work in doing it for you.

Good luck,
florin

--
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070103/5fcc2aff/attachment.bin

From arareko at campus.iztacala.unam.mx Wed Jan 3 13:01:36 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 03 Jan 2007 12:01:36 -0600
Subject: [Bioperl-l] Assignment problem :(
In-Reply-To: <200701031053.00753.sdavis2@mail.nih.gov>
References: <8141866.post@talk.nabble.com>
 <200701031053.00753.sdavis2@mail.nih.gov>
Message-ID: <459BEF80.5080502@campus.iztacala.unam.mx>

Sean Davis wrote:
> You will also need to have some understanding of HTML to produce your
> HTML file. There are dozens of tutorials on HTML online. Simply
> googling will give you MANY hits.

HTML::Template is a good/easy solution for writing HTML from Perl. Its
POD is comprehensive, read it and have fun!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From bix at sendu.me.uk Wed Jan 3 13:09:26 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 03 Jan 2007 18:09:26 +0000
Subject: [Bioperl-l] Auto-method caller proposal
Message-ID: <459BF156.2020702@sendu.me.uk>

I propose a method that sets method values based on user-supplied args
to new(), most likely to be placed in Bio::Root::RootI (given its
intention to substitute or complement _rearrange for some module
authors). The method name (_set_from_args) is open to alternative
suggestions.

A lazy module author (eg.
someone doing a run wrapper) might say:

package Bio::Tools::Run::Lazy;
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);

    $self->_set_from_args(\@args,
                          -methods => [qw(id score evalue)],
                          -create  => 1);

    return $self;
}
1;

A user with a tendency to accidentally press shift or forget to use
dashes could then say:

use Bio::Tools::Run::Lazy;
my $lazy = Bio::Tools::Run::Lazy->new(-sCore => 5, evalue => 0);
my $id = $lazy->id;         # undef, not fatal
my $score = $lazy->score;   # 5, $lazy->sCore would be fatal
my $evalue = $lazy->evalue; # 0


This has the very slight advantage over AUTOLOAD in that we can
$lazy->can('id'), and the better advantage over the current run
wrappers: not every one of them would have to define its own AUTOLOAD
method and have its own way of dealing with dashed or dashless parameters.


For less lazy authors who define all their methods, we can still gain a
benefit. Instead of the current:

package Bio::Tools::Run::GoodBoy;
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);

    my ($id, $score, $evalue) = $self->_rearrange([qw(ID SCORE EVALUE)],
                                                  @args);

    $self->id($id) if defined $id;
    $self->score($score) if defined $score;
    $self->evalue($evalue) if defined $evalue;

    return $self;
}
# methods...
1;

We can have the nicer:

package Bio::Tools::Run::GoodBoy;
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);

    $self->_set_from_args(\@args,
                          -methods => [qw(id score evalue)]);

    return $self;
}
# methods...
1;




Proposed code (excuse the broken formatting):

=head2 _set_from_args

 Usage     : $object->_set_from_args(\%args, -methods => \@methods)
 Purpose   : Takes a hash of user-supplied args whose keys match method names,
           : and calls the method supplying it the corresponding value.
 Example   : $self->_set_from_args(\%args, -methods => [qw(sequence id desc)]);
           : Where %args = (-sequence    => $s,
           :                -description => $d,
           :                -ID          => $i);
           : the above _set_from_args calls the following methods:
           : $self->sequence($s);
           : $self->id($i);
           : ( $self->description($d) is not called because 'description' wasn't
           :   one of the given methods )
 Returns   : n/a
 Argument  : \%args          : a hash ref of arguments where keys are any-case
           :                   strings corresponding to method names but
           :                   optionally prefixed with hyphens, and values are
           :                   the values the method should be supplied
           : -methods => []  : (optional) only call methods with names in this
           :                   array ref
           : -force => bool  : (optional, default 0) call methods that don't
           :                   seem to exist, ie. let AUTOLOAD handle them
           : -create => bool : (optional, default 0) when a method doesn't
           :                   exist, create it as a simple getter/setter
           :                   (combined with -methods it would create all the
           :                   supplied methods that didn't exist, even if not
           :                   mentioned in the supplied %args)

=cut

sub _set_from_args {
    my ($self, $args, @own_args) = @_;
    $self->throw("a hash ref of arguments must be supplied") unless ref($args);

    my ($methods, $force, $create);
    if (@own_args) {
        ($methods, $force, $create) =
            $self->_rearrange([qw(METHODS FORCE CREATE)], @own_args);
    }

    my %args = ref($args) eq 'HASH' ? %{$args} : @{$args};
    my %methods = $methods ? map { lc($_) => $_ } @{$methods} : ();

    if ($create) {
        foreach my $method (@{$methods}) {
            $self->can($method) && next;

            # create get/setter method
            no strict 'refs';
            *{ref($self).'::'.$method} = sub { my $self = shift;
                                               if (@_) { $self->{'_'.$method} = shift }
                                               return $self->{'_'.$method} || return; };
        }
    }

    while (my ($method, $value) = each %args) {
        $method =~ s/^-+//;
        $method = $methods{lc($method)} || ($methods ?
                                            next : $method);

        unless ($force) {
            $self->can($method) || next;
        }

        $self->$method($value);
    }
}

From cjfields at uiuc.edu Wed Jan 3 14:29:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 Jan 2007 13:29:38 -0600
Subject: [Bioperl-l] Auto-method caller proposal
In-Reply-To: <459BF156.2020702@sendu.me.uk>
References: <459BF156.2020702@sendu.me.uk>
Message-ID:

I completely agree; this would come in particularly handy with the
EUtilities parameter get/sets.

It would also be handy to have this or something similar handle code
fragments on the fly, so you could deal with more complex methods:

our %DT = (
    'foo' => 'my $self=shift; $self->{\'_foo\'} = shift if @_; return;',
    'bar' => 'my $self=shift; my $bar = shift if @_; return ($self->foo)*$bar if $bar;',
    ...
);

# in new()

$self->_set_from_args(\@args,
                      -methods => [qw(id foo bar)],
                      -dispatch_table => \%DT,
                      -create => 1);

If the method exists in the dispatch_table hash then you could have
more complex subs; all others would be simple get/sets. Don't know
how feasible it would be, but maybe something like the following (in
_set_from_args()):

...
    if ($create) {
        foreach my $method (@{$methods}) {
            $self->can($method) && next;
            no strict 'refs';
            if (exists($dispatch_table{$method})) {
                my $sub = eval "sub { $dispatch_table{$method} }";
                $self->throw("Compilation error for $method : $@") if $@;
                *{ref($self).'::'.$method} = $sub;
            } else {
                # create simple get/setter method
                *{ref($self).'::'.$method} = sub { my $self = shift;
                                                   if (@_) { $self->{'_'.$method} = shift }
                                                   return $self->{'_'.$method} || return; };
            }
        }
    }

chris

On Jan 3, 2007, at 12:09 PM, Sendu Bala wrote:

> I propose a method that sets method values based on user-supplied args
> to new(), most likely to be placed in Bio::Root::RootI (given its
> intention to substitute or complement _rearrange for some module
> authors). The method name (_set_from_args) is open to alternative
> suggestions.
>
> A lazy module author (eg.
> someone doing a run wrapper) might say:
>
> package Bio::Tools::Run::Lazy;
> sub new {
>     my ($class, @args) = @_;
>     my $self = $class->SUPER::new(@args);
>
>     $self->_set_from_args(\@args,
>                           -methods => [qw(id score evalue)],
>                           -create  => 1);
>
>     return $self;
> }
> 1;
>
> A user with a tendency to accidentally press shift or forget to use
> dashes could then say:
>
> use Bio::Tools::Run::Lazy;
> my $lazy = Bio::Tools::Run::Lazy->new(-sCore => 5, evalue => 0);
> my $id = $lazy->id;         # undef, not fatal
> my $score = $lazy->score;   # 5, $lazy->sCore would be fatal
> my $evalue = $lazy->evalue; # 0
>
>
> This has the very slight advantage over AUTOLOAD in that we can
> $lazy->can('id'), and the better advantage over the current run
> wrappers: not every one of them would have to define its own AUTOLOAD
> method and have its own way of dealing with dashed or dashless
> parameters.
>
>
> For less lazy authors who define all their methods, we can still gain a
> benefit. Instead of the current:
>
> package Bio::Tools::Run::GoodBoy;
> sub new {
>     my ($class, @args) = @_;
>     my $self = $class->SUPER::new(@args);
>
>     my ($id, $score, $evalue) = $self->_rearrange([qw(ID SCORE EVALUE)],
>                                                   @args);
>
>     $self->id($id) if defined $id;
>     $self->score($score) if defined $score;
>     $self->evalue($evalue) if defined $evalue;
>
>     return $self;
> }
> # methods...
> 1;
>
> We can have the nicer:
>
> package Bio::Tools::Run::GoodBoy;
> sub new {
>     my ($class, @args) = @_;
>     my $self = $class->SUPER::new(@args);
>
>     $self->_set_from_args(\@args,
>                           -methods => [qw(id score evalue)]);
>
>     return $self;
> }
> # methods...
> 1;
>
>
>
>
> Proposed code (excuse the broken formatting):
>
> =head2 _set_from_args
>
>  Usage     : $object->_set_from_args(\%args, -methods => \@methods)
>  Purpose   : Takes a hash of user-supplied args whose keys match method names,
>            : and calls the method supplying it the corresponding value.
>  Example   : $self->_set_from_args(\%args, -methods => [qw(sequence id desc)]);
>            : Where %args = (-sequence    => $s,
>            :                -description => $d,
>            :                -ID          => $i);
>            : the above _set_from_args calls the following methods:
>            : $self->sequence($s);
>            : $self->id($i);
>            : ( $self->description($d) is not called because 'description' wasn't
>            :   one of the given methods )
>  Returns   : n/a
>  Argument  : \%args          : a hash ref of arguments where keys are any-case
>            :                   strings corresponding to method names but
>            :                   optionally prefixed with hyphens, and values are
>            :                   the values the method should be supplied
>            : -methods => []  : (optional) only call methods with names in this
>            :                   array ref
>            : -force => bool  : (optional, default 0) call methods that don't
>            :                   seem to exist, ie. let AUTOLOAD handle them
>            : -create => bool : (optional, default 0) when a method doesn't
>            :                   exist, create it as a simple getter/setter
>            :                   (combined with -methods it would create all the
>            :                   supplied methods that didn't exist, even if not
>            :                   mentioned in the supplied %args)
>
> =cut
>
> sub _set_from_args {
>     my ($self, $args, @own_args) = @_;
>     $self->throw("a hash ref of arguments must be supplied") unless ref($args);
>
>     my ($methods, $force, $create);
>     if (@own_args) {
>         ($methods, $force, $create) =
>             $self->_rearrange([qw(METHODS FORCE CREATE)], @own_args);
>     }
>
>     my %args = ref($args) eq 'HASH' ? %{$args} : @{$args};
>     my %methods = $methods ? map { lc($_) => $_ } @{$methods} : ();
>
>     if ($create) {
>         foreach my $method (@{$methods}) {
>             $self->can($method) && next;
>
>             # create get/setter method
>             no strict 'refs';
>             *{ref($self).'::'.$method} = sub { my $self = shift;
>                                                if (@_) { $self->{'_'.$method} = shift }
>                                                return $self->{'_'.$method} || return; };
>         }
>     }
>
>     while (my ($method, $value) = each %args) {
>         $method =~ s/^-+//;
>         $method = $methods{lc($method)} || ($methods ?
>                                             next : $method);
>
>         unless ($force) {
>             $self->can($method) || next;
>         }
>
>         $self->$method($value);
>     }
> }

From avilella at gmail.com Wed Jan 3 14:44:25 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 3 Jan 2007 20:44:25 +0100
Subject: [Bioperl-l] PopGen
In-Reply-To: <459BD8AD.4000603@klinik.uni-regensburg.de>
References: <459A8CD2.9060805@klinik.uni-regensburg.de>
 <358f4d650701020935s45a76d0fm208912e60893bf52@mail.gmail.com>
 <459B741B.7070504@klinik.uni-regensburg.de>
 <358f4d650701030145y167a2406hb27dffa16ffb06d1@mail.gmail.com>
 <358f4d650701030224h370114ablf1beb1cb076071c4@mail.gmail.com>
 <459BD8AD.4000603@klinik.uni-regensburg.de>
Message-ID: <358f4d650701031144v5bd650eci61a7771be64bda9@mail.gmail.com>

Let's see if anyone else in the bioperl-ml has any comments/ideas on
how this could be done :)

On 1/3/07, Marian Thieme wrote:
> Hi again,
>
> if I understand the t/data/example.hap file right, then it is nearly the
> desired format and I can relatively easily import our snp data into some
> perl object.
> Because you asked/encouraged me to give some feedback: I didn't find a
> way to output the complete sequence of each individual while regarding
> the specific sequence properties (specific alleles) of that individual.
> If I understand the PopGen Api right, then it can represent
> snps/variations specific to an individual, but it doesn't cope with the
> complete (reference) sequence.
> Of course, I can get some reference sequence from genbank or somewhere
> else and produce an individual specific sequence based on that reference
> seq. by substituting bases in the corresponding positions. But I was
> hoping there is some function which does this for me. If not perhaps I can
> develop this feature and contribute to the bioperl package !?
>
> Marian
>
>
> Albert Vilella wrote:
>
> > To add a bit more info.
> > Using the example.hap file in the t/data dir
> > of bioperl, you can see that the alleles correspond to the
> > nucleotides, and the marker name corresponds to the dbSNP rs id (I
> > guess in your case it can be something that relates to the coords of
> > the sequence):
> >
> > #!/usr/local/bin/perl
> >
> > use Bio::PopGen::IO;
> > my $io = new Bio::PopGen::IO(-format => 'hapmap',
> >                              -file => '../../t/data/example.hap');
> >
> > # Some IO might support reading in a population at a time
> >
> > my @population;
> > while ( my $ind = $io->next_individual ) {
> >     push @population, $ind;
> > }
> >
> > foreach my $individual (@population) {
> >     my @genotypes = $individual->get_Genotypes;
> >     foreach my $genotype (@genotypes) {
> >         print "individual_id ", $genotype->individual_id ,"\n";
> >         print "alleles ", $genotype->get_Alleles ,"\n";
> >         print "marker_name ", $genotype->marker_name ,"\n";
> >     }
> > }
> >
> > 1;
> >
> >
> > On 1/3/07, Albert Vilella wrote:
> >
> >> Well, in that cases the alleles are numerical ids instead of
> >> nucleotides... but in your case you will have the nucleotide
> >> corresponding to the coordinate with polymorphism...
> >>
> >> On 1/3/07, Marian Thieme wrote:
> >> > Albert,
> >> >
> >> > thank you very much for this hint. I did completely overlook the PopGen
> >> > package. But at least one question remains, because I didnt fully
> >> > understand the allele attribute of the Bio::PopGen::Genotype object,
> >> > perhaps you can help me:
> >> >
> >> > in the HOWTO (http://www.bioperl.org/wiki/HOWTO:PopGen) there is a
> >> > Genotype created by:
> >> >
> >> > my $genotype = Bio::PopGen::Genotype->new(-marker_name => 'D7S123',
> >> >                                           -individual_id => '1001',
> >> >                                           -alleles =>
> >> > ['104','107'] );
> >> >
> >> > Can you explain me what the numbers mean (-alleles=> ['104','107'] );) ?
> >> > I would expect that an allele is specified by a position AND the bases
> >> > which are different to the bases in the original (reference) sequence.
> >> >
> >> > Regards,
> >> > Marian
> >> >
> >> > Albert Vilella wrote:
> >> >
> >> > > The Bio::PopGen modules contain Individual, population and genotype
> >> > > objects, among other utilities. There are some input/output formats in
> >> > > Bio::PopGen::IO and also some methods to go from an aln to a
> >> > > population.
> >> > >
> >> > > That said, I am not entirely sure about how much of that overlaps with
> >> > > Bio::Variation.
> >> > >
> >> > > If you think anything missing that you would like to have implemented
> >> > > in bioperl, we would greatly appreciate your feedback,
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Albert.
> >> > >
> >> > > On 1/2/07, Marian Thieme wrote:
> >> > >
> >> > >> Hi all,
> >> > >>
> >> > >> I am quite new to bioperl and I have a question about sequence data: I
> >> > >> am working on a resequencing project. Here we have resequenced 1000
> >> > >> genes of a certain gene. My question: What is easiest way to store each
> >> > >> discovered variation of each individual and get a fasta sequence for an
> >> > >> arbitrary individual.
> >> > >>
> >> > >> I would expect that there is some way to set up a reference sequence and
> >> > >> store all variationsm relative to this reference sequence. Afterward it
> >> > >> should be possible to genereate sequences for each indiviudal in
> >> > >> question, right ?
> >> > >>
> >> > >> My approach was the following:
> >> > >>
> >> > >> I have created an seqdiff object:
> >> > >>
> >> > >> $seqDiff = Bio::Variation::SeqDiff->new (...)
> >> > >>
> >> > >>
> >> > >> and I have assigned the reference sequence to that object via:
> >> > >>
> >> > >> $seqDiff->dna_ori('atgcgtatatg');
> >> > >>
> >> > >>
> >> > >> Now I thought, I can create some variations via DNAMutation object:
> >> > >>
> >> > >> $dnamut = Bio::Variation::DNAMutation->new (
> >> > >>     -start => 6,
> >> > >>     -end => 6,
> >> > >>     -length => 1,
> >> > >>     -isMutation => 1,
> >> > >>     -upStreamSeq => 'atgcg',
> >> > >>     -dnStreamSeq => 'atatg'
> >> > >> );
> >> > >>
> >> > >> $a1 = Bio::Variation::Allele->new;
> >> > >> $a1->seq('t');
> >> > >> $dnamut->allele_ori($a1);
> >> > >>
> >> > >> my $a2 = Bio::Variation::Allele->new;
> >> > >> $a2->seq('a');
> >> > >> $dnamut->add_Allele($a2);
> >> > >>
> >> > >>
> >> > >>
> >> > >> Is that the correct way to describe the reference sequence, describe a
> >> > >> variation and attach this to seqdiff object ?
> >> > >> Probably I didnt understand the api right. (I did assume start/end means
> >> > >> start/endposition of the mutation). Is it possible to get a complete
> >> > >> sequence print (fast format) of each variation/indiviudal ?
> >> > >>
> >> > >> Regards,
> >> > >> Marian
> >> > >>
> >> > >> --
> >> > >> Marian Thieme
> >> > >> University Regensburg
> >> > >> Institute of Functional Genomics
> >> > >> Josef-Engert-Str. 9
> >> > >> 93053
> >> > >> Regensburg
> >> > >> Germany
> >> > >> P: 0049 (0)941 943 5055
> >> > >> F: 0049 (0)941 943 5020
> >> > >> E: marian.thieme at klinik.uni-regensburg.de
> >> > >> W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
> >> > >>
> >> > >> _______________________________________________
> >> > >> Bioperl-l mailing list
> >> > >> Bioperl-l at lists.open-bio.org
> >> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> > >>
> >> >
> >> >
> >> > --
> >> > Marian Thieme
> >> > University Regensburg
> >> > Institute of Functional Genomics
> >> > Josef-Engert-Str. 9
> >> > 93053
> >> > Regensburg
> >> > Germany
> >> > P: 0049 (0)941 943 5055
> >> > F: 0049 (0)941 943 5020
> >> > E: marian.thieme at klinik.uni-regensburg.de
> >> > W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
> >> >
> >> >
> >> >
>
> --
> Marian Thieme
> University Regensburg
> Institute of Functional Genomics
> Josef-Engert-Str. 9
> 93053
> Regensburg
> Germany
> P: 0049 (0)941 943 5055
> F: 0049 (0)941 943 5020
> E: marian.thieme at klinik.uni-regensburg.de
> W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
>
>

From aaron.j.mackey at gsk.com Wed Jan 3 15:01:25 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 3 Jan 2007 15:01:25 -0500
Subject: [Bioperl-l] Auto-method caller proposal
In-Reply-To: <459BF156.2020702@sendu.me.uk>
Message-ID:

I'm not against this at all, but let's not reinvent a (somewhat-standard)
wheel: see Class::MethodMaker and accompanying tools.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 01/03/2007 01:09:26 PM:

> I propose a method that sets method values based on user-supplied args
> to new(), most likely to be placed in Bio::Root::RootI (given its
> intention to substitute or complement _rearrange for some module
> authors). The method name (_set_from_args) is open to alternative
> suggestions.
>
> A lazy module author (eg.
someone doing a run wrapper) might say: > > package Bio::Tools::Run::Lazy; > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); > > $self->_set_from_args(\@args, > -methods => [qw(id score evalue)], > -create => 1); > > return $self; > } > 1; > > A user with a tendency to accidentally press shift or forget to use > dashes could then say: > > use Bio::Tools::Run::Lazy; > my $lazy = Bio::Tools::Run::Lazy->new(-sCore => 5, evalue => 0); > my $id = $lazy->id # undef, not fatal > my $score = $lazy->score # 5, $lazy->sCore would be fatal > my $evalue = $lazy->evalue # 0 > > > This has the very slight advantage over AUTOLOAD in that we can > $lazy->can('id'), and the better advantage over the current run > wrappers: not every one of them would have to define its own AUTOLOAD > method and have its own way of dealing with dashed or dashless parameters. > > > For less lazy authors who define all their methods, we can still gain a > benefit. Instead of the current: > > package Bio::Tools::Run::GoodBoy; > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); > > my ($id, $score, $evalue) = $self->_rearrange([qw(ID SCORE EVALUE)], > %args); > > $self->id($id) if defined $id; > $self->score($score) if defined $score; > $self->evalue($evalue) if defined $evalue; > > return $self; > } > # methods... > 1; > > We can have the nicer: > > package Bio::Tools::Run::GoodBoy; > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); > > $self->_set_from_args(\@args, > -methods => [qw(id score evalue)]); > > return $self; > } > # methods... > 1; > > > > > Proposed code (excuse the broken formatting): > > =head2 _set_from_args > > Usage : $object->_set_from_args(\%args, -methods => \@methods) > Purpose : Takes a hash of user-supplied args whos keys match method > names, > : and calls the method supplying it the corresponding value. 
> Example : $self->_set_from_args(%args, -methods => [qw(sequence id > desc)]); > : Where %args = (-sequence => $s, > : -description => $d, > : -ID => $i); > Returns : n/a > : the above _set_from_args calls the following methods: > : $self->sequence($s); > : $self->id($i); > : ( $self->description($i) is not called because > 'description' wasn't > : one of the given methods ) > Argument : \%args : a hash ref of arguments where keys are > any-case > : strings corresponding to method names but > : optionally prefixed with hyphens, and > values are > : the values the method should be supplied > : -methods => [] : (optional) only call methods with names > in this > : array ref > : -force => bool : (optional, default 0) call methods that > don't > : seem to exist, ie. let AUTOLOAD handle them > : -create => bool : (optional, default 0) when a method doesn't > : exist, create it as a simple getter/setter > : (combined with -methods it would create > all the > : supplied methods that didn't exist, even > if not > : mentioned in the supplied %args) > > =cut > > sub _set_from_args { > my ($self, $args, @own_args) = @_; > $self->throw("a hash ref of arguments must be supplied") unless > ref($args); > > my ($methods, $force, $create); > if (@own_args) { > ($methods, $force, $create) = $self->_rearrange([qw(METHODS > FORCE > CREATE)], > @own_args); > } > > my %args = ref($args) eq 'HASH' ? %{$args} : @{$args}; > my %methods = $methods ? map { lc($_) => $_ } @{$methods} : (); > > if ($create) { > foreach my $method (@{$methods}) { > $self->can($method) && next; > > # create get/setter method > no strict 'refs'; > *{ref($self).'::'.$method} = sub { my $self = shift; > if (@_) { > $self->{'_'.$method} = shift } > return > $self->{'_'.$method} || return; }; > } > } > > while (my ($method, $value) = each %args) { > $method =~ s/^-+//; > $method = $methods{lc($method)} || ($methods ? 
next : $method); > > unless ($force) { > $self->can($method) || next; > } > > $self->$method($value); > } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Jan 3 15:35:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 3 Jan 2007 14:35:28 -0600 Subject: [Bioperl-l] Bio::Tools RNA modules Message-ID: <341CD587-C5A0-4E8A-852F-1E1426BD9C64@uiuc.edu> I have added several RNA motif-based parsing modules (Infernal, RNAMotif, ERPIN) over time to Bio::Tools. However, I am in the process of modifying them for SearchIO-based parsing (possibly using Sendu's PullParserI and related modules), simply b/c I am also writing wrappers for said modules and want comparable results. I'll be committing the newer SearchIO-based modules over the next month or two, beyond which I don't see the point of maintaining the various Bio::Tools-based versions. Does anybody mind if they are deprecated by the next release? chris From bix at sendu.me.uk Thu Jan 4 05:15:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 04 Jan 2007 10:15:37 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: References: Message-ID: <459CD3C9.5010105@sendu.me.uk> aaron.j.mackey at gsk.com wrote: > I'm not against this at all, but let's not reinvent a (somewhat-standard) > wheel: see Class::MethodMaker and accompanying tools. It would certainly be possible for a module author to make use of Class::MethodMaker, but from what I can see each module would have to use Class::MethodMaker itself, leaving them to setup and configure it in their own way, and still leaving them to handle how user arguments become method values. The primary motivation for this proposal is to provide an incredibly simple and consistent way of turning user args into method values. 
Method creation was just an added extra, implemented since the burden was just a few extra lines of trivial code. I don't see any significant benefit* of farming out to another class to do that simple thing, only the disadvantage of adding what would be a third (true) pre-requisite for Bioperl installation. >> We can have the nicer: >> >> package Bio::Tools::Run::GoodBoy; >> sub new { >> my ($class, @args) = @_; >> my $self = $class->SUPER::new(@args); >> >> $self->_set_from_args(\@args, >> -methods => [qw(id score evalue)]); >> >> return $self; >> } >> # methods... >> 1; [*] The only benefit I see is that Class::MethodMaker lets you create methods for storing arrays and hashes, not just scalars. But in Bioperl we don't 'like' auto-created methods, using them primarily for the simple scalar get/setters needed by run-wrappers where it would be too tedious to implement them all directly. From bix at sendu.me.uk Thu Jan 4 05:45:45 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 04 Jan 2007 10:45:45 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: References: <459BF156.2020702@sendu.me.uk> Message-ID: <459CDAD9.5040101@sendu.me.uk> Chris Fields wrote: > I completely agree; this would come in particularly handy with the > EUtilities parameter get/sets. > > It would also be handy to have this or something similar handle code > fragments on the fly, so you could deal with more complex methods: > > our %DT = ( > 'foo' => 'my $self=shift; my $self->{\'_foo\'} = $shift if @_; > return;', > 'bar' => 'my $self=shift; my $bar = $shift if @_; return > ($self->foo)*$bar if $bar;', > ... > ); > > # in new() > > $self->_set_from_args(\@args, > -methods => [qw(id foo bar)], > -dispatch_table => \%DT, > -create => 1); > > If the method exists in the dispatch_table hash then you could have more > complex subs; all others would be simple get/sets. 
I'm not sure that this makes much sense; if you need something fancier
than a simple scalar get/setter then by all means, instead of writing it
as a string and passing it to _set_from_args, you should implement it as
an explicit method in the class. That would be far preferable - users
can then see its POD and code in the online documentation.

The -create => 1 option is really only to take the tediousness out of
writing a million simple scalar get/setters for run-wrappers or other
modules with many (externally determined) attributes.

From rvosa at sfu.ca Thu Jan 4 06:13:21 2007
From: rvosa at sfu.ca (Rutger Vos)
Date: Thu, 04 Jan 2007 03:13:21 -0800
Subject: [Bioperl-l] Auto-method caller proposal
Message-ID: <200701041113.l04BDL7k001110@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070104/401576c1/attachment.pl

From hlapp at gmx.net Thu Jan 4 10:25:33 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 Jan 2007 10:25:33 -0500
Subject: [Bioperl-l] Auto-method caller proposal
In-Reply-To: <459BF156.2020702@sendu.me.uk>
References: <459BF156.2020702@sendu.me.uk>
Message-ID: <0F529528-0C1C-4128-B464-DDD403D4F4EE@gmx.net>

Sounds good to me, except that I'm still not sure that dashed and
dash-less parameters are just variations of the same. What about
-verbose and a verbose parameter for the tool, for example?

If dashed and dash-less are really the same then they should indeed be
silently treated as such, but the documentation should not encourage
using them interchangeably. It just leaves too much room for confusion.

-hilmar

On Jan 3, 2007, at 1:09 PM, Sendu Bala wrote:

> I propose a method that sets method values based on user-supplied args
> to new(), most likely to be placed in Bio::Root::RootI (given its
> intention to substitute or complement _rearrange for some module
> authors). The method name (_set_from_args) is open to alternative
> suggestions.
> > A lazy module author (eg. someone doing a run wrapper) might say: > > package Bio::Tools::Run::Lazy; > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); > > $self->_set_from_args(\@args, > -methods => [qw(id score evalue)], > -create => 1); > > return $self; > } > 1; > > A user with a tendency to accidentally press shift or forget to use > dashes could then say: > > use Bio::Tools::Run::Lazy; > my $lazy = Bio::Tools::Run::Lazy->new(-sCore => 5, evalue => 0); > my $id = $lazy->id # undef, not fatal > my $score = $lazy->score # 5, $lazy->sCore would be fatal > my $evalue = $lazy->evalue # 0 > > > This has the very slight advantage over AUTOLOAD in that we can > $lazy->can('id'), and the better advantage over the current run > wrappers: not every one of them would have to define its own AUTOLOAD > method and have its own way of dealing with dashed or dashless > parameters. > > > For less lazy authors who define all their methods, we can still > gain a > benefit. Instead of the current: > > package Bio::Tools::Run::GoodBoy; > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); > > my ($id, $score, $evalue) = $self->_rearrange([qw(ID SCORE > EVALUE)], > %args); > > $self->id($id) if defined $id; > $self->score($score) if defined $score; > $self->evalue($evalue) if defined $evalue; > > return $self; > } > # methods... > 1; > > We can have the nicer: > > package Bio::Tools::Run::GoodBoy; > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); > > $self->_set_from_args(\@args, > -methods => [qw(id score evalue)]); > > return $self; > } > # methods... > 1; > > > > > Proposed code (excuse the broken formatting): > > =head2 _set_from_args > > Usage : $object->_set_from_args(\%args, -methods => \@methods) > Purpose : Takes a hash of user-supplied args whos keys match > method > names, > : and calls the method supplying it the corresponding > value. 
> Example : $self->_set_from_args(%args, -methods => [qw(sequence id > desc)]); > : Where %args = (-sequence => $s, > : -description => $d, > : -ID => $i); > Returns : n/a > : the above _set_from_args calls the following methods: > : $self->sequence($s); > : $self->id($i); > : ( $self->description($i) is not called because > 'description' wasn't > : one of the given methods ) > Argument : \%args : a hash ref of arguments where keys are > any-case > : strings corresponding to method > names but > : optionally prefixed with hyphens, and > values are > : the values the method should be > supplied > : -methods => [] : (optional) only call methods with > names > in this > : array ref > : -force => bool : (optional, default 0) call methods > that > don't > : seem to exist, ie. let AUTOLOAD > handle them > : -create => bool : (optional, default 0) when a method > doesn't > : exist, create it as a simple getter/ > setter > : (combined with -methods it would > create > all the > : supplied methods that didn't exist, > even > if not > : mentioned in the supplied %args) > > =cut > > sub _set_from_args { > my ($self, $args, @own_args) = @_; > $self->throw("a hash ref of arguments must be supplied") unless > ref($args); > > my ($methods, $force, $create); > if (@own_args) { > ($methods, $force, $create) = $self->_rearrange([qw(METHODS > FORCE > CREATE)], > @own_args); > } > > my %args = ref($args) eq 'HASH' ? %{$args} : @{$args}; > my %methods = $methods ? map { lc($_) => $_ } @{$methods} : (); > > if ($create) { > foreach my $method (@{$methods}) { > $self->can($method) && next; > > # create get/setter method > no strict 'refs'; > *{ref($self).'::'.$method} = sub { my $self = shift; > if (@_) { > $self->{'_'.$method} = shift } > return > $self->{'_'.$method} || return; }; > } > } > > while (my ($method, $value) = each %args) { > $method =~ s/^-+//; > $method = $methods{lc($method)} || ($methods ? 
next : > $method); > > unless ($force) { > $self->can($method) || next; > } > > $self->$method($value); > } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Jan 4 11:13:43 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 4 Jan 2007 11:13:43 -0500 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <459D24AD.5060505@sendu.me.uk> References: <459BF156.2020702@sendu.me.uk> <0F529528-0C1C-4128-B464-DDD403D4F4EE@gmx.net> <459D24AD.5060505@sendu.me.uk> Message-ID: I see your point; and for the sake of reducing ambiguity/increasing clarity demanding unambiguous parameter names rather than disambiguating by the presence or absence of a leading dash is preferable I guess. Just should be clearly documented for module wrapper implementers; also, are we going to retrofit existing wrappers? -hilmar On Jan 4, 2007, at 11:00 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> Sounds good to me, except that I'm still not sure that dashed and >> dash-less parameters are just variations of the same. What about - >> verbose and a verbose parameter for the tool, for example? > > That's an open problem. I identified it in a recent thread as one > of the few reasons a run-wrapper module might deliberately demand > dashless args for program args and only allow dashed args for > Bioperl args. The consensus (and I agree) was that it is preferable > to allow dashed args for everything regardless of this problem. > > For this proposal, Bio::Root::Root's verbose() would be set for > both dashed and dashless, and then it would be up to the run- > wrapper module author to figure out some extra way to set a verbose > option independently for the program being wrapped. 
I imagine the > author might apologise in their POD and tell the user they had to > use 'program_verbose' or something, then handle things > appropriately when it comes time to create the argument string for > passing to the program. > > Alternatively (not something I recommend since it will causes > confusion and mistakes), you could deliberately request problem > cases be dashed or undashed: > > package Bio::Tools::Run::Confusing; > =head2 new > ... > To set Bioperl verbosity use: > -verbose => int > To activate program verbosity use: > verbose => 1 > =cut > sub new { > my ($class, @args) = @_; > my $self = $class->SUPER::new(@args); # Bioperl verbosity set here > > my %args = @args; > if (my $program_verbose = $args{verbose}) { > delete $args{verbose}; > $args{program_verbose} = $program_verbose; > } > > $self->_set_from_args(\%args, > -methods => [id score program_verbose], > -create => 1); > > return $self; > } > 1; > > # test code > my $factory = Bio::Tools::Run::Confusing->new(-verbose => 0, > verbose => 1); > is $factory->verbose, 0, 'Bioperl verbosity set correctly'; > is $factory->program_verbose, 1, 'program verbosity set correctly'; > > > > Any other ideas? -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Thu Jan 4 11:00:45 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 04 Jan 2007 16:00:45 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <0F529528-0C1C-4128-B464-DDD403D4F4EE@gmx.net> References: <459BF156.2020702@sendu.me.uk> <0F529528-0C1C-4128-B464-DDD403D4F4EE@gmx.net> Message-ID: <459D24AD.5060505@sendu.me.uk> Hilmar Lapp wrote: > Sounds good to me, except that I'm still not sure that dashed and > dash-less parameters are just variations of the same. What about > -verbose and a verbose parameter for the tool, for example? That's an open problem. 
I identified it in a recent thread as one of the few reasons a
run-wrapper module might deliberately demand dashless args for program
args and only allow dashed args for Bioperl args. The consensus (and I
agree) was that it is preferable to allow dashed args for everything
regardless of this problem.

For this proposal, Bio::Root::Root's verbose() would be set for both
dashed and dashless, and then it would be up to the run-wrapper module
author to figure out some extra way to set a verbose option
independently for the program being wrapped. I imagine the author might
apologise in their POD and tell the user they had to use
'program_verbose' or something, then handle things appropriately when it
comes time to create the argument string for passing to the program.

Alternatively (not something I recommend since it will cause confusion
and mistakes), you could deliberately request problem cases be dashed or
undashed:

package Bio::Tools::Run::Confusing;

=head2 new

 ...
 To set Bioperl verbosity use:
   -verbose => int
 To activate program verbosity use:
   verbose => 1

=cut

sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args); # Bioperl verbosity set here

    my %args = @args;
    if (my $program_verbose = $args{verbose}) {
        delete $args{verbose};
        $args{program_verbose} = $program_verbose;
    }

    $self->_set_from_args(\%args,
                          -methods => [qw(id score program_verbose)],
                          -create => 1);

    return $self;
}
1;

# test code
my $factory = Bio::Tools::Run::Confusing->new(-verbose => 0,
                                              verbose => 1);
is $factory->verbose, 0, 'Bioperl verbosity set correctly';
is $factory->program_verbose, 1, 'program verbosity set correctly';


Any other ideas?
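
For readers following the thread, here is a minimal, self-contained
sketch of the dash- and case-insensitive argument handling under
discussion. It is not the proposed Bio::Root::RootI code: the package
name Sketch::Lazy is hypothetical, the method takes a plain array ref of
method names instead of -methods/-force/-create options, and it always
creates missing accessors. One deliberate difference, which is an
assumption about the intended behaviour: the generated getter returns
the stored value as is, so a false value such as 0 survives, whereas the
"|| return" in the proposed getter would hand back nothing for 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Sketch::Lazy;    # hypothetical class, not part of BioPerl

sub new {
    my ($class, @args) = @_;
    my $self = bless {}, $class;
    $self->_set_from_args(\@args, [qw(id score evalue)]);
    return $self;
}

# Simplified stand-in for the proposed method.
# $args    : array ref of key/value pairs as passed to new()
# $methods : array ref of the method names that may be set
sub _set_from_args {
    my ($self, $args, $methods) = @_;

    # map lower-cased names to the real method names
    my %wanted = map { lc($_) => $_ } @{$methods};

    # create a simple scalar get/setter for each method that is missing
    foreach my $method (@{$methods}) {
        next if $self->can($method);
        no strict 'refs';
        *{ ref($self) . '::' . $method } = sub {
            my $obj = shift;
            $obj->{"_$method"} = shift if @_;
            return $obj->{"_$method"};
        };
    }

    my %args = @{$args};
    while (my ($key, $value) = each %args) {
        # '-sCore', 'score' and '-SCORE' all normalise to 'score'
        (my $name = lc $key) =~ s/^-+//;
        my $method = $wanted{$name} or next;    # ignore unknown keys
        $self->$method($value);
    }
    return;
}

package main;

my $lazy = Sketch::Lazy->new(-sCore => 5, evalue => 0);
printf "id=%s score=%s evalue=%s\n",
    defined $lazy->id ? $lazy->id : 'undef',
    $lazy->score, $lazy->evalue;
# prints: id=undef score=5 evalue=0
```

Because the accessors are installed in the symbol table rather than
routed through AUTOLOAD, Sketch::Lazy->can('id') is true once an object
has been constructed, which is the advantage over AUTOLOAD mentioned in
the proposal.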
From bix at sendu.me.uk Thu Jan 4 11:21:18 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 04 Jan 2007 16:21:18 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: References: <459BF156.2020702@sendu.me.uk> <0F529528-0C1C-4128-B464-DDD403D4F4EE@gmx.net> <459D24AD.5060505@sendu.me.uk> Message-ID: <459D297E.5080001@sendu.me.uk> Hilmar Lapp wrote: > I see your point; and for the sake of reducing ambiguity/increasing > clarity demanding unambiguous parameter names rather than disambiguating > by the presence or absence of a leading dash is preferable I guess. > > Just should be clearly documented for module wrapper implementers; also, > are we going to retrofit existing wrappers? I think a retrofit would be a good idea. I'm considering doing it myself at some point if there are no objections. But some time should pass between me committing the method and the retrofit to allow time for 'issues' to be discovered and resolved. From cjfields at uiuc.edu Thu Jan 4 12:08:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 Jan 2007 11:08:59 -0600 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <459CDAD9.5040101@sendu.me.uk> References: <459BF156.2020702@sendu.me.uk> <459CDAD9.5040101@sendu.me.uk> Message-ID: <145293A0-3BA0-4D03-A623-D583327282E1@uiuc.edu> On Jan 4, 2007, at 4:45 AM, Sendu Bala wrote: > Chris Fields wrote: >> ... >> If the method exists in the dispatch_table hash then you could >> have more >> complex subs; all others would be simple get/sets. > > I not sure that this makes much sense; if you need something fancier > than a simple scalar get/setter then by all means instead of > writing it > as a string and passing it to _set_from_args you should implement > it as > an explicit method in the class. That would be far preferable - users > can then see its POD and code in the online documentation. The script/class POD would still describe the method w/o the actual code being present. 
Just describing the methods as get/setters or whatever in POD should be
enough. It's already done for BioPerl classes inheriting defined methods
from base or interface classes.

> The -create => 1 option is really only to take the tediousness out of
> writing a million simple scalar get/setters for run-wrappers or other
> modules with many (externally determined) attributes.

Not a big deal, really. I think it's probably best to have
_set_from_args() use simple get/setters as you now have it, and have a
different RootI-based method capable of doing what I described if
needed.

I agree there are other much more straightforward ways of doing this,
but I'm thinking along the lines of added hooks for customization. The
idea is one could add in customized code fairly easily using this (very
useful, BTW) method, or a similar one. For example, I could replace code
defined in a dispatch table with my own on the fly by passing in new
code as a hash ref:

# customized subs
my %subs = (
    'ids'   => '# modified id() for customized ID retrieval, not std get/set',
    'score' => '# modified score() code using myparam() and mydata()'
);

# customized user-params (get/sets)
my @params = (qw(myparam mydata));

# pass into similar method to _set_from_args()
# using parameters and hash/array refs

BTW, don't know if this is possible, but can you get around 'no strict
"refs"' by building the sub code as a string and using an eval? I did
something like this in the EUtilities BEGIN block using heredoc,
something picked up from Brian's Bio::DB::Query::GenBank.

chris

From staffa at niehs.nih.gov Thu Jan 4 13:38:39 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 04 Jan 2007 13:38:39 -0500
Subject: [Bioperl-l] Mask vector
Message-ID:

Y'all got a module that'd cut vector sequence from either end of the
result of a sequencing run?
Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From jason at bioperl.org Thu Jan 4 13:59:19 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 4 Jan 2007 10:59:19 -0800 Subject: [Bioperl-l] Mask vector In-Reply-To: References: Message-ID: <9029BAD3-42A0-4802-90CD-AA6FA2704268@bioperl.org> try LUCY http://www.tigr.org/software/sequencing.shtml On Jan 4, 2007, at 10:38 AM, Staffa, Nick (NIH/NIEHS) wrote: > Y'all got a module that'd cut vector sequence from either end of > the result > of a sequencing run? > > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From cjfields at uiuc.edu Thu Jan 4 14:19:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 Jan 2007 13:19:26 -0600 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <459D297E.5080001@sendu.me.uk> References: <459BF156.2020702@sendu.me.uk> <0F529528-0C1C-4128-B464-DDD403D4F4EE@gmx.net> <459D24AD.5060505@sendu.me.uk> <459D297E.5080001@sendu.me.uk> Message-ID: <1A84E993-CA65-4A44-AA0A-292AABBD7329@uiuc.edu> On Jan 4, 2007, at 10:21 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> I see your point; and for the sake of 
reducing ambiguity/increasing >> clarity demanding unambiguous parameter names rather than >> disambiguating >> by the presence or absence of a leading dash is preferable I guess. >> >> Just should be clearly documented for module wrapper implementers; >> also, >> are we going to retrofit existing wrappers? > > I think a retrofit would be a good idea. I'm considering doing it > myself > at some point if there are no objections. But some time should pass > between me committing the method and the retrofit to allow time for > 'issues' to be discovered and resolved. No time like the present! This shouldn't affect any modules that don't use your new RootI method (unless there is something I missed...). One thing, though it's minor. The get/setters in the bioperl-run wrappers may return different values upon a set (i.e. the previously set value or the passed value). I have seen both get/setter types in core, so my guess is run will also have the same. chris From enrique_rulz at yahoo.com Thu Jan 4 17:15:16 2007 From: enrique_rulz at yahoo.com (Kurt Gobain) Date: Thu, 4 Jan 2007 14:15:16 -0800 (PST) Subject: [Bioperl-l] Assignment problem :( In-Reply-To: <8141866.post@talk.nabble.com> References: <8141866.post@talk.nabble.com> Message-ID: <8168772.post@talk.nabble.com> Thanx for the replys...As suggested b Florin...I m posting the prog which I had done so far... 
=======================================================================
use strict;
use warnings;

my $dir = "$ARGV[0]" or die "inadequate argument at command line use /users/***/***/biocomputing1/assignment\n";
my $database = "$ARGV[1]" or die "inadequate argument at commandline use /databases/swissprot/swissprot\n";
my $file;
my ($i,$seq);
my $length=0;

open (OUTPUT,">result.fasta");
opendir (DIR,"$dir") or die "cannot open the specified directory\n";
foreach $file (readdir DIR) {
    if ($file =~ /\.embl/) {
        open(FILE,"$dir/$file") or die "cannot open the specified file\n";
        #print ">$file\n";
        #print OUTPUT ">$file\n";
        while (<FILE>) {
            # sequence lines start with whitespace
            if (/^\s/) {
                # split the current line into single characters
                my @array = split ('',$_);
                foreach my $fasta_seq (@array) {
                    if ( $fasta_seq =~ /[A-Za-z]/) {
                        $length++;
                        print "$fasta_seq";
                        print OUTPUT "$fasta_seq";
                    }
                }
                print"\n";
                print "length of fasta sequence is $length\n";
            }
        }
        my $fasta_seq=@_;
        for ($i=0;$i<6;$i++) {
            my $result="blastall -F T -p blastp -d $database -i result.fasta -o blastresult.data";
            system($result);
        }
        print "\n";
        print OUTPUT"\n";
    }
}
closedir (DIR);
close (FILE);
=======================================================================

I am stuck on part 5:

"For each sequence print the swissprot name and code of its close
homologues to an HTML formatted file that can be viewed in a web
browser. Is it possible to extract the evalues for each hit and record
them in the HTML file?"

I am told we need to use some Bioperl packages, like Bio::DB::SwissProt
and Bio::SeqIO. It would be great if anyone could shed light on this.

Thanks in advance.
--
View this message in context: http://www.nabble.com/Assignment-problem-%3A%28-tf2913859.html#a8168772
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
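
The HTML step asked about above could be approached roughly as follows.
This is only a sketch under stated assumptions: it assumes the blastall
output is in the default text format, that BioPerl's Bio::SearchIO is
installed, and that the hit name and accession reported by BLAST are
what the assignment means by "swissprot name and code". The helper names
(html_escape, hits_to_html, blast_report_to_html) are made up for
illustration. Bio::SearchIO is loaded only when a report is actually
parsed, so the HTML helpers can be tried without BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub html_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build an HTML table for one query.
# Each hit is a hash ref with the keys name, accession and evalue.
sub hits_to_html {
    my ($query, @hits) = @_;
    my $html = '<h2>' . html_escape($query) . "</h2>\n"
             . "<table border=\"1\">\n"
             . "<tr><th>Name</th><th>Accession</th><th>E-value</th></tr>\n";
    foreach my $hit (@hits) {
        my @cells = map { '<td>' . html_escape($hit->{$_}) . '</td>' }
                    qw(name accession evalue);
        $html .= '<tr>' . join('', @cells) . "</tr>\n";
    }
    return $html . "</table>\n";
}

# Parse a BLAST text report with Bio::SearchIO (needs BioPerl) and
# write one table per query sequence to $out_file.
sub blast_report_to_html {
    my ($report, $out_file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report);
    open(my $out, '>', $out_file) or die "cannot write $out_file: $!\n";
    print $out "<html><body>\n";
    while (my $result = $in->next_result) {
        my @hits;
        while (my $hit = $result->next_hit) {
            push @hits, { name      => $hit->name,
                          accession => $hit->accession,
                          evalue    => $hit->significance };
        }
        print $out hits_to_html($result->query_name, @hits);
    }
    print $out "</body></html>\n";
    close($out);
    return;
}

# usage: perl blast2html.pl blastresult.data hits.html
blast_report_to_html(@ARGV) if @ARGV == 2;
```

A query with no hits simply yields a table with only the header row, so
sequences without close homologues remain visible in the output.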
From stewarta at nmrc.navy.mil Fri Jan 5 17:02:51 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Fri, 5 Jan 2007 17:02:51 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil>

After running Bio::Tools::StandAloneBlast->blastall on a collection
of protein sequences, SearchIO then allows me to traverse the list of
hits returned for those sequences which hits were found for. Here's
a question: what if I want to know which query sequences -didn't-
produce hits?

Now, I realize I could just go back and compare the set of query
sequences to the set of blast hits to find these 'no-hits', but I'm
wondering if SearchIO can include the queries for which hits were not
found. Anyone know? Or have any other suggestions?

Thanks,
Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From jason at bioperl.org Fri Jan 5 17:30:31 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 5 Jan 2007 14:30:31 -0800
Subject: [Bioperl-l] (no subject)
In-Reply-To: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil>
References: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil>
Message-ID:

I'm not sure I understand the question - can't you just test if
there are no hits?

if( $result->num_hits == 0 ) {
    # do something
}

On Jan 5, 2007, at 2:02 PM, Andrew Stewart wrote:

> After running Bio::Tools::StandAloneBlast->blastall on a collection
> of protein sequences, SearchIO then allows me to traverse the list of
> hits returned for those sequences which hits were found for. Here's
> a question: what if I want to know which query sequences -didn't-
> produce hits?
> > Now, I realize I could just go back and compare the set of query > sequences to the set of blast hits to find these 'no-hits', but I'm > wondering if SearchIO can include the queries for which hits were not > found. Anyone know? Or have any other suggestions? > > Thanks, > Andrew > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From stewarta at nmrc.navy.mil Fri Jan 5 18:03:26 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Fri, 5 Jan 2007 18:03:26 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: References: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil> Message-ID: <165B74D1-7159-404B-B1C4-184C977EAE78@nmrc.navy.mil> Doh, why didn't I think of that? Thanks. On Jan 5, 2007, at 5:30 PM, Jason Stajich wrote: > I'm not sure I understand the question - can't you just test if > there are no hits? > if( $result->num_hits == 0 ) { # do something > } > > On Jan 5, 2007, at 2:02 PM, Andrew Stewart wrote: > >> After running Bio::Tools::StandAloneBlast->blastall on a collection >> of protein sequences, SearchIO then allows me to traverse the list of >> hits returned for those sequences which hits were found for. Here's >> a question: what if I want to know which query sequences -didn't- >> produce hits? >> >> Now, I realize I could just go back and compare the set of query >> sequences to the set of blast hits to find these 'no-hits', but I'm >> wondering if SearchIO can include the queries for which hits were not >> found. Anyone know? Or have any other suggestions? 
>> >> Thanks, >> Andrew >> >> -- >> Andrew Stewart >> Research Assistant, Genomics Team >> Navy Medical Research Center (NMRC) >> Biological Defense Research Directorate (BDRD) >> BDRD Annex >> 12300 Washington Avenue, 2nd Floor >> Rockville, MD 20852 >> >> email: stewarta at nmrc.navy.mil >> phone: 301-231-6700 Ext 270 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From cjfields at uiuc.edu Fri Jan 5 22:46:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 Jan 2007 21:46:20 -0600 Subject: [Bioperl-l] (no subject) In-Reply-To: <165B74D1-7159-404B-B1C4-184C977EAE78@nmrc.navy.mil> References: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil> <165B74D1-7159-404B-B1C4-184C977EAE78@nmrc.navy.mil> Message-ID: Looks like there are a few ways to go about doing that, actually. According to Search::Result::ResultI POD, no_hits_found() is supposed to work like: if( $result->no_hits_found() ) { # do something though I can't claim to have tried it myself. chris On Jan 5, 2007, at 5:03 PM, Andrew Stewart wrote: > Doh, why didn't I think of that? > > Thanks. > > > On Jan 5, 2007, at 5:30 PM, Jason Stajich wrote: > >> I'm not sure I understand the question - can't you just test if >> there are no hits? >> if( $result->num_hits == 0 ) { # do something >> } >> >> On Jan 5, 2007, at 2:02 PM, Andrew Stewart wrote: >> >>> After running Bio::Tools::StandAloneBlast->blastall on a collection >>> of protein sequences, SearchIO then allows me to traverse the >>> list of >>> hits returned for those sequences which hits were found for. 
Here's >>> a question: what if I want to know which query sequences -didn't- >>> produce hits? >>> >>> Now, I realize I could just go back and compare the set of query >>> sequences to the set of blast hits to find these 'no-hits', but I'm >>> wondering if SearchIO can include the queries for which hits were >>> not >>> found. Anyone know? Or have any other suggestions? >>> >>> Thanks, >>> Andrew >>> >>> -- >>> Andrew Stewart >>> Research Assistant, Genomics Team >>> Navy Medical Research Center (NMRC) >>> Biological Defense Research Directorate (BDRD) >>> BDRD Annex >>> 12300 Washington Avenue, 2nd Floor >>> Rockville, MD 20852 >>> >>> email: stewarta at nmrc.navy.mil >>> phone: 301-231-6700 Ext 270 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Sat Jan 6 08:25:56 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 06 Jan 2007 13:25:56 +0000 Subject: [Bioperl-l] (no subject) In-Reply-To: References: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil> <165B74D1-7159-404B-B1C4-184C977EAE78@nmrc.navy.mil> Message-ID: <459FA364.1020103@sendu.me.uk> Chris Fields wrote: > Looks like there are a few ways to go about doing that, actually. 
> According to Search::Result::ResultI POD, no_hits_found() is supposed > to work like: > > if( $result->no_hits_found() ) { # do something That is the method that should be used. no_hits_found can be false even when num_hits returns 0 thanks to certain options you might have supplied to filter out hits. From er at xs4all.nl Mon Jan 8 10:14:27 2007 From: er at xs4all.nl (Erik) Date: Mon, 8 Jan 2007 16:14:27 +0100 (CET) Subject: [Bioperl-l] Bioperl wiki 'working examples' - SeqIO HOWTO In-Reply-To: References: <1BF52B89-EA16-4FF0-8BFC-8D1EA3F7C91D@nmrc.navy.mil> <165B74D1-7159-404B-B1C4-184C977EAE78@nmrc.navy.mil> Message-ID: <20615.156.83.0.39.1168269267.squirrel@webmail.xs4all.nl> Hi all, Although I think the following change to the SeqIO HowTo is an improvement, I would like someone to confirm that. (I don't use these '>' and '<' prefixes myself) http://www.bioperl.org/w/index.php?title=HOWTO:SeqIO&curid=1488&diff=9268&oldid=9257 Thanks, Erikjan From bosborne11 at verizon.net Mon Jan 8 11:15:35 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 08 Jan 2007 11:15:35 -0500 Subject: [Bioperl-l] Bioperl wiki 'working examples' - SeqIO HOWTO In-Reply-To: <20615.156.83.0.39.1168269267.squirrel@webmail.xs4all.nl> Message-ID: Erik, I don't think I've ever seen "<$file" in any of the Bioperl documentation, all I've seen is "$file", personally I'd prefer the second, simpler version. Brian O. On 1/8/07 10:14 AM, "Erik" wrote: > Hi all, > > Although I think the following change to the SeqIO HowTo is an > improvement, I would like someone to confirm that. 
(I don't use these '>' > and '<' prefixes myself) > > http://www.bioperl.org/w/index.php?title=HOWTO:SeqIO&curid=1488&diff=9268&oldi > d=9257 > > > Thanks, > Erikjan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From er at xs4all.nl Mon Jan 8 11:30:33 2007 From: er at xs4all.nl (Erik) Date: Mon, 8 Jan 2007 17:30:33 +0100 (CET) Subject: [Bioperl-l] Bioperl wiki 'working examples' - SeqIO HOWTO In-Reply-To: References: Message-ID: <8534.156.83.0.39.1168273833.squirrel@webmail.xs4all.nl> Hi Brian I agree with you about leaving the >< prefix out, but in the code-comments just below this link: http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples it says: [...] # Although it is optional, it is good # programming practice to provide > and < in front of any # filenames provided in the -file parameter. This makes the # resulting filehandle created by SeqIO explicitly read (<) # or write(>). It will definitely help others reading your # code understand the function of the SeqIO object. [...] So it would be a little strange to do it differently in the same doc. Ah well. If 'my' change is not wrong, I'll leave it for now. The code example now works (there was another little bug). Thanks :) Erikjan > Erik, > > I don't think I've ever seen "<$file" in any of the Bioperl documentation, > all I've seen is "$file", personally I'd prefer the second, simpler > version. > > Brian O. > > > On 1/8/07 10:14 AM, "Erik" wrote: > >> Hi all, >> >> Although I think the following change to the SeqIO HowTo is an >> improvement, I would like someone to confirm that. 
(I don't use these >> '>' >> and '<' prefixes myself) >> >> http://www.bioperl.org/w/index.php?title=HOWTO:SeqIO&curid=1488&diff=9268&oldi >> d=9257 >> >> >> Thanks, >> Erikjan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From Kevin.M.Brown at asu.edu Mon Jan 8 11:25:57 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 8 Jan 2007 09:25:57 -0700 Subject: [Bioperl-l] Bioperl wiki 'working examples' - SeqIO HOWTO Message-ID: <1A4207F8295607498283FE9E93B775B4028B2DC1@EX02.asurite.ad.asu.edu> $file is read/write >$file is write only (overwrite) >>$file is write only (append) <$file is read only > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Brian Osborne > Sent: Monday, January 08, 2007 9:16 AM > To: Erik; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bioperl wiki 'working examples' - SeqIO HOWTO > > Erik, > > I don't think I've ever seen "<$file" in any of the Bioperl > documentation, > all I've seen is "$file", personally I'd prefer the second, > simpler version. > > Brian O. > > > On 1/8/07 10:14 AM, "Erik" wrote: > > > Hi all, > > > > Although I think the following change to the SeqIO HowTo is an > > improvement, I would like someone to confirm that. 
(I don't > use these '>' > > and '<' prefixes myself) > > > > > http://www.bioperl.org/w/index.php?title=HOWTO:SeqIO&curid=148 > 8&diff=9268&oldi > > d=9257 > > > > > > Thanks, > > Erikjan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Mon Jan 8 12:47:12 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 8 Jan 2007 10:47:12 -0700 Subject: [Bioperl-l] Graphics Panel Message-ID: <1A4207F8295607498283FE9E93B775B4028B2E0D@EX02.asurite.ad.asu.edu> I'm trying to create an image showing the alignment of CDS's to a chromosome in a genbank file that I retrieved from ncbi. The base code that I'm using came directly from the howto (http://www.bioperl.org/wiki/HOWTO:Graphics). Since the chromosome is fairly long (3Mbp in length) I'm trying to split the drawing up into sections and I'm running into some issues. The first issue is that no matter what I do with the -description option, I can't get the locus tag of the gene to show up properly. Instead I always get the description of what the gene does. I have verified that "locus_tag" does contain the information I'm after (BMA0001, BMA0002, etc...). The other issue is that even though I'm feeding the add_track method a new array for each division, only the first section even shows the genes in a track. Below are two of the scripts that I use to retrieve the genbank files and parse them. Attached is an example output with the issues I've tried to describe. This is with BioPerl 1.5.2_100 retrieved from CPAN today. This retrieves the genbank files for the organism that I'm mucking around with. 
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

my $gb = new Bio::DB::GenBank;
my @ids = ('NC_006348');

foreach my $id (@ids)
{
    my $entry = $gb->get_Seq_by_id($id);
    my $out = Bio::SeqIO->new(-file=>">$id.gb", -format=>'genbank');
    $out->write_seq($entry);
}

This script is for creating the image from the above genbank file.

# This script generates a PNG picture of a 10K region containing a
# set of red features and a set of blue features. Call it like this:
# red_and_blue.pl > redblue.png
# you can now view the picture with your favorite image application

# This script parses a GenBank or EMBL file named on the command
# line and produces a PNG rendering of it. Call it like this:
# biographics.pl NC_006348.gb NC_006348.png 10

use strict;
use Bio::Graphics;
use Bio::SeqIO;

my $file = shift or die "provide a sequence file as the argument\n";
my $output = shift or die "provide an output file for the image\n";
my $division = shift || 3;
my $io = Bio::SeqIO->new(-file=>$file) or die "could not create Bio::SeqIO";
my $seq = $io->next_seq or die "could not find a sequence in the file";
open my $out, ">$output" or die "could not open $output for writing\n";

my @features = $seq->all_SeqFeatures;

# sort features by their primary tags
my %sorted_features;
for my $f (@features) {
    my $tag = $f->primary_tag;
    if ($tag eq 'CDS')
    {
        push @{$sorted_features{int($f->start()/$seq->length()*$division)}},$f;
    }
}

foreach my $key (keys %sorted_features)
{
    print $key ."\t".@{$sorted_features{$key}}."\n";
}

my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);

my $panel = Bio::Graphics::Panel->new(
    -length    => $seq->length/$division,
    -key_style => 'between',
    -width     => 15000,
    -pad_left  => 10,
    -pad_right => 10,
);

for (my $i = 0; $i<$division;$i++)
{
    $panel->add_track($wholeseq,
        -glyph  => 'arrow',
        -bump   => 0,
        -double => 1,
        -tick   => 2,
        -start  => 1+$i*int($seq->length()/$division),
        -end    => ($i+1)*int($seq->length()/$division),
    );
    $panel->add_track($wholeseq,
        -glyph   => 'generic',
        -bgcolor => 'blue',
        -label   => 1,
        -offset  => $i*int($seq->length()/$division),
        -start   => 1+$i*int($seq->length()/$division),
        -end     => ($i+1)*int($seq->length()/$division),
    );

    # general case
    $panel->add_track(\@{$sorted_features{$i}},
        -glyph   => 'generic',
        -bgcolor => sub {
            my $feature = shift;
            my $idx;
            my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
            if ($feature->strand >= 0) {
                $idx = ($feature->start() - 1) % 3;
            } else {
                $idx = ($feature->end() - 1) % 3 + 3;
            }
            $colors[$idx];},
        -fgcolor     => 'black',
        -font2color  => 'red',
        -bump        => +1,
        -height      => 8,
        -description => \&generic_description,
        -offset      => 1+$i*int($seq->length()/$division),
        #-end => ($i+1)*int($seq->length()/$division),
    );
}

print $out $panel->png;
exit 0;

sub generic_description {
    my $feature = shift;
    my $description = $feature->get_tag_values("locus_tag");
    return "$description";
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NC_006348.png
Type: image/png
Size: 125121 bytes
Desc: NC_006348.png
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070108/41ea62ca/attachment-0001.png

From natg at shore.net Mon Jan 8 15:23:43 2007
From: natg at shore.net (Nathan (Nat) Goodman)
Date: Mon, 8 Jan 2007 12:23:43 -0800
Subject: [Bioperl-l] Auto-method caller proposal
Message-ID: <20070108202343.B24F23C0A53@heimdall.systemsbiology.net>

On Jan 3, 2007, at 1:09 PM, Sendu Bala wrote:

> I propose a method that sets method values based on user-supplied args
> to new()...

You might take a look at Class::AutoClass and its companion Class::AutoClass::Args (both available in CPAN) which do most of what you want. We use it extensively. It handles auto-generation of set and get methods but can also be used to pass arguments into programmer-coded methods. It's already used in BioPerl in Bio::Graph::SimpleGraph, a module that we provided for a specialized purpose (may be obsolete).

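To make the general idea concrete, here is a minimal plain-Perl sketch of auto-generated get/set methods that are then populated from the arguments to new(). It is not the Class::AutoClass API and not the code proposed in this thread; the package My::Wrapper, the helper _make_accessors and the parameter names program and evalue are all hypothetical.

```perl
use strict;
use warnings;

package My::Wrapper;    # hypothetical class, for illustration only

# Install one get/set method per name into the given package.
sub _make_accessors {
    my ($class, @names) = @_;
    for my $name (@names) {
        no strict 'refs';    # needed only to reach the symbol table by name
        *{"${class}::$name"} = sub {
            my $self = shift;
            $self->{$name} = shift if @_;
            return $self->{$name};
        };
    }
}

__PACKAGE__->_make_accessors(qw(program evalue));

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Accept BioPerl-style keys such as -evalue and call the method of that name.
    for my $key (keys %args) {
        (my $method = lc $key) =~ s/^-//;
        die "unknown argument '$key'\n" unless $self->can($method);
        $self->$method($args{$key});
    }
    return $self;
}

package main;

my $w = My::Wrapper->new(-program => 'blastp', -evalue => 0.001);
print $w->program, "\n";    # value set through new()
$w->evalue(10);             # the same method also works as a setter
print $w->evalue, "\n";
```

Rejecting unknown keys in new() keeps some of the debuggability that hand-written methods give, since a misspelled argument fails loudly instead of being stored silently.
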
Historical background: we offered AutoClass to BioPerl years ago, but it was shot down by senior members of the development community. If I recall correctly, there was concern about debuggability and understandability of code that relied on auto-generated methods and too much magic argument processing. These are reasonable concerns. Perhaps these issues should be aired again to make sure the key people agree with this direction before too much effort is spent Best, Nat From george.heller at yahoo.com Mon Jan 8 15:17:50 2007 From: george.heller at yahoo.com (George Heller) Date: Mon, 8 Jan 2007 12:17:50 -0800 (PST) Subject: [Bioperl-l] Error while running load_seqdatabase.pl Message-ID: <170605.54942.qm@web58912.mail.re1.yahoo.com> Hi all. I am new to Bioperl and am trying to run the load_seqdatabase.pl script to load sequence data from a file into Postgres database. I am invoking the script through the following command: perl load_seqdatabase.pl -host localhost -dbname biodb06 -format fasta -dbuser postgres -driver Pg I am getting the following error: -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENES HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","unknown" ,"","0","") FKs (1,) ERROR: duplicate key violates unique constraint "bioentry_accession_key" --------------------------------------------------- Could not store unknown: ------------- EXCEPTION ------------- MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_uni que_key: ERROR: current transaction is aborted, commands ignored until end of t ransaction block STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5 /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK 
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5 .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5. 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8. 5/Bio/DB/Persistent/PersistentObject.pm:271 STACK (eval) load_seqdatabase.pl:620 STACK toplevel load_seqdatabase.pl:602 -------------------------------------- at load_seqdatabase.pl line 633 Can anyone tell me how I can correct this error and get my script running? Thanks!!! George. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From torsten.seemann at infotech.monash.edu.au Mon Jan 8 18:59:57 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 9 Jan 2007 10:59:57 +1100 Subject: [Bioperl-l] Graphics Panel In-Reply-To: <1A4207F8295607498283FE9E93B775B4028B2E0D@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B4028B2E0D@EX02.asurite.ad.asu.edu> Message-ID: Kevin, > matter what I do with the -description option, I can't get the locus tag > of the gene to show up properly. Instead I always get the description > > sub generic_description { > my $feature = shift; > my $description = $feature->get_tag_values("locus_tag"); > return "$description"; > } I don't know if this is causing your problem, but be aware that get_tag_values() is meant to be used in array context to return all the tag values (there could be more than one). Because you are using it in a scalar context, it would be returning the array length (probably "1"). 
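The scalar-versus-list behaviour described above can be seen without BioPerl. In this sketch get_values is only a stand-in that, as assumed for get_tag_values(), returns an array of values; the value 'BMA0001' is taken from the original post.

```perl
use strict;
use warnings;

# Stand-in for a method that returns all values of a tag as an array.
sub get_values {
    my @values = ('BMA0001');
    return @values;
}

my $count   = get_values();    # scalar context: the number of values
my ($first) = get_values();    # list context: the first value

print "scalar context: $count\n";    # prints "scalar context: 1"
print "list context:   $first\n";    # prints "list context:   BMA0001"
```

The parentheses around the variable on the left-hand side are what switch the call to list context.
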
If you know there is only one locus_tag (most likely), then write: my ($description) = $feature->get_tag_values("locus_tag"); $description ||= 'no locus tag'; --Torsten From george.heller at yahoo.com Mon Jan 8 22:27:33 2007 From: george.heller at yahoo.com (George Heller) Date: Mon, 8 Jan 2007 19:27:33 -0800 (PST) Subject: [Bioperl-l] Error while running load_seqdatabase.pl Message-ID: <20070109032733.40941.qmail@web58903.mail.re1.yahoo.com> Hi all. I am new to Bioperl and am trying to run the load_seqdatabase.pl script to load sequence data from a file into Postgres database. I am invoking the script through the following command: perl load_seqdatabase.pl -host localhost -dbname biodb06 -format fasta -dbuser postgres -driver Pg I have already loaded the taxon data using the load_ncbi_taxonomy.pl script. But while running the load_seqdatabase.pl, I am getting the following error: Loading maize_pep.fasta ... -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","unknown","","0","") FKs (1,) ERROR: duplicate key violates unique constraint "bioentry_accession_key" --------------------------------------------------- Could not store unknown: ------------- EXCEPTION ------------- MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) load_seqdatabase.pl:620
STACK toplevel load_seqdatabase.pl:602
--------------------------------------
at load_seqdatabase.pl line 633

Can anyone tell me how I can correct this error and get my script running? Thanks!!!

George.

From hlapp at gmx.net Mon Jan 8 23:11:38 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 8 Jan 2007 23:11:38 -0500
Subject: [Bioperl-l] Error while running load_seqdatabase.pl
In-Reply-To: <170605.54942.qm@web58912.mail.re1.yahoo.com>
References: <170605.54942.qm@web58912.mail.re1.yahoo.com>
Message-ID: <3AB417DD-6943-420D-AB2C-F95C54628832@gmx.net>

George,

this is almost certainly caused by using FASTA format and Bioperl's treatment of it. I am guilty of not having written a FAQ yet for Bioperl-db, as this would certainly be there.

Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl uses Bioperl to parse sequence files) does not extract the accession number from the description line of the fasta sequence, and instead sets the accession_number property of the sequence objects it creates to "unknown". Since there is a unique key constraint on (accession,version,namespace), the second sequence loaded will raise an exception as it will violate the constraint.

The simplest way to deal with this is to write a SeqProcessor that massages the accession_number appropriately and then supply the module to load_seqdatabase.pl using the --pipeline command line switch. There are several examples for how to do this in the email archives.
See for example this thread on the Biosql list: http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html with two links to examples, and Marc Logghe gives another one in the thread itself. Hth, -hilmar On Jan 8, 2007, at 3:17 PM, George Heller wrote: > Hi all. > > I am new to Bioperl and am trying to run the load_seqdatabase.pl > script to load sequence data from a file into Postgres database. I > am invoking the script through the following command: > > perl load_seqdatabase.pl -host localhost -dbname biodb06 -format > fasta > -dbuser postgres -driver Pg > > I am getting the following error: > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were ("FGENES > HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > 1","unknown" > ,"","0","") FKs (1,) > ERROR: duplicate key violates unique constraint > "bioentry_accession_key" > --------------------------------------------------- > Could not store unknown: > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_uni > que_key: ERROR: current transaction is aborted, commands ignored > until end of t > ransaction block > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ > lib/perl > 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > usr/lib/perl5 > /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > perl5/site_perl/5 > .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/ > site_perl/5. > 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ > site_perl/5.8. 
> 5/Bio/DB/Persistent/PersistentObject.pm:271
> STACK (eval) load_seqdatabase.pl:620
> STACK toplevel load_seqdatabase.pl:602
> --------------------------------------
> at load_seqdatabase.pl line 633
>
> Can anyone tell me how I can correct this error and get my script
> running? Thanks!!!
>
> George.

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From himanshu.ardawatia at bccs.uib.no Tue Jan 9 06:26:22 2007
From: himanshu.ardawatia at bccs.uib.no (Himanshu Ardawatia)
Date: Tue, 9 Jan 2007 04:26:22 -0700
Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes
Message-ID: <62d36e2b0701090326g733893d4u3e38f448b5a1dc72@mail.gmail.com>

Hi,

I am trying the available bioperl script (attached below), and as I see it seems to give (or not give at all) strange results:

Input tree:
__DATA__
(a,((c,d)z,(e,f)y)x)root;

Output :
Use of uninitialized value in pattern match (m//) at try_tree_new.pl line 9.
lca is x for c,d,f
Use of uninitialized value in pattern match (m//) at try_tree_new.pl line 18.
Use of uninitialized value in print at try_tree_new.pl line 25.
lca is for a,z

However, here we can see that actually, 'lca' for 'e' and 'f' should be 'y' and 'lca' for 'c' and 'd' should be 'z' .

In another case, if my input tree is :
__DATA__
(a,((c,d)D0L0=0+0,(e,f)D0L1=1+0)D0L2=1+1)D1L0=0+0;
(where I have replaced 'x', 'y', 'z' and 'root' internal nodes with some other values which are important for me)

I get the Output:
Use of uninitialized value in pattern match (m//) at try_tree_new.pl line 9.
lca is D0L2=1+1 for c,d,f
Use of uninitialized value in pattern match (m//) at try_tree_new.pl line 18.
lca is a for a

Here the last line is changed ' lca is a for a' as compared to the previous result.

I wonder why this change.....

If I use a completely different tree :

Input:
__DATA__
((48355,(21337,65453)D0L0=0+0)D0L1=1+0,(38243,18116)D0L2=1+1)D1L0=0+0;

I do not get any result in the output at all...

Output :
Use of uninitialized value in pattern match (m//) at try_tree_new.pl line 9.
Can't call method "id" on an undefined value at try_tree_new.pl line 17.

Can anyone suggest why these differences and how can I obtain 'internal node ids' (eg. in this case 'D1L0=0+0' etc.) for each leaf separately ?

Thanks
Himanshu

Script:

#!/usr/bin/perl -w
use strict;
use Bio::TreeIO;

my $tree = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA)->next_tree;

my @nodes = grep { $_->id =~ /c|d|f/ } $tree->get_nodes;
my @orig = @nodes;
while( @nodes > 1 ) {
    my $lca = $tree->get_lca(-nodes => [shift @nodes, shift @nodes]);
    push @nodes, $lca;
}
my $lca = shift @nodes;
print "lca is ",$lca->id, " for ", join(",",map { $_->id } @orig), "\n";

@nodes = grep { $_->id =~ /a|z/ } $tree->get_nodes;
@orig = @nodes;
while( @nodes > 1 ) {
    my $lca = $tree->get_lca(-nodes => [shift @nodes, shift @nodes]);
    push @nodes, $lca;
}
$lca = shift @nodes;
print "lca is ",$lca->id, " for ", join(",",map { $_->id } @orig), "\n";

__DATA__

From bix at sendu.me.uk Tue Jan 9 06:38:11 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 09 Jan 2007 11:38:11 +0000
Subject: [Bioperl-l] Auto-method caller proposal
In-Reply-To: <20070108202343.B24F23C0A53@heimdall.systemsbiology.net>
References: <20070108202343.B24F23C0A53@heimdall.systemsbiology.net>
Message-ID: <45A37EA3.1030503@sendu.me.uk>

Nathan (Nat) Goodman wrote:
> On Jan 3, 2007, at 1:09 PM, Sendu Bala wrote:
>
>> I propose a method that sets method values based on user-supplied args
>> to new()...
> > You might take a look at Class::AutoClass and its companion > Class::AutoClass::Args (both available in CPAN) which do most of what you > want. It does look really good and is certainly featurefull. Before investigating it I already had to add a synonym handling scheme to my proposed code. > We use it extensively. It handles auto-generation of set and get > methods but can also be used to pass arguments into programmer-coded > methods. It's already used in BioPerl in Bio::Graph::SimpleGraph, a module > that we provided for a specialized purpose (may be obsolete). My concern is that Bio::Graph::SimpleGraph doesn't inherit from Bio::Root::Root. How difficult would that have been? I'm also concerned that it would be too difficult to change things over to using Class::AutoClass if _init_self() methods have to be added all over the place. Alternatively, would the system work if only Bio::Root::RootI was based on Class::AutoClass? What would the necessary code be for changes in RootI and then changes in an existing run-wrapper for example? > Historical background: we offered AutoClass to BioPerl years ago, but it was > shot down by senior members of the development community. If I recall > correctly, there was concern about debuggability and understandability of > code that relied on auto-generated methods and too much magic argument > processing. These are reasonable concerns. Perhaps these issues should be > aired again to make sure the key people agree with this direction before too > much effort is spent Personally I'm in favour of not having auto-generated methods except as a last resort, but that last resort does need to be there or it becomes simply too depressing to write run-wrappers. I'm not sure I like the idea of purely 'magic' argument processing, preferring an explicit method call which makes it clear what you have chosen to do. But perhaps that could be implemented in a RootI method using Class::AutoClass? 
If so, can you offer some example code of how that might be done? From bix at sendu.me.uk Tue Jan 9 06:49:49 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 09 Jan 2007 11:49:49 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <145293A0-3BA0-4D03-A623-D583327282E1@uiuc.edu> References: <459BF156.2020702@sendu.me.uk> <459CDAD9.5040101@sendu.me.uk> <145293A0-3BA0-4D03-A623-D583327282E1@uiuc.edu> Message-ID: <45A3815D.4020107@sendu.me.uk> Chris Fields wrote: > The idea is one could add in customized code fairly easily using this > (very useful, BTW) method, or a similar one. For example, I could > replace code defined in a dispatch table with my own on the fly by > passing in new code as an hash ref: > > # customized subs my %subs = ( 'ids' => '# modified id() for > customized ID retrieval, not std get/set', 'score' => '# modified > score() code using myparam() and mydata()' ); > > # customized user-params (get/sets) my @params = (qw(myparam > mydata)); > > # pass into similar method to _set_from_args() # using parameters and > hash/array refs I'm not sure I understand. Can you offer a specific example of this sort of thing being used? From what I initially understood of your idea, in class Bio::MyNewClass you write out a sub as a string and associate it with a key like 'bar' => 'my $self=shift; my $bar = $shift if @_; return ($self->foo)*$bar if $bar;' and in some way supply that pair to something such that when bar() is called by the user, the string is evaluated and used as the method. In what circumstance does this provide any benefit (to the author or the user) over defining the following sub explicitly in Bio::MyNewClass ? sub bar { my $self=shift; my $bar = $shift if @_; return ($self->foo)*$bar if $bar; } > BTW, don't know if this is possible, but can you get around 'no > strict "refs"' by building the sub code as a string and using an > eval? 
I did something like this in the EUtilities BEGIN block using > heredoc, something picked up from Brian's Bio::DB::Query::GenBank. The reference in question is the method name, not the method code. From bix at sendu.me.uk Tue Jan 9 07:38:26 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 09 Jan 2007 12:38:26 +0000 Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes In-Reply-To: <62d36e2b0701090326g733893d4u3e38f448b5a1dc72@mail.gmail.com> References: <62d36e2b0701090326g733893d4u3e38f448b5a1dc72@mail.gmail.com> Message-ID: <45A38CC2.5020908@sendu.me.uk> Himanshu Ardawatia wrote: > Hi, > > I am trying the available bioperl script (attached below), and as I > see it seems to give (or not give at all) strange results: What version of Bioperl are you using? With 1.5.2 I don't see any problems except user-error. I've noted the results I get below. > Input tree: __DATA__ (a,((c,d)z,(e,f)y)x)root; > > Output : Use of uninitialized value in pattern match (m//) at > try_tree_new.pl line 9. lca is x for c,d,f Use of uninitialized value > in pattern match (m//) at try_tree_new.pl line 18. Use of > uninitialized value in print at try_tree_new.pl line 25. lca is for > a,z lca is x for c,d,f lca is root for a,z > However, here we can see that actually, 'lca' for 'e' and 'f' should > be 'y' and 'lca' for 'c' and 'd' should be 'z' . What do you mean? Your script asks the lca of c,d,f and a,z, not e,f or c,d. > In another case, if my input tree is : __DATA__ > (a,((c,d)D0L0=0+0,(e,f)D0L1=1+0)D0L2=1+1)D1L0=0+0; (where I have > replaced 'x', 'y', 'z' and 'root' internal nodes with some other > values which are important for me) > > I get the Output: Use of uninitialized value in pattern match (m//) > at try_tree_new.pl line 9. lca is D0L2=1+1 for c,d,f Use of > uninitialized value in pattern match (m//) at try_tree_new.pl line > 18. 
lca is a for a lca is D0L2=1+1 for c,d,f lca is a for a > Here the last line is changed ' lca is a for a' as comapred to the > previous result. > > I wonder why this change..... Your script asks for the lca of a,z but the tree no longer has a node 'z'. So you ask for the lca of 'a', which is 'a'. > If a use a completely different tree : > > Input: __DATA_ > ((48355,(21337,65453)D0L0=0+0)D0L1=1+0,(38243,18116)D0L2=1+1)D1L0=0+0; > > > I do not get any result in the output at all... > > Output : Use of uninitialized value in pattern match (m//) at > try_tree_new.pl line 9. Can't call method "id" on an undefined value > at try_tree_new.pl line 17. Can't call method "id" on an undefined value at try_tree_new.pl line 14. > Can anyone suggest why these differences Again, your script is asking for nodes that aren't in the tree, so of course it isn't going to work. > and how can I obtain 'internal node ids' (eg. in this case 'D1L0=0+0' > etc. for each leaf seperately ? You'll want to use the methods get_leaf_nodes() and internal_id() (though you almost certainly don't really want the internal id and should be using the human-readable id id() instead). 
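The two symptoms discussed above (the "uninitialized value in pattern match" warning, and a node list that silently shrinks when a requested id is not in the tree) can be reproduced in plain Perl without a tree at all. What follows is only a sketch: `select_ids` is a hypothetical helper, not a BioPerl method, and the id list is a stand-in for what the tree's nodes would report, with `undef` assumed for a node that has no id.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the ids that exactly
# match one of the wanted names. Undefined ids are skipped, so no
# "uninitialized value" warning is raised, and exact matching avoids the
# accidental hits an unanchored regex such as /a|z/ could produce.
sub select_ids {
    my ($wanted, @ids) = @_;
    my %want = map { $_ => 1 } @$wanted;
    return grep { defined $_ && $want{$_} } @ids;
}

# Stand-in for the node ids of the second tree in the thread; the trailing
# undef is an assumption made for illustration (a node without an id).
my @ids = ('a', 'c', 'd', 'e', 'f',
           'D0L0=0+0', 'D0L1=1+0', 'D0L2=1+1', 'D1L0=0+0', undef);

my @found = select_ids([qw(a z)], @ids);
print "asked for a,z; found: ", join(",", @found), "\n";
```

Only `a` is found because no node is called `z` any more, which is consistent with the "lca is a for a" output reported above.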
In Bioperl 1.5.2 your code can be written much more simply:

#!/usr/bin/perl -w
use strict;
use Bio::TreeIO;

my $tree = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA)->next_tree;

my @nodes = grep { $_->id =~ /c|d|f/ } $tree->get_nodes;
my $lca = $tree->get_lca(@nodes);
print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n";

@nodes = grep { $_->id =~ /a|z/ } $tree->get_nodes;
$lca = $tree->get_lca(@nodes);
print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n";

__DATA__
(a,((c,d)z,(e,f)y)x)root;

From lidaof at gmail.com Tue Jan 9 09:38:59 2007
From: lidaof at gmail.com (lidaof)
Date: Tue, 9 Jan 2007 22:38:59 +0800
Subject: [Bioperl-l] problem with Bio::SearchIO::Writer
Message-ID: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com>

Hi,

I am confused by the output of the blast result. I use Bio::SearchIO to analyze the blast result, and I don't know how to use Bio::SearchIO::Writer to output the result I want as a text file. The documentation of Bio::SearchIO::Writer is too brief for me. Can anyone show me some examples of outputting a blast result as a text file using Bio::SearchIO::Writer or another module?

Any reply will be appreciated! Thanks in advance for your time :)

--
Li

From natg at shore.net Tue Jan 9 09:54:49 2007
From: natg at shore.net (Nathan (Nat) Goodman)
Date: Tue, 9 Jan 2007 06:54:49 -0800
Subject: [Bioperl-l] Auto-method caller proposal
In-Reply-To: <45A37EA3.1030503@sendu.me.uk>
Message-ID: <003801c733fe$2ebcbfa0$3300a8c0@goodmandesktop>

>> On Jan 3, 2007, at 1:09 PM, Sendu Bala wrote:
>>
>>> I propose a method that sets method values based on user-supplied
>>> args to new()...
>>
>> You might take a look at Class::AutoClass and its companion
>> Class::AutoClass::Args (both available in CPAN) which do most of what
>> you want.
>
> My concern is that Bio::Graph::SimpleGraph doesn't inherit from
> Bio::Root::Root. How difficult would that have been?

Inheriting from Bio::Root::Root should work fine.
The code handles multiple inheritance properly. It is also designed to co-exist with non-AutoClass base classes (by which I mean super-classes that don't inherit from AutoClass), such that the auto-initialization stuff is only attempted for classes that inherit from AutoClass, > I'm also > concerned that it would be too difficult to change things over to > using Class::AutoClass if _init_self() methods have to be added all > over the place. _init_self is only needed for classes that (1) inherit from AutoClass and (2) that need to do non-standard initialization. See design notes, below. > Alternatively, would the system work if only Bio::Root::RootI was > based on Class::AutoClass? What would the necessary code be for > changes in RootI and then changes in an existing run-wrapper for > example? Yes, it should work for Bio::Root::RootI to inherit from AutoClass. This, however, would impose AutoClass on every BioPerl-er whether they want it or not :) Design notes: 1) The reason we do the _init_self thing, rather than doing this via 'new', is to reduce the opportunities for programmers to mess up the auto-initialization. For auto-initialization to work as expected, __every__ class has to do it. So, if you let programmers mess with 'new', you have to be confident that they will insert the call to 'auto_init' or whatever at the top of their code. 2) Another concern is multiple inheritance. Out of the box, Perl does not do initialization correctly in the presence of multiple inheritance. AutoClass::new does it correctly using a method explained in Paul Fenwick's tutorial on object-oriented Perl. Again, if you let programmers mess with 'new', you run the risk of them neglecting this detail. We find that mutliple inheritance is common in modules that are 'grafted' onto class hierarchies like BioPerl -- eg, to add application specific behavior to BioPerl classes. This said, it would be easy to refine the AutoClass design to let programmers contol the operation of 'new'. 
We'd be happy to do this or assist others in doing this if folks want to go in this direction. > Personally I'm in favour of not having auto-generated methods except > as a last resort, but that last resort does need to be there or it > becomes simply too depressing to write run-wrappers. As I said, we use AutoClass extensively. Once you get used to auto-generated methods, it's hard to go back! > I'm not sure I like the idea of purely 'magic' argument processing, > preferring an explicit method call which makes it clear what you have > chosen to do. But perhaps that could be implemented in a RootI method > using Class::AutoClass? If so, can you offer some example code of how > that might be done? Argument parsing is handled by the companion module Class::AutoClass::Args. Here's an example of what you want. Assume that $self is the object being intialized, @_ is the parameter list, and you run to run the set methods for 'attributes' name, sex, and hobbies: my $args=new Class::AutoClass::Args(@_); $self->set_attributes([qw(name sex hobbies)],$args); Easy! But I remind you this only handles one level of initialization. If you want to handle inheritance, and esp. multiple inheritance correctly, more code is needed. Also note that this approach requires that every new attribute be manually added to the initialization code which is easy to forget. If this is what you want, the relevant pieces could be pulled out of AutoClass and Args and added to Bio::Root::RootI with little trouble. 
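As a rough illustration of the explicit, one-level style of initialization described above, here is a minimal plain-Perl sketch. It deliberately does not use Class::AutoClass or Class::AutoClass::Args: the package `My::Person` and the helper `set_from_args` are hypothetical names invented for this example, and inheritance is not handled at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Person;    # hypothetical class, for illustration only

sub new {
    my ($class, @args) = @_;
    my $self = bless {}, $class;
    # Explicit call: only the attributes listed here are initialized, so a
    # newly added attribute must also be added to this list by hand.
    $self->set_from_args([qw(name sex hobbies)], @args);
    return $self;
}

# Hypothetical helper: normalize keys such as -name or NAME to 'name', then
# call the matching set method for each listed attribute that was supplied.
sub set_from_args {
    my ($self, $attributes, %args) = @_;
    my %norm;
    while (my ($key, $value) = each %args) {
        (my $k = lc $key) =~ s/^-+//;
        $norm{$k} = $value;
    }
    for my $attr (@$attributes) {
        $self->$attr($norm{$attr}) if exists $norm{$attr};
    }
    return $self;
}

# Hand-written get/set methods; nothing is auto-generated in this sketch.
sub name    { my $self = shift; $self->{name}    = shift if @_; return $self->{name} }
sub sex     { my $self = shift; $self->{sex}     = shift if @_; return $self->{sex} }
sub hobbies { my $self = shift; $self->{hobbies} = shift if @_; return $self->{hobbies} }

package main;

my $p = My::Person->new(-name => 'Joe', -hobbies => ['chess']);
print $p->name, " has ", scalar @{ $p->hobbies }, " hobby\n";
```

The attribute list in `new` is exactly the part that is easy to forget to update, which is the maintenance cost mentioned above.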
Best, Nat From bix at sendu.me.uk Tue Jan 9 10:12:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 09 Jan 2007 15:12:39 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <003801c733fe$2ebcbfa0$3300a8c0@goodmandesktop> References: <003801c733fe$2ebcbfa0$3300a8c0@goodmandesktop> Message-ID: <45A3B0E7.5090407@sendu.me.uk> Nathan (Nat) Goodman wrote: >>> On Jan 3, 2007, at 1:09 PM, Sendu Bala wrote: > >> Alternatively, would the system work if only Bio::Root::RootI was >> based on Class::AutoClass? What would the necessary code be for >> changes in RootI and then changes in an existing run-wrapper for >> example? > > Yes, it should work for Bio::Root::RootI to inherit from AutoClass. > This, however, would impose AutoClass on every BioPerl-er whether they > want it or not :) As a generally useful thing I'd like to see all Bioperl modules have easy access to this functionality, just as they can currently call _rearrange(). So yes, we would impose AutoClass on everyone. This may not be a major burden since it is already an optional pre-requisite. So, before I investigate using Class::AutoClass instead of my own proposed method, does anyone feel there are good benefits of Class::AutoClass over my proposed method (considering I already added a -synonym option to it), and do those benefits outweigh having Class::AutoClass as a Bioperl installation requirement? From Kevin.M.Brown at asu.edu Tue Jan 9 10:32:04 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 9 Jan 2007 08:32:04 -0700 Subject: [Bioperl-l] Graphics Panel Message-ID: <1A4207F8295607498283FE9E93B775B4028B2F62@EX02.asurite.ad.asu.edu> > > matter what I do with the -description option, I can't get > > the locus tag > > of the gene to show up properly. 
Instead I always get the > > description > > > > sub generic_description { > > my $feature = shift; > > my $description = $feature->get_tag_values("locus_tag"); > > return "$description"; > > } > > I don't know if this is causing your problem, but be aware that > get_tag_values() is meant to be used in array context to return all > the tag values (there could be more than one). Because you are using > it in a scalar context, it would be returning the array length > (probably "1"). If you know there is only one locus_tag (most likely), > then write: > > my ($description) = $feature->get_tag_values("locus_tag"); > $description ||= 'no locus tag'; Hmm, that is odd. I use that very call in the other script that retrieves the GenBank files (code not shown) and it returns the value I was expecting. It turns out that after the features get dumped into the graphics panel, they lose a number of tag values. I'm not sure why that is happening. Basically if I do print $feature->get_tag_values("locus_tag"); before I push it into add_track I do see the string I want returned from it as expected, but once I send it off to add_track and it launches the sub to get the tag, it is no longer there. The other interesting thing is, I run this script with the same input file each time and randomly I get an error that it can't find the method get_tag_values in a Bio::Location::Simple object, but I never pass it Bio::Location::Simple. The array is full of Bio::SeqFeature::Generic objects which I have verified by printing out the object reference type before putting it in the array of items I'm after. 
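For reference, the list-versus-scalar context behaviour described in the quoted advice can be seen in isolation with a few lines of plain Perl. This is a sketch under an assumption: `tag_values` is a hypothetical stand-in that returns an array, as the advice implies get_tag_values() does, and it is not the BioPerl method itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns all values of a tag by
# returning an array (there could be more than one value).
sub tag_values {
    my @values = ('b0001');
    return @values;
}

my $count   = tag_values();    # scalar context: the number of elements
my ($first) = tag_values();    # list context: the first element

print "scalar context: $count\n";
print "list context:   $first\n";
```

The parentheses around `$first` are what force list context; without them the assignment receives the element count.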
From cjfields at uiuc.edu Tue Jan 9 11:19:17 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 9 Jan 2007 10:19:17 -0600 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <45A3815D.4020107@sendu.me.uk> References: <459BF156.2020702@sendu.me.uk> <459CDAD9.5040101@sendu.me.uk> <145293A0-3BA0-4D03-A623-D583327282E1@uiuc.edu> <45A3815D.4020107@sendu.me.uk> Message-ID: <7B878161-C7BE-4F94-B6DF-077C92967101@uiuc.edu> On Jan 9, 2007, at 5:49 AM, Sendu Bala wrote: > Chris Fields wrote: >> The idea is one could add in customized code fairly easily using this >> (very useful, BTW) method, or a similar one. For example, I could >> replace code defined in a dispatch table with my own on the fly by >> passing in new code as an hash ref: >> >> # customized subs my %subs = ( 'ids' => '# modified id() for >> customized ID retrieval, not std get/set', 'score' => '# modified >> score() code using myparam() and mydata()' ); >> >> # customized user-params (get/sets) my @params = (qw(myparam >> mydata)); >> >> # pass into similar method to _set_from_args() # using parameters and >> hash/array refs > > I'm not sure I understand. Can you offer a specific example of this > sort > of thing being used? There are a few examples in Higher Order Perl (and maybe Advanced Perl, but I can't remember). These are really dispatch tables for config files. > From what I initially understood of your idea, in class > Bio::MyNewClass > you write out a sub as a string and associate it with a key like > > 'bar' => 'my $self=shift; my $bar = $shift if @_; return > ($self->foo)*$bar if $bar;' > > and in some way supply that pair to something such that when bar() is > called by the user, the string is evaluated and used as the method. In > what circumstance does this provide any benefit (to the author or the > user) over defining the following sub explicitly in Bio::MyNewClass ? 
>
> sub bar {
>     my $self=shift;
>     my $bar = $shift if @_;
>     return ($self->foo)*$bar if $bar;
> }

Don't read too much into my posted example. I'm basically just using that to demonstrate what could be done w/o having to define a brand-new class or make modifications to the Bioperl code directly. As I said before, there are much simpler ways to go about it (including your suggestion and the previous suggestions by Aaron and Nat using Class* methods), so it's probably more trouble than it's worth to even worry about.

>> BTW, don't know if this is possible, but can you get around 'no
>> strict "refs"' by building the sub code as a string and using an
>> eval? I did something like this in the EUtilities BEGIN block using
>> heredoc, something picked up from Brian's Bio::DB::Query::GenBank.
>
> The reference in question is the method name, not the method code.

Now that you point it out, yes; I went back and looked again. I was thinking of something like the heredoc version I used in the past:

our @METHODS = qw(foo bar);
for my $method (@METHODS) {
    eval <<END;
sub $method {
    my \$self = shift;
    \$self->{'_$method'} = shift if \@_;
    return \$self->{'_$method'};
}
END
}

chris

From natg at shore.net Tue Jan 9 11:35:04 2007
From: natg at shore.net (Nathan (Nat) Goodman)
Date: Tue, 9 Jan 2007 08:35:04 -0800
Subject: [Bioperl-l] Auto-method caller proposal
In-Reply-To: <45A3B0E7.5090407@sendu.me.uk>
Message-ID: <004901c7340c$2fb23ad0$3300a8c0@goodmandesktop>

> Sendu Bala wrote:
> As a generally useful thing I'd like to see all Bioperl modules have
> easy access to this functionality, just as they can currently call
> _rearrange(). So yes, we would impose AutoClass on everyone. This may
> not be a major burden since it is already an optional
> pre-requisite.
So, before I investigate using Class::AutoClass instead > of my own proposed method, does anyone feel there are good benefits of > Class::AutoClass over my proposed method (considering I already added > a -synonym option to it), and do those benefits outweigh having > Class::AutoClass as a Bioperl installation requirement? Just to be clear. Based on my understanding of your requirements, I appears you don't really want the full-blown auto-generation and auto-initialization capabilities of AutoClass. My suggestion is that you look at the code in AutoClass and its companion Args handler, and extract any pieces that are relevant to what you're trying to do. I'd be happy to help with this. If I've mis-understood your requirements and you do want the full AutoClass capabilities, I'd be happy to assist with adapting it to fit the BioPerl framework. A lot depends on whether you need the initialization to include things like default values and to work correctly in the presence of inheritance. These requirements really complicate the picture. Best, Nat From lstein at cshl.edu Tue Jan 9 11:50:26 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 9 Jan 2007 11:50:26 -0500 Subject: [Bioperl-l] Graphics Panel In-Reply-To: <1A4207F8295607498283FE9E93B775B4028B2F62@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B4028B2F62@EX02.asurite.ad.asu.edu> Message-ID: <6dce9a0b0701090850m539886d4pc3dd0ac771d08b81@mail.gmail.com> Hi, I hope this isn't a bug. Could you try the following (using a single feature to make life easier): Before you add the feature to the track do: warn "BEFORE: feature=$feature, values=",$feature->get_tag_values("locus_tag"); Then in the callback, do: sub { my $feature = shift; warn "AFTER: feature=$feature, values=",$feature->get_tag_values("locus_tag"); return join '; ',$feature->get_tag_values("locus_tag"); } Hopefully, the feature will be the SAME one each time and will return the same list of tag values. 
Lincoln On 1/9/07, Kevin Brown wrote: > > > > matter what I do with the -description option, I can't get > > > the locus tag > > > of the gene to show up properly. Instead I always get the > > > description > > > > > > sub generic_description { > > > my $feature = shift; > > > my $description = $feature->get_tag_values("locus_tag"); > > > return "$description"; > > > } > > > > I don't know if this is causing your problem, but be aware that > > get_tag_values() is meant to be used in array context to return all > > the tag values (there could be more than one). Because you are using > > it in a scalar context, it would be returning the array length > > (probably "1"). If you know there is only one locus_tag (most likely), > > then write: > > > > my ($description) = $feature->get_tag_values("locus_tag"); > > $description ||= 'no locus tag'; > > Hmm, that is odd. I use that very call in the other script that > retrieves the GenBank files (code not shown) and it returns the value I > was expecting. It turns out that after the features get dumped into the > graphics panel, they lose a number of tag values. I'm not sure why that > is happening. Basically if I do > print $feature->get_tag_values("locus_tag"); > before I push it into add_track I do see the string I want returned from > it as expected, but once I send it off to add_track and it launches the > sub to get the tag, it is no longer there. > > The other interesting thing is, I run this script with the same input > file each time and randomly I get an error that it can't find the method > get_tag_values in a Bio::Location::Simple object, but I never pass it > Bio::Location::Simple. The array is full of Bio::SeqFeature::Generic > objects which I have verified by printing out the object reference type > before putting it in the array of items I'm after. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Tue Jan 9 11:50:52 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 09 Jan 2007 16:50:52 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <004901c7340c$2fb23ad0$3300a8c0@goodmandesktop> References: <004901c7340c$2fb23ad0$3300a8c0@goodmandesktop> Message-ID: <45A3C7EC.9030705@sendu.me.uk> Nathan (Nat) Goodman wrote: >> Sendu Bala wrote: >> As a generally useful thing I'd like to see all Bioperl modules have >> easy access to this functionality, just as they can currently call >> _rearrange(). So yes, we would impose AutoClass on everyone. This may >> not be a major burden since it is already an optional >> pre-requisite. So, before I investigate using Class::AutoClass instead >> of my own proposed method, does anyone feel there are good benefits of >> Class::AutoClass over my proposed method (considering I already added >> a -synonym option to it), and do those benefits outweigh having >> Class::AutoClass as a Bioperl installation requirement? > > Just to be clear. Based on my understanding of your requirements, I appears > you don't really want the full-blown auto-generation and auto-initialization > capabilities of AutoClass. My suggestion is that you look at the code in > AutoClass and its companion Args handler, and extract any pieces that are > relevant to what you're trying to do. I'd be happy to help with this. As far as I know, my proposed method already does everything I want it to do, so there wouldn't be any need to 'extract any pieces' from AutoClass... 
> If I've mis-understood your requirements and you do want the full AutoClass > capabilities, I'd be happy to assist with adapting it to fit the BioPerl > framework. > > A lot depends on whether you need the initialization to include things like > default values and to work correctly in the presence of inheritance. These > requirements really complicate the picture. ...So yes, if AutoClass were to be used it would be to take advantage of the whole thing. My current question is simply, "does AutoClass offer things we want and 'should' take advantage of, that my proposed method does not offer"? My preference at the moment is to use my proposed method, if only because I'm lazy. I want someone to tell me there is significant value in using AutoClass before I spend time on it. Alternatively Nat, if you could provide AutoClass-using code that does what my method does but in a better way or with more features, in a form that I can commit to the Bioperl repository and immediately start taking advantage of in new Bioperl modules, please do so. From cjfields at uiuc.edu Tue Jan 9 12:29:06 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 9 Jan 2007 11:29:06 -0600 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <45A3B0E7.5090407@sendu.me.uk> References: <003801c733fe$2ebcbfa0$3300a8c0@goodmandesktop> <45A3B0E7.5090407@sendu.me.uk> Message-ID: <94EB4115-9338-4CA7-A628-3B6F341AA1C7@uiuc.edu> On Jan 9, 2007, at 9:12 AM, Sendu Bala wrote: > Nathan (Nat) Goodman wrote: >>>> On Jan 3, 2007, at 1:09 PM, Sendu Bala wrote: >> >>> Alternatively, would the system work if only Bio::Root::RootI was >>> based on Class::AutoClass? What would the necessary code be for >>> changes in RootI and then changes in an existing run-wrapper for >>> example? >> >> Yes, it should work for Bio::Root::RootI to inherit from AutoClass. 
>> This, however, would impose AutoClass on every BioPerl-er whether >> they >> want it or not :) > > As a generally useful thing I'd like to see all Bioperl modules have > easy access to this functionality, just as they can currently call > _rearrange(). So yes, we would impose AutoClass on everyone. This may > not be a major burden since it is already an optional pre- > requisite. So, > before I investigate using Class::AutoClass instead of my own proposed > method, does anyone feel there are good benefits of Class::AutoClass > over my proposed method (considering I already added a -synonym option > to it), and do those benefits outweigh having Class::AutoClass as a > Bioperl installation requirement? To me, this goes back to Aaron's argument about 'reinventing the wheel.' If you can accomplish what you need using Class::AutoClass, why not use it? Or as Nat suggests, use the relevant Class::AutoClass code directly in RootI. However, if you believe your method is simpler, by all means use that. The main criticisms about using autogenerated methods (summarized by Nat) seem to be that (1) they cause problems with understanding code and debugging, and (2) too magic argument processing. However, there are many instances where AUTOLOAD is being used in bioperl to deal with general get/setters, or a similar kludge (like the heredoc eval {} kludge I posted earlier) is used instead. Based on that and your previous post there is obviously a need for some way to autogenerate methods like get/setters, even if that need isn't expressed openly. 
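One common alternative to both AUTOLOAD and a string eval is to generate the get/setters as closures. The sketch below is only an illustration of that idea: the package name `My::Wrapper` and the method names are hypothetical, and this is not code from Bioperl or Class::AutoClass. It also shows the earlier point in the thread that the symbolic reference involved is the method name, not the method code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Wrapper;    # hypothetical class, for illustration only

sub new { return bless {}, shift }

# Generate simple get/set methods from a list of names. The closure is
# built first, under full strict; strict refs is relaxed only for the one
# statement that installs it under a name computed at run time.
for my $method (qw(foo bar)) {
    my $slot = "_$method";
    my $code = sub {
        my $self = shift;
        $self->{$slot} = shift if @_;
        return $self->{$slot};
    };
    no strict 'refs';
    *{ __PACKAGE__ . "::$method" } = $code;
}

package main;

my $w = My::Wrapper->new;
$w->foo(42);
print "foo is ", $w->foo, "\n";
```

Because the generated methods are ordinary named subs once installed, they show up in `can()` and in stack traces, which may ease some of the debuggability concerns raised about auto-generated methods.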
chris From dmessina at wustl.edu Tue Jan 9 12:51:25 2007 From: dmessina at wustl.edu (David Messina) Date: Tue, 9 Jan 2007 11:51:25 -0600 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <45A3C7EC.9030705@sendu.me.uk> References: <004901c7340c$2fb23ad0$3300a8c0@goodmandesktop> <45A3C7EC.9030705@sendu.me.uk> Message-ID: On Jan 9, 2007, at 10:50 AM, Sendu Bala wrote: > My preference at the moment is to use my proposed method, if only > because I'm lazy. I want someone to tell me there is significant value > in using AutoClass before I spend time on it. I am reluctant to get in the way of someone threatening to code, and I'll admit I don't know enough about the particulars of this problem, but in general the problem I see with a custom solution to an already- solved problem is that: - the custom code will be brand-new and examined by few pairs of eyes, whereas the existing code (AutoClass) has been already vetted and debugged by the Perl community. Based on Nat's design notes, there are some subtle gotchas to implementing auto-classing which have to be considered carefully. - the custom code has not been used before in BioPerl, whereas AutoClass has been already -- why have two sets of code in BioPerl doing the same thing? That makes the BioPerl codebase harder to maintain and unnecessarily complicated. Again, this is just how it looks to me sitting in the cheap seats, so my apologies if I'm way off base here... Dave From dmessina at wustl.edu Tue Jan 9 12:29:42 2007 From: dmessina at wustl.edu (David Messina) Date: Tue, 9 Jan 2007 11:29:42 -0600 Subject: [Bioperl-l] problem with Bio::SearchIO::Writer In-Reply-To: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> Message-ID: <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu> > i am confusing with the output of the blast result > i use Bio::SearchIO to analysis the blast result Okay, so you've got the first part. 
When you get stuck, the easiest way to learn how to use BioPerl is to read the How-Tos:

http://www.bioperl.org/wiki/HOWTOs

If there is a How-To on the topic you're interested in, that is the first place you should look. There is a How-To on SearchIO here:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Also, lots of questions have already been answered on the mailing list. You can search the mailing list archives here:

http://news.gmane.org/gmane.comp.lang.perl.bio.general

> and i don't know how to use Bio::SearchIO::Writer to output the result i
> want as a Text file
> the document of Bio::SearchIO::Writer is too simple to me

If you go to the documentation for Bio::SearchIO::Writer...

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SearchIO/Writer/toc.html

...you'll see a list of output writers, including Bio::SearchIO::Writer::TextResultWriter:

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SearchIO/Writer/TextResultWriter.html

> can anyone show me some examples for output blast result as text file using
> Bio::SearchIO::Writer or other Module?

Looking on that page, the synopsis shows this example...

use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => shift @ARGV);
my $writer = new Bio::SearchIO::Writer::TextResultWriter();
my $out = new Bio::SearchIO(-writer => $writer);
$out->write_result($in->next_result);

...which I think will do exactly what you want.

Dave

From MEC at stowers-institute.org Tue Jan 9 14:38:48 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 9 Jan 2007 13:38:48 -0600
Subject: [Bioperl-l] bp_seqfeature_load of latest Flybase GFF annotation fails due to data inconsistency.
Message-ID:

Drat!
bash> bp_seqfeature_load.PLS --fast --dsn 'dbi:mysql:database=dmel_r5_1;host=mysql-dev' --create --noverbose <( flygenegff ./flybase.net/genomes/Drosophila_melanogaster/dmel_r5.1/gff/*.gff )

(note: `flygenegff` used above sorts and filters the GFF input so that the GFF features are loaded in order needed: gene before mRNA before exon)

This worked fine with the last release of Flybase. But now I get:

------------- EXCEPTION -------------
MSG: FBtr0110936 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

And indeed, sleuthing the data proves that FBtr0110936 is an example of a Flybase transcript identifier that is annotated as being one of the multiple parents of exons but that does not itself have an entry in Flybase!

Proof: `grep FBtr0110936 dmel_r5.1/gff/*.gff` returns only exon features (no gene, CDS, UTR, or mRNA) ...
whereas grepping for any of the other three transcripts mentioned as parents of those exons yields the expected additional features of type mRNA, protein, CDS, etc.

By the way, this data bug manifests itself when searching the Flybase website (FB2006_01, released December 8, 2006) for transcript FBtr0110936 as:

"ERROR: report for FBtr0110936 not found"

I wonder if anyone can tell me what causes this data problem, and whether it is ubiquitous (i.e. are there other transcripts mentioned as exon parents that do not have their own feature)?

I am trying to load this latest Flybase GFF into Lincoln Stein's Bio::DB::SeqFeature database (using bp_seqfeature_load) but the load fails due to this data problem.

Any recommendations/workarounds for this issue are quite welcome.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

From marian.thieme at lycos.de Tue Jan 9 14:38:42 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 09 Jan 2007 20:38:42 +0100
Subject: [Bioperl-l] Bio::PopGen::IO
Message-ID: <45A3EF42.2050709@lycos.de>

Hi,

I have a problem (or am missing some knowledge) about initialising/providing markers in a population object.

Let me illustrate what I did:

1.) Imported a csv file with sample and marker information for some individuals (including header info):

my $io = new Bio::PopGen::IO ( -format => 'csv',
                               -allele_delimiter => ' ',
                               -field_delimiter => '',
                               -file => 'test.csv');

2.) Pushed all individuals onto an array:

my @population;
while( my $ind = $io->next_individual ) {
    push @population, $ind;
}

3.) I convinced myself that this worked, because I was able to output alleles, individual_id and marker_name.

4.) Created a population object with those individuals:

my $pop = Bio::PopGen::Population->new(
    -name => 'popname',
    -description => 'description',
    -individuals => @population);

5.)
But when I try to access the markers via:

for my $name ( $pop->get_marker_names ) {
    my $marker = $pop->get_Marker();
    print $marker;
}

nothing appears on the screen, because the function get_marker_names apparently doesn't return any values in my case.

Question: Do I need to provide the marker information separately?

Regards,
Marian

From jason at bioperl.org Tue Jan 9 16:10:32 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 9 Jan 2007 13:10:32 -0800
Subject: [Bioperl-l] Bio::PopGen::IO
In-Reply-To: <45A3EF42.2050709@lycos.de>
References: <45A3EF42.2050709@lycos.de>
Message-ID:

You need to pass in an array ref when you create the population; note the missing '\'. The CSV parser also provides a method next_population which will go ahead and build a population object for you as well.

> my $pop = Bio::PopGen::Population->new(
>     -name => 'popname',
>     -description => 'description',
>     -individuals => \@population);

On Jan 9, 2007, at 11:38 AM, Marian Thieme wrote:

> Hi,
>
> I have a problem/missing knowledge about initialising/providing markers
> in a population object
>
> Let me illustrating what I did:
>
> 1. importing csv file with sample and marker information for some
> individuals. (including header info):
>
> my $io = new Bio::PopGen::IO ( -format => 'csv',
>                                -allele_delimiter => ' ',
>                                -field_delimiter => '',
>                                -file => 'test.csv')
>
>
> 2.)pushing all individuals to an array
>
> my @population;
> while( my $ind = $io->next_individual ) {
>     push @population, $ind;
> }
>
> 3.) I convinced myself that this has worked, because I was able to ouput
> alleles, individual_id and marker_name.
>
>
> 4.) Did create a population object with that individuals:
>
> my $pop = Bio::PopGen::Population->new(
>     -name => 'popname',
>     -description => 'description',
>     -individuals => @population);
>
> 5.)
But when I try to access the markers via: > > for my $name ( $pop->get_marker_names ) { > my $marker = $pop->get_Marker(); > print $marker; > } > > nothing appears on the screen, becuase the function get_marker_names > obviously doesnt fetch some values, in my case. > > Question: Do I need to provide the marker information seperatly ? > > Regards, > Marian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From marian.thieme at lycos.de Tue Jan 9 17:23:22 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Tue, 09 Jan 2007 23:23:22 +0100 Subject: [Bioperl-l] Bio::PopGen::IO In-Reply-To: References: <45A3EF42.2050709@lycos.de> Message-ID: <45A415DA.7080207@lycos.de> If I do so (something like this): if( my $pop = $io->next_population ) { print $pop->get_Individuals; print $pop; } I get only a reference to the population object: Bio::PopGen::Population=HASH(...). On the other hand, if I create the pop object via new constructor and afterwards try to access markers via get_Marker() method: for my $name ( $pop->get_marker_names ) { my $marker = $pop->get_Marker(); print $marker; } then an error occurs: Can't locate object method "get_Marker" via package "marker_name_a" (perhaps you forgot to load "marker_name_a"?) at myfile.pl line xx, line 95. To which file does it refer ? ( line 95) How to load "marker_name_a" ? ("marker_name_a" belongs to the marker names I did provide in the csv file) Regards, Marian Jason Stajich schrieb: > you need to pass in an array ref when you create the population, note > the missing '\'. > The CSV method also provides a method next_population which will go > ahead and build a population object for you as well. 

From jason at bioperl.org Tue Jan 9 17:30:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 9 Jan 2007 14:30:55 -0800
Subject: [Bioperl-l] Bio::PopGen::IO
In-Reply-To: <45A415DA.7080207@lycos.de>
References: <45A3EF42.2050709@lycos.de> <45A415DA.7080207@lycos.de>
Message-ID:

This doesn't make sense to me. You want to request a particular marker,
passing get_Marker each name you get back in the loop:

> for my $name ( $pop->get_marker_names ) {
>     my $marker = $pop->get_Marker($name);
>     print $marker;
> }

See the code in t/PopGen.t for working code that tests these modules.
Here is some code cut and pasted from there:

# build a population from an alignment
for my $name ( $population->get_marker_names ) {
    my $marker = $population->get_Marker($name);
    warn("$name ",join(" ",$marker->get_Alleles()),"\n");
}

-jason

On Jan 9, 2007, at 2:23 PM, Marian Thieme wrote:

> If I do so (something like this):
> if( my $pop = $io->next_population ) {
>     print $pop->get_Individuals;
>     print $pop;
> }
>
> I get only a reference to the population object:
> Bio::PopGen::Population=HASH(...).
>
> On the other hand, if I create the pop object via new constructor and
> afterwards try to access markers via get_Marker() method:
>
> for my $name ( $pop->get_marker_names ) {
>     my $marker = $pop->get_Marker();
>     print $marker;
> }
>
> then an error occurs:
> Can't locate object method "get_Marker" via package "marker_name_a"
> (perhaps you forgot to load "marker_name_a"?) at myfile.pl line xx,
> line 95.
> To which file does it refer ? ( line 95)
> How to load "marker_name_a" ?

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From MEC at stowers-institute.org Tue Jan 9 18:05:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 9 Jan 2007 17:05:09 -0600
Subject: [Bioperl-l] PROPOSED new method Bio::Range->offsetStranded
Message-ID:

Hi,

I'd like to commit changes to Bio::RangeI that define offsetStranded,
allowing the following tests to pass in Bio/Range.t:

$r = Bio::Range->new(-start => 30, -end => 40, -strand => -1);
ok ($r->offsetStranded(-5,10)->toString, '(20, 45) strand=-1');
ok ($r->offsetStranded(+5,-10)->toString, '(30, 40) strand=-1');
$r->strand(1);
ok ($r->offsetStranded(-5,10)->toString, '(25, 50) strand=1');
ok ($r->offsetStranded(+5,-10)->toString, '(30, 40) strand=1');

Here's the implementation.

=head2 offsetStranded

 Title    : offsetStranded
 Usage    : $rnge->offsetStranded($fiveprime_offset, $threeprime_offset)
 Function : destructively modifies RangeI implementing object to
            offset its start and stop coordinates by values
            $fiveprime_offset and $threeprime_offset (positive values
            being in the strand direction).
 Args     : two integer offsets: $fiveprime_offset and $threeprime_offset
 Returns  : $self, offset accordingly.

=cut

sub offsetStranded {
  my ($self, $offset_fiveprime, $offset_threeprime) = @_;
  my ($offset_start, $offset_end) = $self->strand() eq -1 ?
    (- $offset_threeprime, - $offset_fiveprime) :
    ($offset_fiveprime, $offset_threeprime);
  $self->start($self->start + $offset_start);
  $self->end($self->end + $offset_end);
  return $self;
};

I'll commit tomorrow unless I'm told 'that would be a mistake'.

Cheers,

--Malcolm

From lidaof at gmail.com Tue Jan 9 20:37:04 2007
From: lidaof at gmail.com (lidaof)
Date: Wed, 10 Jan 2007 09:37:04 +0800
Subject: [Bioperl-l] problem with Bio::SearchIO::Writer
In-Reply-To: <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu>
References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu>
Message-ID: <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com>

Hi Dave,

This document is also available on CPAN, and I have seen it :)

I used a plain filehandle for output before I knew about text output
modules such as Bio::SearchIO::Writer::TextResultWriter. So which
variables can be accessed through the object created by the
Writer::TextResultWriter module?

I have seen no FILE option in the synopsis of TextResultWriter; that is
exactly what I am confused about.

Thank you very much for the advice and information you provided.

Li

On 1/10/07, David Messina wrote:
>
> > i am confusing with the output of the blast result
> > i use Bio::SearchIO to analysis the blast result
>
> Okay, so you've got the first part.
>
> When you get stuck, the easiest way to learn how to use BioPerl is to
> read the How-Tos:
> http://www.bioperl.org/wiki/HOWTOs
>
> If there is a How-To on the topic you're interested in, that is the
> first place you should look.
>
> There is a How-to on SearchIO here:
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> Also, lots of questions have already been answered on the mailing
> list.
You can search the mailing list archives here: > http://news.gmane.org/gmane.comp.lang.perl.bio.general > > > > and i don't know how to use Bio::SearchIO::Writer to output the > > result i > > want as a Text file > > > the document of Bio::SearchIO::Writer is too simple to me > > If you go to the documentation for Bio::SearchIO::Writer... > http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SearchIO/Writer/ > toc.html > > ...you'll see a list of output writers, including > Bio::SearchIO::Writer::TextResultWriter: > http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SearchIO/Writer/ > TextResultWriter.html > > > > can anyone show me some examples for output blast result as text > > file using > > Bio::SearchIO::Writer or other Module? > > Looking on that page, the synopsis shows this example... > > use Bio::SearchIO; > use Bio::SearchIO::Writer::TextResultWriter; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => shift @ARGV); > > my $writer = new Bio::SearchIO::Writer::TextResultWriter(); > my $out = new Bio::SearchIO(-writer => $writer); > $out->write_result($in->next_result); > > > ...which I think will do exactly what you want. > > > Dave > -- Li From himanshu.ardawatia at bccs.uib.no Tue Jan 9 23:08:18 2007 From: himanshu.ardawatia at bccs.uib.no (Himanshu Ardawatia) Date: Tue, 9 Jan 2007 21:08:18 -0700 Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes In-Reply-To: <45A38CC2.5020908@sendu.me.uk> References: <62d36e2b0701090326g733893d4u3e38f448b5a1dc72@mail.gmail.com> <45A38CC2.5020908@sendu.me.uk> Message-ID: <62d36e2b0701092008p780d2fa0m8b0cf98d71d91b60@mail.gmail.com> Hi, I am using version 1.4 . With this version, I run the script (the one you sent) and get the following error: Use of uninitialized value in pattern match (m//) at tree.pl line 7. 
-------------------- WARNING --------------------- MSG: Must provide a valid array reference for -nodes --------------------------------------------------- Can't call method "id" on an undefined value at tree.pl line 9. I then installed version 1.5.2. With this vesion I get the following error: And I get the same error as above . Which method should I use : 'id()' or 'internal_id()' . I get the same error using any of the methods. Thanks Himanshu script : #!/usr/bin/perl -w use strict; use Bio::TreeIO; my $tree = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA)->next_tree; my @nodes = grep { $_->id =~ /c|d|f/ } $tree->get_nodes; my $lca = $tree->get_lca(@nodes); print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n"; @nodes = grep { $_->id =~ /a|z/ } $tree->get_nodes; $lca = $tree->get_lca(@nodes); print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n"; __DATA__ (a,((c,d)z,(e,f)y)x)root; --------------------------------------------------- On 1/9/07, Sendu Bala wrote: > > Himanshu Ardawatia wrote: > > Hi, > > > > I am trying the available bioperl script (attached below), and as I > > see it seems to give (or not give at all) strange results: > > What version of Bioperl are you using? With 1.5.2 I don't see any > problems except user-error. I've noted the results I get below. > > > > Input tree: __DATA__ (a,((c,d)z,(e,f)y)x)root; > > > > Output : Use of uninitialized value in pattern match (m//) at > > try_tree_new.pl line 9. lca is x for c,d,f Use of uninitialized value > > in pattern match (m//) at try_tree_new.pl line 18. Use of > > uninitialized value in print at try_tree_new.pl line 25. lca is for > > a,z > > lca is x for c,d,f > lca is root for a,z > > > > However, here we can see that actually, 'lca' for 'e' and 'f' should > > be 'y' and 'lca' for 'c' and 'd' should be 'z' . > > What do you mean? Your script asks the lca of c,d,f and a,z, not e,f or > c,d. 
> > > > In another case, if my input tree is : __DATA__ > > (a,((c,d)D0L0=0+0,(e,f)D0L1=1+0)D0L2=1+1)D1L0=0+0; (where I have > > replaced 'x', 'y', 'z' and 'root' internal nodes with some other > > values which are important for me) > > > > I get the Output: Use of uninitialized value in pattern match (m//) > > at try_tree_new.pl line 9. lca is D0L2=1+1 for c,d,f Use of > > uninitialized value in pattern match (m//) at try_tree_new.pl line > > 18. lca is a for a > > lca is D0L2=1+1 for c,d,f > lca is a for a > > > > Here the last line is changed ' lca is a for a' as comapred to the > > previous result. > > > > I wonder why this change..... > > Your script asks for the lca of a,z but the tree no longer has a node > 'z'. So you ask for the lca of 'a', which is 'a'. > > > > If a use a completely different tree : > > > > Input: __DATA_ > > ((48355,(21337,65453)D0L0=0+0)D0L1=1+0,(38243,18116)D0L2=1+1)D1L0=0+0; > > > > > > I do not get any result in the output at all... > > > > Output : Use of uninitialized value in pattern match (m//) at > > try_tree_new.pl line 9. Can't call method "id" on an undefined value > > at try_tree_new.pl line 17. > > Can't call method "id" on an undefined value at try_tree_new.pl line 14. > > > > Can anyone suggest why these differences > > Again, your script is asking for nodes that aren't in the tree, so of > course it isn't going to work. > > > > and how can I obtain 'internal node ids' (eg. in this case 'D1L0=0+0' > > etc. for each leaf seperately ? > > You'll want to use the methods get_leaf_nodes() and internal_id() > (though you almost certainly don't really want the internal id and > should be using the human-readable id id() instead). 
> > > In Bioperl 1.5.2 your code can be written as the much simpler: > > #!/usr/bin/perl -w > > use strict; > use Bio::TreeIO; > my $tree = Bio::TreeIO->new(-format => 'newick', -fh => > \*DATA)->next_tree; > > my @nodes = grep { $_->id =~ /c|d|f/ } $tree->get_nodes; > my $lca = $tree->get_lca(@nodes); > print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n"; > > @nodes = grep { $_->id =~ /a|z/ } $tree->get_nodes; > $lca = $tree->get_lca(@nodes); > print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n"; > > __DATA__ > (a,((c,d)z,(e,f)y)x)root; > From jason at bioperl.org Wed Jan 10 01:05:38 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 9 Jan 2007 22:05:38 -0800 Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes In-Reply-To: <62d36e2b0701092008p780d2fa0m8b0cf98d71d91b60@mail.gmail.com> References: <62d36e2b0701090326g733893d4u3e38f448b5a1dc72@mail.gmail.com> <45A38CC2.5020908@sendu.me.uk> <62d36e2b0701092008p780d2fa0m8b0cf98d71d91b60@mail.gmail.com> Message-ID: <70CF9072-9494-45A5-A85D-AC1D23CD33E3@bioperl.org> Works for me on bioperl 1.5.x so this may be a bug in get_lca in 1.4 version, I'm not sure. 
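For reference, the answers expected for the example tree can be checked without BioPerl at all. The following is a minimal standalone sketch, not BioPerl's get_lca implementation; the %parent table and the helper names path_to_root and lca are made up for this illustration and simply encode the tree (a,((c,d)z,(e,f)y)x)root; by hand.

```perl
use strict;
use warnings;

# Parent of each node in the example tree (a,((c,d)z,(e,f)y)x)root;
# (hand-written for this illustration, not parsed from Newick).
my %parent = (
    a => 'root', x => 'root',
    z => 'x',    y => 'x',
    c => 'z',    d => 'z',
    e => 'y',    f => 'y',
);

# Path from a node up to the root, starting with the node itself.
sub path_to_root {
    my ($node) = @_;
    my @path = ($node);
    while (defined(my $p = $parent{ $path[-1] })) {
        push @path, $p;
    }
    return @path;
}

# Last common ancestor: keep only the nodes that lie on every
# path to the root; the first survivor is the deepest one.
sub lca {
    my @nodes = @_;
    return unless @nodes;
    my @common = path_to_root(shift @nodes);
    for my $node (@nodes) {
        my %on_path = map { $_ => 1 } path_to_root($node);
        @common = grep { $on_path{$_} } @common;
    }
    return $common[0];
}

print "lca is ", lca(qw(c d f)), " for c,d,f\n";   # lca is x for c,d,f
print "lca is ", lca(qw(a z)),   " for a,z\n";     # lca is root for a,z
```

If a BioPerl installation reports anything other than x and root for these two queries on this tree, the installed version is the first thing to check.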

-jason

On Jan 9, 2007, at 8:08 PM, Himanshu Ardawatia wrote:

> #!/usr/bin/perl -w
>
> use strict;
> use Bio::TreeIO;
> my $tree = Bio::TreeIO->new(-format => 'newick', -fh => \*DATA)->next_tree;
>
> my @nodes = grep { $_->id =~ /c|d|f/ } $tree->get_nodes;
> my $lca = $tree->get_lca(@nodes);
> print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n";
>
> @nodes = grep { $_->id =~ /a|z/ } $tree->get_nodes;
> $lca = $tree->get_lca(@nodes);
> print "lca is ",$lca->id, " for ", join(",",map { $_->id } @nodes), "\n";
>
> __DATA__
> (a,((c,d)z,(e,f)y)x)root;

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From bix at sendu.me.uk Wed Jan 10 04:37:26 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 10 Jan 2007 09:37:26 +0000
Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes
In-Reply-To: <62d36e2b0701092008p780d2fa0m8b0cf98d71d91b60@mail.gmail.com>
References: <62d36e2b0701090326g733893d4u3e38f448b5a1dc72@mail.gmail.com> <45A38CC2.5020908@sendu.me.uk> <62d36e2b0701092008p780d2fa0m8b0cf98d71d91b60@mail.gmail.com>
Message-ID: <45A4B3D6.9060409@sendu.me.uk>

Himanshu Ardawatia wrote:
> Hi,
>
> I am using version 1.4 . With this version, I run the script (the one
> you sent) and get the following error:
> Use of uninitialized value in pattern match (m//) at tree.pl line 7.
>
> -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
> Can't call method "id" on an undefined value at tree.pl line 9.
>
>
> I then installed version 1.5.2. With this vesion I get the following error:

What error?

> And I get the same error as above .

You probably didn't install 1.5.2 properly. Either you're actually still
using 1.4, or when 1.5.2 was installed not all of 1.4 was first removed.
Check your version number ( http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F ), locate and manually delete your 1.4 installation and try the 1.5.2 installation again. > Which method should I use : 'id()' or 'internal_id()' . I get the same > error using any of the methods. The internal id is something Bioperl makes up for itself, so you almost certainly don't want it. Use id(). From bix at sendu.me.uk Wed Jan 10 10:49:31 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 10 Jan 2007 15:49:31 +0000 Subject: [Bioperl-l] Generalized _setparams proposal Message-ID: <45A50B0B.5030906@sendu.me.uk> Bioperl-run wrappers (and some other modules) typically store parameters from the user intended for the program they wrap via methods of similar name to the parameter. The wrappers then implement their own way of taking the values of those methods and generating a parameter string suitable for passing to the program. Typically this is done in a _setparams() method with code along the lines of: $param_string .= ' --param_x '.$self->param_x if $self->param_x; or a number of modules have their own variation on: # clustalw's _setparam method my $param_string = ""; for $attr ( @CLUSTALW_PARAMS ) { $value = $self->$attr(); next unless (defined $value); my $attr_key = lc $attr; #put params in format expected by clustalw $attr_key = ' -'.$attr_key; $param_string .= $attr_key.'='.$value; } #... I propose generalizing this into a _setparams method in Bio::Tools::Run::WrapperBase, eg. such that the above could be re-implemented as: # clustalw's _setparam method my $param_string = $self->SUPER::_setparam( -methods => \@CLUSTALW_PARAMS, -dash => 1, -lc => 1, -join => '='); #... Proposed code follows: =head2 _setparams() Title : _setparams Usage : $params = $self->_setparams(-methods => [qw(window evalue_cutoff)]) Function: For internal use by wrapper modules to build parameter strings suitable for sending to the program being wrapped. 
For each method name supplied, calls the method and adds the method name (as modified by optional things) along with its value (unless a switch) to the parameter string Example : $params = $self->_setparams(-methods => [qw(window evalue_cutoff)], -double_dash => 1, -underscore_to_dash => 1); If window() had not been previously called, but evalue_cutoff(0.5) had been called, $params would be ' --evalue-cutoff 0.5' A separate call should be used for methods that store values that are intended to be valueless switches: $params .= $self->_setparams(-methods => [qw(simple large all)], -switches => 1, -double_dash => 1, -underscore_to_dash => 1); If simple() had not been called, large(1) had been called and all(0) had been called, $params would now be ' --evalue-cutoff 0.5 --large' Returns : parameter string Args : -methods => [] or {} # REQUIRED, array ref of method names to call, or hash ref where keys are method names and values are how those names should be output in the params string -join => string # define how parameters and their values are joined, default ' '. (eg. could be '=' for param=value) -switches => boolean # all supplied methods are treated as switches -lc => boolean # lc() method names prior to output in string -dash => boolean # prefix all method names with a single dash -double_dash => bool # prefix all method names with a double dash -underscore_to_dash => boolean # convert all underscores in method names to dashes =cut sub _setparams { my ($self, @args) = @_; my ($methods, $switch, $join, $lc, $d, $dd, $utd) = $self->_rearrange([qw(METHODS SWITCHES JOIN LC DASH DOUBLE_DASH UNDERSCORE_TO_DASH)], @args); $self->throw('-methods is required') unless $methods; $self->throw("-dash and -double_dash are mutually exclusive") if ($d && $dd); $join ||= ' '; my %methods = ref($methods) eq 'HASH' ? 
%{$methods} : map { $_ => $_ } @{$methods}; my $param_string = ''; while (my ($method, $method_out) = each %methods) { my $value = $self->$method(); next unless (defined $value); next if ($switch && ! $value); $method_out = lc($method_out) if $lc; $method_out = '-'.$method_out if $d; $method_out = '--'.$method_out if $dd; $method_out =~ s/_/-/g if $utd; $param_string .= ' '.$method_out.($switch ? '' : $join.$value); } return $param_string; } From dmessina at wustl.edu Wed Jan 10 10:58:50 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 10 Jan 2007 09:58:50 -0600 Subject: [Bioperl-l] problem with Bio::SearchIO::Writer In-Reply-To: <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com> References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu> <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com> Message-ID: > this document also available on CPAN and i have saw it:) That's good, but you may want to become familiar with the BioPerl website, because the information there is more extensive and more up- to-date. > i use a filehandle for output before i know some Text output module > such as Bio::SearchIO::Writer::TextResultWriter or other module > so which variables could be visited by the object created by > Writer::TextResultWriter module? I'm not exactly sure what you're asking here. Do you want to know what methods can be used on a TextResultWriter object? > i have seen no FILE option of the synopsis of TextResultWriter Ahh, that is because TextResultWriter doesn't have a file option. :) If you look carefully at the example, it is actually Bio::SearchIO that take the -file parameter: > my $in = new Bio::SearchIO(-format => 'blast', > -file => shift @ARGV); This example might be a little confusing because it uses an unusual (antiquated?) syntax. 
This would do the same thing:

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => shift @ARGV);

The Bio::SearchIO documentation for the new() method describes all of
the parameters it can take:

http://doc.bioperl.org/bioperl-live/Bio/SearchIO.html#POD1

> that is exactly i confusing with

BioPerl can be hard to understand at first. Time spent reading
bioperl.org and this mailing list is a good way to become familiar with
the "Bioperl way" of doing things.

Dave

From cjfields at uiuc.edu Wed Jan 10 11:06:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 Jan 2007 10:06:17 -0600
Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes
In-Reply-To: <45A4B3D6.9060409@sendu.me.uk>
Message-ID: <00b401c734d1$4409ca50$15327e82@pyrimidine>

> Himanshu Ardawatia wrote:
> > Hi,
> >
> > I am using version 1.4 . With this version, I run the script (the
> > one you sent) and get the following error:
> > Use of uninitialized value in pattern match (m//) at tree.pl line 7.
> >
> > -------------------- WARNING ---------------------
> > MSG: Must provide a valid array reference for -nodes
> > ---------------------------------------------------
> > Can't call method "id" on an undefined value at tree.pl line 9.
> >
> >
> > I then installed version 1.5.2. With this vesion I get the
> > following error:
>
> What error?

Maybe he meant the same error above, just forgot to delete the line.

> > And I get the same error as above .
>
> You probably didn't install 1.5.2 properly. Either you're
> actually still using 1.4, or when 1.5.2 was installed not all
> of 1.4 was first removed.
> Check your version number (
> http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F
> ), locate and manually delete your 1.4 installation and try
> the 1.5.2 installation again.

...
Also, a quick way to locate the version of Bioperl used by default in a script (including directories defined in PERL5LIB): (BTW, this came from 'Perl Hacks'): perldoc -l Bio::Perl Or whatever Bioperl module you want. BTW, I get the script to work on Mac OS X and WinXP, so I agree with Sendu: you likely have problems with different versions of Bioperl installed on your system causing conflicts. chris From cjfields at uiuc.edu Wed Jan 10 11:17:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Jan 2007 10:17:16 -0600 Subject: [Bioperl-l] Generalized _setparams proposal In-Reply-To: <45A50B0B.5030906@sendu.me.uk> Message-ID: <00ba01c734d2$cbede310$15327e82@pyrimidine> Sendu, Looks great to me! As with your former proposed Bio::Root::RootI method, would you switch run modules over to using this, or leave them be? chris > Bioperl-run wrappers (and some other modules) typically store > parameters from the user intended for the program they wrap > via methods of similar name to the parameter. The wrappers > then implement their own way of taking the values of those > methods and generating a parameter string suitable for > passing to the program. > > Typically this is done in a _setparams() method with code > along the lines of: > > $param_string .= ' --param_x '.$self->param_x if $self->param_x; > > or a number of modules have their own variation on: > > # clustalw's _setparam method > my $param_string = ""; > for $attr ( @CLUSTALW_PARAMS ) { > $value = $self->$attr(); > next unless (defined $value); > my $attr_key = lc $attr; #put params in format expected by clustalw > $attr_key = ' -'.$attr_key; > $param_string .= $attr_key.'='.$value; } #... > > > I propose generalizing this into a _setparams method in > Bio::Tools::Run::WrapperBase, eg. such that the above could > be re-implemented as: > > # clustalw's _setparam method > my $param_string = $self->SUPER::_setparam( > -methods => \@CLUSTALW_PARAMS, > -dash => 1, > -lc => 1, > -join => '='); > #... 
> > > > Proposed code follows: > > =head2 _setparams() > > Title : _setparams > Usage : $params = $self->_setparams(-methods => [qw(window > evalue_cutoff)]) > Function: For internal use by wrapper modules to build > parameter strings > suitable for sending to the program being > wrapped. For each method > name supplied, calls the method and adds the > method name (as modified > by optional things) along with its value (unless > a switch) to the > parameter string > Example : $params = $self->_setparams(-methods => > [qw(window evalue_cutoff)], > -double_dash => 1, > -underscore_to_dash => 1); > If window() had not been previously called, but > evalue_cutoff(0.5) > had been called, $params would be ' --evalue-cutoff 0.5' > > A separate call should be used for methods that > store values that > are intended to be valueless switches: > > $params .= $self->_setparams(-methods => > [qw(simple large all)], > -switches => 1, > -double_dash => 1, > -underscore_to_dash => 1); > If simple() had not been called, large(1) had > been called and all(0) > had been called, $params would now be ' > --evalue-cutoff 0.5 --large' > > Returns : parameter string > Args : -methods => [] or {} # REQUIRED, array ref of > method names > to call, > or hash ref where keys are > method names and > values are how those names > should be output > in the params string > -join => string # define how parameters and their > values are > joined, default ' '. (eg. 
> could be '=' for > param=value) > -switches => boolean # all supplied methods are > treated as switches > -lc => boolean # lc() method names prior to > output in > string > -dash => boolean # prefix all method names > with a single > dash > -double_dash => bool # prefix all method names > with a double dash > -underscore_to_dash => boolean # convert all > underscores in method > names to dashes > > =cut > > sub _setparams { > my ($self, @args) = @_; > > my ($methods, $switch, $join, $lc, $d, $dd, $utd) = > $self->_rearrange([qw(METHODS > SWITCHES > JOIN > LC > DASH > DOUBLE_DASH > UNDERSCORE_TO_DASH)], @args); > $self->throw('-methods is required') unless $methods; > $self->throw("-dash and -double_dash are mutually > exclusive") if ($d && $dd); > $join ||= ' '; > > my %methods = ref($methods) eq 'HASH' ? %{$methods} : > map { $_ => $_ } @{$methods}; > > my $param_string = ''; > while (my ($method, $method_out) = each %methods) { > my $value = $self->$method(); > next unless (defined $value); > next if ($switch && ! $value); > > $method_out = lc($method_out) if $lc; > $method_out = '-'.$method_out if $d; > $method_out = '--'.$method_out if $dd; > $method_out =~ s/_/-/g if $utd; > > $param_string .= ' '.$method_out.($switch ? '' : > $join.$value); > } > > return $param_string; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Jan 10 11:10:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Jan 2007 10:10:21 -0600 Subject: [Bioperl-l] PROPOSED new method Bio::Range->offsetStranded In-Reply-To: Message-ID: <00b501c734d1$d80405e0$15327e82@pyrimidine> Malcolm, I don't have a problem with it. Just curious, but what would it be used for? 
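As a concrete reading of the proposal quoted below, here is a minimal standalone sketch of the same strand-aware offset arithmetic. It assumes a plain hash in place of a Bio::RangeI object, and offset_stranded and to_string are hypothetical helpers written for this illustration, not BioPerl methods. One plausible use (an assumption, not something stated in the proposal) is growing a feature upstream of its 5' end and downstream of its 3' end without having to branch on the strand each time.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a RangeI object: a hash with start, end
# and strand. Offsets are given relative to the strand direction, so
# a negative 5' offset always extends the range "upstream".
sub offset_stranded {
    my ($range, $offset_fiveprime, $offset_threeprime) = @_;
    my ($offset_start, $offset_end) =
        $range->{strand} == -1
        ? (-$offset_threeprime, -$offset_fiveprime)   # minus strand: swap and negate
        : ( $offset_fiveprime,   $offset_threeprime); # plus strand: apply directly
    $range->{start} += $offset_start;
    $range->{end}   += $offset_end;
    return $range;   # modified in place, like the proposed method
}

# Same string layout as the expected values in the quoted tests.
sub to_string {
    my ($range) = @_;
    return sprintf '(%d, %d) strand=%d', @{$range}{qw(start end strand)};
}

my $minus = { start => 30, end => 40, strand => -1 };
print to_string(offset_stranded($minus, -5, 10)), "\n";   # (20, 45) strand=-1

my $plus = { start => 30, end => 40, strand => 1 };
print to_string(offset_stranded($plus, -5, 10)), "\n";    # (25, 50) strand=1
```

In both cases the range grows by 5 bases on its 5' side and 10 on its 3' side; only the mapping onto start and end differs between strands.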
chris > Hi, > > I'd like to commit changes to Bio::RangeI which defined > offsetStranded to allows the following tests to pass in Bio/Range.t > > $r = Bio::Range->new(-start => 30, -end => 40, -strand => > -1); ok ($r->offsetStranded(-5,10)->toString, '(20, 45) > strand=-1'); ok ($r->offsetStranded(+5,-10)->toString, '(30, > 40) strand=-1'); $r->strand(1); ok > ($r->offsetStranded(-5,10)->toString, '(25, 50) strand=1'); > ok ($r->offsetStranded(+5,-10)->toString, '(30, 40) strand=1'); > > > Here's the implementation. > > =head2 offsetStranded > > Title : offsetStranded > Usage : $rnge->offsetStranded($fiveprime_offset, > $threeprime_offset) > Function : destructively modifies RangeI implementing object to > offset its start and stop coordinates by > values $fiveprime_offset and > $threeprime_offset (positive values being in > the strand direction). > Args : two integer offsets: $fiveprime_offset and > $threeprime_offset > Returns : $self, offset accordingly. > > =cut > > sub offsetStranded { > my ($self, $offset_fiveprime, $offset_threeprime) = @_; > my ($offset_start, $offset_end) = $self->strand() eq -1 ? > (- $offset_threeprime, - $offset_fiveprime) : > ($offset_fiveprime, $offset_threeprime); > $self->start($self->start + $offset_start); > $self->end($self->end + $offset_end); > return $self; > }; > > > I'll commit tomorrow unless I'm told 'that would be a mistake'. > > Cheers, > > --Malcolm > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Jan 10 11:35:17 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Jan 2007 10:35:17 -0600 Subject: [Bioperl-l] problem with Bio::SearchIO::Writer In-Reply-To: Message-ID: <00bb01c734d5$501db0f0$15327e82@pyrimidine> ... > This example might be a little confusing because it uses an unusual > (antiquated?) syntax.
> This would do the same thing: > > my $in = Bio::SearchIO->new(-format => 'blast', > > -file => shift @ARGV); > > The Bio::SearchIO documentation for the new() method > describes all of the parameters it can take: > > http://doc.bioperl.org/bioperl-live/Bio/SearchIO.html#POD1 > > > > that is exactly i confusing with > > BioPerl can be hard to understand at first. Time spent > reading bioperl.org and this mailing list is a good way to > become familiar with the "Bioperl way" of doing things. > > > Dave In general, it is recommended to use direct object syntax for constructors as well as object methods, as it always works as expected: $foo = Class::Foo->new(); $foo->bar(); Though indirect syntax almost always works, and is syntactically similar to other programming languages: $foo = new Class::Foo(); a number of very reliable sources (Programming Perl, Best Practices) indicate there are subtle (but important) differences in the way these are interpreted and compiled which can lead to hard-to-diagnose (and possibly OS-dependent) bugs. Using direct syntax prevents this from occurring. We have run into this very issue in Bioperl, in Bio::Root::Root::throw(). The problem persisted for a number of years, effectively preventing bioperl-db from working properly in Windows after a normal installation, even though the problem didn't appear on other OS's. Sendu managed to work it out before the last release. chris From cjfields at uiuc.edu Wed Jan 10 11:41:53 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Jan 2007 10:41:53 -0600 Subject: [Bioperl-l] Generalized _setparams proposal In-Reply-To: <45A51305.3070403@sendu.me.uk> Message-ID: <00bc01c734d6$3c72ca30$15327e82@pyrimidine> ... > > Sendu, > > > > Looks great to me! As with your former proposed Bio::Root::RootI > > method, would you switch run modules over to using this, or > leave them be? 
> > I'd like to see all (or at least most) run modules moved over > to being based on WrapperBase and taking advantage of this > proposal and the other one. The only question is if I can > find the time; it's no priority: it would just be 'nice'. > Certainly I'd encourage new run-wrappers to be written with > these things in mind though. Agree on all points. Changing them over definitely isn't a top priority as long as they still work properly, but it should eventually be done. Maybe something to add to the wiki Priority List? Also, I don't know if there is a list of Run wrappers which don't use WrapperBase, but it might be handy to have around on the wiki, at least for keeping track of changes. chris From bix at sendu.me.uk Wed Jan 10 12:00:58 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 10 Jan 2007 17:00:58 +0000 Subject: [Bioperl-l] Auto-method caller proposal In-Reply-To: <45A3C7EC.9030705@sendu.me.uk> References: <004901c7340c$2fb23ad0$3300a8c0@goodmandesktop> <45A3C7EC.9030705@sendu.me.uk> Message-ID: <45A51BCA.1010804@sendu.me.uk> Sendu Bala wrote: > Nathan (Nat) Goodman wrote: >> Just to be clear. Based on my understanding of your requirements, it appears >> you don't really want the full-blown auto-generation and auto-initialization >> capabilities of AutoClass. My suggestion is that you look at the code in >> AutoClass and its companion Args handler, and extract any pieces that are >> relevant to what you're trying to do. I'd be happy to help with this. > > As far as I know, my proposed method already does everything I want it > to do, so there wouldn't be any need to 'extract any pieces' from > AutoClass... > > >> If I've misunderstood your requirements and you do want the full AutoClass >> capabilities, I'd be happy to assist with adapting it to fit the BioPerl >> framework. >> >> A lot depends on whether you need the initialization to include things like >> default values and to work correctly in the presence of inheritance.
These >> requirements really complicate the picture. > > ...So yes, if AutoClass were to be used it would be to take advantage of > the whole thing. Having looked at it a little more, you're quite right in saying I don't want the actual auto-classing (generation and initialization) stuff. Am I right in thinking the method generation is done using AUTOLOAD? I think we'd prefer it not be done with AUTOLOAD, so I don't think there's much I can borrow from Class::AutoClass. (Also, fix_keyword() doesn't convert --multi-word to multi_word afaict.) I think on balance (though David Messina made a good point, I don't think AutoClass is the tool for this job) I'll stick with the method I proposed, unless there are any further comments from people? From bix at sendu.me.uk Wed Jan 10 11:23:33 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 10 Jan 2007 16:23:33 +0000 Subject: [Bioperl-l] Generalized _setparams proposal In-Reply-To: <00ba01c734d2$cbede310$15327e82@pyrimidine> References: <00ba01c734d2$cbede310$15327e82@pyrimidine> Message-ID: <45A51305.3070403@sendu.me.uk> Chris Fields wrote: > Sendu, > > Looks great to me! As with your former proposed Bio::Root::RootI method, > would you switch run modules over to using this, or leave them be? I'd like to see all (or at least most) run modules moved over to being based on WrapperBase and taking advantage of this proposal and the other one. The only question is if I can find the time; it's no priority: it would just be 'nice'. Certainly I'd encourage new run-wrappers to be written with these things in mind though.
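[Editor's note] For readers following the _setparams thread, the string-building logic of the proposal can be sketched without BioPerl. This is only an illustration under stated assumptions: build_params() and its 'values' hash are hypothetical stand-ins (the proposed method calls accessors on $self and reads its arguments via _rearrange), and only the array-ref form of -methods is handled here.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, NOT the BioPerl method: option values
# come from a plain hash instead of accessor calls on a wrapper object.
sub build_params {
    my (%args) = @_;
    my $values  = $args{values}  or die "values is required\n";
    my $methods = $args{methods} or die "methods is required\n";
    die "dash and double_dash are mutually exclusive\n"
        if $args{dash} && $args{double_dash};
    my $join = defined $args{join} ? $args{join} : ' ';

    my $param_string = '';
    for my $name (@{$methods}) {    # array order keeps the output stable
        my $value = $values->{$name};
        next unless defined $value;            # option was never set
        next if $args{switches} && !$value;    # switch set to false

        my $out = $name;
        $out = lc $out      if $args{lc};
        $out = '-' . $out   if $args{dash};
        $out = '--' . $out  if $args{double_dash};
        $out =~ s/_/-/g     if $args{underscore_to_dash};

        # switches are emitted bare, everything else as name + join + value
        $param_string .= ' ' . $out . ($args{switches} ? '' : $join . $value);
    }
    return $param_string;
}

# Same scenario as the POD example in the proposal.
my %set = (evalue_cutoff => 0.5, large => 1, all => 0);
my $params = build_params(values => \%set,
                          methods => [qw(window evalue_cutoff)],
                          double_dash => 1, underscore_to_dash => 1);
$params .= build_params(values => \%set,
                        methods => [qw(simple large all)],
                        switches => 1,
                        double_dash => 1, underscore_to_dash => 1);
print "$params\n";    # prints " --evalue-cutoff 0.5 --large"
```

Iterating over the array of names, rather than using each() on a hash as in the proposed code, is a choice made for this sketch so that the parameter order is predictable.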
From MEC at stowers-institute.org Wed Jan 10 12:07:21 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Wed, 10 Jan 2007 11:07:21 -0600 Subject: [Bioperl-l] PROPOSED new method Bio::Range->offsetStranded Message-ID: Hi Chris, In part of a pipeline to design oligo microarray for detecting alternate splice sites, I build 0-length features which identify the location of all 5' and 3' splice sites (as inferred from flybase GFF gene, transcript, mRNA, exon features) and then offset these feature to create a region flanking it, which is then taken as the target of an oligo design process. I'm sure I could re-conceptualize my approach, but, as implemented, it works a treat. My thinking is influenced by my Excel's object model (VBA), which has Offset as a destructive method of Excel.Range (of excel cells). I suppose any algorithm that used a 'sliding window' could possibly benefit from Bio::Range->offsetStranded. Cheers, Malcolm Cook (poised to commit ;) > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Wednesday, January 10, 2007 10:10 AM > To: Cook, Malcolm; 'bioperl list'; muilu at ebi.ac.uk; bix at sendu.me.uk > Subject: RE: [Bioperl-l] PROPOSED new method > Bio::Range->offsetStranded > > Malcolm, > > I don't have a problem with it. Just curious, but what would > it be used > for? > > chris > > > Hi, > > > > I'd like to commit changes to Bio::RangeI which defined > > offsetStranded to allows the following tests to pass in Bio/Range.t > > > > $r = Bio::Range->new(-start => 30, -end => 40, -strand => > > -1); ok ($r->offsetStranded(-5,10)->toString, '(20, 45) > > strand=-1'); ok ($r->offsetStranded(+5,-10)->toString, '(30, > > 40) strand=-1'); $r->strand(1); ok > > ($r->offsetStranded(-5,10)->toString, '(25, 50) strand=1'); > > ok ($r->offsetStranded(+5,-10)->toString, '(30, 40) strand=1'); > > > > > > Here's the implementation. 
> > > > =head2 offsetStranded > > > > Title : offsetStranded > > Usage : $rnge->offsetStranded($fiveprime_offset, > > $threeprime_offset) > > Function : destructively modifies RangeI implementing object to > > offset its start and stop coordinates by > > values $fiveprime_offset and > > $threeprime_offset (positive values being in > > the strand direction). > > Args : two integer offsets: $fiveprime_offset and > > $threeprime_offset > > Returns : $self, offset accordingly. > > > > =cut > > > > sub offsetStranded { > > my ($self, $offset_fiveprime, $offset_threeprime) = @_; > > my ($offset_start, $offset_end) = $self->strand() eq -1 ? > > (- $offset_threeprime, - $offset_fiveprime) : > > ($offset_fiveprime, $offset_threeprime); > > $self->start($self->start + $offset_start); > > $self->end($self->end + $offset_end); > > return $self; > > }; > > > > > > I'll commit tomorrow unless I'm told 'that would be a mistake'. > > > > Cheers, > > > > --Malcolm > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Wed Jan 10 12:11:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 10 Jan 2007 17:11:35 +0000 Subject: [Bioperl-l] PROPOSED new method Bio::Range->offsetStranded In-Reply-To: References: Message-ID: <45A51E47.5030801@sendu.me.uk> Cook, Malcolm wrote: > Hi Chris, > > In part of a pipeline to design oligo microarray for detecting alternate > splice sites, I build 0-length features which identify the location of > all 5' and 3' splice sites (as inferred from flybase GFF gene, > transcript, mRNA, exon features) and then offset these feature to create > a region flanking it, which is then taken as the target of an oligo > design process. > > I'm sure I could re-conceptualize my approach, but, as implemented, it > works a treat.
> > My thinking is influenced by my Excel's object model (VBA), which has > Offset as a destructive method of Excel.Range (of excel cells). > > I suppose any algorithm that used a 'sliding window' could possibly > benefit from Bio::Range->offsetStranded. > > Cheers, > > Malcolm Cook (poised to commit ;) Sounds good to me. I didn't understand before why you'd have different start and end offsets, but now all is clear :) From cjfields at uiuc.edu Wed Jan 10 12:17:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Jan 2007 11:17:40 -0600 Subject: [Bioperl-l] PROPOSED new method Bio::Range->offsetStranded In-Reply-To: <45A51E47.5030801@sendu.me.uk> References: <45A51E47.5030801@sendu.me.uk> Message-ID: <000001c734db$3befc540$15327e82@pyrimidine> > Cook, Malcolm wrote: > > Hi Chris, > > > > In part of a pipeline to design oligo microarray for detecting > > alternate splice sites, I build 0-length features which > identify the > > location of all 5' and 3' splice sites (as inferred from > flybase GFF > > gene, transcript, mRNA, exon features) and then offset > these feature > > to create a region flanking it, which is then taken as the > target of > > an oligo design process. > > > > I'm sure I could re-conceptualize my approach, but, as > implemented, it > > works a treat. > > > > My thinking is influenced by my Excel's object model (VBA), > which has > > Offset as a destructive method of Excel.Range (of excel cells). > > > > I suppose any algorithm that used a 'sliding window' could possibly > > benefit from Bio::Range->offsetStranded. > > > > Cheers, > > > > Malcolm Cook (poised to commit ;) > > Sounds good to me. I didn't understand before why you'd have > different start and end offsets, but now all is clear :) Same here. Commit away I say! 
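[Editor's note] The strand-aware offset discussed in this thread can be tried out without BioPerl. The sketch below is an assumption-laden illustration, not the committed method: offset_stranded() is a hypothetical helper, and a plain hash with start, end and strand keys stands in for a Bio::Range object. The numbers are the ones from the proposed Bio/Range.t tests.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::Range object: a plain hash with
# start, end and strand keys (not the BioPerl API).
sub offset_stranded {
    my ($range, $fiveprime, $threeprime) = @_;
    # On the minus strand the 5' end is the 'end' coordinate, so the two
    # offsets swap places and change sign.
    my ($d_start, $d_end) = $range->{strand} == -1
        ? (-$threeprime, -$fiveprime)
        : ( $fiveprime,   $threeprime);
    $range->{start} += $d_start;
    $range->{end}   += $d_end;
    return $range;    # modified in place, like the proposed method
}

my $minus = offset_stranded({ start => 30, end => 40, strand => -1 }, -5, 10);
my $plus  = offset_stranded({ start => 30, end => 40, strand =>  1 }, -5, 10);
printf "(%d, %d) strand=%d\n", @{$_}{qw(start end strand)} for $minus, $plus;
# prints "(20, 45) strand=-1" then "(25, 50) strand=1"
```

A negative 5' offset extends the range upstream and a positive 3' offset extends it downstream, whichever strand the feature is on, which is what makes the method convenient for building flanking regions around splice sites.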
chris From MEC at stowers-institute.org Wed Jan 10 15:31:26 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Wed, 10 Jan 2007 14:31:26 -0600 Subject: [Bioperl-l] bp_seqfeature_load of latest Flybase GFF annotationfails due to data inconsistency. Message-ID: Aloha, For those tracking this (or otherwise lurking) Flybase have released new versions of dmel_r5_1 GFF files that remove the data problem. --Malcolm > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Cook, Malcolm > Sent: Tuesday, January 09, 2007 1:39 PM > To: bioperl list; Blanchette, Marco > Subject: [Bioperl-l] bp_seqfeature_load of latest Flybase GFF > annotationfails due to data inconsistency. > > > Drat! > > bash> bp_seqfeature_load.PLS --fast --dsn > 'dbi:mysql:database=dmel_r5_1;host=mysql-dev' --create --noverbose <( > flygenegff > ./flybase.net/genomes/Drosophila_melanogaster/dmel_r5.1/gff/*.gff ) > > > (note: `flygenegff` used above sorts and filters the GFF input so that > the GFF features are loaded in order needed: gene before mRNA before > exon) > > This worked fine with the last release of Flybase. 
But now I get: > > ------------- EXCEPTION ------------- > MSG: FBtr0110936 doesn't have a primary id > STACK > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > STACK toplevel > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seq > feature_lo > ad.PLS:76 > > And indeed, sleuthing the data proves that FBtr0110936 is an > example of > a Flybase transcript identifier that is annotated as being one of the > multiple parents of exons but that does not itself have an entry in > Flybase! > > Proof: > > `grep FBtr0110936 dmel_r5.1/gff/*.gff` returns only exon features (no > gene, CDS, UTR, or mRNA) > > ... whereas, grepping for any of the other three transcripts mentioned > as parents of those exons yields the expected additional > feature of type > mRNA, protein, CDS, etc > > By the way, this data-bug manifests itself when searching the Flybase > website (FB2006_01, released December 8, 2006) for transcript > FBtr0110936 as: > > "ERROR: report for FBtr0110936 not found" > > I wonder if anyone can tell me what causes this data problem, and tell > me whether it is ubiquitous (i.e. are there other transcripts > mentioned > as exon parents that do not have their own feature)? > > I am trying to load this latest Flybase GFF into Lincoln Steins > Bio::DB::SeqFeature database (using bp_seqfeature_load) but the load > fails due to this data problem. 
Any recommendations/workarounds to > this issue are quite welcome. > > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From himanshu.ardawatia at bccs.uib.no Wed Jan 10 17:46:41 2007 From: himanshu.ardawatia at bccs.uib.no (Himanshu Ardawatia) Date: Wed, 10 Jan 2007 15:46:41 -0700 Subject: [Bioperl-l] Question on Tree : Last common ancestor / Internal Nodes In-Reply-To: <00b401c734d1$4409ca50$15327e82@pyrimidine> References: <45A4B3D6.9060409@sendu.me.uk> <00b401c734d1$4409ca50$15327e82@pyrimidine> Message-ID: <62d36e2b0701101446p3c9cb83dk6591c7beef69ad1e@mail.gmail.com> Yes it works now. I realized I had not completely uninstalled the older version. Thanks. Himanshu \\ On 1/10/07, Chris Fields wrote: > > > > Himanshu Ardawatia wrote: > > > Hi, > > > > > > I am using version 1.4 . With this version, I run the > > script (the one > > > you sent) and get the following error: > > > Use of uninitialized value in pattern match (m//) at tree.pl line 7. > > > > > > -------------------- WARNING --------------------- > > > MSG: Must provide a valid array reference for -nodes > > > --------------------------------------------------- > > > Can't call method "id" on an undefined value at tree.pl line 9. > > > > > > > > > I then installed version 1.5.2. With this vesion I get the > > following error: > > > > What error? > > Maybe he meant the same error above, just forgot to delete the line. > > > > And I get the same error as above . > > > > You probably didn't install 1.5.2 properly. Either you're > > actually still using 1.4, or when 1.5.2 was installed not all > > of 1.4 was first removed. 
> > Check your version number ( > > http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of > > _BioPerl_is_installed.3F > > ), locate and manually delete your 1.4 installation and try > > the 1.5.2 installation again. > ... > > Also, a quick way to locate the version of Bioperl used by default in a > script (including directories defined in PERL5LIB): > > (BTW, this came from 'Perl Hacks'): > > perldoc -l Bio::Perl > > Or whatever Bioperl module you want. > > BTW, I get the script to work on Mac OS X and WinXP, so I agree with > Sendu: > you likely have problems with different versions of Bioperl installed on > your system causing conflicts. > > chris > > > From george.heller at yahoo.com Wed Jan 10 21:33:30 2007 From: george.heller at yahoo.com (George Heller) Date: Wed, 10 Jan 2007 18:33:30 -0800 (PST) Subject: [Bioperl-l] Taxon id null with load_seqdatabase.pl Message-ID: <20070111023331.61553.qmail@web58903.mail.re1.yahoo.com> Hi all. I just loaded Arabidopsis data into the bioentry and biosequence tables in my Postgres database using load_seqdatabase.pl. I had loaded the taxon data prior to this, using the load_ncbi_taxonomy.pl. Yet, I see the taxon_id field in the bioentry field is blank, after the loading. Can anyone tell me why this might be happening? And perhaps what needs to be done to correct this? Thanks. George --------------------------------- Access over 1 million songs - Yahoo! Music Unlimited. 
From lidaof at gmail.com Thu Jan 11 07:50:14 2007 From: lidaof at gmail.com (lidaof) Date: Thu, 11 Jan 2007 20:50:14 +0800 Subject: [Bioperl-l] problem with Bio::SearchIO::Writer In-Reply-To: References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu> <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com> Message-ID: <12d02c20701110450w6fe2ca25i24b57ea7ffa2a06d@mail.gmail.com> Hi,Dave and Chris, sorry for disturbing again in the code i paste below: #!/usr/bin/perl -w use strict; use warnings; use Bio::SearchIO; #use Bio::SearchIO::Writer::TextResultWriter; my $in = new Bio::SearchIO (-format => 'blast', -file => "$ARGV[0]"); my $cutoff = 1e-10; open OP,">out"; while(my $result = $in->next_result){ my $resultname = $result->query_name(); while(my $hit = $result->next_hit){ my $name = $hit->name(); my $e = $hit->significance(); if($e < $cutoff){ print OP "$resultname\t$name\t$e\n"; next; } } } i use a filehandle "OP" for file output i want to use Bio::SearchIO::Writer::TextResultWriter but actually i didn't use it that is the place you are not sure in your last mail and i will spend some time on reading the website of bioperl and this mailing list Thanks for your kindness:) Li On 1/10/07, David Messina wrote: > > this document also available on CPAN and i have saw it:) > > > That's good, but you may want to become familiar with the BioPerl website, > because the information there is more extensive and more up-to-date. > > > i use a filehandle for output before i know some Text output module such > as Bio::SearchIO::Writer::TextResultWriter or other module > so which variables could be visited by the object created by > Writer::TextResultWriter module? > > > I'm not exactly sure what you're asking here. Do you want to know what > methods can be used on a TextResultWriter object?
> > > i have seen no FILE option of the synopsis of TextResultWriter > > > Ahh, that is because TextResultWriter doesn't have a file option. :) > If you look carefully at the example, it is actually Bio::SearchIO that > takes the -file parameter: > > > my $in = new Bio::SearchIO(-format => 'blast', > -file => shift @ARGV); > > > > This example might be a little confusing because it uses an unusual > (antiquated?) syntax. > This would do the same thing: > > my $in = Bio::SearchIO->new(-format => 'blast', > -file => shift @ARGV); > > The Bio::SearchIO documentation for the new() method describes all of the > parameters it can take: > > http://doc.bioperl.org/bioperl-live/Bio/SearchIO.html#POD1 > > > that is exactly i confusing with > > > BioPerl can be hard to understand at first. Time spent reading bioperl.org and this mailing list is a good way to become familiar with the "Bioperl > way" of doing things. > > > Dave > -- Li From bosborne11 at verizon.net Thu Jan 11 12:30:47 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 11 Jan 2007 12:30:47 -0500 Subject: [Bioperl-l] Taxon id null with load_seqdatabase.pl In-Reply-To: <20070111023331.61553.qmail@web58903.mail.re1.yahoo.com> Message-ID: George, What file or files provided the Arabidopsis data? Brian O. On 1/10/07 9:33 PM, "George Heller" wrote: > Hi all. > > I just loaded Arabidopsis data into the bioentry and biosequence tables in > my Postgres database using load_seqdatabase.pl. I had loaded the taxon data > prior to this, using the load_ncbi_taxonomy.pl. Yet, I see the taxon_id field > in the bioentry field is blank, after the loading. > > Can anyone tell me why this might be happening? And perhaps what needs to be > done to correct this? > > Thanks. > George > > > --------------------------------- > Access over 1 million songs - Yahoo! Music Unlimited.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hxu.hong at gmail.com Thu Jan 11 12:26:34 2007 From: hxu.hong at gmail.com (Hong Xu) Date: Thu, 11 Jan 2007 12:26:34 -0500 Subject: [Bioperl-l] parser for BLAT cDNA-genome alignment in Sim4 format Message-ID: <474ce9a0701110926j347eb327rd9d6614eaa317f5e@mail.gmail.com> Dear all, I have written a module for parsing BLAT result in Sim4 format. I borrowed the code from Bio::Tools::Sim4::Results. Hope this code will be useful to other people. regards, Hong ------------- see code below ------------------------- # # BioPerl module for Bio::Tools::Sim4::BlatResults # # Cared for by Hong Xu # Modified from Bio::Tools::Sim4::Results # that was developed by: # Ewan Birney # and Hilmar Lapp # # Copyright Ewan Birney, Hilmar Lapp, Hong Xu # # You may distribute this module under the same terms as perl itself # POD documentation - main docs before the code =head1 NAME Bio::Tools::Sim4::BlatResults - Results of one BLAT run with Sim4 output =head1 SYNOPSIS # to preset the order of EST and genomic file as given on the sim4 # command line: my $sim4 = Bio::Tools::Sim4::BlatResults->new(-file => 'result.sim4', -estfirst => 1); # to let the order be determined automatically (by length comparison): $sim4 = Bio::Tools::Sim4::BlatResults->new( -file => 'sim4.results' ); # filehandle: $sim4 = Bio::Tools::Sim4::BlatResults->new( -fh => \*INPUT ); # parse the results while(my $exonset = $sim4->next_exonset()) { # $exonset is-a Bio::SeqFeature::Generic with Bio::Tools::Sim4::Exons # as sub features print "Delimited on sequence ", $exonset->seq_id(), " from ", $exonset->start(), " to ", $exonset->end(), "\n"; foreach my $exon ( $exonset->sub_SeqFeature() ) { # $exon is-a Bio::SeqFeature::FeaturePair print "Exon from ", $exon->start, " to ", $exon->end, " on strand ", $exon->strand(), "\n"; # you can get out what it matched
using the est_hit attribute my $homol = $exon->est_hit(); print "Matched to sequence ", $homol->seq_id, " at ", $homol->start," to ", $homol->end, "\n"; } } # essential if you gave a filename at initialization (otherwise the file # stays open) $sim4->close(); =head1 DESCRIPTION The sim4 module provides a parser and results object for sim4 output from BLAT. The sim4 results are specialised types of SeqFeatures, meaning you can add them to AnnSeq objects fine, and manipulate them in the "normal" seqfeature manner. The sim4 Exon objects are Bio::SeqFeature::FeaturePair inherited objects. The $esthit = $exon->est_hit() is the alignment as a feature on the matching object (normally, an EST), in which the start/end points are where the hit lies. To make this module work sensibly you need to run blat -out=sim4 genomic.fasta est.fasta exon.sim4 One fiddle here is that there are only two real possibilities to the matching criteria: either one sequence needs reversing or not. Because of this, it is impossible to tell whether the match is in the forward or reverse strand of the genomic DNA. We solve this here by assuming that the genomic DNA is always forward. As a consequence, the strand attribute of the matching EST is unknown, and the strand attribute of the genomic DNA (i.e., the Exon object) will reflect the direction of the hit. =head1 FEEDBACK =head2 Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated. bioperl-l at bioperl.org - General discussion http://bio.perl.org/MailList.html - About the mailing lists =head2 Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.
Bug reports can be submitted via email or the web: bioperl-bugs at bio.perl.org http://bugzilla.bioperl.org/ =head1 AUTHOR - Ewan Birney, Hilmar Lapp, Hong Xu Email birney at sanger.ac.uk hlapp at gmx.net (or hilmar.lapp at pharma.novartis.com) hxu.hong at gmail.com Describe contact details here =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ =cut # Let the code begin... package Bio::Tools::Sim4::BlatResults; use vars qw(@ISA); use strict; # Object preamble - inherits from Bio::Root::Object use File::Basename; use Bio::Root::Root; use Bio::Tools::AnalysisResult; use Bio::Tools::Sim4::Exon; @ISA = qw(Bio::Tools::AnalysisResult); sub _initialize_state { my($self,@args) = @_; # call the inherited method first my $make = $self->SUPER::_initialize_state(@args); my ($est_is_first) = $self->_rearrange([qw(ESTFIRST)], @args); delete($self->{'_est_is_first'}); $self->{'_est_is_first'} = $est_is_first if(defined($est_is_first)); $self->analysis_method("BLAT"); } =head2 analysis_method Usage : $sim4->analysis_method(); Purpose : Inherited method. Overridden to ensure that the name matches /blat/i. Returns : String Argument : n/a =cut #------------- sub analysis_method { #------------- my ($self, $method) = @_; if($method && ($method !~ /blat/i)) { $self->throw("method $method not supported in " . ref($self)); } return $self->SUPER::analysis_method($method); } =head2 parse_next_alignment Title : parse_next_alignment Usage : @exons = $sim4_result->parse_next_alignment; foreach $exon (@exons) { # do something } Function: Parses the next alignment of the BLAT sim4 result file and returns the found exons as an array of Bio::Tools::Sim4::Exon objects. Call this method repeatedly until an empty array is returned to get the results for all alignments. The $exon->seq_id() attribute will be set to the identifier of the respective sequence for both sequences.
The length is accessible via the seqlength() attribute of $exon->query() and $exon->est_hit(). It automatically determines which of the two sequences has been reversed, and adjusts the coordinates for that sequence. It will also detect whether the EST sequence(s) were given as first or as second file to sim4, unless this has been specified at creation time of the object. Example : Returns : An array of Bio::Tools::Sim4::Exon objects Args : =cut sub parse_next_alignment { my ($self) = @_; my @exons = (); my %seq1props = (); my %seq2props = (); # we refer to the properties of each seq by reference my ($estseq, $genomseq, $to_reverse); my $started = 0; my $hit_direction = 1; while(defined($_ = $self->_readline())) { # # bascially, each sim4 'hit' starts with seq1... # /^seq1/ && do { if($started) { $self->_pushback($_); last; } $started = 1; # seqname and length of seq 1 /^seq1\s+=\s+(\S+)\,\s+(\d+)/ || $self->throw("Sim4 parsing error on seq1 [$_] line. Sorry!"); $seq1props{'seqname'} = $1; $seq1props{'length'} = $2; next; }; /^seq2/ && do { /^seq2\s+=\s+(\S+)\,\s+(\d+)/ || $self->throw("Sim4 parsing error on seq2 [$_] line. Sorry!"); $seq2props{'seqname'} = $1; $seq2props{'length'} = $2; next; }; /^\(complement\)/ && do { $hit_direction = -1; next; }; # this matches # start-end (start-end) pctid% if(/(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%/) { $seq1props{'start'} = $1; $seq1props{'end'} = $2; $seq2props{'start'} = $3; $seq2props{'end'} = $4; my $pctid = $5; if(! defined($estseq)) { # for the first time here: need to set the references referring # to seq1 and seq2 if(! 
exists($self->{'_est_is_first'})) { # detect which one is the EST by looking at the lengths, # and assume that this holds throughout the entire result # file (i.e., when this method is called for the next # alignment, this will not be checked again) if($seq1props{'length'} > $seq2props{'length'}) { $self->{'_est_is_first'} = 0; } else { $self->{'_est_is_first'} = 1; } } if($self->{'_est_is_first'}) { $estseq = \%seq1props; $genomseq = \%seq2props; } else { $estseq = \%seq2props; $genomseq = \%seq1props; } } if($hit_direction == -1) { # we have to reverse the coordinates of one of both seqs my $tmp = $to_reverse->{'start'}; $to_reverse->{'start'} = $to_reverse->{'length'} - $to_reverse->{'end'} + 1; $to_reverse->{'end'} = $to_reverse->{'length'} - $tmp + 1; } # create and initialize the exon object my $exon = Bio::Tools::Sim4::Exon->new( '-start' => $genomseq->{'start'}, '-end' => $genomseq->{'end'}, '-strand' => $hit_direction); $exon->seq_id($genomseq->{'seqname'}); # feature1 is supposed to be initialized to a Similarity object, # but we provide a safety net if($exon->feature1()->can('seqlength')) { $exon->feature1()->seqlength($genomseq->{'length'}); } else { $exon->feature1()->add_tag_value('SeqLength', $genomseq->{'length'}); } # create and initialize the feature wrapping the 'hit' (the EST) my $fea2 = Bio::SeqFeature::Similarity->new( '-start' => $estseq->{'start'}, '-end' => $estseq->{'end'}, '-strand' => 0, '-primary' => "aligning_EST"); $fea2->seq_id($estseq->{'seqname'}); $fea2->seqlength($estseq->{'length'}); # store $exon->est_hit($fea2); # general properties $exon->source_tag($self->analysis_method()); $exon->percentage_id($pctid); $exon->score($exon->percentage_id()); # push onto array push(@exons, $exon); next; # back to while loop } } return @exons; } =head2 next_exonset Title : next_exonset Usage : $exonset = $sim4_result->parse_next_exonset; print "Exons start at ", $exonset->start(), "and end at ", $exonset->end(), "\n"; foreach $exon 
($exonset->sub_SeqFeature()) { # do something } Function: Parses the next alignment of the Sim4 result file and returns the set of exons as a container of features. The container is itself a Bio::SeqFeature::Generic object, with the Bio::Tools::Sim4::Exon objects as sub features. Start, end, and strand of the container will represent the total region covered by the exons of this set. See the documentation of parse_next_alignment() for further reference about parsing and how the information is stored. Example : Returns : An Bio::SeqFeature::Generic object holding Bio::Tools::Sim4::Exon objects as sub features. Args : =cut sub next_exonset { my $self = shift; my $exonset; # get the next array of exons my @exons = $self->parse_next_alignment(); return if($#exons < 0); # create the container of exons as a feature object itself, with the # data of the first exon for initialization $exonset = Bio::SeqFeature::Generic->new('-start' => $exons[0]->start(), '-end' => $exons[0]->end(), '-strand' => $exons[0]->strand(), '-primary' => "ExonSet"); $exonset->source_tag($exons[0]->source_tag()); $exonset->seq_id($exons[0]->seq_id()); # now add all exons as sub features, with enabling EXPANsion of the region # covered in total foreach my $exon (@exons) { $exonset->add_sub_SeqFeature($exon, 'EXPAND'); } return $exonset; } =head2 next_feature Title : next_feature Usage : while($exonset = $sim4->next_feature()) { # do something } Function: Does the same as L. See there for documentation of the functionality. Call this method repeatedly until FALSE is returned. The returned object is actually a SeqFeatureI implementing object. This method is required for classes implementing the SeqAnalysisParserI interface, and is merely an alias for next_exonset() at present. Example : Returns : A Bio::SeqFeature::Generic object. 
 Args    :

=cut

sub next_feature {
    my ($self, @args) = @_;
    # even though next_exonset doesn't expect any args (and this method
    # does neither), we pass on args in order to be prepared if this changes
    # ever
    return $self->next_exonset(@args);
}

1;

From jason at bioperl.org Thu Jan 11 17:08:32 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 11 Jan 2007 14:08:32 -0800
Subject: [Bioperl-l] parser for BLAT cDNA-genome alignment in Sim4 format
In-Reply-To: <474ce9a0701110926j347eb327rd9d6614eaa317f5e@mail.gmail.com>
References: <474ce9a0701110926j347eb327rd9d6614eaa317f5e@mail.gmail.com>
Message-ID: <52E13338-C7FF-48E5-BFC1-5BA2F06103BB@bioperl.org>

Is the BLAT SIM4 output qualitatively different from "regular" Sim4
output? Doesn't it make sense to just fix the parser to be more
flexible?

There is also a sim4 parser for query/hit style (rather than gene-style
features you get from the Bio::Tools:: parser) in SearchIO::sim4 -
curious if it fails for BLAT-created sim4 output?

-jason

On Jan 11, 2007, at 9:26 AM, Hong Xu wrote:

> Dear all,
>
> I have written a module for parsing BLAT result in Sim4 format. I
> borrowed the code from Bio::Tools::Sim4::Results. Hope this code will
> be useful to other people.
> > regards, > > Hong > > ------------- see code below ------------------------- > > # > # BioPerl module for Bio::Tools::Sim4::BlatResults > # > # Cared for by Hong Xu > # Modified from Bio::Tools::Sim4::BlatResults > # that was developed by: > # Ewan Birney > # and Hilmar Lapp > # > # Copyright Ewan Birney, Hilmar Lapp, Hong Xu > # > # You may distribute this module under the same terms as perl itself > > # POD documentation - main docs before the code > > =head1 NAME > > Bio::Tools::Sim4::BlatResults - Results of one BLAT run with Sim4 > output > > =head1 SYNOPSIS > > # to preset the order of EST and genomic file as given on the sim4 > # command line: > my $sim4 = Bio::Tools::Sim4::BlatResults->new(-file => > 'result.sim4', > -estfirst => 1); > # to let the order be determined automatically (by length > comparison): > $sim4 = Bio::Tools::Sim4::BlatResults->new( -file => > 'sim4.results' ); > # filehandle: > $sim4 = Bio::Tools::Sim4::BlatResults->new( -fh => \*INPUT ); > > # parse the results > while(my $exonset = $sim4->next_exonset()) { > # $exonset is-a Bio::SeqFeature::Generic with > Bio::Tools::Sim4::Exons > # as sub features > print "Delimited on sequence ", $exonset->seq_id(), > "from ", $exonset->start(), " to ", $exonset->end(), > "\n"; > foreach my $exon ( $exonset->sub_SeqFeature() ) { > # $exon is-a Bio::SeqFeature::FeaturePair > print "Exon from ", $exon->start, " to ", $exon->end, > " on strand ", $exon->strand(), "\n"; > # you can get out what it matched using the est_hit > attribute > my $homol = $exon->est_hit(); > print "Matched to sequence ", $homol->seq_id, > " at ", $homol->start," to ", $homol->end, "\n"; > } > } > > # essential if you gave a filename at initialization (otherwise > the file > # stays open) > $sim4->close(); > > =head1 DESCRIPTION > > The sim4 module provides a parser and results object for sim4 > output from BLAT. 
> The sim4 results are specialised types of SeqFeatures, meaning you can
> add them to AnnSeq objects fine, and manipulate them in the "normal"
> seqfeature manner.
>
> The sim4 Exon objects are Bio::SeqFeature::FeaturePair inherited
> objects. The $esthit = $exon->est_hit() is the alignment as a feature
> on the matching object (normally, an EST), in which the start/end
> points are where the hit lies.
>
> To make this module work sensibly you need to run
>
>     blat -out=sim4 genomic.fasta est.fasta exon.sim4
>
> One fiddle here is that there are only two real possibilities to the
> matching criteria: either one sequence needs reversing or not. Because
> of this, it is impossible to tell whether the match is in the forward
> or reverse strand of the genomic DNA. We solve this here by assuming
> that the genomic DNA is always forward. As a consequence, the strand
> attribute of the matching EST is unknown, and the strand attribute of
> the genomic DNA (i.e., the Exon object) will reflect the direction of
> the hit.
>
> =head1 FEEDBACK
>
> =head2 Mailing Lists
>
> User feedback is an integral part of the evolution of this and other
> Bioperl modules. Send your comments and suggestions preferably to one
> of the Bioperl mailing lists. Your participation is much appreciated.
>
>   bioperl-l at bioperl.org            - General discussion
>   http://bio.perl.org/MailList.html  - About the mailing lists
>
> =head2 Reporting Bugs
>
> Report bugs to the Bioperl bug tracking system to help us keep track
> of the bugs and their resolution. Bug reports can be submitted via
> email or the web:
>
>   bioperl-bugs at bio.perl.org
>   http://bugzilla.bioperl.org/
>
> =head1 AUTHOR - Ewan Birney, Hilmar Lapp, Hong Xu
>
> Email birney at sanger.ac.uk
>       hlapp at gmx.net (or hilmar.lapp at pharma.novartis.com)
>       hxu.hong at gmail.com
>
> Describe contact details here
>
> =head1 APPENDIX
>
> The rest of the documentation details each of the object methods.
> Internal methods are usually preceded with a _
>
> =cut
>
>
> # Let the code begin...
>
>
> package Bio::Tools::Sim4::BlatResults;
> use vars qw(@ISA);
> use strict;
>
> # Object preamble - inherits from Bio::Root::Object
>
> use File::Basename;
> use Bio::Root::Root;
> use Bio::Tools::AnalysisResult;
> use Bio::Tools::Sim4::Exon;
>
> @ISA = qw(Bio::Tools::AnalysisResult);
>
>
> sub _initialize_state {
>     my($self, @args) = @_;
>
>     # call the inherited method first
>     my $make = $self->SUPER::_initialize_state(@args);
>
>     my ($est_is_first) = $self->_rearrange([qw(ESTFIRST)], @args);
>
>     delete($self->{'_est_is_first'});
>     $self->{'_est_is_first'} = $est_is_first if(defined($est_is_first));
>     $self->analysis_method("BLAT");
> }
>
> =head2 analysis_method
>
>  Usage     : $sim4->analysis_method();
>  Purpose   : Inherited method. Overridden to ensure that the name
>              matches /blat/i.
>  Returns   : String
>  Argument  : n/a
>
> =cut
>
> #-------------
> sub analysis_method {
> #-------------
>     my ($self, $method) = @_;
>     if($method && ($method !~ /blat/i)) {
>         $self->throw("method $method not supported in " . ref($self));
>     }
>     return $self->SUPER::analysis_method($method);
> }
>
> =head2 parse_next_alignment
>
>  Title   : parse_next_alignment
>  Usage   : @exons = $sim4_result->parse_next_alignment;
>            foreach $exon (@exons) {
>                # do something
>            }
>  Function: Parses the next alignment of the BLAT sim4 result file and
>            returns the found exons as an array of
>            Bio::Tools::Sim4::Exon objects. Call this method repeatedly
>            until an empty array is returned to get the results for all
>            alignments.
>
>            The $exon->seq_id() attribute will be set to the identifier
>            of the respective sequence for both sequences. The length is
>            accessible via the seqlength() attribute of $exon->query()
>            and $exon->est_hit().
>
>            It automatically determines which of the two sequences has
>            been reversed, and adjusts the coordinates for that sequence.
> It will > also detect whether the EST sequence(s) were given as > first or as > second file to sim4, unless this has been specified at > creation > time of the object. > > Example : > Returns : An array of Bio::Tools::Sim4::Exon objects > Args : > > > =cut > > sub parse_next_alignment { > my ($self) = @_; > my @exons = (); > my %seq1props = (); > my %seq2props = (); > # we refer to the properties of each seq by reference > my ($estseq, $genomseq, $to_reverse); > my $started = 0; > my $hit_direction = 1; > > while(defined($_ = $self->_readline())) { > > # > # bascially, each sim4 'hit' starts with seq1... > # > /^seq1/ && do { > if($started) { > $self->_pushback($_); > last; > } > $started = 1; > > # seqname and length of seq 1 > /^seq1\s+=\s+(\S+)\,\s+(\d+)/ || > $self->throw("Sim4 parsing error on seq1 [$_] line. Sorry!"); > $seq1props{'seqname'} = $1; > $seq1props{'length'} = $2; > next; > }; > /^seq2/ && do { > /^seq2\s+=\s+(\S+)\,\s+(\d+)/ || > $self->throw("Sim4 parsing error on seq2 [$_] line. Sorry!"); > $seq2props{'seqname'} = $1; > $seq2props{'length'} = $2; > next; > }; > /^\(complement\)/ && do { > $hit_direction = -1; > next; > }; > > # this matches > # start-end (start-end) pctid% > if(/(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%/) { > $seq1props{'start'} = $1; > $seq1props{'end'} = $2; > $seq2props{'start'} = $3; > $seq2props{'end'} = $4; > my $pctid = $5; > > if(! defined($estseq)) { > # for the first time here: need to set the references > referring > # to seq1 and seq2 > if(! 
exists($self->{'_est_is_first'})) { > # detect which one is the EST by looking at the lengths, > # and assume that this holds throughout the entire result > # file (i.e., when this method is called for the next > # alignment, this will not be checked again) > if($seq1props{'length'} > $seq2props{'length'}) { > $self->{'_est_is_first'} = 0; > } else { > $self->{'_est_is_first'} = 1; > } > } > if($self->{'_est_is_first'}) { > $estseq = \%seq1props; > $genomseq = \%seq2props; > } else { > $estseq = \%seq2props; > $genomseq = \%seq1props; > } > } > if($hit_direction == -1) { > # we have to reverse the coordinates of one of both seqs > my $tmp = $to_reverse->{'start'}; > $to_reverse->{'start'} = > $to_reverse->{'length'} - $to_reverse->{'end'} + 1; > $to_reverse->{'end'} = $to_reverse->{'length'} - $tmp + 1; > } > # create and initialize the exon object > my $exon = Bio::Tools::Sim4::Exon->new( > '-start' => $genomseq->{'start'}, > '-end' => $genomseq->{'end'}, > '-strand' => $hit_direction); > $exon->seq_id($genomseq->{'seqname'}); > # feature1 is supposed to be initialized to a Similarity object, > # but we provide a safety net > if($exon->feature1()->can('seqlength')) { > $exon->feature1()->seqlength($genomseq->{'length'}); > } else { > $exon->feature1()->add_tag_value('SeqLength', > $genomseq->{'length'}); > } > # create and initialize the feature wrapping the 'hit' (the EST) > my $fea2 = Bio::SeqFeature::Similarity->new( > '-start' => $estseq->{'start'}, > '-end' => $estseq->{'end'}, > '-strand' => 0, > '-primary' => "aligning_EST"); > $fea2->seq_id($estseq->{'seqname'}); > $fea2->seqlength($estseq->{'length'}); > # store > $exon->est_hit($fea2); > # general properties > $exon->source_tag($self->analysis_method()); > $exon->percentage_id($pctid); > $exon->score($exon->percentage_id()); > # push onto array > push(@exons, $exon); > next; # back to while loop > } > } > return @exons; > } > > =head2 next_exonset > > Title : next_exonset > Usage : $exonset = 
$sim4_result->parse_next_exonset; > print "Exons start at ", $exonset->start(), > "and end at ", $exonset->end(), "\n"; > foreach $exon ($exonset->sub_SeqFeature()) { > # do something > } > Function: Parses the next alignment of the Sim4 result file and > returns the > set of exons as a container of features. The container > is itself > a Bio::SeqFeature::Generic object, with the > Bio::Tools::Sim4::Exon > objects as sub features. Start, end, and strand of the > container > will represent the total region covered by the exons of > this set. > > See the documentation of parse_next_alignment() for further > reference about parsing and how the information is stored. > > Example : > Returns : An Bio::SeqFeature::Generic object holding > Bio::Tools::Sim4::Exon > objects as sub features. > Args : > > =cut > > sub next_exonset { > my $self = shift; > my $exonset; > > # get the next array of exons > my @exons = $self->parse_next_alignment(); > return if($#exons < 0); > # create the container of exons as a feature object itself, > with the > # data of the first exon for initialization > $exonset = Bio::SeqFeature::Generic->new('-start' => $exons[0]- > >start(), > '-end' => $exons[0]->end(), > '-strand' => $exons[0]->strand(), > '-primary' => "ExonSet"); > $exonset->source_tag($exons[0]->source_tag()); > $exonset->seq_id($exons[0]->seq_id()); > # now add all exons as sub features, with enabling EXPANsion of > the region > # covered in total > foreach my $exon (@exons) { > $exonset->add_sub_SeqFeature($exon, 'EXPAND'); > } > return $exonset; > } > > =head2 next_feature > > Title : next_feature > Usage : while($exonset = $sim4->next_feature()) { > # do something > } > Function: Does the same as L. See there for > documentation of > the functionality. Call this method repeatedly until > FALSE is > returned. > > The returned object is actually a SeqFeatureI > implementing object. 
>            This method is required for classes implementing the
>            SeqAnalysisParserI interface, and is merely an alias for
>            next_exonset() at present.
>
>  Example :
>  Returns : A Bio::SeqFeature::Generic object.
>  Args    :
>
> =cut
>
> sub next_feature {
>     my ($self, @args) = @_;
>     # even though next_exonset doesn't expect any args (and this method
>     # does neither), we pass on args in order to be prepared if this
>     # changes ever
>     return $self->next_exonset(@args);
> }
>
> 1;
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dmessina at wustl.edu Thu Jan 11 17:32:05 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 11 Jan 2007 16:32:05 -0600
Subject: [Bioperl-l] problem with Bio::SearchIO::Writer
In-Reply-To: <12d02c20701110450w6fe2ca25i24b57ea7ffa2a06d@mail.gmail.com>
References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu> <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com> <12d02c20701110450w6fe2ca25i24b57ea7ffa2a06d@mail.gmail.com>
Message-ID: <54717799-69C3-46C5-997C-94C62C99A103@wustl.edu>

Hi Li,

Please reply also to the list so that the whole thread can be archived.

I dug a little deeper into the Bio::SearchIO::Writer::TextResultWriter
documentation and found (in the description section) that you can pass a
subroutine to do your filtering.
So your code can be shortened to:

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

my $usage = "
signifablast - filter blast reports for significance and write out a new report

Usage:  signifablast
";
@ARGV == 1 or die $usage;

# create a SearchIO object for reading in the file of blast report(s)
my $in = Bio::SearchIO->new( -format => 'blast', -file => "$ARGV[0]" );

# create a TextResultWriter object
my $writer = Bio::SearchIO::Writer::TextResultWriter->new(
    -filters => { 'HIT' => \&hit_filter } );

# create a SearchIO object to store our filtered hits and write them to 'out'
my $out = Bio::SearchIO->new( -writer => $writer, -file => '>out' );

# write out our new (text) blast report
$out->write_result($in->next_result);


# E-value filter
sub hit_filter {
    my $hit = shift;

    # set E value cutoff
    my $e = 1e-10;

    # &hit_filter must return true to keep the hit and false to discard the hit
    return $hit->significance < $e;
}


Dave


On Jan 11, 2007, at 6:50 AM, lidaof wrote:

> Hi,Dave and Chris,
>
> sorry for disturbing again
> in the code i paste below:
>
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Bio::SearchIO;
> #use Bio::SearchIO::Writer::TextResultWriter;
>
> my $in = new Bio::SearchIO (-format => 'blast', -file => "$ARGV[0]");
>
> $cutoff = 1e-10;
> open OP,">out";
> while(my $result = $in->next_result){
>     my $resultname = $result->query_name();
>     while(my $hit = $result->next_hit){
>         my $name = $hit->name();
>         my $e = $hit->significance();
>         if($e < $cutoff){
>             print OP "$resultname\t$name\t$e\n";
>             next;
>         }
>     }
> }
>
> i use a filehandle "OP" for file output
> i want to use Bio::SearchIO::Writer::TextResultWriter but actually
> i didn't use it
> that is the place you are not sure in your last mail
> and i will spend some time on reading the website of bioperl and
> this mailing list
>
> Thanks for your kindness:)
>
> Li

From lidaof at gmail.com Thu Jan 11 20:24:04 2007
From: lidaof at gmail.com
(lidaof) Date: Fri, 12 Jan 2007 09:24:04 +0800 Subject: [Bioperl-l] problem with Bio::SearchIO::Writer In-Reply-To: <54717799-69C3-46C5-997C-94C62C99A103@wustl.edu> References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu> <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com> <12d02c20701110450w6fe2ca25i24b57ea7ffa2a06d@mail.gmail.com> <54717799-69C3-46C5-997C-94C62C99A103@wustl.edu> Message-ID: <12d02c20701111724p25e19c51mc005c6c4fd15fa00@mail.gmail.com> Hi Dave, Thank you for the patience on my problem and now i have some understanding on that issue Thank you! my last mail alse have " bcc bioperl-l at lists.open-bio.org" and i have joined in this great list:) Li On 1/12/07, David Messina wrote: > > Hi Li, > > Please reply also to the list so that the whole thread can be archived. > > > I dug a little deeper into the Bio::SearchIO::Writer::TextResultWriter > documentation and found (in the description section) that you can pass a > subroutine to do your filtering. 
So your code can be shortened to: > > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::SearchIO; > use Bio::SearchIO::Writer::TextResultWriter; > > my $usage = " > signifablast - filter blast reports for significance and write out a new > report > > Usage: signifablast > "; > @ARGV == 1 or die $usage; > > # create a SearchIO object for reading in the file of blast report(s) > my $in = Bio::SearchIO->new( -format => 'blast', -file => "$ARGV[0]" ); > > # create a TextResultWriter object > my $writer = Bio::SearchIO::Writer::TextResultWriter->new( > -filters => { 'HIT' => \&hit_filter } ); > > # create a SearchIO object to store our filtered hits and write them to > 'out' > my $out = Bio::SearchIO->new( -writer => $writer, -file => '>out' ); > > # write out our new (text) blast report > $out->write_result($in->next_result); > > > > # E-value filter > sub hit_filter { > my $hit = shift; > > # set E value cutoff > my $e = 1e-10; > > # &hit_filter must return true to keep the hit and false to discard > the hit > return $hit->significance < $e; > } > > > > > > > Dave > > > > > > On Jan 11, 2007, at 6:50 AM, lidaof wrote: > > Hi,Dave and Chris, > > sorry for disturbing again > in the code i paste below: > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::SearchIO; > #use Bio::SearchIO::Writer::TextResultWriter; > > my $in = new Bio::SearchIO (-format => 'blast', -file => "$ARGV[0]"); > > $cutoff = 1e-10; > open OP,">out"; > while(my $result = $in->next_result){ > my $resultname = $result->query_name(); > while(my $hit = $result->next_hit){ > my $name = $hit->name(); > my $e = $hit->significance(); > if($e < $cutoff){ > print OP "$resultname\t$name\t$e\n"; > next; > } > } > } > > i use a filehandle "OP" for file output > i want to use Bio::SearchIO::Writer::TextResultWriter but actually i > didn't use it > that is the place you are not sure in your last mail > and i will spend some time on reading the website of bioperl and this > mailing list > 
> Thanks for your kindness:) > > Li > > > > -- Li From lidaof at gmail.com Thu Jan 11 23:18:47 2007 From: lidaof at gmail.com (lidaof) Date: Fri, 12 Jan 2007 12:18:47 +0800 Subject: [Bioperl-l] problem with Bio::SearchIO::Writer In-Reply-To: <12d02c20701111724p25e19c51mc005c6c4fd15fa00@mail.gmail.com> References: <12d02c20701090638i714a0d66k6c0a88f0a1ae5fee@mail.gmail.com> <5EDDB2C8-60B6-4773-BB74-F5E355F85483@wustl.edu> <12d02c20701091737w666fcd5em17f86d619fdf39b0@mail.gmail.com> <12d02c20701110450w6fe2ca25i24b57ea7ffa2a06d@mail.gmail.com> <54717799-69C3-46C5-997C-94C62C99A103@wustl.edu> <12d02c20701111724p25e19c51mc005c6c4fd15fa00@mail.gmail.com> Message-ID: <12d02c20701112018qb46706bxbbec45eebbf17a13@mail.gmail.com> Hi Dave, i use this two code for the test file the Bio::SearchIO HOW-TO pages provided with the code i provied,the content of out: [lidaof at lidaofbox blast]$ more out gi|20521485|dbj|AP004641.2 gb|443893|124775 2e-022 with the code you provided,the conten of out: [lidaof at lidaofbox blast]$ more out BLASTX 2.2.4 [Aug-26-2002] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= gi|20521485|dbj|AP004641.2 Oryza sativa (japonica cultivar-group) genomic DNA, chromosome 1, BAC clone:B1147B04, 3785 bases, 977CE9AF checksum. 
(3,059 letters) Database: test.fa 5 sequences; 1,291 total letters Score E Sequences producing significant alignments: (bits) value gb|443893|124775 LaForas sequence 92 2e-022 >gb|443893|124775 LaForas sequence Length = 331 Score = 92.0 bits (227), Expect = 2e-022 Identities = 46/52 (88%), Positives = 48/52 (92%), Gaps = 2/52 (3%) Frame = +1 Query: 2896 DMGRCSSGCNRYPEPMTPDTMIKLYREKEGLGAYIWMPTPDMSTEGRVQMLP 3051 D+ + SSGCNRYPEPMTPDTMIKLYRE EGL AYIWMPTPDMSTEGRVQMLP Sbjct: 197 DIVQNSSGCNRYPEPMTPDTMIKLYRE-EGL-AYIWMPTPDMSTEGRVQMLP 246 Database: test.fa Posted date: Feb 12, 2003 9:51 AM Number of letters in database: 5 Number of sequences in database: 1,291 Matrix: BLOSUM62 Gap Penalties Existence: 11, Extension: 1 expect: 10.0 allowgaps: yes Search Statistics A: 40 Hits_to_DB: 7,140 S1: 32 S1_bits: 17.6 T: 12 X1: 16 X1_bits: 7.3 X2: 38 X2_bits: 14.6 X3: 64 X3_bits: 24.7 dbentries: 5 dbletters: 1,291 decayconst: 0.1 so the second just like the input file the first is exactly i wanted,but without use of Bio::SearchIO::Writer::TextResultWriter can i got the result like the first one by using Bio::SearchIO::Writer::TextResultWriter? Thanks! Li On 1/12/07, lidaof wrote: > > Hi Dave, > > Thank you for the patience on my problem > and now i have some understanding on that issue > Thank you! > my last mail alse have " bcc bioperl-l at lists.open-bio.org" > and i have joined in this great list:) > > Li > > On 1/12/07, David Messina wrote: > > > > Hi Li, > > > > Please reply also to the list so that the whole thread can be archived. > > > > > > I dug a little deeper into the Bio::SearchIO::Writer::TextResultWriter > > documentation and found (in the description section) that you can pass a > > subroutine to do your filtering. 
So your code can be shortened to: > > > > > > #!/usr/bin/perl -w > > > > use strict; > > use warnings; > > use Bio::SearchIO; > > use Bio::SearchIO::Writer::TextResultWriter; > > > > my $usage = " > > signifablast - filter blast reports for significance and write out a new > > report > > > > Usage: signifablast > > "; > > @ARGV == 1 or die $usage; > > > > # create a SearchIO object for reading in the file of blast report(s) > > my $in = Bio::SearchIO->new( -format => 'blast', -file => "$ARGV[0]" ); > > > > # create a TextResultWriter object > > my $writer = Bio::SearchIO::Writer::TextResultWriter->new( > > -filters => { 'HIT' => \&hit_filter } ); > > > > # create a SearchIO object to store our filtered hits and write them to > > 'out' > > my $out = Bio::SearchIO->new( -writer => $writer, -file => '>out' ); > > > > # write out our new (text) blast report > > $out->write_result($in->next_result); > > > > > > > > # E-value filter > > sub hit_filter { > > my $hit = shift; > > > > # set E value cutoff > > my $e = 1e-10; > > > > # &hit_filter must return true to keep the hit and false to discard > > the hit > > return $hit->significance < $e; > > } > > > > > > > > > > > > > > Dave > > > > > > > > > > > > On Jan 11, 2007, at 6:50 AM, lidaof wrote: > > > > Hi,Dave and Chris, > > > > sorry for disturbing again > > in the code i paste below: > > > > #!/usr/bin/perl -w > > > > use strict; > > use warnings; > > use Bio::SearchIO; > > #use Bio::SearchIO::Writer::TextResultWriter; > > > > my $in = new Bio::SearchIO (-format => 'blast', -file => "$ARGV[0]"); > > > > $cutoff = 1e-10; > > open OP,">out"; > > while(my $result = $in->next_result){ > > my $resultname = $result->query_name(); > > while(my $hit = $result->next_hit){ > > my $name = $hit->name(); > > my $e = $hit->significance(); > > if($e < $cutoff){ > > print OP "$resultname\t$name\t$e\n"; > > next; > > } > > } > > } > > > > i use a filehandle "OP" for file output > > i want to use 
Bio::SearchIO::Writer::TextResultWriter but actually i
> > didn't use it
> > that is the place you are not sure in your last mail
> > and i will spend some time on reading the website of bioperl and this
> > mailing list
> >
> > Thanks for your kindness:)
> >
> > Li
> >
>
> --
> Li

--
Li

From N.Haigh at sheffield.ac.uk Sat Jan 13 05:05:36 2007
From: N.Haigh at sheffield.ac.uk (Nathan Haigh)
Date: Sat, 13 Jan 2007 10:05:36 +0000
Subject: [Bioperl-l] Fwd: FASTA version numbers
Message-ID: <1168682736.45a8aef04e4e7@webmail.shef.ac.uk>

Before the 1.5.2 release there was some talk about being able to obtain
version numbers from FASTA that could be easily compared computationally.
Unfortunately, FASTA contained non-numeric characters and also didn't
output the full version number. I mentioned the problems and made a
couple of suggestions.

I have now just received this reply from Bill Pearson (the author of the
FASTA programs) and thought I'd post it to the list FYI.

Nath

----- Forwarded message from "William R. Pearson" -----
Date: Fri, 12 Jan 2007 15:59:33 -0500
From: "William R. Pearson"
Reply-To: "William R. Pearson"
Subject: FASTA version numbers
To: n.haigh at sheffield.ac.uk

Several people have asked me to simplify (or perhaps just rationalize)
the version numbers used by FASTA (see below). The version string makes
some sense to me (and should be logged in the readme.v34t0 file), but I
can see why it causes problems.

With the next release, I will go to a new system - my CVS tags will be
of the form fasta-34_26_x (since CVS does not allow decimal points in a
version tag) and, within the program output, you will see fasta-34.26
(typically without the last number, but with a date). The actual
filenames on the FTP site will be fasta-34.26.2.shar.Z or
fasta-34.26.2.tgz. This should address most of the problems.
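
For scripts that need to compare versions under this dotted scheme, it is
probably safer to compare field by field than to treat the string as one
floating point number (as a float or a string, 34.26.10 would sort before
34.26.2). What follows is only a minimal sketch: the helper name
cmp_fasta_versions and the filename regex are illustrative assumptions
based on the examples above, not part of FASTA or BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FASTA or BioPerl): compare two dotted
# version strings field by field; missing fields count as 0.
# Returns -1, 0 or 1, like the <=> operator.
sub cmp_fasta_versions {
    my ($v1, $v2) = @_;
    my @a = split /\./, $v1;
    my @b = split /\./, $v2;
    my $n = scalar(@a) > scalar(@b) ? scalar(@a) : scalar(@b);
    for my $i (0 .. $n - 1) {
        my $x = defined $a[$i] ? $a[$i] : 0;
        my $y = defined $b[$i] ? $b[$i] : 0;
        my $cmp = $x <=> $y;      # numeric, so 10 sorts after 2
        return $cmp if $cmp;
    }
    return 0;
}

# Assumed filename pattern, taken from the example filenames in the message.
my ($version) = 'fasta-34.26.2.tgz' =~ /^fasta-(\d+(?:\.\d+)*)/;

print "parsed version: $version\n";
print "newer than 34.26.10? ",
      (cmp_fasta_versions($version, '34.26.10') > 0 ? "yes" : "no"), "\n";
```

A shorter version string such as 34.26 (as printed in the program output)
is treated here as 34.26.0, which is a design choice of this sketch rather
than anything the FASTA distribution specifies.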
However, part of the problem is that there are several versions associated with the program - in particular the version printed at the beginning of the output: ===================================== SSEARCH searches a sequence data bank version 34.26 January 12, 2007 ===================================== and at the end: ===================================== 218 residues in 1 query sequences 83566858 residues in 223447 library sequences Tcomplib [34.26] (2 proc) start: Fri Jan 12 15:22:33 2007 done: Fri Jan 12 15:22:39 2007 Total Scan time: 11.490 Total Display time: 0.130 Function used was SSEARCH [version 34.26 January 12, 2007] ===================================== which looks different from the one printed in the algorithm description: ===================================== Smith-Waterman (SSE2, Michael Farrar 2006) (5.5 Sept 2006) function [BL50 matrix (15:-5)xS], open/ext: -10/-2 ===================================== or ===================================== FASTX (3.5 Sept 2006) function [optimized, BL50 matrix (o=15:-5:-1:-1) xS] ktup:2 ===================================== The algorithm version strings and dates have little to do with each other because the algorithms are revised much more rarely than the main wrapper programs. Hopefully, the new system will make it easier to keep track of things. Bill Pearson Begin forwarded message: > From: "Nathan S. Haigh" > Date: November 6, 2006 12:52:25 PM EST > Subject: FASTA versioning > > Dear Prof. Pearson, > > I am trying to extend a Bioperl module that works as a wrapper for the > FASTA programs. I am currently, trying to build a subroutine that > extracts the version number of the installed FASTA program for later > comparison. However, because of the nature of the versoning system > that > you have employed it makes it difficult to do a computational > comparison > of the version strings extracted as they are not a pure floating point > number. 
>
> Please could you let me know what your current versioning scheme for
> FASTA releases is? I.e. what does t24 mean, what does b2 mean? This
> might then allow me to attempt to do version comparisons successfully.
> In addition, it appears that the version printed when you start one of
> the programs does not display the full version information (as
> indicated by the downloaded file) and also shows what I assume is a
> release date?
> e.g. version 3.4t26 July 7, 2006 rather than 3.4t26b2 as indicated by
> the downloaded file (I'm not even sure if this version matches the
> filename - just an example)
>
> A more standard floating point number (3.4262) or 3-4 numbers
> separated by decimal points (3.4.26.2) would make computational
> comparisons far easier.
>
> Kind regards
> Nathan
>
----- End forwarded message -----

From bosborne11 at verizon.net Sat Jan 13 13:06:30 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 13 Jan 2007 13:06:30 -0500
Subject: [Bioperl-l] Quote
Message-ID:

Technology Review January/February 2007 "Bjarne Stroustrup The Problem
with Programming"

TR: Why is most software so bad?

Bjarne Stroustrup: Some software is actually pretty good, by any
standard. Think of the Mars Rover, Google, and the Human Genome Project.
Now, that's quality software!

From slenk at emich.edu Sat Jan 13 16:24:19 2007
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Sat, 13 Jan 2007 16:24:19 -0500
Subject: [Bioperl-l] Quote
Message-ID: <1c20e0a1c23916.1c239161c20e0a@emich.edu>

Hi,

I'm a bit puzzled about what "actually pretty good" means. My
understanding about Mars Rover is:

"The current Mars rovers may rely on proven computer technology, but
for Spirit the journey has not been glitch-free.

After a promising start to its mission, the Spirit rover -- the first
of the MER twins to land on Mars -- stopped sending proper data to JPL
scientists 18 days into the mission and later baffled ground
controllers by rebooting itself over and over again.
Since then, mission controllers were able to regain reliable
communications with the rover and continue to study what may have
caused the malfunction."
[http://www.space.com/businesstechnology/technology/mer_computer_040128.html]

Most industries know that "rebooting itself over and over again" is
not desirable. Evidently Mars is out of sight - out of mind. Software
developers have excessive pride in deliverables that are all too often
grossly dysfunctional. If a product released to the market failed
by "rebooting itself over and over again," there would be no end of
deserved criticism. Why is software immune from reasonable scrutiny?
Who believes their community is immune from the release of defects?
Maybe we should all ask ourselves how long it took for defects to be
found after the release of our last piece of wonderware, whatever it
was.

As far as Google or genome software - how do you know you haven't
missed something in ALL the web pages of the world or in a huge
database being searched heuristically. Are you saying defect free -
never misses anything - perfect? Or just darn good - if so, how good,
and how is that determined? Is that aspect of quality openly measured,
quantified, and available or is it just brushed under the rug.

I do not claim to be flawless. This is *-->>NOT<--* a screed against
Osborne/Stroustrup or anyone else who takes pride in a job well done.
I am simply not convinced that the software community is as defect
free as they claim.

Sorry to be 'aggressive' (if this be such) but I am completely sick of
defective software propelled crap.

Steve

----- Original Message -----
From: Brian Osborne
Date: Saturday, January 13, 2007 1:06 pm
Subject: [Bioperl-l] Quote

> Technology Review January/February 2007 "Bjarne Stroustrup The
> Problem with Programming"
>
> TR: Why is most software so bad?
>
> Bjarne Stroustrup: Some software is actually pretty good, by any
> standard. Think of the Mars Rover, Google, and the Human Genome
> Project.
Now, that?s > quality software! > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sat Jan 13 20:00:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 13 Jan 2007 19:00:42 -0600 Subject: [Bioperl-l] Quote In-Reply-To: <1c20e0a1c23916.1c239161c20e0a@emich.edu> References: <1c20e0a1c23916.1c239161c20e0a@emich.edu> Message-ID: <366F1CCE-BD58-4647-B0CD-3BFEF34B7C37@uiuc.edu> Not to wax philosophical on a very off-topic issue (but it is the weekend)... I would also argue that software itself isn't to blame. Software, particularly open-source software, is only as good as the people involved with its development, either the actual developers or users who contribute back in some way (filing bugs, making suggestions, etc). People aren't perfect, so why expect software to be? To make a completely lame analogy : if you sat on a faulty chair which collapsed, would you blame the chair or the carpenter? As for making sure nothing is missed or is defect-free, how can one prove a negative? Of course something will be missed, or a defect eventually found. People are still finding new and exciting things (riboswitches, epigenetic regulatory mechanisms, etc) years after genomes have been completed and released. A huge number of predicted proteins have no known or characterized function. Security holes have been consistently found (and patched and sometimes repatched) in some of the best OS's out there. Something changes beyond the control of a developer (a sequence format, or a server change) thus causing a bug in the software. Whatever you do, don't confuse success with perfection. Success at least is attainable; perfection, meh, not so much. Sorry you had frustrations with something along the way. chris On Jan 13, 2007, at 3:24 PM, Stephen Gordon Lenk wrote: > Hi, > > I'm a bit puzzled about what "actually pretty good" means. 
My > understanding about Mars Rover is: > > "The current Mars rovers may rely on proven computer technology, but > for Spirit the journey has not been glitch-free. > > After a promising start to its mission, the Spirit rover -- the first > of the MER twins to land on Mars -- stopped sending proper data to JPL > scientists 18 days into the mission and later baffled ground > controllers by rebooting itself over and over again. Since then, > mission controllers were able to regain reliable communications with > the rover and continue to study what may have caused the malfunction." > [http://www.space.com/businesstechnology/technology/ > mer_computer_040128 > .html] > > Most industries know that "rebooting itself over and over again" is > not desirable. Evidently Mars is out of sight - out of mind. Software > developers have excessive pride in deliverables that are all too often > grossly dysfunctional. If a product released to the market failed > by "rebooting itself over and over again," there would be no end of > deserved criticism. Why is software immune from reasonable scrutiny? > Who believes their community is immune from the release of defects? > Maybe we should all ask ourselves how long it took for defects to be > found after the release of our last piece of wonderware, whatever it > was. > > As far a Google or genome software - how do you know you haven't > missed something in ALL the web pages of the world or in a huge > database being searched heuristicly. Are you saying defect free - > never misses anything - perfect? Or just darn good - if so, how good, > and how is that determined? Is that aspect of quality openly measured, > quantified, and available or is it just brushed under the rug. > > I do not claim to be flawless. This is *-->>NOT<--* a screed against > Osborne/Stroustrop or anyone else who takes pride in a job well done. > I am simply not convinced that the software community is as defect > free as they claim. 
Sorry to be 'aggressive' (if this be such) but I > am completely sick of defective software propelled crap. > > Steve > > > ----- Original Message ----- > From: Brian Osborne > Date: Saturday, January 13, 2007 1:06 pm > Subject: [Bioperl-l] Quote > >> Technology Review January/February 2007 ?Bjarne Stroustrup The >> Problem with >> Programming? >> >> TR: Why is most software so bad? >> >> Bjarne Stroustrup: Some software is actually pretty good, by any >> standard.Think of the Mars Rover, Google, and the Human Genome >> Project. Now, that?s >> quality software! Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From slenk at emich.edu Sun Jan 14 11:30:11 2007 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Sun, 14 Jan 2007 11:30:11 -0500 Subject: [Bioperl-l] Quote Message-ID: <1c91e091c93aa4.1c93aa41c91e09@emich.edu> The open-source community is OK - It's 'professional' developers who can't build reliable software, but who refuse to learn new skills or even respond to defect reports, that fry me. EE's like to call non-EEs "w******" or "pc-programmers" at the same time that they release code for test that blatently does not work. I suppose I was actually off topic - sorry. Off topic in the sense of non-open source - quality and reliability are, however, on topic in a software tools site. ----- Original Message ----- From: Chris Fields Date: Saturday, January 13, 2007 8:00 pm Subject: Re: [Bioperl-l] Quote > Not to wax philosophical on a very off-topic issue (but it is the > weekend)... > > I would also argue that software itself isn't to blame. Software, > > particularly open-source software, is only as good as the people > involved with its development, either the actual developers or > users > who contribute back in some way (filing bugs, making suggestions, > etc). People aren't perfect, so why expect software to be? 
To > make > a completely lame analogy : if you sat on a faulty chair which > collapsed, would you blame the chair or the carpenter? > > As for making sure nothing is missed or is defect-free, how can > one > prove a negative? Of course something will be missed, or a defect > > eventually found. People are still finding new and exciting > things > (riboswitches, epigenetic regulatory mechanisms, etc) years after > genomes have been completed and released. A huge number of > predicted > proteins have no known or characterized function. Security holes > have been consistently found (and patched and sometimes repatched) > in > some of the best OS's out there. Something changes beyond the > control of a developer (a sequence format, or a server change) > thus > causing a bug in the software. > > Whatever you do, don't confuse success with perfection. Success > at > least is attainable; perfection, meh, not so much. > > Sorry you had frustrations with something along the way. > > chris > > On Jan 13, 2007, at 3:24 PM, Stephen Gordon Lenk wrote: > > > Hi, > > > > I'm a bit puzzled about what "actually pretty good" means. My > > understanding about Mars Rover is: > > > > "The current Mars rovers may rely on proven computer technology, but > > for Spirit the journey has not been glitch-free. > > > > After a promising start to its mission, the Spirit rover -- the > first> of the MER twins to land on Mars -- stopped sending proper > data to JPL > > scientists 18 days into the mission and later baffled ground > > controllers by rebooting itself over and over again. Since then, > > mission controllers were able to regain reliable communications with > > the rover and continue to study what may have caused the > malfunction."> > [http://www.space.com/businesstechnology/technology/ > > mer_computer_040128 > > .html] > > > > Most industries know that "rebooting itself over and over again" is > > not desirable. Evidently Mars is out of sight - out of mind. 
> Software> developers have excessive pride in deliverables that are > all too often > > grossly dysfunctional. If a product released to the market failed > > by "rebooting itself over and over again," there would be no end of > > deserved criticism. Why is software immune from reasonable scrutiny? > > Who believes their community is immune from the release of defects? > > Maybe we should all ask ourselves how long it took for defects > to be > > found after the release of our last piece of wonderware, > whatever it > > was. > > > > As far a Google or genome software - how do you know you haven't > > missed something in ALL the web pages of the world or in a huge > > database being searched heuristicly. Are you saying defect free - > > never misses anything - perfect? Or just darn good - if so, how > good,> and how is that determined? Is that aspect of quality > openly measured, > > quantified, and available or is it just brushed under the rug. > > > > I do not claim to be flawless. This is *-->>NOT<--* a screed against > > Osborne/Stroustrop or anyone else who takes pride in a job well > done.> I am simply not convinced that the software community is as > defect> free as they claim. Sorry to be 'aggressive' (if this be > such) but I > > am completely sick of defective software propelled crap. > > > > Steve > > > > > > ----- Original Message ----- > > From: Brian Osborne > > Date: Saturday, January 13, 2007 1:06 pm > > Subject: [Bioperl-l] Quote > > > >> Technology Review January/February 2007 ?Bjarne Stroustrup The > >> Problem with > >> Programming? > >> > >> TR: Why is most software so bad? > >> > >> Bjarne Stroustrup: Some software is actually pretty good, by any > >> standard.Think of the Mars Rover, Google, and the Human Genome > >> Project. Now, that?s > >> quality software! > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

From hlapp at gmx.net Sun Jan 14 12:05:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 14 Jan 2007 12:05:19 -0500
Subject: [Bioperl-l] Error while running load_seqdatabase.pl
In-Reply-To: <776666.56811.qm@web58905.mail.re1.yahoo.com>
References: <776666.56811.qm@web58905.mail.re1.yahoo.com>
Message-ID:

Hi George,

sorry for the sluggish response, I was tied up during the week. This is also why you always want to keep the thread on the list.

Perl is an interpreted language, so no compilation is necessary. The only thing you need to do is put the package in a place where perl can find it. The simplest way to achieve this is by setting the PERL5LIB environment variable:

$ export PERL5LIB=/where/you/put/your/perl/package

or, if PERL5LIB was set already, you'd append to it:

$ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package

I do assume that you didn't really add your code to the SeqAdaptor.pm package - there is no necessity for nor benefit from that, and at worst (and quite likely) perl won't be able to find the package.

Note that there is plenty of documentation for how to write packages for perl and how to make them accessible to perl.

Hth,

-hilmar

On Jan 8, 2007, at 11:52 PM, George Heller wrote:

> Hi Hilmar.
>
> Thanks so much for the response. As I am new to Bioperl, I have
> another question.
>
> I have made the changes as suggested by you, and have added the
> code below to the SeqAdaptor.pm script.
>
> package SeqProcessor::Accession;
> use strict;
> use vars qw(@ISA);
> use Bio::Seq::BaseSeqProcessor;
> use Bio::SeqFeature::Generic;
>
> @ISA = qw(Bio::Seq::BaseSeqProcessor);
>
> sub process_seq
> {
>     my ($self, $seq) = @_;
>     $seq->accession_number($seq->display_id);
>     return ($seq);
> }
>
> Now that I have done my changes, do I need to compile or something
> for the changes to reflect?
If so, can you please let me know the > command for the same, or direct me to any lin that has > documentation for the same? > > Thanks so much for the help. > George. > > Hilmar Lapp wrote: > George, > > this is almost certainly caused by using FASTA format and bioperl's > treatment of it. I am guilty of not having written a FAQ yet for > Bioperl-db, as this would certainly be there. > > Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl > uses Bioperl to parse sequence files) does not extract the accession > number from the description line of the fasta sequence, and instead > sets the accession_number property if sequence objects it creates to > "unknown". Since there is a unique key constraint on > (accession,version,namespace) the second sequence loaded will raise > an exception as it will violate the constraint. > > The simplest way to deal with this is to write a SeqProcessor that > massages the accession_number appropriately and then supply the > module to load_seqdatabase.pl using the --pipeline command line > switch. > > There are several examples for how to do this in the email archives. > See for example this thread on the Biosql list: > > http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html > > with two links to examples, and Marc Logghe gives another one in the > thread itself. > > Hth, > > -hilmar > > On Jan 8, 2007, at 3:17 PM, George Heller wrote: > > > Hi all. > > > > I am new to Bioperl and am trying to run the load_seqdatabase.pl > > script to load sequence data from a file into Postgres database. 
I > > am invoking the script through the following command: > > > > perl load_seqdatabase.pl -host localhost -dbname biodb06 -format > > fasta > > -dbuser postgres -driver Pg > > > > I am getting the following error: > > > > -------------------- WARNING --------------------- > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > were ("FGENES > > HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > > 1","unknown" > > ,"","0","") FKs (1,) > > ERROR: duplicate key violates unique constraint > > "bioentry_accession_key" > > --------------------------------------------------- > > Could not store unknown: > > ------------- EXCEPTION ------------- > > MSG: error while executing statement in > > Bio::DB::BioSQL::SeqAdaptor::find_by_uni > > que_key: ERROR: current transaction is aborted, commands ignored > > until end of t > > ransaction block > > STACK > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ > > lib/perl > > 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > > usr/lib/perl5 > > /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > > perl5/site_perl/5 > > .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/ > > site_perl/5. > > 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > > STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ > > site_perl/5.8. > > 5/Bio/DB/Persistent/PersistentObject.pm:271 > > STACK (eval) load_seqdatabase.pl:620 > > STACK toplevel load_seqdatabase.pl:602 > > -------------------------------------- > > at load_seqdatabase.pl line 633 > > > > Can anyone tell me how I can correct this error and get my script > > running? Thanks!!! > > > > George. > > > > > > __________________________________________________ > > Do You Yahoo!? 
> > Tired of spam? Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sun Jan 14 12:17:21 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 14 Jan 2007 12:17:21 -0500
Subject: [Bioperl-l] Taxon id null with load_seqdatabase.pl
In-Reply-To: <20070111023331.61553.qmail@web58903.mail.re1.yahoo.com>
References: <20070111023331.61553.qmail@web58903.mail.re1.yahoo.com>
Message-ID: <2FDF1192-705A-4BEE-A931-18A3CFA9CC84@gmx.net>

George,

I presume you are still loading fasta-formatted files? These do not have a syntax for parsing or figuring out the species, so bioperl won't have one. If you somehow know the species/taxon, you can use that same SeqProcessor module you created to also set the species:

$seq->species(Bio::Species->new(-classification=>[qw(thaliana Arabidopsis)]));

Better yet, also add the NCBI taxon ID, which is the most effective lookup:

$seq->species(Bio::Species->new(
    -classification=>[qw(thaliana Arabidopsis)],
    -ncbi_taxid=>3702));

Hth,

-hilmar

On Jan 10, 2007, at 9:33 PM, George Heller wrote:

> Hi all.
>
> I just loaded Arabidopsis data into the bioentry and biosequence
> tables in my Postgres database using load_seqdatabase.pl. I had
> loaded the taxon data prior to this, using the
> load_ncbi_taxonomy.pl. Yet, I see the taxon_id field in the
> bioentry field is blank, after the loading.
>
> Can anyone tell me why this might be happening? And perhaps what
> needs to be done to correct this?
>
> Thanks.
> George

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lidaof at gmail.com Sun Jan 14 21:36:30 2007
From: lidaof at gmail.com (lidaof)
Date: Mon, 15 Jan 2007 10:36:30 +0800
Subject: [Bioperl-l] Annotation method help
Message-ID: <12d02c20701141836k6865025an7ce4155c68c08244@mail.gmail.com>

Hi all,

I have a lot of FASTA sequences in a file without annotation, and I ran blastx with that file against UniProt's protein database. Now I want to add each hit's description directly to the description of the matching FASTA sequence. Which Bioperl module can help me do this?

Thanks for any advice :)

Best regards!

--
Li

From letondal at pasteur.fr Mon Jan 15 17:01:18 2007
From: letondal at pasteur.fr (Catherine Letondal)
Date: Mon, 15 Jan 2007 23:01:18 +0100
Subject: [Bioperl-l] Quote
In-Reply-To:
References:
Message-ID:

On Jan 13, 2007, at 7:06 PM, Brian Osborne wrote:

> Technology Review January/February 2007 "Bjarne Stroustrup The Problem
> with Programming"
>
> TR: Why is most software so bad?

This may help to find an answer:

Why Software Sucks ... And What You Can Do About It
by David Platt
(http://www.whysoftwaresucks.com/)
http://suckbusters.com/

(...my 2 euros!)
--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

From marian.thieme at lycos.de Tue Jan 16 05:20:47 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Tue, 16 Jan 2007 10:20:47 +0000
Subject: [Bioperl-l] Bio::Root::Root/Bio::LiveSeq::Mutation
Message-ID: <109493435528517@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070116/77db2aa6/attachment.html

From cjfields at uiuc.edu Tue Jan 16 13:04:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 Jan 2007 12:04:00 -0600
Subject: [Bioperl-l] Filehandle issue
Message-ID: <14C14E65-0A8B-4D94-BAB5-7DAE18A56AEE@uiuc.edu>

All,

I have noticed an interesting problem when running tests on a bioperl bug (http://tinyurl.com/tamf8). Several Bio::AlignIO parsers, such as Bio::AlignIO::phylip, use 'return 0' instead of a bare 'return', which causes problems when using newFh() to retrieve an IO::Handle:

----------------------------
$in = Bio::AlignIO->newFh(-fh => \*STDIN, -format => 'phylip');
# $in is a GLOB
while( my $aln = <$in>) {
    print $aln->no_sequences(),"\n";
}
----------------------------

cjfields:~/tests/phylip cjfields$ more testaln.phylip | ./alignio.pl
4
Can't call method "no_sequences" without a package or object reference at ./alignio.pl line 28, line 10.

The method call works for the first loop iteration, but the while loop test then evaluates '0' as true. I think it's b/c perl treats the returned 0 like general text retrieved from a file handle (the string '0') instead of EOF (which would return undef).

I ran a quick check on this and 'return 0' is used quite a bit in other IO modules (though not all). Changing the 'return 0' to a simple 'return' fixes the problem. How common is the above while loop idiom when iterating through data via a file handle?
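
[Editorial sketch] The 'return 0' versus bare 'return' behaviour described above can be reproduced without Bioperl. The following is a minimal sketch, assuming only core Perl: DemoHandle and count_iterations are hypothetical names invented for this illustration, and the tied handle merely stands in for a parser exposed through a filehandle. It is not the Bio::AlignIO code. It relies on Perl wrapping the test of "while (my $x = <$fh>)" in an implicit defined().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser wrapped in a tied filehandle.
# It hands out the queued items, then signals "no more data" either with
# 'return 0' or with a bare 'return', depending on the return_zero flag.
package DemoHandle;

sub TIEHANDLE {
    my ($class, %args) = @_;
    my $self = {
        items       => [ @{ $args{items} } ],
        return_zero => $args{return_zero},
    };
    return bless $self, $class;
}

sub READLINE {
    my $self = shift;
    return shift @{ $self->{items} } if @{ $self->{items} };
    return 0 if $self->{return_zero};   # defined value, so the loop keeps going
    return;                             # undef in scalar context, so the loop ends
}

package main;

use Symbol qw(gensym);

# Count how often the loop body runs; cap it so the demo always terminates.
sub count_iterations {
    my (%args) = @_;
    my $fh = gensym();
    tie *$fh, 'DemoHandle', %args;

    my $count = 0;
    while ( my $item = <$fh> ) {   # implicitly: while ( defined( my $item = <$fh> ) )
        $count++;
        last if $count >= 10;      # safety cap for the 'return 0' case
    }
    return $count;
}

print "bare return: ", count_iterations( items => [ 'aln1', 'aln2' ], return_zero => 0 ), " iterations\n";
print "return 0:    ", count_iterations( items => [ 'aln1', 'aln2' ], return_zero => 1 ), " iterations\n";
```

With the bare return the loop body runs once per queued item. With 'return 0' the loop never sees an undefined value, so it only stops at the safety cap; in real code the body would meanwhile be handed a plain 0 where it expects an object.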
chris From stewarta at nmrc.navy.mil Tue Jan 16 16:11:35 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Tue, 16 Jan 2007 16:11:35 -0500 Subject: [Bioperl-l] contig disassembly Message-ID: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> If I want to take a Bio::Seq object representing a contig and disassemble it into the constituent sequences which originally lead to its formation, all the while preserving the feature annotation associated with each sub-sequence, and with the coordinates of these feature sets updated to reflect their position relative to these sub- sequences, what is the best way to go about this? If I take a subsequence of a Seq object, will it carry over the relevant features as well? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From lidaof at gmail.com Wed Jan 17 01:31:47 2007 From: lidaof at gmail.com (lidaof) Date: Wed, 17 Jan 2007 14:31:47 +0800 Subject: [Bioperl-l] [Bioperl]problem with E-value Message-ID: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> Hi,all i write a script to analyze the NCBI-blast output i use bioperl 1.5.2_100 i encountened some wrong it says: Argument "1e" isn't numeric in numeric comparison (<=>) at /usr/lib/perl5/site_perl/5.8.5/Bio/SearchIO/blast.pm line 661, i searched the mailling list and add "$E = "1" . $E if $E =~ /^e/;" but the wrong message also happens to out then i change a machine with bioperl 1.5.1 installed all thing seems ok! no wrong message output! so,how can i do some change to aviod the wrong message using the newest bioperl? Thanks! 
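
[Editorial sketch] The one-line fix quoted above prepends "1" to any E-value string that starts with "e". A minimal sketch of a slightly stricter version follows, assuming only core Perl; normalize_evalue is a hypothetical helper, not a Bioperl function. It shows why a bare "e" would become the non-numeric "1e" under the quoted fix, but it does not repair a report that the parser has misread.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of Bioperl): tidy an E-value string taken
# from a BLAST report so it can be used in numeric comparisons.
# Returns undef when the string cannot be read as a number.
sub normalize_evalue {
    my ($evalue) = @_;
    return unless defined $evalue;
    $evalue =~ s/^\s+|\s+$//g;                           # trim whitespace
    $evalue = "1$evalue" if $evalue =~ /^e[-+]?\d+$/i;   # "e-100" becomes "1e-100"
    return looks_like_number($evalue) ? $evalue : undef;
}

for my $raw ( 'e-100', '2e-05', '0.0', '1e', 'e' ) {
    my $clean = normalize_evalue($raw);
    printf "%-6s => %s\n", $raw, defined $clean ? $clean : '(not numeric)';
}
```

Requiring digits after the "e" means a truncated value is reported as unusable rather than silently turned into "1e", which is the string named in the warning.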
-- Li From heikki at sanbi.ac.za Wed Jan 17 03:38:22 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 17 Jan 2007 10:38:22 +0200 Subject: [Bioperl-l] contig disassembly In-Reply-To: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> References: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> Message-ID: <200701171038.22979.heikki@sanbi.ac.za> Andrew, The default interface to Bio::SeqI objects does not do it. The methods leave you to deal with feature table changes after sequence changes. However, there are some attempts to provide this kind functionality in the Bio::SeqUtils class. Bio::SeqUtils::cat Bio::SeqUtils::revcom_with_features Bio::SeqUtils::trunc_with_features These could be expanded and made more complete, maybe even a class of its own if there is enough interest? -Heikki On Tuesday 16 January 2007 23:11, Andrew Stewart wrote: > If I want to take a Bio::Seq object representing a contig and > disassemble it into the constituent sequences which originally lead > to its formation, all the while preserving the feature annotation > associated with each sub-sequence, and with the coordinates of these > feature sets updated to reflect their position relative to these sub- > sequences, what is the best way to go about this? If I take a > subsequence of a Seq object, will it carry over the relevant features > as well? 
>
>
> --
> Andrew Stewart
> Research Assistant, Genomics Team
> Navy Medical Research Center (NMRC)
> Biological Defense Research Directorate (BDRD)
> BDRD Annex
> 12300 Washington Avenue, 2nd Floor
> Rockville, MD 20852
>
> email: stewarta at nmrc.navy.mil
> phone: 301-231-6700 Ext 270

--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Associate Professor   skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From heikki at sanbi.ac.za Wed Jan 17 03:26:58 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 17 Jan 2007 10:26:58 +0200
Subject: [Bioperl-l] Bio::Root::Root/Bio::LiveSeq::Mutation
In-Reply-To: <109493435528517@lycos-europe.com>
References: <109493435528517@lycos-europe.com>
Message-ID: <200701171026.59651.heikki@sanbi.ac.za>

Marian,

I do not think preventing the error message here is a good thing. The underlying assumption in all sequence classes is that one residue is represented by exactly one character. If you replace one valid IUPAC character with a longish string, e.g. '[a/g]', you break all the methods that work on sequences. See: Bio::Tools::IUPAC

Better alternatives are:

1. If you are sure you are not using ambiguous characters anywhere else in your sequence, you could have a sequence class that treats any ambiguity codes as polymorphisms and a SeqIO class that does the output formatting: r => a/g.

2. Use sequence features like they do in EMBL/GenBank/DDBJ feature tables to annotate mutations in the reference sequence.

3. Use Bio::Variation::SeqDiff to hold your reference sequence and annotate polymorphisms as Bio::Variation::DNAMutation objects that can in turn hold multiple Bio::Variation::Allele objects.

I am sure there are other solutions, too. It all depends on what you need to do with the information.

-Heikki

On Tuesday 16 January 2007 12:20, marian thieme wrote:
> Hi, as I told this list some time ago, I want to output heterozygous dna
> sequences of different individuals. We need to output variations in the
> following manner:
> [a/g] if there is a locus where one allele has an "a" and the other has a
> "g". (Also known as BIC db format or something like this) My approach is to
> use the Bio::LiveSeq::Mutation (class ?) to change the specific position in
> the sequence.
>
> Bio::SeqUtils->mutate($seqobj, Bio::LiveSeq::Mutation->new(
>     -seq => "[a/g]",
>     -seqori => $seqori,
>     -pos => $pos,
>     -len => $length));
>
> But unfortunately this would raise an exception, that some unexpected chars
> occur. Hence I went into the code of Root.pm and made a small change:
> commenting out line 359 in Root.pm :
>
> if( $ERRORLOADED ) {
> #       print STDERR "  Calling Error::throw\n\n";
>
>        # Enable re-throwing of Error objects.
>        # If the error is not derived from Bio::Root::Exception,
>        # we can't guarantee that the Error's value was set properly
>        # and, ipso facto, that it will be catchable from an eval{}.
>        # But chances are, if you're re-throwing non-Bio::Root::Exceptions,
>        # you're probably using Error::try(), not eval{}.
>        # TODO: Fix the MSG: line of the re-thrown error. Has an extra line
>        # containing the '----- EXCEPTION -----' banner.
>        if( ref($args[0])) {
>            if( $args[0]->isa('Error')) {
>                my $class = ref $args[0];
>                $class->throw( @args );
>            } else {
>                my $text .= "\nWARNING: Attempt to throw a non-Error.pm object: " . ref$args[0];
>                my $class = "Bio::Root::Exception";
>                $class->throw( '-text' => $text, '-value' => $args[0] );
>            }
>        } else {
>            $class ||= "Bio::Root::Exception";
>
>            my %args;
>            if( @args % 2 == 0 && $args[0] =~ /^-/ ) {
>                %args = @args;
>                $args{-text} = $text;
>                $args{-object} = $self;
>            }
>
> (Line 359:)   #$class->throw( scalar keys %args > 0 ? %args : @args ); # (%args || @args) puts %args in scalar context!
>        }
>    }
>
> After I altered this line all is working fine. But I know that this can
> be considered, in the best case, a workaround.
>
> 2 Questions:
>
> Do you think it is worth providing a class that can natively cope with
> this? Should I expect unwanted behavior from some scripts or classes?
>
> Regards,
> Marian

--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Associate Professor   skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From bix at sendu.me.uk Wed Jan 17 05:47:06 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jan 2007 10:47:06 +0000
Subject: [Bioperl-l] [Bioperl]problem with E-value
In-Reply-To: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com>
References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com>
Message-ID: <45ADFEAA.2030808@sendu.me.uk>

lidaof wrote:
> Hi,all
>
> i write a script to analyze the NCBI-blast output
> i use bioperl 1.5.2_100
> i encountened some wrong
> it says:
> Argument "1e" isn't numeric in numeric comparison
(<=>) at > /usr/lib/perl5/site_perl/5.8.5/Bio/SearchIO/blast.pm line 661, Can you send me a blast output that causes this problem please? Cheers, Sendu. From bix at sendu.me.uk Wed Jan 17 09:17:42 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 17 Jan 2007 14:17:42 +0000 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> Message-ID: <45AE3006.1020004@sendu.me.uk> lidaof wrote: > Hi Sendu, > > Thanks for your reply! > the blast result and my .pl file were attached with this mail as a > single compressed file(the first attachment) > and on the machine installed bioperl 1.5.2_100 > the exactly wrong message like" > Argument "1e" isn't numeric in numeric comparison (<=>) at > /usr/lib/perl5/site_perl > /5.8.5/Bio/SearchIO/blast.pm line 661, line 50. > Argument "1e" isn't numeric in numeric comparison (<=>) at > /usr/lib/perl5/site_perl > /5.8.5/Bio/SearchIO/blast.pm line 661, line 78. > " > and on the machine installed bioperl 1.5.1,no wrong message > but the output result is Different!!! > and the different result are attached as the second attachment Thanks for those. Can you describe in detail exactly how you generated the blast report? The problem is that Bioperl is completely mis-parsing the results. No version of Bioperl is able to handle your blast report, because the blast parser seems to expect there to be alignments for those 'Sequences producing significant alignments'. Until the parser is fixed (assuming your blast report is valid), let me make it clear: do NOT use Bioperl to parse your blast report - the results are TOTALLY WRONG. 
From cuiw at ncbi.nlm.nih.gov Wed Jan 17 09:29:17 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Wed, 17 Jan 2007 09:29:17 -0500 Subject: [Bioperl-l] contig disassembly In-Reply-To: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> References: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB5AC@NIHCESMLBX8.nih.gov> Hi, Andrew, I am not sure you are aware 0f NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=9606&gnl=NT_079540.1& maps=comp,genes&cmd=txt Here is an example to retrieve components and genes from NT_079540.1 of current Hs. genome build 36.2. Wenwu Cui, PhD National Center for Biotechnology Information National Institutes of Health Bethesda, MD 20892 -----Original Message----- From: Andrew Stewart [mailto:stewarta at nmrc.navy.mil] Sent: Tuesday, January 16, 2007 4:12 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] contig disassembly If I want to take a Bio::Seq object representing a contig and disassemble it into the constituent sequences which originally lead to its formation, all the while preserving the feature annotation associated with each sub-sequence, and with the coordinates of these feature sets updated to reflect their position relative to these sub- sequences, what is the best way to go about this? If I take a subsequence of a Seq object, will it carry over the relevant features as well? 
-- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From ramona.schmid at freenet.de Wed Jan 17 09:33:06 2007 From: ramona.schmid at freenet.de (magnusgeist) Date: Wed, 17 Jan 2007 06:33:06 -0800 (PST) Subject: [Bioperl-l] error reading psi 2.5 file from intact using bioperl-network-1.5.2_100 Message-ID: <8411196.post@talk.nabble.com> dear all, trying to read files in psi 2.5 format from intact like this: my $io = Bio::Network::IO->new(-format => 'psi', -source => 'intact', -file => 'human_small-07.xml'); my $graph = $io->next_network; returns the following error: Can't call method "att" on an undefined value at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. doing the same with files from dip: my $io = Bio::Network::IO->new( -format => 'psi', -file => 'Hsapi20070107.mif'); my $graph = $io->next_network; does not result in any problems. would be great if one of you could help! thank you very much in advance. magnusgeist -- View this message in context: http://www.nabble.com/error-reading-psi-2.5-file-from-intact-using-bioperl-network-1.5.2_100-tf3027578.html#a8411196 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From lidaof at gmail.com Wed Jan 17 08:02:09 2007 From: lidaof at gmail.com (lidaof) Date: Wed, 17 Jan 2007 21:02:09 +0800 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: <45ADFEAA.2030808@sendu.me.uk> References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> Message-ID: <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> Hi Sendu, Thanks for your reply! 
the blast result and my .pl file were attached with this mail as a single compressed file(the first attachment) and on the machine installed bioperl 1.5.2_100 the exactly wrong message like" Argument "1e" isn't numeric in numeric comparison (<=>) at /usr/lib/perl5/site_perl /5.8.5/Bio/SearchIO/blast.pm line 661, line 50. Argument "1e" isn't numeric in numeric comparison (<=>) at /usr/lib/perl5/site_perl /5.8.5/Bio/SearchIO/blast.pm line 661, line 78. " and on the machine installed bioperl 1.5.1,no wrong message but the output result is Different!!! and the different result are attached as the second attachment the .pl file is the program i wrote and i add some comments for debugging..so maybe make you read it a little difficultly Thanks for your attention! Best Regards! On 1/17/07, Sendu Bala wrote: > > lidaof wrote: > > Hi,all > > > > i write a script to analyze the NCBI-blast output > > i use bioperl 1.5.2_100 > > i encountened some wrong > > it says: > > Argument "1e" isn't numeric in numeric comparison (<=>) at > > /usr/lib/perl5/site_perl/5.8.5/Bio/SearchIO/blast.pm line 661, > > Can you send me a blast output that causes this problem please? > > > Cheers, > Sendu. > -- Li -------------- next part -------------- A non-text attachment was scrubbed... Name: pleaseCheck.tgz Type: application/x-gzip Size: 25164 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070117/a9a0d93d/attachment.tgz -------------- next part -------------- A non-text attachment was scrubbed... Name: the_different_result.tgz Type: application/x-gzip Size: 315 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070117/a9a0d93d/attachment-0001.tgz From Kevin.M.Brown at asu.edu Wed Jan 17 11:16:48 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 17 Jan 2007 09:16:48 -0700 Subject: [Bioperl-l] Alignment->slice() issue? 
Message-ID: <1A4207F8295607498283FE9E93B775B4028B39C3@EX02.asurite.ad.asu.edu> Bioperl: 1.5.2_100 Perl: perl -v This is perl, v5.8.5 built for i386-linux-thread-multi I'm hoping this is just me, but I've created a huge alignment of a set of primers on a chromosome and then I'm trying to slice up that one large alignment into smaller alignments based around the CDS features of the chromosome (taken from a Genbank file that the script read in previously that gives me both the features and the chromosome sequence). The error occurs when I request the slice. I get the following: ------------- EXCEPTION ------------- MSG: Bad start,end parameters. Start [1088] has to be less than end [850] STACK Bio::PrimarySeq::subseq /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:354 STACK Bio::SimpleAlign::slice /usr/lib/perl5/site_perl/5.8.5/Bio/SimpleAlign.pm:929 STACK toplevel ./PrimerAnalysis.pl:376 -------------------------------------- But, based on output I've put into my script that isn't the range I requested from the alignment. What I've requested is $align = $alignments{$key}->slice($start, $stop); $start is 1088 and $stop is 2377 (from the printout below) "Forward strand with start(1088) and stop(2377) at ./PrimerAnalysis.pl line 358, <$primer> line 657." The feature I'm initally after is BMAA0001 start:1139 stop:2326 with some upstream and downstream sequence. 
I noticed that slice does "foreach my $seq ( $self->each_seq() )", so I copied that to printout all the sequences held by the alignment and their start and stop locations and get the following: NC_006349 1 2325379 BurkM_0005_a-f..BurkM_0005_a-r 80686 81516 BurkM_0005_a-f..BurkM_0005_a-r 268747 269577 BurkM_0005_a-f..BurkM_0005_a-r 329852 330682 BurkM_0005_a-f..BurkM_0005_a-r 560818 561648 BurkM_0005_a-f..BurkM_0005_a-r 592443 593273 BurkM_0005_a-f..BurkM_0005_a-r 908245 909075 BurkM_0005_a-f..BurkM_0005_a-r 935390 936220 BurkM_0005_a-f..BurkM_0005_a-r 1014714 1015544 BurkM_0005_a-f..BurkM_0005_a-r 1034315 1035145 BurkM_0005_a-f..BurkM_0005_a-r 1225934 1226764 BurkM_0005_a-f..BurkM_0005_a-r 1324779 1325609 BurkM_0005_a-f..BurkM_0005_a-r 1413075 1413905 BurkM_0005_a-f..BurkM_0005_a-r 1480717 1481547 BurkM_0005_a-f..BurkM_0005_a-r 1517965 1518795 BurkM_0005_a-f..BurkM_0005_a-r 1900786 1901616 BurkM_0005_a-f..BurkM_0005_a-r 1921906 1922736 BurkM_0005_a-f..BurkM_0005_a-r 1957979 1958809 BurkM_0005_a-f..BurkM_0005_a-r 2136301 2137131 BurkM_0005_a-r..BurkM_0005_a-f 103238 104068 BurkM_0005_a-r..BurkM_0005_a-f 170641 171471 BurkM_0005_a-r..BurkM_0005_a-f 408755 409585 BurkM_0005_a-r..BurkM_0005_a-f 432906 433736 BurkM_0005_a-r..BurkM_0005_a-f 509458 510288 BurkM_0005_a-r..BurkM_0005_a-f 565194 566024 BurkM_0005_a-r..BurkM_0005_a-f 656754 657584 BurkM_0005_a-r..BurkM_0005_a-f 733927 734757 BurkM_0005_a-r..BurkM_0005_a-f 838705 839535 BurkM_0005_a-r..BurkM_0005_a-f 869777 870607 BurkM_0005_a-r..BurkM_0005_a-f 892021 892851 BurkM_0005_a-r..BurkM_0005_a-f 909903 910733 BurkM_0005_a-r..BurkM_0005_a-f 1061801 1062631 BurkM_0005_a-r..BurkM_0005_a-f 1096777 1097607 BurkM_0005_a-r..BurkM_0005_a-f 1636356 1637186 BurkM_0005_a-r..BurkM_0005_a-f 1636356 1643935 BurkM_0005_a-r..BurkM_0005_a-f 1643105 1643935 BurkM_0005_a-r..BurkM_0005_a-f 1790703 1791533 BurkM_0005_a-r..BurkM_0005_a-f 2267109 2267939 BurkM_0005_a-f..BurkM_0005_a-f 560818 566024 BurkM_0005_a-f..BurkM_0005_a-f 908245 
910733 BMA_0006_a-r..BMA_0006_a-r 561646 565196 BMA_0006_a-r..BMA_0006_a-r 909073 909905 BMA_0046_a-f..BMA_0046_a-r 437921 438661 BurkM_0092_a-f..BurkM_0092_a-f 561670 565172 BurkM_0092_a-f..BurkM_0092_a-f 909097 909881 BMA_0113_a-f..BMA_0113_a-r 1310782 1311536 BMA_0113_a-r..BMA_0113_a-f 172284 173038 BMA_0113_a-r..BMA_0113_a-f 2266197 2266951 BMA_0146_a-f..BMA_0146_a-r 1172194 1173065 BMA_0146_a-f..BMA_0146_a-r 2267012 2269123 BMA_0146_a-r..BMA_0146_a-f 167410 168281 BMA_0146_a-r..BMA_0146_a-f 320180 321051 BMA_0146_a-r..BMA_0146_a-f 894226 895097 BMA_0146_a-r..BMA_0146_a-f 894226 900207 BMA_0146_a-r..BMA_0146_a-f 899335 900207 BMA_0146_a-r..BMA_0146_a-f 1638747 1639622 BMA_0146_a-r..BMA_0146_a-f 1972415 1973286 BMA_0146_a-r..BMA_0146_a-f 2157899 2158770 BMA_0146_a-r..BMA_0146_a-f 2321169 2322040 So, I can see that all the sequences held in the alignment have Start < Stop as expected. What I can't figure is where the end value is coming from that is messing this up. Any help is greatly appreciated. From cjfields at uiuc.edu Wed Jan 17 11:22:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 17 Jan 2007 10:22:36 -0600 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: <45AE3006.1020004@sendu.me.uk> References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> <45AE3006.1020004@sendu.me.uk> Message-ID: Is this a supported BLAST report (i.e. listed in Bio::SearchIO::blast POD)? If so, could someone please file this as a bug along with an example report? Or has this been fixed already? chris On Jan 17, 2007, at 8:17 AM, Sendu Bala wrote: > lidaof wrote: >> Hi Sendu, >> >> Thanks for your reply! 
>> the blast result and my .pl file were attached with this mail as a >> single compressed file(the first attachment) >> and on the machine installed bioperl 1.5.2_100 >> the exactly wrong message like" >> Argument "1e" isn't numeric in numeric comparison (<=>) at >> /usr/lib/perl5/site_perl >> /5.8.5/Bio/SearchIO/blast.pm line 661, line 50. >> Argument "1e" isn't numeric in numeric comparison (<=>) at >> /usr/lib/perl5/site_perl >> /5.8.5/Bio/SearchIO/blast.pm line 661, line 78. >> " >> and on the machine installed bioperl 1.5.1,no wrong message >> but the output result is Different!!! >> and the different result are attached as the second attachment > > Thanks for those. Can you describe in detail exactly how you generated > the blast report? > > The problem is that Bioperl is completely mis-parsing the results. No > version of Bioperl is able to handle your blast report, because the > blast parser seems to expect there to be alignments for those > 'Sequences > producing significant alignments'. > > Until the parser is fixed (assuming your blast report is valid), > let me > make it clear: do NOT use Bioperl to parse your blast report - the > results are TOTALLY WRONG. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Wed Jan 17 11:29:47 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 17 Jan 2007 16:29:47 +0000 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> <45AE3006.1020004@sendu.me.uk> Message-ID: <45AE4EFB.4090909@sendu.me.uk> Chris Fields wrote: > Is this a supported BLAST report (i.e. 
listed in Bio::SearchIO::blast > POD)? If so, could someone please file this as a bug along with an > example report? Or has this been fixed already? No, it hasn't been fixed already. (I made a recent commit that just got rid of the annoying error message.) Waiting to find out how the blast report was created before taking any further action. It claims to be: BLASTX 2.2.15 [Oct-15-2006] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. From jason at bioperl.org Wed Jan 17 11:40:45 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 17 Jan 2007 08:40:45 -0800 Subject: [Bioperl-l] Bio::Root::Root/Bio::LiveSeq::Mutation In-Reply-To: <109493435528517@lycos-europe.com> References: <109493435528517@lycos-europe.com> Message-ID: <9D01DE92-4A0F-43D3-8859-B53057C1084B@bioperl.org> I think you are ignoring the fact that errors are thrown for a reason, not just to annoy you. Why not store the data in Bio::Seq objects as IUPAC ambiguity codes and write a special writer class in Bio::SeqIO which converts the ambiguity codes to your specified encoding. There are examples of how to write your own Bio::SeqIO class in the HOWTO tutorials when we talk about extending the toolkit. There is also all the code to decompose an ambiguity code into the bases it represents. -jason On Jan 16, 2007, at 2:20 AM, marian thieme wrote: > Hi, as I told to this list some time ago, I want to ouput > heterozygous dna sequences of different individuals. > We need to output variations in the following manner: > [a/g] if there is a loci where one allele has an "a" and the other > has a "g". (Also known as BIC db format or something like this) > My approach is to use the Bio::LiveSeq::Mutation (class ?) to > change the specific position in the sequence. 
>
>
> Bio::SeqUtils->mutate($seqobj, Bio::LiveSeq::Mutation->new(
>     -seq => "[a/g]",
>     -seqori => $seqori,
>     -pos => $pos,
>     -len => $length));
>
> But unfortunately this raises an exception saying that some unexpected
> characters occur. Hence I went into the code of Root.pm and made a
> small change: commenting out line 359 in Root.pm:
>
> if( $ERRORLOADED ) {
>     # print STDERR " Calling Error::throw\n\n";
>
>     # Enable re-throwing of Error objects.
>     # If the error is not derived from Bio::Root::Exception,
>     # we can't guarantee that the Error's value was set properly
>     # and, ipso facto, that it will be catchable from an eval{}.
>     # But chances are, if you're re-throwing non-Bio::Root::Exceptions,
>     # you're probably using Error::try(), not eval{}.
>     # TODO: Fix the MSG: line of the re-thrown error. Has an extra line
>     # containing the '----- EXCEPTION -----' banner.
>     if( ref($args[0])) {
>         if( $args[0]->isa('Error')) {
>             my $class = ref $args[0];
>             $class->throw( @args );
>         } else {
>             my $text .= "\nWARNING: Attempt to throw a non-Error.pm object: " . ref$args[0];
>             my $class = "Bio::Root::Exception";
>             $class->throw( '-text' => $text, '-value' => $args[0] );
>         }
>     } else {
>         $class ||= "Bio::Root::Exception";
>
>         my %args;
>         if( @args % 2 == 0 && $args[0] =~ /^-/ ) {
>             %args = @args;
>             $args{-text} = $text;
>             $args{-object} = $self;
>         }
>
> (Line 359:) #$class->throw( scalar keys %args > 0 ? %args : @args ); # (%args || @args) puts %args in scalar context!
>     }
> }
>
>
> After I altered this line everything works fine. But I know that
> this can be considered, in the best case, a workaround.
>
> 2 Questions:
>
> Do you think it is worth providing some classes which are natively
> able to cope with this matter?
> Do I need to expect unwanted behavior from some scripts or
> classes?
> > Regards, > Marian > > > > > > > > > _________________________________ > Stelle Deine Fragen bei Lycos iQ http://iq.lycos.de/qa/ask/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From cjfields at uiuc.edu Wed Jan 17 11:48:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 17 Jan 2007 10:48:03 -0600 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: <45AE4EFB.4090909@sendu.me.uk> References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> <45AE3006.1020004@sendu.me.uk> <45AE4EFB.4090909@sendu.me.uk> Message-ID: I pretty sure I know exactly what the problem is and how to fix it (if you haven't done it already). Looks like the parser trashes the rest of the BLAST results data since it's not catching the next BLAST report header (and not breaking out of the while() loop). That may be what is triggering the e-value error. I have been parsing new BLAST reports recently w/o that one popping up, but it may be a difference between the web BLAST report and the executable (wouldn't be the first time that has happened. Did you want me to take a look? chris On Jan 17, 2007, at 10:29 AM, Sendu Bala wrote: > Chris Fields wrote: >> Is this a supported BLAST report (i.e. listed in >> Bio::SearchIO::blast POD)? If so, could someone please file this >> as a bug along with an example report? Or has this been fixed >> already? > > No, it hasn't been fixed already. (I made a recent commit that just > got rid of the annoying error message.) Waiting to find out how the > blast report was created before taking any further action. 
It > claims to be: > > BLASTX 2.2.15 [Oct-15-2006] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database > search > programs", Nucleic Acids Res. 25:3389-3402. > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mikel.eganaaranguren at cs.man.ac.uk Wed Jan 17 11:37:35 2007 From: mikel.eganaaranguren at cs.man.ac.uk (=?ISO-8859-1?Q?Mikel_Ega=F1a_Aranguren?=) Date: Wed, 17 Jan 2007 16:37:35 +0000 Subject: [Bioperl-l] error reading psi 2.5 file from intact using bioperl-network-1.5.2_100 In-Reply-To: <8411196.post@talk.nabble.com> References: <8411196.post@talk.nabble.com> Message-ID: <45AE50CF.40805@cs.man.ac.uk> Hello everyone; I get exactly the same error when parsing the intact file from ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/human_small-01.xml and I was about to send an email; help would be much appreciated. thanks a lot Mikel magnusgeist(e)k dio: > dear all, > > trying to read files in psi 2.5 format from intact like this: > > my $io = Bio::Network::IO->new(-format => 'psi', > -source => 'intact', > -file => > 'human_small-07.xml'); > > my $graph = $io->next_network; > > returns the following error: Can't call method "att" on an undefined value > at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. > > doing the same with files from dip: > > my $io = Bio::Network::IO->new( -format => 'psi', > -file => 'Hsapi20070107.mif'); > > my $graph = $io->next_network; > > does not result in any problems. > > would be great if one of you could help! > thank you very much in advance. 
> magnusgeist
>

--
Mikel Egaña Aranguren - http://www.mikeleganaaranguren.com
PhD student - Manchester University Computer Science
Cell Cycle Ontology http://www.cellcycleontology.org
Gene Ontology Next Generation http://www.gong.manchester.ac.uk
Metabolik BioHacklab http://www.sindominio.net/metabolik/weblog
X-Evian http://x-evian.org/

From bix at sendu.me.uk Wed Jan 17 11:54:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jan 2007 16:54:27 +0000
Subject: [Bioperl-l] [Bioperl]problem with E-value
In-Reply-To:
References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> <45AE3006.1020004@sendu.me.uk> <45AE4EFB.4090909@sendu.me.uk>
Message-ID: <45AE54C3.9030201@sendu.me.uk>

Chris Fields wrote:
> I pretty sure I know exactly what the problem is and how to fix it (if
> you haven't done it already). Looks like the parser trashes the rest of
> the BLAST results data since it's not catching the next BLAST report
> header (and not breaking out of the while() loop). That may be what is
> triggering the e-value error. I have been parsing new BLAST reports
> recently w/o that one popping up, but it may be a difference between the
> web BLAST report and the executable (wouldn't be the first time that has
> happened.
>
> Did you want me to take a look?

Sure. In case you didn't notice, the reason it isn't catching the next BLAST report is the lack of alignments. For the few results that do have alignments, that's where it 'works'.
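To check whether a given file is affected before handing it to the parser, something along these lines might do. It is a rough sketch under stated assumptions: each report starts with a program header such as "BLASTX 2.2.15", the hit table is introduced by the "Sequences producing significant alignments" line, and alignment sections start with a ">" line. `reports_missing_alignments` is a hypothetical name, not a Bioperl function.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not part of Bioperl): return the 1-based
# numbers of the reports that list hits in the table but contain no
# '>' alignment section at all.
sub reports_missing_alignments {
    my ($fh) = @_;
    my (@flagged, $n, $has_table, $has_aln);
    my $close = sub {
        push @flagged, $n if $n && $has_table && !$has_aln;
    };
    while (my $line = <$fh>) {
        if ($line =~ /^T?BLAST[NPX]\s+\d/) {    # assumed report header
            $close->();                         # finish the previous report
            $n++;
            ($has_table, $has_aln) = (0, 0);
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            $has_table = 1;
        }
        elsif ($line =~ /^>/) {                 # start of an alignment
            $has_aln = 1;
        }
    }
    $close->();                                 # finish the last report
    return @flagged;
}

# Demonstration on a tiny made-up two-report file held in memory.
my $demo = join "\n",
    'BLASTX 2.2.15 [Oct-15-2006]',
    'Sequences producing significant alignments:',
    'hitA   50   1e-10',
    'BLASTX 2.2.15 [Oct-15-2006]',
    'Sequences producing significant alignments:',
    'hitB   60   e-100',
    '>hitB',
    '';
open my $demo_fh, '<', \$demo or die "cannot open in-memory file: $!";
print "report $_: hit table but no alignments\n"
    for reports_missing_alignments($demo_fh);
close $demo_fh;
```

Reports flagged this way are the ones where the table entries have nothing to attach to, which matches the case described above.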
From cjfields at uiuc.edu Wed Jan 17 12:03:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 17 Jan 2007 11:03:56 -0600 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: <45AE54C3.9030201@sendu.me.uk> References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> <45AE3006.1020004@sendu.me.uk> <45AE4EFB.4090909@sendu.me.uk> <45AE54C3.9030201@sendu.me.uk> Message-ID: On Jan 17, 2007, at 10:54 AM, Sendu Bala wrote: > Chris Fields wrote: >> I pretty sure I know exactly what the problem is and how to fix it >> (if you haven't done it already). Looks like the parser trashes >> the rest of the BLAST results data since it's not catching the >> next BLAST report header (and not breaking out of the while() >> loop). That may be what is triggering the e-value error. I have >> been parsing new BLAST reports recently w/o that one popping up, >> but it may be a difference between the web BLAST report and the >> executable (wouldn't be the first time that has happened. >> Did you want me to take a look? > > Sure. In case you didn't notice, the reason it isn't catching the > next BLAST report is the lack of alignments. For the few results > that do have alignments, that's where it 'works'. I plan on generating a new BLAST report (from the web, since I don't have 2.2.15 installed) with multiple queries and no alignments to see what happens (i.e. see if the new multiquery report is similar to this one). If so, there is a 'wrap-up' section of the parser, where hit data not in alignments (in the table only) is used to generate Hit objects; it's dropping this data likely b/c there is no regex signalling the next result event. 
chris From bosborne11 at verizon.net Wed Jan 17 13:12:41 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 17 Jan 2007 13:12:41 -0500 Subject: [Bioperl-l] error reading psi 2.5 file from intact using bioperl-network-1.5.2_100 In-Reply-To: <8411196.post@talk.nabble.com> Message-ID: Ramona, The current tests read bovin_small_intact.xml and sv40_small.xml successfully, both IntAct PSI files, but I don't recall ever testing human_small-07.xml. Can you send me this file? Don't send it to bioperl-l though. Brian O. On 1/17/07 9:33 AM, "magnusgeist" wrote: > > dear all, > > trying to read files in psi 2.5 format from intact like this: > > my $io = Bio::Network::IO->new(-format => 'psi', > -source => 'intact', > -file => > 'human_small-07.xml'); > > my $graph = $io->next_network; > > returns the following error: Can't call method "att" on an undefined value > at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. > > doing the same with files from dip: > > my $io = Bio::Network::IO->new( -format => 'psi', > -file => 'Hsapi20070107.mif'); > > my $graph = $io->next_network; > > does not result in any problems. > > would be great if one of you could help! > thank you very much in advance. > magnusgeist From cjfields at uiuc.edu Wed Jan 17 13:17:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 17 Jan 2007 12:17:19 -0600 Subject: [Bioperl-l] [Bioperl]problem with E-value In-Reply-To: References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com> <45ADFEAA.2030808@sendu.me.uk> <12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com> <45AE3006.1020004@sendu.me.uk> <45AE4EFB.4090909@sendu.me.uk> <45AE54C3.9030201@sendu.me.uk> Message-ID: <413EAEEC-1B81-4154-8ACD-A5D73F9F5D88@uiuc.edu> Li, Sendu, Following up on this, I generated two new multiquery BLAST reports (BLASTP and BLASTX) from the NCBI BLAST server. I can't reproduce output similar to Li's example (where the BLASTX header line appears over and over). 
It is entirely possible that a local v. 2.2.15 BLAST installation would generate different output.

Li, was the example file you sent generated from single-query BLAST runs where the output was appended to the same file, or was it generated from a single BLAST run where the queries were all in one file? There is a significant difference between the two.

Regardless, I am able to reproduce a bug where hits aren't created properly from BLAST hit tables (which should be easy to fix). I'll file a bug report and work on this.

As a general note, if you are only interested in hits and not alignments, you probably shouldn't use the regular BLAST text output. Try using -m8 or -m9 BLAST tabular output and 'blasttable' parsing instead of 'blast', and maybe switch event handlers (all in the SearchIO HOWTO).

chris

On Jan 17, 2007, at 11:03 AM, Chris Fields wrote ...

> ...
> I plan on generating a new BLAST report (from the web, since I don't
> have 2.2.15 installed) with multiple queries and no alignments to see
> what happens (i.e. see if the new multiquery report is similar to
> this one). If so, there is a 'wrap-up' section of the parser, where
> hit data not in alignments (in the table only) is used to generate
> Hit objects; it's dropping this data likely b/c there is no regex
> signalling the next result event.
>
> chris

From bosborne11 at verizon.net Wed Jan 17 13:14:02 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 17 Jan 2007 13:14:02 -0500
Subject: [Bioperl-l] error reading psi 2.5 file from intact using bioperl-network-1.5.2_100
In-Reply-To: <45AE50CF.40805@cs.man.ac.uk>
Message-ID:

Mikel,

Please send me human_small-01.xml, I'll take a look.

Brian O.

On 1/17/07 11:37 AM, "Mikel Egaña Aranguren" wrote:

> Hello everyone;
>
> I get exactly the same error when parsing the intact file from
> ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/human_small-01.xml
>
> and I was about to send an email; help would be much appreciated.
> > thanks a lot > > Mikel > > > magnusgeist(e)k dio: >> dear all, >> >> trying to read files in psi 2.5 format from intact like this: >> >> my $io = Bio::Network::IO->new(-format => 'psi', >> -source => 'intact', >> -file => >> 'human_small-07.xml'); >> >> my $graph = $io->next_network; >> >> returns the following error: Can't call method "att" on an undefined value >> at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. >> >> doing the same with files from dip: >> >> my $io = Bio::Network::IO->new( -format => 'psi', >> -file => 'Hsapi20070107.mif'); >> >> my $graph = $io->next_network; >> >> does not result in any problems. >> >> would be great if one of you could help! >> thank you very much in advance. >> magnusgeist >> > From stewarta at nmrc.navy.mil Wed Jan 17 17:21:43 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Wed, 17 Jan 2007 17:21:43 -0500 Subject: [Bioperl-l] contig disassembly In-Reply-To: <200701171038.22979.heikki@sanbi.ac.za> References: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> <200701171038.22979.heikki@sanbi.ac.za> Message-ID: I can't seem to find trunc_with_features anywhere in Bio::SeqUtils. Was this method or its functioning moved to another module / method within this module?? Thanks -Andrew On Jan 17, 2007, at 3:38 AM, Heikki Lehvaslaiho wrote: > Andrew, > > The default interface to Bio::SeqI objects does not do it. The > methods leave > you to deal with feature table changes after sequence changes. > > However, there are some attempts to provide this kind functionality > in the > Bio::SeqUtils class. > > Bio::SeqUtils::cat > Bio::SeqUtils::revcom_with_features > Bio::SeqUtils::trunc_with_features > > These could be expanded and made more complete, maybe even a class > of its own > if there is enough interest? 
> > -Heikki > > > On Tuesday 16 January 2007 23:11, Andrew Stewart wrote: >> If I want to take a Bio::Seq object representing a contig and >> disassemble it into the constituent sequences which originally lead >> to its formation, all the while preserving the feature annotation >> associated with each sub-sequence, and with the coordinates of these >> feature sets updated to reflect their position relative to these sub- >> sequences, what is the best way to go about this? If I take a >> subsequence of a Seq object, will it carry over the relevant features >> as well? >> >> >> -- >> Andrew Stewart >> Research Assistant, Genomics Team >> Navy Medical Research Center (NMRC) >> Biological Defense Research Directorate (BDRD) >> BDRD Annex >> 12300 Washington Avenue, 2nd Floor >> Rockville, MD 20852 >> >> email: stewarta at nmrc.navy.mil >> phone: 301-231-6700 Ext 270 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From cjfields at uiuc.edu Wed Jan 17 17:56:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 17 Jan 2007 16:56:00 -0600 Subject: [Bioperl-l] contig disassembly In-Reply-To: References: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> 
<200701171038.22979.heikki@sanbi.ac.za> Message-ID: <186EFF97-4189-4E48-B3AB-03525F98FB1F@uiuc.edu> It was added post-rel. 1.5.1, I believe, so it won't be there unless you're using the latest bioperl release (1.5.2) or bioperl-live. chris On Jan 17, 2007, at 4:21 PM, Andrew Stewart wrote: > I can't seem to find trunc_with_features anywhere in Bio::SeqUtils. > Was this method or its functioning moved to another module / method > within this module?? > > Thanks > -Andrew > > > On Jan 17, 2007, at 3:38 AM, Heikki Lehvaslaiho wrote: > >> Andrew, >> >> The default interface to Bio::SeqI objects does not do it. The >> methods leave >> you to deal with feature table changes after sequence changes. >> >> However, there are some attempts to provide this kind functionality >> in the >> Bio::SeqUtils class. >> >> Bio::SeqUtils::cat >> Bio::SeqUtils::revcom_with_features >> Bio::SeqUtils::trunc_with_features >> >> These could be expanded and made more complete, maybe even a class >> of its own >> if there is enough interest? >> >> -Heikki >> >> >> On Tuesday 16 January 2007 23:11, Andrew Stewart wrote: >>> If I want to take a Bio::Seq object representing a contig and >>> disassemble it into the constituent sequences which originally lead >>> to its formation, all the while preserving the feature annotation >>> associated with each sub-sequence, and with the coordinates of these >>> feature sets updated to reflect their position relative to these >>> sub- >>> sequences, what is the best way to go about this? If I take a >>> subsequence of a Seq object, will it carry over the relevant >>> features >>> as well? 
>>> >>> >>> -- >>> Andrew Stewart >>> Research Assistant, Genomics Team >>> Navy Medical Research Center (NMRC) >>> Biological Defense Research Directorate (BDRD) >>> BDRD Annex >>> 12300 Washington Avenue, 2nd Floor >>> Rockville, MD 20852 >>> >>> email: stewarta at nmrc.navy.mil >>> phone: 301-231-6700 Ext 270 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> ______ _/ _/ >> _____________________________________________________ >> _/ _/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho >> _/ _/ _/ SANBI, South African National Bioinformatics Institute >> _/ _/ _/ University of Western Cape, South Africa >> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >> ___ _/_/_/_/_/ >> ________________________________________________________ > > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From marian.thieme at lycos.de Wed Jan 17 18:31:58 2007 From: marian.thieme at lycos.de (marian thieme) Date: Wed, 17 Jan 2007 23:31:58 +0000 Subject: [Bioperl-l] Mutation IO Message-ID: <188661178029100@lycos-europe.com> Jason, your right, probably it is some kind of abuse of the bioperl api, but its a very quick way to get results, because I dont need to cope with replacing substrings. 
On the other hand, if you are using the Root.pm class in other
scripts, the change can probably cause some malfunction (including a
crash of your application). It is probably no big matter to provide a
file stream IO class which reads and writes the sequence and
translates it to and from IUPAC characters. But one thing I don't see
at present: how would you represent more complex mutations, such as a
change of a few bases? OK, here we could represent each position
separately. But what about a mutation such as a deletion? I don't
know if there is an IUPAC character for that. Let's consider this
case:

1.) the original base at some position is a
2.) some individual has an a at one locus and is missing that base at
    the other, or perhaps both loci are missing the a.

In BIC notation you can write [a/_] or [_/_], respectively. Any idea
how to resolve this?

Marian

> From: Jason Stajich
> To: marian thieme
> Subject: Re: [Bioperl-l] Bio::Root::Root/Bio::LiveSeq::Mutation
> Date: Wed, 17 Jan 2007 08:40:45 -0800
>
> I think you are ignoring the fact that errors are thrown for a
> reason, not just to annoy you.
>
> Why not store the data in Bio::Seq objects as IUPAC ambiguity codes
> and write a special writer class in Bio::SeqIO which converts the
> ambiguity codes to your specified encoding.
> There are examples of how to write your own Bio::SeqIO class in the
> HOWTO tutorials when we talk about extending the toolkit. There is
> also all the code to decompose an ambiguity code into the bases it
> represents.
>
>
> -jason
> On Jan 16, 2007, at 2:20 AM, marian thieme wrote:
>
> > Hi, as I told this list some time ago, I want to output
> > heterozygous DNA sequences of different individuals.
> > We need to output variations in the following manner:
> > [a/g] if there is a locus where one allele has an "a" and the other
> > has a "g". (Also known as BIC db format or something like this)
> > My approach is to use the Bio::LiveSeq::Mutation (class ?) to
> > change the specific position in the sequence.
> >
> >
> > Bio::SeqUtils->mutate($seqobj, Bio::LiveSeq::Mutation->new(
> >     -seq    => "[a/g]",
> >     -seqori => $seqori,
> >     -pos    => $pos,
> >     -len    => $length));
> >
> > But unfortunately this raises an exception saying that some
> > unexpected chars occur. Hence I went into the code of Root.pm and
> > made a small change: commenting out line 359 in Root.pm:
> >
> >     if( $ERRORLOADED ) {
> >         # print STDERR " Calling Error::throw\n\n";
> >
> >         # Enable re-throwing of Error objects.
> >         # If the error is not derived from Bio::Root::Exception,
> >         # we can't guarantee that the Error's value was set properly
> >         # and, ipso facto, that it will be catchable from an eval{}.
> >         # But chances are, if you're re-throwing non-Bio::Root::Exceptions,
> >         # you're probably using Error::try(), not eval{}.
> >         # TODO: Fix the MSG: line of the re-thrown error. Has an extra line
> >         # containing the '----- EXCEPTION -----' banner.
> >         if( ref($args[0])) {
> >             if( $args[0]->isa('Error')) {
> >                 my $class = ref $args[0];
> >                 $class->throw( @args );
> >             } else {
> >                 my $text .= "\nWARNING: Attempt to throw a non-Error.pm object: " . ref$args[0];
> >                 my $class = "Bio::Root::Exception";
> >                 $class->throw( '-text' => $text, '-value' => $args[0] );
> >             }
> >         } else {
> >             $class ||= "Bio::Root::Exception";
> >
> >             my %args;
> >             if( @args % 2 == 0 && $args[0] =~ /^-/ ) {
> >                 %args = @args;
> >                 $args{-text} = $text;
> >                 $args{-object} = $self;
> >             }
> >
> > (Line 359:) #$class->throw( scalar keys %args > 0 ? %args : @args ); # (%args || @args) puts %args in scalar context!
> >         }
> >     }
> >
> >
> > After I altered this line everything works fine. But I know that
> > this can at best be considered a workaround.
> >
> > 2 Questions:
> >
> > Do you think it is worthwhile to provide some class which is
> > natively able to cope with this?
> > Do I need to expect unwanted behavior from some scripts or
> > classes?
> >
> > Regards,
> > Marian
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html

From cjfields at uiuc.edu Wed Jan 17 20:59:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 17 Jan 2007 19:59:25 -0600
Subject: [Bioperl-l] [Bioperl]problem with E-value
In-Reply-To: <413EAEEC-1B81-4154-8ACD-A5D73F9F5D88@uiuc.edu>
References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com>
	<45ADFEAA.2030808@sendu.me.uk>
	<12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com>
	<45AE3006.1020004@sendu.me.uk>
	<45AE4EFB.4090909@sendu.me.uk>
	<45AE54C3.9030201@sendu.me.uk>
	<413EAEEC-1B81-4154-8ACD-A5D73F9F5D88@uiuc.edu>
Message-ID: <27A9D909-A539-4415-9D6D-EB496083E645@uiuc.edu>

Li, Sendu,

I have committed a fix for this to CVS. Could you check it to make
sure everything is kosher?

The fix should work for your (Li's) BLAST report example as well as
multiquery BLAST reports.
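
For the best-hit annotation that started this thread, the tabular
output suggested earlier (blastall -m 8) can also be read without the
text-report parser. What follows is only a minimal sketch in plain
Perl, not BioPerl's 'blasttable' parser: the sub name best_hits and the
sample IDs are made up for illustration, and it assumes the usual 12
tab-separated columns of -m 8 output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): keep the lowest-E-value hit
# per query from BLAST tabular lines (blastall -m 8, or -m 9 with comments).
# Assumed columns: query, subject, %identity, alignment length, mismatches,
# gap openings, q.start, q.end, s.start, s.end, e-value, bit score.
sub best_hits {
    my @lines = @_;
    my %best;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;    # -m 9 comments, blank lines
        my @f = split /\t/, $line;
        next unless @f >= 12;                      # skip malformed rows
        my ($query, $subject, $evalue) = @f[0, 1, 10];
        # An E-value written without a leading digit ("e-102") would not
        # numify cleanly, so prefix a 1 before comparing.
        $evalue = "1$evalue" if $evalue =~ /^e/i;
        if (!exists $best{$query} || $evalue < $best{$query}{evalue}) {
            $best{$query} = { subject => $subject, evalue => $evalue };
        }
    }
    return \%best;
}

# Made-up example rows; real input would come from the blastall output file.
my @demo = (
    join("\t", qw(est1 P11111 85.0 100 15 0 1 300 1 100 2e-40 160)),
    join("\t", qw(est1 P22222 60.0 90 36 0 10 279 5 94 1e-10 62.0)),
    join("\t", qw(est2 P33333 99.0 120 1 0 1 360 1 120 e-102 370)),
);
my $best = best_hits(@demo);
for my $query (sort keys %$best) {
    print "$query\t$best->{$query}{subject}\t$best->{$query}{evalue}\n";
}
```

Keeping one record per query in a hash means the file is read once and
memory grows with the number of queries, not the number of hits, which
matters for a report that took days to produce.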

chris

On Jan 17, 2007, at 12:17 PM, Chris Fields wrote:

> Li, Sendu,
>
> Following up on this, I generated two new multiquery BLAST reports
> (BLASTP and BLASTX) from the NCBI BLAST server. I can't reproduce
> output similar to Li's example (where the BLASTX header line appears
> over and over). It is entirely possible that a local v. 2.2.15 BLAST
> installation would generate different output.
>
> Li, was the example file you sent generated from single query BLAST
> runs where the output was appended to the same file, or were they
> generated from a single BLAST run where the queries were all in one
> file? There is a significant difference between the two.
>
> Regardless, I am able to reproduce a bug where hits aren't created
> properly from BLAST hit tables (which should be easy to fix). I'll
> file a bug report and work on this.
>
> As a general note, if you are only interested in hits and not
> alignments you probably shouldn't use the regular BLAST text output.
> Try using -m8 or -m9 BLAST tabular output and 'blasttable' parsing
> instead of 'blast' and maybe switch event handlers (all in the
> SearchIO HOWTO).
>
> chris
>
> On Jan 17, 2007, at 11:03 AM, Chris Fields wrote
> ...
>> ...
>> I plan on generating a new BLAST report (from the web, since I don't
>> have 2.2.15 installed) with multiple queries and no alignments to see
>> what happens (i.e. see if the new multiquery report is similar to
>> this one). If so, there is a 'wrap-up' section of the parser, where
>> hit data not in alignments (in the table only) is used to generate
>> Hit objects; it's dropping this data likely b/c there is no regex
>> signalling the next result event.
>>
>> chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lidaof at gmail.com Thu Jan 18 00:02:30 2007
From: lidaof at gmail.com (lidaof)
Date: Thu, 18 Jan 2007 13:02:30 +0800
Subject: [Bioperl-l] [Bioperl]problem with E-value
In-Reply-To: <27A9D909-A539-4415-9D6D-EB496083E645@uiuc.edu>
References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com>
	<12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com>
	<45AE3006.1020004@sendu.me.uk>
	<45AE4EFB.4090909@sendu.me.uk>
	<45AE54C3.9030201@sendu.me.uk>
	<413EAEEC-1B81-4154-8ACD-A5D73F9F5D88@uiuc.edu>
	<27A9D909-A539-4415-9D6D-EB496083E645@uiuc.edu>
Message-ID: <12d02c20701172102v1e84a66akb3e1fba88cf7c714@mail.gmail.com>

Hi Chris, Sendu,

Sorry for the late reply!

I installed the new NCBI BLAST 2.2.15 (blast-2.2.15-ia32-linux.tar.gz)
on a CentOS server. The query is the FASTA sequence retrieved from
NCBI (soybean EST sequence). Because the sequence from NCBI was not
annotated, I built a plant protein BLAST database (with the formatdb
command) from UniProt's plant protein sequences
(plantUniP_withIPR.pep). Then I ran blastx with the EST sequence
against the protein sequences. The command was:

"blastall -p blastx -i soybean_est -d plantUniP_withIPR.pep -a 4 -b 5 -v 5 -o soybean_est_blastx &"

About 6 days later I got the blastx output. Then I wrote a script to
extract the best hit's description to annotate each sequence.

That's exactly what I did.

Cheers
Li

--
Li

From lidaof at gmail.com Thu Jan 18 00:05:48 2007
From: lidaof at gmail.com (lidaof)
Date: Thu, 18 Jan 2007 13:05:48 +0800
Subject: [Bioperl-l] [Bioperl]problem with E-value
In-Reply-To: <45AE54C3.9030201@sendu.me.uk>
References: <12d02c20701162231o68411209i51f9837c783cce62@mail.gmail.com>
	<45ADFEAA.2030808@sendu.me.uk>
	<12d02c20701170502h72a3ebdeqfd67b58129c388cd@mail.gmail.com>
	<45AE3006.1020004@sendu.me.uk>
	<45AE4EFB.4090909@sendu.me.uk>
	<45AE54C3.9030201@sendu.me.uk>
Message-ID: <12d02c20701172105t7b11e85ch266f263692918f88@mail.gmail.com>

Hi,

The BLAST report I provided is the last 5000 lines of my blastx
result. I used "tail -5000 >" to generate it because the original
result is so huge.

I will update my Bioperl from CVS to check whether it works.

Cheers
Li

On 1/18/07, Sendu Bala wrote:
>
> Chris Fields wrote:
> > I'm pretty sure I know exactly what the problem is and how to fix it (if
> > you haven't done it already). Looks like the parser trashes the rest of
> > the BLAST results data since it's not catching the next BLAST report
> > header (and not breaking out of the while() loop). That may be what is
> > triggering the e-value error. I have been parsing new BLAST reports
> > recently w/o that one popping up, but it may be a difference between the
> > web BLAST report and the executable (wouldn't be the first time that has
> > happened).
> >
> > Did you want me to take a look?
>
> Sure. In case you didn't notice, the reason it isn't catching the next
> BLAST report is the lack of alignments. For the few results that do have
> alignments, that's where it 'works'.
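
One way to check whether a truncated or concatenated file is involved
is to cut the text into one piece per report before handing it to a
parser. The following is a minimal sketch in plain Perl under stated
assumptions: split_reports is a hypothetical helper, not BioPerl code
and not the fix committed to CVS, and it assumes every report starts
with a program banner line such as "BLASTX 2.2.15 [...]". The sample
text is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: cut text holding several BLAST reports into one
# string per report. A new report is assumed to begin at a banner line
# (BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX followed by a version).
sub split_reports {
    my ($text) = @_;
    my @reports;
    for my $line (split /^/m, $text) {
        push @reports, '' if $line =~ /^T?BLAST[NPX]\s+\d/;   # new report starts
        next unless @reports;   # drop a cut-off leading report (e.g. after tail)
        $reports[-1] .= $line;
    }
    return @reports;
}

# Made-up input: a cut-off fragment, then two reports, the second without hits.
my $text = <<'END';
          Length = 512
BLASTX 2.2.15 [Oct-15-2006]
Query= est1
Sequences producing significant alignments:
BLASTX 2.2.15 [Oct-15-2006]
Query= est2
 ***** No hits found ******
END

my @reports = split_reports($text);
printf "%d complete reports\n", scalar @reports;
```

Because the split keys on the banner alone, a report without any
alignments still ends where the next banner begins, and the partial
report left at the top of a "tail" extract is discarded.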
>

--
Li

From heikki at sanbi.ac.za Thu Jan 18 02:39:31 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 18 Jan 2007 09:39:31 +0200
Subject: [Bioperl-l] Mutation IO
In-Reply-To: <188661178029100@lycos-europe.com>
References: <188661178029100@lycos-europe.com>
Message-ID: <200701180939.32113.heikki@sanbi.ac.za>

Marian,

Do not try to cram too much into one class. BIC format is apparently a
useful shorthand for some cases, but representing that in memory using
objects in an expandable way is another thing.

Your example below describes an individual's diploid genotype. Putting
that into one sequence object is not a good idea. The way to model that
is to have a reference sequence and then define an individual that has
that sequence in a diploid or haploid (sex chromosomes) setting and
list the alleles that person has in the reference sequence coordinate
system. You might be interested in separating the alleles by
chromosomes, too.

Representing, reporting and modelling genotype information is something
that has been of interest for me and a group of other people for some
time. An early draft of a web site about a genotyping standard can be
found here: http://www.openpml.org. It is being worked on heavily and
more material will be added soon.

	-Heikki


On Thursday 18 January 2007 01:31, marian thieme wrote:
> Jason, you're right, it is probably some kind of abuse of the BioPerl
> API, but it's a very quick way to get results, because I don't need to
> cope with replacing substrings. On the other hand, if you are using
> the Root.pm class in other scripts, the change can probably cause some
> malfunction (including a crash of your application). It is probably no
> big matter to provide a file stream IO class which reads and writes
> the sequence and translates it to and from IUPAC characters. But one
> thing I don't see at present: how would you represent more complex
> mutations, such as a change of a few bases? OK, here we could
> represent each position separately. But what about a mutation such as
> a deletion? I don't know if there is an IUPAC character for that.
> Let's consider this case:
> 1.) the original base at some position is a
> 2.) some individual has an a at one locus and is missing that base at
> the other, or perhaps both loci are missing the a.
> In BIC notation you can write [a/_] or [_/_], respectively. Any idea
> how to resolve this?
>
> Marian

--
  ______ _/      _/_____________________________________________________
        _/      _/
       _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
      _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
     _/  _/  _/  SANBI, South African National Bioinformatics Institute
    _/  _/  _/  University of Western Cape, South Africa
       _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
  ___ _/_/_/_/_/________________________________________________________

From Anthony.Underwood at hpa.org.uk Thu Jan 18 08:44:51 2007
From: Anthony.Underwood at hpa.org.uk (Anthony Underwood)
Date: Thu, 18 Jan 2007 13:44:51 -0000
Subject: [Bioperl-l] Translate sequences align and reverse translate
Message-ID: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk>

Hi All,

I would like to automate the process of taking coding DNA sequences,
translating them to amino acids, aligning them with clustalw and then
reverse-translating back to DNA so as to obtain the best alignment. I
can think of how I would do this with bioperl but since it is probably
a common process I would like to ask if anyone already has such a
script that they wouldn't mind sharing so I don't re-invent the wheel.

Many thanks,

Anthony

-----------------------------------------
**************************************************************************
The information contained in the EMail and any attachments is
confidential and intended solely and for the attention and use of
the named addressee(s). It may not be disclosed to any other person
without the express authority of the HPA, or the intended
recipient, or both. If you are not the intended recipient, you must
not disclose, copy, distribute or retain this message or any part
of it. This footnote also confirms that this EMail has been swept
for computer viruses, but please re-sweep any attachments before
opening or saving.
HTTP://www.HPA.org.uk ************************** ************************************************ From avilella at gmail.com Thu Jan 18 10:53:00 2007 From: avilella at gmail.com (Albert Vilella) Date: Thu, 18 Jan 2007 15:53:00 +0000 Subject: [Bioperl-l] Translate sequences align and reverse translate In-Reply-To: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk> References: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk> Message-ID: <358f4d650701180753x6aaf17fau3610cc9b0e429934@mail.gmail.com> There are some dirty dirty scripts in: http://ortholytics.googlecode.com/svn/create_alignment_sets.PLS (this one relies on orthology info from orthomcl) http://ortholytics.googlecode.com/svn/launch_probcons_sets.PLS http://ortholytics.googlecode.com/svn/launch_aa_to_cds_dna.PLS You may find some of the stuff of interest, Cheers, Albert. On 1/18/07, Anthony Underwood wrote: > Hi All, > > > > I would like to automate the process of taking coding DNA sequences, > translating them to amino acids, aligning them with clustalw and then > reverse-translating back to DNA so as to obtain the best alignment. I > can think of how I would do this with bioperl but since it is probably a > common process I would like to ask if anyone already has such a script > that they wouldn't mind sharing so I don't re-invent the wheel. > > > > Many thanks, > > > > Anthony > > > > > ----------------------------------------- > ******************************************************************* > ******* > The information contained in the EMail and any attachments is > confidential and intended solely and for the attention and use of > the named addressee(s). It may not be disclosed to any other person > without the express authority of the HPA, or the intended > recipient, or both. If you are not the intended recipient, you must > not disclose, copy, distribute or retain this message or any part > of it. 
This footnote also confirms that this EMail has been swept > for computer viruses, but please re-sweep any attachments before > opening or saving. HTTP://www.HPA.org.uk ************************** > ************************************************ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jan 18 10:56:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 18 Jan 2007 09:56:42 -0600 Subject: [Bioperl-l] Mutation IO In-Reply-To: <200701180939.32113.heikki@sanbi.ac.za> References: <188661178029100@lycos-europe.com> <200701180939.32113.heikki@sanbi.ac.za> Message-ID: <19232667-7E88-4802-B456-53F96514E717@uiuc.edu> I haven't dabbled with the Mutation/Variation stuff, but couldn't one use a reference sequence (as Heikki suggests) and then use SeqFeatures for the alleles? You could tag the seqfeature with the allele name for downstream work. You could maybe add a SeqIO writer (Jason's suggestion) or just add a helper sub to Bio::SeqUtils for converting any variation data in a Bio::SeqI into the string you want, based on allele(s) you specify and the Seq object. While working on Location stuff, I noticed this is how variations are represented in normal GenBank files, using the primary feature tag of 'variation' or 'misc_difference' (I think there are a few others): http://tinyurl.com/22coeq Using SeqFeatures also allows for deletions/insertions: http://tinyurl.com/2e2egw http://tinyurl.com/277a6g chris On Jan 18, 2007, at 1:39 AM, Heikki Lehvaslaiho wrote: > Marian, > > Do not try to cram too much into one class. BIC format is > apparently a useful > shorthand for some cases, but representing that in the memory using > objects > in an expandable way is an other thing. > > Your example below describes an individual's diploid genotype. > Putting that > into one sequence object is not a good idea. 
The way to model that > is to have > a reference sequence and then define an individual that has that > sequence in > diploid or haploid (sex chromosomes) setting and list the alleles > that person > has in the reference sequence coordinate system. You might be > interested in > separating the alleles by chromosomes, too. > > Representing, reporting and modelling genotype information is > something that > has been of interest for me and a group of other people for some > time. An > early draft of a web site about a genotyping standard can be found > here: > http://www.openpml.org. It being worked on heavily and more > material will be > added soon. > > -Heikki > > > On Thursday 18 January 2007 01:31, marian thieme wrote: >> Jason, your right, probably it is some kind of abuse of the >> bioperl api, >> but its a very quick way to get results, because I dont need to >> cope with >> replacing substrings. On the other hand, if you are using the >> Root.pm class >> in other scripts, it can probably cause some malfunction >> (inclusive crash >> of your application). Probably its no big matter to provide a >> filestream IO >> class which is reading/writing the sequence and translates the in/ >> from >> IUPAC chars. But one thing I dont see at present: How would you >> represent >> more complex mutations, as change of few bases ? Ok here we could >> represent >> each position seperatly. But in the case of a mutation ? I dont >> know if >> there is a iupac char which treats a mutation ! Lets consider this >> case: >> 1.) origin of some position is a >> 2.) some individual has in one locus an a and the other is missing >> that >> base or perhaps both loci are missing the a. so via BIC notation >> you can >> write [a/_] resp. [_/_]. Any idea how to resolve this ? 
>> >> Marian >> >>> Von: Jason Stajich >>> An: marian thieme >>> Betreff: Re: [Bioperl-l] Bio::Root::Root/Bio::LiveSeq::Mutation >>> Datum: Wed, 17 Jan 2007 08:40:45 -0800 >>> >>> I think you are ignoring the fact that errors are thrown for a >>> reason, not just to annoy you. >>> >>> Why not store the data in Bio::Seq objects as IUPAC ambiguity codes >>> and write a special writer class in Bio::SeqIO which converts the >>> ambiguity codes to your specified encoding. >>> There are examples of how to write your own Bio::SeqIO class in the >>> HOWTO tutorials when we talk about extending the toolkit. There is >>> also all the code to decompose an ambiguity code into the bases it >>> represents. >>> >>> >>> -jason >>> >>> On Jan 16, 2007, at 2:20 AM, marian thieme wrote: >>>> Hi, as I told to this list some time ago, I want to ouput >>>> heterozygous dna sequences of different individuals. >>>> We need to output variations in the following manner: >>>> [a/g] if there is a loci where one allele has an "a" and the other >>>> has a "g". (Also known as BIC db format or something like this) >>>> My approach is to use the Bio::LiveSeq::Mutation (class ?) to >>>> change the specific position in the sequence. >>>> >>>> >>>> Bio::SeqUtils->mutate($seqobj, Bio::LiveSeq::Mutation->new( >>>> -seq => "[a/g]", >>>> -seqori => $seqori, >>>> -pos => $pos, >>>> -len => $length)); >>>> >>>> But unfortunatly this would rise an exception, that some unexpected >>>> chars occur. Hence I went in to the code of Root.pm and made a >>>> small change: commenting out line 359 in Root.pm : >>>> >>>> if( $ERRORLOADED ) { >>>> # print STDERR " Calling Error::throw\n\n"; >>>> >>>> # Enable re-throwing of Error objects. >>>> # If the error is not derived from Bio::Root::Exception, >>>> # we can't guarantee that the Error's value was set properly >>>> # and, ipso facto, that it will be catchable from an eval{}. 
>>>> # But chances are, if you're re-throwing non- >>>> Bio::Root::Exceptions, >>>> # you're probably using Error::try(), not eval{}. >>>> # TODO: Fix the MSG: line of the re -thrown error. Has an >>>> extra line >>>> # containing the '----- EXCEPTION -----' banner. >>>> if( ref($args[0])) { >>>> if( $args[0]->isa('Error')) { >>>> my $class = ref $args[0]; >>>> $class->throw( @args ); >>>> } else { >>>> my $text .= "\nWARNING: Attempt to throw a non- >>>> Error.pm object: " . ref$args[0]; >>>> my $class = "Bio::Root::Exception"; >>>> $class->throw( '-text' => $text, '-value' => $args >>>> [0] ); >>>> } >>>> } else { >>>> $class ||= "Bio::Root::Exception"; >>>> >>>> my %args; >>>> if( @args % 2 == 0 && $args[0] =~ /^-/ ) { >>>> %args = @args; >>>> $args{-text} = $text; >>>> $args{-object} = $self; >>>> } >>>> >>>> (Line 359:) #$class->throw( scalar keys %args > 0 ? %args : >>>> @args ); # (%args || @args) puts %args in scalar context! >>>> &nbs p; } >>>> } >>>> >>>> >>>> After I did alter this line all is working fine. But I know that >>>> this can be considered in the best case as a work around. >>>> >>>> 2 Questions: >>>> >>>> Do you think it is worth to provide some class which are natively >>>> able to cope with that matter ? >>>> Do I need to expect some unwanted behavior of some scripts resp. >>>> classes ? 
>>>> >>>> Regards, >>>> Marian >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _________________________________ >>>> Stelle Deine Fragen bei Lycos iQ >> >>> href=http://iq.lycos.de/qa/ask/>http://iq.lycos.de/qa/ask/> >>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> >> >>> href=http://lists.open-bio.org/mailman/listinfo/bioperl-l>http:// >>> lists.op >>> en- bio.org/mailman/listinfo/bioperl-l >>> -- >>> Jason Stajich >>> Miller Research Fellow >>> University of California, Berkeley >>> lab: 510.642.8441 >>> >> href=http://pmb.berkeley.edu/~taylor/people/js.html>http:// >>> pmb.berkeley.e >>> du/ ~taylor/people/js.html >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> >> href=http://lists.open-bio.org/mailman/listinfo/bioperl-l>http:// >>> lists.op >>> en- bio.org/mailman/listinfo/bioperl-l >> >> Schnell und einfach ohne Anschlusswechsel zur Lycos DSL Flatrate >> wechseln >> und 3 Monate kostenlos ab effektiven 5,21 EUR pro Monat im ersten >> Jahr >> surfen. >> http://www.lycos.de/startseite/online/dsl/index.html? >> prod=DSL&trackingID=em >> ail_footertxt > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From arareko at campus.iztacala.unam.mx Thu Jan 18 11:25:22 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 18 Jan 2007 10:25:22 -0600
Subject: [Bioperl-l] Mutation IO
In-Reply-To: <19232667-7E88-4802-B456-53F96514E717@uiuc.edu>
References: <188661178029100@lycos-europe.com> <200701180939.32113.heikki@sanbi.ac.za> <19232667-7E88-4802-B456-53F96514E717@uiuc.edu>
Message-ID: <45AF9F72.5050304@campus.iztacala.unam.mx>

Folks,

A bit off-topic here: it would be better if we posted full URLs
instead of tinyfied ones. I mention this because tinyurls have an
expiration date, which leaves soon-to-expire URLs archived in the
mailing list and makes the list archive less useful for future
reference.

Regards,
Mauricio.

Chris Fields wrote:
> I haven't dabbled with the Mutation/Variation stuff, but couldn't one
> use a reference sequence (as Heikki suggests) and then use
> SeqFeatures for the alleles? You could tag the seqfeature with the
> allele name for downstream work. You could maybe add a SeqIO writer
> (Jason's suggestion) or just add a helper sub to Bio::SeqUtils for
> converting any variation data in a Bio::SeqI into the string you
> want, based on allele(s) you specify and the Seq object.
>
> While working on Location stuff, I noticed this is how variations are
> represented in normal GenBank files, using the primary feature tag of
> 'variation' or 'misc_difference' (I think there are a few others):
>
> http://tinyurl.com/22coeq
>
> Using SeqFeatures also allows for deletions/insertions:
>
> http://tinyurl.com/2e2egw
>
> http://tinyurl.com/277a6g
>
>
> chris
>
> On Jan 18, 2007, at 1:39 AM, Heikki Lehvaslaiho wrote:
>
>> Marian,
>>
>> Do not try to cram too much into one class. BIC format is
>> apparently a useful shorthand for some cases, but representing
>> that in memory using objects in an expandable way is another thing.
>>
>> Your example below describes an individual's diploid genotype.
>> Putting that into one sequence object is not a good idea. The way
>> to model that is to have a reference sequence and then define an
>> individual that has that sequence in a diploid or haploid (sex
>> chromosomes) setting and list the alleles that person has in the
>> reference sequence coordinate system. You might be interested in
>> separating the alleles by chromosomes, too.
>>
>> Representing, reporting and modelling genotype information is
>> something that has been of interest to me and a group of other
>> people for some time. An early draft of a web site about a
>> genotyping standard can be found here: http://www.openpml.org.
>> It is being worked on heavily and more material will be added soon.
>>
>> -Heikki
>>
>>
>> On Thursday 18 January 2007 01:31, marian thieme wrote:
>>> Jason, you're right, it is probably some kind of abuse of the
>>> BioPerl API, but it's a very quick way to get results, because I
>>> don't need to cope with replacing substrings. On the other hand,
>>> if you use the modified Root.pm class in other scripts, it can
>>> probably cause some malfunction (including a crash of your
>>> application). It is probably no big deal to provide a filestream
>>> IO class which reads/writes the sequence and translates to/from
>>> IUPAC chars. But one thing I don't see at present: how would you
>>> represent more complex mutations, such as a change of a few bases?
>>> OK, here we could represent each position separately. But in the
>>> case of a mutation? I don't know if there is an IUPAC char which
>>> treats a mutation! Let's consider this case:
>>> 1.) The original base at some position is "a".
>>> 2.) Some individual has an "a" at one locus and the other is
>>> missing that base, or perhaps both loci are missing the "a". So
>>> via BIC notation you can write [a/_] or [_/_] respectively. Any
>>> idea how to resolve this?
>>>
>>> Marian
>>>
>>>> From: Jason Stajich
>>>> To: marian thieme
>>>> Subject: Re: [Bioperl-l] Bio::Root::Root/Bio::LiveSeq::Mutation
>>>> Date: Wed, 17 Jan 2007 08:40:45 -0800
>>>>
>>>> I think you are ignoring the fact that errors are thrown for a
>>>> reason, not just to annoy you.
>>>>
>>>> Why not store the data in Bio::Seq objects as IUPAC ambiguity codes
>>>> and write a special writer class in Bio::SeqIO which converts the
>>>> ambiguity codes to your specified encoding?
>>>> There are examples of how to write your own Bio::SeqIO class in the
>>>> HOWTO tutorials when we talk about extending the toolkit. There is
>>>> also all the code to decompose an ambiguity code into the bases it
>>>> represents.
>>>>
>>>>
>>>> -jason
>>>>
>>>> On Jan 16, 2007, at 2:20 AM, marian thieme wrote:
>>>>> Hi, as I told this list some time ago, I want to output
>>>>> heterozygous DNA sequences of different individuals.
>>>>> We need to output variations in the following manner:
>>>>> [a/g] if there is a locus where one allele has an "a" and the other
>>>>> has a "g". (Also known as BIC db format or something like this.)
>>>>> My approach is to use the Bio::LiveSeq::Mutation (class ?) to
>>>>> change the specific position in the sequence.
>>>>>
>>>>>
>>>>> Bio::SeqUtils->mutate($seqobj, Bio::LiveSeq::Mutation->new(
>>>>>     -seq    => "[a/g]",
>>>>>     -seqori => $seqori,
>>>>>     -pos    => $pos,
>>>>>     -len    => $length));
>>>>>
>>>>> But unfortunately this raises an exception saying that some
>>>>> unexpected chars occur. Hence I went into the code of Root.pm and
>>>>> made a small change: commenting out line 359 in Root.pm:
>>>>>
>>>>> if( $ERRORLOADED ) {
>>>>>     # print STDERR " Calling Error::throw\n\n";
>>>>>
>>>>>     # Enable re-throwing of Error objects.
>>>>>     # If the error is not derived from Bio::Root::Exception,
>>>>>     # we can't guarantee that the Error's value was set properly
>>>>>     # and, ipso facto, that it will be catchable from an eval{}.
>>>>>     # But chances are, if you're re-throwing non-Bio::Root::Exceptions,
>>>>>     # you're probably using Error::try(), not eval{}.
>>>>>     # TODO: Fix the MSG: line of the re-thrown error. Has an extra line
>>>>>     # containing the '----- EXCEPTION -----' banner.
>>>>>     if( ref($args[0])) {
>>>>>         if( $args[0]->isa('Error')) {
>>>>>             my $class = ref $args[0];
>>>>>             $class->throw( @args );
>>>>>         } else {
>>>>>             my $text .= "\nWARNING: Attempt to throw a non-Error.pm object: " . ref$args[0];
>>>>>             my $class = "Bio::Root::Exception";
>>>>>             $class->throw( '-text' => $text, '-value' => $args[0] );
>>>>>         }
>>>>>     } else {
>>>>>         $class ||= "Bio::Root::Exception";
>>>>>
>>>>>         my %args;
>>>>>         if( @args % 2 == 0 && $args[0] =~ /^-/ ) {
>>>>>             %args = @args;
>>>>>             $args{-text} = $text;
>>>>>             $args{-object} = $self;
>>>>>         }
>>>>>
>>>>> (Line 359:) #$class->throw( scalar keys %args > 0 ? %args : @args ); # (%args || @args) puts %args in scalar context!
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> After I altered this line, everything works fine. But I know that
>>>>> this can at best be considered a workaround.
>>>>>
>>>>> Two questions:
>>>>>
>>>>> Do you think it is worth providing a class that can natively
>>>>> cope with this?
>>>>> Should I expect unwanted behavior from some scripts or
>>>>> classes?
>>>>>
>>>>> Regards,
>>>>> Marian
>>>>
>>>> --
>>>> Jason Stajich
>>>> Miller Research Fellow
>>>> University of California, Berkeley
>>>> lab: 510.642.8441
>>>> http://pmb.berkeley.edu/~taylor/people/js.html
>>
>> --
>> Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>> Associate Professor   skype: heikki_lehvaslaiho
>> SANBI, South African National Bioinformatics Institute
>> University of Western Cape, South Africa
>> Phone: +27 21 959 2096 FAX: +27 21 959 2512
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr.
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Thu Jan 18 11:45:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 Jan 2007 10:45:21 -0600
Subject: [Bioperl-l] Mutation IO
In-Reply-To: <45AF9F72.5050304@campus.iztacala.unam.mx>
References: <188661178029100@lycos-europe.com> <200701180939.32113.heikki@sanbi.ac.za> <19232667-7E88-4802-B456-53F96514E717@uiuc.edu> <45AF9F72.5050304@campus.iztacala.unam.mx>
Message-ID: <38043784-D437-43C3-A2D8-08E118872363@uiuc.edu>

Agreed. The only reason I use them is the wrap-around issue. For
future eyes:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1335776
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=558492
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=468423

chris

On Jan 18, 2007, at 10:25 AM, Mauricio Herrera Cuadra wrote:

> Folks,
>
> A bit off-topic here: it would be better if we posted full URLs
> instead of tinyfied ones. I mention this because tinyurls have an
> expiration date, which leaves soon-to-expire URLs archived in the
> mailing list and makes the list archive less useful for future
> reference.
>
> Regards,
> Mauricio.
>
> Chris Fields wrote:
>> I haven't dabbled with the Mutation/Variation stuff, but couldn't
>> one use a reference sequence (as Heikki suggests) and then use
>> SeqFeatures for the alleles? You could tag the seqfeature with
>> the allele name for downstream work.
>> You could maybe add a SeqIO writer (Jason's suggestion) or just
>> add a helper sub to Bio::SeqUtils for converting any variation
>> data in a Bio::SeqI into the string you want, based on allele(s)
>> you specify and the Seq object.
>> While working on Location stuff, I noticed this is how variations
>> are represented in normal GenBank files, using the primary
>> feature tag of 'variation' or 'misc_difference' (I think there
>> are a few others):
>> http://tinyurl.com/22coeq
>> Using SeqFeatures also allows for deletions/insertions:
>> http://tinyurl.com/2e2egw
>> http://tinyurl.com/277a6g
>> chris
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason at bioperl.org Thu Jan 18 12:50:54 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 18 Jan 2007 09:50:54 -0800
Subject: [Bioperl-l] Translate sequences align and reverse translate
In-Reply-To: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk>
References: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk>
Message-ID: <44FC1ED7-08A2-44A9-B16E-6347A162E90C@bioperl.org>

bp_pairwise_kaks does most of this for you (you can use clustalw and muscle).
At the core is the aa_to_dna_align() routine from Bio::Align::Utilities.

-jason

On Jan 18, 2007, at 5:44 AM, Anthony Underwood wrote:

> Hi All,
>
> I would like to automate the process of taking coding DNA sequences,
> translating them to amino acids, aligning them with clustalw and then
> reverse-translating back to DNA so as to obtain the best alignment. I
> can think of how I would do this with bioperl but since it is
> probably a common process I would like to ask if anyone already has
> such a script that they wouldn't mind sharing so I don't re-invent
> the wheel.
>
> Many thanks,
>
> Anthony

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From stewarta at nmrc.navy.mil Thu Jan 18 12:56:16 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 18 Jan 2007 12:56:16 -0500
Subject: [Bioperl-l] contig disassembly
In-Reply-To: <186EFF97-4189-4E48-B3AB-03525F98FB1F@uiuc.edu>
References: <99701520-9366-4AE1-8F4D-6CD66C4BA211@nmrc.navy.mil> <200701171038.22979.heikki@sanbi.ac.za> <186EFF97-4189-4E48-B3AB-03525F98FB1F@uiuc.edu>
Message-ID:

Oh, I forgot to mention that I was referring to Bio::SeqUtils within
my updated checkout of bioperl-live. Anyhow, I removed it and got
another checkout and I found the new method just fine. Thanks for the
help guys :)

On Jan 17, 2007, at 5:56 PM, Chris Fields wrote:

> It was added post-rel. 1.5.1, I believe, so it won't be there
> unless you're using the latest bioperl release (1.5.2) or
> bioperl-live.
>
> chris
>
> On Jan 17, 2007, at 4:21 PM, Andrew Stewart wrote:
>
>> I can't seem to find trunc_with_features anywhere in Bio::SeqUtils.
>> Was this method or its functioning moved to another module / method
>> within this module??
>>
>> Thanks
>> -Andrew
>>
>>
>> On Jan 17, 2007, at 3:38 AM, Heikki Lehvaslaiho wrote:
>>
>>> Andrew,
>>>
>>> The default interface to Bio::SeqI objects does not do it. The
>>> methods leave you to deal with feature table changes after
>>> sequence changes.
>>>
>>> However, there are some attempts to provide this kind of
>>> functionality in the Bio::SeqUtils class.
>>>
>>> Bio::SeqUtils::cat
>>> Bio::SeqUtils::revcom_with_features
>>> Bio::SeqUtils::trunc_with_features
>>>
>>> These could be expanded and made more complete, maybe even a class
>>> of its own if there is enough interest?
>>>
>>> -Heikki
>>>
>>>
>>> On Tuesday 16 January 2007 23:11, Andrew Stewart wrote:
>>>> If I want to take a Bio::Seq object representing a contig and
>>>> disassemble it into the constituent sequences which originally led
>>>> to its formation, all the while preserving the feature annotation
>>>> associated with each sub-sequence, and with the coordinates of
>>>> these feature sets updated to reflect their position relative to
>>>> these sub-sequences, what is the best way to go about this? If I
>>>> take a subsequence of a Seq object, will it carry over the
>>>> relevant features as well?
>>>>
>>>>
>>>> --
>>>> Andrew Stewart
>>>> Research Assistant, Genomics Team
>>>> Navy Medical Research Center (NMRC)
>>>> Biological Defense Research Directorate (BDRD)
>>>> BDRD Annex
>>>> 12300 Washington Avenue, 2nd Floor
>>>> Rockville, MD 20852
>>>>
>>>> email: stewarta at nmrc.navy.mil
>>>> phone: 301-231-6700 Ext 270
>>>
>>> --
>>> Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>> Associate Professor   skype: heikki_lehvaslaiho
>>> SANBI, South African National Bioinformatics Institute
>>> University of Western Cape, South Africa
>>> Phone: +27 21 959 2096 FAX: +27 21 959 2512
>>
>>
>>
>> --
>> Andrew Stewart
>> Research Assistant, Genomics Team
>> Navy Medical Research Center (NMRC)
>> Biological Defense Research Directorate (BDRD)
>> BDRD Annex
>> 12300 Washington Avenue, 2nd Floor
>> Rockville, MD 20852
>>
>> email: stewarta at nmrc.navy.mil
>> phone: 301-231-6700 Ext 270
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From Kevin.M.Brown at asu.edu Thu Jan 18 13:08:18 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 18 Jan 2007 11:08:18 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
References: <1A4207F8295607498283FE9E93B775B4028B39C3@EX02.asurite.ad.asu.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B4029AFE21@EX02.asurite.ad.asu.edu>

Never mind, it looks like I found the issue. Since the alignment object
needs the sequences to be padded to match them up (even though start
and stop values are in the alignment), I was trying to speed up the pad
method and it wasn't fully filling out. So I created my own splice
function so that the Perl interpreter doesn't have to pad some
sequences with as many as 3,000,000 '.' characters.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
> Sent: Wednesday, January 17, 2007 9:17 AM
> To: bioperl-l list
> Subject: [Bioperl-l] Alignment->slice() issue?
>
> Bioperl: 1.5.2_100
> Perl: perl -v
> This is perl, v5.8.5 built for i386-linux-thread-multi
>
> I'm hoping this is just me, but I've created a huge alignment of a set
> of primers on a chromosome and then I'm trying to slice up that one
> large alignment into smaller alignments based around the CDS features
> of the chromosome (taken from a GenBank file that the script read in
> previously that gives me both the features and the chromosome
> sequence).
> The error occurs when I request the slice. I get the following:
>
> ------------- EXCEPTION -------------
> MSG: Bad start,end parameters. Start [1088] has to be less than end [850]
> STACK Bio::PrimarySeq::subseq /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:354
> STACK Bio::SimpleAlign::slice /usr/lib/perl5/site_perl/5.8.5/Bio/SimpleAlign.pm:929
> STACK toplevel ./PrimerAnalysis.pl:376
>
> --------------------------------------
>
> But, based on output I've put into my script, that isn't the range I
> requested from the alignment. What I've requested is
> $align = $alignments{$key}->slice($start, $stop);
> $start is 1088 and $stop is 2377 (from the printout below)
> "Forward strand with start(1088) and stop(2377) at ./PrimerAnalysis.pl
> line 358, <$primer> line 657."
>
> The feature I'm initially after is BMAA0001 start:1139 stop:2326 with
> some upstream and downstream sequence.
>
> I noticed that slice does "foreach my $seq ( $self->each_seq() )", so I
> copied that to print out all the sequences held by the alignment and
> their start and stop locations and get the following:
> NC_006349 1 2325379
> BurkM_0005_a-f..BurkM_0005_a-r 80686 81516
> BurkM_0005_a-f..BurkM_0005_a-r 268747 269577
> BurkM_0005_a-f..BurkM_0005_a-r 329852 330682
> BurkM_0005_a-f..BurkM_0005_a-r 560818 561648
> BurkM_0005_a-f..BurkM_0005_a-r 592443 593273
> BurkM_0005_a-f..BurkM_0005_a-r 908245 909075
> BurkM_0005_a-f..BurkM_0005_a-r 935390 936220
> BurkM_0005_a-f..BurkM_0005_a-r 1014714 1015544
> BurkM_0005_a-f..BurkM_0005_a-r 1034315 1035145
> BurkM_0005_a-f..BurkM_0005_a-r 1225934 1226764
> BurkM_0005_a-f..BurkM_0005_a-r 1324779 1325609
> BurkM_0005_a-f..BurkM_0005_a-r 1413075 1413905
> BurkM_0005_a-f..BurkM_0005_a-r 1480717 1481547
> BurkM_0005_a-f..BurkM_0005_a-r 1517965 1518795
> BurkM_0005_a-f..BurkM_0005_a-r 1900786 1901616
> BurkM_0005_a-f..BurkM_0005_a-r 1921906 1922736
> BurkM_0005_a-f..BurkM_0005_a-r 1957979 1958809
> BurkM_0005_a-f..BurkM_0005_a-r 2136301 2137131
> BurkM_0005_a-r..BurkM_0005_a-f 103238 104068
> BurkM_0005_a-r..BurkM_0005_a-f 170641 171471
> BurkM_0005_a-r..BurkM_0005_a-f 408755 409585
> BurkM_0005_a-r..BurkM_0005_a-f 432906 433736
> BurkM_0005_a-r..BurkM_0005_a-f 509458 510288
> BurkM_0005_a-r..BurkM_0005_a-f 565194 566024
> BurkM_0005_a-r..BurkM_0005_a-f 656754 657584
> BurkM_0005_a-r..BurkM_0005_a-f 733927 734757
> BurkM_0005_a-r..BurkM_0005_a-f 838705 839535
> BurkM_0005_a-r..BurkM_0005_a-f 869777 870607
> BurkM_0005_a-r..BurkM_0005_a-f 892021 892851
> BurkM_0005_a-r..BurkM_0005_a-f 909903 910733
> BurkM_0005_a-r..BurkM_0005_a-f 1061801 1062631
> BurkM_0005_a-r..BurkM_0005_a-f 1096777 1097607
> BurkM_0005_a-r..BurkM_0005_a-f 1636356 1637186
> BurkM_0005_a-r..BurkM_0005_a-f 1636356 1643935
> BurkM_0005_a-r..BurkM_0005_a-f 1643105 1643935
> BurkM_0005_a-r..BurkM_0005_a-f 1790703 1791533
> BurkM_0005_a-r..BurkM_0005_a-f 2267109 2267939
> BurkM_0005_a-f..BurkM_0005_a-f 560818 566024
> BurkM_0005_a-f..BurkM_0005_a-f 908245 910733
> BMA_0006_a-r..BMA_0006_a-r 561646 565196
> BMA_0006_a-r..BMA_0006_a-r 909073 909905
> BMA_0046_a-f..BMA_0046_a-r 437921 438661
> BurkM_0092_a-f..BurkM_0092_a-f 561670 565172
> BurkM_0092_a-f..BurkM_0092_a-f 909097 909881
> BMA_0113_a-f..BMA_0113_a-r 1310782 1311536
> BMA_0113_a-r..BMA_0113_a-f 172284 173038
> BMA_0113_a-r..BMA_0113_a-f 2266197 2266951
> BMA_0146_a-f..BMA_0146_a-r 1172194 1173065
> BMA_0146_a-f..BMA_0146_a-r 2267012 2269123
> BMA_0146_a-r..BMA_0146_a-f 167410 168281
> BMA_0146_a-r..BMA_0146_a-f 320180 321051
> BMA_0146_a-r..BMA_0146_a-f 894226 895097
> BMA_0146_a-r..BMA_0146_a-f 894226 900207
> BMA_0146_a-r..BMA_0146_a-f 899335 900207
> BMA_0146_a-r..BMA_0146_a-f 1638747 1639622
> BMA_0146_a-r..BMA_0146_a-f 1972415 1973286
> BMA_0146_a-r..BMA_0146_a-f 2157899 2158770
> BMA_0146_a-r..BMA_0146_a-f 2321169 2322040
>
> So, I can see that all the sequences held in the alignment have
> Start < Stop as expected. What I can't figure is where the end value
> is coming from that is messing this up.
>
> Any help is greatly appreciated.

From jason at bioperl.org Thu Jan 18 20:43:37 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 18 Jan 2007 17:43:37 -0800
Subject: [Bioperl-l] Fwd: Bio::SearchIO::Writer::HTMLResultWriter problem
References: <45AFFD94.9080407@uma.es>
Message-ID: <0B0EEF9B-4EF2-431D-B830-BEC499FC7CE1@bioperl.org>

Folks on the list will have a better time answering you. You should
report what version of BioPerl you are using as well.

-jason

Begin forwarded message:

> From: "Antonio J.
Pérez"
> Date: January 18, 2007 3:07:00 PM PST
> To: G.Williams at hgmp.mrc.ac.uk, jason at bioperl.org
> Subject: Bio::SearchIO::Writer::HTMLResultWriter problem
>
> Dear Drs., I am trying to use a remoteBlast for a Bioinformatics
> course. I use the Bio::SearchIO::Writer::HTMLResultWriter module
> but I obtain a plain output without carriage returns. Could it be a
> bug, or is my code wrong? I enclose a copy of my script, and
> the blast is implemented in:
> http://jaguar.genetica.uma.es/blastUNIA/blastHTTP.html
>
> Please, could you help me? Thanks in advance and yours sincerely,
>
> Antonio.
>
> --
> Join my blog on Functional Diversity and Bioinformatics:
> http://ajperezbioinfo.blogspot.com
>
> #!/usr/bin/perl
>
> # Receives a sequence from a web form
> # and sends it to a remote NCBI BLAST
> # This is an example of a CGI script in Perl
> # that uses some modules from the BioPerl library
>
> # AJPerez, 25/08/2005
>
> use CGI;                                     # Perl CGI library
> use Bio::SeqIO;                              # Sequence handling module
> use Bio::Tools::Run::RemoteBlast;            # Module for running remote BLAST searches
> use Bio::SearchIO;                           # Module for parsing BLAST output
> use Bio::SearchIO::Writer::HTMLResultWriter; # Module for presenting BLAST results as HTML
>
> # Collect the form parameters
> my $form = new CGI;
> my $seq_form = $form->param('seq_form');
> my $database_form = $form->param('database_form');
>
> # Remove any whitespace from the sequence
> $seq_form =~ s/\s+//g;
>
> # Select the BLAST program and the database according to the user's choice
> if ($database_form eq "NT") { # Non-redundant nucleotide database
>     $program = "blastn";
>     $database = "nt";
> } else { # SWISS-PROT amino acid database
>     $program = "blastp";
>     $database = "swissprot";
> }
>
> # Create a sequence object from our amino acid sequence,
> # which we can then send to the remote BLAST
> my $seq_input = Bio::PrimarySeq->new ('-seq' =>
$seq_form);
>
> # Object holding the defined BLAST parameters
> my $remote_blast = Bio::Tools::Run::RemoteBlast->new(
>     '-prog' => $program,
>     '-data' => $database,
>     '-readmethod' => 'SearchIO'
> );
>
> # Launch the BLAST; its output will be collected in a new object
> my $blast_report = $remote_blast->submit_blast($seq_input);
>
> # Declare the output document type so that the browser knows it is a web page
> print "Content-type: text/html\n\n";
>
> # Receive the BLAST results from the NCBI queue
> # RID = Remote Blast ID (example: 1125048844-28013-78894277386.BLASTQ2)
> while ( my @rids = $remote_blast->each_rid ) {
>     foreach my $rid ( @rids ) {
>
>         # Object holding the status of the BLAST run
>         my $rc = $remote_blast->retrieve_blast($rid);
>
>         if( !ref($rc) ) {
>             if( $rc < 0 ) { # retrieve_blast returns -1 on error
>                 print "Existe algun problema con el RemoteBlast del NCBI.
\n"; > print "Por favor, inténtelo de nuevo mas tarde.
\n";
>                 $remote_blast->remove_rid($rid);
>             }
>             # retrieve_blast returns 0 when the BLAST has not finished yet,
>             # so we wait 5 more seconds
>             sleep 5;
>
>         } else {
>             # Once the run has finished, present the results
>             my $result = $rc->next_result();
>
>             # The results are presented using SearchIO and,
>             # in particular, its module for HTML formatting
>             my $writer = new Bio::SearchIO::Writer::HTMLResultWriter();
>             my $out = new Bio::SearchIO(-writer => $writer, -fh => \*STDOUT);
>             $out->write_result($result);
>
>             # Remove the result that has just been displayed
>             $remote_blast->remove_rid($rid);
>             $exito = 1;
>         }
>     }
> }
>
> # If there were no results, show an error message
> if (!$exito) {
>     print "Existe un problema con los parámetros introducidos.
\n"; > print "Por favor, regrese al formulario y revíselos.
\n";
> }
>
> exit;

From Kevin.M.Brown at asu.edu Thu Jan 18 15:23:04 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 18 Jan 2007 13:23:04 -0700
Subject: [Bioperl-l] Graphics help
Message-ID: <1A4207F8295607498283FE9E93B775B4029AFE8D@EX02.asurite.ad.asu.edu>

So, I'm almost done with this script I've been writing, and I'm
wondering whether there is a way to prevent the Key for a given track
from being chopped off when a graphic is created.

I created a graphics panel, and since these genomes are fairly long I
sliced it up into smaller sections, which I then combine to make one
map with each slice stacked below the previous one (like reading a
book). What I would love is for the features on the right-hand side of
any given panel not to get their labels cropped. I've attached an
example panel that shows what I'm looking at. The primer name at the
bottom, as well as the CDS names on the right of the panel, are
cropped, but the CDS names on the left aren't. Is there an option I can
feed to the panel to prevent this cropping?

Currently I create the panel with:

my $panel = Bio::Graphics::Panel->new(
    -length    => $length,
    -key_style => 'between',
    -width     => 768,
    -pad_left  => 10,
    -pad_right => 10,
    -offset    => $start,
);

-------------- next part --------------
A non-text attachment was scrubbed...
Name: slice.PNG
Type: image/png
Size: 9903 bytes
Desc: slice.PNG
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070118/a2ce8524/attachment.png

From crabtree at tigr.ORG Fri Jan 19 07:52:41 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Fri, 19 Jan 2007 07:52:41 -0500
Subject: [Bioperl-l] Graphics help
In-Reply-To: <1A4207F8295607498283FE9E93B775B4029AFE8D@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4029AFE8D@EX02.asurite.ad.asu.edu>
Message-ID: <45B0BF19.2050802@tigr.org>

Hi Kevin-

One simple workaround for this problem is to increase the value of
-pad_right (by an amount based on the font size and longest expected
label.)

Jonathan

Kevin Brown wrote:
> So, I'm almost done with this script I've been writing and wondering if
> there is a way that when a graphic is created if the Key for a given
> track could be prevented from being chopped off.
>
> I created a graphics panel and since the length of these genomes are
> fairly long I sliced it up into smaller sections which I then combine to
> make one map with each slice stacked below the previous (like reading a
> book). What I would love is if on any given panel that the features on
> the right hand side not get their labels cropped. I've attached an
> example panel that shows what I'm looking at. The primer name at the
> bottom as well as the CDS names on the right of the panel are cropped,
> but the CDS on the left aren't. Is there an option I can feed to the
> panel to prevent this cropping?
>
> Currently I create the panel with:
> my $panel =
> Bio::Graphics::Panel->new(
> -length => $length,
> -key_style => 'between',
> -width => 768,
> -pad_left => 10,
> -pad_right => 10,
> -offset => $start,
> );
>

From cristiangary at gmail.com Fri Jan 19 08:20:02 2007
From: cristiangary at gmail.com (Cristian Gary)
Date: Fri, 19 Jan 2007 10:20:02 -0300
Subject: [Bioperl-l] ncbi blastn problem.
Message-ID: <95ef8cd0701190520o31a9f7ban5ac07c35d2069477@mail.gmail.com>

I have a problem with the results of an NCBI blastn analysis run through
Bio::Tools::Run::RemoteBlast. It only happens with nucleotide-nucleotide
alignments; I don't have any problem with blastp.

NCBI returns "No significant similarity found.", but when I submit the
same FASTA file through the NCBI web page, it returns the correct
alignment. Any help?

This is an example that I found (I am still learning):

use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;

my $Seq_in = Bio::SeqIO->new (-file => 'fasta/prueba.fasta',
                              -format => 'fasta');
my $query = $Seq_in->next_seq();

my $factory = Bio::Tools::Run::RemoteBlast->new(
    '-prog' => 'blastn',
    '-data' => 'nr',
    _READMETHOD => "Blast",
);
my $blast_report = $factory->submit_blast($query);
my $max_number = 100;
my $trial = 0;

while ( my @rids = $factory->each_rid ) {

    print STDERR "\nSorry, maximum number of retries $max_number exceeded\n" if $trial >= $max_number;
    last if $trial >= $max_number;
    $trial++;

    print STDERR "waiting... ".(5*$trial)."
units of time\n" ;
    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # retrieve_blast returns -1 on error
                $factory->remove_rid($rid);
            }
            # retrieve_blast returns 0 on 'job not finished'
            sleep 5*$trial;
        } else {

            #---- Blast done ----

            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            print "database: ", $result->database_name(), "\n";
            print "letterzs: " , $result->database_letters(), "\n";
            print "entradas: ", $result->database_entries(),"\n";
            print "@rids" , "\n";

            while( my $hit = $result->next_hit ) {
                print "hit name is: ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "score is: ", $hsp->score, "\n";
                }
            }
        }
    }
}

FASTA:::

>Problema
cagctcacccgcgccgccagagaggggcgcattcgcagtatccccggttttggggagaaaaccgaagcgcgcatcctgga
agccctccaggcccagatcgccgccgttccccgttttcccatcgccgtcgccgccccgtatgccgctgccctggtccgct
atctgcagaacgtacccggtgtgcggcgggtggtggtggccggcagcttccgacgcggcagggatacggtgggcgacctg
gatatactggctacggccactgcagacagcccggtcatggaacgcttcaccgcctatgaggatgtggcggaagtgttct

--
"Knowledge belongs to humanity"

"Gnu/linux -------- free your mind......
www.kubuntu.org

From aperezp at uma.es Fri Jan 19 06:49:31 2007
From: aperezp at uma.es ("Antonio J. Pérez")
Date: Fri, 19 Jan 2007 12:49:31 +0100
Subject: [Bioperl-l] Bio::SearchIO::Writer::HTMLResultWriter problem
Message-ID: <45B0B04B.4070308@uma.es>

Hi everybody, I am new to this list.

I am trying to use a remoteBlast for a Bioinformatics course. I use the
Bio::SearchIO::Writer::HTMLResultWriter module but I obtain a plain
output without carriage returns. Could it be a bug, or is my code wrong?
I enclose a copy of my script, and the blast is implemented in:
http://jaguar.genetica.uma.es/blastUNIA/blastHTTP.html

Please, could you help me? Thanks in advance,

Antonio.

P.S.: I use the latest BioPerl version

--
Antonio J.
Pérez Pulido, PhD
Instituto Nacional de Bioinformática (INB)
Integrated Bioinformatics Node (GNV-5)
Facultad de Ciencias (Dpto. de Genética)
Campus Universitario de Teatinos
29071 Málaga (Spain)
http://www.ajperez.cjb.net / antoniojperez at uma.es
Tfno. +34 952 131 957
--------------------------------------------------------
Try the server for protein function assignment (AnaGram):
http://jaguar.genetica.uma.es/anagram.htm
--------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastHTTP.cgi
Type: application/x-cgi
Size: 3560 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070119/53c0ca29/attachment.bin

From y.itan at ucl.ac.uk Fri Jan 19 06:51:24 2007
From: y.itan at ucl.ac.uk (Yuval Itan)
Date: Fri, 19 Jan 2007 11:51:24 +0000
Subject: [Bioperl-l] ClustalW on Bioperl
Message-ID: <5c722add565877fee42133ebacb6ca0f@ucl.ac.uk>

Dear all,

I need to align a few hundred sequence pairs using ClustalW, with output
given in pir or gde format. I would appreciate any suggestion of how to
do that using Bioperl, and if Bioperl has parsing options for the
ClustalW output.

Thanks a lot,

Yuval

From marian.thieme at lycos.de Fri Jan 19 07:55:50 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 19 Jan 2007 12:55:50 +0000
Subject: [Bioperl-l] Model deletions
Message-ID: <188661178015417@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070119/906101ef/attachment.html

From cjfields at uiuc.edu Fri Jan 19 10:30:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 19 Jan 2007 09:30:51 -0600
Subject: [Bioperl-l] ncbi blastn problem.
In-Reply-To: <95ef8cd0701190520o31a9f7ban5ac07c35d2069477@mail.gmail.com> References: <95ef8cd0701190520o31a9f7ban5ac07c35d2069477@mail.gmail.com> Message-ID: Note these two lines in the BLAST output: Number of sequences better than 0.001: 0 Number of HSP's better than 0.001 without gapping: 0 The default expect values are set differently for BLAST web page submissions (10) vs the URLAPI version (0.001). Use the parameter '- expect' to change that: my $factory = Bio::Tools::Run::RemoteBlast->new( '-prog' => 'blastn', '-data' => 'nr', '-readmethod' => 'blast', '-expect' => 10 ); chris On Jan 19, 2007, at 7:20 AM, Cristian Gary wrote: > i have a problem with the result of the analisis in the ncbi > blastn with > Bio::Tools::Run::RemoteBlast that is only with de alignement of > nucleotide - > nucleotide , i dont have any problem with blastp. > > ncbi return , "No significant similarity found." but when i send > the same > fasta file with the ncbi webpage , return the correct alignement. > any help. > > " this is an example that i find .... and learning...." > > use Bio::Tools::Run::RemoteBlast; > use Bio::SeqIO; > > > my $Seq_in = Bio::SeqIO->new (-file => 'fasta/prueba.fasta', > -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > my $factory = Bio::Tools::Run::RemoteBlast->new( > '-prog' => 'blastn', > '-data' => 'nr', > _READMETHOD => "Blast", > > ); > my $blast_report = $factory->submit_blast($query); > my $max_number = 100; > my $trial = 0; > > > while ( my @rids = $factory->each_rid ) { > > print STDERR "\nSorry, maximum number of retries $max_number > exceeded\n" if > $trial >= $max_number; > last if $trial >= $max_number; > $trial++; > > print STDERR "waiting... ".(5*$trial)." 
units of time\n" ; > # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > # retrieve_blast returns -1 on error > $factory->remove_rid($rid); > } > # retrieve_blast returns 0 on 'job not finished' > sleep 5*$trial; > } else { > > #---- Blast done ---- > > $factory->remove_rid($rid); > my $result = $rc->next_result; > print "database: ", $result->database_name(), "\n"; > print "letterzs: " , $result->database_letters(), "\n"; > print "entradas: ", $result->database_entries(),"\n"; > print "@rids" , "\n"; > > while( my $hit = $result->next_hit ) { > print "hit name is: ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "score is: ", $hsp->score, "\n"; > } > } > } > } > } > > > FASTA::: > >> Problema > cagctcacccgcgccgccagagaggggcgcattcgcagtatccccggttttggggagaaaaccgaagcgc > gcatcctgga > agccctccaggcccagatcgccgccgttccccgttttcccatcgccgtcgccgccccgtatgccgctgcc > ctggtccgct > atctgcagaacgtacccggtgtgcggcgggtggtggtggccggcagcttccgacgcggcagggatacggt > gggcgacctg > gatatactggctacggccactgcagacagcccggtcatggaacgcttcaccgcctatgaggatgtggcgg > aagtgttct > > > > > -- > "El conocimiento le pertecene a la humanidad" > > "Gnu/linux -------- free your mind...... > www.kubuntu.org > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johan.viklund at gmail.com Fri Jan 19 13:20:20 2007 From: johan.viklund at gmail.com (Johan Viklund) Date: Fri, 19 Jan 2007 19:20:20 +0100 Subject: [Bioperl-l] Bio::Tree development Message-ID: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> Hi, I've started using Bio::Tree{,IO} and would very much like to update it (it was I who filed bug 2191 ). 
I have listed below the main issues I think should be addressed (there
might be more). I just thought I could get some comments/suggestions on
this, and if you're happy with it I could start working on it. It
shouldn't take very long.


== Bugs (or what I think are bugs) ==

* Spelling of Descendent.
I think old (misspelled) names should be retained for
backwards compatibility, but they should be aliases. Or the old methods
can be kept and delegate to the new ones with a warning (I've
seen this in other places of BioPerl).

* Unifying the newick writing between nexus.pm and newick.pm in TreeIO
I think it's a bit strange having two implementations for this. For
parsing newick strings there's only one (in newick.pm). Another nice
thing this would bring is the ability to sort the nodes in the nexus
output; right now this is only possible when writing newick files. There
might be other slight differences too (I haven't checked).

* Remove reverse_edge() from Node.pm
It calls the nonexistent delete_edge() (which should be roughly
equivalent to remove_Descendent()), and I believe that this is an old
helper function for the old reroot method (as I noted in the above
bug report).

* Move get_leaf_nodes() from TreeI.pm to NodeI.pm
Quite obvious: I often only want to do stuff on the leaf nodes below a
particular node in a tree, and this would be much clearer than having to
write
  grep { $_->is_Leaf } $node->get_all_Descendents;
all the time. Reimplement the method in TreeI.pm roughly like this:
  $self->get_root_node->get_leaf_nodes();


== Additions ==

* More Tests
In part to reflect any changes, and also to increase the coverage of
our tests.

* Better Kualitee

* Iterators
A couple of iterators or tree walk methods/classes for trees. This
comes in handy when one wants to annotate tree nodes in different ways.
As a bare minimum I would think pre-order, in-order and post-order
iterators should be implemented. This would also simplify the different
write_tree() methods I think.
What would the most bioperly way of implementing an Iterator be?

* Implement TreeIO/tgf.pm
Parser for the TreeGraph format.


== Some minor bugs ==

* Node->Id (minor bug)
For some reason the Id gets set to the bootstrap value for internal
nodes, which I find a bit annoying. I think that the internal_id would
be better.

* General code cleanup
Making sure everything is indented according to some standard. I've
seen previously that there doesn't seem to be any real standard for
what BioPerl code should look like. I would think that a lot of the
code would be much easier to understand if it was indented
properly. As it is now, the indentation depth changes between 2, 3 and
4 even within the same file.

* get_Descendents()
Undocumented, and it works; I thought it was each_Descendent()-like, but
it was an alias for get_all_Descendents(), which is highly confusing. It
should at least be documented; maybe it's an old remnant...

* Naming conventions in BioPerl
What are they? Sometimes methods look_like_this() and sometimes they
look_like_This(). What's the general rule for when to use capital
letters at the beginning of a word (in Bio::Seq there's even a
get_SeqFeatures() )? It seems like there are capital letters in a name
when there's another BioPerl class/object involved, but I'm not sure
(is_Leaf in Node.pm doesn't follow this).

--
Johan Viklund
PhD Student
Molecular Evolution
EBC, Uppsala University
Norbyvägen 18C
SE-752 36 Uppsala
Sweden
phone +46(0)18-471 64 03

From adl91 at msstate.edu Fri Jan 19 14:12:27 2007
From: adl91 at msstate.edu (Andy Lindeman)
Date: Fri, 19 Jan 2007 13:12:27 -0600
Subject: [Bioperl-l] Parsing CDD Results?
Message-ID: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com>

Hi all--

Has anyone figured out a good way to parse the results from NCBI's CDD
database (i.e., http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml )?

Thanks much.
--A From jason at bioperl.org Fri Jan 19 14:24:51 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 19 Jan 2007 11:24:51 -0800 Subject: [Bioperl-l] Parsing CDD Results? In-Reply-To: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> Message-ID: <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> if you run it locally isn't it just RPSBLAST which *used* to be parseable by Bio::SearchIO::blast - I am not sure now. or are you talking about running it via web? On Jan 19, 2007, at 11:12 AM, Andy Lindeman wrote: > Hi all-- > > Has anyone figured a good way to parse the results from NCBI's CDD > database (i.e., http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml )? > > Thanks much. > > --A > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From bix at sendu.me.uk Fri Jan 19 14:18:17 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 19 Jan 2007 19:18:17 +0000 Subject: [Bioperl-l] Bio::Tree development In-Reply-To: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> References: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> Message-ID: <45B11979.30008@sendu.me.uk> Johan Viklund wrote: > Hi, > > I've started using Bio::Tree{,IO} and would very much like to update > it (it was I who filed bug 2191 > ). > > I have listed below the main issues I think should be addressed, > (there might be more). I just thought I could get some > comments/suggestions on this, and if you're happy with this I could > start working on it. Shouldn't take so long time. > > > == Bugs (or what I think is bugs) == > > * Spelling of Descendent. 
> I think old (misspelled) names should be reatained for > backwards-compability, but they should be aliases. Or the old methods > can be kept and they delegate to the new ones with a warning (I've > seen this in other places of BioPerl). This came up before; its an alternate spelling found in dictionaries, so isn't a bug. But by all means add aliases for the more common spelling. > == Some minor bugs == > > * Node->Id (minor bug) > For some reason the Id gets set to the bootstrap value for internal > nodes, I find this a bit annoying. I think that the internal_id would > be better. Where does this come up? If only with certain TreeIO input formats, that's probably because the id field was abused by making it the bootstrap and/or Bioperl has no way of distinguishing the bootstrap from the id. > * General code cleanup > Making sure everything is indented according to some standard. I've > seen previously that there doesn't seem to be any real standard for > how BioPerl code should look like. I would think that it would be a > lot clearer to understand lots of the code if it was indented > properly. As it is now, the indentation depth changes between 2,3 and > 4 within the same file even. My own personal preference is an indent of 4 spaces. > * get_Descendents() > Undocumented and works, I thought it was each_Descendent()-like, but > it was an alias for get_all_Descendents(), highly confusing. Should at > least be documented, maybe it's an old remnant... Please document it :) > * Naming convensions in BioPerl > What are they, sometimes methods look_like_this() ans sometimes they > look_like_This(), what's the general rule for when to use capital > letters in the beginning of a word (in Bio::Seq there's even a > get_SeqFeatures() )? It seems like there are capital letters in a name > when there's another BioPerl class/object involved, but I'm not sure > (is_Leaf in Node.pm doesn't follow this). 
I'd say the most common convention is all lower-case, underscores between words. I was actually thinking of having is_leaf() a synonym of is_Leaf, because the latter is really annoying. Anyway, other than the points above I'd say all your ideas sound great; code away! From cjfields at uiuc.edu Fri Jan 19 14:28:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 19 Jan 2007 13:28:43 -0600 Subject: [Bioperl-l] Parsing CDD Results? In-Reply-To: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> Message-ID: <0D9087AB-CC32-4422-93F6-00D8B81F99B8@uiuc.edu> AFAIK there isn't. I don't think CDD is accessible via eutils (though I haven't tried it personally). If there is text/XML output for the sequences in CDD then one could possibly write up an AlignIO parser... chris On Jan 19, 2007, at 1:12 PM, Andy Lindeman wrote: > Hi all-- > > Has anyone figured a good way to parse the results from NCBI's CDD > database (i.e., http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml )? > > Thanks much. > > --A > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Jan 19 15:16:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 19 Jan 2007 14:16:07 -0600 Subject: [Bioperl-l] Bio::Tree development In-Reply-To: <45B11979.30008@sendu.me.uk> References: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> <45B11979.30008@sendu.me.uk> Message-ID: <151FC25E-28A0-4FF6-A25E-45F6D975E029@uiuc.edu> On Jan 19, 2007, at 1:18 PM, Sendu Bala wrote: ... > >> * General code cleanup >> Making sure everything is indented according to some standard. 
I've >> seen previously that there doesn't seem to be any real standard for >> how BioPerl code should look like. I would think that it would be a >> lot clearer to understand lots of the code if it was indented >> properly. As it is now, the indentation depth changes between 2,3 and >> 4 within the same file even. > > My own personal preference is an indent of 4 spaces. ... Mine as well. Some people also use tabs vs spaces (I prefer spaces). However, I think (Johan) you'll be hard-pressed to force a standard there. You're always welcome to run it through perltidy prior to commits! chris From MEC at stowers-institute.org Fri Jan 19 15:30:54 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 19 Jan 2007 14:30:54 -0600 Subject: [Bioperl-l] Bio/SeqFeature/Annotated proposed patch Message-ID: I ran across this problem: Setting the score of a feature to 0 (zero) cuases it to really be set to '.'. I'm poised to apply the following patch. Any objections? Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri Index: Annotated.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v retrieving revision 1.36 diff -c -r1.36 Annotated.pm *** Annotated.pm 16 Oct 2006 16:20:38 -0000 1.36 --- Annotated.pm 19 Jan 2007 20:25:01 -0000 *************** *** 425,432 **** $self->add_Annotation('score', $term); } ! $self->score('.') unless ($self->get_Annotations('score')); # make sure we always have something ! return $self->get_Annotations('score'); } --- 425,439 ---- $self->add_Annotation('score', $term); } ! #$self->score('.') unless ($self->get_Annotations('score')); # make sure we always have something ! ! # malcolm.cook at stowers-institute.org is not sure why we want to ! # 'make sure we always have something', but, in any case, the above ! # sets the score to '.' 
when there is an explicit score of 0, which ! # can't be correct, so, re-writing to use 'has_tag' as follows: ! ! $self->score('.') unless $self->has_tag('score'); # make sure we always have something ! return $self->get_Annotations('score'); } From MEC at stowers-institute.org Fri Jan 19 15:39:24 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 19 Jan 2007 14:39:24 -0600 Subject: [Bioperl-l] proposed patch to Bio/Annotation/SimpleValue.pm Message-ID: I got an error upon incrementing the score of a Bio::SeqFeature::Annotated Operation "+": no method found, left argument in overloaded package Bio::Annotation::SimpleValue, right argument has no overloaded magic at ./myscript line xxxx. It turns out that Bio/Annotation/SimpleValue overloaded "" and eq but did not provide a fallback for other operators. The following patch should fix that. Any Objections? Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri Index: SimpleValue.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Annotation/SimpleValue.pm,v retrieving revision 1.21 diff -c -r1.21 SimpleValue.pm *** SimpleValue.pm 26 Sep 2006 22:03:05 -0000 1.21 --- SimpleValue.pm 19 Jan 2007 20:34:45 -0000 *************** *** 62,69 **** package Bio::Annotation::SimpleValue; use strict; ! use overload '""' => sub { $_[0]->value}; ! use overload 'eq' => sub { "$_[0]" eq "$_[1]" }; # Object preamble - inherits from Bio::Root::Root --- 62,72 ---- package Bio::Annotation::SimpleValue; use strict; ! ! use overload ! '""' => sub { $_[0]->value}, ! 'eq' => sub { "$_[0]" eq "$_[1]" }, ! 
fallback => 1; # Object preamble - inherits from Bio::Root::Root From cjfields at uiuc.edu Fri Jan 19 15:57:34 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 19 Jan 2007 14:57:34 -0600 Subject: [Bioperl-l] Bio/SeqFeature/Annotated proposed patch In-Reply-To: References: Message-ID: <7B0A4DC3-E3BC-46CE-99D9-A8341CD39A00@uiuc.edu> I don't have a problem with it, though I wonder if the '.' has something to do with GFF formatting. Does this pass tests? Chris On Jan 19, 2007, at 2:30 PM, Cook, Malcolm wrote: > I ran across this problem: > > Setting the score of a feature to 0 (zero) cuases it to really be > set to > '.'. > > I'm poised to apply the following patch. > > Any objections? > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > > > Index: Annotated.pm > =================================================================== > RCS file: > /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm,v > retrieving revision 1.36 > diff -c -r1.36 Annotated.pm > *** Annotated.pm 16 Oct 2006 16:20:38 -0000 1.36 > --- Annotated.pm 19 Jan 2007 20:25:01 -0000 > *************** > *** 425,432 **** > $self->add_Annotation('score', $term); > } > > ! $self->score('.') unless ($self->get_Annotations('score')); # make > sure we always have something > ! > return $self->get_Annotations('score'); > } > > --- 425,439 ---- > $self->add_Annotation('score', $term); > } > > ! #$self->score('.') unless ($self->get_Annotations('score')); # > make > sure we always have something > ! > ! # malcolm.cook at stowers-institute.org is not sure why we want to > ! # 'make sure we always have something', but, in any case, the > above > ! # sets the score to '.' when there is an explicit score of 0, > which > ! # can't be correct, so, re-writing to use 'has_tag' as follows: > ! > ! $self->score('.') unless $self->has_tag('score'); # make sure we > always have something > ! 
> return $self->get_Annotations('score'); > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From luciap at sas.upenn.edu Fri Jan 19 15:53:11 2007 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Fri, 19 Jan 2007 15:53:11 -0500 Subject: [Bioperl-l] Bio::Tree development In-Reply-To: <151FC25E-28A0-4FF6-A25E-45F6D975E029@uiuc.edu> References: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> <45B11979.30008@sendu.me.uk> <151FC25E-28A0-4FF6-A25E-45F6D975E029@uiuc.edu> Message-ID: <1169239991.45b12fb71b845@webmail.sas.upenn.edu> Hi One of the functions missing from TreeIO that will be good to implement is a collpase bootstrap function, meaning to collapse all nodes bellow a certain bootstrap value I do have a bit of code to do this, however it cannot be generalized until the bootstrap value is consistently captured by the same function, so far, for the trees I have at least, the internal ID is the bootstrap value, but that may not be true for all trees.Or it can be written asuming that you already set the boostrap to the right value. The function bootstrap in Bio::Tree::NodeI never gets the bootstrap right in any tree I've used (always newick format I am talking) just an idea, Lucia Quoting Chris Fields : > > On Jan 19, 2007, at 1:18 PM, Sendu Bala wrote: > ... > > > >> * General code cleanup > >> Making sure everything is indented according to some standard. I've > >> seen previously that there doesn't seem to be any real standard for > >> how BioPerl code should look like. I would think that it would be a > >> lot clearer to understand lots of the code if it was indented > >> properly. As it is now, the indentation depth changes between 2,3 and > >> 4 within the same file even. 
> > > > My own personal preference is an indent of 4 spaces. > ... > > Mine as well. Some people also use tabs vs spaces (I prefer > spaces). However, I think (Johan) you'll be hard-pressed to force a > standard there. > > You're always welcome to run it through perltidy prior to commits! > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > Lucia Peixoto Department of Biology,SAS University of Pennsylvania From bix at sendu.me.uk Fri Jan 19 16:06:43 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 19 Jan 2007 21:06:43 +0000 Subject: [Bioperl-l] Bio/SeqFeature/Annotated proposed patch In-Reply-To: References: Message-ID: <45B132E3.9000601@sendu.me.uk> Cook, Malcolm wrote: > I ran across this problem: > > Setting the score of a feature to 0 (zero) cuases it to really be set to > '.'. > > I'm poised to apply the following patch. > > Any objections? > ! $self->score('.') unless ($self->get_Annotations('score')); # make > sure we always have something vs > ! $self->score('.') unless $self->has_tag('score'); # make sure we > always have something I didn't look into how this is setup, but could something have a score tag without the score being defined? I'd have thought it safest to call $self->get_Annotations('score') and check if the answer was defined. 
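
A minimal sketch of the concern, in plain Perl rather than the Annotated.pm code itself. It assumes the problem is simply that an explicit score of 0 evaluates as false; the helper names score_or_dot_truthy and score_or_dot_defined are hypothetical and exist only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: falls back to '.' whenever the score is false,
# which wrongly includes an explicit 0.
sub score_or_dot_truthy {
    my ($score) = @_;
    return $score ? $score : '.';
}

# Hypothetical helper: falls back to '.' only when no score was given.
sub score_or_dot_defined {
    my ($score) = @_;
    return defined $score ? $score : '.';
}

# Compare both checks for a normal score, a zero score and no score.
for my $score (12, 0, undef) {
    my $shown = defined $score ? $score : 'undef';
    printf "%-5s truthy=%s defined=%s\n",
        $shown, score_or_dot_truthy($score), score_or_dot_defined($score);
}
```

Only the 0 row differs between the two checks, which is the case the proposed patch is meant to handle.
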
So really the solution would seem to be: $self->score('.') unless @{[$self->get_Annotations('score')]}; (or similar) From cjfields at uiuc.edu Fri Jan 19 15:55:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 19 Jan 2007 14:55:21 -0600 Subject: [Bioperl-l] Bio::Tree development In-Reply-To: <1169239991.45b12fb71b845@webmail.sas.upenn.edu> References: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> <45B11979.30008@sendu.me.uk> <151FC25E-28A0-4FF6-A25E-45F6D975E029@uiuc.edu> <1169239991.45b12fb71b845@webmail.sas.upenn.edu> Message-ID: <1295F2B3-3126-4038-B4AF-24CE334DAAFE@uiuc.edu> You should always file any potential problems as a bug, otherwise we'll never know about it. http://www.bioperl.org/wiki/Bugs Relevant code and data demonstrate the problem also helps; submit them as attachments after filing a bug report: http://bugzilla.open-bio.org/ chris On Jan 19, 2007, at 2:53 PM, Lucia Peixoto wrote: > Hi > > One of the functions missing from TreeIO that will be good to > implement is a > collpase bootstrap function, meaning to collapse all nodes bellow a > certain > bootstrap value > > I do have a bit of code to do this, however it cannot be > generalized until the > bootstrap value is consistently captured by the same function, so > far, for the > trees I have at least, the internal ID is the bootstrap value, but > that may not > be true for all trees.Or it can be written asuming that you already > set the > boostrap to the right value. > > The function bootstrap in Bio::Tree::NodeI never gets the bootstrap > right in any > tree I've used (always newick format I am talking) > > just an idea, > > Lucia > > > > Quoting Chris Fields : > >> >> On Jan 19, 2007, at 1:18 PM, Sendu Bala wrote: >> ... >>> >>>> * General code cleanup >>>> Making sure everything is indented according to some standard. I've >>>> seen previously that there doesn't seem to be any real standard for >>>> how BioPerl code should look like. 
I would think that it would be a >>>> lot clearer to understand lots of the code if it was indented >>>> properly. As it is now, the indentation depth changes between >>>> 2,3 and >>>> 4 within the same file even. >>> >>> My own personal preference is an indent of 4 spaces. >> ... >> >> Mine as well. Some people also use tabs vs spaces (I prefer >> spaces). However, I think (Johan) you'll be hard-pressed to force a >> standard there. >> >> You're always welcome to run it through perltidy prior to commits! >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Fri Jan 19 16:34:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 19 Jan 2007 13:34:29 -0800 Subject: [Bioperl-l] Bio::Tree development In-Reply-To: <1169239991.45b12fb71b845@webmail.sas.upenn.edu> References: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> <45B11979.30008@sendu.me.uk> <151FC25E-28A0-4FF6-A25E-45F6D975E029@uiuc.edu> <1169239991.45b12fb71b845@webmail.sas.upenn.edu> Message-ID: <654E4463-BD37-4049-A032-4DF1295659F3@bioperl.org> Just assume the bootstrap value will be available from bootstrap() and please send in the function for collapsing by bootstrap. I never finished consensus_tree building from bootstrap_replicates like I wanted to, but this is a nice logical addition to the package. 
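
A minimal sketch of what collapsing by bootstrap could look like, under stated assumptions: it works on a toy tree of plain Perl hashes, not on Bio::Tree objects, and the function name collapse_below and the hash keys (id, bootstrap, children) are hypothetical. A real patch would read the support value from bootstrap() on each node, as suggested above.

```perl
use strict;
use warnings;

# Toy node layout (an assumption for this sketch):
#   { id => ..., bootstrap => ..., children => [ ... ] }
# Splice the children of every weakly supported internal child into
# its parent, working bottom-up.
sub collapse_below {
    my ($node, $cutoff) = @_;
    my @kept;
    for my $child (@{ $node->{children} || [] }) {
        collapse_below($child, $cutoff);    # collapse the subtree first
        my $is_internal = @{ $child->{children} || [] };
        my $support     = $child->{bootstrap};
        if ($is_internal && defined $support && $support < $cutoff) {
            push @kept, @{ $child->{children} };    # drop the weak node
        }
        else {
            push @kept, $child;
        }
    }
    $node->{children} = \@kept;
    return $node;
}

my $tree = {
    id       => 'root',
    children => [
        { id => 'A' },
        {
            bootstrap => 40,
            children  => [
                { id => 'B' },
                { bootstrap => 95, children => [ { id => 'C' }, { id => 'D' } ] },
            ],
        },
    ],
};

collapse_below($tree, 70);

# The node with support 40 is gone; its children now hang off the root.
print join(',', map { $_->{id} // 'internal' } @{ $tree->{children} }), "\n";
```

Nodes without a bootstrap value are left alone here, since a missing value is not the same as low support.
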
On Jan 19, 2007, at 12:53 PM, Lucia Peixoto wrote:

> Hi
>
> One of the functions missing from TreeIO that would be good to
> implement is a collapse-bootstrap function, meaning one that collapses
> all nodes below a certain bootstrap value.
>
> I do have a bit of code to do this; however, it cannot be generalized
> until the bootstrap value is consistently captured by the same
> function. So far, for the trees I have at least, the internal ID is
> the bootstrap value, but that may not be true for all trees. Or it can
> be written assuming that you have already set the bootstrap to the
> right value.
>
> The function bootstrap in Bio::Tree::NodeI never gets the bootstrap
> right in any tree I've used (I am always talking about newick format).
>

This has been discussed multiple times on the list. I don't know if
I've posted the answer on the wiki or not, but it should go in the FAQ
and/or the Tree-relevant pages.

Please understand that this is a limitation of the NEWICK FORMAT.
Everyone complains about it, but I think you are complaining to the
wrong people. We don't assume the internal node IDs are bootstrap
values (often they are not). Some formats use "ID[BOOTSTRAP]", which we
parse into node id and bootstrap.

If you know that the internal ids represent the bootstrap, you can
easily move the data over:

  for my $node ( grep { ! $_->is_Leaf } $tree->get_nodes ) {
      $node->bootstrap($node->id);
      $node->id('');
  }

If this is sufficiently annoying to you, write a function and submit it
as a patch after it has been tested. Here is an example:

  sub migrate_id_to_bootstrap {
      my $tree = shift;
      for my $node ( grep { ! $_->is_Leaf } $tree->get_nodes ) {
          $node->bootstrap($node->id);
          $node->id('');
      }
  }

-jason

> just an idea,
>
> Lucia
>
>
> Quoting Chris Fields :
>
>> On Jan 19, 2007, at 1:18 PM, Sendu Bala wrote:
>> ...
>>>
>>>> * General code cleanup
>>>> Making sure everything is indented according to some standard.
I've >>>> seen previously that there doesn't seem to be any real standard for >>>> how BioPerl code should look like. I would think that it would be a >>>> lot clearer to understand lots of the code if it was indented >>>> properly. As it is now, the indentation depth changes between >>>> 2,3 and >>>> 4 within the same file even. >>> >>> My own personal preference is an indent of 4 spaces. >> ... >> >> Mine as well. Some people also use tabs vs spaces (I prefer >> spaces). However, I think (Johan) you'll be hard-pressed to force a >> standard there. >> >> You're always welcome to run it through perltidy prior to commits! >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From MEC at stowers-institute.org Fri Jan 19 18:07:31 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 19 Jan 2007 17:07:31 -0600 Subject: [Bioperl-l] Bio/SeqFeature/Annotated proposed patch Message-ID: Sendu & Chris, Thanks for vetting this change. My mistake for mixing 'Annotations' with deprecated 'tags'. 
Regardless, it does pass SeqFeature.t, including these new tests I just
added, the last of which fails without the patch:

  is $sfa3->score(), 12;
  $sfa3->score(11);
  is $sfa3->score(), 11;
  $sfa3->score(0);
  is $sfa3->score(), 0;

...as confirmed via `./Build test --test_files t/SeqFeature.t --verbose`

However, Sendu's improved implementation passes them too, as does
testing for `defined`, as is done elsewhere in this module, viz:

  $self->score('.') unless (defined($self->get_Annotations('score'))); # make sure we always have something

So, I'm going with the test for 'defined', since this approach is what
is used elsewhere in this module.

Thanks for your eyes and minds....

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, January 19, 2007 3:07 PM
> To: Cook, Malcolm
> Cc: bioperl list; allenday at ucla.edu; allenday at cpan.org
> Subject: Re: [Bioperl-l] Bio/SeqFeature/Annotated proposed patch
>
> Cook, Malcolm wrote:
> > I ran across this problem:
> >
> > Setting the score of a feature to 0 (zero) causes it to really be
> > set to '.'.
> >
> > I'm poised to apply the following patch.
> >
> > Any objections?
>
> > ! $self->score('.') unless ($self->get_Annotations('score')); # make sure we always have something
>
> vs
>
> > ! $self->score('.') unless $self->has_tag('score'); # make sure we always have something
>
> I didn't look into how this is set up, but could something have a
> score tag without the score being defined? I'd have thought it safest
> to call $self->get_Annotations('score') and check if the answer was
> defined.
> > So really the solution would seem to be: > > $self->score('.') unless @{[$self->get_Annotations('score')]}; > (or similar) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Fri Jan 19 21:00:27 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 19 Jan 2007 18:00:27 -0800 Subject: [Bioperl-l] wikipedia-ers Message-ID: Anyone wikipedia gurus want to spend some time updating the article on BioPerl? http://en.wikipedia.org/wiki/BioPerl Apparently it needs some citations and references other than self- referential ones and/or it needs to have proper citations. -jason -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From hlapp at gmx.net Sat Jan 20 00:35:49 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 20 Jan 2007 00:35:49 -0500 Subject: [Bioperl-l] Bio::Tree development In-Reply-To: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> References: <5e924f0a0701191020od11b4eax3b77bb67e05cc75d@mail.gmail.com> Message-ID: <47A687BC-7286-4032-A823-9171CE440BD5@gmx.net> On Jan 19, 2007, at 1:20 PM, Johan Viklund wrote: > * Naming convensions in BioPerl > What are they, sometimes methods look_like_this() ans sometimes they > look_like_This(), what's the general rule for when to use capital > letters in the beginning of a word (in Bio::Seq there's even a > get_SeqFeatures() )? It seems like there are capital letters in a name > when there's another BioPerl class/object involved, but I'm not sure > (is_Leaf in Node.pm doesn't follow this). The convention so far has been to capitalize if the returned object, or element of the returned array, is a BioPerl object. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From roy at colibase.bham.ac.uk Thu Jan 18 09:25:47 2007 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Thu, 18 Jan 2007 14:25:47 +0000 Subject: [Bioperl-l] Translate sequences align and reverse translate In-Reply-To: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk> References: <69E2D2428BD6C2429B8944FEC53B1EB91263D5@colhpaexc004.HPA.org.uk> Message-ID: <45AF836B.60909@colibase.bham.ac.uk> > I would like to automate the process of taking coding DNA sequences, > translating them to amino acids, aligning them with clustalw and then > reverse-translating back to DNA so as to obtain the best alignment. > I can think of how I would do this with bioperl but since it is > probably a common process I would like to ask if anyone already has > such a script that they wouldn't mind sharing so I don't re-invent > the wheel. align_on_codons.pl in the examples/align directory looks like it does what you want (I've never used it, though). Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk From bix at sendu.me.uk Sat Jan 20 05:45:06 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 20 Jan 2007 10:45:06 +0000 Subject: [Bioperl-l] wikipedia-ers In-Reply-To: References: Message-ID: <45B1F2B2.4030908@sendu.me.uk> Jason Stajich wrote: > Anyone wikipedia gurus want to spend some time updating the article > on BioPerl? > http://en.wikipedia.org/wiki/BioPerl > > Apparently it needs some citations and references other than self- > referential ones and/or it needs to have proper citations. I've added a bunch of refs that hopefully satisfy the criteria. 
However I don't know if I'm supposed to delete the {{Unreferenced|date=January 2007}} and {{Notability|date=January 2007}} tags myself, or if mysterious 'editor' will notice the changes and get rid of them for me. From bix at sendu.me.uk Sat Jan 20 09:59:53 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 20 Jan 2007 14:59:53 +0000 Subject: [Bioperl-l] wikipedia-ers In-Reply-To: <45B22BD7.3080802@campus.iztacala.unam.mx> References: <45B1F2B2.4030908@sendu.me.uk> <45B22BD7.3080802@campus.iztacala.unam.mx> Message-ID: <45B22E69.9010406@sendu.me.uk> Mauricio Herrera Cuadra wrote: > Good job Sendu! I say you can remove the {{Unreferenced|date=January > 2007}} tag as long as most of the current content in the article has > proper citations. As for the {{Notability|date=January 2007}} tag, it > must remain there in order to encourage article growth. The same applies > for the {{compu-soft-stub}} and {{bioinformatics-stub}} tags at the > bottom of it. I agree on the stubs, but the notability tag remaining to encourage article growth? I don't think so: it has a specific meaning and no longer applies after my changes - BioPerl has been shown to be 'notable' thanks to the references I supplied. Indeed, if the tag remains the whole article can be deleted, which hardly encourages growth! In any case, I left a message for the person who originally added the tags, so hopefully he'll remove them. From arareko at campus.iztacala.unam.mx Sat Jan 20 09:48:55 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 20 Jan 2007 08:48:55 -0600 Subject: [Bioperl-l] wikipedia-ers In-Reply-To: <45B1F2B2.4030908@sendu.me.uk> References: <45B1F2B2.4030908@sendu.me.uk> Message-ID: <45B22BD7.3080802@campus.iztacala.unam.mx> Good job Sendu! I say you can remove the {{Unreferenced|date=January 2007}} tag as long as most of the current content in the article has proper citations. 
As for the {{Notability|date=January 2007}} tag, it must remain there in
order to encourage article growth. The same applies for the
{{compu-soft-stub}} and {{bioinformatics-stub}} tags at the bottom of
it.

Cheers,
Mauricio.

Sendu Bala wrote:
> Jason Stajich wrote:
>> Anyone wikipedia gurus want to spend some time updating the article
>> on BioPerl?
>> http://en.wikipedia.org/wiki/BioPerl
>>
>> Apparently it needs some citations and references other than self-
>> referential ones and/or it needs to have proper citations.
>
> I've added a bunch of refs that hopefully satisfy the criteria. However
> I don't know if I'm supposed to delete the
> {{Unreferenced|date=January 2007}}
> and
> {{Notability|date=January 2007}}
> tags myself, or if mysterious 'editor' will notice the changes and get
> rid of them for me.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From bpederse at gmail.com  Sat Jan 20 13:37:25 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Sat, 20 Jan 2007 10:37:25 -0800
Subject: [Bioperl-l] constant Graphics height
Message-ID:

hi,
i'm trying to find my way through all the bioperl / gbrowse docs.
can someone point me to an example where the image is kept at a constant
height, regardless of the number of tracks, or the overlapping of
features?
is this possible with gbrowse_img?
thanks for any examples or pointers.
-brent From cjfields at uiuc.edu Sat Jan 20 18:51:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 20 Jan 2007 17:51:00 -0600 Subject: [Bioperl-l] wikipedia-ers In-Reply-To: <45B22E69.9010406@sendu.me.uk> References: <45B1F2B2.4030908@sendu.me.uk> <45B22BD7.3080802@campus.iztacala.unam.mx> <45B22E69.9010406@sendu.me.uk> Message-ID: On Jan 20, 2007, at 8:59 AM, Sendu Bala wrote: > Mauricio Herrera Cuadra wrote: >> Good job Sendu! I say you can remove the {{Unreferenced|date=January >> 2007}} tag as long as most of the current content in the article has >> proper citations. As for the {{Notability|date=January 2007}} tag, it >> must remain there in order to encourage article growth. The same >> applies >> for the {{compu-soft-stub}} and {{bioinformatics-stub}} tags at the >> bottom of it. > > I agree on the stubs, but the notability tag remaining to encourage > article growth? I don't think so: it has a specific meaning and no > longer applies after my changes - BioPerl has been shown to be > 'notable' > thanks to the references I supplied. Indeed, if the tag remains the > whole article can be deleted, which hardly encourages growth! > > In any case, I left a message for the person who originally added the > tags, so hopefully he'll remove them. I think you have definitely filled both requirements. We could add a link to the BioPerl publications page as well; hard to argue with close to 500 refs! Regardless, the Wikipedia admin who added the tags doesn't seem to think either requirement is fulfilled yet (though it remains open): http://en.wikipedia.org/wiki/User_talk:Wickethewok#BioPerl chris From bosborne11 at verizon.net Sat Jan 20 21:46:42 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 20 Jan 2007 21:46:42 -0500 Subject: [Bioperl-l] human_small-01.xml file In-Reply-To: <45AF4943.1070709@cs.man.ac.uk> Message-ID: Mikel and Ramona, I took at look at the human-small-01.xml file that Mikel sent me, from Intact. 
The problem is that the PSI MI parser expects to find a field called proteinInteractorRef for every protein in every file it sees. The file you sent me has no such field, it looks like they use the field interactorRef instead. Other files from IntAct, like a slightly older 2.5 file bovine_small.xml that's used in the test suite, do have the proteinInteractorRef field. I'll need to take a closer look. Brian O. On 1/18/07 5:17 AM, "Mikel Ega?a Aranguren" wrote: > Hi, > > gzipped and attached the xml file > > thanks > > Mikel > > Brian Osborne(e)k dio: >> Mikel, >> >> Please send me human_small-01.xml, I'll take a look. >> >> Brian O. >> >> >> On 1/17/07 11:37 AM, "Mikel Ega?a Aranguren" >> wrote: >> >> >>> Hello everyone; >>> >>> I get exactly the same error when parsing the intact file from >>> ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/human_small-0 >>> 1. >>> xml >>> >>> and I was about to send an email; help would be much appreciated. >>> >>> thanks a lot >>> >>> Mikel >>> >>> >>> magnusgeist(e)k dio: >>> >>>> dear all, >>>> >>>> trying to read files in psi 2.5 format from intact like this: >>>> >>>> my $io = Bio::Network::IO->new(-format => 'psi', >>>> -source => 'intact', >>>> -file => >>>> 'human_small-07.xml'); >>>> >>>> my $graph = $io->next_network; >>>> >>>> returns the following error: Can't call method "att" on an undefined value >>>> at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. >>>> >>>> doing the same with files from dip: >>>> >>>> my $io = Bio::Network::IO->new( -format => 'psi', >>>> -file => 'Hsapi20070107.mif'); >>>> >>>> my $graph = $io->next_network; >>>> >>>> does not result in any problems. >>>> >>>> would be great if one of you could help! >>>> thank you very much in advance. 
>>>> magnusgeist >>>> >>>> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From mikel.eganaaranguren at cs.man.ac.uk Sun Jan 21 10:26:43 2007 From: mikel.eganaaranguren at cs.man.ac.uk (=?ISO-8859-1?Q?Mikel_Ega=F1a_Aranguren?=) Date: Sun, 21 Jan 2007 15:26:43 +0000 Subject: [Bioperl-l] Another error reading psi 1 file from intact using bioperl-network-1.5.2_100 In-Reply-To: <45AE50CF.40805@cs.man.ac.uk> References: <8411196.post@talk.nabble.com> <45AE50CF.40805@cs.man.ac.uk> Message-ID: <45B38633.2000600@cs.man.ac.uk> Hello; I'm trying to get some annotations from the interactions of an intact file (in psi 1, otherwise won't parse as it has already been commented in this list). The file I'm using is: ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi1/species/schpo_small.xml I'm interested in the name of each interaction (e.g. spc7_mal2_1), and I try to get it doing the following: for my $interaction ($network->interactions){ my $ac = $interaction->annotation; print "--- ANNOTATIONS\n"; print $ac->get_num_of_annotations(),"\n"; foreach my $key ( $ac->get_all_annotation_keys() ) { my @values = $ac->get_Annotations($key); foreach my $value ( @values ) { print "Annotation ",$key," stringified value ",$value->as_text,"\n"; } } And the number of annotations is 0, even though I'm sure it works as I can access the nodes (the proteins). Is this a bug or am I doing something wrong? thanks a lot regards Mikel Ega?a Aranguren(e)k dio: > Hello everyone; > > I get exactly the same error when parsing the intact file from > ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/human_small-01.xml > > and I was about to send an email; help would be much appreciated. 
> > thanks a lot > > Mikel > > > magnusgeist(e)k dio: > >> dear all, >> >> trying to read files in psi 2.5 format from intact like this: >> >> my $io = Bio::Network::IO->new(-format => 'psi', >> -source => 'intact', >> -file => >> 'human_small-07.xml'); >> >> my $graph = $io->next_network; >> >> returns the following error: Can't call method "att" on an undefined value >> at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. >> >> doing the same with files from dip: >> >> my $io = Bio::Network::IO->new( -format => 'psi', >> -file => 'Hsapi20070107.mif'); >> >> my $graph = $io->next_network; >> >> does not result in any problems. >> >> would be great if one of you could help! >> thank you very much in advance. >> magnusgeist >> >> > > > -- Mikel Ega?a Aranguren - http://www.mikeleganaaranguren.com PhD student - Manchester University Computer Science Cell Cycle Ontology http://www.cellcycleontology.org Gene Ontology Next Generation http://www.gong.manchester.ac.uk Metabolik BioHacklab http://www.sindominio.net/metabolik/weblog X-Evian http://x-evian.org/ From mikel.eganaaranguren at cs.man.ac.uk Sun Jan 21 11:52:53 2007 From: mikel.eganaaranguren at cs.man.ac.uk (=?ISO-8859-1?Q?Mikel_Ega=F1a_Aranguren?=) Date: Sun, 21 Jan 2007 16:52:53 +0000 Subject: [Bioperl-l] bioperl-network-1.5.2_100 node IDs Message-ID: <45B39A65.8010003@cs.man.ac.uk> Hello; is there a way of fine-tuning the types of ids that are stored in the Node objects, in the Bio::Network module? For example, if I have the following in an Intact psi xml file: How can I retrieve the id "mis12_schpo"? At the moment it just gives me "Q9Y738". Thanks a lot. 
Regards

--
Mikel Egaña Aranguren - http://www.mikeleganaaranguren.com
PhD student - Manchester University Computer Science
Cell Cycle Ontology http://www.cellcycleontology.org
Gene Ontology Next Generation http://www.gong.manchester.ac.uk
Metabolik BioHacklab http://www.sindominio.net/metabolik/weblog
X-Evian http://x-evian.org/

From bosborne11 at verizon.net  Sun Jan 21 17:35:49 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sun, 21 Jan 2007 17:35:49 -0500
Subject: [Bioperl-l] bioperl-network-1.5.2_100 node IDs
In-Reply-To: <45B39A65.8010003@cs.man.ac.uk>
Message-ID:

Mikel,

All the ids are stored as Annotations, so each id is a
Bio::Annotation::DBLink object. To get at these you do something like
this:

  ok $node = $g1->get_nodes_by_id('PIR:A64696');
  @proteins = $node->proteins;
  ok $proteins[0]->accession_number,'PIR:A64696';
  my $ac = $proteins[0]->annotation;
  @ids = $ac->get_Annotations('dblink');
  ok $ids[0]->primary_id, "A64696";

By the way, the *.t files in the t/ directory are frequently a good
source of example code. The code above comes from t/ProteinNet.t, for
example.

You can also do:

  %ids = $g1->get_ids_by_node($node);

This is not as Bioperl-ish, but this is the way the modules were
initially written.

Also, I've updated the documentation in IO/psi.pm; now you can easily
tell what is and what is not extracted from the XML and put in the Node
object. If there's something missing from the node that you're
interested in, tell me and I'll see what I can do. I use this for my own
work and play, and have only extracted data from the PSI MI XML that's
useful for me and not much else.

Brian O.

On 1/21/07 11:52 AM, "Mikel Egaña Aranguren" wrote:

> Hello;
>
> is there a way of fine-tuning the types of ids that are stored in the
> Node objects, in the Bio::Network module?
> > For example, if I have the following in an Intact psi xml file: > > secondary="mis12_schpo" version="SP_37"/> > > secondary="mis12_schpo"/> > > How can I retrieve the id "mis12_schpo"? At the moment it just gives me > "Q9Y738". > > Thanks a lot. > > Regards From bosborne11 at verizon.net Sun Jan 21 17:52:18 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Sun, 21 Jan 2007 17:52:18 -0500 Subject: [Bioperl-l] Another error reading psi 1 file from intact using bioperl-network-1.5.2_100 In-Reply-To: <45B38633.2000600@cs.man.ac.uk> Message-ID: Mikel, Currently only nodes have annotations, the interactions do not have annotations. See the latest psi.pm, it explains what the nodes and interactions contain. Brian O. On 1/21/07 10:26 AM, "Mikel Ega?a Aranguren" wrote: > Hello; > > I'm trying to get some annotations from the interactions of an intact > file (in psi 1, otherwise won't parse as it has already been commented > in this list). > > The file I'm using is: > ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi1/species/schpo_small.xml > > I'm interested in the name of each interaction (e.g. > spc7_mal2_1), and I try to get it doing the > following: > > for my $interaction ($network->interactions){ > my $ac = $interaction->annotation; > print "--- ANNOTATIONS\n"; > print $ac->get_num_of_annotations(),"\n"; > foreach my $key ( $ac->get_all_annotation_keys() ) { > my @values = $ac->get_Annotations($key); > foreach my $value ( @values ) { > print "Annotation ",$key," stringified value > ",$value->as_text,"\n"; > } > } > > And the number of annotations is 0, even though I'm sure it works as I > can access the nodes (the proteins). Is this a bug or am I doing > something wrong? 
> > thanks a lot > > regards > > > Mikel Ega?a Aranguren(e)k dio: >> Hello everyone; >> >> I get exactly the same error when parsing the intact file from >> ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/human_small-01 >> .xml >> >> and I was about to send an email; help would be much appreciated. >> >> thanks a lot >> >> Mikel >> >> >> magnusgeist(e)k dio: >> >>> dear all, >>> >>> trying to read files in psi 2.5 format from intact like this: >>> >>> my $io = Bio::Network::IO->new(-format => 'psi', >>> -source => 'intact', >>> -file => >>> 'human_small-07.xml'); >>> >>> my $graph = $io->next_network; >>> >>> returns the following error: Can't call method "att" on an undefined value >>> at /vol/pi/lib/perl-5.8.0/Bio/Network/IO/psi.pm line 396. >>> >>> doing the same with files from dip: >>> >>> my $io = Bio::Network::IO->new( -format => 'psi', >>> -file => 'Hsapi20070107.mif'); >>> >>> my $graph = $io->next_network; >>> >>> does not result in any problems. >>> >>> would be great if one of you could help! >>> thank you very much in advance. >>> magnusgeist >>> >>> >> >> >> > From torsten.seemann at infotech.monash.edu.au Sun Jan 21 19:47:32 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 22 Jan 2007 11:47:32 +1100 Subject: [Bioperl-l] Parsing CDD Results? In-Reply-To: <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> Message-ID: > > Has anyone figured a good way to parse the results from NCBI's CDD > > database (i.e., http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml )? > if you run it locally isn't it just RPSBLAST which *used* to be > parseable by Bio::SearchIO::blast - I am not sure now. or are you > talking about running it via web? Jason is correct. I run CDD searches locally via Bio::Tools::Run::StandAloneBlast->rpsblast() and parse the results with Bio::SearchIO. 
"rpsblast" comes with the standard BLAST package. You can download the CDD models already compiled too from NCBI's ftp site. RPS-BLAST runs very quickly too as their are only 12,000 models in the CDD database. --Torsten Seemann From cjfields at uiuc.edu Sun Jan 21 21:12:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 21 Jan 2007 20:12:26 -0600 Subject: [Bioperl-l] Parsing CDD Results? In-Reply-To: References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> Message-ID: <59665746-13B2-4E95-99AD-18D8EC3CA7B9@uiuc.edu> On Jan 21, 2007, at 6:47 PM, Torsten Seemann wrote: >>> Has anyone figured a good way to parse the results from NCBI's CDD >>> database (i.e., http://www.ncbi.nlm.nih.gov/Structure/cdd/ >>> cdd.shtml )? > >> if you run it locally isn't it just RPSBLAST which *used* to be >> parseable by Bio::SearchIO::blast - I am not sure now. or are you >> talking about running it via web? > > Jason is correct. > > I run CDD searches locally via > Bio::Tools::Run::StandAloneBlast->rpsblast() and parse the results > with Bio::SearchIO. "rpsblast" comes with the standard BLAST package. > You can download the CDD models already compiled too from NCBI's ftp > site. RPS-BLAST runs very quickly too as their are only 12,000 models > in the CDD database. > > --Torsten Seemann I wasn't sure if he meant RPS-BLAST results or the data from the web page. 
chris From er at xs4all.nl Mon Jan 22 06:37:08 2007 From: er at xs4all.nl (Erik) Date: Mon, 22 Jan 2007 12:37:08 +0100 (CET) Subject: [Bioperl-l] missing test file(s) In-Reply-To: <59665746-13B2-4E95-99AD-18D8EC3CA7B9@uiuc.edu> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> <59665746-13B2-4E95-99AD-18D8EC3CA7B9@uiuc.edu> Message-ID: <17845.156.83.0.59.1169465828.squirrel@webmail.xs4all.nl> Hi, I noticed that t/Annotation.t references longnames.dnd, but that this longnames.dnd is not in http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/t/data/?cvsroot=bioperl The same goes for longnames.aln Maybe has the filename changed, or should the tests be removed? Or has someone forgotten to submit them? Thanks, Erikjan From cjfields at uiuc.edu Mon Jan 22 08:09:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 22 Jan 2007 07:09:57 -0600 Subject: [Bioperl-l] missing test file(s) In-Reply-To: <17845.156.83.0.59.1169465828.squirrel@webmail.xs4all.nl> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> <59665746-13B2-4E95-99AD-18D8EC3CA7B9@uiuc.edu> <17845.156.83.0.59.1169465828.squirrel@webmail.xs4all.nl> Message-ID: <05EA4CC6-BB1B-4A05-A8CD-630463CAFA48@uiuc.edu> Weigang forgot to submit them. Weigang, could you add/commit these test files to CVS? chris On Jan 22, 2007, at 5:37 AM, Erik wrote: > Hi, > > I noticed that t/Annotation.t references longnames.dnd, but that this > longnames.dnd is not in > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/t/data/? > cvsroot=bioperl > > The same goes for longnames.aln > > Maybe has the filename changed, or should the tests be removed? > Or has someone forgotten to submit them? 
> > Thanks, > Erikjan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From weigang at GENECTR.HUNTER.CUNY.EDU Mon Jan 22 10:25:54 2007 From: weigang at GENECTR.HUNTER.CUNY.EDU (Weigang Qiu) Date: Mon, 22 Jan 2007 10:25:54 -0500 Subject: [Bioperl-l] missing test file(s) In-Reply-To: <05EA4CC6-BB1B-4A05-A8CD-630463CAFA48@uiuc.edu> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> <59665746-13B2-4E95-99AD-18D8EC3CA7B9@uiuc.edu> <17845.156.83.0.59.1169465828.squirrel@webmail.xs4all.nl> <05EA4CC6-BB1B-4A05-A8CD-630463CAFA48@uiuc.edu> Message-ID: <45B4D782.80008@genectr.hunter.cuny.edu> test files added & commited. sorry about the neglect. Chris Fields wrote: > Weigang forgot to submit them. Weigang, could you add/commit these > test files to CVS? > > chris > > On Jan 22, 2007, at 5:37 AM, Erik wrote: > >> Hi, >> >> I noticed that t/Annotation.t references longnames.dnd, but that this >> longnames.dnd is not in >> >> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/t/data/? >> cvsroot=bioperl >> >> The same goes for longnames.aln >> >> Maybe has the filename changed, or should the tests be removed? >> Or has someone forgotten to submit them? >> >> Thanks, >> Erikjan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- Weigang Qiu, Assist. 
Professor Department of Biological Sciences Hunter College, City University of New York 695 Park Ave, New York, NY 10021 1-212-772-5296 (Office, Room 839, Hunter North) From cjfields at uiuc.edu Mon Jan 22 11:26:25 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 22 Jan 2007 10:26:25 -0600 Subject: [Bioperl-l] missing test file(s) In-Reply-To: <45B4D782.80008@genectr.hunter.cuny.edu> References: <3f3ecb5a0701191112j7ba8cb8fpec9f8e3cb2c3282@mail.gmail.com> <840C35A7-1F55-4517-9D4E-72A1B97F0173@bioperl.org> <59665746-13B2-4E95-99AD-18D8EC3CA7B9@uiuc.edu> <17845.156.83.0.59.1169465828.squirrel@webmail.xs4all.nl> <05EA4CC6-BB1B-4A05-A8CD-630463CAFA48@uiuc.edu> <45B4D782.80008@genectr.hunter.cuny.edu> Message-ID: <264D5A70-5A1C-44C1-B4B7-E98EF9127A3E@uiuc.edu> No problem; I've done that before myself. I now get all tests passing after updating from CVS. chris On Jan 22, 2007, at 9:25 AM, Weigang Qiu wrote: > test files added & commited. sorry about the neglect. > > Chris Fields wrote: > >> Weigang forgot to submit them. Weigang, could you add/commit these >> test files to CVS? >> >> chris >> >> On Jan 22, 2007, at 5:37 AM, Erik wrote: >> >>> Hi, >>> >>> I noticed that t/Annotation.t references longnames.dnd, but that >>> this >>> longnames.dnd is not in >>> >>> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/t/data/? >>> cvsroot=bioperl >>> >>> The same goes for longnames.aln >>> >>> Maybe has the filename changed, or should the tests be removed? >>> Or has someone forgotten to submit them? >>> >>> Thanks, >>> Erikjan >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> > > > -- > Weigang Qiu, Assist. 
Professor > Department of Biological Sciences > Hunter College, City University of New York > 695 Park Ave, New York, NY 10021 > 1-212-772-5296 (Office, Room 839, Hunter North) > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bpederse at gmail.com Mon Jan 22 11:34:12 2007 From: bpederse at gmail.com (Brent Pedersen) Date: Mon, 22 Jan 2007 08:34:12 -0800 Subject: [Bioperl-l] constant Graphics height In-Reply-To: References: Message-ID: hi, to answer (i think) my own q, it looks possible to use Bio::Graphics::Glyph::top() to do this in bioperl. i think more hackery would be required to do same in gbrowse. -b On 1/20/07, Brent Pedersen wrote: > hi, i'm trying to find my way through all the bioperl / gbrowse docs. > can someone point me to an example where the image is kept at a > constant height, regardless of the number of tracks, or the > overlapping of features? > is this possible with gbrowse_img? thanks for any examples or pointers. > -brent > From wgallin at ualberta.ca Mon Jan 22 13:46:13 2007 From: wgallin at ualberta.ca (Warren Gallin) Date: Mon, 22 Jan 2007 11:46:13 -0700 Subject: [Bioperl-l] Finding coding sequences corresponding to protein sequences Message-ID: <5C9D2739-1F7B-4323-A209-880C28CEC32F@ualberta.ca> I have a set of gi numbers for protein sequences and I am trying to write a script that will return the corresponding nucleic acid coding sequences (CDS only) for each protein record. Currently the script grabs the protein sequence SeqIO object from GENPEPT and scans through the primary tags looking for CDS. Problem number 1 is that not all of the protein records have a CDS annotation that links to a nucleic acid sequence record (the first example in my set of records is gi1345813). 
So, when that happens, is there a straightforward way to find all protein records that have amino acid sequences identical to the current one, aside from running BLAST and then parsing the sequence identifiers on the alignment of the first hit? Problem number 2 is that sometimes the accession number for the corresponding nucleotide sequence in the CDS-tagged feature is a RefSeq entry, which causes my script to throw a fatal exception (the first example in my set of records is gi31543024). Is there a straightforward way to get to a non-RefSeq record given a RefSeq accession number? Also, is there some way of evaluating the call to GENBANK to determine if the record being requested is a RefSeq record and bypass the call? I would appreciate pointers to relevant documentation or examples so I can learn more about the general issues that are involved, or specific advice on how to make this kind of a script work. I'm using Bioperl 1.5.2 on an OS X server. Thanks, Warren Gallin From bix at sendu.me.uk Tue Jan 23 10:57:04 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 23 Jan 2007 15:57:04 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store issues Message-ID: <45B63050.4060101@sendu.me.uk> Hi, I'm trying to use Bio::DB::SeqFeature::Store for the first time and have come across some issues. Is the documentation accurate? The synopsis has: my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test', -write => 1 ); but code for new() has: rearrange(['ADAPTOR', 'SERIALIZER', 'INDEX_SUBFEATURES', 'CACHE', 'COMPRESS', 'DEBUG', 'CREATE',],@_); I.e., naively -write doesn't do anything, and -create is undocumented. I needed to use -create to get it to work.
Doc for store() says: Args : list of Bio::SeqFeatureI objects But when I tried storing Bio::SeqFeature::Annotated objects I get this error: Transaction aborted because Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/sendu/src/bioperl/core/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1300. Ie. mysql.pm _update_attribute_index() method expects a Bio::SeqFeature::Generic object (which has the all_tags method), not just a Bio::SeqFeatureI. In fact all_tags() is just an alias to SeqFeatureI's get_all_tags() which is deprecated. Likewise, the same issue applies for the Generic-specific each_tag_value() alias of the deprecated get_tag_values(). What might be the best way to resolve this? I'm thinking it would be harmless to at least change _update_attribute_index() to call the SeqFeatureI methods instead? Don't know if there's a desire to not use the deprecated methods at all. From lstein at cshl.edu Tue Jan 23 15:08:21 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 Jan 2007 15:08:21 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store issues In-Reply-To: <45B63050.4060101@sendu.me.uk> References: <45B63050.4060101@sendu.me.uk> Message-ID: <6dce9a0b0701231208o30bcfa74yf32e5154ca9b6248@mail.gmail.com> On 1/23/07, Sendu Bala wrote: > > Hi, > > I'm trying to use Bio::DB::SeqFeature::Store for the first time and have > come across some issues. > > Is the documentation accurate? The synopsis has: > > my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', > -dsn => 'dbi:mysql:test', > -write => 1 ); > > but code for new() has: > > rearrange(['ADAPTOR', > 'SERIALIZER', > 'INDEX_SUBFEATURES', > 'CACHE', > 'COMPRESS', > 'DEBUG', > 'CREATE',], at _); > > Ie. naively -write doesn't do anything, and -create is undocumented. I > needed to use -create to get it to work. Some of the arguments are processed by the adaptor, in this case DBI::mysql. Looks like the -create option needs to be documented. 
Doc for store() says: > > Args : list of Bio::SeqFeatureI objects > > But when I tried storing Bio::SeqFeature::Annotated objects I get this > error: > > Transaction aborted because Can't locate object method "all_tags" via > package "Bio::SeqFeature::Annotated" at > /home/sendu/src/bioperl/core/Bio/DB/SeqFeature/Store/DBI/mysql.pm line > 1300. > > Ie. mysql.pm _update_attribute_index() method expects a > Bio::SeqFeature::Generic object (which has the all_tags method), not > just a Bio::SeqFeatureI. In fact all_tags() is just an alias to > SeqFeatureI's get_all_tags() which is deprecated. Likewise, the same > issue applies for the Generic-specific each_tag_value() alias of the > deprecated get_tag_values(). My error; I got confused between the old and new style deprecated. Now fixed. What might be the best way to resolve this? I'm thinking it would be > harmless to at least change _update_attribute_index() to call the > SeqFeatureI methods instead? Don't know if there's a desire to not use > the deprecated methods at all. > > Using separate annotation() objects converts a small data structure into a large one. I use lightweight Bio::Graphics::FeatureBase as the base class for Bio::DB::SeqFeature::NormalizedFeature, and avoiding the annotation object allows me to store attributes in a nice normalized form in a set of tables. Lincoln -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hjian at kuicr.kyoto-u.ac.jp Wed Jan 24 19:31:20 2007 From: hjian at kuicr.kyoto-u.ac.jp (Jian Huang) Date: Thu, 25 Jan 2007 09:31:20 +0900 Subject: [Bioperl-l] Is there any solution to residue leak problem of the Bio::Structure::Entry get_residues() method Message-ID: <001401c74018$26148420$93656785@zhur> Dear all, When two or more continuous residues have insertion codes and they are with the same residue name, get_residues() method of Bio::Structure::Entry can only get the first residue. Other residue(s) will be omitted. For example, in PDB structure 1tal, SER-120H will be omitted, although the previous SER-120G will be returned properly. I have reported this bug. See http://bugzilla.open-bio.org/show_bug.cgi?id=2192 It seems that this bug is then actually due to add_residue() method and then and then...Is there any solution to this problem? Jian Huang hjian at kuicr.kyoto-u.ac.jp Bioinformatics Center, Kyoto University Gokasho, Uji, Kyoto 611-0011, JAPAN Phone: +81-774-38-3296, Fax: +81-774-38-3269 From jay at jays.net Thu Jan 25 02:26:11 2007 From: jay at jays.net (Jay Hannah) Date: Thu, 25 Jan 2007 01:26:11 -0600 Subject: [Bioperl-l] bioperl-microarray: status? Message-ID: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> Hi Allen (et. al.) -- What is the status of bioperl-microarray nowadays? I checked it out of CVS and it looks like there hasn't been any substantive commit since 2003? I ask because I just spit out some Affymetrix GeneChip definitions and soon I'll have to get serious about parsing/storing/diving result sets. This area is new to me so I've been diving the bioperl-l archives for every mention of microarray over the years. Bioconductor and R are fascinating. I have a lot of reading to do. 
:) Anyway, just wondering if bioperl-microarray is still active and/or in use in the wild... Thanks, j seqlab.net (tutorial coming soon ... join our mailing list!) From cjfields at uiuc.edu Thu Jan 25 08:37:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 Jan 2007 07:37:01 -0600 Subject: [Bioperl-l] Is there any solution to residue leak problem of the Bio::Structure::Entry get_residues() method In-Reply-To: <001401c74018$26148420$93656785@zhur> References: <001401c74018$26148420$93656785@zhur> Message-ID: <439636B5-39E1-45E0-BAB3-6EC950D5F6FB@uiuc.edu> Jian, Bio::Structure is one area where much work still needs to be done. There had been talk on the list a while back about restructuring/ refactoring everything there but I have seen little progress as of yet. Also, it would help tremendously if you attach a simple script to the bug report demonstrating the bug for those unfamiliar with Bio::Structure (like me!). This is so we can confirm it and try to get a fix in. cheers! chris On Jan 24, 2007, at 6:31 PM, Jian Huang wrote: > Dear all, > > When two or more continuous residues have insertion codes and they > are with the same residue name, get_residues() method of > Bio::Structure::Entry can only get the first residue. Other residue > (s) will be omitted. For example, in PDB structure 1tal, SER-120H > will be omitted, although the previous SER-120G will be returned > properly. I have reported this bug. See http://bugzilla.open- > bio.org/show_bug.cgi?id=2192 > > It seems that this bug is then actually due to add_residue() method > and then and then...Is there any solution to this problem? 
> > Jian Huang > hjian at kuicr.kyoto-u.ac.jp > > Bioinformatics Center, > Kyoto University > Gokasho, Uji, Kyoto 611-0011, JAPAN > Phone: +81-774-38-3296, Fax: +81-774-38-3269 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From allenday at gmail.com Thu Jan 25 03:30:26 2007 From: allenday at gmail.com (Allen Day) Date: Thu, 25 Jan 2007 00:30:26 -0800 Subject: [Bioperl-l] bioperl-microarray: status? In-Reply-To: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> Message-ID: <5c24dcc30701250030g4487888an4646127c6b21ede@mail.gmail.com> Eh, there is some discussion activity on the list, but not much. You are really better off moving to Bioconductor. -Allen On 1/24/07, Jay Hannah wrote: > Hi Allen (et. al.) -- > > What is the status of bioperl-microarray nowadays? I checked it out > of CVS and it looks like there hasn't been any substantive commit > since 2003? > > I ask because I just spit out some Affymetrix GeneChip definitions > and soon I'll have to get serious about parsing/storing/diving result > sets. This area is new to me so I've been diving the bioperl-l > archives for every mention of microarray over the years. Bioconductor > and R are fascinating. I have a lot of reading to do. :) > > Anyway, just wondering if bioperl-microarray is still active and/or > in use in the wild... > > Thanks, > > j > seqlab.net > (tutorial coming soon ... join our mailing list!) 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jimhu at tamu.edu Thu Jan 25 11:11:02 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 25 Jan 2007 11:11:02 -0500 Subject: [Bioperl-l] Pathway tools output parser Message-ID: Is there a module to parse the lisp object files from Peter Karp's Pathway Tools? I need a parser to convert the gene and protein objects in EcoCyc releases into something that can be imported into Chado. ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From george.heller at yahoo.com Thu Jan 25 21:51:05 2007 From: george.heller at yahoo.com (George Heller) Date: Thu, 25 Jan 2007 18:51:05 -0800 (PST) Subject: [Bioperl-l] Error while running load_seqdatabase.pl In-Reply-To: Message-ID: <269155.93656.qm@web58905.mail.re1.yahoo.com> Hi Hilmar, I still seem to be having problems loading my fasta file. I wrote a new package, SeqProcessor.pm as below, package SeqProcessor::Accession; use strict; use vars qw(@ISA); use Bio::Seq::BaseSeqProcessor; use Bio::SeqFeature::Generic; @ISA = qw(Bio::Seq::BaseSeqProcessor); sub process_seq { my ($self, $seq) = @_; $seq->accession_number($seq->display_id); return ($seq); } 1; I have this file SeqProcessor.pm in my home directory, and I have set the PERL5LIB variable accordingly. When I run load_seqdatabase.pl, perl load_seqdatabase.pl -host localhost -dbname biodb -format fasta -dbuser postgres -driver Pg --pipeline="SeqProcessor::Accession" maize_pep.fasta I still get the error, Loading maize_pep.fasta ... 
-------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,) ERROR: duplicate key violates unique constraint "bioentry_accession_key" --------------------------------------------------- Could not store FGENESHT0000001||AC155633|570|4400|1: ------------- EXCEPTION ------------- MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 STACK (eval) load_seqdatabase.pl:620 STACK toplevel load_seqdatabase.pl:602 -------------------------------------- at load_seqdatabase.pl line 633 Is there something I am missing? Thanks! George. Hilmar Lapp wrote: Hi George, sorry for the sluggish response, I was tied up during the week. This is also why you always want to keep the thread on the list. Perl is an interpreted language, so no compilation is necessary. The only thing you need to do is have the package in a place where perl can find it. 
The simplest way to achieve this is by setting the PERL5LIB environment variable: $ export PERL5LIB=/where/you/put/your/perl/package or if PERL5LIB was set already, you'd append it: $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package I do assume that you didn't really add your code to the SeqAdaptor.pm package - there is no necessity for nor benefit from that, and at worst (and quite likely) perl won't be able to find the package. Note that there is plenty of documentation for how to write packages for perl and how to make them accessible to perl. Hth, -hilmar On Jan 8, 2007, at 11:52 PM, George Heller wrote: > Hi Hilmer. > > Thanks so much for the response. As I am new to Bioperl, I have > another question. > > I have made the changes as suggested by you, and have added the > code below to the SeqAdaptor.pm script. > > package SeqProcessor::Accession; > use strict; > use vars qw(@ISA); > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > sub process_seq > { > my ($self, $seq) = @_; > $seq->accession_number($seq->display_id); > return ($seq); > } > > Now that I have done my changes, do I need to compile or something > for the changes to reflect? If so, can you please let me know the > command for the same, or direct me to any lin that has > documentation for the same? > > Thanks so much for the help. > George. > > Hilmar Lapp wrote: > George, > > this is almost certainly caused by using FASTA format and bioperl's > treatment of it. I am guilty of not having written a FAQ yet for > Bioperl-db, as this would certainly be there. > > Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl > uses Bioperl to parse sequence files) does not extract the accession > number from the description line of the fasta sequence, and instead > sets the accession_number property if sequence objects it creates to > "unknown". 
Since there is a unique key constraint on > (accession,version,namespace) the second sequence loaded will raise > an exception as it will violate the constraint. > > The simplest way to deal with this is to write a SeqProcessor that > massages the accession_number appropriately and then supply the > module to load_seqdatabase.pl using the --pipeline command line > switch. > > There are several examples for how to do this in the email archives. > See for example this thread on the Biosql list: > > http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html > > with two links to examples, and Marc Logghe gives another one in the > thread itself. > > Hth, > > -hilmar > > On Jan 8, 2007, at 3:17 PM, George Heller wrote: > > > Hi all. > > > > I am new to Bioperl and am trying to run the load_seqdatabase.pl > > script to load sequence data from a file into Postgres database. I > > am invoking the script through the following command: > > > > perl load_seqdatabase.pl -host localhost -dbname biodb06 -format > > fasta > > -dbuser postgres -driver Pg > > > > I am getting the following error: > > > > -------------------- WARNING --------------------- > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > were ("FGENES > > HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > > 1","unknown" > > ,"","0","") FKs (1,) > > ERROR: duplicate key violates unique constraint > > "bioentry_accession_key" > > --------------------------------------------------- > > Could not store unknown: > > ------------- EXCEPTION ------------- > > MSG: error while executing statement in > > Bio::DB::BioSQL::SeqAdaptor::find_by_uni > > que_key: ERROR: current transaction is aborted, commands ignored > > until end of t > > ransaction block > > STACK > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ > > lib/perl > > 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 > > STACK 
Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > > usr/lib/perl5 > > /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > > perl5/site_perl/5 > > .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/ > > site_perl/5. > > 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > > STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ > > site_perl/5.8. > > 5/Bio/DB/Persistent/PersistentObject.pm:271 > > STACK (eval) load_seqdatabase.pl:620 > > STACK toplevel load_seqdatabase.pl:602 > > -------------------------------------- > > at load_seqdatabase.pl line 633 > > > > Can anyone tell me how I can correct this error and get my script > > running? Thanks!!! > > > > George. > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== --------------------------------- Looking for earth-friendly autos? Browse Top Cars by "Green Rating" at Yahoo! Autos' Green Center. 
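[Editorial sketch] The unique-key explanation quoted above can be illustrated without a database. The following is a minimal sketch, assuming a toy in-memory stand-in for the (accession, version, namespace) constraint; the helper names first_collision and describe, the namespace 'demo', and the identifiers seqA/seqB/seqC are all hypothetical and are not part of bioperl-db or BioSQL. It shows why every record carrying the accession "unknown" collides on the second insert, and why copying display_id into the accession avoids that only as long as the identifiers are themselves unique and not already present. Whether a repeated identifier or an earlier partial load explains the second error report in this thread is not something the thread establishes.

```perl
use strict;
use warnings;

# Toy stand-in for a unique key on (accession, version, namespace).
# Nothing here touches BioSQL; %seen plays the role of the unique index.
# Returns the display_id of the first record that would violate the key,
# or undef if every record could be inserted.
sub first_collision {
    my ($namespace, @records) = @_;
    my %seen;
    for my $rec (@records) {
        my $key = join "\t", $rec->{accession}, $rec->{version}, $namespace;
        return $rec->{display_id} if $seen{$key}++;
    }
    return undef;
}

# Small formatting helper for the demonstration output.
sub describe {
    my ($label, $hit) = @_;
    return "$label: " . (defined $hit ? "collision at $hit" : "no collision") . "\n";
}

# As described in the thread: the fasta parser leaves accession as "unknown".
my @parsed = map { +{ display_id => $_, accession => 'unknown', version => 0 } }
             qw(seqA seqB seqC);

# What the SeqProcessor in the thread does: copy display_id into accession.
my @processed = map { +{ %$_, accession => $_->{display_id} } } @parsed;

my $without  = first_collision('demo', @parsed);
my $with     = first_collision('demo', @processed);
# Assumption for illustration: the same records are presented twice,
# as would happen if a file were loaded again into a non-empty namespace.
my $reloaded = first_collision('demo', @processed, @processed);

print describe('without processor', $without);
print describe('with processor',    $with);
print describe('loaded twice',      $reloaded);
```

The point of the third case is only that a processor which fixes the accession cannot, on its own, prevent a duplicate-key error when an identical (accession, version, namespace) triple is already stored.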
From roy at colibase.bham.ac.uk Thu Jan 25 17:19:00 2007 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Thu, 25 Jan 2007 22:19:00 +0000 Subject: [Bioperl-l] Species name problems with bioperl-db Message-ID: <45B92CD4.4040003@colibase.bham.ac.uk> Hi. I'm having problems similar to those discussed in this thread: http://comments.gmane.org/gmane.comp.lang.perl.bio.general/13766 and in bug 2092. I'm using the 1.52 release code, that includes Sendu's fix for the problem, but I'm still getting errors with some species names. The process seems to fall foul of line 167 of Bio::Species, which checks that the lineage starts at the species in question. Here are some of the error messages I'm getting: Uniprot entry P21215: MSG: The supplied lineage does not start near 'Clostridium sp.' (I was supplied 'sp. ATCC29733 | Clostridium | Clostridiaceae | Clostridiales | Clostridia | Firmicutes | Bacteria') Uniprot entry Q98AM7: MSG: The supplied lineage does not start near 'Rhizobium loti' (I was supplied 'loti | Mesorhizobium | Phyllobacteriaceae | Rhizobiales | Alphaproteobacteria | Proteobacteria | Bacteria') Genbank entry CP000026: MSG: The supplied lineage does not start near 'Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150' (I was supplied 'paratyphi | Salmonella | Enterobacteriaceae | Enterobacteriales | Gammaproteobacteria | Proteobacteria | Bacteria') It is easy to see why problems are arising- the species name used in the GenBank/Uniprot entry is sometimes a synonym of that in the supplied lineage, rather than an exact duplicate. Is the check on line 167 really necessary? Or at least could the throw be changed to a warn? Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk From jay at jays.net Fri Jan 26 08:26:49 2007 From: jay at jays.net (Jay Hannah) Date: Fri, 26 Jan 2007 07:26:49 -0600 Subject: [Bioperl-l] bioperl-microarray: status? 
In-Reply-To: <5c24dcc30701250030g4487888an4646127c6b21ede@mail.gmail.com> References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <5c24dcc30701250030g4487888an4646127c6b21ede@mail.gmail.com> Message-ID: On Jan 25, 2007, at 2:30 AM, Allen Day wrote: > Eh, there is some discussion activity on the list, but not much. You > are really better off moving to Bioconductor. Ok, thanks. I added that to the wiki page: http://www.bioperl.org/wiki/Microarray_package j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From cjfields at uiuc.edu Fri Jan 26 09:05:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 Jan 2007 08:05:01 -0600 Subject: [Bioperl-l] bioperl-microarray: status? In-Reply-To: References: <829F02EC-F827-485E-82F8-9EFEA0332C77@jays.net> <5c24dcc30701250030g4487888an4646127c6b21ede@mail.gmail.com> Message-ID: Don't know if it's worth it, but could the microarray package be modified so that it deals with data generated from or interacts directly with Bioconductor (i.e. maybe including some specialized bioperl-run set of classes to run Bioconductor tasks, return lightweight bioperl microarray classes)? Allen pointed out in a previous post that Bioconductor is the best pick for certain tasks, while Perl excels at others: http://article.gmane.org/gmane.comp.lang.perl.bio.general/13993 Might be nice if we could merge both strengths together in some way. chris On Jan 26, 2007, at 7:26 AM, Jay Hannah wrote: > On Jan 25, 2007, at 2:30 AM, Allen Day wrote: >> Eh, there is some discussion activity on the list, but not much. You >> are really better off moving to Bioconductor. > > Ok, thanks. I added that to the wiki page: > > http://www.bioperl.org/wiki/Microarray_package > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From e-just at northwestern.edu Fri Jan 26 12:08:49 2007 From: e-just at northwestern.edu (Eric Just) Date: Fri, 26 Jan 2007 11:08:49 -0600 Subject: [Bioperl-l] bp_genbank2gff3.pl Message-ID: Hi there, I am getting some strange results with bp_genbank2gff3.pl. I have a source genbank file with multiple records. I would like to have all of my mRNA features parsed into mRNA CDS exon structures and the tRNA features parsed into tRNA exon structures in the GFF3 file. I am calling the script like this: perl %xampp_root%/perl/bin/bp_genbank2gff3.pl --filter misc_feature --filter repeat_region --nolump genbank_data/test.small.gb Everything appears to run OK, with no errors; however, my output is mysteriously missing exon features. Most of the mRNAs get parsed as mRNA/CDS/exon but some are missing one or more exon features. The problem seems to get worse the more records there are in the genbank source file. For example, the following portion of the genbank file: gene <5948..>6982 /locus_tag="4.t00046" /Name="4.t00046" mRNA 5948..6982 /db_xref="GI:56474408" /locus_tag="4.t00046" /codon_start=1 /protein_id="EAL51779.1" /product="ubiquitin-conjugating enzyme, putative" CDS 5948..6982 /locus_tag="4.t00046" gets written as: AAFB01000019 GenBank gene 5948 6982 . + . iD=4.t00046;locus_tag=4.t00046;Name=4.t00046 AAFB01000019 GenBank mRNA 5948 6982 . + . iD=4.t00046.t01;Parent=4.t00046;db_xref=GI:56474408;locus_tag=4.t00046;codon_start=1;protein_id=EAL51779.1;product=ubiquitin-conjugating enzyme%2C putative AAFB01000019 GenBank CDS 5948 6982 . + . Parent=4.t00046.t01;locus_tag=4.t00046 Whereas most of the other mRNA features have exon features. I notice the same problem with tRNAs missing exon features. If I parse a single GenBank record on its own, it works fine; it seems to be a problem parsing a single file with multiple GenBank records. Any idea what's going wrong or what I can do to help troubleshoot?
Attached is my source GenBank file.

Thanks a lot!

Eric
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.small.gb.gz
Type: application/x-gzip
Size: 175413 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070126/c545fa4e/attachment-0001.gz

From bix at sendu.me.uk Fri Jan 26 13:27:03 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 26 Jan 2007 18:27:03 +0000
Subject: [Bioperl-l] Species name problems with bioperl-db
In-Reply-To: <45B92CD4.4040003@colibase.bham.ac.uk>
References: <45B92CD4.4040003@colibase.bham.ac.uk>
Message-ID: <45BA47F7.8060207@sendu.me.uk>

Roy Chaudhuri wrote:
> Hi.
[snip]
> It is easy to see why problems are arising - the species name used in the
> GenBank/Uniprot entry is sometimes a synonym of that in the supplied
> lineage, rather than an exact duplicate. Is the check on line 167 really
> necessary? Or at least could the throw be changed to a warn?

I acknowledge this report (thank you for the details) and will investigate when I have the time. I must have had some good reason for making the check throw, but I will confirm whether it has to be done that way.

If you could, please file a new bug report on bugzilla for this.

Cheers,
Sendu.

From lstein at cshl.edu Fri Jan 26 18:12:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 26 Jan 2007 18:12:49 -0500
Subject: [Bioperl-l] to_FTstring() for Bio::Graphics::FeatureBase
In-Reply-To: <418931A4-DD63-455D-8BB2-9C0D2A391429@bioperl.org>
References: <418931A4-DD63-455D-8BB2-9C0D2A391429@bioperl.org>
Message-ID: <6dce9a0b0701261512s196cfd90vc6976d584b5196a2@mail.gmail.com>

Hi Jason,

Ack, embarrassing. I'll fix this somehow and commit when I think the changes are right. Feel free to make suggestions.

Lincoln

On 1/26/07, Jason Stajich wrote:
>
> Let me rephrase - I can see where this might not be the correct fix. I
> have Bio::DB::SeqFeature objects which are CDS objects, I want to print out
> their location with to_FTstring, but it is only returning min..max for the
> location.
> I guess for something that is a Segment there is only one value
> (min..max) that should be presented.
>
> imagine I've done:
> for my $gene ( $segment->features('gene') ) {
>   for my $mRNA ( $gene->get_SeqFeatures('mRNA') ) {
>     for my $CDS ( $mRNA->get_SeqFeatures('CDS') ) {
>       print $CDS->location->to_FTstring();
>     }
>   }
> }
>
>
> Also - I note that if I call $CDS->length on a reverse-strand feature I
> get a negative length.
>
> So Feature::Base would need to be changed to this:
>
> sub length {
>   my $self = shift;
>   return $self->high - $self->low + 1;
> }
>
> instead of
>
> sub length {
>   my $self = shift;
>   return $self->end - $self->start + 1;
> }
>
>
> -jason
>
> On Jan 26, 2007, at 2:52 PM, Jason Stajich wrote:
>
> Lincoln:
>
> Right now the code for Bio::Graphics::FeatureBase implementation of the
> LocationI interface method, to_FTstring is:
>
> sub to_FTstring {
>   my $self = shift;
>   my $low = $self->min_start;
>   my $high = $self->max_end;
>   return "$low..$high";
> }
>
> So strand is thrown away. Is it legitimate to modify this to return
> $high..$low when strand < 0?
>
> -jason
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From jason at bioperl.org Fri Jan 26 18:15:43 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 26 Jan 2007 15:15:43 -0800 Subject: [Bioperl-l] to_FTstring() for Bio::Graphics::FeatureBase In-Reply-To: <6dce9a0b0701261512s196cfd90vc6976d584b5196a2@mail.gmail.com> References: <418931A4-DD63-455D-8BB2-9C0D2A391429@bioperl.org> <6dce9a0b0701261512s196cfd90vc6976d584b5196a2@mail.gmail.com> Message-ID: sure - i made a quick fix to make sure length is positive in Segment.pm - feel free to undo that, I should probably wait for you to make the full changes. -jason On Jan 26, 2007, at 3:12 PM, Lincoln Stein wrote: > Hi Jason, > > Ack, embarassing. I'll fix this somehow and commit when I think the > changes > are right. Feel free to make suggestions. > > Lincoln -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From hjian at kuicr.kyoto-u.ac.jp Fri Jan 26 21:52:44 2007 From: hjian at kuicr.kyoto-u.ac.jp (Jian Huang) Date: Sat, 27 Jan 2007 11:52:44 +0900 Subject: [Bioperl-l] Is there any solution to residue leak problem of the Bio::Structure::Entry get_residues() method References: <001401c74018$26148420$93656785@zhur> <439636B5-39E1-45E0-BAB3-6EC950D5F6FB@uiuc.edu> Message-ID: <000a01c741be$3b265cb0$93656785@zhur> Dear Chris, A simple scipt to detect residue leak bug in Bio::Structure::Entry get_residues() method... 
You can download a slice of the 1TAL structure for testing at
http://bugzilla.open-bio.org/attachment.cgi?id=549

Best regards,

Jian Huang

#!d:\perl\bin\perl.exe
#===================================================
# A simple script to detect the residue leak bug in the
# Bio::Structure::Entry get_residues() method
#
# Please edit the first line according to your system
#
# 120h.pl 1tal.pdb
#====================================================
use strict;
use Bio::Structure::IO;

# create io stream from the first argument of the command line
my $structio = Bio::Structure::IO->new(-file => $ARGV[0]);

# create Bio::Structure::Entry object
my $in = $structio->next_structure;

foreach my $chain ($in->get_chains) {
    foreach my $res ($in->get_residues($chain)) {
        print (($res->id).getICode($in,$res),"\n");
    }
}

#--------------------------------------------------------------
# Function getting residue insertion code based on its CA atom
#--------------------------------------------------------------
sub getICode {
    my ($in,$res) = @_;
    foreach my $atom ($in->get_atoms($res)) {
        if ($atom->id eq "CA") {
            my $InsertionCode = $atom->icode;
            return $InsertionCode;
        }
    }
}

#===================================================
# Now look at the 1tal.pdb and the screen output
# residue SER 120H is skipped
#====================================================
__END__

----- Original Message -----
From: "Chris Fields"
To: "Jian Huang"
Cc:
Sent: Thursday, January 25, 2007 10:37 PM
Subject: Re: [Bioperl-l] Is there any solution to residue leak problem of the Bio::Structure::Entry get_residues() method

> Jian,
>
> Bio::Structure is one area where much work still needs to be done. There
> had been talk on the list a while back about restructuring/refactoring
> everything there but I have seen little progress as of yet.
>
> Also, it would help tremendously if you attach a simple script to the bug
> report demonstrating the bug for those unfamiliar with Bio::Structure
> (like me!).
This is so we can confirm it and try to get a fix in. > > cheers! > > chris > > On Jan 24, 2007, at 6:31 PM, Jian Huang wrote: > >> Dear all, >> >> When two or more continuous residues have insertion codes and they are >> with the same residue name, get_residues() method of >> Bio::Structure::Entry can only get the first residue. Other residue (s) >> will be omitted. For example, in PDB structure 1tal, SER-120H will be >> omitted, although the previous SER-120G will be returned properly. I >> have reported this bug. See http://bugzilla.open- >> bio.org/show_bug.cgi?id=2192 >> >> It seems that this bug is then actually due to add_residue() method and >> then and then...Is there any solution to this problem? >> >> Jian Huang >> hjian at kuicr.kyoto-u.ac.jp >> >> Bioinformatics Center, >> Kyoto University >> Gokasho, Uji, Kyoto 611-0011, JAPAN >> Phone: +81-774-38-3296, Fax: +81-774-38-3269 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From hlapp at gmx.net Fri Jan 26 19:16:33 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 26 Jan 2007 18:16:33 -0600 Subject: [Bioperl-l] Error while running load_seqdatabase.pl In-Reply-To: <269155.93656.qm@web58905.mail.re1.yahoo.com> References: <269155.93656.qm@web58905.mail.re1.yahoo.com> Message-ID: <4B0E46B5-48ED-4E2B-B4BE-0210675FB2AE@gmx.net> George, I don't know you create the FASTA file, but that's probably where the root cause is. 
Based on the message:

> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,)
> ERROR: duplicate key violates unique constraint
> "bioentry_accession_key"
> ---------------------------------------------------

the identifier and accession number are set, so your SeqProcessor scriptlet was executed (otherwise you'd also have seen a dynamic loading error if, e.g., your perl class could not be found or loaded by perl).

If you still receive the duplicate key violation, then it can only mean that a sequence with the exact same accession number was indeed in the database already. There are different possibilities for why: you may have loaded the same file before (use --lookup and related switches if you want to update existing sequences), or your FASTA file contains multiple sequences with the same ID, or you have a sequence with the same ID in different FASTA files, if you are loading from more than one file. In either of the two latter cases, you will need to find a way to disambiguate the IDs.

BTW, you may also want to consider parsing the concatenated ID 'FGENESHT0000001||AC155633|570|4400|1' into its component IDs and then using only one component. For example:

# the pipe must be escaped: an unescaped /|/ matches the empty string
# and would split the ID into single characters
my @ids = split(/\|/,$seq->display_id);
$seq->accession_number($ids[0]);

Obviously, this will only make for a nicer accession number, and not solve your duplicate ID problem, as the latter is in the file(s) you load.

-hilmar

On Jan 25, 2007, at 8:51 PM, George Heller wrote:

> Hi Hilmar,
>
> I still seem to be having problems loading my fasta file.
I wrote a > new package, SeqProcessor.pm as below, > > package SeqProcessor::Accession; > use strict; > use vars qw(@ISA); > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > @ISA = qw(Bio::Seq::BaseSeqProcessor); > sub process_seq > { > my ($self, $seq) = @_; > $seq->accession_number($seq->display_id); > return ($seq); > } > 1; > I have this file SeqProcessor.pm in my home directory, and I have > set the PERL5LIB variable accordingly. When I run load_seqdatabase.pl, > > perl load_seqdatabase.pl -host localhost -dbname biodb -format > fasta -dbuser postgres -driver Pg -- > pipeline="SeqProcessor::Accession" maize_pep.fasta > > I still get the error, > > Loading maize_pep.fasta ... > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| > AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > 1","","0","") FKs (1,) > ERROR: duplicate key violates unique constraint > "bioentry_accession_key" > --------------------------------------------------- > Could not store FGENESHT0000001||AC155633|570|4400|1: > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > transaction is aborted, commands ignored until end of transaction > block > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ > 
perl//Bio/DB/Persistent/PersistentObject.pm:272 > STACK (eval) load_seqdatabase.pl:620 > STACK toplevel load_seqdatabase.pl:602 > -------------------------------------- > at load_seqdatabase.pl line 633 > Is there something I am missing? > > Thanks! > George. > > > Hilmar Lapp wrote: > Hi George, sorry for the sluggish response, I was tied up during the > week. This is also why you always want to keep the thread on the list. > > Perl is an interpreted language, so no compilation is necessary. The > only thing you need to do is have the package in a place where perl > can find it. The simplest way to achieve this is by setting the > PERL5LIB environment variable: > > $ export PERL5LIB=/where/you/put/your/perl/package > > or if PERL5LIB was set already, you'd append it: > > $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package > > I do assume that you didn't really add your code to the SeqAdaptor.pm > package - there is no necessity for nor benefit from that, and at > worst (and quite likely) perl won't be able to find the package. Note > that there is plenty of documentation for how to write packages for > perl and how to make them accessible to perl. > > Hth, > > -hilmar > > On Jan 8, 2007, at 11:52 PM, George Heller wrote: > > > Hi Hilmer. > > > > Thanks so much for the response. As I am new to Bioperl, I have > > another question. > > > > I have made the changes as suggested by you, and have added the > > code below to the SeqAdaptor.pm script. > > > > package SeqProcessor::Accession; > > use strict; > > use vars qw(@ISA); > > use Bio::Seq::BaseSeqProcessor; > > use Bio::SeqFeature::Generic; > > > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > > > sub process_seq > > { > > my ($self, $seq) = @_; > > $seq->accession_number($seq->display_id); > > return ($seq); > > } > > > > Now that I have done my changes, do I need to compile or something > > for the changes to reflect? 
If so, can you please let me know the > > command for the same, or direct me to any lin that has > > documentation for the same? > > > > Thanks so much for the help. > > George. > > > > Hilmar Lapp wrote: > > George, > > > > this is almost certainly caused by using FASTA format and bioperl's > > treatment of it. I am guilty of not having written a FAQ yet for > > Bioperl-db, as this would certainly be there. > > > > Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl > > uses Bioperl to parse sequence files) does not extract the accession > > number from the description line of the fasta sequence, and instead > > sets the accession_number property if sequence objects it creates to > > "unknown". Since there is a unique key constraint on > > (accession,version,namespace) the second sequence loaded will raise > > an exception as it will violate the constraint. > > > > The simplest way to deal with this is to write a SeqProcessor that > > massages the accession_number appropriately and then supply the > > module to load_seqdatabase.pl using the --pipeline command line > > switch. > > > > There are several examples for how to do this in the email archives. > > See for example this thread on the Biosql list: > > > > http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html > > > > with two links to examples, and Marc Logghe gives another one in the > > thread itself. > > > > Hth, > > > > -hilmar > > > > On Jan 8, 2007, at 3:17 PM, George Heller wrote: > > > > > Hi all. > > > > > > I am new to Bioperl and am trying to run the load_seqdatabase.pl > > > script to load sequence data from a file into Postgres database. 
I > > > am invoking the script through the following command: > > > > > > perl load_seqdatabase.pl -host localhost -dbname biodb06 -format > > > fasta > > > -dbuser postgres -driver Pg > > > > > > I am getting the following error: > > > > > > -------------------- WARNING --------------------- > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > > were ("FGENES > > > HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| > 4400| > > > 1","unknown" > > > ,"","0","") FKs (1,) > > > ERROR: duplicate key violates unique constraint > > > "bioentry_accession_key" > > > --------------------------------------------------- > > > Could not store unknown: > > > ------------- EXCEPTION ------------- > > > MSG: error while executing statement in > > > Bio::DB::BioSQL::SeqAdaptor::find_by_uni > > > que_key: ERROR: current transaction is aborted, commands ignored > > > until end of t > > > ransaction block > > > STACK > > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ > > > lib/perl > > > 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 > > > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > > > usr/lib/perl5 > > > /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > > > perl5/site_perl/5 > > > .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ > perl5/ > > > site_perl/5. > > > 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > > > STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ > > > site_perl/5.8. > > > 5/Bio/DB/Persistent/PersistentObject.pm:271 > > > STACK (eval) load_seqdatabase.pl:620 > > > STACK toplevel load_seqdatabase.pl:602 > > > -------------------------------------- > > > at load_seqdatabase.pl line 633 > > > > > > Can anyone tell me how I can correct this error and get my script > > > running? Thanks!!! 
> > > > > > George. > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Fri Jan 26 23:38:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 Jan 2007 22:38:26 -0600 Subject: [Bioperl-l] Is there any solution to residue leak problem of the Bio::Structure::Entry get_residues() method In-Reply-To: <000a01c741be$3b265cb0$93656785@zhur> References: <001401c74018$26148420$93656785@zhur> <439636B5-39E1-45E0-BAB3-6EC950D5F6FB@uiuc.edu> <000a01c741be$3b265cb0$93656785@zhur> Message-ID: <6DAA6F34-D2A4-454D-9A69-31C8E7817327@uiuc.edu> On Jan 26, 2007, at 8:52 PM, Jian Huang wrote: > Dear Chris, > > A simple scipt to detect residue leak bug in Bio::Structure::Entry > get_residues() method... 
> You can download a slice of 1TAL structure for test at
> http://bugzilla.open-bio.org/attachment.cgi?id=549
>
> Best regards,
>
> Jian Huang
...

Jian,

I have tested your script out. I changed the subroutine to return a blank string, since it was shooting off warnings when they were turned on, but the results were the same w/o any warnings:

Use of uninitialized value in pattern match (m//) at /Users/cjfields/src/bioperl-live/Bio/Structure/IO/pdb.pm line 449, line 105.
LEU-114
LEU-119
PRO-120
ARG-120.AA
VAL-120.BB
ALA-120.CC
ASN-120.DD
GLY-120.EE
SER-120.GG <----
SER-120.HH <----
PHE-120.II
VAL-120.JJ
THR-120.KK
VAL-121

Aren't the arrows pointing at SER-120G and SER-120H?

You might want to check that you are running the latest Bioperl release (1.5.2); I noticed that a CVS commit that may be related to this was made after the 1.5.1 release. If you have upgraded, an older installation may be interfering. You can check which version you have by running (I think this will work on WinXP):

perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION,\"\n\""

To locate the bioperl installation being used:

perldoc -l Bio::Root::Root

chris

From hjian at kuicr.kyoto-u.ac.jp Sat Jan 27 02:34:14 2007
From: hjian at kuicr.kyoto-u.ac.jp (Jian Huang)
Date: Sat, 27 Jan 2007 16:34:14 +0900
Subject: [Bioperl-l] Is there any solution to residue leak problem of the Bio::Structure::Entry get_residues() method
References: <001401c74018$26148420$93656785@zhur> <439636B5-39E1-45E0-BAB3-6EC950D5F6FB@uiuc.edu> <000a01c741be$3b265cb0$93656785@zhur> <6DAA6F34-D2A4-454D-9A69-31C8E7817327@uiuc.edu>
Message-ID: <008f01c741e5$8ea76600$93656785@zhur>

Dear Chris,

I have compared the old and the new versions. The residue leak problem I met seems to be caused by the get_residues() method of Bio::Structure::Entry. However, it is actually caused by the _read_PDB_coordinate_section() method of Bio::Structure::IO::pdb.pm.
In the new version, my newly-met old bug was fixed by the "newly-added" line (at line 1186 in pdb.pm):

$res_name_num .= '.'.$icode if $icode;

Thank you again for your kind help.

Jian Huang

From mag87 at cornell.edu Sat Jan 27 02:27:06 2007
From: mag87 at cornell.edu (Michael Gore)
Date: Sat, 27 Jan 2007 02:27:06 -0500
Subject: [Bioperl-l] swissprot stream with no ID. Not swissprot in my book
Message-ID: <000501c741e4$8f82ee60$de7afd80@maize.cornell.edu>

I am having a similar error retrieving a sequence from SwissProt. I thought that maybe changing 'swall' to 'UniProtKB' would solve the issue, but it still remains.

>>> ------------- EXCEPTION -------------
>>> MSG: swissprot stream with no ID. Not swissprot in my book
>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
>>> STACK toplevel tut2.pl:5

It must be a URL issue? Does anyone know a remedy?

New Bio-Perl User

Mike

On 6/20/06 1:16 PM, "Chris Fields" wrote:

> Brian,
>
> Looks like EBI switched the url parameter for swissprot 'swall' to
> 'UniProtKB'. I committed a change to Bio::DB::SwissProt in CVS which fixes
> this and solves the issue.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Tuesday, June 20, 2006 11:14 AM
>> To: George Tzotzos; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Error message
>>
>> George,
>>
>> The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's
>> some other problem that may be specific to SwissProt. Can you retrieve from
>> GenBank? E.g.:
>>
>> my $seq_object = get_sequence('genbank', 2);
>>
>> Brian O.
>>
>>
>> On 6/20/06 7:36 AM, "George Tzotzos" wrote:
>>
>>> I'm a BioPerl novice.
I used CPAN to install BioPerl and run the >>> following script to test the installation: >>> >>> use Bio::Perl; >>> use strict; >>> use warnings; >>> >>> my $seq_object = get_sequence('swissprot', "P09651"); >>> >>> write_sequence(">roa1.fasta", 'fasta', $seq_object); >>> >>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I >>> get the message below. >>> >>> Any help on the nature of the problem and how to overcome it would be >>> greatly appreciated. >>> >>> Thanks >>> >>> George >>> >>> >>> ------------- EXCEPTION ------------- >>> MSG: swissprot stream with no ID. Not swissprot in my book >>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ >>> swiss.pm:179 >>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ >>> WebDBSeqI.pm:153 >>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 >>> STACK toplevel tut2.pl:5 >>> >>> >>> >>> George T. Tzotzos Ph.D >>> Vienna, Austria Michael Gore Graduate Student Department of Plant Breeding and Genetics Cornell University Institute for Genomic Diversity 175 Biotechnology Building Ithaca, NY 14853-2703 Office: (607) 255-1809 Fax: (607) 255-6249 From cjfields at uiuc.edu Sat Jan 27 09:35:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 27 Jan 2007 08:35:23 -0600 Subject: [Bioperl-l] swissprot stream with no ID. Not swissprot in my book In-Reply-To: <000501c741e4$8f82ee60$de7afd80@maize.cornell.edu> References: <000501c741e4$8f82ee60$de7afd80@maize.cornell.edu> Message-ID: This did fix it for most users; this is the first problem I have seen with it since then. We don't have enough information to go on here to really diagnose the problem beyond that you're using Bio::Perl. Are you using the latest BioPerl version (1.5.2, which has the fix)? How did you retrieve your sequence (i.e. do you have a script)? What OS are you on? 
I run the following using the latest bioperl and it works: use Bio::Perl; my $seq = get_sequence('swissprot',"ROA1_HUMAN"); print $seq->accession_number,"\n"; print $seq->seq,"\n"; Chris On Jan 27, 2007, at 1:27 AM, Michael Gore wrote: > I am having a similar error retrieving sequence from SwissProt. > Thought may > be changing 'swall' to 'UniProtKB' would solve the issue, but it still > remains. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mag87 at cornell.edu Sat Jan 27 14:19:25 2007 From: mag87 at cornell.edu (Michael Gore) Date: Sat, 27 Jan 2007 14:19:25 -0500 Subject: [Bioperl-l] swissprot stream with no ID. Not swissprot in my book In-Reply-To: Message-ID: <000001c74248$146880a0$de7afd80@maize.cornell.edu> Chris, Both versions of Perl and BioPerl on my machine were antiquated. Problem solved with updates. 
Thanks, Mike -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Saturday, January 27, 2007 9:35 AM To: Michael Gore Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] swissprot stream with no ID. Not swissprot in my book This did fix it for most users; this is the first problem I have seen with it since then. We don't have enough information to go on here to really diagnose the problem beyond that you're using Bio::Perl. Are you using the latest BioPerl version (1.5.2, which has the fix)? How did you retrieve your sequence (i.e. do you have a script)? What OS are you on? I run the following using the latest bioperl and it works: use Bio::Perl; my $seq = get_sequence('swissprot',"ROA1_HUMAN"); print $seq->accession_number,"\n"; print $seq->seq,"\n"; Chris On Jan 27, 2007, at 1:27 AM, Michael Gore wrote: > I am having a similar error retrieving sequence from SwissProt. > Thought may > be changing 'swall' to 'UniProtKB' would solve the issue, but it still > remains. > >>>> ------------- EXCEPTION ------------- >>>> MSG: swissprot stream with no ID. Not swissprot in my book >>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ >>>> swiss.pm:179 >>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ >>>> WebDBSeqI.pm:153 >>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 >>>> STACK toplevel tut2.pl:5 > > > It must be a URL issue? Does anyone know a remedy? > > New Bio-Perl User > > Mike > > > > On 6/20/06 1:16 PM, "Chris Fields" > wrote: > >> Brian, >> >> Brian, >> >> Looks like EBI switched the url parameter for swissprot 'swall' to >> 'UniProtKB'. I committed a change to Bio::DB::SwissProt in CVS which > fixes >> this and solves the issue. 
>> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l- >>> bounces at lists.open-bio.org > ] On Behalf > Of Brian > Osborne >>> Sent: Tuesday, June 20, 2006 11:14 AM >>> To: George Tzotzos; bioperl-l at lists.open-bio.org > >>> Subject: Re: [Bioperl-l] Error message >>> >>> George, >>> >>> The docs I'm reading say to use 'swiss', not 'swissprot' but I think >>> there's >>> some other problem that may be specific to SwissProt. Can you >>> retrieve >>> from >>> GenBank? E.g.: >>> >>> my $seq_object = get_sequence('genbank', 2); >>> >>> Brian O. >>> >>> >>> On 6/20/06 7:36 AM, "George Tzotzos" > wrote: >>> >>>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the >>>> following script to test the installation: >>>> >>>> use Bio::Perl; >>>> use strict; >>>> use warnings; >>>> >>>> my $seq_object = get_sequence('swissprot', "P09651"); >>>> >>>> write_sequence(">roa1.fasta", 'fasta', $seq_object); >>>> >>>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I >>>> get the message below. >>>> >>>> Any help on the nature of the problem and how to overcome it >>>> would be >>>> greatly appreciated. >>>> >>>> Thanks >>>> >>>> George >>>> >>>> >>>> ------------- EXCEPTION ------------- >>>> MSG: swissprot stream with no ID. Not swissprot in my book >>>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ >>>> swiss.pm:179 >>>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ >>>> WebDBSeqI.pm:153 >>>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 >>>> STACK toplevel tut2.pl:5 >>>> >>>> >>>> >>>> George T. 
Tzotzos Ph.D
>>>> Vienna, Austria
>
> Michael Gore
> Graduate Student
> Department of Plant Breeding and Genetics
> Cornell University
> Institute for Genomic Diversity
> 175 Biotechnology Building
> Ithaca, NY 14853-2703
>
> Office: (607) 255-1809
> Fax: (607) 255-6249
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From george.heller at yahoo.com Sat Jan 27 21:31:56 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 27 Jan 2007 18:31:56 -0800 (PST)
Subject: [Bioperl-l] Error while running load_seqdatabase.pl
In-Reply-To: <4B0E46B5-48ED-4E2B-B4BE-0210675FB2AE@gmx.net>
Message-ID: <215228.54033.qm@web58901.mail.re1.yahoo.com>

Hi Hilmar,

I tried the lookup and noupdate options, and also made changes to the SeqProcessor.pm package for the accession:

    package SeqProcessor::Accession;
    use strict;
    use vars qw(@ISA);
    use Bio::Seq::BaseSeqProcessor;
    use Bio::SeqFeature::Generic;
    @ISA = qw(Bio::Seq::BaseSeqProcessor);

    sub process_seq
    {
        my ($self, $seq) = @_;
        # Split the concatenated ID on literal pipes (the pipe must be
        # escaped in the regex) and keep the first component as accession
        my @ids = split(/\|/, $seq->display_id);
        $seq->accession_number($ids[0]);
        return ($seq);
    }
    1;

I invoke load_seqdatabase.pl as:

    perl load_seqdatabase.pl -host localhost -dbname usda-06 -format fasta -dbuser postgres -driver Pg --lookup --noupdate --pipeline="SeqProcessor::Accession" maize_pep.fasta

Loading maize_pep.fasta ...
I get the error:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000021||AC155633|113788|114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") FKs (1,)
ERROR: value too long for type character varying(40)
---------------------------------------------------
Could not store FGENESHT0000021||AC155633|113788|114708|-1:
------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
STACK (eval) load_seqdatabase.pl:620
STACK toplevel load_seqdatabase.pl:602
--------------------------------------
at load_seqdatabase.pl line 633

As far as I can gather, this error shouldn't appear, as we are reducing the accession to only the first code that appears. Ideas?

George.

Hilmar Lapp wrote:

George,

I don't know how you create the FASTA file, but that's probably where the root cause is.
Based on the message:

> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,)
> ERROR: duplicate key violates unique constraint
> "bioentry_accession_key"
> ---------------------------------------------------

the identifier and accession number are set, so your SeqProcessor scriptlet was executed (otherwise you'd also have seen a dynamic loading error if, e.g., your perl class could not be found or loaded by perl). If you still receive the duplicate key violation, then it can only mean that a sequence with the exact same accession number was indeed in the database already.

There are different possible reasons: you may have loaded the same file before (use --lookup and related switches if you want to update existing sequences), or your FASTA file contains multiple sequences with the same ID, or you have a sequence with the same ID in different FASTA files, if you are loading from more than one file. In either of the two latter cases, you will need to find a way to disambiguate the IDs.

BTW, you may also want to consider parsing the concatenated ID 'FGENESHT0000001||AC155633|570|4400|1' into its component IDs, and then using only one component. For example:

    # Escape the pipe: it is a regex metacharacter
    my @ids = split(/\|/, $seq->display_id);
    $seq->accession_number($ids[0]);

Obviously, this will only make for a nicer accession number, and not solve your duplicate ID problem, as the latter is in the file(s) you load.

-hilmar

On Jan 25, 2007, at 8:51 PM, George Heller wrote:

> Hi Hilmar,
>
> I still seem to be having problems loading my fasta file.
I wrote a > new package, SeqProcessor.pm as below, > > package SeqProcessor::Accession; > use strict; > use vars qw(@ISA); > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > @ISA = qw(Bio::Seq::BaseSeqProcessor); > sub process_seq > { > my ($self, $seq) = @_; > $seq->accession_number($seq->display_id); > return ($seq); > } > 1; > I have this file SeqProcessor.pm in my home directory, and I have > set the PERL5LIB variable accordingly. When I run load_seqdatabase.pl, > > perl load_seqdatabase.pl -host localhost -dbname biodb -format > fasta -dbuser postgres -driver Pg -- > pipeline="SeqProcessor::Accession" maize_pep.fasta > > I still get the error, > > Loading maize_pep.fasta ... > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| > AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > 1","","0","") FKs (1,) > ERROR: duplicate key violates unique constraint > "bioentry_accession_key" > --------------------------------------------------- > Could not store FGENESHT0000001||AC155633|570|4400|1: > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > transaction is aborted, commands ignored until end of transaction > block > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ > 
perl//Bio/DB/Persistent/PersistentObject.pm:272 > STACK (eval) load_seqdatabase.pl:620 > STACK toplevel load_seqdatabase.pl:602 > -------------------------------------- > at load_seqdatabase.pl line 633 > Is there something I am missing? > > Thanks! > George. > > > Hilmar Lapp wrote: > Hi George, sorry for the sluggish response, I was tied up during the > week. This is also why you always want to keep the thread on the list. > > Perl is an interpreted language, so no compilation is necessary. The > only thing you need to do is have the package in a place where perl > can find it. The simplest way to achieve this is by setting the > PERL5LIB environment variable: > > $ export PERL5LIB=/where/you/put/your/perl/package > > or if PERL5LIB was set already, you'd append it: > > $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package > > I do assume that you didn't really add your code to the SeqAdaptor.pm > package - there is no necessity for nor benefit from that, and at > worst (and quite likely) perl won't be able to find the package. Note > that there is plenty of documentation for how to write packages for > perl and how to make them accessible to perl. > > Hth, > > -hilmar > > On Jan 8, 2007, at 11:52 PM, George Heller wrote: > > > Hi Hilmer. > > > > Thanks so much for the response. As I am new to Bioperl, I have > > another question. > > > > I have made the changes as suggested by you, and have added the > > code below to the SeqAdaptor.pm script. > > > > package SeqProcessor::Accession; > > use strict; > > use vars qw(@ISA); > > use Bio::Seq::BaseSeqProcessor; > > use Bio::SeqFeature::Generic; > > > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > > > sub process_seq > > { > > my ($self, $seq) = @_; > > $seq->accession_number($seq->display_id); > > return ($seq); > > } > > > > Now that I have done my changes, do I need to compile or something > > for the changes to reflect? 
If so, can you please let me know the > > command for the same, or direct me to any lin that has > > documentation for the same? > > > > Thanks so much for the help. > > George. > > > > Hilmar Lapp wrote: > > George, > > > > this is almost certainly caused by using FASTA format and bioperl's > > treatment of it. I am guilty of not having written a FAQ yet for > > Bioperl-db, as this would certainly be there. > > > > Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl > > uses Bioperl to parse sequence files) does not extract the accession > > number from the description line of the fasta sequence, and instead > > sets the accession_number property if sequence objects it creates to > > "unknown". Since there is a unique key constraint on > > (accession,version,namespace) the second sequence loaded will raise > > an exception as it will violate the constraint. > > > > The simplest way to deal with this is to write a SeqProcessor that > > massages the accession_number appropriately and then supply the > > module to load_seqdatabase.pl using the --pipeline command line > > switch. > > > > There are several examples for how to do this in the email archives. > > See for example this thread on the Biosql list: > > > > http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html > > > > with two links to examples, and Marc Logghe gives another one in the > > thread itself. > > > > Hth, > > > > -hilmar > > > > On Jan 8, 2007, at 3:17 PM, George Heller wrote: > > > > > Hi all. > > > > > > I am new to Bioperl and am trying to run the load_seqdatabase.pl > > > script to load sequence data from a file into Postgres database. 
I > > > am invoking the script through the following command: > > > > > > perl load_seqdatabase.pl -host localhost -dbname biodb06 -format > > > fasta > > > -dbuser postgres -driver Pg > > > > > > I am getting the following error: > > > > > > -------------------- WARNING --------------------- > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > > were ("FGENES > > > HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| > 4400| > > > 1","unknown" > > > ,"","0","") FKs (1,) > > > ERROR: duplicate key violates unique constraint > > > "bioentry_accession_key" > > > --------------------------------------------------- > > > Could not store unknown: > > > ------------- EXCEPTION ------------- > > > MSG: error while executing statement in > > > Bio::DB::BioSQL::SeqAdaptor::find_by_uni > > > que_key: ERROR: current transaction is aborted, commands ignored > > > until end of t > > > ransaction block > > > STACK > > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ > > > lib/perl > > > 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 > > > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > > > usr/lib/perl5 > > > /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > > > perl5/site_perl/5 > > > .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ > perl5/ > > > site_perl/5. > > > 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > > > STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ > > > site_perl/5.8. > > > 5/Bio/DB/Persistent/PersistentObject.pm:271 > > > STACK (eval) load_seqdatabase.pl:620 > > > STACK toplevel load_seqdatabase.pl:602 > > > -------------------------------------- > > > at load_seqdatabase.pl line 633 > > > > > > Can anyone tell me how I can correct this error and get my script > > > running? Thanks!!! 
> > >
> > > George.
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> > ===========================================================
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From mdehoon at c2b2.columbia.edu Sat Jan 27 18:36:11 2007
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Sat, 27 Jan 2007 18:36:11 -0500
Subject: [Bioperl-l] Downloading the tutorial for offline reading
Message-ID: <45BBE1EB.3090607@c2b2.columbia.edu>

Hi everybody,

We at Biopython were discussing whether we should move to a wiki-only documentation for Biopython instead of the current LaTeX (with PDF & HTML output) documentation in addition to the wiki.
Whereas the wiki can be updated more easily and by more people, the PDF/HTML documentation has the advantage that it can be downloaded for offline reading.

So I was wondering how BioPerl creates its documentation. From the website, it appears that all documentation is in wiki-form. Is there a way to download the whole BioPerl documentation for offline reading? If not, do users frequently express a need for this? Or is everybody more or less happy with the wiki-only approach?

Best wishes from the Biopython side,

--Michiel.

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From bosborne11 at verizon.net Sun Jan 28 00:12:57 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sun, 28 Jan 2007 00:12:57 -0500
Subject: [Bioperl-l] Downloading the tutorial for offline reading
In-Reply-To: <45BBE1EB.3090607@c2b2.columbia.edu>
Message-ID:

Michiel,

I can't speak for others but I think the ideal would be to also have PDF versions. I've read that various html2pdf converters have been used at Wiki sites to make PDF but I've never tested any of these myself. If you come across a good one please keep us apprised.

Brian O.

On 1/27/07 6:36 PM, "Michiel Jan Laurens de Hoon" wrote:

> Or is everybody more
> or less happy with the wiki-only approach?

From cjfields at uiuc.edu Sun Jan 28 09:34:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 28 Jan 2007 08:34:13 -0600
Subject: [Bioperl-l] Downloading the tutorial for offline reading
In-Reply-To:
References:
Message-ID: <6C725BEA-B641-4173-8A6C-61792D5F5BD8@uiuc.edu>

The wiki pages have printable versions (linked via the toolbar below the search box); those pages only have the wiki documentation. Here's the BioPerl FAQ:

http://www.bioperl.org/w/index.php?title=FAQ&printable=yes

You could generate PDFs if you were able to print/save to PDF from any app (Mac OS X, for instance).
In fact, the printable installation wiki pages are what I used for generating the INSTALL docs in CVS. I just dumped to a text file from elinks and edited out the explicit links.

chris

On Jan 27, 2007, at 11:12 PM, Brian Osborne wrote:

> Michiel,
>
> I can't speak for others but I think the ideal would be to also have PDF
> versions. I've read that various html2pdf converters have been used at Wiki
> sites to make PDF but I've never tested any of these myself. If you come
> across a good one please keep us apprised.
>
> Brian O.
>
> On 1/27/07 6:36 PM, "Michiel Jan Laurens de Hoon" wrote:
>
>> Or is everybody more
>> or less happy with the wiki-only approach?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Sun Jan 28 11:35:47 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sun, 28 Jan 2007 11:35:47 -0500
Subject: [Bioperl-l] Downloading the tutorial for offline reading
In-Reply-To: <6C725BEA-B641-4173-8A6C-61792D5F5BD8@uiuc.edu>
Message-ID:

Chris,

I wasn't perfectly clear. There is value in having PDFs within the distribution itself, like we had when all the HOWTOs were in Docbook. That was the single best attribute of Docbook: the ability to create PDF, HTML, and text versions by running a script.

Brian O.

On 1/28/07 9:34 AM, "Chris Fields" wrote:

> The wiki pages have printable versions (linked via the toolbar below
> the search box); those pages only have the wiki documentation.
> Here's the BioPerl FAQ:
>
> http://www.bioperl.org/w/index.php?title=FAQ&printable=yes
>
> You could generate PDFs if you were able to print/save to PDF from
> any app (Mac OS X, for instance).
> In fact, the printable installation wiki pages are what I used for
> generating the INSTALL docs in CVS. I just dumped to a text file from
> elinks and edited out the explicit links
>
> chris
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

From hlapp at gmx.net Sun Jan 28 13:11:46 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 28 Jan 2007 13:11:46 -0500
Subject: [Bioperl-l] Error while running load_seqdatabase.pl
In-Reply-To: <215228.54033.qm@web58901.mail.re1.yahoo.com>
References: <215228.54033.qm@web58901.mail.re1.yahoo.com>
Message-ID:

That's odd indeed. Did you try putting a print statement before the return statement to prove that 1) the code gets executed, and 2) display_id(), primary_id() and accession_number() have the expected values?

BTW you might also want to set primary_id() to undef (as the ID found in the FASTA files doesn't really count as a primary database-specific ID anyway). The identifier column in bioentry (which primary_id() maps to) is constrained to 40 chars as well.
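The ID handling discussed above can be checked without a database. What follows is only a minimal sketch in core Perl, not the SeqProcessor itself: first_id_component is a hypothetical helper name (it is not part of BioPerl), and the ID is the one from the warning. It illustrates two things worth ruling out: in a Perl regex an unescaped | is alternation, so split(/|/, ...) matches the empty string and returns single characters, and the full concatenated ID is longer than the 40-character column limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the first
# component of a pipe-delimited FASTA ID.
sub first_id_component {
    my ($id) = @_;
    # The pipe is escaped so that it matches a literal '|'.
    my @ids = split(/\|/, $id);
    return $ids[0];
}

# ID taken from the warning message in this thread.
my $display_id = 'FGENESHT0000021||AC155633|113788|114708|-1';

# Unescaped: /|/ is an alternation of two empty patterns, so it matches
# between every pair of characters.
my @unescaped = split(/|/, $display_id);

print "full ID length:         ", length($display_id), "\n";
print "unescaped, 1st element: $unescaped[0]\n";
print "escaped, 1st element:   ", first_id_component($display_id), "\n";
```

Inside process_seq, the value returned by such a helper is what would be passed to $seq->accession_number(), and printing it just before the return statement covers both checks mentioned above.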
-hilmar On Jan 27, 2007, at 9:31 PM, George Heller wrote: > Hi Hilmar, > > I tried the lookup and noupdate options, and also made changes to > the SeqProcessor.pm package for the accession, > package SeqProcessor::Accession; > use strict; > use vars qw(@ISA); > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > @ISA = qw(Bio::Seq::BaseSeqProcessor); > sub process_seq > { > my ($self, $seq) = @_; > my @ids = split(/|/,$seq->display_id); > $seq->accession_number($ids[0]); > return ($seq); > } > 1; > I invoke the load_seqdatabase.pl as, > perl load_seqdatabase.pl -host localhost -dbname usda-06 -format > fasta -dbuser postgres -driver Pg --lookup --noupdate -- > pipeline="SeqProcessor::Accession" maize_pep.fasta > Loading maize_pep.fasta ... > I get the error, > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were ("FGENESHT0000021||AC155633|113788| > 114708|-1","FGENESHT0000021||AC155633|113788| > 114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") > FKs (1,) > ERROR: value too long for type character varying(40) > --------------------------------------------------- > Could not store FGENESHT0000021||AC155633|113788|114708|-1: > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > transaction is aborted, commands ignored until end of transaction > block > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 
> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ > perl//Bio/DB/Persistent/PersistentObject.pm:272 > STACK (eval) load_seqdatabase.pl:620 > STACK toplevel load_seqdatabase.pl:602 > -------------------------------------- > at load_seqdatabase.pl line 633 > As far as I gather, this error shouldnt appear as we are > filtering out the accession as only the first code that appears. > Ideas? > > George. > > > Hilmar Lapp wrote: > George, > > I don't know you create the FASTA file, but that's probably where the > root cause is. Based on the message: > >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| >> 1","","0","") FKs (1,) >> ERROR: duplicate key violates unique constraint >> "bioentry_accession_key" >> --------------------------------------------------- > > the identifier and accession number are set, so your SeqProcessor > scriptlet was executed (otherwise you'd also have seen a dynamic > loading error it e.g. you perl class could be not be found or loaded > by perl). If you still receive the duplicate key violation, then it > can only mean that indeed a sequence with the exact same accession > number was in the database already. > > There are different possibilities for why: you may have loaded the > same file before (use --lookup and related switches if you want to > update existing sequences), or your FASTA file contains multiple > sequences with the same ID, or you have a sequence with the same ID > in different FASTA files, if you are loading from more than one file. > In either of the two latter cases, you will need to find a way to > disambiguate the IDs. > > BTW you also want to consider to parse the concatenated ID > 'FGENESHT0000001||AC155633|570|4400|1' apart into its component IDs, > and then use only one component. 
For example: > > my @ids = split(/|/,$seq->display_id); > $seq->accession_number($ids[0]); > > Obviously, this will only make for a nicer accession number, and not > solve your duplicate ID problem, as the latter is in the file(s) you > load. > > -hilmar > > On Jan 25, 2007, at 8:51 PM, George Heller wrote: > >> Hi Hilmar, >> >> I still seem to be having problems loading my fasta file. I wrote a >> new package, SeqProcessor.pm as below, >> >> package SeqProcessor::Accession; >> use strict; >> use vars qw(@ISA); >> use Bio::Seq::BaseSeqProcessor; >> use Bio::SeqFeature::Generic; >> @ISA = qw(Bio::Seq::BaseSeqProcessor); >> sub process_seq >> { >> my ($self, $seq) = @_; >> $seq->accession_number($seq->display_id); >> return ($seq); >> } >> 1; >> I have this file SeqProcessor.pm in my home directory, and I have >> set the PERL5LIB variable accordingly. When I run >> load_seqdatabase.pl, >> >> perl load_seqdatabase.pl -host localhost -dbname biodb -format >> fasta -dbuser postgres -driver Pg -- >> pipeline="SeqProcessor::Accession" maize_pep.fasta >> >> I still get the error, >> >> Loading maize_pep.fasta ... 
>> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| >> 1","","0","") FKs (1,) >> ERROR: duplicate key violates unique constraint >> "bioentry_accession_key" >> --------------------------------------------------- >> Could not store FGENESHT0000001||AC155633|570|4400|1: >> ------------- EXCEPTION ------------- >> MSG: error while executing statement in >> Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current >> transaction is aborted, commands ignored until end of transaction >> block >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 >> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ >> perl//Bio/DB/Persistent/PersistentObject.pm:272 >> STACK (eval) load_seqdatabase.pl:620 >> STACK toplevel load_seqdatabase.pl:602 >> -------------------------------------- >> at load_seqdatabase.pl line 633 >> Is there something I am missing? >> >> Thanks! >> George. >> >> >> Hilmar Lapp wrote: >> Hi George, sorry for the sluggish response, I was tied up during the >> week. This is also why you always want to keep the thread on the >> list. >> >> Perl is an interpreted language, so no compilation is necessary. The >> only thing you need to do is have the package in a place where perl >> can find it. 
The simplest way to achieve this is by setting the >> PERL5LIB environment variable: >> >> $ export PERL5LIB=/where/you/put/your/perl/package >> >> or if PERL5LIB was set already, you'd append it: >> >> $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package >> >> I do assume that you didn't really add your code to the SeqAdaptor.pm >> package - there is no necessity for nor benefit from that, and at >> worst (and quite likely) perl won't be able to find the package. Note >> that there is plenty of documentation for how to write packages for >> perl and how to make them accessible to perl. >> >> Hth, >> >> -hilmar >> >> On Jan 8, 2007, at 11:52 PM, George Heller wrote: >> >>> Hi Hilmer. >>> >>> Thanks so much for the response. As I am new to Bioperl, I have >>> another question. >>> >>> I have made the changes as suggested by you, and have added the >>> code below to the SeqAdaptor.pm script. >>> >>> package SeqProcessor::Accession; >>> use strict; >>> use vars qw(@ISA); >>> use Bio::Seq::BaseSeqProcessor; >>> use Bio::SeqFeature::Generic; >>> >>> @ISA = qw(Bio::Seq::BaseSeqProcessor); >>> >>> sub process_seq >>> { >>> my ($self, $seq) = @_; >>> $seq->accession_number($seq->display_id); >>> return ($seq); >>> } >>> >>> Now that I have done my changes, do I need to compile or something >>> for the changes to reflect? If so, can you please let me know the >>> command for the same, or direct me to any lin that has >>> documentation for the same? >>> >>> Thanks so much for the help. >>> George. >>> >>> Hilmar Lapp wrote: >>> George, >>> >>> this is almost certainly caused by using FASTA format and bioperl's >>> treatment of it. I am guilty of not having written a FAQ yet for >>> Bioperl-db, as this would certainly be there. 
>>> >>> Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl >>> uses Bioperl to parse sequence files) does not extract the accession >>> number from the description line of the fasta sequence, and instead >>> sets the accession_number property if sequence objects it creates to >>> "unknown". Since there is a unique key constraint on >>> (accession,version,namespace) the second sequence loaded will raise >>> an exception as it will violate the constraint. >>> >>> The simplest way to deal with this is to write a SeqProcessor that >>> massages the accession_number appropriately and then supply the >>> module to load_seqdatabase.pl using the --pipeline command line >>> switch. >>> >>> There are several examples for how to do this in the email archives. >>> See for example this thread on the Biosql list: >>> >>> http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html >>> >>> with two links to examples, and Marc Logghe gives another one in the >>> thread itself. >>> >>> Hth, >>> >>> -hilmar >>> >>> On Jan 8, 2007, at 3:17 PM, George Heller wrote: >>> >>>> Hi all. >>>> >>>> I am new to Bioperl and am trying to run the load_seqdatabase.pl >>>> script to load sequence data from a file into Postgres database. 
I >>>> am invoking the script through the following command: >>>> >>>> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format >>>> fasta >>>> -dbuser postgres -driver Pg >>>> >>>> I am getting the following error: >>>> >>>> -------------------- WARNING --------------------- >>>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >>>> were ("FGENES >>>> HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| >> 4400| >>>> 1","unknown" >>>> ,"","0","") FKs (1,) >>>> ERROR: duplicate key violates unique constraint >>>> "bioentry_accession_key" >>>> --------------------------------------------------- >>>> Could not store unknown: >>>> ------------- EXCEPTION ------------- >>>> MSG: error while executing statement in >>>> Bio::DB::BioSQL::SeqAdaptor::find_by_uni >>>> que_key: ERROR: current transaction is aborted, commands ignored >>>> until end of t >>>> ransaction block >>>> STACK >>>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ >>>> lib/perl >>>> 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 >>>> STACK >> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >>>> usr/lib/perl5 >>>> /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ >>>> perl5/site_perl/5 >>>> .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ >> perl5/ >>>> site_perl/5. >>>> 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>>> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ >>>> site_perl/5.8. >>>> 5/Bio/DB/Persistent/PersistentObject.pm:271 >>>> STACK (eval) load_seqdatabase.pl:620 >>>> STACK toplevel load_seqdatabase.pl:602 >>>> -------------------------------------- >>>> at load_seqdatabase.pl line 633 >>>> >>>> Can anyone tell me how I can correct this error and get my script >>>> running? Thanks!!! >>>> >>>> George. 
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Sun Jan 28 23:50:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 28 Jan 2007 22:50:15 -0600
Subject: [Bioperl-l] DocBook docs, was Re: Downloading the tutorial for offline reading
In-Reply-To:
References:
Message-ID: <6B743BF0-1C8C-410F-83D8-1EE2074348E9@uiuc.edu>

Brian,

Agreed. I also wish there were a way of syncing specific docs (HOWTOs, FAQ, INSTALL, etc.) between CVS and the wiki, so maybe going back to DocBook (or similar) for those docs is the best answer. A quick Google search turns up a few MediaWiki-to-DocBook conversion tools, including this one:

http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php

Do you think that would be best?

chris

On Jan 28, 2007, at 10:35 AM, Brian Osborne wrote:

> Chris,
>
> I wasn't perfectly clear. There is value in having PDFs within the
> distribution itself, like we had when all the HOWTOs were in DocBook.
> That was the single best attribute of DocBook: the ability to create
> PDF, HTML, and text versions by running a script.
>
> Brian O.

...

From Anthony.Underwood at hpa.org.uk Mon Jan 29 06:29:16 2007
From: Anthony.Underwood at hpa.org.uk (Anthony Underwood)
Date: Mon, 29 Jan 2007 11:29:16 -0000
Subject: [Bioperl-l] Bug in the Bio::SeqIO.scf.pm code
Message-ID: <69E2D2428BD6C2429B8944FEC53B1EB917537C@colhpaexc004.HPA.org.uk>

Hi All,

Has anybody else come across this problem?

I have some scf files that fail to be read by the next_seq method within scf.pm because of line 226:

    my $name_comment = $name_comments[0]->as_text();

Error: can not call method as_text on undefined object.
It appears that if the scf file has no comments, the array @name_comments created in the previous line is empty, so $name_comments[0] is undefined.

I overcame this by replacing line 226 with the following lines:

    my $name_comment;
    if (@name_comments) {
        $name_comment = $name_comments[0]->as_text();
        $name_comment =~ s/^Comment:\s+//;
    }

Should this be incorporated into the code? Many thanks,

Anthony

From cjfields at uiuc.edu Mon Jan 29 08:21:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 29 Jan 2007 07:21:51 -0600
Subject: [Bioperl-l] Bug in the Bio::SeqIO.scf.pm code
In-Reply-To: <69E2D2428BD6C2429B8944FEC53B1EB917537C@colhpaexc004.HPA.org.uk>
References: <69E2D2428BD6C2429B8944FEC53B1EB917537C@colhpaexc004.HPA.org.uk>
Message-ID: <40DF7C1B-0536-436C-B6BB-7A68D4048C2E@uiuc.edu>

Could you add a test file for the bug to the bug report, or send it to me? If you send it, make sure not to CC the entire mailing list. I'll commit this to CVS once we have a test in place.
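A minimal, self-contained sketch of the guard Anthony describes, written so it runs without BioPerl. `Demo::Comment` and `first_comment_text` are hypothetical names used only for illustration; the real comment objects in scf.pm are assumed here to expose nothing beyond an as_text() method, which is the only call the original line relies on.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a comment object (assumption: not the
# BioPerl class). Only as_text() matters for this sketch.
package Demo::Comment;
sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
sub as_text { return $_[0]{text} }

package main;

# Return the cleaned text of the first comment, or undef when the
# list is empty, instead of calling a method on an undefined value.
sub first_comment_text {
    my @name_comments = @_;
    return unless @name_comments;      # nothing to read: no method call
    my $text = $name_comments[0]->as_text();
    $text =~ s/^Comment:\s+//;         # strip the leading label
    return $text;
}

my $with    = first_comment_text(Demo::Comment->new('Comment: sample_01'));
my $without = first_comment_text();
printf "%s\n", defined $with    ? $with    : '(none)';
printf "%s\n", defined $without ? $without : '(none)';
```

A file with no comments then yields an undefined name comment rather than a fatal error, which is the behaviour the proposed patch aims for.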
chris

On Jan 29, 2007, at 5:29 AM, Anthony Underwood wrote:

> Hi All,
>
> Has anybody else come across this problem:
>
> I have some scf files that fail to be read by the next_seq method
> within scf.pm due to line 226:
>
> my $name_comment = $name_comments[0]->as_text();
>
> Error: can not call method as_text on undefined object.
>
> It appears that if the scf file has no comments then the array
> @name_comments created in the previous line is empty.
>
> I overcame this by replacing line 226 with the following lines:
>
> my $name_comment;
> if (@name_comments){
>     $name_comment = $name_comments[0]->as_text();
>     $name_comment =~ s/^Comment:\s+//;
> }
>
> Should this be incorporated into the code? Many thanks,
>
> Anthony

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From avilella at gmail.com Mon Jan 29 12:18:01 2007
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 29 Jan 2007 17:18:01 +0000
Subject: [Bioperl-l] Hyphy::REL code example
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582CBBD61F@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582CBBD61F@icex5.ic.ac.uk>
Message-ID: <358f4d650701290918p1c30dc55la9a877a5e8083ed7@mail.gmail.com>

I think it should work if you change the occurrences of "nh" to "newick". I don't know why I mistakenly wrote the synopsis with 'nh' and the test with the correct 'newick', but that should do the trick :)

On 1/29/07, Johri, Saurabh wrote:
>
> Dear Albert,
>
> I'm trying to use Bioperl to run the REL analysis within Hyphy and have
> used the example code which you have provided, i.e.
> http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Phylo/Hyphy/REL.html#libs
>
> Unfortunately, I am experiencing problems running this code: I receive
> an exception for the tree file, and the error tells me
> "Can't locate Bio/TreeIO/nh.pm in @INC".
> I have browsed CPAN and searched Google for the nh.pm module, but I'm
> unable to find it.
>
> Could you tell me where I can get it?
>
> Thank you for your help,
>
> Saurabh
>
> Saurabh Johri
> PhD Candidate
> Centre For Molecular Microbiology & Infection
> Imperial College London
> London
> SW7 2AZ

From MEC at stowers-institute.org Mon Jan 29 16:17:24 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 29 Jan 2007 15:17:24 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and annotations
Message-ID:

Lincoln,

Thanks for your suggestions on how to approach my problems augmenting Flybase annotation. I am trying to follow them and am finding the following oddities.

The first issue relates to the intermix of 'annotations' and 'tag values'.
I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods.

Here is a perl one-liner that shows values stored using add_tag_value are not retrieved with get_tag_values, but rather with get_Annotations.

> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'

whose output is:

get_tag_values:
get_Annotations:        666

Tracing this shows me that it results from the following. Bio::DB::SeqFeature makes use of Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature), which does not support -tags in ->new but rather -attributes, viz:

    -attributes    a hashref of tag value attributes, in which the key
                   is the tag and the value is an array reference of
                   values

And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI). In particular, the Bio::SeqFeatureI methods marked '*' below are not implemented in Bio::Graphics::FeatureBase:

      has_tag
    * add_tag_value
      get_tag_values
      get_all_tags
    * remove_tag
      get_tagset_values
      get_Annotations

As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same!

This one-liner:

> perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}'

confirms that they are defined in different packages, namely:

add_tag_value:  Bio::AnnotatableI
get_tag_values: Bio::Graphics::FeatureBase      Bio::AnnotatableI

Proposed solution... hmmmm ..... I dunno....
maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value:

    sub add_tag_value {
        my ($self, $tag, @vals) = @_;
        push @{$self->{attributes}{$tag}}, @vals;
    }

It fixes my use case for now, but I'm still concerned and confused about this variety of methods. Suggestions?

-------------------------------------------------------------------------

Also, I think that any "ID" in column 9 of a GFF3 flat file should be preserved through a round-trip through a Bio::DB::SeqFeature store. This is not yet possible, since any ID attribute in GFF3 column 9 is lost by GFF3Loader, so I have locally patched the GFF3Loader::handle_feature method to add the following:

    # mec at stowers-institute.org , wondering why not all attributes are
    # carried forward, adds ID tag in particular service of
    # round-tripping ID, which, though present in database as load_id
    # attribute, was getting lost as itself
    $unreserved->{ID} = $reserved->{ID} if exists $reserved->{ID};

Poised to patch.... what d'you think?

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Tuesday, December 19, 2006 3:58 PM
To: Cook, Malcolm
Cc: bioperl list; lstein at cshl.org
Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation

Hi Malcom,

Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for.
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

    my @genes = $db->get_features_by_name('FBgn0017545');
    @genes == 1 or die "Didn't get exactly one gene";

    my $parent = $genes[0];
    my $chr    = $parent->seq_id;
    my $start  = $parent->start;
    my $end    = $parent->end;
    my $strand = $parent->strand;

    my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                           -source      => 'added',
                                           -seq_id      => '4r',
                                           -strand      => $strand,
                                           -start       => $start+10,
                                           -end         => $end,
                                          );
    $parent->add_SeqFeature($new_splice_form);

    for my $pos ([$start+10,$start+100],[$start+200,$end]) {
        my ($e_start,$e_end) = @$pos;
        my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                            -store       => $db,
                                            -seq_id      => '4r',
                                            -strand      => $strand,
                                            -start       => $e_start,
                                            -end         => $e_end);
        $new_splice_form->add_SeqFeature($exon);
    }

I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
Lincoln

On 12/19/06, Cook, Malcolm wrote:

Lincoln and fellow Bio::DB::SeqFeature travelers,

I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with

------------- EXCEPTION -------------
MSG: FBgn0017545 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

where FBgn0017545 is the ID of a gene previously loaded.

I am unsure how to remedy my situation and welcome any advice on a correct or improved approach to my problem.

Here's some detail if it helps.

I am developing a pipeline to design microarray probes capable of distinguishing among splice variants in drosophila (using the latest Flybase dmel_r5.1 annotation).

So I

1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (For testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached.) This is done as follows:

> bp_seqfeature_load.PLS --create FBgn0017545.gff

2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets.
Output of this analysis, FBgn0017545_matd.gff, is also attached.)

3) load these analysis results into the same database, as follows:

> bp_seqfeature_load.PLS FBgn0017545_matd.gff

It is at this point that I get the above error.

However, I don't get any error and the data loads fine if I load the two files together, as follows:

> bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS, or else this use case has not yet arisen and must be provided for somehow.

I am running against bioperl-live.

Thanks for your thoughts and assistance,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From lzhtom at hotmail.com Mon Jan 29 21:23:49 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 30 Jan 2007 02:23:49 +0000
Subject: [Bioperl-l] "jump" to the right sequence in a SeqIO object
Message-ID:

hi netters,

suppose I read a multi-fasta file into a SeqIO object:

    my $seqinput = Bio::SeqIO->new(-file => '~/file.txt', -format => 'Fasta');

This file contains 100 fasta sequences. Right now I only want to deal with two of them, with the display IDs "X" and "Y" respectively. Normally what I'd do is write a loop to "capture" these two:

    my ($X, $Y);
    while (my $seq = $seqinput->next_seq) {
        if ($seq->id eq "X") { $X = $seq; }
        if ($seq->id eq "Y") { $Y = $seq; }
    }

My question is, is there some other way to do that? Is it possible to locate the two sequences that have the display IDs "X" and "Y" directly, without looping through the whole list?

Thanks a lot!
From torsten.seemann at infotech.monash.edu.au Mon Jan 29 22:12:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 Jan 2007 14:12:40 +1100
Subject: [Bioperl-l] "jump" to the right sequence in a SeqIO object
In-Reply-To:
References:
Message-ID:

> This file contains 100 fasta sequences. Right now I only want to deal with
> two of them, with the display ID "X" and "Y" respectively.
> My question is, is there some other way to do that? Is it possible to
> locate the two sequences that have the display ID as "X" and "Y" directly,
> without looping through the whole list?

http://www.bioperl.org/wiki/Bptutorial.pl#Indexing_and_accessing_local_databases_Bio::Index::.2A.2C_bp_index.pl.2C_bp_fetch.pl.2C_Bio::DB::.2A.29
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
http://www.bioperl.org/wiki/FAQ#How_do_I_use_Bio::Index::Fasta_and_index_on_different_ids.3F

--Torsten

From cjfields at uiuc.edu Mon Jan 29 22:18:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 29 Jan 2007 21:18:53 -0600
Subject: [Bioperl-l] "jump" to the right sequence in a SeqIO object
In-Reply-To:
References:
Message-ID:

On Jan 29, 2007, at 8:23 PM, zhihua li wrote:

> hi netters,
>
> suppose I read in a multi-fasta file to a SeqIO object:
>
> my $seqinput=Bio::SeqIO->new(-file=>'~/file.txt',
>                              -format=>'Fasta');
>
> This file contains 100 fasta sequences. Right now I only want to
> deal with two of them, with the display ID "X" and "Y" respectively.
> Normally what I'd do is to write a loop to "capture" these two:
> my ($X, $Y);
> while (my $seq = $seqinput->next_seq) {
>     if ($seq->id eq "X") { $X = $seq; }
>     if ($seq->id eq "Y") { $Y = $seq; }
> }
>
> My question is, is there some other way to do that? Is it possible
> to locate the two sequences that have the display ID as "X" and "Y"
> directly, without looping through the whole list?

There are several flat database implementations which would do what you want.
Look at the POD for Bio::DB::Fasta, Bio::Index::Fasta, as well as the FAQ, the tutorial, and the Flat Database HOWTO, all of which have more information.

chris

From bosborne11 at verizon.net Mon Jan 29 22:56:26 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 29 Jan 2007 22:56:26 -0500
Subject: [Bioperl-l] "jump" to the right sequence in a SeqIO object
In-Reply-To:
Message-ID:

And http://www.bioperl.org/wiki/HOWTO:Beginners#Indexing_for_Fast_Retrieval.

On 1/29/07 10:12 PM, "Torsten Seemann" wrote:

>> This file contains 100 fasta sequences. Right now I only want to deal with
>> two of them, with the display ID "X" and "Y" respectively.
>> My question is, is there some other way to do that? Is it possible to
>> locate the two sequences that have the display ID as "X" and "Y" directly,
>> without looping through the whole list?
>
> http://www.bioperl.org/wiki/Bptutorial.pl#Indexing_and_accessing_local_databases_Bio::Index::.2A.2C_bp_index.pl.2C_bp_fetch.pl.2C_Bio::DB::.2A.29
> http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
> http://www.bioperl.org/wiki/FAQ#How_do_I_use_Bio::Index::Fasta_and_index_on_different_ids.3F
>
> --Torsten

From JK at novozymes.com Tue Jan 30 02:45:34 2007
From: JK at novozymes.com (JK (Jesper Agerbo Krogh))
Date: Tue, 30 Jan 2007 08:45:34 +0100
Subject: [Bioperl-l] CON(structed) sequence databases?
Message-ID: <934F95E71B6C9347A873C42AE3C196191489C800@NZT0004E.dknz.nzcorp.net>

Hi.

What do you do about parsing sequences from the "CON" divisions of EMBL/Genbank?

The entries look just like this one:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=CH445337&doptcmdl=GenBank

The bioperl 1.4 parser dies on the EMBL version, and the 1.5 parser treats the complete .dat file as a single entry.

Thanks.
Jesper

From george.heller at yahoo.com Tue Jan 30 12:32:09 2007
From: george.heller at yahoo.com (George Heller)
Date: Tue, 30 Jan 2007 09:32:09 -0800 (PST)
Subject: [Bioperl-l] Error while running load_seqdatabase.pl
In-Reply-To:
Message-ID: <817961.76470.qm@web58904.mail.re1.yahoo.com>

Hi Hilmar,

I have been trying to get around this problem for some days now, but haven't had much luck. My file has about 12000 records, and when I load it with the pipeline package below, only about 483 records get loaded.

    sub process_seq
    {
        my ($self, $seq) = @_;
        # $seq->accession_number($seq->display_id);
        my @ids = split(/\|/,$seq->display_id);
        $seq->accession_number($ids[0]);
        $seq->primary_id($ids[2]);
        return ($seq);
    }

I am assuming this is because I use $ids[2] for the primary_id, and possibly the file has only 483 unique values for that field. When I use some other field such as $ids[3] (because it has more unique values), I get the same error about the length being more than 40.

I printed out the values for the display_id, primary_id and accession_number. The display_id has the entire first line of the record, and so does the primary_id if I don't set it in my new script. The accession number is split correctly and printed.
This is what the first few lines of my file look like:

>FGENESHT0000001||AC155633|570|4400|1
MTERKRKEIEDRKRKISGPQPGSSNRPRFSGNQPQQFRQNQRPPQQHQQFQRQYPQHQYQNRQSNQSGGQFQRQNQQAPR
LPAPAAQQNSQATPAQVGNRACFHCGEQGHWVMQCPKKAAQQQSGPNAPAKQNVPQPRAGNRSQPRYNHGRLNHLEAEAV
QETPSMIVGMFPVDSHIAEVLFDTGATHSFITASWVEAHNLPITTMSTPIQIDSAGGRIRADSICLNICVEIRGIAFPAN
LIVMGTQGIDVILGMNWLDKYQAVISCDKRTIKLMSPLGEEVVTELVPPEPKRGSCYQLAVDSSEVDPIESIRVVSEFPD
VFPKDLPGMPPERKVEFAIELLPGTAPIFKRAYRISGPELVELKEQIDELSEKGYIRPSTSPWAAPVLFVEKKDGTKRMC
IDYRALNEVTIKNKYPLPRIEDLFDQLRGASVFSKIDLRSAFFMNLMNSVFMDYLDKFVVVFIDDILVYSQSEEEHADHL
KMVLQRLREHQLYAKLSKCEFWINEVLFLGHIINKEGLAVDPKKVANILNWKAPTDARGIKSFIGMVGYYRRFIEGFSKI
AKPMTALLGNKVEFKWTQKCQEAFEALKEKLTIAPVLVLPDVHKPFSVYCDACYTGLGCVLMQEGRVVAYSSRQLKVHEK
NYPIHDLELAAVVHALKTWRHYLYGQKCDVYTDHKSLKYIFTQSELNMRQRRWLELIKDYELEIHYHPGKANVVADALSR
KSQVNLMVARPMPYELAKEFDRLSLGFLNNSRGVTVELEPTLEREIKEAQKNDEKISEIRRLILDGRGKDFREDAEGVIW
FKDRLCVPNVQSIRELILKEAHETAYSIHPGSEKMYQDLKKKFWWYGMKREIAEHVAMCDSCRRIKAEHQRPAGLLQPLQ
IPQWKWDEIGMDFIVGLPRTRAGYDSIWVVVDRLTKSAHFIPVKTNYSSAVLAELYMSRIVCLHGVPKKIVSDRGTQFTS
HFWRQLHEALGTHLNFSSAYHPQTDGQTERTNQILEDMLRACALQDQSGWDKRLPYAEFSYNNSYQASLKMSPFQALYGR
SCRTPLQWDQPGEKQVFGPDILLEAEENIKMVRENLKIAQSRQRSYADTRRRELSFEVGDFVYLKVSPIRGVKRFGVKGK
LAPRYIGSYQILARRGEVAYQLSLPENLSAVHDVFHVSQLKKCLRVPEEQLPVEGLEVQEDLTYVEKPVQILEVADRVTR
RKTIRMCKVRWNHHSEEEATSEREDDLMAKYPELFASQP*

Any suggestions?

Thanks!
George.

Hilmar Lapp wrote:

That's odd indeed. Did you try putting a print statement before the return statement that proves that 1) the code gets executed, and 2) display_id(), primary_id() and accession_number() have the expected values?

BTW you might also want to set primary_id() to undef (as the ID found in the FASTA file doesn't really count as a primary database-specific ID anyway). The identifier column in bioentry (which primary_id() maps to) is constrained to 40 chars as well.
-hilmar On Jan 27, 2007, at 9:31 PM, George Heller wrote: > Hi Hilmar, > > I tried the lookup and noupdate options, and also made changes to > the SeqProcessor.pm package for the accession, > package SeqProcessor::Accession; > use strict; > use vars qw(@ISA); > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > @ISA = qw(Bio::Seq::BaseSeqProcessor); > sub process_seq > { > my ($self, $seq) = @_; > my @ids = split(/|/,$seq->display_id); > $seq->accession_number($ids[0]); > return ($seq); > } > 1; > I invoke the load_seqdatabase.pl as, > perl load_seqdatabase.pl -host localhost -dbname usda-06 -format > fasta -dbuser postgres -driver Pg --lookup --noupdate -- > pipeline="SeqProcessor::Accession" maize_pep.fasta > Loading maize_pep.fasta ... > I get the error, > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were ("FGENESHT0000021||AC155633|113788| > 114708|-1","FGENESHT0000021||AC155633|113788| > 114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") > FKs (1,) > ERROR: value too long for type character varying(40) > --------------------------------------------------- > Could not store FGENESHT0000021||AC155633|113788|114708|-1: > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > transaction is aborted, commands ignored until end of transaction > block > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 
> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ > perl//Bio/DB/Persistent/PersistentObject.pm:272 > STACK (eval) load_seqdatabase.pl:620 > STACK toplevel load_seqdatabase.pl:602 > -------------------------------------- > at load_seqdatabase.pl line 633 > As far as I gather, this error shouldnt appear as we are > filtering out the accession as only the first code that appears. > Ideas? > > George. > > > Hilmar Lapp wrote: > George, > > I don't know you create the FASTA file, but that's probably where the > root cause is. Based on the message: > >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| >> 1","","0","") FKs (1,) >> ERROR: duplicate key violates unique constraint >> "bioentry_accession_key" >> --------------------------------------------------- > > the identifier and accession number are set, so your SeqProcessor > scriptlet was executed (otherwise you'd also have seen a dynamic > loading error it e.g. you perl class could be not be found or loaded > by perl). If you still receive the duplicate key violation, then it > can only mean that indeed a sequence with the exact same accession > number was in the database already. > > There are different possibilities for why: you may have loaded the > same file before (use --lookup and related switches if you want to > update existing sequences), or your FASTA file contains multiple > sequences with the same ID, or you have a sequence with the same ID > in different FASTA files, if you are loading from more than one file. > In either of the two latter cases, you will need to find a way to > disambiguate the IDs. > > BTW you also want to consider to parse the concatenated ID > 'FGENESHT0000001||AC155633|570|4400|1' apart into its component IDs, > and then use only one component. 
For example: > > my @ids = split(/|/,$seq->display_id); > $seq->accession_number($ids[0]); > > Obviously, this will only make for a nicer accession number, and not > solve your duplicate ID problem, as the latter is in the file(s) you > load. > > -hilmar > > On Jan 25, 2007, at 8:51 PM, George Heller wrote: > >> Hi Hilmar, >> >> I still seem to be having problems loading my fasta file. I wrote a >> new package, SeqProcessor.pm as below, >> >> package SeqProcessor::Accession; >> use strict; >> use vars qw(@ISA); >> use Bio::Seq::BaseSeqProcessor; >> use Bio::SeqFeature::Generic; >> @ISA = qw(Bio::Seq::BaseSeqProcessor); >> sub process_seq >> { >> my ($self, $seq) = @_; >> $seq->accession_number($seq->display_id); >> return ($seq); >> } >> 1; >> I have this file SeqProcessor.pm in my home directory, and I have >> set the PERL5LIB variable accordingly. When I run >> load_seqdatabase.pl, >> >> perl load_seqdatabase.pl -host localhost -dbname biodb -format >> fasta -dbuser postgres -driver Pg -- >> pipeline="SeqProcessor::Accession" maize_pep.fasta >> >> I still get the error, >> >> Loading maize_pep.fasta ... 
>> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| >> 1","","0","") FKs (1,) >> ERROR: duplicate key violates unique constraint >> "bioentry_accession_key" >> --------------------------------------------------- >> Could not store FGENESHT0000001||AC155633|570|4400|1: >> ------------- EXCEPTION ------------- >> MSG: error while executing statement in >> Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current >> transaction is aborted, commands ignored until end of transaction >> block >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 >> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ >> perl//Bio/DB/Persistent/PersistentObject.pm:272 >> STACK (eval) load_seqdatabase.pl:620 >> STACK toplevel load_seqdatabase.pl:602 >> -------------------------------------- >> at load_seqdatabase.pl line 633 >> Is there something I am missing? >> >> Thanks! >> George. >> >> >> Hilmar Lapp wrote: >> Hi George, sorry for the sluggish response, I was tied up during the >> week. This is also why you always want to keep the thread on the >> list. >> >> Perl is an interpreted language, so no compilation is necessary. The >> only thing you need to do is have the package in a place where perl >> can find it. 
The simplest way to achieve this is by setting the >> PERL5LIB environment variable: >> >> $ export PERL5LIB=/where/you/put/your/perl/package >> >> or if PERL5LIB was set already, you'd append it: >> >> $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package >> >> I do assume that you didn't really add your code to the SeqAdaptor.pm >> package - there is no necessity for nor benefit from that, and at >> worst (and quite likely) perl won't be able to find the package. Note >> that there is plenty of documentation for how to write packages for >> perl and how to make them accessible to perl. >> >> Hth, >> >> -hilmar >> >> On Jan 8, 2007, at 11:52 PM, George Heller wrote: >> >>> Hi Hilmer. >>> >>> Thanks so much for the response. As I am new to Bioperl, I have >>> another question. >>> >>> I have made the changes as suggested by you, and have added the >>> code below to the SeqAdaptor.pm script. >>> >>> package SeqProcessor::Accession; >>> use strict; >>> use vars qw(@ISA); >>> use Bio::Seq::BaseSeqProcessor; >>> use Bio::SeqFeature::Generic; >>> >>> @ISA = qw(Bio::Seq::BaseSeqProcessor); >>> >>> sub process_seq >>> { >>> my ($self, $seq) = @_; >>> $seq->accession_number($seq->display_id); >>> return ($seq); >>> } >>> >>> Now that I have done my changes, do I need to compile or something >>> for the changes to reflect? If so, can you please let me know the >>> command for the same, or direct me to any lin that has >>> documentation for the same? >>> >>> Thanks so much for the help. >>> George. >>> >>> Hilmar Lapp wrote: >>> George, >>> >>> this is almost certainly caused by using FASTA format and bioperl's >>> treatment of it. I am guilty of not having written a FAQ yet for >>> Bioperl-db, as this would certainly be there. 
>>> >>> Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl >>> uses Bioperl to parse sequence files) does not extract the accession >>> number from the description line of the fasta sequence, and instead >>> sets the accession_number property if sequence objects it creates to >>> "unknown". Since there is a unique key constraint on >>> (accession,version,namespace) the second sequence loaded will raise >>> an exception as it will violate the constraint. >>> >>> The simplest way to deal with this is to write a SeqProcessor that >>> massages the accession_number appropriately and then supply the >>> module to load_seqdatabase.pl using the --pipeline command line >>> switch. >>> >>> There are several examples for how to do this in the email archives. >>> See for example this thread on the Biosql list: >>> >>> http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html >>> >>> with two links to examples, and Marc Logghe gives another one in the >>> thread itself. >>> >>> Hth, >>> >>> -hilmar >>> >>> On Jan 8, 2007, at 3:17 PM, George Heller wrote: >>> >>>> Hi all. >>>> >>>> I am new to Bioperl and am trying to run the load_seqdatabase.pl >>>> script to load sequence data from a file into Postgres database. 
I >>>> am invoking the script through the following command: >>>> >>>> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format >>>> fasta >>>> -dbuser postgres -driver Pg >>>> >>>> I am getting the following error: >>>> >>>> -------------------- WARNING --------------------- >>>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >>>> were ("FGENES >>>> HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| >> 4400| >>>> 1","unknown" >>>> ,"","0","") FKs (1,) >>>> ERROR: duplicate key violates unique constraint >>>> "bioentry_accession_key" >>>> --------------------------------------------------- >>>> Could not store unknown: >>>> ------------- EXCEPTION ------------- >>>> MSG: error while executing statement in >>>> Bio::DB::BioSQL::SeqAdaptor::find_by_uni >>>> que_key: ERROR: current transaction is aborted, commands ignored >>>> until end of t >>>> ransaction block >>>> STACK >>>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ >>>> lib/perl >>>> 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 >>>> STACK >> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >>>> usr/lib/perl5 >>>> /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ >>>> perl5/site_perl/5 >>>> .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ >> perl5/ >>>> site_perl/5. >>>> 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>>> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ >>>> site_perl/5.8. >>>> 5/Bio/DB/Persistent/PersistentObject.pm:271 >>>> STACK (eval) load_seqdatabase.pl:620 >>>> STACK toplevel load_seqdatabase.pl:602 >>>> -------------------------------------- >>>> at load_seqdatabase.pl line 633 >>>> >>>> Can anyone tell me how I can correct this error and get my script >>>> running? Thanks!!! >>>> >>>> George. 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================
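The thread above traces the load failure to the unique key on (accession, version, namespace): once a SeqProcessor copies display_id into accession_number, every FASTA identifier in the file has to be unique. A minimal sketch of a pre-load check, assuming the identifier is the first whitespace-delimited token after '>' on each header line. The function name dup_fasta_ids is hypothetical and is not part of Bioperl or bioperl-db.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bioperl or bioperl-db): print every FASTA
# identifier that occurs more than once in the file given as $1.
# Assumption: the identifier is the first whitespace-delimited token that
# follows '>' on a header line, with no space between '>' and the token.
dup_fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort | uniq -d
}
```

Running dup_fasta_ids on the input file before load_seqdatabase.pl should print nothing if the identifiers are safe to use as accessions; any identifier it prints would collide on the constraint described above.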
From cjfields at uiuc.edu Tue Jan 30 14:47:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 Jan 2007 13:47:55 -0600
Subject: [Bioperl-l] BLASTXML XML::SAX::Expat issues resolved
Message-ID: <42BC1FFD-BAF4-43E4-84B8-987081C83B9E@uiuc.edu>

Sendu, all,

Björn Höhrmann updated XML::SAX::Expat to v0.38 on CPAN to fix a bug that was killing BLAST XML parsing, so XML::SAX::Expat now works with BLASTXML parsing. However, it still has some problems with character encoding. This bug only seems to affect the output from description () due to '>'. Beyond that, every backend parser (XML::SAX::Expat, XML::SAX::ExpatXS) works and passes tests, even with XML::SAX character encoding issues. I am still waiting to hear back from Grant McLean on progress in fixing that particular XML::SAX bug.

Based on that I think we can remove any absolute ExpatXS requirement (we can just require XML::SAX), though I still strongly recommend using XML::SAX::ExpatXS or XML::LibXML::SAX::Parser. A warning is in place to warn users about entities issues if one uses an affected XML::SAX backend (XML::SAX::PurePerl, XML::LibXML::SAX, XML::SAX::Expat).

cheers!

chris

From bosborne11 at verizon.net Tue Jan 30 16:59:59 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 30 Jan 2007 16:59:59 -0500
Subject: [Bioperl-l] DocBook docs, was Re: Downloading the tutorial for offline reading
In-Reply-To: <6B743BF0-1C8C-410F-83D8-1EE2074348E9@uiuc.edu>
Message-ID:

Chris,

My recommendation would be to not use Docbook, for a couple of reasons. One is that very few people can stand writing in Docbook XML: you have to learn all the tags and the act of writing itself is slow. You won't get much written if people have to use this format. WYSIWYG? Don't know, but it would have to be an app that works on the troika of Linux, Win, and Mac.

Second problem is the conversion itself, Docbook to PDF, HTML, and text. Now, you don't have to convert to all of these formats of course, perhaps only PDF.
In this case you probably have to do Docbook -> fo -> PDF. I can tell you that setting up all the Java applications was a true PITA, all praise to CPAN for providing a means of installing multiple packages and tracking version dependencies simultaneously.

However, if you insist I've given you a sense of what I did with the shell script below. Hint: you must also hack the XSL files provided by e-novative. The reason I used them is because the resulting PDF and HTML is very pretty.

A qualification here: all my knowledge of Docbook is a bit dated, I threw all those *jar's out when the Wiki was set up. There _must_ be a better way, I think it's based on Wiki and something like html2pdf.

Brian O.

On 1/28/07 11:50 PM, "Chris Fields" wrote:

> so maybe going
> back to having DocBook (or similar) for those docs is the best
> answer

setenv HTML_STYLE '~/bioperl-live/doc/howto/xml/stylesheet/e-novative_article_html.xsl'
setenv PDF_STYLE '~/bioperl-live/doc/howto/xml/stylesheet/e-novative_article_fo.xsl'
setenv XML_HOME '~/bioperl-live/doc/howto/xml'
setenv SAXON_JAR '/usr/local/saxon6-5-4/saxon.jar'
setenv JAVA '/usr/bin/java'
setenv LYNX '/sw/bin/lynx'
setenv XEP_JAR '/usr/local/RenderX/XEP/lib/saxon.jar'
setenv XEP '/usr/local/RenderX/XEP/xep'

foreach XML ($XML_HOME/[A-Z]*.xml)

    setenv HTML `echo $XML | sed 's/\.xml/\.html/'`
    # Create HTML file
    echo "Creating $HTML..."
    $JAVA -jar $SAXON_JAR -o $HTML $XML $HTML_STYLE use.extensions=1 tablecolumns.extension=0

    setenv TXT `echo $XML | sed 's/\.xml/\.txt/'`
    # Create text from HTML file
    echo "Creating $TXT..."
    $LYNX -dump $HTML > $TXT

    setenv FO `echo $XML | sed 's/\.xml/\.fo/'`
    # Create FO file
    echo "Creating $FO..."
    $JAVA -jar $XEP_JAR -o $FO $XML $PDF_STYLE

    setenv PDF `echo $XML | sed 's/\.xml/\.pdf/'`
    # Create PDF
    echo "Creating $PDF..."
    $XEP -fo $FO -pdf $PDF

end

From lstein at cshl.edu Tue Jan 30 17:17:50 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 30 Jan 2007 17:17:50 -0500
Subject: [Bioperl-l] to_FTstring() for Bio::Graphics::FeatureBase
In-Reply-To:
References:
Message-ID: <6dce9a0b0701301417r30efdd06ldae35099b463893e@mail.gmail.com>

Is there any reason that I shouldn't return complement(start..end) on minus strand features? I'm committing this change. Let me know if it breaks anything.

Lincoln

On 1/26/07, Jason Stajich wrote:
>
> Lincoln:
>
> Right now the code for Bio::Graphics::FeatureBase implementation of the
> LocationI interface method, to_FTstring is:
>
> sub to_FTstring {
>     my $self = shift;
>     my $low = $self->min_start;
>     my $high = $self->max_end;
>     return "$low..$high";
> }
>
> So strand is thrown away. Is it legitimate to modify this to return
> $high..$low when strand < 0?
>
> -jason
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Tue Jan 30 17:46:12 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 30 Jan 2007 17:46:12 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and annotations
In-Reply-To:
References:
Message-ID: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>

I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected.

The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive roundtripping.
Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g. ##round-trip-ids: 1 If you like this idea, I can implement it. Lincoln On 1/29/07, Cook, Malcolm wrote: > > Lincoln, > > Thanks for your suggestions on approach to my problems augmenting Flybase > annotation. I am trying to follow them and finding the following oddities > > The first issue relates to the intermix of 'annotations' and 'tag > values'. I find that Bio::DB::SeqFeature implements some of the 'tag' > methods and some of the 'Annotation' methods. Here is a perl one-liner that > shows values stored using add_tag_value are not retreived with > get_tag_values, but rather with get_Annotations. > > > perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; > $f->add_tag_value("x",666); print "get_tag_values:\t" . > $f->get_tag_values("x") . "\nget_Annotations:\t" . > $f->get_Annotations("x");' > > whose output is: > get_tag_values: > get_Annotations: 666 > > Tracing this shows me that this results from the fact that: > > Bio::DB::SeqFeature uses of Bio::Graphics::FeatureBase (via > Bio::DB::SeqFeature::NormalizedFeature) which does not support -tags in > ->new but rather -attributes, viz: > > -attributes a hashref of tag value attributes, in which the key is > the tag > and the value is an array reference of values > > And though Bio::Graphics::FeatureBase purports to implement > Bio::SeqFeatureI, it only partially implements the 'tag' methods (now > deprecated and relegated to Bio::AnnotatableI). In particular, the '*' > methods Bio::SeqFeatureI are not implemented in Bio::Graphics::FeatureBase > > > has_tag > * add_tag_value > get_tag_values > get_all_tags > * remove_tag > get_tagset_values > get_Annotations > > As a result, add_tag_value and remove_tag are inherited from different > modules whose understanding of tags is not the same! 
> > This one-liner : > > >perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = > Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn > qw(add_tag_value get_tag_values) {print "\n$fn:\t", join "\t", (grep > {Class::Inspector->function_exists($_, $fn)} @c)}' > > confirms that they are defined in different packages, namely: > > add_tag_value: Bio::AnnotatableI > get_tag_values: Bio::Graphics::FeatureBase Bio::AnnotatableI > Proposed solution... hmmmm ..... I dunno.... maybe the following patch > to Bio::Graphics::FeatureBase->add_tag_value : > > sub add_tag_value { > my ($self,$tag, at vals) = @_; > push @{$self->{attributes}{$tag}}, @vals; > } > > It fixes my use case for now but I'm still concerned and confused about > this variety of methods. > > Suggestions? > > > ------------------------------------------------------------------------- > > Also, I think that any "ID" in column 9 of GFF3 float file should be > preserved through a round-trip through a Bio::DB::SeqFeature store, but this > is not yet possible since any ID attribute in GFF3 column 9 is being lost > by GFF3Loader, causing me to locally patch GFF3Loader::handle_feature > method to add the following: > > # mec at stowers-institute.org, wondering why not all attributes are > # carried forward, adds ID tag in particular service of > # round-tripping ID, which, though present in database as load_id > # attribute, was getting lost as itself > $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID}; > > Poised to patch.... what d'you think? 
> > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > > > ------------------------------ > From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf > Of Lincoln Stein > Sent: Tuesday, December 19, 2006 3:58 PM > To: Cook, Malcolm > Cc: bioperl list; lstein at cshl.org > Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader > problems augmenting Flybase annotation > > Hi Malcom, > > Your second guess was right. The use case of augmenting an existing gene > with additional splice forms isn't provided for. You can get the > functionality by making direct calls to Bio::DB::SeqFeature::Store methods: > > my @genes = $db->get_features_by_name('FBgn0017545'); > @genes == 1 or die "Didn't get exactly one gene"; > my $parent = $genes[0]; > > my $parent = $genes[0]; > my $chr = $parent->seq_id; > my $start = $parent->start; > my $end = $parent->end; > my $strand = $parent->strand; > > my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA', > -source => 'added', > -seq_id => '4r', > -strand => $strand, > -start => $start+10, > -end => $end, > ); > $parent->add_SeqFeature($new_splice_form); > > for my $pos ([$start+10,$start+100],[$start+200,$end]) { > my ($e_start,$e_end) = @$pos; > my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon', > -store => $db, > -seq_id => '4r', > -strand => $strand, > -start => $e_start, > -end => $e_end); > $new_splice_form->add_SeqFeature($exon); > } > > I found a bug in updating the seqfeature database when I wrote this > script, so you'll have to get the latest biperl live. I think you can use > this to write a splice form updating script. > > In order to support the idea of adding new splice forms to an existing > gene using the GFF3 format, I will have to either modify the loader, or > write a separate script (probably better to do the latter). It shouldn't be > hard if you'd like to give it a try. 
> > Lincoln > > On 12/19/06, Cook, Malcolm wrote: > > > > Lincoln and fellow Bio::DB::SeqFeature travelers, > > > > I find that using bp_seqfeature_load.PLS to load subfeatures of genes > > already loaded using bp_seqfeature_load.PLS fails with > > > > ------------- EXCEPTION ------------- > > MSG: FBgn0017545 doesn't have a primary id > > STACK > > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > > STACK toplevel > > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo > > > > ad.PLS:76 > > > > Where FBgn0017545 is the ID of a gene previously loaded. > > > > I am unsure how to remedy my situation and welcome any advise on correct > > or improved approach to my problem. > > > > Here's some detail if it helps. I am developing a pipeline to design a > > microarray probes capable of distinguishing among splice variants in > > drosophila (using latest Flybase dmel_r5.1 annotation). So I > > > > 1) load a filtered selection of Flybase annotation using > > bp_seqfeature_load. (for testing purposes, I am using a single gene's > > worth of annotation, FBgn0017545.gff, attached). This is done as > > follows: > > > > > bp_seqfeature_load.PLS --create FBgn0017545.gff > > > > 2) analyze all the genes in the database, and create GFF3 output each > > feature of which has a 'Parent' that is a previously loaded gene (i.e. > > FBgn0017545). 
> > (These features represent the unique introns, splice
> > sites, and exonic design targets. Output of this analysis,
> > FBgn0017545_matd.gff, is also attached)
> >
> > 3) load these analysis results into the same database, as follows:
> >
> > > bp_seqfeature_load.PLS FBgn0017545_matd.gff
> >
> > It is at this point that I get the above error.
> >
> > However, I don't get any error and the data loads fine if I load the two
> > files together, as follows:
> >
> > > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff
> > FBgn0017545_matd.gff)
> >
> > So, I suspect that either I am misunderstanding when/how to use
> > bp_seqfeature_load.PLS or else this use case has not yet arisen and must
> > be provided for somehow.
> >
> > I am running against bioperl-live
> >
> > Thanks for your thoughts and assistance,
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From N.Haigh at sheffield.ac.uk Tue Jan 30 18:23:59 2007
From: N.Haigh at sheffield.ac.uk (Nathan Haigh)
Date: Tue, 30 Jan 2007 23:23:59 +0000
Subject: [Bioperl-l] ActivePerl 5.8.8.820 Release
Message-ID: <1170199439.45bfd38f8de49@webmail.shef.ac.uk>

I have just received news that ActivePerl 5.8.8.820 has been released. It fixes a few major bugs that we experienced when releasing Bioperl 1.5.2.

Without going into much detail - I'm on holiday :o) - here are a few of the pertinent fixes:

For users:
It should have fixed bugs relating to updating to the latest version of a module (from any repo) rather than just installing the one from ActiveState's repo. Therefore, you should be able to say something like "install bioperl" or "upgrade bioperl" and get the latest version available, rather than having to specifically tell it which version of a package to install.

For developers:
It should also have fixed a bug whereby we were forced to implement a redirect in the httpd.conf file. The bug resulted in PPM4 clients requesting modules from our repo rather than ActiveState's. The PPM GUI should now also support install and uninstall of modules with post-install and uninstall scripts.

For the full change log see:
http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/changes-58.html

I haven't been able to test these changes as I'm on holiday. But if these changes fix the problems we were having, we should think about bumping up the minimum requirement to ActivePerl 5.8.8.820 to save a lot of headaches.

Cheers
Nath

From MEC at stowers-institute.org Tue Jan 30 19:30:35 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 30 Jan 2007 18:30:35 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and annotations
Message-ID:

Hi Lincoln,

Thanks for the resolution of tag value methods. Your fixes work in my hands...

I never knew that "ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database".

I read in http://www.sequenceontology.org/gff3.shtml that

    ID  Indicates the name of the feature. IDs must be unique within the scope of the GFF file.

But that ID is solely for the purpose of linearizing the relationships among features within the scope of the GFF text file.
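The ID and Parent attributes under discussion live in column 9 of a GFF3 file. A minimal sketch for inspecting how they tie features together, assuming a tab-separated GFF3 file with semicolon-separated attributes. The function name gff3_id_parent is hypothetical and is not part of Bioperl.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bioperl): for each feature line of a
# tab-separated GFF3 file given as $1, print the feature type, its ID and
# its Parent, tab-separated, with "-" where an attribute is absent.
gff3_id_parent() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }   # skip directives, comments and short lines
        {
            id = "-"; parent = "-"
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
            }
            print $3 "\t" id "\t" parent
        }
    ' "$1"
}
```

Every value in the third output column should also appear in the second column of some other line; that is the sense in which IDs hold the feature hierarchy together within one file.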
Further, in that document, I see lots of examples where the ID is what is holding together the fabric of the feature relationships; IDs appear as the value of Target and Parent attributes.

I'd appreciate it if you can provide a motivating example where the ID= in a (real life) GFF3 file is in fact ONLY used for the purpose of linearizing the feature relationships.

Anyway, if it matters, the GFF-formatted genome I'm wallowing in these days from Flybase (dmel r5.1) presents their FlyBase IDs in the ID attribute, like this:

4 FlyBase gene 24068 25621 . + . ID=FBgn0040037;Name=CG17923;Dbxref=FlyBase:FBan0017923,FlyBase_Annotation_IDs:CG17923...
4 FlyBase mRNA 24068 25621 . + . ID=FBtr0089155;Name=CG17923-RA;Parent=FBgn0040037;Dbxref=FlyBase_Annotation_IDs:CG17923-RA;
4 FlyBase exon 24068 24477 . + . ID=CG17923:1;Name=CG17923:1;Parent=FBtr0089155

Switching to using 'Name' might work OK for my application. I'll look into it. It winds up being the same as the ID in some cases anyway....

Malcolm Cook

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Tuesday, January 30, 2007 4:46 PM
To: Cook, Malcolm
Cc: bioperl list; lstein at cshl.org
Subject: Re: Bio::DB::SeqFeature treamtent of tags and annotations

I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected.

The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive roundtripping. Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g.

##round-trip-ids: 1

If you like this idea, I can implement it.

Lincoln

On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org > wrote:

Lincoln,

Thanks for your suggestions on approach to my problems augmenting Flybase annotation.
I am trying to follow them and finding the following oddities The first issue relates to the intermix of 'annotations' and 'tag values'. I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods. Here is a perl one-liner that shows values stored using add_tag_value are not retreived with get_tag_values, but rather with get_Annotations. > perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");' whose output is: get_tag_values: get_Annotations: 666 Tracing this shows me that this results from the fact that: Bio::DB::SeqFeature uses of Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature) which does not support -tags in ->new but rather -attributes, viz: -attributes a hashref of tag value attributes, in which the key is the tag and the value is an array reference of values And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI). In particular, the '*' methods Bio::SeqFeatureI are not implemented in Bio::Graphics::FeatureBase has_tag * add_tag_value get_tag_values get_all_tags * remove_tag get_tagset_values get_Annotations As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same! This one-liner : >perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn qw(add_tag_value get_tag_values) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}' confirms that they are defined in different packages, namely: add_tag_value: Bio::AnnotatableI get_tag_values: Bio::Graphics::FeatureBase Bio::AnnotatableI Proposed solution... hmmmm ..... I dunno.... 
maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value : sub add_tag_value { my ($self,$tag, at vals) = @_; push @{$self->{attributes}{$tag}}, @vals; } It fixes my use case for now but I'm still concerned and confused about this variety of methods. Suggestions? ------------------------------------------------------------------------ - Also, I think that any "ID" in column 9 of GFF3 float file should be preserved through a round-trip through a Bio::DB::SeqFeature store, but this is not yet possible since any ID attribute in GFF3 column 9 is being lost by GFF3Loader, causing me to locally patch GFF3Loader::handle_feature method to add the following: # mec at stowers-institute.org , wondering why not all attributes are # carried forward, adds ID tag in particular service of # round-tripping ID, which, though present in database as load_id # attribute, was getting lost as itself $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID}; Poised to patch.... what d'you think? Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri ________________________________ From: lincoln.stein at gmail.com [mailto: lincoln.stein at gmail.com ] On Behalf Of Lincoln Stein Sent: Tuesday, December 19, 2006 3:58 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Hi Malcom, Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for. 
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods: my @genes = $db->get_features_by_name('FBgn0017545'); @genes == 1 or die "Didn't get exactly one gene"; my $parent = $genes[0]; my $parent = $genes[0]; my $chr = $parent->seq_id; my $start = $parent->start; my $end = $parent->end; my $strand = $parent->strand; my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA', -source => 'added', -seq_id => '4r', -strand => $strand, -start => $start+10, -end => $end, ); $parent->add_SeqFeature($new_splice_form); for my $pos ([$start+10,$start+100],[$start+200,$end]) { my ($e_start,$e_end) = @$pos; my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon', -store => $db, -seq_id => '4r', -strand => $strand, -start => $e_start, -end => $e_end); $new_splice_form->add_SeqFeature($exon); } I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest biperl live. I think you can use this to write a splice form updating script. In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try. 
Lincoln On 12/19/06, Cook, Malcolm > wrote: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with ------------- EXCEPTION ------------- MSG: FBgn0017545 doesn't have a primary id STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo ad.PLS:76 Where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advise on correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design a microarray probes capable of distinguishing among splice variants in drosophila (using latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets. 
Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Tue Jan 30 21:59:21 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2007 21:59:21 -0500 Subject: [Bioperl-l] Error while running load_seqdatabase.pl In-Reply-To: <817961.76470.qm@web58904.mail.re1.yahoo.com> References: <817961.76470.qm@web58904.mail.re1.yahoo.com> Message-ID: <28BC3DF8-48C0-4C89-9F89-7C247D2A2FB6@gmx.net> George, as I suggested, you don't have a value for the primary_id(). I use it, for example, to store the NCBI GI#, if the record has it, but yours don't. 
IMHO the bioperl parser is mistaken in using the FASTA ID token for
the primary_id(), but that's what it does, and you can undo it by
setting the value to undef, which is what it should be so long as you
don't exactly know a better value. You set the value to undef by doing

    $seq->primary_id(undef);

If you print the value afterwards, don't be confused by seeing some
strange value there; this is only because the Bioperl model requires
primary_id() to return something that is unique, and if you don't set
a value (or reset it to undef), it will use the memory location of the
object. Bioperl-db recognizes this and will not store this dummy
value, since in BioSQL the identifier column is optional (nullable).

If you are unconvinced of this, use the same value as you use for the
accession, just don't use something the uniqueness of which is
questionable, and don't use a string longer than 40 chars.

You still haven't explained how you got the FASTA file in the first
place. The reason I have been asking is that I suspect that even your
first component is not unique in the file, as it looks like you are
pulling database search hits from a sequence database (BLAST,
fastacmd?), and your sequences may get hit multiple times. I have no
way of verifying how unique your identifiers are, so you need to check
that yourself. You have done so for the 4th element already, what
about the first?

It is key that you know how to uniquely identify a sequence in your
file, if you want every single sequence in the file as a separate
record in the database. If no part of your ID line is unique but you
want it to be, append a random (or incremented) number.

Hth,

-hilmar

On Jan 30, 2007, at 12:32 PM, George Heller wrote:

> Hi Hilmar,
>
> I have been trying to get around this problem for some days now,
> but haven't had much luck. My file has about 12000 odd records, and
> when I try to load it with a pipeline to the package, I have about
> 483 records that get loaded.
>
> sub process_seq
> {
>     my ($self, $seq) = @_;
>     # $seq->accession_number($seq->display_id);
>     my @ids = split(/\|/,$seq->display_id);
>     $seq->accession_number($ids[0]);
>     $seq->primary_id($ids[2]);
>     return ($seq);
> }
>
> I am assuming this is because I use the ids[2] for the primary_id,
> and possibly the file has 483 unique records for that field. When I
> use some other reference like ids[3] etc (coz it has more unique
> values), I get the same error about the length being more than 40.
>
> I printed out the values for the display_id, primary_id and
> accession_number. The display_id has the entire first line of the
> file, so does the primary_id, if I don't set it in my new script.
> The accession number is split correctly and printed.
>
> This is what the first few lines in my file look like,
>
> >FGENESHT0000001||AC155633|570|4400|1
> MTERKRKEIEDRKRKISGPQPGSSNRPRFSGNQPQQFRQNQRPPQQHQQFQRQYPQHQYQNRQSNQSGGQFQRQNQQAPR
> LPAPAAQQNSQATPAQVGNRACFHCGEQGHWVMQCPKKAAQQQSGPNAPAKQNVPQPRAGNRSQPRYNHGRLNHLEAEAV
> QETPSMIVGMFPVDSHIAEVLFDTGATHSFITASWVEAHNLPITTMSTPIQIDSAGGRIRADSICLNICVEIRGIAFPAN
> LIVMGTQGIDVILGMNWLDKYQAVISCDKRTIKLMSPLGEEVVTELVPPEPKRGSCYQLAVDSSEVDPIESIRVVSEFPD
> VFPKDLPGMPPERKVEFAIELLPGTAPIFKRAYRISGPELVELKEQIDELSEKGYIRPSTSPWAAPVLFVEKKDGTKRMC
> IDYRALNEVTIKNKYPLPRIEDLFDQLRGASVFSKIDLRSAFFMNLMNSVFMDYLDKFVVVFIDDILVYSQSEEEHADHL
> KMVLQRLREHQLYAKLSKCEFWINEVLFLGHIINKEGLAVDPKKVANILNWKAPTDARGIKSFIGMVGYYRRFIEGFSKI
> AKPMTALLGNKVEFKWTQKCQEAFEALKEKLTIAPVLVLPDVHKPFSVYCDACYTGLGCVLMQEGRVVAYSSRQLKVHEK
> NYPIHDLELAAVVHALKTWRHYLYGQKCDVYTDHKSLKYIFTQSELNMRQRRWLELIKDYELEIHYHPGKANVVADALSR
> KSQVNLMVARPMPYELAKEFDRLSLGFLNNSRGVTVELEPTLEREIKEAQKNDEKISEIRRLILDGRGKDFREDAEGVIW
> FKDRLCVPNVQSIRELILKEAHETAYSIHPGSEKMYQDLKKKFWWYGMKREIAEHVAMCDSCRRIKAEHQRPAGLLQPLQ
> IPQWKWDEIGMDFI VGLPRTRAGYDSIWVVVDRLTKSAHFIPVKTNYSSAVLAELYMSRIVCLHGVPKKIVSDRGTQFTS
> HFWRQLHEALGTHLNFSSAYHPQTDGQTERTNQILEDMLRACALQDQSGWDKRLPYAEFSYNNSYQASLKMSPFQALYGR
> SCRTPLQWDQPGEKQVFGPDILLEAEENIKMVRENLKIAQSRQRSYADTRRRELSFEVGDFVYLKVSPIRGVKRFGVKGK
> LAPRYIGSYQILARRGEVAYQLSLPENLSAVHDVFHVSQLKKCLRVPEEQLPVEGLEVQEDLTYVEKPVQILEVADRVTR
> RKTIRMCKVRWNHHSEEEATSEREDDLMAKYPELFASQP*
>
> Any suggestions?
>
> Thanks!
> George.
>
> Hilmar Lapp wrote:
> That's odd indeed. Did you try and put a print statement before the
> return statement that proves that 1) the code gets executed, and 2)
> display_id(), primary_id() and accession_number() have the expected
> values?
>
> BTW you might also want to set primary_id() to undef (as the ID found
> in the FASTA files doesn't really count as a primary
> database-specific ID anyway). The identifier column in bioentry (which
> primary_id() maps to) is constrained to 40 chars as well.
>
> -hilmar
>
> On Jan 27, 2007, at 9:31 PM, George Heller wrote:
>
> > Hi Hilmar,
> >
> > I tried the lookup and noupdate options, and also made changes to
> > the SeqProcessor.pm package for the accession,
> >
> > package SeqProcessor::Accession;
> > use strict;
> > use vars qw(@ISA);
> > use Bio::Seq::BaseSeqProcessor;
> > use Bio::SeqFeature::Generic;
> > @ISA = qw(Bio::Seq::BaseSeqProcessor);
> > sub process_seq
> > {
> >     my ($self, $seq) = @_;
> >     my @ids = split(/\|/,$seq->display_id);
> >     $seq->accession_number($ids[0]);
> >     return ($seq);
> > }
> > 1;
> >
> > I invoke the load_seqdatabase.pl as,
> >
> > perl load_seqdatabase.pl -host localhost -dbname usda-06 -format fasta -dbuser postgres -driver Pg --lookup --noupdate --pipeline="SeqProcessor::Accession" maize_pep.fasta
> >
> > Loading maize_pep.fasta ...
> > I get the error,
> >
> > -------------------- WARNING ---------------------
> > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000021||AC155633|113788|114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") FKs (1,)
> > ERROR: value too long for type character varying(40)
> > ---------------------------------------------------
> > Could not store FGENESHT0000021||AC155633|113788|114708|-1:
> > ------------- EXCEPTION -------------
> > MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> > STACK (eval) load_seqdatabase.pl:620
> > STACK toplevel load_seqdatabase.pl:602
> > --------------------------------------
> > at load_seqdatabase.pl line 633
> >
> > As far as I gather, this error shouldn't appear as we are
> > filtering out the accession as only the first code that appears.
> > Ideas?
> >
> > George.
> >
> > Hilmar Lapp wrote:
> > George,
> >
> > I don't know how you create the FASTA file, but that's probably where
> > the root cause is. Based on the message:
> >
> >> -------------------- WARNING ---------------------
> >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,)
> >> ERROR: duplicate key violates unique constraint "bioentry_accession_key"
> >> ---------------------------------------------------
> >
> > the identifier and accession number are set, so your SeqProcessor
> > scriptlet was executed (otherwise you'd also have seen a dynamic
> > loading error if e.g. your perl class could not be found or loaded
> > by perl). If you still receive the duplicate key violation, then it
> > can only mean that indeed a sequence with the exact same accession
> > number was in the database already.
> >
> > There are different possibilities for why: you may have loaded the
> > same file before (use --lookup and related switches if you want to
> > update existing sequences), or your FASTA file contains multiple
> > sequences with the same ID, or you have a sequence with the same ID
> > in different FASTA files, if you are loading from more than one file.
> > In either of the two latter cases, you will need to find a way to
> > disambiguate the IDs.
> >
> > BTW you may also want to consider parsing the concatenated ID
> > 'FGENESHT0000001||AC155633|570|4400|1' apart into its component IDs,
> > and then use only one component. For example:
> >
> >     my @ids = split(/\|/,$seq->display_id);
> >     $seq->accession_number($ids[0]);
> >
> > Obviously, this will only make for a nicer accession number, and not
> > solve your duplicate ID problem, as the latter is in the file(s) you
> > load.
> >
> > -hilmar
> >
> > On Jan 25, 2007, at 8:51 PM, George Heller wrote:
> >
> >> Hi Hilmar,
> >>
> >> I still seem to be having problems loading my fasta file. I wrote a
> >> new package, SeqProcessor.pm as below,
> >>
> >> package SeqProcessor::Accession;
> >> use strict;
> >> use vars qw(@ISA);
> >> use Bio::Seq::BaseSeqProcessor;
> >> use Bio::SeqFeature::Generic;
> >> @ISA = qw(Bio::Seq::BaseSeqProcessor);
> >> sub process_seq
> >> {
> >>     my ($self, $seq) = @_;
> >>     $seq->accession_number($seq->display_id);
> >>     return ($seq);
> >> }
> >> 1;
> >>
> >> I have this file SeqProcessor.pm in my home directory, and I have
> >> set the PERL5LIB variable accordingly. When I run
> >> load_seqdatabase.pl,
> >>
> >> perl load_seqdatabase.pl -host localhost -dbname biodb -format fasta -dbuser postgres -driver Pg --pipeline="SeqProcessor::Accession" maize_pep.fasta
> >>
> >> I still get the error,
> >>
> >> Loading maize_pep.fasta ...
> >> -------------------- WARNING ---------------------
> >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,)
> >> ERROR: duplicate key violates unique constraint "bioentry_accession_key"
> >> ---------------------------------------------------
> >> Could not store FGENESHT0000001||AC155633|570|4400|1:
> >> ------------- EXCEPTION -------------
> >> MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
> >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> >> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> >> STACK (eval) load_seqdatabase.pl:620
> >> STACK toplevel load_seqdatabase.pl:602
> >> --------------------------------------
> >> at load_seqdatabase.pl line 633
> >>
> >> Is there something I am missing?
> >>
> >> Thanks!
> >> George.
> >>
> >> Hilmar Lapp wrote:
> >> Hi George, sorry for the sluggish response, I was tied up during the
> >> week. This is also why you always want to keep the thread on the
> >> list.
> >>
> >> Perl is an interpreted language, so no compilation is necessary. The
> >> only thing you need to do is have the package in a place where perl
> >> can find it. The simplest way to achieve this is by setting the
> >> PERL5LIB environment variable:
> >>
> >> $ export PERL5LIB=/where/you/put/your/perl/package
> >>
> >> or if PERL5LIB was set already, you'd append it:
> >>
> >> $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package
> >>
> >> I do assume that you didn't really add your code to the SeqAdaptor.pm
> >> package - there is no necessity for nor benefit from that, and at
> >> worst (and quite likely) perl won't be able to find the package. Note
> >> that there is plenty of documentation for how to write packages for
> >> perl and how to make them accessible to perl.
> >>
> >> Hth,
> >>
> >> -hilmar
> >>
> >> On Jan 8, 2007, at 11:52 PM, George Heller wrote:
> >>
> >>> Hi Hilmar.
> >>>
> >>> Thanks so much for the response. As I am new to Bioperl, I have
> >>> another question.
> >>>
> >>> I have made the changes as suggested by you, and have added the
> >>> code below to the SeqAdaptor.pm script.
> >>>
> >>> package SeqProcessor::Accession;
> >>> use strict;
> >>> use vars qw(@ISA);
> >>> use Bio::Seq::BaseSeqProcessor;
> >>> use Bio::SeqFeature::Generic;
> >>>
> >>> @ISA = qw(Bio::Seq::BaseSeqProcessor);
> >>>
> >>> sub process_seq
> >>> {
> >>>     my ($self, $seq) = @_;
> >>>     $seq->accession_number($seq->display_id);
> >>>     return ($seq);
> >>> }
> >>>
> >>> Now that I have done my changes, do I need to compile or something
> >>> for the changes to reflect? If so, can you please let me know the
> >>> command for the same, or direct me to any link that has
> >>> documentation for the same?
> >>>
> >>> Thanks so much for the help.
> >>> George.
> >>>
> >>> Hilmar Lapp wrote:
> >>> George,
> >>>
> >>> this is almost certainly caused by using FASTA format and bioperl's
> >>> treatment of it. I am guilty of not having written a FAQ yet for
> >>> Bioperl-db, as this would certainly be there.
> >>>
> >>> Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl
> >>> uses Bioperl to parse sequence files) does not extract the accession
> >>> number from the description line of the fasta sequence, and instead
> >>> sets the accession_number property of sequence objects it creates to
> >>> "unknown". Since there is a unique key constraint on
> >>> (accession,version,namespace) the second sequence loaded will raise
> >>> an exception as it will violate the constraint.
> >>>
> >>> The simplest way to deal with this is to write a SeqProcessor that
> >>> massages the accession_number appropriately and then supply the
> >>> module to load_seqdatabase.pl using the --pipeline command line
> >>> switch.
> >>>
> >>> There are several examples for how to do this in the email archives.
> >>> See for example this thread on the Biosql list:
> >>>
> >>> http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html
> >>>
> >>> with two links to examples, and Marc Logghe gives another one in the
> >>> thread itself.
> >>>
> >>> Hth,
> >>>
> >>> -hilmar
> >>>
> >>> On Jan 8, 2007, at 3:17 PM, George Heller wrote:
> >>>
> >>>> Hi all.
> >>>>
> >>>> I am new to Bioperl and am trying to run the load_seqdatabase.pl
> >>>> script to load sequence data from a file into Postgres database. I
> >>>> am invoking the script through the following command:
> >>>>
> >>>> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format fasta -dbuser postgres -driver Pg
> >>>>
> >>>> I am getting the following error:
> >>>>
> >>>> -------------------- WARNING ---------------------
> >>>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","unknown","","0","") FKs (1,)
> >>>> ERROR: duplicate key violates unique constraint "bioentry_accession_key"
> >>>> ---------------------------------------------------
> >>>> Could not store unknown:
> >>>> ------------- EXCEPTION -------------
> >>>> MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
> >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
> >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
> >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203
> >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> >>>> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:271
> >>>> STACK (eval) load_seqdatabase.pl:620
> >>>> STACK toplevel load_seqdatabase.pl:602
> >>>> --------------------------------------
> >>>> at load_seqdatabase.pl line 633
> >>>>
> >>>> Can anyone tell me how I can correct this error and get my script
> >>>> running? Thanks!!!
> >>>>
> >>>> George.
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>> --
> >>> ===========================================================
> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> >>> ===========================================================

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From neetisomaiya at gmail.com Wed Jan 31 05:25:33 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 31 Jan 2007 15:55:33 +0530
Subject: [Bioperl-l] want score from needle output
Message-ID: <764978cf0701310225y28bee35bg39044e9436075840@mail.gmail.com>

Hi,

I am using code like the following to parse needle output. While I am
able to get the percent identity value, the score method is not
returning anything, though the needle output does report a score too.

--------------------------------
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $needle_output);
my $aln = $str->next_aln();
my $aln_perc_iden = $aln->percentage_identity;
my $aln_score = $aln->score;
-----------------------------

Can anyone suggest what the problem could be?

--
-Neeti
Even my blood says, B positive

From bix at sendu.me.uk Wed Jan 31 06:10:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 31 Jan 2007 11:10:12 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store delete() not working
Message-ID: <45C07914.6060908@sendu.me.uk>

Hi,

I'm trying to use Bio::DB::SeqFeature::Store delete() on a list of
Bio::SeqFeature::Annotated retrieved from the database. It doesn't work.
I presume I'm falling foul of the issue pointed out in the docs:

"WARNING: The current DBI::mysql implementation has some issues that
need to be resolved, namely (1) normalized subfeatures are NOT
recursively deleted; and (2) the deletions are not performed in a
transaction."

Is there a trick to avoid the problem? Or, how might someone go about
improving the implementation so that it worked?

Cheers,
Sendu.

From george.heller at yahoo.com Wed Jan 31 00:43:10 2007
From: george.heller at yahoo.com (George Heller)
Date: Tue, 30 Jan 2007 21:43:10 -0800 (PST)
Subject: [Bioperl-l] Error while running load_seqdatabase.pl
In-Reply-To: <28BC3DF8-48C0-4C89-9F89-7C247D2A2FB6@gmx.net>
Message-ID: <14733.60041.qm@web58904.mail.re1.yahoo.com>

Hi Hilmar,

I tried setting the primary id to undef, it still gives me the length
(greater than 40) error. And I tried setting it to the same as
accession, that too resulted in the same error.

I really have no idea where the file really came from, coz I have
recently joined this project, and people working with this before have
left! I am attaching the input file with this mail so you can have a
look at it.

Thanks.
George.

Hilmar Lapp wrote:
George,

as I suggested, you don't have a value for the primary_id(). I use it,
for example, to store the NCBI GI#, if the record has it, but yours
don't. IMHO the bioperl parser is mistaken in using the FASTA ID token
for the primary_id(), but that's what it does, and you can undo it by
setting the value to undef, which is what it should be so long as you
don't exactly know a better value. You set the value to undef by doing

    $seq->primary_id(undef);

If you print the value afterwards, don't be confused by seeing some
strange value there; this is only because the Bioperl model requires
primary_id() to return something that is unique, and if you don't set
a value (or reset it to undef), it will use the memory location of the
object.
Bioperl-db recognizes this and will not store this dummy value, since
in BioSQL the identifier column is optional (nullable).

If you are unconvinced of this, use the same value as you use for the
accession, just don't use something the uniqueness of which is
questionable, and don't use a string longer than 40 chars.

You still haven't explained how you got the FASTA file in the first
place. The reason I have been asking is that I suspect that even your
first component is not unique in the file, as it looks like you are
pulling database search hits from a sequence database (BLAST,
fastacmd?), and your sequences may get hit multiple times. I have no
way of verifying how unique your identifiers are, so you need to check
that yourself. You have done so for the 4th element already, what
about the first?

It is key that you know how to uniquely identify a sequence in your
file, if you want every single sequence in the file as a separate
record in the database. If no part of your ID line is unique but you
want it to be, append a random (or incremented) number.

Hth,

-hilmar

On Jan 30, 2007, at 12:32 PM, George Heller wrote:

Hi Hilmar,

I have been trying to get around this problem for some days now, but
haven't had much luck. My file has about 12000 odd records, and when I
try to load it with a pipeline to the package, I have about 483
records that get loaded.

sub process_seq
{
    my ($self, $seq) = @_;
    # $seq->accession_number($seq->display_id);
    my @ids = split(/\|/,$seq->display_id);
    $seq->accession_number($ids[0]);
    $seq->primary_id($ids[2]);
    return ($seq);
}

I am assuming this is because I use the ids[2] for the primary_id, and
possibly the file has 483 unique records for that field. When I use
some other reference like ids[3] etc (coz it has more unique values),
I get the same error about the length being more than 40.

I printed out the values for the display_id, primary_id and
accession_number.
The display_id has the entire first line of the file, so does the
primary_id, if I don't set it in my new script. The accession number
is split correctly and printed.

This is what the first few lines in my file look like,

>FGENESHT0000001||AC155633|570|4400|1
MTERKRKEIEDRKRKISGPQPGSSNRPRFSGNQPQQFRQNQRPPQQHQQFQRQYPQHQYQNRQSNQSGGQFQRQNQQAPR
LPAPAAQQNSQATPAQVGNRACFHCGEQGHWVMQCPKKAAQQQSGPNAPAKQNVPQPRAGNRSQPRYNHGRLNHLEAEAV
QETPSMIVGMFPVDSHIAEVLFDTGATHSFITASWVEAHNLPITTMSTPIQIDSAGGRIRADSICLNICVEIRGIAFPAN
LIVMGTQGIDVILGMNWLDKYQAVISCDKRTIKLMSPLGEEVVTELVPPEPKRGSCYQLAVDSSEVDPIESIRVVSEFPD
VFPKDLPGMPPERKVEFAIELLPGTAPIFKRAYRISGPELVELKEQIDELSEKGYIRPSTSPWAAPVLFVEKKDGTKRMC
IDYRALNEVTIKNKYPLPRIEDLFDQLRGASVFSKIDLRSAFFMNLMNSVFMDYLDKFVVVFIDDILVYSQSEEEHADHL
KMVLQRLREHQLYAKLSKCEFWINEVLFLGHIINKEGLAVDPKKVANILNWKAPTDARGIKSFIGMVGYYRRFIEGFSKI
AKPMTALLGNKVEFKWTQKCQEAFEALKEKLTIAPVLVLPDVHKPFSVYCDACYTGLGCVLMQEGRVVAYSSRQLKVHEK
NYPIHDLELAAVVHALKTWRHYLYGQKCDVYTDHKSLKYIFTQSELNMRQRRWLELIKDYELEIHYHPGKANVVADALSR
KSQVNLMVARPMPYELAKEFDRLSLGFLNNSRGVTVELEPTLEREIKEAQKNDEKISEIRRLILDGRGKDFREDAEGVIW
FKDRLCVPNVQSIRELILKEAHETAYSIHPGSEKMYQDLKKKFWWYGMKREIAEHVAMCDSCRRIKAEHQRPAGLLQPLQ
IPQWKWDEIGMDFI VGLPRTRAGYDSIWVVVDRLTKSAHFIPVKTNYSSAVLAELYMSRIVCLHGVPKKIVSDRGTQFTS
HFWRQLHEALGTHLNFSSAYHPQTDGQTERTNQILEDMLRACALQDQSGWDKRLPYAEFSYNNSYQASLKMSPFQALYGR
SCRTPLQWDQPGEKQVFGPDILLEAEENIKMVRENLKIAQSRQRSYADTRRRELSFEVGDFVYLKVSPIRGVKRFGVKGK
LAPRYIGSYQILARRGEVAYQLSLPENLSAVHDVFHVSQLKKCLRVPEEQLPVEGLEVQEDLTYVEKPVQILEVADRVTR
RKTIRMCKVRWNHHSEEEATSEREDDLMAKYPELFASQP*

Any suggestions?

Thanks!
George.

Hilmar Lapp wrote:
That's odd indeed. Did you try and put a print statement before the
return statement that proves that 1) the code gets executed, and 2)
display_id(), primary_id() and accession_number() have the expected
values?

BTW you might also want to set primary_id() to undef (as the ID found
in the FASTA files doesn't really count as a primary database-specific
ID anyway).
The identifier column in bioentry (which primary_id() maps to) is
constrained to 40 chars as well.

-hilmar

On Jan 27, 2007, at 9:31 PM, George Heller wrote:

> Hi Hilmar,
>
> I tried the lookup and noupdate options, and also made changes to
> the SeqProcessor.pm package for the accession,
>
> package SeqProcessor::Accession;
> use strict;
> use vars qw(@ISA);
> use Bio::Seq::BaseSeqProcessor;
> use Bio::SeqFeature::Generic;
> @ISA = qw(Bio::Seq::BaseSeqProcessor);
> sub process_seq
> {
>     my ($self, $seq) = @_;
>     my @ids = split(/\|/,$seq->display_id);
>     $seq->accession_number($ids[0]);
>     return ($seq);
> }
> 1;
>
> I invoke the load_seqdatabase.pl as,
>
> perl load_seqdatabase.pl -host localhost -dbname usda-06 -format fasta -dbuser postgres -driver Pg --lookup --noupdate --pipeline="SeqProcessor::Accession" maize_pep.fasta
>
> Loading maize_pep.fasta ...
>
> I get the error,
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000021||AC155633|113788|114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") FKs (1,)
> ERROR: value too long for type character varying(40)
> ---------------------------------------------------
> Could not store FGENESHT0000021||AC155633|113788|114708|-1:
> ------------- EXCEPTION -------------
> MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> STACK (eval) load_seqdatabase.pl:620
> STACK toplevel load_seqdatabase.pl:602
> --------------------------------------
> at load_seqdatabase.pl line 633
>
> As far as I gather, this error shouldn't appear as we are
> filtering out the accession as only the first code that appears.
> Ideas?
>
> George.
>
> Hilmar Lapp wrote:
> George,
>
> I don't know how you create the FASTA file, but that's probably where
> the root cause is. Based on the message:
>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,)
>> ERROR: duplicate key violates unique constraint "bioentry_accession_key"
>> ---------------------------------------------------
>
> the identifier and accession number are set, so your SeqProcessor
> scriptlet was executed (otherwise you'd also have seen a dynamic
> loading error if e.g. your perl class could not be found or loaded
> by perl). If you still receive the duplicate key violation, then it
> can only mean that indeed a sequence with the exact same accession
> number was in the database already.
>
> There are different possibilities for why: you may have loaded the
> same file before (use --lookup and related switches if you want to
> update existing sequences), or your FASTA file contains multiple
> sequences with the same ID, or you have a sequence with the same ID
> in different FASTA files, if you are loading from more than one file.
> In either of the two latter cases, you will need to find a way to
> disambiguate the IDs.
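
The uniqueness check suggested above can be done before loading anything. What follows is only a minimal sketch in plain Perl (no BioPerl needed), assuming the pipe-delimited ID layout shown in this thread. Only the first sample header comes from the thread; the other two are made up for illustration, and the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample FASTA header lines. The first one is quoted from this thread;
# the second and third are invented so that the check has a duplicate
# to find.
my @headers = (
    '>FGENESHT0000001||AC155633|570|4400|1',
    '>FGENESHT0000002||AC155633|5120|7300|-1',
    '>FGENESHT0000001||AC155634|88|912|1',
);

# Count how often each first component occurs.
# The pipe has to be escaped: an unescaped /|/ matches the empty string,
# so split would return single characters instead of ID components.
my %seen;
for my $header (@headers) {
    (my $id = $header) =~ s/^>//;       # drop the leading '>'
    my @parts = split /\|/, $id;        # $parts[0] is the candidate accession
    $seen{ $parts[0] }++;
}

# Report every first component that occurs more than once.
my @duplicates = sort grep { $seen{$_} > 1 } keys %seen;
print "duplicate: $_ ($seen{$_} times)\n" for @duplicates;
```

To run the same check on a real file, the sample array would be replaced by the lines of the FASTA file that start with '>'. Any component reported here cannot serve as the accession on its own and would need a suffix (for example an incremented number) to make it unique.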
>
> BTW you may also want to consider parsing the concatenated ID
> 'FGENESHT0000001||AC155633|570|4400|1' apart into its component IDs,
> and then use only one component. For example:
>
>     my @ids = split(/\|/,$seq->display_id);
>     $seq->accession_number($ids[0]);
>
> Obviously, this will only make for a nicer accession number, and not
> solve your duplicate ID problem, as the latter is in the file(s) you
> load.
>
> -hilmar
>
> On Jan 25, 2007, at 8:51 PM, George Heller wrote:
>
>> Hi Hilmar,
>>
>> I still seem to be having problems loading my fasta file. I wrote a
>> new package, SeqProcessor.pm as below,
>>
>> package SeqProcessor::Accession;
>> use strict;
>> use vars qw(@ISA);
>> use Bio::Seq::BaseSeqProcessor;
>> use Bio::SeqFeature::Generic;
>> @ISA = qw(Bio::Seq::BaseSeqProcessor);
>> sub process_seq
>> {
>>     my ($self, $seq) = @_;
>>     $seq->accession_number($seq->display_id);
>>     return ($seq);
>> }
>> 1;
>>
>> I have this file SeqProcessor.pm in my home directory, and I have
>> set the PERL5LIB variable accordingly. When I run
>> load_seqdatabase.pl,
>>
>> perl load_seqdatabase.pl -host localhost -dbname biodb -format fasta -dbuser postgres -driver Pg --pipeline="SeqProcessor::Accession" maize_pep.fasta
>>
>> I still get the error,
>>
>> Loading maize_pep.fasta ...
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","","0","") FKs (1,)
>> ERROR: duplicate key violates unique constraint "bioentry_accession_key"
>> ---------------------------------------------------
>> Could not store FGENESHT0000001||AC155633|570|4400|1:
>> ------------- EXCEPTION -------------
>> MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
>> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
>> STACK (eval) load_seqdatabase.pl:620
>> STACK toplevel load_seqdatabase.pl:602
>> --------------------------------------
>> at load_seqdatabase.pl line 633
>>
>> Is there something I am missing?
>>
>> Thanks!
>> George.
>>
>> Hilmar Lapp wrote:
>> Hi George, sorry for the sluggish response, I was tied up during the
>> week. This is also why you always want to keep the thread on the
>> list.
>>
>> Perl is an interpreted language, so no compilation is necessary. The
>> only thing you need to do is have the package in a place where perl
>> can find it. The simplest way to achieve this is by setting the
>> PERL5LIB environment variable:
>>
>> $ export PERL5LIB=/where/you/put/your/perl/package
>>
>> or if PERL5LIB was set already, you'd append it:
>>
>> $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package
>>
>> I do assume that you didn't really add your code to the SeqAdaptor.pm
>> package - there is no necessity for nor benefit from that, and at
>> worst (and quite likely) perl won't be able to find the package. Note
>> that there is plenty of documentation for how to write packages for
>> perl and how to make them accessible to perl.
>>
>> Hth,
>>
>> -hilmar
>>
>> On Jan 8, 2007, at 11:52 PM, George Heller wrote:
>>
>>> Hi Hilmar.
>>>
>>> Thanks so much for the response. As I am new to Bioperl, I have
>>> another question.
>>>
>>> I have made the changes as suggested by you, and have added the
>>> code below to the SeqAdaptor.pm script.
>>>
>>> package SeqProcessor::Accession;
>>> use strict;
>>> use vars qw(@ISA);
>>> use Bio::Seq::BaseSeqProcessor;
>>> use Bio::SeqFeature::Generic;
>>>
>>> @ISA = qw(Bio::Seq::BaseSeqProcessor);
>>>
>>> sub process_seq
>>> {
>>>     my ($self, $seq) = @_;
>>>     $seq->accession_number($seq->display_id);
>>>     return ($seq);
>>> }
>>>
>>> Now that I have done my changes, do I need to compile or something
>>> for the changes to reflect? If so, can you please let me know the
>>> command for the same, or direct me to any link that has
>>> documentation for the same?
>>>
>>> Thanks so much for the help.
>>> George.
>>>
>>> Hilmar Lapp wrote:
>>> George,
>>>
>>> this is almost certainly caused by using FASTA format and bioperl's
>>> treatment of it. I am guilty of not having written a FAQ yet for
>>> Bioperl-db, as this would certainly be there.
>>> >>> Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl >>> uses Bioperl to parse sequence files) does not extract the accession >>> number from the description line of the fasta sequence, and instead >>> sets the accession_number property if sequence objects it creates to >>> "unknown". Since there is a unique key constraint on >>> (accession,version,namespace) the second sequence loaded will raise >>> an exception as it will violate the constraint. >>> >>> The simplest way to deal with this is to write a SeqProcessor that >>> massages the accession_number appropriately and then supply the >>> module to load_seqdatabase.pl using the --pipeline command line >>> switch. >>> >>> There are several examples for how to do this in the email archives. >>> See for example this thread on the Biosql list: >>> >>> http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html >>> >>> with two links to examples, and Marc Logghe gives another one in the >>> thread itself. >>> >>> Hth, >>> >>> -hilmar >>> >>> On Jan 8, 2007, at 3:17 PM, George Heller wrote: >>> >>>> Hi all. >>>> >>>> I am new to Bioperl and am trying to run the load_seqdatabase.pl >>>> script to load sequence data from a file into Postgres database. 
I >>>> am invoking the script through the following command: >>>> >>>> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format >>>> fasta >>>> -dbuser postgres -driver Pg >>>> >>>> I am getting the following error: >>>> >>>> -------------------- WARNING --------------------- >>>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >>>> were ("FGENES >>>> HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| >> 4400| >>>> 1","unknown" >>>> ,"","0","") FKs (1,) >>>> ERROR: duplicate key violates unique constraint >>>> "bioentry_accession_key" >>>> --------------------------------------------------- >>>> Could not store unknown: >>>> ------------- EXCEPTION ------------- >>>> MSG: error while executing statement in >>>> Bio::DB::BioSQL::SeqAdaptor::find_by_uni >>>> que_key: ERROR: current transaction is aborted, commands ignored >>>> until end of t >>>> ransaction block >>>> STACK >>>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ >>>> lib/perl >>>> 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 >>>> STACK >> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >>>> usr/lib/perl5 >>>> /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ >>>> perl5/site_perl/5 >>>> .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ >> perl5/ >>>> site_perl/5. >>>> 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>>> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ >>>> site_perl/5.8. >>>> 5/Bio/DB/Persistent/PersistentObject.pm:271 >>>> STACK (eval) load_seqdatabase.pl:620 >>>> STACK toplevel load_seqdatabase.pl:602 >>>> -------------------------------------- >>>> at load_seqdatabase.pl line 633 >>>> >>>> Can anyone tell me how I can correct this error and get my script >>>> running? Thanks!!! >>>> >>>> George. 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: maize_pep.fasta
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070130/83ee1c40/attachment-0001.pl

From 2004116058 at njau.edu.cn Wed Jan 31 04:03:03 2007
From: 2004116058 at njau.edu.cn (xiefuliang)
Date: Wed, 31 Jan 2007 17:03:03 +0800
Subject: [Bioperl-l] your kind help needed!
Message-ID: <000601c74516$9ddcaa70$1a66a8c0@c40125c78411a5>

Hello, dear administrators.

I am fresh man for bioperl. At present, I find the efficiency is very high with bioperl. So, I want to thank the people who contribute to Bioperl, you too. Now, I have some question in using bioperl. As listed below:

1. Sequence Retrieval from Local Database

    use Bio::DB::Fasta;

    my $db = Bio::DB::Fasta->new($dir_with_fa_files);

    my $seqstr = $db->seq('SEQUENCE1');

When I used the module, I found $db->seq('SEQUENCE1') will retrieve the corresponding sequence and the ID of next sequence.

e.g. The result is

    AGCTTGGGGAAGGTT
    >DFOOO1

I hope you can test the module again. Thanks.

2. I want to make BLASTX in GenBank, swissprot, or other protein databases. And I just want to obtain the best hit information (HSP). I found several modules can do the work. But I do not know which one is the best. I failed to obtain the best hit information. If conveniently, could you send me some example source code, which including how to make BLASTX and obtain the best hit information.

3. I use the command line 'ppm install bioperl' to install bioperl. So, I have only bioperl 1.2 in computer. How can I update my bioperl?

Thank you for your help.
Regards,
Xie Fuliang

From cjfields at uiuc.edu Wed Jan 31 09:05:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 Jan 2007 08:05:46 -0600
Subject: [Bioperl-l] CON(structed) sequence databases?
In-Reply-To: <934F95E71B6C9347A873C42AE3C196191489C800@NZT0004E.dknz.nzcorp.net>
References: <934F95E71B6C9347A873C42AE3C196191489C800@NZT0004E.dknz.nzcorp.net>
Message-ID:

On Jan 30, 2007, at 1:45 AM, JK ((Jesper Agerbo Krogh)) wrote:

> Hi.
>
> What do you do about parsing sequences from the "CON"-divisions of
> EMBL/Genbank? The entries looks just like this one:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=CH445337&doptcmdl=GenBank
>
> The bioperl 1.4 parser dies on the embl-version and the 1.5 parser
> uses the complete .dat file as a single entry.
>
> Thanks.
>
> Jesper

For GenBank CONTIG/WGS line parsing you'll have to update to Bioperl 1.5.2 (I added that in after 1.5.1). The CONTIG data is currently just carved up by newline and stored as SimpleValue annotation when parsing GenBank records; I don't believe it is even parsed with EMBL at this time. Although we could probably do something using Bio::Location objects, there really hasn't been much demand for it since one can retrieve the sequences assembled by NCBI by requesting the full GenBank record (automatically set up in Bio::DB::GenBank) or requesting return format 'gbwithparts' when using eutils.

To retrieve the parsed data from a GenBank record in a Bio::Seq object:

    my @contigs = $seq->annotation->get_Annotations('CONTIG');

If the complete .dat file is read as a single file then there's definitely a bug (end of seq record isn't detected), which is possible since I only tested against single CON files. Could you point out the dat file you checked so I can test it out?
chris

From cjfields at uiuc.edu Wed Jan 31 09:21:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 Jan 2007 08:21:54 -0600
Subject: [Bioperl-l] want score from needle output
In-Reply-To: <764978cf0701310225y28bee35bg39044e9436075840@mail.gmail.com>
References: <764978cf0701310225y28bee35bg39044e9436075840@mail.gmail.com>
Message-ID: <1156FA36-6024-4C02-BAEA-A751F858830A@uiuc.edu>

On Jan 31, 2007, at 4:25 AM, neeti somaiya wrote:

> Hi,
>
> I am using a code like the following to parse needle output. While
> I am able to get the percent identity value, the score method is not
> returning anything, though the needle output does report a score too.
>
> --------------------------------
> my $str = Bio::AlignIO->new(-format => 'emboss',-file => $needle_output);
> my $aln = $str->next_aln();
>
> my $aln_perc_iden = $aln->percentage_identity;
> my $aln_score = $aln->score;
> -----------------------------
>
> Can anyone suggest what the problem could be?
>
> --
> -Neeti
> Even my blood says, B positive

It's possible the needle output has changed. What version of EMBOSS are you using?

chris

From bix at sendu.me.uk Wed Jan 31 09:36:18 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 31 Jan 2007 14:36:18 +0000
Subject: [Bioperl-l] your kind help needed!
In-Reply-To: <000601c74516$9ddcaa70$1a66a8c0@c40125c78411a5>
References: <000601c74516$9ddcaa70$1a66a8c0@c40125c78411a5>
Message-ID: <45C0A962.9000101@sendu.me.uk>

xiefuliang wrote:
>
> use Bio::DB::Fasta;
>
> my $db = Bio::DB::Fasta->new($dir_with_fa_files);
>
> my $seqstr = $db->seq('SEQUENCE1');
>
> When I used the module, I found $db->seq('SEQUENCE1') will retrieve
> the corresponding sequence and the ID of next sequence.
>
> e.g. The result is
>
> AGCTTGGGGAAGGTT
>
> >DFOOO1

This may be due to your old version of Bioperl, but if not, post the file that contains SEQUENCE1 (if it is reasonably small).

> 2. I want to make BLASTX in GenBank, swissprot, or other protein
> databases. And I just want to obtain the best hit information (HSP).
> I found several modules can do the work. But I do not know which one
> is the best. I failed to obtain the best hit information. If
> conveniently, could you send me some example source code, which
> including how to make BLASTX and obtain the best hit information.

See the SearchIO HOWTO:
http://www.bioperl.org/wiki/HOWTO:SearchIO

> 3. I use the command line 'ppm install bioperl' to install bioperl.
> So, I have only bioperl 1.2 in computer. How can I update my bioperl?

Install Bioperl 1.5.2:
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Hope that helps,
Sendu.

From MEC at stowers-institute.org Wed Jan 31 09:54:42 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 31 Jan 2007 08:54:42 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store delete() not working
Message-ID:

Sendu,

I too have found the same to be true, traced the code, couldn't explain to myself where lies the bug, and found through experimentation that a loop will get around it. Causing me to write code such as:

    $gene->object_store->delete($_) foreach $gene->MRNA;
    # TODO: understand why calling ->delete() only deletes 1 feature (not all)!
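
One way a per-feature loop can succeed where a single list call stops early is sketched below as a toy model. It is not the Bio::DB::SeqFeature::Store code: `ToyStore`, `delete_features` and `_delete_one` are hypothetical names, and the early-exit behaviour is an assumption based on the diagnosis given elsewhere in this thread (a delete routine that gives up when its per-ID helper does not return a true value).

```perl
use strict;
use warnings;

# Toy stand-in for a feature store. All names here are hypothetical;
# this is NOT the Bio::DB::SeqFeature::Store implementation.
package ToyStore;

sub new {
    my ($class, @ids) = @_;
    my %alive = map { $_ => 1 } @ids;
    return bless { alive => \%alive }, $class;
}

# Per-ID helper: removes the entry but (by assumption) forgets to
# return a true value on success.
sub _delete_one {
    my ($self, $id) = @_;
    delete $self->{alive}{$id};
    return;
}

# List delete: stops at the first helper call that returns false.
sub delete_features {
    my ($self, @ids) = @_;
    for my $id (@ids) {
        $self->_delete_one($id) or return 0;
    }
    return 1;
}

sub count { my ($self) = @_; return scalar keys %{ $self->{alive} } }

package main;

# One call with the whole list: only the first ID is removed.
my $one_call = ToyStore->new(1 .. 3);
$one_call->delete_features(1 .. 3);

# One call per ID: every ID is removed, despite the false return value.
my $looped = ToyStore->new(1 .. 3);
$looped->delete_features($_) foreach 1 .. 3;

printf "single call left %d, loop left %d\n",
    $one_call->count, $looped->count;
```

In this model the loop works only because each call handles a single ID before the early exit can matter; having the helper return the number of rows it removed would make the single list call behave as well.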
Cheers, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Wednesday, January 31, 2007 5:10 AM > To: bioperl-l > Subject: [Bioperl-l] Bio::DB::SeqFeature::Store delete() not working > > Hi, > > I'm trying to use Bio::DB::SeqFeature::Store delete() on a list of > Bio::SeqFeature::Annotated retrieved from the database. It > doesn't work. > I presume I'm falling foul of the issue pointed out in the docs: > > "WARNING: The current DBI::mysql implementation has some issues that > need to be resolved, namely (1) normalized subfeatures are NOT > recursively deleted; and (2) the deletions are not performed in a > transaction." > > Is there a trick to avoid the problem? Or, how might someone go about > improving the implementation so that it worked? > > > Cheers, > Sendu. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Wed Jan 31 09:47:03 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 31 Jan 2007 14:47:03 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store delete() not working In-Reply-To: <45C07914.6060908@sendu.me.uk> References: <45C07914.6060908@sendu.me.uk> Message-ID: <45C0ABE7.4060803@sendu.me.uk> Sendu Bala wrote: > Hi, > > I'm trying to use Bio::DB::SeqFeature::Store delete() on a list of > Bio::SeqFeature::Annotated retrieved from the database. It doesn't work. > I presume I'm falling foul of the issue pointed out in the docs: > > "WARNING: The current DBI::mysql implementation has some issues that > need to be resolved, namely (1) normalized subfeatures are NOT > recursively deleted; and (2) the deletions are not performed in a > transaction." > > Is there a trick to avoid the problem? 
Or, how might someone go about > improving the implementation so that it worked? Actually, it was just a simple bug in Bio::DB::SeqFeature::Store::DBI::mysql::_deleteid() I've committed a fix that I hope won't break anything; all I did was have it return the number of rows deleted, since Bio::DB::SeqFeature::Store::delete() needs a true return from _deleteid() or it will only delete the first feature supplied to it. (I've left the current implementation for delete() which effectively gives up the moment a feature fails to be deleted, instead of trying to delete the remaining features it was supplied.) I propose changing delete() to return the number of features it successfully deletes instead of boolean, to match store() behaviour. But that isn't so important. From hlapp at gmx.net Wed Jan 31 10:36:32 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2007 10:36:32 -0500 Subject: [Bioperl-l] Error while running load_seqdatabase.pl In-Reply-To: <817961.76470.qm@web58904.mail.re1.yahoo.com> References: <817961.76470.qm@web58904.mail.re1.yahoo.com> Message-ID: <5F3B4A74-D010-4EEA-8CF6-2BCAAFF0300E@gmx.net> Hi George, your reply to mine contained a plain-text file attachment of (roughly) 8MB. Not only does it hang up my email reader (probably because it recognizes the files as text and tries to display it), but you also sent it to the entire list, resulting in 8MB being sent to around 1600 people. Please, whenever you have larger files, only send it to an individual, *never* to a mailing list unless specifically asked to do so. Second, whenever you have large text files to attach, *always* compress them. This will typically save 70% of the bandwidth being used for transmission. What I could recover from your file looked like the first element is unique. It also looks like the last 4 elements are the accession of the target sequence (presumably of a GenBank entry - presumably maize BAC clones?) and the start and end coordinates and the strand. 
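
The ID layout described above can be pulled apart with a plain `split`. What follows is a minimal sketch in plain Perl (no BioPerl needed), assuming the pipe-delimited header shown in this thread; the variable names are illustrative only. Note that the pipe has to be escaped in the pattern: an unescaped `/|/` is an alternation of two empty patterns, matches the empty string, and therefore splits the ID into single characters.

```perl
use strict;
use warnings;

# Header taken from the thread; the variable names are illustrative.
my $display_id = 'FGENESHT0000001||AC155633|570|4400|1';

# Escaped pipe: split on a literal '|'. The doubled '||' produces an
# empty second field, which is skipped here with undef.
my ($gene_id, undef, $target_acc, $start, $end, $strand)
    = split /\|/, $display_id;

# Unescaped pipe: the pattern matches the empty string, so every
# character ends up in its own field.
my @chars = split /|/, $display_id;

print "gene id:    $gene_id\n";
print "target acc: $target_acc ($start..$end, strand $strand)\n";
print "first field with unescaped pipe: $chars[0]\n";
```

With the escaped pattern the first field is the gene-level ID and the third is the target accession; with the unescaped pattern the first field is just the letter "F", which is not a usable accession.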
All of display_id, accession_number, and primary_id must be 40 chars or less (sorry for not pointing that out right away), so you may use the first element for all three of them. You could also use some combination of the other elements to construct a more meaningful display_id, just don't exceed 40 chars. Setting primary_id() to undef should definitely work. If it doesn't you may be using an old version of Bioperl. You should be using release 1.5.2, or at least 1.5.x. It's possible that the 1.4 release still had the bug of not allowing the setting to undef, please upgrade if that's the version you're using. Let me know if you still have problems with getting this to work. -hilmar On Jan 30, 2007, at 12:32 PM, George Heller wrote: > Hi Hilmar, > > I have been trying to get around this problem for some days now, > but havent had much luck. My file has about 12000 odd records, and > when I try to load it with a pipeline to the package, I have about > 483 records that get loaded. > > sub process_seq > { > my ($self, $seq) = @_; > # $seq->accession_number($seq->display_id); > my @ids = split(/\|/,$seq->display_id); > $seq->accession_number($ids[0]); > $seq->primary_id($ids[2]); > return ($seq); > } > I am assuming this is because I use the ids[2] for the primary_id, > and possibly the file has 483 unique records for that field. When I > use some other reference like ids[3] etc(coz it has more unique > values), I get the same error about the length being more than 40. > > I printed out the values for the display_id, primary_id and > accession_number. The display_id has the entire first line of the > file, so does the primary_id, if I dont set it in my new script. > The accession number is split correctly and printed. 
> > This is how the first few lines in my file look like, > > >FGENESHT0000001||AC155633|570|4400|1 > MTERKRKEIEDRKRKISGPQPGSSNRPRFSGNQPQQFRQNQRPPQQHQQFQRQYPQHQYQNRQSNQSGGQ > FQRQNQQAPR > LPAPAAQQNSQATPAQVGNRACFHCGEQGHWVMQCPKKAAQQQSGPNAPAKQNVPQPRAGNRSQPRYNHG > RLNHLEAEAV > QETPSMIVGMFPVDSHIAEVLFDTGATHSFITASWVEAHNLPITTMSTPIQIDSAGGRIRADSICLNICV > EIRGIAFPAN > LIVMGTQGIDVILGMNWLDKYQAVISCDKRTIKLMSPLGEEVVTELVPPEPKRGSCYQLAVDSSEVDPIE > SIRVVSEFPD > VFPKDLPGMPPERKVEFAIELLPGTAPIFKRAYRISGPELVELKEQIDELSEKGYIRPSTSPWAAPVLFV > EKKDGTKRMC > IDYRALNEVTIKNKYPLPRIEDLFDQLRGASVFSKIDLRSAFFMNLMNSVFMDYLDKFVVVFIDDILVYS > QSEEEHADHL > KMVLQRLREHQLYAKLSKCEFWINEVLFLGHIINKEGLAVDPKKVANILNWKAPTDARGIKSFIGMVGYY > RRFIEGFSKI > AKPMTALLGNKVEFKWTQKCQEAFEALKEKLTIAPVLVLPDVHKPFSVYCDACYTGLGCVLMQEGRVVAY > SSRQLKVHEK > NYPIHDLELAAVVHALKTWRHYLYGQKCDVYTDHKSLKYIFTQSELNMRQRRWLELIKDYELEIHYHPGK > ANVVADALSR > KSQVNLMVARPMPYELAKEFDRLSLGFLNNSRGVTVELEPTLEREIKEAQKNDEKISEIRRLILDGRGKD > FREDAEGVIW > FKDRLCVPNVQSIRELILKEAHETAYSIHPGSEKMYQDLKKKFWWYGMKREIAEHVAMCDSCRRIKAEHQ > RPAGLLQPLQ > IPQWKWDEIGMDFI > VGLPRTRAGYDSIWVVVDRLTKSAHFIPVKTNYSSAVLAELYMSRIVCLHGVPKKIVSDRGTQFTS > HFWRQLHEALGTHLNFSSAYHPQTDGQTERTNQILEDMLRACALQDQSGWDKRLPYAEFSYNNSYQASLK > MSPFQALYGR > SCRTPLQWDQPGEKQVFGPDILLEAEENIKMVRENLKIAQSRQRSYADTRRRELSFEVGDFVYLKVSPIR > GVKRFGVKGK > LAPRYIGSYQILARRGEVAYQLSLPENLSAVHDVFHVSQLKKCLRVPEEQLPVEGLEVQEDLTYVEKPVQ > ILEVADRVTR > RKTIRMCKVRWNHHSEEEATSEREDDLMAKYPELFASQP* > Any suggestions? > > Thanks! > George. > > > > > > Hilmar Lapp wrote: > That's odd indeed. Did you try and put a print statement before the > return statement that proves that 1) the codes gets executed, and 2) > display_id(), primary_id() and accession_number() have the expected > values? > > BTW you might also want to set primary_id() to undef (as the ID found > in the FASTA files doesn't really count as a primary database- > specific ID anyway). The identifier column in bioentry (which > primary_id() maps to) is constrained to 40 chars as well. 
> > -hilmar > > On Jan 27, 2007, at 9:31 PM, George Heller wrote: > > > Hi Hilmar, > > > > I tried the lookup and noupdate options, and also made changes to > > the SeqProcessor.pm package for the accession, > > package SeqProcessor::Accession; > > use strict; > > use vars qw(@ISA); > > use Bio::Seq::BaseSeqProcessor; > > use Bio::SeqFeature::Generic; > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > sub process_seq > > { > > my ($self, $seq) = @_; > > my @ids = split(/|/,$seq->display_id); > > $seq->accession_number($ids[0]); > > return ($seq); > > } > > 1; > > I invoke the load_seqdatabase.pl as, > > perl load_seqdatabase.pl -host localhost -dbname usda-06 -format > > fasta -dbuser postgres -driver Pg --lookup --noupdate -- > > pipeline="SeqProcessor::Accession" maize_pep.fasta > > Loading maize_pep.fasta ... > > I get the error, > > -------------------- WARNING --------------------- > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > were ("FGENESHT0000021||AC155633|113788| > > 114708|-1","FGENESHT0000021||AC155633|113788| > > 114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") > > FKs (1,) > > ERROR: value too long for type character varying(40) > > --------------------------------------------------- > > Could not store FGENESHT0000021||AC155633|113788|114708|-1: > > ------------- EXCEPTION ------------- > > MSG: error while executing statement in > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > > transaction is aborted, commands ignored until end of transaction > > block > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > STACK 
Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ > > perl//Bio/DB/Persistent/PersistentObject.pm:272 > > STACK (eval) load_seqdatabase.pl:620 > > STACK toplevel load_seqdatabase.pl:602 > > -------------------------------------- > > at load_seqdatabase.pl line 633 > > As far as I gather, this error shouldnt appear as we are > > filtering out the accession as only the first code that appears. > > Ideas? > > > > George. > > > > > > Hilmar Lapp wrote: > > George, > > > > I don't know you create the FASTA file, but that's probably where > the > > root cause is. Based on the message: > > > >> -------------------- WARNING --------------------- > >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| > >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > >> 1","","0","") FKs (1,) > >> ERROR: duplicate key violates unique constraint > >> "bioentry_accession_key" > >> --------------------------------------------------- > > > > the identifier and accession number are set, so your SeqProcessor > > scriptlet was executed (otherwise you'd also have seen a dynamic > > loading error it e.g. you perl class could be not be found or loaded > > by perl). If you still receive the duplicate key violation, then it > > can only mean that indeed a sequence with the exact same accession > > number was in the database already. > > > > There are different possibilities for why: you may have loaded the > > same file before (use --lookup and related switches if you want to > > update existing sequences), or your FASTA file contains multiple > > sequences with the same ID, or you have a sequence with the same ID > > in different FASTA files, if you are loading from more than one > file. 
> > In either of the two latter cases, you will need to find a way to > > disambiguate the IDs. > > > > BTW you also want to consider to parse the concatenated ID > > 'FGENESHT0000001||AC155633|570|4400|1' apart into its component IDs, > > and then use only one component. For example: > > > > my @ids = split(/|/,$seq->display_id); > > $seq->accession_number($ids[0]); > > > > Obviously, this will only make for a nicer accession number, and not > > solve your duplicate ID problem, as the latter is in the file(s) you > > load. > > > > -hilmar > > > > On Jan 25, 2007, at 8:51 PM, George Heller wrote: > > > >> Hi Hilmar, > >> > >> I still seem to be having problems loading my fasta file. I wrote a > >> new package, SeqProcessor.pm as below, > >> > >> package SeqProcessor::Accession; > >> use strict; > >> use vars qw(@ISA); > >> use Bio::Seq::BaseSeqProcessor; > >> use Bio::SeqFeature::Generic; > >> @ISA = qw(Bio::Seq::BaseSeqProcessor); > >> sub process_seq > >> { > >> my ($self, $seq) = @_; > >> $seq->accession_number($seq->display_id); > >> return ($seq); > >> } > >> 1; > >> I have this file SeqProcessor.pm in my home directory, and I have > >> set the PERL5LIB variable accordingly. When I run > >> load_seqdatabase.pl, > >> > >> perl load_seqdatabase.pl -host localhost -dbname biodb -format > >> fasta -dbuser postgres -driver Pg -- > >> pipeline="SeqProcessor::Accession" maize_pep.fasta > >> > >> I still get the error, > >> > >> Loading maize_pep.fasta ... 
> >> -------------------- WARNING --------------------- > >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| > >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| > >> 1","","0","") FKs (1,) > >> ERROR: duplicate key violates unique constraint > >> "bioentry_accession_key" > >> --------------------------------------------------- > >> Could not store FGENESHT0000001||AC155633|570|4400|1: > >> ------------- EXCEPTION ------------- > >> MSG: error while executing statement in > >> Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > >> transaction is aborted, commands ignored until end of transaction > >> block > >> STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > >> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/ > local/ > >> perl//Bio/DB/Persistent/PersistentObject.pm:272 > >> STACK (eval) load_seqdatabase.pl:620 > >> STACK toplevel load_seqdatabase.pl:602 > >> -------------------------------------- > >> at load_seqdatabase.pl line 633 > >> Is there something I am missing? > >> > >> Thanks! > >> George. > >> > >> > >> Hilmar Lapp wrote: > >> Hi George, sorry for the sluggish response, I was tied up during > the > >> week. This is also why you always want to keep the thread on the > >> list. > >> > >> Perl is an interpreted language, so no compilation is necessary. 
> The > >> only thing you need to do is have the package in a place where perl > >> can find it. The simplest way to achieve this is by setting the > >> PERL5LIB environment variable: > >> > >> $ export PERL5LIB=/where/you/put/your/perl/package > >> > >> or if PERL5LIB was set already, you'd append it: > >> > >> $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package > >> > >> I do assume that you didn't really add your code to the > SeqAdaptor.pm > >> package - there is no necessity for nor benefit from that, and at > >> worst (and quite likely) perl won't be able to find the package. > Note > >> that there is plenty of documentation for how to write packages for > >> perl and how to make them accessible to perl. > >> > >> Hth, > >> > >> -hilmar > >> > >> On Jan 8, 2007, at 11:52 PM, George Heller wrote: > >> > >>> Hi Hilmer. > >>> > >>> Thanks so much for the response. As I am new to Bioperl, I have > >>> another question. > >>> > >>> I have made the changes as suggested by you, and have added the > >>> code below to the SeqAdaptor.pm script. > >>> > >>> package SeqProcessor::Accession; > >>> use strict; > >>> use vars qw(@ISA); > >>> use Bio::Seq::BaseSeqProcessor; > >>> use Bio::SeqFeature::Generic; > >>> > >>> @ISA = qw(Bio::Seq::BaseSeqProcessor); > >>> > >>> sub process_seq > >>> { > >>> my ($self, $seq) = @_; > >>> $seq->accession_number($seq->display_id); > >>> return ($seq); > >>> } > >>> > >>> Now that I have done my changes, do I need to compile or something > >>> for the changes to reflect? If so, can you please let me know the > >>> command for the same, or direct me to any lin that has > >>> documentation for the same? > >>> > >>> Thanks so much for the help. > >>> George. > >>> > >>> Hilmar Lapp wrote: > >>> George, > >>> > >>> this is almost certainly caused by using FASTA format and > bioperl's > >>> treatment of it. I am guilty of not having written a FAQ yet for > >>> Bioperl-db, as this would certainly be there. 
> >>> > >>> Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl > >>> uses Bioperl to parse sequence files) does not extract the > accession > >>> number from the description line of the fasta sequence, and > instead > >>> sets the accession_number property if sequence objects it > creates to > >>> "unknown". Since there is a unique key constraint on > >>> (accession,version,namespace) the second sequence loaded will > raise > >>> an exception as it will violate the constraint. > >>> > >>> The simplest way to deal with this is to write a SeqProcessor that > >>> massages the accession_number appropriately and then supply the > >>> module to load_seqdatabase.pl using the --pipeline command line > >>> switch. > >>> > >>> There are several examples for how to do this in the email > archives. > >>> See for example this thread on the Biosql list: > >>> > >>> http://lists.open-bio.org/pipermail/biosql-l/2005-August/ > 000901.html > >>> > >>> with two links to examples, and Marc Logghe gives another one > in the > >>> thread itself. > >>> > >>> Hth, > >>> > >>> -hilmar > >>> > >>> On Jan 8, 2007, at 3:17 PM, George Heller wrote: > >>> > >>>> Hi all. > >>>> > >>>> I am new to Bioperl and am trying to run the load_seqdatabase.pl > >>>> script to load sequence data from a file into Postgres > database. 
I > >>>> am invoking the script through the following command: > >>>> > >>>> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format > >>>> fasta > >>>> -dbuser postgres -driver Pg > >>>> > >>>> I am getting the following error: > >>>> > >>>> -------------------- WARNING --------------------- > >>>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, > values > >>>> were ("FGENES > >>>> HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| > >> 4400| > >>>> 1","unknown" > >>>> ,"","0","") FKs (1,) > >>>> ERROR: duplicate key violates unique constraint > >>>> "bioentry_accession_key" > >>>> --------------------------------------------------- > >>>> Could not store unknown: > >>>> ------------- EXCEPTION ------------- > >>>> MSG: error while executing statement in > >>>> Bio::DB::BioSQL::SeqAdaptor::find_by_uni > >>>> que_key: ERROR: current transaction is aborted, commands ignored > >>>> until end of t > >>>> ransaction block > >>>> STACK > >>>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > usr/ > >>>> lib/perl > >>>> 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 > >>>> STACK > >> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > >>>> usr/lib/perl5 > >>>> /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 > >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ > >>>> perl5/site_perl/5 > >>>> .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 > >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ > >> perl5/ > >>>> site_perl/5. > >>>> 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 > >>>> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/ > perl5/ > >>>> site_perl/5.8. 
> >>>> 5/Bio/DB/Persistent/PersistentObject.pm:271 > >>>> STACK (eval) load_seqdatabase.pl:620 > >>>> STACK toplevel load_seqdatabase.pl:602 > >>>> -------------------------------------- > >>>> at load_seqdatabase.pl line 633 > >>>> > >>>> Can anyone tell me how I can correct this error and get my script > >>>> running? Thanks!!! > >>>> > >>>> George. > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> -- > >>> =========================================================== > >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >>> ===========================================================
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Wed Jan 31 11:23:52 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 Jan 2007 10:23:52 -0600
Subject: [Bioperl-l] bioperl-db Species question
Message-ID:

Hilmar, Sendu,

I have a couple of quick bioperl-db questions; no hurry if you have your hands full.

I was working on fixing a bioperl-db bug for Roy, which is also related to Brian's bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2197
http://bugzilla.open-bio.org/show_bug.cgi?id=2092

Two things popped up:

1) As Sendu alluded to in Brian's bug report (bug 2092), Bio::DB::BioSQL::SpeciesAdaptor has the current behavior of storing Bio::Species data by genus/species names (single name for each) via resetting classification(), which is inconsistent with Sendu's Bio::Species/Bio::Taxon changes. Currently, Bio::Species stores names strictly by genus/species node names, where the species node name normally also contains the genus name. I know where to make initial changes if we want to use node names in bioperl-db (SpeciesAdaptor::populate_from_row), but this currently breaks some bioperl-db species tests (02species.t). I can also see this being a big headache, though I couldn't find anything that indicates doing so isn't BioSQL-compliant. What would be the best way to go here?
2) Somewhat related: should we be running strict node name checks using throw() in Bio::Species at all times? There is no current workaround using FORCE, as is suggested in Bio::Species POD. This currently seems to cause more problems than it's worth and sometimes throws even when running tax lookups with bioperl-db (Roy's bug). I have changed the throw() to a warn() for the time being.

In the meantime, I am also converting all tests in bioperl-db to Test::More (I'm about 3/4 through now); I'll add a copy of Test::Simple to bioperl-db t/lib per Michael Schwern's instructions JIC it isn't installed. Let me know if this is a problem.

cheers!

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lstein at cshl.edu Wed Jan 31 12:02:08 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 Jan 2007 18:02:08 +0100
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and annotations
In-Reply-To:
References:
Message-ID: <6dce9a0b0701310902h464f1206u4d00a5df4e3b296c@mail.gmail.com>

The problem is when you try to merge files from different sources. The IDs frequently collide. This is why we allow two GFF files to contain the same IDs, and assume that the database will create its own internal (unique) IDs when it loads them.

The "Name" field is more or less the global identifier that you are looking for.

Lincoln

On 1/31/07, Cook, Malcolm wrote:
>
> Hi Lincoln,
>
> Thanks for the resolution of tag value methods. Your fixes work in my hands...
>
> I never knew that "ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database".
>
> I read in http://www.sequenceontology.org/gff3.shtml that
>
> ID Indicates the name of the feature. IDs must be unique
> within the scope of the GFF file.
>
> But that ID is solely for the purpose of linearizing the relationships among features within the scope of the GFF text file.
> > Further, in that document, I see lots of examples where the ID is what is holding together the fabric of the feature relationships; IDs appear as the value of: Target and Parent attributes. > > > > I'd appreciate it if you can provide a motivating example where the ID= in a (real life) GFF3 file is in fact ONLY used for the purpose of linearizing the feature relationships. > > > > ANyway, > > If it matters, the GFF formated genome I'm wallowing in these days from > Flybase (dmel r5.1) presents their FlyBase IDs in the ID attribute, like > this: > > 4 FlyBase gene 24068 25621 . + . ID=FBgn0040037;Name=CG17923;Dbxref=FlyBase:FBan0017923,FlyBase_Annotation_IDs:CG17923... > 4 FlyBase mRNA 24068 25621 . + . ID=FBtr0089155;Name=CG17923-RA;Parent=FBgn0040037;Dbxref=FlyBase_Annotation_IDs:CG17923-RA; > 4 FlyBase exon 24068 24477 . + . ID=CG17923:1;Name=CG17923:1;Parent=FBtr0089155 > > Switching to using 'Name' might work OK for my application. I'll look into it. It winds up being the same as the ID in some cases anyway.... > > Malcolm Cook > > > > ------------------------------ > *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On > Behalf Of *Lincoln Stein > *Sent:* Tuesday, January 30, 2007 4:46 PM > *To:* Cook, Malcolm > *Cc:* bioperl list; lstein at cshl.org > *Subject:* Re: Bio::DB::SeqFeature treamtent of tags and annotations > > I've fixed the first issue in CVS. Sorry for the inconsistency. > add_tag_value(), delete_tag_value() and get_Annotations() now all work as > expected. > > The problem with the ID column is that it is supposed to be LOCAL to the > GFF3 file and is not intended to be stored in the database. In contrast, > Name can survive roundtripping. Perhaps the thing to do is to add a flag to > the GFF3 file that turns on ID round-tripping, e.g. > > ##round-trip-ids: 1 > > If you like this idea, I can implement it. 
> > Lincoln > > On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org> wrote: > > > > Lincoln, > > > > Thanks for your suggestions on approach to my problems augmenting > > Flybase annotation. I am trying to follow them and finding the following > > oddities > > > > The first issue relates to the intermix of 'annotations' and 'tag > > values'. I find that Bio::DB::SeqFeature implements some of the 'tag' > > methods and some of the 'Annotation' methods. Here is a perl one-liner that > > shows values stored using add_tag_value are not retreived with > > get_tag_values, but rather with get_Annotations. > > > > > perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; > > $f->add_tag_value("x",666); print "get_tag_values:\t" . > > $f->get_tag_values("x") . "\nget_Annotations:\t" . > > $f->get_Annotations("x");' > > > > whose output is: > > get_tag_values: > > get_Annotations: 666 > > > > Tracing this shows me that this results from the fact that: > > > > Bio::DB::SeqFeature uses of Bio::Graphics::FeatureBase (via > > Bio::DB::SeqFeature::NormalizedFeature) which does not support -tags in > > ->new but rather -attributes, viz: > > > > -attributes a hashref of tag value attributes, in which the key is > > the tag > > and the value is an array reference of values > > > > And though Bio::Graphics::FeatureBase purports to implement > > Bio::SeqFeatureI, it only partially implements the 'tag' methods (now > > deprecated and relegated to Bio::AnnotatableI). In particular, the '*' > > methods Bio::SeqFeatureI are not implemented in Bio::Graphics::FeatureBase > > > > > > has_tag > > * add_tag_value > > get_tag_values > > get_all_tags > > * remove_tag > > get_tagset_values > > get_Annotations > > > > As a result, add_tag_value and remove_tag are inherited from different > > modules whose understanding of tags is not the same! 
> > > > This one-liner : > > > > >perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = > > Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn > > qw(add_tag_value get_tag_values) {print "\n$fn:\t", join "\t", (grep > > {Class::Inspector->function_exists($_, $fn)} @c)}' > > > > confirms that they are defined in different packages, namely: > > > > add_tag_value: Bio::AnnotatableI > > get_tag_values: Bio::Graphics::FeatureBase Bio::AnnotatableI > > Proposed solution... hmmmm ..... I dunno.... maybe the following patch > > to Bio::Graphics::FeatureBase->add_tag_value : > > > > sub add_tag_value { > > my ($self,$tag, at vals) = @_; > > push @{$self->{attributes}{$tag}}, @vals; > > } > > > > It fixes my use case for now but I'm still concerned and confused about > > this variety of methods. > > > > Suggestions? > > > > > > > > ------------------------------------------------------------------------- > > > > Also, I think that any "ID" in column 9 of GFF3 float file should be > > preserved through a round-trip through a Bio::DB::SeqFeature store, but this > > is not yet possible since any ID attribute in GFF3 column 9 is being > > lost by GFF3Loader, causing me to locally patch GFF3Loader::handle_feature > > method to add the following: > > > > # mec at stowers-institute.org , wondering why not all attributes are > > # carried forward, adds ID tag in particular service of > > # round-tripping ID, which, though present in database as load_id > > # attribute, was getting lost as itself > > $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID}; > > > > Poised to patch.... what d'you think? 
> > > > Malcolm Cook > > Stowers Institute for Medical Research - Kansas City, Missouri > > > > > > > > ------------------------------ > > From: lincoln.stein at gmail.com [mailto: lincoln.stein at gmail.com] On > > Behalf Of Lincoln Stein > > Sent: Tuesday, December 19, 2006 3:58 PM > > To: Cook, Malcolm > > Cc: bioperl list; lstein at cshl.org > > Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader > > problems augmenting Flybase annotation > > > > Hi Malcom, > > > > Your second guess was right. The use case of augmenting an existing gene > > with additional splice forms isn't provided for. You can get the > > functionality by making direct calls to Bio::DB::SeqFeature::Store methods: > > > > my @genes = $db->get_features_by_name('FBgn0017545'); > > @genes == 1 or die "Didn't get exactly one gene"; > > my $parent = $genes[0]; > > > > my $parent = $genes[0]; > > my $chr = $parent->seq_id; > > my $start = $parent->start; > > my $end = $parent->end; > > my $strand = $parent->strand; > > > > my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA', > > -source => 'added', > > -seq_id => '4r', > > -strand => $strand, > > -start => $start+10, > > -end => $end, > > ); > > $parent->add_SeqFeature($new_splice_form); > > > > for my $pos ([$start+10,$start+100],[$start+200,$end]) { > > my ($e_start,$e_end) = @$pos; > > my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon', > > -store => $db, > > -seq_id => '4r', > > -strand => $strand, > > -start => $e_start, > > -end => $e_end); > > $new_splice_form->add_SeqFeature($exon); > > } > > > > I found a bug in updating the seqfeature database when I wrote this > > script, so you'll have to get the latest biperl live. I think you can use > > this to write a splice form updating script. > > > > In order to support the idea of adding new splice forms to an existing > > gene using the GFF3 format, I will have to either modify the loader, or > > write a separate script (probably better to do the latter). 
It shouldn't be > > hard if you'd like to give it a try. > > > > Lincoln > > > > On 12/19/06, Cook, Malcolm wrote: > > > > > > Lincoln and fellow Bio::DB::SeqFeature travelers, > > > > > > I find that using bp_seqfeature_load.PLS to load subfeatures of genes > > > already loaded using bp_seqfeature_load.PLS fails with > > > > > > ------------- EXCEPTION ------------- > > > MSG: FBgn0017545 doesn't have a primary id > > > STACK > > > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > > > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > > > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > > > STACK toplevel > > > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo > > > > > > ad.PLS:76 > > > > > > Where FBgn0017545 is the ID of a gene previously loaded. > > > > > > I am unsure how to remedy my situation and welcome any advise on > > > correct > > > or improved approach to my problem. > > > > > > Here's some detail if it helps. I am developing a pipeline to design > > > a > > > microarray probes capable of distinguishing among splice variants in > > > drosophila (using latest Flybase dmel_r5.1 annotation). So I > > > > > > 1) load a filtered selection of Flybase annotation using > > > bp_seqfeature_load. (for testing purposes, I am using a single gene's > > > > > > worth of annotation, FBgn0017545.gff, attached). 
This is done as > > > follows: > > > > > > > bp_seqfeature_load.PLS --create FBgn0017545.gff > > > > > > 2) analyze all the genes in the database, and create GFF3 output each > > > feature of which has a 'Parent' that is a previously loaded gene (i.e. > > > FBgn0017545). (These features represent the unique introns, splice > > > sites, and exonic design targets. Output of this analysis, > > > FBgn0017545_matd.gff, is also attached) > > > > > > 3) load these analysis results into the same database, as follows: > > > > > > > bp_seqfeature_load.PLS FBgn0017545_matd.gff > > > > > > It is at this point that I get the above error. > > > > > > However, I don't get any error and the data loads fine if I load the > > > two > > > files together, as follows: > > > > > > > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff > > > FBgn0017545_matd.gff) > > > > > > So, I suspect that either I am misunderstanding when/how to use > > > bp_seqfeature_load.PLS or else this use case has not yet arisen and > > > must > > > be provided for somehow. > > > > > > I am running against bioperl-live > > > > > > Thanks for your thoughts and assistance, > > > > > > Malcolm Cook > > > Database Applications Manager - Bioinformatics > > > Stowers Institute for Medical Research - Kansas City, Missouri > > > > > > > > > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Wed Jan 31 12:18:56 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 Jan 2007 18:18:56 +0100 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store delete() not working In-Reply-To: <45C0ABE7.4060803@sendu.me.uk> References: <45C07914.6060908@sendu.me.uk> <45C0ABE7.4060803@sendu.me.uk> Message-ID: <6dce9a0b0701310918m6c6f2748jece35352dc18ba0e@mail.gmail.com> Thank you very much for finding and fixing the problem! I was about to start working on the issue when I saw that you'd taken care of it. Lincoln On 1/31/07, Sendu Bala wrote: > > Sendu Bala wrote: > > Hi, > > > > I'm trying to use Bio::DB::SeqFeature::Store delete() on a list of > > Bio::SeqFeature::Annotated retrieved from the database. It doesn't work. > > I presume I'm falling foul of the issue pointed out in the docs: > > > > "WARNING: The current DBI::mysql implementation has some issues that > > need to be resolved, namely (1) normalized subfeatures are NOT > > recursively deleted; and (2) the deletions are not performed in a > > transaction." > > > > Is there a trick to avoid the problem? Or, how might someone go about > > improving the implementation so that it worked? > > Actually, it was just a simple bug in > Bio::DB::SeqFeature::Store::DBI::mysql::_deleteid() > > I've committed a fix that I hope won't break anything; all I did was > have it return the number of rows deleted, since > Bio::DB::SeqFeature::Store::delete() needs a true return from > _deleteid() or it will only delete the first feature supplied to it. > (I've left the current implementation for delete() which effectively > gives up the moment a feature fails to be deleted, instead of trying to > delete the remaining features it was supplied.) 
> > I propose changing delete() to return the number of features it > successfully deletes instead of boolean, to match store() behaviour. But > that isn't so important. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Jan 31 13:56:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 Jan 2007 12:56:31 -0600 Subject: [Bioperl-l] DocBook docs, was Re: Downloading the tutorial for offline reading In-Reply-To: References: Message-ID: <6BE5D03D-F430-4B01-A140-865770197B02@uiuc.edu> On Jan 30, 2007, at 3:59 PM, Brian Osborne wrote: > Chris, > > My recommendation would be to not use Docbook, for a couple of > reasons. One > is that very few people can stand writing in Docbook XML, you have > to learn > all the tags and the act of writing itself is slow. You won't get much > written if people have to use this format. WYSIWYG? Don't know, but > it would > have to be an app that works on the troika of Linux, Win, and Mac > > Second problem is the conversion itself, Docbook to PDF, HTML, and > text. > Now, you don't have to convert to all of these formats of course, > perhaps > only PDF. In this case you probably have to do Docbook -> fo -> > PDF. I can > tell you that setting up all the Java applications was a true PITA, > all > praise to CPAN for providing a means of installing multiple > packages and > tracking version dependencies simultaneously. However, if you > insist I've > given you a sense of what I did with the shell script below. Hint: > you must > also hack the XSL files provided by e-novative. 
The reason I used > them is > because the resulting PDF and HTML is very pretty. > > A qualification here: all my knowledge of Docbook is a bit dated, I > threw > all those *jar's out when the Wiki was set up. > > There _must_ be a better way, I think it's based on Wiki and > something like > html2pdf. > > Brian O. Wiki or HTML conversion tools are worth looking into; there are a few perl-based tools but most convertors are in other languages. I may toy around with this at some point when I have more time, $job and all... chris From raim at tbi.univie.ac.at Wed Jan 31 16:09:49 2007 From: raim at tbi.univie.ac.at (Rainer Machne) Date: Wed, 31 Jan 2007 22:09:49 +0100 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? Message-ID: <45C1059D.1070100@tbi.univie.ac.at> Dear Bioperl list, hoping not be on the wrong email list, i would have a short question: Is there a standard way or are there nice (Bioperl) tools to come from a gene id (gi) other ids (see below) to the genomic coordinates of the respective gene? We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521] or >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] (we only have gi, ref and gb in my set). I retrieved all my fasta files from whole fungal genomes with available protein sequences at http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi As I only searched whole finished genomes (not shotgun), I thought it would then be easy to get the genomic coordinates and retrieve upstream sequences, but we have failed so far to find a consistent way to do this automatically. Many of the gi entries refer to mRNAs or partial mRNAs and the way to the coordinates seems to differ for each case. Any suggestions would be appreciated. 
with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group

From jason at bioperl.org Wed Jan 31 17:00:01 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 31 Jan 2007 14:00:01 -0800
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <0CA3BA2C-0CFF-4161-9F5E-0DAD33F3E33B@bioperl.org>

Rainer -

You probably want to download each whole genome as a GenBank file, parse the file, and generate the coordinates of all the genes. AFAIK, it is non-trivial to get the genomic coordinates starting from a gene record only using the NCBI interface, since you really want to see the sequence in genomic context.

I wrote a genbank2gff formatter that worked okay for my needs:
http://fungal.genome.duke.edu/~jes12/software/scripts/gbk2gff3.perl.txt

You can also try Chris Mungall's Unflattener module:
http://bioperl.org/wiki/Module:Bio::SeqFeature::Tools::Unflattener
http://search.cpan.org/~sendu/bioperl/Bio/SeqFeature/Tools/Unflattener.pm

Some annotations are inconsistent with the standardized formats, so you have to work in some special casing if you really want to pull the data in consistently every time.

There are tools in BioPerl for managing these databases; typically the data can be represented in GFF format for simplicity, and there are database implementations for fast access to the data. See Bio::DB::GFF and Bio::DB::SeqFeature.

I did make GFF files during my graduate work for most of the (then) available fungal genomes - http://fungal.genome.duke.edu/ - which may be useful to you as well.
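
A rough sketch of the "parse the GenBank file and generate the coordinates" idea, not taken from any of the scripts linked above. It assumes Bio::SeqIO is installed and that the records carry CDS features with a protein_id tag. The sub names (upstream_window, print_cds_coordinates), the 500 bp window and the tab-separated output are illustrative choices only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a feature's start/end/strand (1-based, inclusive),
# return the (from, to) coordinates of the region of up to $len bases that
# lies upstream of it, clipped to the sequence. Returns an empty list if the
# feature sits at the very edge of the sequence.
sub upstream_window {
    my ($start, $end, $strand, $len, $seq_len) = @_;
    if (defined $strand && $strand < 0) {
        # Minus strand: upstream lies at higher coordinates. The window is
        # reported in forward-strand coordinates and would still need to be
        # reverse-complemented after extraction.
        my $from = $end + 1;
        return if $from > $seq_len;
        my $to = $end + $len;
        $to = $seq_len if $to > $seq_len;
        return ($from, $to);
    }
    # Plus (or unknown) strand: upstream lies at lower coordinates.
    my $to = $start - 1;
    return if $to < 1;
    my $from = $start - $len;
    $from = 1 if $from < 1;
    return ($from, $to);
}

# Print one line per CDS: accession, protein_id, start, end, strand and the
# upstream window. Bio::SeqIO is loaded only when this sub is called.
sub print_cds_coordinates {
    my ($file, $len) = @_;
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            my ($pid) = $feat->has_tag('protein_id')
                ? $feat->get_tag_values('protein_id')
                : ('unknown');
            # For spliced (join) locations, start/end span the whole feature.
            my @up = upstream_window($feat->start, $feat->end, $feat->strand,
                                     $len, $seq->length);
            @up = ('NA', 'NA') unless @up;
            print join("\t", $seq->accession_number, $pid,
                       $feat->start, $feat->end, $feat->strand, @up), "\n";
        }
    }
}

# Run only when a GenBank file name is given on the command line.
print_cds_coordinates($ARGV[0], 500) if @ARGV;
```

Matching on the protein_id tag is one way to get from the ref/gb accessions in the BLAST FASTA headers back to a feature; whether every genome's annotation carries that tag is exactly the kind of inconsistency mentioned above, so expect to special-case some files.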
-jason -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ On Jan 31, 2007, at 1:09 PM, Rainer Machne wrote: > Dear Bioperl list, > > hoping not be on the wrong email list, i would have a short question: > > Is there a standard way or are there nice (Bioperl) tools to come > from a > gene id (gi) other ids (see below) to the genomic coordinates of the > respective gene? > > We have Fasta files retrieved from NCBI protein Blast in fungal > genomes: > >> gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago > maydis 521] > or >> gi|50292953|ref|XP_448909.1| unnamed protein product [Candida >> glabrata] > > (we only have gi, ref and gb in my set). > > I retrieved all my fasta files from whole fungal genomes with > available > protein sequences at > http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi > > As I only searched whole finished genomes (not shotgun), I thought it > would then be easy to get the genomic coordinates and retrieve > upstream > sequences, but we have failed so far to find a consistent way to do > this > automatically. Many of the gi entries refer to mRNAs or partial mRNAs > and the way to the coordinates seems to differ for each case. > > Any suggestions would be appreciated. 
> > with kind regards, > Rainer Machne > > University of Vienna > Department for Theoretical Chemistry > Theoretical Biochemistry Group > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From george.heller at yahoo.com Wed Jan 31 18:15:25 2007 From: george.heller at yahoo.com (George Heller) Date: Wed, 31 Jan 2007 15:15:25 -0800 (PST) Subject: [Bioperl-l] Error while running load_seqdatabase.pl In-Reply-To: <5F3B4A74-D010-4EEA-8CF6-2BCAAFF0300E@gmx.net> Message-ID: <840956.45979.qm@web58909.mail.re1.yahoo.com> Hi Hilmar, I am extremely sorry about sending such a huge file, I realized only after I sent it. Really sorry.... And, finally, the loading worked. I set the primary_id, accession and the display_id to the first number on my file, and it loads perfectly fine now! Thank you so much Hilmar for all the help. George. Hilmar Lapp wrote: Hi George, your reply to mine contained a plain-text file attachment of (roughly) 8MB. Not only does it hang up my email reader (probably because it recognizes the files as text and tries to display it), but you also sent it to the entire list, resulting in 8MB being sent to around 1600 people. Please, whenever you have larger files, only send it to an individual, *never* to a mailing list unless specifically asked to do so. Second, whenever you have large text files to attach, *always* compress them. This will typically save 70% of the bandwidth being used for transmission. What I could recover from your file looked like the first element is unique. It also looks like the last 4 elements are the accession of the target sequence (presumably of a GenBank entry - presumably maize BAC clones?) and the start and end coordinates and the strand. All of display_id, accession_number, and primary_id must be 40 chars or less (sorry for not pointing that out right away), so you may use the first element for all three of them. 
You could also use some combination of the other elements to construct a more meaningful display_id, just don't exceed 40 chars. Setting primary_id() to undef should definitely work. If it doesn't you may be using an old version of Bioperl. You should be using release 1.5.2, or at least 1.5.x. It's possible that the 1.4 release still had the bug of not allowing the setting to undef, please upgrade if that's the version you're using. Let me know if you still have problems with getting this to work. -hilmar On Jan 30, 2007, at 12:32 PM, George Heller wrote: Hi Hilmar, I have been trying to get around this problem for some days now, but havent had much luck. My file has about 12000 odd records, and when I try to load it with a pipeline to the package, I have about 483 records that get loaded. sub process_seq { my ($self, $seq) = @_; # $seq->accession_number($seq->display_id); my @ids = split(/\|/,$seq->display_id); $seq->accession_number($ids[0]); $seq->primary_id($ids[2]); return ($seq); } I am assuming this is because I use the ids[2] for the primary_id, and possibly the file has 483 unique records for that field. When I use some other reference like ids[3] etc(coz it has more unique values), I get the same error about the length being more than 40. I printed out the values for the display_id, primary_id and accession_number. The display_id has the entire first line of the file, so does the primary_id, if I dont set it in my new script. The accession number is split correctly and printed. 
This is how the first few lines in my file look like, >FGENESHT0000001||AC155633|570|4400|1 MTERKRKEIEDRKRKISGPQPGSSNRPRFSGNQPQQFRQNQRPPQQHQQFQRQYPQHQYQNRQSNQSGGQFQRQNQQAPR LPAPAAQQNSQATPAQVGNRACFHCGEQGHWVMQCPKKAAQQQSGPNAPAKQNVPQPRAGNRSQPRYNHGRLNHLEAEAV QETPSMIVGMFPVDSHIAEVLFDTGATHSFITASWVEAHNLPITTMSTPIQIDSAGGRIRADSICLNICVEIRGIAFPAN LIVMGTQGIDVILGMNWLDKYQAVISCDKRTIKLMSPLGEEVVTELVPPEPKRGSCYQLAVDSSEVDPIESIRVVSEFPD VFPKDLPGMPPERKVEFAIELLPGTAPIFKRAYRISGPELVELKEQIDELSEKGYIRPSTSPWAAPVLFVEKKDGTKRMC IDYRALNEVTIKNKYPLPRIEDLFDQLRGASVFSKIDLRSAFFMNLMNSVFMDYLDKFVVVFIDDILVYSQSEEEHADHL KMVLQRLREHQLYAKLSKCEFWINEVLFLGHIINKEGLAVDPKKVANILNWKAPTDARGIKSFIGMVGYYRRFIEGFSKI AKPMTALLGNKVEFKWTQKCQEAFEALKEKLTIAPVLVLPDVHKPFSVYCDACYTGLGCVLMQEGRVVAYSSRQLKVHEK NYPIHDLELAAVVHALKTWRHYLYGQKCDVYTDHKSLKYIFTQSELNMRQRRWLELIKDYELEIHYHPGKANVVADALSR KSQVNLMVARPMPYELAKEFDRLSLGFLNNSRGVTVELEPTLEREIKEAQKNDEKISEIRRLILDGRGKDFREDAEGVIW FKDRLCVPNVQSIRELILKEAHETAYSIHPGSEKMYQDLKKKFWWYGMKREIAEHVAMCDSCRRIKAEHQRPAGLLQPLQ IPQWKWDEIGMDFI VGLPRTRAGYDSIWVVVDRLTKSAHFIPVKTNYSSAVLAELYMSRIVCLHGVPKKIVSDRGTQFTS HFWRQLHEALGTHLNFSSAYHPQTDGQTERTNQILEDMLRACALQDQSGWDKRLPYAEFSYNNSYQASLKMSPFQALYGR SCRTPLQWDQPGEKQVFGPDILLEAEENIKMVRENLKIAQSRQRSYADTRRRELSFEVGDFVYLKVSPIRGVKRFGVKGK LAPRYIGSYQILARRGEVAYQLSLPENLSAVHDVFHVSQLKKCLRVPEEQLPVEGLEVQEDLTYVEKPVQILEVADRVTR RKTIRMCKVRWNHHSEEEATSEREDDLMAKYPELFASQP* Any suggestions? Thanks! George. Hilmar Lapp wrote: That's odd indeed. Did you try and put a print statement before the return statement that proves that 1) the codes gets executed, and 2) display_id(), primary_id() and accession_number() have the expected values? BTW you might also want to set primary_id() to undef (as the ID found in the FASTA files doesn't really count as a primary database- specific ID anyway). The identifier column in bioentry (which primary_id() maps to) is constrained to 40 chars as well. 
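
A small sketch of the ID handling discussed in this thread, under stated assumptions: the FASTA ID looks like the example header above, its first '|'-separated element is unique, and 40 characters is the limit for all three ID fields, as stated in the thread. The helper names first_id_element and normalize_ids are hypothetical; normalize_ids is meant to be called from a SeqProcessor's process_seq() and only relies on the display_id, accession_number and primary_id accessors.

```perl
use strict;
use warnings;

# Length limit mentioned in the thread for display_id, accession_number
# and primary_id.
use constant MAX_ID_LENGTH => 40;

# Hypothetical helper: return the first '|'-separated element of a FASTA ID,
# or die if it is empty or too long for the database columns.
sub first_id_element {
    my ($raw_id) = @_;
    $raw_id = '' unless defined $raw_id;
    # The pipe must be escaped: an unescaped /|/ matches the empty string
    # and would split the ID into single characters.
    my ($first) = split /\|/, $raw_id;
    die "no usable ID in '$raw_id'\n"
        unless defined $first && length $first;
    die "ID '$first' exceeds " . MAX_ID_LENGTH . " characters\n"
        if length($first) > MAX_ID_LENGTH;
    return $first;
}

# Hypothetical helper: use the first element as display_id and
# accession_number, and clear primary_id as suggested in the thread.
sub normalize_ids {
    my ($seq) = @_;
    my $id = first_id_element($seq->display_id);
    $seq->display_id($id);
    $seq->accession_number($id);
    $seq->primary_id(undef);
    return $seq;
}
```

Inside a pipeline module this would reduce process_seq() to a call such as `return normalize_ids($seq);`. Whether primary_id(undef) is accepted depends on the BioPerl version, as noted above; setting it to the same first element is the alternative.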
-hilmar On Jan 27, 2007, at 9:31 PM, George Heller wrote: > Hi Hilmar, > > I tried the lookup and noupdate options, and also made changes to > the SeqProcessor.pm package for the accession, > package SeqProcessor::Accession; > use strict; > use vars qw(@ISA); > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > @ISA = qw(Bio::Seq::BaseSeqProcessor); > sub process_seq > { > my ($self, $seq) = @_; > my @ids = split(/|/,$seq->display_id); > $seq->accession_number($ids[0]); > return ($seq); > } > 1; > I invoke the load_seqdatabase.pl as, > perl load_seqdatabase.pl -host localhost -dbname usda-06 -format > fasta -dbuser postgres -driver Pg --lookup --noupdate -- > pipeline="SeqProcessor::Accession" maize_pep.fasta > Loading maize_pep.fasta ... > I get the error, > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were ("FGENESHT0000021||AC155633|113788| > 114708|-1","FGENESHT0000021||AC155633|113788| > 114708|-1","FGENESHT0000021||AC155633|113788|114708|-1","","0","") > FKs (1,) > ERROR: value too long for type character varying(40) > --------------------------------------------------- > Could not store FGENESHT0000021||AC155633|113788|114708|-1: > ------------- EXCEPTION ------------- > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > transaction is aborted, commands ignored until end of transaction > block > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ > local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 
> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ > perl//Bio/DB/Persistent/PersistentObject.pm:272 > STACK (eval) load_seqdatabase.pl:620 > STACK toplevel load_seqdatabase.pl:602 > -------------------------------------- > at load_seqdatabase.pl line 633 > As far as I gather, this error shouldn't appear as we are > filtering out the accession as only the first code that appears. > Ideas? > > George. > > > Hilmar Lapp wrote: > George, > > I don't know how you create the FASTA file, but that's probably where the > root cause is. Based on the message: > >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| >> 1","","0","") FKs (1,) >> ERROR: duplicate key violates unique constraint >> "bioentry_accession_key" >> --------------------------------------------------- > > the identifier and accession number are set, so your SeqProcessor > scriptlet was executed (otherwise you'd also have seen a dynamic > loading error if e.g. your perl class could not be found or loaded > by perl). If you still receive the duplicate key violation, then it > can only mean that indeed a sequence with the exact same accession > number was in the database already. > > There are different possibilities for why: you may have loaded the > same file before (use --lookup and related switches if you want to > update existing sequences), or your FASTA file contains multiple > sequences with the same ID, or you have a sequence with the same ID > in different FASTA files, if you are loading from more than one file. > In either of the two latter cases, you will need to find a way to > disambiguate the IDs. > > BTW you may also want to consider parsing the concatenated ID > 'FGENESHT0000001||AC155633|570|4400|1' into its component IDs, > and then use only one component.
For example: > > my @ids = split(/\|/,$seq->display_id); > $seq->accession_number($ids[0]); > > Obviously, this will only make for a nicer accession number, and not > solve your duplicate ID problem, as the latter is in the file(s) you > load. > > -hilmar > > On Jan 25, 2007, at 8:51 PM, George Heller wrote: > >> Hi Hilmar, >> >> I still seem to be having problems loading my fasta file. I wrote a >> new package, SeqProcessor.pm as below, >> >> package SeqProcessor::Accession; >> use strict; >> use vars qw(@ISA); >> use Bio::Seq::BaseSeqProcessor; >> use Bio::SeqFeature::Generic; >> @ISA = qw(Bio::Seq::BaseSeqProcessor); >> sub process_seq >> { >> my ($self, $seq) = @_; >> $seq->accession_number($seq->display_id); >> return ($seq); >> } >> 1; >> I have this file SeqProcessor.pm in my home directory, and I have >> set the PERL5LIB variable accordingly. When I run >> load_seqdatabase.pl, >> >> perl load_seqdatabase.pl -host localhost -dbname biodb -format >> fasta -dbuser postgres -driver Pg -- >> pipeline="SeqProcessor::Accession" maize_pep.fasta >> >> I still get the error, >> >> Loading maize_pep.fasta ... 
>> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >> were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001|| >> AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400| >> 1","","0","") FKs (1,) >> ERROR: duplicate key violates unique constraint >> "bioentry_accession_key" >> --------------------------------------------------- >> Could not store FGENESHT0000001||AC155633|570|4400|1: >> ------------- EXCEPTION ------------- >> MSG: error while executing statement in >> Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current >> transaction is aborted, commands ignored until end of transaction >> block >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/ >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/ >> local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 >> STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/ >> perl//Bio/DB/Persistent/PersistentObject.pm:272 >> STACK (eval) load_seqdatabase.pl:620 >> STACK toplevel load_seqdatabase.pl:602 >> -------------------------------------- >> at load_seqdatabase.pl line 633 >> Is there something I am missing? >> >> Thanks! >> George. >> >> >> Hilmar Lapp wrote: >> Hi George, sorry for the sluggish response, I was tied up during the >> week. This is also why you always want to keep the thread on the >> list. >> >> Perl is an interpreted language, so no compilation is necessary. The >> only thing you need to do is have the package in a place where perl >> can find it. 
The simplest way to achieve this is by setting the >> PERL5LIB environment variable: >> >> $ export PERL5LIB=/where/you/put/your/perl/package >> >> or if PERL5LIB was set already, you'd append to it: >> >> $ export PERL5LIB=${PERL5LIB}:/where/you/put/your/perl/package >> >> I do assume that you didn't really add your code to the SeqAdaptor.pm >> package - there is no necessity for nor benefit from that, and at >> worst (and quite likely) perl won't be able to find the package. Note >> that there is plenty of documentation for how to write packages for >> perl and how to make them accessible to perl. >> >> Hth, >> >> -hilmar >> >> On Jan 8, 2007, at 11:52 PM, George Heller wrote: >> >>> Hi Hilmar. >>> >>> Thanks so much for the response. As I am new to Bioperl, I have >>> another question. >>> >>> I have made the changes as suggested by you, and have added the >>> code below to the SeqAdaptor.pm script. >>> >>> package SeqProcessor::Accession; >>> use strict; >>> use vars qw(@ISA); >>> use Bio::Seq::BaseSeqProcessor; >>> use Bio::SeqFeature::Generic; >>> >>> @ISA = qw(Bio::Seq::BaseSeqProcessor); >>> >>> sub process_seq >>> { >>> my ($self, $seq) = @_; >>> $seq->accession_number($seq->display_id); >>> return ($seq); >>> } >>> >>> Now that I have done my changes, do I need to compile or something >>> for the changes to reflect? If so, can you please let me know the >>> command for the same, or direct me to any link that has >>> documentation for the same? >>> >>> Thanks so much for the help. >>> George. >>> >>> Hilmar Lapp wrote: >>> George, >>> >>> this is almost certainly caused by using FASTA format and bioperl's >>> treatment of it. I am guilty of not having written a FAQ yet for >>> Bioperl-db, as this would certainly be there. 
>>> >>> Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl >>> uses Bioperl to parse sequence files) does not extract the accession >>> number from the description line of the fasta sequence, and instead >>> sets the accession_number property of sequence objects it creates to >>> "unknown". Since there is a unique key constraint on >>> (accession,version,namespace) the second sequence loaded will raise >>> an exception as it will violate the constraint. >>> >>> The simplest way to deal with this is to write a SeqProcessor that >>> massages the accession_number appropriately and then supply the >>> module to load_seqdatabase.pl using the --pipeline command line >>> switch. >>> >>> There are several examples for how to do this in the email archives. >>> See for example this thread on the Biosql list: >>> >>> http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html >>> >>> with two links to examples, and Marc Logghe gives another one in the >>> thread itself. >>> >>> Hth, >>> >>> -hilmar >>> >>> On Jan 8, 2007, at 3:17 PM, George Heller wrote: >>> >>>> Hi all. >>>> >>>> I am new to Bioperl and am trying to run the load_seqdatabase.pl >>>> script to load sequence data from a file into a Postgres database. 
I >>>> am invoking the script through the following command: >>>> >>>> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format >>>> fasta >>>> -dbuser postgres -driver Pg >>>> >>>> I am getting the following error: >>>> >>>> -------------------- WARNING --------------------- >>>> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values >>>> were ("FGENES >>>> HT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570| >> 4400| >>>> 1","unknown" >>>> ,"","0","") FKs (1,) >>>> ERROR: duplicate key violates unique constraint >>>> "bioentry_accession_key" >>>> --------------------------------------------------- >>>> Could not store unknown: >>>> ------------- EXCEPTION ------------- >>>> MSG: error while executing statement in >>>> Bio::DB::BioSQL::SeqAdaptor::find_by_uni >>>> que_key: ERROR: current transaction is aborted, commands ignored >>>> until end of t >>>> ransaction block >>>> STACK >>>> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/ >>>> lib/perl >>>> 5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948 >>>> STACK >> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >>>> usr/lib/perl5 >>>> /site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/ >>>> perl5/site_perl/5 >>>> .8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203 >>>> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/ >> perl5/ >>>> site_perl/5. >>>> 8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 >>>> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/ >>>> site_perl/5.8. >>>> 5/Bio/DB/Persistent/PersistentObject.pm:271 >>>> STACK (eval) load_seqdatabase.pl:620 >>>> STACK toplevel load_seqdatabase.pl:602 >>>> -------------------------------------- >>>> at load_seqdatabase.pl line 633 >>>> >>>> Can anyone tell me how I can correct this error and get my script >>>> running? Thanks!!! >>>> >>>> George. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From barry.moore at genetics.utah.edu Wed Jan 31 18:27:30 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 31 Jan 2007 16:27:30 -0700 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: Rainer, We use a perl library called CGL written by Mark Yandell and colleagues (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred to by Jason) for this type of task. The basic pipeline is to convert GenBank files to Chaos XML, then use CGL with those XML files to get nice object-oriented access to exons, transcripts, proteins, coordinates and more for those genes. I am currently using this with good success on most GenBank genomes (unfortunately I haven't been working with the fungal genomes, but it should work fine). The Ensembl API provides similar functionality for Ensembl genomes, but there are not very many fungi there. http://www.yandell-lab.org/cgl/ http://www.ensembl.org/info/software/core/core_tutorial.html Feel free to contact Mark or myself directly if you are interested in using CGL. Barry On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote: > Dear Bioperl list, > > hoping not to be on the wrong email list, I have a short question: > > Is there a standard way or are there nice (Bioperl) tools to get > from a > gene id (gi) or other ids (see below) to the genomic coordinates of the > respective gene? 
> > We have Fasta files retrieved from NCBI protein Blast in fungal > genomes: > >> gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago > maydis 521] > or >> gi|50292953|ref|XP_448909.1| unnamed protein product [Candida >> glabrata] > > (we only have gi, ref and gb in my set). > > I retrieved all my fasta files from whole fungal genomes with > available > protein sequences at > http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi > > As I only searched whole finished genomes (not shotgun), I thought it > would then be easy to get the genomic coordinates and retrieve > upstream > sequences, but we have failed so far to find a consistent way to do > this > automatically. Many of the gi entries refer to mRNAs or partial mRNAs > and the way to the coordinates seems to differ for each case. > > Any suggestions would be appreciated. > > with kind regards, > Rainer Machne > > University of Vienna > Department for Theoretical Chemistry > Theoretical Biochemistry Group
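As a possible first step toward the lookup asked about in this message, the identifiers can be pulled out of the FASTA headers with plain Perl. This is only a sketch: parse_ncbi_defline is a hypothetical helper name, the pattern assumes headers shaped exactly like the two examples quoted here (gi, then a gb or ref tag, then an accession with version), and it does not by itself retrieve any genomic coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: split an NCBI-style FASTA header (without the leading
# '>') into its parts. Returns a hash reference, or nothing if the header
# does not match the assumed layout:
#   gi|<number>|<db tag>|<accession.version>| <description> [<organism>]
sub parse_ncbi_defline {
    my ($defline) = @_;
    if ($defline =~ /^gi\|(\d+)\|(\w+)\|([^|\s]+)\|\s*(.*?)\s*(?:\[([^\]]+)\])?\s*$/) {
        return {
            gi          => $1,
            db          => $2,   # 'gb' or 'ref' in the examples from this thread
            accession   => $3,
            description => $4,
            organism    => $5,   # undef when no [organism] suffix is present
        };
    }
    return;
}

# The two example headers from the message, with the wrapped lines rejoined.
my @headers = (
    'gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]',
    'gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]',
);

for my $header (@headers) {
    my $parsed = parse_ncbi_defline($header) or next;
    print join("\t", @{$parsed}{qw(gi db accession)}), "\n";
}
```

The extracted accession (rather than the whole header) is the piece one would then hand to whichever tool is used to find the corresponding genomic record.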