From bill at genenformics.com Sun Jun 1 00:28:04 2008 From: bill at genenformics.com (bill at genenformics.com) Date: Sat, 31 May 2008 21:28:04 -0700 (PDT) Subject: [Bioperl-l] How to extract list of SNPs for a given gene? In-Reply-To: References: <6F230E9769AA8D4EB4BC401DF133EDB7180C4A@NIHCESMLBX15.nih.gov> Message-ID: <61887.98.218.182.229.1212294484.squirrel@webmail.dreamhost.com> Hi, Abhijit, Gene2Snp, a standalone application which find SNPs for given Entrez Genes, can be freely downloaded from http://www.genenformics.com/download.html A sample output is available at http://www.genenformics.com/Gene2Snp_example_result.txt This application may consume lots of CPU/memory due to complexity of locus region. Bill at genenformics.com > From: artendulkar at gmail.com [mailto:artendulkar at gmail.com] > Sent: Tuesday, May 20, 2008 4:21 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] How to extract list of SNPs for a given gene? > > Hi, > Can anyone please tell me how to get list of SNPs in any particular gene > using BioPerl, given NCBI Gene ID? > Is there any method, which takes NCBI gene ID as argument and returns > list > of SNPs by connecting to dbSNP? > Thank you. > Abhijit > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sun Jun 1 12:51:38 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 1 Jun 2008 11:51:38 -0500 Subject: [Bioperl-l] Fwd: BPlite In-Reply-To: References: <48412383.5080201@ucsf.edu> Message-ID: <5DB8D706-5312-4D70-B71A-60915A84C825@uiuc.edu> The problem is BPLite has been officially deprecated in favor of Bio::SearchIO BLAST parsing (including Sendu's BLAST-based pull parser). If there is interest in resurrecting BPLite we would need someone to actively maintain it. 
chris On May 31, 2008, at 4:53 PM, Jason Stajich wrote: > > > Begin forwarded message: > >> From: Anatoly Urisman >> Date: May 31, 2008 5:08:03 AM CDT >> To: jason... >> Subject: BPlite >> >> Hi Jason, >> I was wondering if you are aware of a fix to the BPlite.pm module >> that supports the new NCBI blastall output (i.e. reports are not >> delimited by something like BLASTN 2.2.8 [Jan-05-2004]). >> Thanks. >> Anatoly Urisman, MD-PhD > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Jun 3 11:25:26 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 3 Jun 2008 10:25:26 -0500 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl and the Google Summer of Code Message-ID: On behalf of the BioPerl core developers, Google, and the National Evolutionary Synthesis Center (NESCent), I would like to congratulate Mira Han on being accepted as a student for the Google Summer of Code (GSoC) and welcome her to the BioPerl community. Mira's accepted project proposal involves developing phyloXML support for BioPerl. Following is the proposal abstract: "PhyloXML is an XML document model for phylogenetic data that incorporates various annotation types, including user customized data. The format is currently not supported by BioPerl. I propose a SAX based data structure and interface for PhyloXML support in BioPerl. I will use most of the existing IO structures such as TreeIO and TreeEventBuilder and subclass them to extend the functions specific to PhyloXML. The objects will be connected to various existing BioPerl modules, such as SeqI, TaxonI, AnnotationI by reference in order to accommodate different phyloXML elements." 
NESCent, under the Phyloinformatics Summer of Code, is participating as a mentoring organization in the GSoC for the second year. This year, five projects (including Mira's) are being funded by Google, with a sixth project being funded by external sources. Mira's co- mentors for this project are myself, Jason Stajich, Rutger Vos, and Christian Zmasek (the primary developer of phyloXML). However, I encourage Mira to ask questions on the BioPerl mail list for feedback from the greater BioPerl community. Further information on phyloXML: http://www.phyloxml.org/ Further information on NESCent's Phyloinformatics Summer of Code, including all funded projects: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2008 Sincerely, Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann The Institute for Genomic Biology University of Illinois Urbana-Champaign From miraceti at gmail.com Tue Jun 3 11:58:00 2008 From: miraceti at gmail.com (miraceti) Date: Tue, 3 Jun 2008 11:58:00 -0400 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl and the Google Summer of Code In-Reply-To: References: Message-ID: Thanks for the welcome, I'm very excited to be part of the great community, Here is the wiki page for the project, *http://www.bioperl.org/wiki/PhyloXML_support_in_BioPerl *I'll look forward to interacting with you and getting lots of help! Mira Han On Tue, Jun 3, 2008 at 11:25 AM, Chris Fields wrote: > On behalf of the BioPerl core developers, Google, and the National > Evolutionary Synthesis Center (NESCent), I would like to congratulate Mira > Han on being accepted as a student for the Google Summer of Code (GSoC) and > welcome her to the BioPerl community. Mira's accepted project proposal > involves developing phyloXML support for BioPerl. Following is the proposal > abstract: > > "PhyloXML is an XML document model for phylogenetic data that incorporates > various annotation types, including user customized data. 
The format is > currently not supported by BioPerl. I propose a SAX based data structure and > interface for PhyloXML support in BioPerl. I will use most of the existing > IO structures such as TreeIO and TreeEventBuilder and subclass them to > extend the functions specific to PhyloXML. The objects will be connected to > various existing BioPerl modules, such as SeqI, TaxonI, AnnotationI by > reference in order to accommodate different phyloXML elements." > > NESCent, under the Phyloinformatics Summer of Code, is participating as a > mentoring organization in the GSoC for the second year. This year, five > projects (including Mira's) are being funded by Google, with a sixth project > being funded by external sources. Mira's co-mentors for this project are > myself, Jason Stajich, Rutger Vos, and Christian Zmasek (the primary > developer of phyloXML). However, I encourage Mira to ask questions on the > BioPerl mail list for feedback from the greater BioPerl community. > > Further information on phyloXML: > > http://www.phyloxml.org/ > > Further information on NESCent's Phyloinformatics Summer of Code, including > all funded projects: > > > https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2008 > > Sincerely, > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > The Institute for Genomic Biology > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bamboowarrior at gmail.com Tue Jun 3 16:50:37 2008 From: bamboowarrior at gmail.com (Arkady) Date: Tue, 3 Jun 2008 15:50:37 -0500 Subject: [Bioperl-l] liftOver API Message-ID: <91656c3f0806031350i442c6359v9c3461247e4340c6@mail.gmail.com> Hi folks, I've seen references occasionally to ensembl API or a BioPerl module for converting between (human) genome assemblies (e.g. hg17 to hg18). 
I'm also, of course, aware of liftOver, and of the chain file format. But
I'm more interested in the API.

Does this still exist? Where can I find it? If not, does anyone have
something that does this?

Cheers,
John Woods

From ousmane.diallo at crchum.qc.ca Tue Jun 3 17:21:42 2008
From: ousmane.diallo at crchum.qc.ca (Ousmane Diallo)
Date: Tue, 03 Jun 2008 17:21:42 -0400
Subject: [Bioperl-l] How to get protein ID and get protein accession from GI
Message-ID: <4845B5E6.4050703@crchum.qc.ca>

hello,
Could somebody help me on how to get the protein ID and ACCESSION using the
mRNA gi or accession.

my $db_obj = Bio::DB::GenBank->new();
my $seq_obj = $db_obj->get_Seq_by_acc('123456') ; # I pass here the mrna acc to get the seq_obj
my $gi = $seq_obj->primary_id ; # I get here the gi

I need to get the protein ID and protein ACCESSION.

THANKS!!

From pallavi.sarmah at igib.res.in Wed Jun 4 08:33:08 2008
From: pallavi.sarmah at igib.res.in (Pallavi Sarmah)
Date: Wed, 4 Jun 2008 18:03:08 +0530
Subject: [Bioperl-l] Bioperl-ext installation error
Message-ID: <4C33FA201D55F743B5DE794497FCA8971F0C88@n1ex>

Hi,

I am trying to install Bioperl-ext, and when I run the Makefile.PL it gives
me the following error.

ERROR from evaluation of /home/pallavi/Pallavi/downloads/bioperl-ext-1.5.1/Bio/SeqIO/staden/Makefile.PL:
Invalid version '' for Bio::SeqIO::staden::read.

Can anyone let me know the remedy for this? I have been stuck with this for
the last 2-3 days.

Pallavi

From sidd.basu at gmail.com Wed Jun 4 10:54:13 2008
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Wed, 4 Jun 2008 09:54:13 -0500
Subject: [Bioperl-l] Re: How to get protein ID and get protein accession from GI
In-Reply-To: <4845B5E6.4050703@crchum.qc.ca>
References: <4845B5E6.4050703@crchum.qc.ca>
Message-ID: <4846ac97.c505be0a.0446.1dc2@mx.google.com>

Hi,
You have to get the 'Feature' object for that.
On Tue, 03 Jun 2008, Ousmane Diallo wrote: > hello, > Could somebody help me on how to get the protein ID and ACCESSION using the mRNA gi or accession. > > > my $db_obj = Bio::DB::GenBank->new(); my $seq_obj = > $db_obj->get_Seq_by_acc('123456') ; # I pass here the mrna acc to get the > seq_obj > my $gi = $seq_obj->primary_id ; # I get here the gi my ($feat) = grep { $_->primary_tag() eq 'Protein' } $seq_obj->get_SeqFeatures(); print $feat->seq_id(),"\n"; For details and explanations read the Howto here ..... http://www.bioperl.org/wiki/HOWTO:Feature-Annotation -siddhartha > > > I need to get the protein ID and protein ACCESSION" > > THANKS!! > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Wed Jun 4 10:19:03 2008 From: jay at jays.net (Jay Hannah) Date: Wed, 4 Jun 2008 09:19:03 -0500 Subject: [Bioperl-l] How to get protein ID and get protein accession from GI In-Reply-To: <4845B5E6.4050703@crchum.qc.ca> References: <4845B5E6.4050703@crchum.qc.ca> Message-ID: <200F7B8D-3066-48A8-9965-9E4E215749D9@jays.net> On Jun 3, 2008, at 4:21 PM, Ousmane Diallo wrote: > Could somebody help me on how to get the protein ID and ACCESSION > using the mRNA gi or accession. > > my $db_obj = Bio::DB::GenBank->new(); > my $seq_obj = $db_obj->get_Seq_by_acc('123456') ; # I pass here > the mrna acc to get the seq_obj > my $gi = $seq_obj->primary_id ; # I get here > the gi > > I need to get the protein ID and protein ACCESSION" Please provide a real MRNA accession # you're interested in. I prefer sending example code that I know actually works on your data of interest. 
:) Thanks, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From bill at genenformics.com Thu Jun 5 01:16:07 2008 From: bill at genenformics.com (bill at genenformics.com) Date: Wed, 4 Jun 2008 22:16:07 -0700 (PDT) Subject: [Bioperl-l] How to get protein ID and get protein accession from GI Message-ID: <62133.98.218.171.90.1212642967.squirrel@webmail.dreamhost.com> Hi, Ousmane, IdConvert, a standalone application which convert protein/nucleotide gi/acc, can be freely downloaded from http://www.genenformics.com/download.html A sample output is available at http://www.genenformics.com/IdConvert_example_result.txt The following is a test run: >IdConvert.exe 300,NM_005252,399,NP_001225 #Input Nuc_GI Nuc_Acc Pro_GI Pro_Acc Desc 300 299 X59693.1 300 CAA42214.1 ubiquinol--cytochrome c reductase [Bos taurus] NM_005252 6552332 NM_005252.2 4885241 NP_005243.1 v-fos FBJ murine osteosarcoma viral oncogene homolog [Homo sapiens] 399 399 V00111.1 400 CAA23445.1 unnamed protein product [Bos taurus] NP_001225 15451858 NM_001234.3 4502589 NP_001225.1 caveolin 3 [Homo sapiens] Bill at genenformics.com > hello, > Could somebody help me on how to get the protein ID and ACCESSION using > the mRNA gi or accession. > > > my $db_obj = Bio::DB::GenBank->new(); > my $seq_obj = $db_obj->get_Seq_by_acc('123456') ; # I pass here the mrna > acc to get the seq_obj > my $gi = $seq_obj->primary_id ; # I get here the gi > > > I need to get the protein ID and protein ACCESSION" > > THANKS!! 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bug-bioperl at rt.cpan.org Thu Jun 5 12:01:13 2008 From: bug-bioperl at rt.cpan.org (=?UTF-8?B?Q8OpZHJpYyBDYWJhdQ==?= via RT) Date: Thu, 05 Jun 2008 12:01:13 -0400 Subject: [Bioperl-l] [rt.cpan.org #36480] Bug in Bio::Search::SearchUtils.pm In-Reply-To: <00ba01c8c725$49c454a0$dd4cfde0$@Cabau@tours.inra.fr> References: <00ba01c8c725$49c454a0$dd4cfde0$@Cabau@tours.inra.fr> Message-ID: Thu Jun 05 12:01:11 2008: Request 36480 was acted upon. Transaction: Ticket created by Cedric.Cabau at tours.inra.fr Queue: bioperl Subject: Bug in Bio::Search::SearchUtils.pm Broken in: (no value) Severity: (no value) Owner: Nobody Requestors: Cedric.Cabau at tours.inra.fr Status: new Ticket BioPerl version: bioperl-1.5.2_102 Module: Bio::Search::SearchUtils.pm OS: CentOS Linux version 2.6.18-53.1.14.el5 (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) Perl: v5.8.8 built for x86_64-linux-thread-multi Bug description: Methods tile_hsps looks (among other things) if alignment is ambiguous. Inside loop foreach $hsp ( $sbjct->hsps() ) { at line 177, methods $qoverlap = &_adjust_contigs() (line 200) and $soverlap = &_adjust_contigs() (line 206) are used to know if current HSP overlap previous ones on query and on subject. The problem is that in this loop, only the result of the last comparison which means the last HSP with previous ones is kept in variables $qoverlap and $soverlap. After the loop, we found (line 299): if($qoverlap) { if($soverlap) { $sbjct->ambiguous_aln('qs'); } else { $sbjct->ambiguous_aln('q'); } } elsif($soverlap) { $sbjct->ambiguous_aln('s'); } Only the result of the last comparison is stored in $sbjct and method ambiguous_aln from module Bio::Search::Hit::GenericHit will return wrong value if the alignment presents overlapping HSPs but last HSP not overlap with previous ones. 
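The failure mode described above can be reproduced in isolation. The following is only a minimal sketch: overlaps_previous() is a hypothetical stand-in for _adjust_contigs(), and the three HSP ranges are made up. It shows how plain assignment keeps only the result for the last HSP, while accumulating with += remembers an overlap seen earlier in the loop.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _adjust_contigs(): returns 1 if the given range
# overlaps any range seen so far, 0 otherwise, then records the range.
sub overlaps_previous {
    my ($start, $stop, $seen) = @_;
    my $overlap = 0;
    for my $range (@$seen) {
        $overlap = 1 if $start <= $range->[1] && $stop >= $range->[0];
    }
    push @$seen, [$start, $stop];
    return $overlap;
}

# Made-up HSP ranges: the second overlaps the first, the third overlaps nothing.
my @hsps = ([1, 100], [50, 150], [300, 400]);

my (@seen_a, @seen_b);
my ($last_only, $accumulated) = (0, 0);
for my $hsp (@hsps) {
    $last_only    = overlaps_previous(@$hsp, \@seen_a);   # overwritten on every pass
    $accumulated += overlaps_previous(@$hsp, \@seen_b);   # keeps track of any overlap
}

print "last_only=$last_only accumulated=$accumulated\n";
# prints "last_only=0 accumulated=1"
```

Because the last HSP overlaps nothing, the overwritten flag ends up false even though an overlap was found earlier, which is the situation reported for ambiguous_aln.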
To solve this bug, I just changed line 200 from:

$qoverlap = &_adjust_contigs('query', $hsp, $qstart, $qstop, \@qcontigs, $max_overlap, $frame, $qstrand);

to:

$qoverlap += &_adjust_contigs('query', $hsp, $qstart, $qstop, \@qcontigs, $max_overlap, $frame, $qstrand);

and line 206 from:

$soverlap = &_adjust_contigs('sbjct', $hsp, $sstart, $sstop, \@scontigs, $max_overlap, $frame, $sstrand);

to:

$soverlap += &_adjust_contigs('sbjct', $hsp, $sstart, $sstop, \@scontigs, $max_overlap, $frame, $sstrand);

to keep track of overlaps over the whole HSP screening process.

Regards,
Cedric

--
+---------------------------------------------------------------+
| Cédric Cabau INRA - SIGENAE - URA |
| Tel : 02.47.42.75.42 Fax : 02.47.42.77.78 |
| http://www.sigenae.org INRA - UR 83 - 37380 Nouzilly |
+---------------------------------------------------------------+

From jvaughn7 at utk.edu Thu Jun 5 11:53:54 2008
From: jvaughn7 at utk.edu (JustinV)
Date: Thu, 5 Jun 2008 08:53:54 -0700 (PDT)
Subject: [Bioperl-l] updating a reciprocal blast file
Message-ID: <17673277.post@talk.nabble.com>

I have a large reciprocal blast file that contains 3 proteomes. I'd like to
integrate another proteome for downstream clustering. I imagine a
command-line script that takes as input the new proteome in fasta format,
the directory of the old proteomes in fasta format, and the pre-existing
reciprocal blast file, and then performs the proper blasts and updates the
pre-existing reciprocal blast file accordingly. I am using blast locally
and the downstream processing is done by OrthoMCL. I assume this has been
handled before, but I can't track down the code. If anyone is familiar with
a pre-existing script or has pertinent advice, I'd be much obliged.
Justin -- View this message in context: http://www.nabble.com/updating-a-reciprocal-blast-file-tp17673277p17673277.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jason at bioperl.org Thu Jun 5 16:18:40 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 5 Jun 2008 13:18:40 -0700 Subject: [Bioperl-l] updating a reciprocal blast file In-Reply-To: <17673277.post@talk.nabble.com> References: <17673277.post@talk.nabble.com> Message-ID: <167EE308-F6F6-4EE9-A773-255A68906ABB@bioperl.org> Are you keeping each pairwise in a separate file and then combining it all? http://fungalgenomes.org/~stajich/scripts/pairwise_blast_jobs_big.pl Are you fixing E-values so they are scaled across different sized databases? You will probably want to add a Z= parameter to insure values are useable. I also had to hack ORTHOMCL locally to cache things in DB_Files as it was too memory intensive the way it runs on my big datasets. -jason On Jun 5, 2008, at 8:53 AM, JustinV wrote: > > I have a large reciprocal blast file that contains 3 proteomes. > I'd like to > integrate another proteome for downstream clustering. I imagine a > command-line script that takes as input the new proteome in fasta > format, > the directory of the the old proteomes in fasta format, and the pre- > existing > reciprocal blast file, and then performs the proper blasts and > updates the > pre-existing reciprocal blast file accordingly. I am using blast > locally > and the downstream processing is done by OrthoMCL. I assume this > has been > handled before, but I can't track down the code. If anyone is > familiar with > a pre-exisiting script or has pertinent advice, I'd be much obliged. > > Justin > -- > View this message in context: http://www.nabble.com/updating-a- > reciprocal-blast-file-tp17673277p17673277.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jvaughn7 at utk.edu Fri Jun 6 10:18:16 2008 From: jvaughn7 at utk.edu (JustinV) Date: Fri, 6 Jun 2008 07:18:16 -0700 (PDT) Subject: [Bioperl-l] updating a reciprocal blast file In-Reply-To: <167EE308-F6F6-4EE9-A773-255A68906ABB@bioperl.org> References: <17673277.post@talk.nabble.com> <167EE308-F6F6-4EE9-A773-255A68906ABB@bioperl.org> Message-ID: <17693254.post@talk.nabble.com> Jason, Thanks for the suggestions. The blast report is a single file containing each query in the 3 proteomes against a database of the 3 proteomes. As you probably remember, OrthoMCL offers many modes of running that head-off various levels of processing. In any case, the reciprocal blast is the bottle-neck. Mode -3 forgoes this step, but it requires a single, exhaustive blast report. Perhaps, I could run the reciprocal blasts separately as you suggested and then integrate them with your pairwise_blast_jobs_big.pl. I have to say, this code looks a little spooky. Wouldn't it be possible to just resort (by normalized e-value, discussed below) and reprint each set of query hits based on a hash or index of the results of that query against the new proteome? And then tack on the results of new proteome against the updated database to the end of the total blast report. In terms of normalizing the e-value, since I am using a consistent scoring matrix, can't I just recalculate the scores based on new database size: (new e-value) = (new database size) * (old e-value) / (old database size) as in http://www.springerlink.com/content/55m318wwqdgtw85h/ (methods) and elsewhere. As of yet, I've been satisfied with the run time downstream of the reciprocal blast, but, as I've said, I'm currently only using three plant (dicot) proteomes. 
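The rescaling proposed above, (new e-value) = (new database size) * (old e-value) / (old database size), could be sketched as below. This is an illustration only: rescale_evalue() and the numbers are made up, and it assumes the E-value scales linearly with database size under the same scoring matrix. BLAST itself works from an effective search space, so treat the result as an approximation.

```perl
use strict;
use warnings;

# Rescale an E-value reported against one database size to another size.
# Assumption: same scoring system, and E-value proportional to database size.
sub rescale_evalue {
    my ($old_evalue, $old_db_size, $new_db_size) = @_;
    die "database sizes must be positive\n"
        unless $old_db_size > 0 && $new_db_size > 0;
    return $new_db_size * $old_evalue / $old_db_size;
}

# Made-up example: a hit with E = 1e-10 against a database of 1,000,000
# residues, rescaled to a combined database four times that size.
my $rescaled = rescale_evalue(1e-10, 1_000_000, 4_000_000);
printf "%.1e\n", $rescaled;
```

In this example the E-value grows by the same factor of four as the database, so hits close to a cutoff in the old report may no longer pass it after rescaling.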
By "hack ORTHOMCL locally to cache things in DB_Files" do you mean serializing the blastSeq objects from early blasts in the blast_parse subroutine using bioperl-db or something? Maybe this is dumb assumption. In any case, I'd be curious to see your modified version of the OrthoMCL script. Justin Jason Stajich-3 wrote: > > Are you keeping each pairwise in a separate file and then combining > it all? > http://fungalgenomes.org/~stajich/scripts/pairwise_blast_jobs_big.pl > > Are you fixing E-values so they are scaled across different sized > databases? You will probably want to add a Z= parameter to insure > values are useable. > > I also had to hack ORTHOMCL locally to cache things in DB_Files as it > was too memory intensive the way it runs on my big datasets. > > -jason > On Jun 5, 2008, at 8:53 AM, JustinV wrote: > >> >> I have a large reciprocal blast file that contains 3 proteomes. >> I'd like to >> integrate another proteome for downstream clustering. I imagine a >> command-line script that takes as input the new proteome in fasta >> format, >> the directory of the the old proteomes in fasta format, and the pre- >> existing >> reciprocal blast file, and then performs the proper blasts and >> updates the >> pre-existing reciprocal blast file accordingly. I am using blast >> locally >> and the downstream processing is done by OrthoMCL. I assume this >> has been >> handled before, but I can't track down the code. If anyone is >> familiar with >> a pre-exisiting script or has pertinent advice, I'd be much obliged. >> >> Justin >> -- >> View this message in context: http://www.nabble.com/updating-a- >> reciprocal-blast-file-tp17673277p17673277.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/updating-a-reciprocal-blast-file-tp17673277p17693254.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From whuang.ustc at gmail.com Sun Jun 8 23:27:42 2008 From: whuang.ustc at gmail.com (Wen Huang) Date: Sun, 8 Jun 2008 22:27:42 -0500 Subject: [Bioperl-l] EMBL format field Message-ID: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> Hi all, I have a EMBL file that I want to extract one of the line ###file### ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. XX PA AB000170.1 XX DE Sus scrofa (pig) endopeptidase 24.16 type M1 XX OS Sus scrofa (pig) OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. OX NCBI_TaxID=9823; ......... I want the accession number in the line that starts with PA, AB000170 in this example. Can anybody kindly help, tell me which module and method I should use? I tried various things like $seq_obj -> primary_id, display_id, get_secondary_id, etc.. they did not work... Thanks a lot! Wen From Marc.Logghe at ablynx.com Mon Jun 9 04:47:11 2008 From: Marc.Logghe at ablynx.com (Marc Logghe) Date: Mon, 9 Jun 2008 10:47:11 +0200 Subject: [Bioperl-l] EMBL format field In-Reply-To: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> Message-ID: <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> Hi Wen, A dump of that sequence object (Data::Dumper is your friend !) reveals that the PA EMBL field is not saved into the object. 
However, you will find the string 'AB000170.1' in the embedded CDS feature, more precisely the seqid of the location object. I don't know whether that is always the case, but it is in your particular example. So, to get your hands on that value you have to do: my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; my $parent_id = $cds->location->seq_id; HTH, Marc Marc Logghe Senior Bioinformatician Ablynx nv > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Wen Huang > Sent: Monday, June 09, 2008 5:28 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] EMBL format field > > Hi all, > > I have a EMBL file that I want to extract one of the line > > ###file### > ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. > XX > PA AB000170.1 > XX > DE Sus scrofa (pig) endopeptidase 24.16 type M1 > XX > OS Sus scrofa (pig) > OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > Mammalia; > OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. > OX NCBI_TaxID=9823; > ......... > > I want the accession number in the line that starts with PA, AB000170 > in this example. > > Can anybody kindly help, tell me which module and method I should use? > I tried various things like $seq_obj -> primary_id, display_id, > get_secondary_id, etc.. they did not work... > > Thanks a lot! 
> > Wen > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Mon Jun 9 08:30:07 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 9 Jun 2008 08:30:07 -0400 Subject: [Bioperl-l] EMBL format field In-Reply-To: <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> Message-ID: If this is the case with the latest version of BioPerl it should be filed as a bug report for the embl parser. The ID ought to be reported in $seq->get_secondary_accessions() (which returns an array). If it doesn't, it sounds like a bug to me. -hilmar On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: > Hi Wen, > A dump of that sequence object (Data::Dumper is your friend !) reveals > that the PA EMBL field is not saved into the object. However, you will > find the string 'AB000170.1' in the embedded CDS feature, more > precisely > the seqid of the location object. I don't know whether that is always > the case, but it is in your particular example. > So, to get your hands on that value you have to do: > > my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; > my $parent_id = $cds->location->seq_id; > > HTH, > Marc > > Marc Logghe > Senior Bioinformatician > Ablynx nv >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Wen Huang >> Sent: Monday, June 09, 2008 5:28 AM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] EMBL format field >> >> Hi all, >> >> I have a EMBL file that I want to extract one of the line >> >> ###file### >> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. 
>> XX >> PA AB000170.1 >> XX >> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >> XX >> OS Sus scrofa (pig) >> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >> Euteleostomi; >> Mammalia; >> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >> OX NCBI_TaxID=9823; >> ......... >> >> I want the accession number in the line that starts with PA, AB000170 >> in this example. >> >> Can anybody kindly help, tell me which module and method I should >> use? >> I tried various things like $seq_obj -> primary_id, display_id, >> get_secondary_id, etc.. they did not work... >> >> Thanks a lot! >> >> Wen >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From whuang.ustc at gmail.com Mon Jun 9 10:05:35 2008 From: whuang.ustc at gmail.com (Wen Huang) Date: Mon, 9 Jun 2008 09:05:35 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> Message-ID: Hi Marc, Thanks a lot! It does work!! Wen On Jun 9, 2008, at 3:47 AM, Marc Logghe wrote: > Hi Wen, > A dump of that sequence object (Data::Dumper is your friend !) reveals > that the PA EMBL field is not saved into the object. However, you will > find the string 'AB000170.1' in the embedded CDS feature, more > precisely > the seqid of the location object. I don't know whether that is always > the case, but it is in your particular example. 
> So, to get your hands on that value you have to do: > > my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; > my $parent_id = $cds->location->seq_id; > > HTH, > Marc > > Marc Logghe > Senior Bioinformatician > Ablynx nv >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Wen Huang >> Sent: Monday, June 09, 2008 5:28 AM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] EMBL format field >> >> Hi all, >> >> I have a EMBL file that I want to extract one of the line >> >> ###file### >> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >> XX >> PA AB000170.1 >> XX >> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >> XX >> OS Sus scrofa (pig) >> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >> Euteleostomi; >> Mammalia; >> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >> OX NCBI_TaxID=9823; >> ......... >> >> I want the accession number in the line that starts with PA, AB000170 >> in this example. >> >> Can anybody kindly help, tell me which module and method I should >> use? >> I tried various things like $seq_obj -> primary_id, display_id, >> get_secondary_id, etc.. they did not work... >> >> Thanks a lot! >> >> Wen >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From whuang.ustc at gmail.com Mon Jun 9 10:07:28 2008 From: whuang.ustc at gmail.com (Wen Huang) Date: Mon, 9 Jun 2008 09:07:28 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> Message-ID: <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> Hilmar, I tried that, it did not work. Marc's way can work. 
Thanks, Wen On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: > If this is the case with the latest version of BioPerl it should be > filed as a bug report for the embl parser. The ID ought to be > reported in $seq->get_secondary_accessions() (which returns an > array). If it doesn't, it sounds like a bug to me. > > -hilmar > > On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >> Hi Wen, >> A dump of that sequence object (Data::Dumper is your friend !) >> reveals >> that the PA EMBL field is not saved into the object. However, you >> will >> find the string 'AB000170.1' in the embedded CDS feature, more >> precisely >> the seqid of the location object. I don't know whether that is always >> the case, but it is in your particular example. >> So, to get your hands on that value you have to do: >> >> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >> my $parent_id = $cds->location->seq_id; >> >> HTH, >> Marc >> >> Marc Logghe >> Senior Bioinformatician >> Ablynx nv >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>> Sent: Monday, June 09, 2008 5:28 AM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] EMBL format field >>> >>> Hi all, >>> >>> I have a EMBL file that I want to extract one of the line >>> >>> ###file### >>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>> XX >>> PA AB000170.1 >>> XX >>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>> XX >>> OS Sus scrofa (pig) >>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>> Euteleostomi; >>> Mammalia; >>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>> OX NCBI_TaxID=9823; >>> ......... >>> >>> I want the accession number in the line that starts with PA, >>> AB000170 >>> in this example. >>> >>> Can anybody kindly help, tell me which module and method I should >>> use? 
>>> I tried various things like $seq_obj -> primary_id, display_id,
>>> get_secondary_id, etc.. they did not work...
>>>
>>> Thanks a lot!
>>>
>>> Wen
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>

From cjfields at uiuc.edu Mon Jun 9 14:12:29 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Jun 2008 13:12:29 -0500
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To:
References:
Message-ID: <87FB1AA1-943D-4BCC-8F00-2CC5F05FFEAB@uiuc.edu>

[cross-posting to bioperl-l for archiving]

On Jun 8, 2008, at 11:32 PM, Han, Mira wrote:

> ...
> issues:
> There are a lot of s when processing the elements,
> I tried to make a hash of function references that point to the
> member functions,
> But when I tried calling it through the hash, it was giving me an
> error that I'm trying to call a method on an unblessed object.

I ran into something similar when setting up a few SeqIO modules
(Bio::SeqIO::gbdriver being one of them) which passed on data chunks to
method handlers. It has something to do with how the method is set up
in the class (package) namespace and how you refer to it. It's a
little tricky b/c you run into semantic issues with perl's 'hammered-on'
OO, but it can be done.

If you call using '$self->{lookup}->{$tag}->(@args)' directly, what
happens is you can successfully call the method since you are still in
the proper module namespace.
However, since you aren't calling through the invocant ($self) directly
but through a code reference stored in the invocant, perl treats the
call as a plain subroutine call instead of a method call. Therefore no
invocant is passed as the first argument (you will instead get either
the first element in @args or 'undef' assigned to $self within the
method). Not sure if this is supposed to be a feature or a bug.
Regardless, any attempt within the method to do something with $self
will result in an error such as 'using an unblessed reference' or 'not a
hash reference'.

There are two solutions, both of which work. If you have method
references stored in a hash table in the invocant:

  $self->{lookup}->{tag1} = \&foo;
  $self->{lookup}->{tag2} = \&bar;
  ....

you can grab the actual code reference (checking using 'exists') and
use it on the invocant in the place of a method name, but NOT call it as
a bare code reference. Perl accepts a scalar holding a code reference in
the method-name position and calls the referenced sub with the invocant
as its first argument (I think it's supposed to be DWIM-my):

  if (exists $self->{lookup}->{$tag}) {
      my $method = $self->{lookup}->{$tag};
      $self->$method(@args);
  } else {...}

The above also works if you use strings in the lookup table which
contain the names of the methods (in that case perl looks the method up
by name, which is allowed even under 'use strict'):

  $self->{lookup}->{tag1} = 'foo';
  $self->{lookup}->{tag2} = 'bar';

Alternately, you can pass the invocant in explicitly (which looks
weird to me, hence my above solution):

  if (exists $self->{lookup}->{$tag}) {
      $self->{lookup}->{$tag}->($self, @args);
  } else {...}

perl6 fixes a lot of these issues, but of course it won't be out for a
while longer.

> I'd like to figure out how to do it,
> But before that, is hashing really better than lots of if-elses?

Using a stack of if-elsifs isn't as efficient as a lookup since you
would test each case in succession (so something that is further down
the if-elsif stack would have to fail every previous test before it
succeeds). A lookup table simply checks for the existence of a value
stored under a key (tag).
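The calling styles discussed above can be compared in a small
self-contained script. This is only a sketch: the class My::Handler,
the handlers foo and bar, and the dispatch() wrapper are hypothetical
names made up for illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Handler;    # hypothetical class, not part of BioPerl

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;
    # Lookup table: tag name => code reference to the handler sub.
    $self->{lookup} = {
        tag1 => \&foo,
        tag2 => \&bar,
    };
    return $self;
}

# Each handler reports whether it really received the object as $self.
sub foo { my ($self, @args) = @_; return _report('foo', $self, @args) }
sub bar { my ($self, @args) = @_; return _report('bar', $self, @args) }

sub _report {
    my ($name, $self, @args) = @_;
    my $got = ref($self) ? 'object' : 'no object';
    return "${name}: $got, args=@args";
}

# Fetch the code reference once, then call it with $self as invocant.
sub dispatch {
    my ($self, $tag, @args) = @_;
    my $method = $self->{lookup}->{$tag};
    return "no handler for $tag" unless defined $method;
    return $self->$method(@args);
}

package main;

my $h = My::Handler->new;
my @results = (
    $h->dispatch('tag1', 'x', 'y'),        # foo: object, args=x y
    $h->{lookup}->{tag2}->($h, 'x'),       # bar: object, args=x
    $h->{lookup}->{tag2}->('x'),           # bar: no object, args=
    $h->dispatch('tag3'),                  # no handler for tag3
);
print "$_\n" for @results;
```

The third call is the failure mode described above: the handler is
reached, but 'x' ends up in $self because nothing supplied the
invocant. Wrapping the lookup in one dispatch() method keeps the
'exists'/'defined' check and the calling convention in a single place.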
An alternative is to use 5.10 features (smart matching and given-when,
which is like a switch statement), but that will limit usage for those
still using 5.8.8, which is probably a majority of users, since 5.10
came out just last December.

chris

>
>
> Mira
>
>
>
> On 6/2/08 10:29 AM, "Han, Mira" wrote:
>
>
>
> Last week (May 26-30):
> 1. made skeleton files for TreeIO:: PhyloEventBuilder,
> TreeIO::phyloXML, Tree::NodePhyloXML
> 2. managed to connect and load them up but there is a bus error
> problem.
> I think it's probably due to some of the function calls that I'm
> making
> That I haven't looked into properly. I'm suspecting it will go away
> once I properly
> build in the end_element for
>
> This week (Jun 2-6):
> 1. implement start_element, and end_element for and
>
> - start_element: : add treelevel, : push data
> to current_items.
> - end_element: : minus treelevel, : pop data
> from current_elements, use new() to build node from popped data.
> 2. get rid of that bus error
> 3. TreeIO::phyloXML::Next_tree() : look for element
> _______________________________________________
> Wg-phyloinformatics mailing list
> Wg-phyloinformatics at nescent.org
> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Jun 9 15:08:23 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Jun 2008 14:08:23 -0500
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To:
References:
Message-ID: <763316E4-F480-45FC-BC81-A450D61CAB2D@uiuc.edu>

Yes, that works as well.
chris On Jun 9, 2008, at 1:30 PM, aaron.j.mackey at gsk.com wrote: > How about just: > > $self->${ $self->{lookup}->{tag} }(@args) > > i.e., shorthand for: > > $method = $self->{lookup}->{tag} > $self->$method(@args); > > -Aaron > > wg-phyloinformatics-bounces at nescent.org wrote on 06/09/2008 02:12:29 > PM: > >> [cross-posting to bioperl-l for archiving] >> >> On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: >> >>> ... >>> issues: >>> There are a lot of s when processing the elements, >>> I tried to make a hash of function references that point to the >>> member functions, >>> But when I tried calling it through the hash, it was giving me an >>> error that I'm trying to call a method on an unblessed object. >> >> I ran into something similar when setting up a few SeqIO modules >> (Bio::SeqIO::gbdriver being on of them) which passed on data chunks >> to >> method handlers. It has something to do with how the method is set >> up >> in the class (package) namespace and how you refer to it. It's a >> little tricky b/c you run into semantic issues with perl's 'hammered- >> on' OO, but it can be done. >> >> If you call using '$self->{lookup}->{$tag}->(@args)' directly, what >> happens is you can successfully call the method since you are still >> in >> the proper module namespace. However, since you aren't calling from >> the invocant ($self) directly but rather from a reference in the >> invocant, it treats the call like a subroutine instead of a method. >> Therefore no invocant is passed as the first argument (you will >> instead get either the first element in @args or 'undef' assigned to >> $self within the method). Not sure if this is supposed to be a >> feature or a bug. Regardless, any attempt within the method to do >> something with $self will result in a 'using an unblessed reference' >> or 'not a hash reference'. >> >> There are two solutions, both of which work. 
If you have method >> references stored in a hash table in the invocant: >> >> $self->{lookup}->{tag1} = \&foo; >> $self->{lookup}->{tag2} = \&bar; >> .... >> >> you can grab the actual code reference (checking using 'exists') and >> use it directly on the invocant, but NOT as a code reference. This >> acts as a symbolic reference, which is allowed for subroutine and >> method calls (I think it's supposed to be DWIM-my): >> >> if (exists $self->{lookup}->{$tag}) { >> my $method = $self->{lookup}->{$tag}; >> $self->$method(@args); >> } else {...} >> >> The above also works if you use strings in the lookup table which >> contain the name of the methods (again, symbolic reference): >> >> $self->{lookup}->{tag1} = 'foo'; >> $self->{lookup}->{tag2} = 'bar'; >> >> Alternately, you can pass the invocant in explicitly (which looks >> weird to me, hence my above solution): >> >> if (exists $self->{lookup}->{$tag}) { >> $self->{lookup}->{$tag}->($self, @args); >> } else {...} >> >> perl6 fixes a lot of these issues, but of course it won't be out >> for a >> while longer. >> >>> I'd like to figure out how to do it, >>> But before that, is hashing really better than lots of if-elses? >> >> Using a stack of if-elsifs isn't as efficient as a lookup since you >> would test each case in succession (so something that is further down >> the if-elseif test stack would have passed through and failed each >> previous test case before success). A lookup table would test simply >> based on the existence of a value stored under a key (tag). >> >> An alternative is to use 5.10 features (smart matching and given- >> when, >> which is like a switch statement), but that will limit usage for >> those >> still using 5.8.8, which is probably a majority of users, since 5.10 >> came out just last December. >> >> chris >> >>> >>> >>> Mira >>> >>> >>> >>> On 6/2/08 10:29 AM, "Han, Mira" wrote: >>> >>> >>> >>> Last week (May 26-30): >>> 1. 
made skeleton files for TreeIO:: PhyloEventBuilder, >>> TreeIO::phyloXML, Tree::NodePhyloXML >>> 2. managed to connect and load them up but there is a bus error >>> problem. >>> I think it's probably due to some of the function calls that I'm >>> making >>> That I haven't looked into properly. I'm suspecting it will go away >>> once I properly >>> build in the end_element for >>> >>> This week (Jun 2-6): >>> 1. implement start_element, and end_element for and >>> >>> - start_element: : add treelevel, : push data >>> to current_items. >>> - end_element: : minus treelevel, : pop data >>> from current_elements, use new() to build node from popped data. >>> 2. get rid of that bus error >>> 3. TreeIO::phyloXML::Next_tree() : look for element >>> _______________________________________________ >>> Wg-phyloinformatics mailing list >>> Wg-phyloinformatics at nescent.org >>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Wg-phyloinformatics mailing list >> Wg-phyloinformatics at nescent.org >> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >> > > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From aaron.j.mackey at gsk.com Mon Jun 9 14:30:22 2008 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Mon, 9 Jun 2008 14:30:22 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: <87FB1AA1-943D-4BCC-8F00-2CC5F05FFEAB@uiuc.edu> Message-ID: How about just: $self->${ $self->{lookup}->{tag} }(@args) i.e., shorthand for: $method = $self->{lookup}->{tag} $self->$method(@args); -Aaron wg-phyloinformatics-bounces at nescent.org wrote on 06/09/2008 02:12:29 PM: > [cross-posting to bioperl-l for archiving] > > On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: > > > ... > > issues: > > There are a lot of s when processing the elements, > > I tried to make a hash of function references that point to the > > member functions, > > But when I tried calling it through the hash, it was giving me an > > error that I'm trying to call a method on an unblessed object. > > I ran into something similar when setting up a few SeqIO modules > (Bio::SeqIO::gbdriver being on of them) which passed on data chunks to > method handlers. It has something to do with how the method is set up > in the class (package) namespace and how you refer to it. It's a > little tricky b/c you run into semantic issues with perl's 'hammered- > on' OO, but it can be done. > > If you call using '$self->{lookup}->{$tag}->(@args)' directly, what > happens is you can successfully call the method since you are still in > the proper module namespace. However, since you aren't calling from > the invocant ($self) directly but rather from a reference in the > invocant, it treats the call like a subroutine instead of a method. > Therefore no invocant is passed as the first argument (you will > instead get either the first element in @args or 'undef' assigned to > $self within the method). Not sure if this is supposed to be a > feature or a bug. 
Regardless, any attempt within the method to do > something with $self will result in a 'using an unblessed reference' > or 'not a hash reference'. > > There are two solutions, both of which work. If you have method > references stored in a hash table in the invocant: > > $self->{lookup}->{tag1} = \&foo; > $self->{lookup}->{tag2} = \&bar; > .... > > you can grab the actual code reference (checking using 'exists') and > use it directly on the invocant, but NOT as a code reference. This > acts as a symbolic reference, which is allowed for subroutine and > method calls (I think it's supposed to be DWIM-my): > > if (exists $self->{lookup}->{$tag}) { > my $method = $self->{lookup}->{$tag}; > $self->$method(@args); > } else {...} > > The above also works if you use strings in the lookup table which > contain the name of the methods (again, symbolic reference): > > $self->{lookup}->{tag1} = 'foo'; > $self->{lookup}->{tag2} = 'bar'; > > Alternately, you can pass the invocant in explicitly (which looks > weird to me, hence my above solution): > > if (exists $self->{lookup}->{$tag}) { > $self->{lookup}->{$tag}->($self, @args); > } else {...} > > perl6 fixes a lot of these issues, but of course it won't be out for a > while longer. > > > I'd like to figure out how to do it, > > But before that, is hashing really better than lots of if-elses? > > Using a stack of if-elsifs isn't as efficient as a lookup since you > would test each case in succession (so something that is further down > the if-elseif test stack would have passed through and failed each > previous test case before success). A lookup table would test simply > based on the existence of a value stored under a key (tag). > > An alternative is to use 5.10 features (smart matching and given-when, > which is like a switch statement), but that will limit usage for those > still using 5.8.8, which is probably a majority of users, since 5.10 > came out just last December. 
> > chris > > > > > > > Mira > > > > > > > > On 6/2/08 10:29 AM, "Han, Mira" wrote: > > > > > > > > Last week (May 26-30): > > 1. made skeleton files for TreeIO:: PhyloEventBuilder, > > TreeIO::phyloXML, Tree::NodePhyloXML > > 2. managed to connect and load them up but there is a bus error > > problem. > > I think it's probably due to some of the function calls that I'm > > making > > That I haven't looked into properly. I'm suspecting it will go away > > once I properly > > build in the end_element for > > > > This week (Jun 2-6): > > 1. implement start_element, and end_element for and > > > > - start_element: : add treelevel, : push data > > to current_items. > > - end_element: : minus treelevel, : pop data > > from current_elements, use new() to build node from popped data. > > 2. get rid of that bus error > > 3. TreeIO::phyloXML::Next_tree() : look for element > > _______________________________________________ > > Wg-phyloinformatics mailing list > > Wg-phyloinformatics at nescent.org > > https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Wg-phyloinformatics mailing list > Wg-phyloinformatics at nescent.org > https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics > From miraceti at gmail.com Tue Jun 10 00:37:03 2008 From: miraceti at gmail.com (miraceti) Date: Tue, 10 Jun 2008 00:37:03 -0400 Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report In-Reply-To: <763316E4-F480-45FC-BC81-A450D61CAB2D@uiuc.edu> References: <763316E4-F480-45FC-BC81-A450D61CAB2D@uiuc.edu> Message-ID: Thanks for that, It now works. FYI, my $method = $self->{'_start_element'}->{$reader->name}; $self->$method(); Worked. $self->{'_start_element'}->{$reader->name}($self); Worked. 
$self->${$self->{'_start_element'}->{$reader->name}}; Gave error: Not a SCALAR reference at /usr/local/src/bioperl-live/Bio/TreeIO/phyloxml.pm line 197 $self->${\scalar($self->{'_start_element'}->{$reader->name})}; Worked. I'm using the first way. Mira On Mon, Jun 9, 2008 at 3:08 PM, Chris Fields wrote: > Yes, that works as well. > > chris > > > On Jun 9, 2008, at 1:30 PM, aaron.j.mackey at gsk.com wrote: > > How about just: >> >> $self->${ $self->{lookup}->{tag} }(@args) >> >> i.e., shorthand for: >> >> $method = $self->{lookup}->{tag} >> $self->$method(@args); >> >> -Aaron >> >> wg-phyloinformatics-bounces at nescent.org wrote on 06/09/2008 02:12:29 PM: >> >> [cross-posting to bioperl-l for archiving] >>> >>> On Jun 8, 2008, at 11:32 PM, Han, Mira wrote: >>> >>> ... >>>> issues: >>>> There are a lot of s when processing the elements, >>>> I tried to make a hash of function references that point to the >>>> member functions, >>>> But when I tried calling it through the hash, it was giving me an >>>> error that I'm trying to call a method on an unblessed object. >>>> >>> >>> I ran into something similar when setting up a few SeqIO modules >>> (Bio::SeqIO::gbdriver being on of them) which passed on data chunks to >>> method handlers. It has something to do with how the method is set up >>> in the class (package) namespace and how you refer to it. It's a >>> little tricky b/c you run into semantic issues with perl's 'hammered- >>> on' OO, but it can be done. >>> >>> If you call using '$self->{lookup}->{$tag}->(@args)' directly, what >>> happens is you can successfully call the method since you are still in >>> the proper module namespace. However, since you aren't calling from >>> the invocant ($self) directly but rather from a reference in the >>> invocant, it treats the call like a subroutine instead of a method. 
>>> Therefore no invocant is passed as the first argument (you will >>> instead get either the first element in @args or 'undef' assigned to >>> $self within the method). Not sure if this is supposed to be a >>> feature or a bug. Regardless, any attempt within the method to do >>> something with $self will result in a 'using an unblessed reference' >>> or 'not a hash reference'. >>> >>> There are two solutions, both of which work. If you have method >>> references stored in a hash table in the invocant: >>> >>> $self->{lookup}->{tag1} = \&foo; >>> $self->{lookup}->{tag2} = \&bar; >>> .... >>> >>> you can grab the actual code reference (checking using 'exists') and >>> use it directly on the invocant, but NOT as a code reference. This >>> acts as a symbolic reference, which is allowed for subroutine and >>> method calls (I think it's supposed to be DWIM-my): >>> >>> if (exists $self->{lookup}->{$tag}) { >>> my $method = $self->{lookup}->{$tag}; >>> $self->$method(@args); >>> } else {...} >>> >>> The above also works if you use strings in the lookup table which >>> contain the name of the methods (again, symbolic reference): >>> >>> $self->{lookup}->{tag1} = 'foo'; >>> $self->{lookup}->{tag2} = 'bar'; >>> >>> Alternately, you can pass the invocant in explicitly (which looks >>> weird to me, hence my above solution): >>> >>> if (exists $self->{lookup}->{$tag}) { >>> $self->{lookup}->{$tag}->($self, @args); >>> } else {...} >>> >>> perl6 fixes a lot of these issues, but of course it won't be out for a >>> while longer. >>> >>> I'd like to figure out how to do it, >>>> But before that, is hashing really better than lots of if-elses? >>>> >>> >>> Using a stack of if-elsifs isn't as efficient as a lookup since you >>> would test each case in succession (so something that is further down >>> the if-elseif test stack would have passed through and failed each >>> previous test case before success). 
A lookup table would test simply >>> based on the existence of a value stored under a key (tag). >>> >>> An alternative is to use 5.10 features (smart matching and given-when, >>> which is like a switch statement), but that will limit usage for those >>> still using 5.8.8, which is probably a majority of users, since 5.10 >>> came out just last December. >>> >>> chris >>> >>> >>>> >>>> Mira >>>> >>>> >>>> >>>> On 6/2/08 10:29 AM, "Han, Mira" wrote: >>>> >>>> >>>> >>>> Last week (May 26-30): >>>> 1. made skeleton files for TreeIO:: PhyloEventBuilder, >>>> TreeIO::phyloXML, Tree::NodePhyloXML >>>> 2. managed to connect and load them up but there is a bus error >>>> problem. >>>> I think it's probably due to some of the function calls that I'm >>>> making >>>> That I haven't looked into properly. I'm suspecting it will go away >>>> once I properly >>>> build in the end_element for >>>> >>>> This week (Jun 2-6): >>>> 1. implement start_element, and end_element for and >>>> >>>> - start_element: : add treelevel, : push data >>>> to current_items. >>>> - end_element: : minus treelevel, : pop data >>>> from current_elements, use new() to build node from popped data. >>>> 2. get rid of that bus error >>>> 3. TreeIO::phyloXML::Next_tree() : look for element >>>> _______________________________________________ >>>> Wg-phyloinformatics mailing list >>>> Wg-phyloinformatics at nescent.org >>>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >>>> >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Marie-Claude Hofmann >>> College of Veterinary Medicine >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> >>> _______________________________________________ >>> Wg-phyloinformatics mailing list >>> Wg-phyloinformatics at nescent.org >>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics >>> >>> >> >> > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From yezhiqiang at gmail.com Tue Jun 10 07:43:50 2008 From: yezhiqiang at gmail.com (Zhi-Qiang Ye) Date: Tue, 10 Jun 2008 19:43:50 +0800 Subject: [Bioperl-l] EMBL format field In-Reply-To: References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> Message-ID: <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> That's weird. I also met this problem. I tried a embl-format file like this: ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. XX AC CB271253; XX DT 24-FEB-2003 (Rel. 74, Created) DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) XX DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. from: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw the $seq object's ->id, ->display_id are "unkown id" ... ZQ Ye 2008/6/9 Hilmar Lapp : > If this is the case with the latest version of BioPerl it should be filed as > a bug report for the embl parser. The ID ought to be reported in > $seq->get_secondary_accessions() (which returns an array). If it doesn't, it > sounds like a bug to me. > > -hilmar > > On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >> >> Hi Wen, >> A dump of that sequence object (Data::Dumper is your friend !) reveals >> that the PA EMBL field is not saved into the object. However, you will >> find the string 'AB000170.1' in the embedded CDS feature, more precisely >> the seqid of the location object. I don't know whether that is always >> the case, but it is in your particular example. 
>> So, to get your hands on that value you have to do: >> >> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >> my $parent_id = $cds->location->seq_id; >> >> HTH, >> Marc >> >> Marc Logghe >> Senior Bioinformatician >> Ablynx nv >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>> Sent: Monday, June 09, 2008 5:28 AM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] EMBL format field >>> >>> Hi all, >>> >>> I have a EMBL file that I want to extract one of the line >>> >>> ###file### >>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>> XX >>> PA AB000170.1 >>> XX >>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>> XX >>> OS Sus scrofa (pig) >>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; >>> Mammalia; >>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>> OX NCBI_TaxID=9823; >>> ......... >>> >>> I want the accession number in the line that starts with PA, AB000170 >>> in this example. >>> >>> Can anybody kindly help, tell me which module and method I should use? >>> I tried various things like $seq_obj -> primary_id, display_id, >>> get_secondary_id, etc.. they did not work... >>> >>> Thanks a lot! 
>>> >>> Wen >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jeremydt at gmail.com Tue Jun 10 14:50:19 2008 From: jeremydt at gmail.com (Jeremy Davis-Turak) Date: Tue, 10 Jun 2008 11:50:19 -0700 Subject: [Bioperl-l] Error installing bioperl Message-ID: <378b225b0806101150i5fe6ff2as831d6ee8b5254b4c@mail.gmail.com> Hi, I'm getting the following error, either using CPAN or make with bioperl-1.4 (also with bioperl-1.2) Writing Makefile for Bio make: *** No rule to make target `/usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/config.h', needed by `Makefile'. Stop. Can you please help? Thanks, Jeremy From Kevin.M.Brown at asu.edu Tue Jun 10 15:10:40 2008 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 10 Jun 2008 12:10:40 -0700 Subject: [Bioperl-l] Error installing bioperl In-Reply-To: <378b225b0806101150i5fe6ff2as831d6ee8b5254b4c@mail.gmail.com> References: <378b225b0806101150i5fe6ff2as831d6ee8b5254b4c@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B404F2AFCD@EX02.asurite.ad.asu.edu> Bioperl 1.4 is a very old version. 
Try following the install directions at
http://www.bioperl.org/wiki/Installing_BioPerl

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Jeremy Davis-Turak
> Sent: Tuesday, June 10, 2008 11:50 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Error installing bioperl
>
> Hi, I'm getting the following error, either using CPAN or make with
> bioperl-1.4 (also with bioperl-1.2)
>
> Writing Makefile for Bio
> make: *** No rule to make target
> `/usr/lib64/perl5/5.10.0/x86_64-linux-thread-multi/CORE/config.h',
> needed by `Makefile'. Stop.
>
>
> Can you please help?
>
> Thanks,
>
> Jeremy
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From heikki at sanbi.ac.za Tue Jun 10 19:22:10 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Jun 2008 01:22:10 +0200
Subject: [Bioperl-l] A lot of POD fixes in bioperl-live and bioperl run
Message-ID: <200806110122.10982.heikki@sanbi.ac.za>

I have recently done a lot of fixes in the inline Plain Old
Documentation (POD) texts in bioperl-live and bioperl-run. The last ones
(hopefully) were committed a few minutes ago. This has resulted in quite
large updates from SVN. I wanted to apologize for the inconvenience and
to explain the reasons for these small and pedantic fixes.

In contrast to perl, POD is sensitive to white space. This makes it
relatively difficult to find and fix all minor errors in POD. I've now
gone through the trouble of fixing all POD mistakes causing even the
smallest warning in the podchecker.

The main reason for doing this was to reduce the number of warnings
reported by the pod.pl bioperl maintenance tool. Too many minor warnings
make it difficult to recognise more serious errors affecting the
integrity and readability of POD documentation.
One example case is when a paragraph that was supposed to be verbatim
is in fact touching the previous paragraph, so the POD engine formats it
and destroys the intended ASCII graph or table. The only way the POD
engine is able to report this is to warn about unescaped special
characters.

-Heikki

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From jason at bioperl.org Tue Jun 10 19:55:56 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 10 Jun 2008 16:55:56 -0700
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com>
Message-ID: <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org>

What version of bioperl?

It works for me; using this code I get 'CB271253' printed out.

  #!/usr/bin/perl -w
  use strict;
  use Bio::SeqIO;

  # read EMBL records from the file named on the command line
  my $in = Bio::SeqIO->new(-format => 'embl',
                           -file   => shift);
  while( my $seq = $in->next_seq ) {
      print $seq->id,"\n";
  }

On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote:

> That's weird. I also met this problem. I tried a embl-format file
> like this:
>
> ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP.
> XX
> AC CB271253;
> XX
> DT 24-FEB-2003 (Rel. 74, Created)
> DT 24-FEB-2003 (Rel. 74, Last updated, Version 1)
> XX
> DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to
> DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence.
>
> from: http://www.ebi.ac.uk/cgi-bin/dbfetch?
> db=embl&id=CB271253&style=raw > > the $seq object's ->id, ->display_id are "unkown id" ... > > > > ZQ Ye > > 2008/6/9 Hilmar Lapp : >> If this is the case with the latest version of BioPerl it should >> be filed as >> a bug report for the embl parser. The ID ought to be reported in >> $seq->get_secondary_accessions() (which returns an array). If it >> doesn't, it >> sounds like a bug to me. >> >> -hilmar >> >> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>> >>> Hi Wen, >>> A dump of that sequence object (Data::Dumper is your friend !) >>> reveals >>> that the PA EMBL field is not saved into the object. However, you >>> will >>> find the string 'AB000170.1' in the embedded CDS feature, more >>> precisely >>> the seqid of the location object. I don't know whether that is >>> always >>> the case, but it is in your particular example. >>> So, to get your hands on that value you have to do: >>> >>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>> my $parent_id = $cds->location->seq_id; >>> >>> HTH, >>> Marc >>> >>> Marc Logghe >>> Senior Bioinformatician >>> Ablynx nv >>>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>> Sent: Monday, June 09, 2008 5:28 AM >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] EMBL format field >>>> >>>> Hi all, >>>> >>>> I have a EMBL file that I want to extract one of the line >>>> >>>> ###file### >>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>> XX >>>> PA AB000170.1 >>>> XX >>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>> XX >>>> OS Sus scrofa (pig) >>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>> Euteleostomi; >>>> Mammalia; >>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>>> OX NCBI_TaxID=9823; >>>> ......... >>>> >>>> I want the accession number in the line that starts with PA, >>>> AB000170 >>>> in this example. 
>>>> >>>> Can anybody kindly help, tell me which module and method I >>>> should use? >>>> I tried various things like $seq_obj -> primary_id, display_id, >>>> get_secondary_id, etc.. they did not work... >>>> >>>> Thanks a lot! >>>> >>>> Wen >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Tue Jun 10 19:57:42 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Jun 2008 16:57:42 -0700 Subject: [Bioperl-l] EMBL format field In-Reply-To: <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> Message-ID: <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> PA is a field that we don't currently parse, something that should be filed as a bug on bugzilla. Would you be able to do this? -jason On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: > Hilmar, > > I tried that, it did not work. Marc's way can work. 
> > Thanks, > Wen > > On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: > >> If this is the case with the latest version of BioPerl it should >> be filed as a bug report for the embl parser. The ID ought to be >> reported in $seq->get_secondary_accessions() (which returns an >> array). If it doesn't, it sounds like a bug to me. >> >> -hilmar >> >> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>> Hi Wen, >>> A dump of that sequence object (Data::Dumper is your friend !) >>> reveals >>> that the PA EMBL field is not saved into the object. However, you >>> will >>> find the string 'AB000170.1' in the embedded CDS feature, more >>> precisely >>> the seqid of the location object. I don't know whether that is >>> always >>> the case, but it is in your particular example. >>> So, to get your hands on that value you have to do: >>> >>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>> my $parent_id = $cds->location->seq_id; >>> >>> HTH, >>> Marc >>> >>> Marc Logghe >>> Senior Bioinformatician >>> Ablynx nv >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>> Sent: Monday, June 09, 2008 5:28 AM >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] EMBL format field >>>> >>>> Hi all, >>>> >>>> I have a EMBL file that I want to extract one of the line >>>> >>>> ###file### >>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>> XX >>>> PA AB000170.1 >>>> XX >>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>> XX >>>> OS Sus scrofa (pig) >>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>> Euteleostomi; >>>> Mammalia; >>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>>> OX NCBI_TaxID=9823; >>>> ......... >>>> >>>> I want the accession number in the line that starts with PA, >>>> AB000170 >>>> in this example. 
>>>> >>>> Can anybody kindly help, tell me which module and method I >>>> should use? >>>> I tried various things like $seq_obj -> primary_id, display_id, >>>> get_secondary_id, etc.. they did not work... >>>> >>>> Thanks a lot! >>>> >>>> Wen >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Jun 10 20:19:55 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jun 2008 19:19:55 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> Message-ID: <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> PA is an odd field; it isn't described in the EMBL user manual: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html but appears in mRNA files, so I'm guessing it stands for the (p)rotein (a)ccession. I don't think this should be stored as primary/secondary accession, but maybe as a DBLink annotation? chris On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: > PA is a field that we don't currently parse, something that should > be filed as a bug on bugzilla. > Would you be able to do this? 
> > -jason > On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: > >> Hilmar, >> >> I tried that, it did not work. Marc's way can work. >> >> Thanks, >> Wen >> >> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >> >>> If this is the case with the latest version of BioPerl it should >>> be filed as a bug report for the embl parser. The ID ought to be >>> reported in $seq->get_secondary_accessions() (which returns an >>> array). If it doesn't, it sounds like a bug to me. >>> >>> -hilmar >>> >>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>> Hi Wen, >>>> A dump of that sequence object (Data::Dumper is your friend !) >>>> reveals >>>> that the PA EMBL field is not saved into the object. However, you >>>> will >>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>> precisely >>>> the seqid of the location object. I don't know whether that is >>>> always >>>> the case, but it is in your particular example. >>>> So, to get your hands on that value you have to do: >>>> >>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>>> my $parent_id = $cds->location->seq_id; >>>> >>>> HTH, >>>> Marc >>>> >>>> Marc Logghe >>>> Senior Bioinformatician >>>> Ablynx nv >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>> To: bioperl-l at lists.open-bio.org >>>>> Subject: [Bioperl-l] EMBL format field >>>>> >>>>> Hi all, >>>>> >>>>> I have a EMBL file that I want to extract one of the line >>>>> >>>>> ###file### >>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>> XX >>>>> PA AB000170.1 >>>>> XX >>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>> XX >>>>> OS Sus scrofa (pig) >>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>> Euteleostomi; >>>>> Mammalia; >>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; >>>>> Sus. 
>>>>> OX NCBI_TaxID=9823; >>>>> ......... >>>>> >>>>> I want the accession number in the line that starts with PA, >>>>> AB000170 >>>>> in this example. >>>>> >>>>> Can anybody kindly help, tell me which module and method I >>>>> should use? >>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>> get_secondary_id, etc.. they did not work... >>>>> >>>>> Thanks a lot! >>>>> >>>>> Wen >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jason at bioperl.org Tue Jun 10 20:36:20 2008 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Jun 2008 17:36:20 -0700 Subject: [Bioperl-l] EMBL format field In-Reply-To: <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> Message-ID: <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> I agree if it isn't the accession # it shouldn't be stored there. I guess it is a DBlink, but it is going to be hacky to round-trip this as you'll have to have a special case for records that are mRNAs... -jason On Jun 10, 2008, at 5:19 PM, Chris Fields wrote: > PA is an odd field; it isn't described in the EMBL user manual: > > http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html > > but appears in mRNA files, so I'm guessing it stands for the (p) > rotein (a)ccession. I don't think this should be stored as primary/ > secondary accession, but maybe as a DBLink annootation? > > chris > > On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: > >> PA is a field that we don't currently parse, something that should >> be filed as a bug on bugzilla. >> Would you be able to do this? >> >> -jason >> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: >> >>> Hilmar, >>> >>> I tried that, it did not work. Marc's way can work. >>> >>> Thanks, >>> Wen >>> >>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >>> >>>> If this is the case with the latest version of BioPerl it should >>>> be filed as a bug report for the embl parser. The ID ought to be >>>> reported in $seq->get_secondary_accessions() (which returns an >>>> array). If it doesn't, it sounds like a bug to me. 
>>>> >>>> -hilmar >>>> >>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>> Hi Wen, >>>>> A dump of that sequence object (Data::Dumper is your friend !) >>>>> reveals >>>>> that the PA EMBL field is not saved into the object. However, >>>>> you will >>>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>>> precisely >>>>> the seqid of the location object. I don't know whether that is >>>>> always >>>>> the case, but it is in your particular example. >>>>> So, to get your hands on that value you have to do: >>>>> >>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>>>> my $parent_id = $cds->location->seq_id; >>>>> >>>>> HTH, >>>>> Marc >>>>> >>>>> Marc Logghe >>>>> Senior Bioinformatician >>>>> Ablynx nv >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>> To: bioperl-l at lists.open-bio.org >>>>>> Subject: [Bioperl-l] EMBL format field >>>>>> >>>>>> Hi all, >>>>>> >>>>>> I have a EMBL file that I want to extract one of the line >>>>>> >>>>>> ###file### >>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>> XX >>>>>> PA AB000170.1 >>>>>> XX >>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>> XX >>>>>> OS Sus scrofa (pig) >>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>>> Euteleostomi; >>>>>> Mammalia; >>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; >>>>>> Sus. >>>>>> OX NCBI_TaxID=9823; >>>>>> ......... >>>>>> >>>>>> I want the accession number in the line that starts with PA, >>>>>> AB000170 >>>>>> in this example. >>>>>> >>>>>> Can anybody kindly help, tell me which module and method I >>>>>> should use? >>>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>>> get_secondary_id, etc.. they did not work... >>>>>> >>>>>> Thanks a lot! 
>>>>>> >>>>>> Wen >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > From whuang.ustc at gmail.com Tue Jun 10 20:51:51 2008 From: whuang.ustc at gmail.com (Wen Huang) Date: Tue, 10 Jun 2008 19:51:51 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> Message-ID: Hi Everybody, Thank you for your thoughtful discussion and help. I have found another way to get around it (by grep and awk), but not so perl-ish. 
I don't think I know how to submit a bug report to bugzilla, but I do think that it is not a good idea to include the parent id in a PA line, or even in the file... The file I got is from EMBL-CDS databank, I wanted to get the mRNA from which they are derived. I guess it is better to include it as a DBlink as Jason pointed out. Thanks, Wen On Jun 10, 2008, at 7:36 PM, Jason Stajich wrote: > I agree if it isn't the accession # it shouldn't be stored there. I > guess it is a DBlink, but it is going to be hacky to round-trip this > as you'll have to have a special case for records that are mRNAs... > > -jason > On Jun 10, 2008, at 5:19 PM, Chris Fields wrote: > >> PA is an odd field; it isn't described in the EMBL user manual: >> >> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >> >> but appears in mRNA files, so I'm guessing it stands for the >> (p)rotein (a)ccession. I don't think this should be stored as >> primary/secondary accession, but maybe as a DBLink annootation? >> >> chris >> >> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: >> >>> PA is a field that we don't currently parse, something that should >>> be filed as a bug on bugzilla. >>> Would you be able to do this? >>> >>> -jason >>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: >>> >>>> Hilmar, >>>> >>>> I tried that, it did not work. Marc's way can work. >>>> >>>> Thanks, >>>> Wen >>>> >>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >>>> >>>>> If this is the case with the latest version of BioPerl it should >>>>> be filed as a bug report for the embl parser. The ID ought to be >>>>> reported in $seq->get_secondary_accessions() (which returns an >>>>> array). If it doesn't, it sounds like a bug to me. >>>>> >>>>> -hilmar >>>>> >>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>> Hi Wen, >>>>>> A dump of that sequence object (Data::Dumper is your friend !) >>>>>> reveals >>>>>> that the PA EMBL field is not saved into the object. 
However, >>>>>> you will >>>>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>>>> precisely >>>>>> the seqid of the location object. I don't know whether that is >>>>>> always >>>>>> the case, but it is in your particular example. >>>>>> So, to get your hands on that value you have to do: >>>>>> >>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq- >>>>>> >get_SeqFeatures; >>>>>> my $parent_id = $cds->location->seq_id; >>>>>> >>>>>> HTH, >>>>>> Marc >>>>>> >>>>>> Marc Logghe >>>>>> Senior Bioinformatician >>>>>> Ablynx nv >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I have a EMBL file that I want to extract one of the line >>>>>>> >>>>>>> ###file### >>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>>> XX >>>>>>> PA AB000170.1 >>>>>>> XX >>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>> XX >>>>>>> OS Sus scrofa (pig) >>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>>>> Euteleostomi; >>>>>>> Mammalia; >>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; >>>>>>> Sus. >>>>>>> OX NCBI_TaxID=9823; >>>>>>> ......... >>>>>>> >>>>>>> I want the accession number in the line that starts with PA, >>>>>>> AB000170 >>>>>>> in this example. >>>>>>> >>>>>>> Can anybody kindly help, tell me which module and method I >>>>>>> should use? >>>>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>>>> get_secondary_id, etc.. they did not work... >>>>>>> >>>>>>> Thanks a lot! 
>>>>>>> >>>>>>> Wen >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> -- >>>>> =========================================================== >>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>> =========================================================== >>>>> >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> > From hlapp at gmx.net Tue Jun 10 21:35:50 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Jun 2008 21:35:50 -0400 Subject: [Bioperl-l] EMBL format field In-Reply-To: <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> Message-ID: <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote: > I agree if it isn't the accession # it shouldn't be stored there. 
> I guess it is a DBlink, but it is going to be hacky to round-trip > this as you'll have to have a special case for records that are > mRNAs... I think I agree with that - didn't realize it is the accession of the (translated) protein. It would be ideal to convert this into a DBLink annotation indeed, but that's an opinion and an interpretation of the file (even if a very useful one). As such I believe it should be the matter of a SeqProcessor. Hmm - except that at that point the information has been lost already so there's actually nothing that the SeqProcessor could massage. So what if the line would simply be a B::Annotation::SimpleValue with 'PA' as key and the accession# as value? That wouldn't be an interpretation, and yet would make the value available to a SeqProcessor for converting into a DBLink. -hilmar > > -jason > On Jun 10, 2008, at 5:19 PM, Chris Fields wrote: > >> PA is an odd field; it isn't described in the EMBL user manual: >> >> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >> >> but appears in mRNA files, so I'm guessing it stands for the (p) >> rotein (a)ccession. I don't think this should be stored as >> primary/secondary accession, but maybe as a DBLink annootation? >> >> chris >> >> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: >> >>> PA is a field that we don't currently parse, something that >>> should be filed as a bug on bugzilla. >>> Would you be able to do this? >>> >>> -jason >>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: >>> >>>> Hilmar, >>>> >>>> I tried that, it did not work. Marc's way can work. >>>> >>>> Thanks, >>>> Wen >>>> >>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >>>> >>>>> If this is the case with the latest version of BioPerl it >>>>> should be filed as a bug report for the embl parser. The ID >>>>> ought to be reported in $seq->get_secondary_accessions() (which >>>>> returns an array). If it doesn't, it sounds like a bug to me. 
>>>>> >>>>> -hilmar >>>>> >>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>> Hi Wen, >>>>>> A dump of that sequence object (Data::Dumper is your friend !) >>>>>> reveals >>>>>> that the PA EMBL field is not saved into the object. However, >>>>>> you will >>>>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>>>> precisely >>>>>> the seqid of the location object. I don't know whether that is >>>>>> always >>>>>> the case, but it is in your particular example. >>>>>> So, to get your hands on that value you have to do: >>>>>> >>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq- >>>>>> >get_SeqFeatures; >>>>>> my $parent_id = $cds->location->seq_id; >>>>>> >>>>>> HTH, >>>>>> Marc >>>>>> >>>>>> Marc Logghe >>>>>> Senior Bioinformatician >>>>>> Ablynx nv >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I have a EMBL file that I want to extract one of the line >>>>>>> >>>>>>> ###file### >>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>>> XX >>>>>>> PA AB000170.1 >>>>>>> XX >>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>> XX >>>>>>> OS Sus scrofa (pig) >>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>>>> Euteleostomi; >>>>>>> Mammalia; >>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >>>>>>> Suidae; Sus. >>>>>>> OX NCBI_TaxID=9823; >>>>>>> ......... >>>>>>> >>>>>>> I want the accession number in the line that starts with PA, >>>>>>> AB000170 >>>>>>> in this example. >>>>>>> >>>>>>> Can anybody kindly help, tell me which module and method I >>>>>>> should use? >>>>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>>>> get_secondary_id, etc.. they did not work... 
>>>>>>> >>>>>>> Thanks a lot! >>>>>>> >>>>>>> Wen >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> -- >>>>> =========================================================== >>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>> =========================================================== >>>>> >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Marie-Claude Hofmann >> College of Veterinary Medicine >> University of Illinois Urbana-Champaign >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bill at genenformics.com Tue Jun 10 21:43:55 2008 From: bill at genenformics.com (bill at genenformics.com) Date: Tue, 10 Jun 2008 18:43:55 -0700 (PDT) Subject: [Bioperl-l] EMBL format field In-Reply-To: <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> Message-ID: <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com> This can be accomplished using IdConvert if protein accession/gi is known: $> ./IdConvert.exe BAA19060 #Input Nuc_GI Nuc_Acc Pro_GI Pro_Acc Desc BAA19060 1783121 AB000170.1 1783123 BAA19061.1 endopeptidase 24.16 type M3 [Sus scrofa] Download IdConvert from http://www.genenformics.com/download.html for free. Bill at genenformics.com > > On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote: >> I agree if it isn't the accession # it shouldn't be stored there. >> I guess it is a DBlink, but it is going to be hacky to round-trip >> this as you'll have to have a special case for records that are >> mRNAs... > > I think I agree with that - didn't realize it is the accession of the > (translated) protein. 
It would be ideal to convert this into a DBLink > annotation indeed, but that's an opinion and an interpretation of the > file (even if a very useful one). As such I believe it should be the > matter of a SeqProcessor. > > Hmm - except that at that point the information has been lost already > so there's actually nothing that the SeqProcessor could massage. > > So what if the line would simply be a B::Annotation::SimpleValue with > 'PA' as key and the accession# as value? That wouldn't be an > interpretation, and yet would make the value available to a > SeqProcessor for converting into a DBLink. > > -hilmar > >> >> -jason >> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote: >> >>> PA is an odd field; it isn't described in the EMBL user manual: >>> >>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html >>> >>> but appears in mRNA files, so I'm guessing it stands for the (p) >>> rotein (a)ccession. I don't think this should be stored as >>> primary/secondary accession, but maybe as a DBLink annootation? >>> >>> chris >>> >>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote: >>> >>>> PA is a field that we don't currently parse, something that >>>> should be filed as a bug on bugzilla. >>>> Would you be able to do this? >>>> >>>> -jason >>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote: >>>> >>>>> Hilmar, >>>>> >>>>> I tried that, it did not work. Marc's way can work. >>>>> >>>>> Thanks, >>>>> Wen >>>>> >>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote: >>>>> >>>>>> If this is the case with the latest version of BioPerl it >>>>>> should be filed as a bug report for the embl parser. The ID >>>>>> ought to be reported in $seq->get_secondary_accessions() (which >>>>>> returns an array). If it doesn't, it sounds like a bug to me. >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>>> Hi Wen, >>>>>>> A dump of that sequence object (Data::Dumper is your friend !) 
>>>>>>> reveals >>>>>>> that the PA EMBL field is not saved into the object. However, >>>>>>> you will >>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more >>>>>>> precisely >>>>>>> the seqid of the location object. I don't know whether that is >>>>>>> always >>>>>>> the case, but it is in your particular example. >>>>>>> So, to get your hands on that value you have to do: >>>>>>> >>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq- >>>>>>> >get_SeqFeatures; >>>>>>> my $parent_id = $cds->location->seq_id; >>>>>>> >>>>>>> HTH, >>>>>>> Marc >>>>>>> >>>>>>> Marc Logghe >>>>>>> Senior Bioinformatician >>>>>>> Ablynx nv >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I have a EMBL file that I want to extract one of the line >>>>>>>> >>>>>>>> ###file### >>>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>>>> XX >>>>>>>> PA AB000170.1 >>>>>>>> XX >>>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>>> XX >>>>>>>> OS Sus scrofa (pig) >>>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>>>>>>> Euteleostomi; >>>>>>>> Mammalia; >>>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >>>>>>>> Suidae; Sus. >>>>>>>> OX NCBI_TaxID=9823; >>>>>>>> ......... >>>>>>>> >>>>>>>> I want the accession number in the line that starts with PA, >>>>>>>> AB000170 >>>>>>>> in this example. >>>>>>>> >>>>>>>> Can anybody kindly help, tell me which module and method I >>>>>>>> should use? >>>>>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>>>>> get_secondary_id, etc.. they did not work... >>>>>>>> >>>>>>>> Thanks a lot! 
>>>>>>>> >>>>>>>> Wen >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> -- >>>>>> =========================================================== >>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>>> =========================================================== >>>>>> >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Marie-Claude Hofmann
>>> College of Veterinary Medicine
>>> University of Illinois Urbana-Champaign
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

From hlapp at gmx.net Tue Jun 10 22:09:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 10 Jun 2008 22:09:13 -0400
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com>
Message-ID:

Bill,

this mailing list is about BioPerl. There are many programs and web-sites out there that convert between IDs, that wasn't the question.

We welcome your participation in helping to solve Bioperl-related problems, and sometimes the easiest solution is to use other, cross-platform open-source tools.

For peddling commercial products, no matter how useful they are and how little the cost, please use other forums.

-hilmar

On Jun 10, 2008, at 9:43 PM, bill at genenformics.com wrote:
> This can be accomplished using IdConvert if protein accession/gi is
> known:
>
> $> ./IdConvert.exe BAA19060
> #Input Nuc_GI Nuc_Acc Pro_GI Pro_Acc Desc
> BAA19060 1783121 AB000170.1 1783123 BAA19061.1
> endopeptidase 24.16 type M3 [Sus scrofa]
>
> Download IdConvert from http://www.genenformics.com/download.html
> for free.
>
> Bill at genenformics.com

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bill at genenformics.com Tue Jun 10 22:33:45 2008
From: bill at genenformics.com (bill at genenformics.com)
Date: Tue, 10 Jun 2008 19:33:45 -0700 (PDT)
Subject: [Bioperl-l] EMBL format field
In-Reply-To:
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com>
Message-ID: <61609.98.218.171.90.1213151625.squirrel@webmail.dreamhost.com>

Hi, Hilmar,

Thank you for your advice.

I am a BioPerl user and I step in only when there is no efficient/effective BioPerl method to solve specific problems.

Please forgive us for providing free solutions.

Bill at genenformics.com

> Bill,
>
> this mailing list is about BioPerl. There are many programs and
> web-sites out there that convert between IDs, that wasn't the question.
>
> We welcome your participation in helping to solve Bioperl-related
> problems, and sometimes the easiest solution is to use other,
> cross-platform open-source tools.
>
> For peddling commercial products, no matter how useful they are and
> how little the cost, please use other forums.
>
> -hilmar

From cjfields at uiuc.edu Tue Jun 10 22:59:43 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jun 2008 21:59:43 -0500
Subject: [Bioperl-l] EMBL format field
In-Reply-To: <61609.98.218.171.90.1213151625.squirrel@webmail.dreamhost.com>
References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <45F42C94-02FD-429B-816F-883C7855A97D@gmail.com> <468D955B-533C-439A-96EA-B7CC6392BFE3@bioperl.org> <2542BB6E-AC43-4D5A-8788-2330E59A6E6F@uiuc.edu> <9A1F4E7F-A231-410C-8890-14ECDC7D0258@bioperl.org> <283146E5-0041-42CF-B881-90624D5DE393@gmx.net> <61099.98.218.171.90.1213148635.squirrel@webmail.dreamhost.com> <61609.98.218.171.90.1213151625.squirrel@webmail.dreamhost.com>
Message-ID: <3804E643-9BBC-4B28-B1DE-D75AA2C9FE74@uiuc.edu>

Bill,

It's okay to offer suggestions to problems, particularly if no one answers, but I have to agree with Hilmar in this case.
The specific problem: your 'solution' is tied to commercial software (albeit free), which appears to be closed-source and with questionable licensing. I couldn't find documentation on your website addressing either issue. Therefore, I couldn't recommend using it unless the latter two issues were clarified, preferably by becoming open-source.

chris

On Jun 10, 2008, at 9:33 PM, bill at genenformics.com wrote:
> Hi, Hilmar,
>
> Thank you for your advice.
>
> I am a BioPerl user and I step in only when there is no
> efficient/effective BioPerl method to solve specific problems.
>
> Please forgive us for providing free solutions.
>
> Bill at genenformics.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Jun 10 23:00:04 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jun 2008 22:00:04 -0500
Subject: [Bioperl-l] A lot of POD fixes in bioperl-live and bioperl run
In-Reply-To: <200806110122.10982.heikki@sanbi.ac.za>
References: <200806110122.10982.heikki@sanbi.ac.za>
Message-ID: <2BCA7C8B-CDAC-49A8-9809-F9127DB05BEC@uiuc.edu>

Thanks for the work on this Heikki!

chris

On Jun 10, 2008, at 6:22 PM, Heikki Lehvaslaiho wrote:
> I have recently done a lot of fixes in the inline Plain Old Documentation
> (POD) texts in bioperl-live and bioperl-run.
> The last ones (hopefully) were committed a few minutes ago. This has
> resulted in quite large updates from SVN.
>
> I wanted to apologize for the inconvenience and to explain the reasons
> for these small and pedantic fixes.
>
> In contrast to perl, POD is sensitive to white space. This makes it
> relatively difficult to find and fix all minor errors in POD. I've now
> gone through the trouble of fixing all POD mistakes causing even the
> smallest warning in the podchecker.
>
> The main reason for doing this was to reduce the number of warnings
> reported by the pod.pl bioperl maintenance tool. Too many minor warnings
> make it difficult to recognise more serious errors affecting the
> integrity and readability of POD documentation.
>
> One example case is when a paragraph that was supposed to be 'in
> verbatim' is in fact touching the previous paragraph, so the pod
> engine formats it and destroys the intended ascii graph or table.
> The only way the POD engine is able to report this is to warn about
> unescaped special characters.
>
> -Heikki
> --
> Heikki Lehvaslaiho heikki at_sanbi _ac _za
> Senior Scientist skype: heikki_lehvaslaiho
> SANBI, South African National Bioinformatics Institute
> University of Western Cape, South Africa
> Phone: +27 21 959 2096 FAX: +27 21 959 2512

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign

From hiekeen at gmail.com Wed Jun 11 04:49:30 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Wed, 11 Jun 2008 16:49:30 +0800
Subject: [Bioperl-l] How to get all TF which target to sox-oct cis-element?
Message-ID:

How can I get all TFs that target the sox-oct cis-element ATGCA(T)A(T)A(G/C)A(T)? Thank you very much.

--
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092 P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com

From dalloliogm at gmail.com Wed Jun 11 05:46:37 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 11 Jun 2008 11:46:37 +0200
Subject: [Bioperl-l] preparing nexus files for Mr. Bayes
Message-ID: <5aa3b3570806110246naa24dcaj462d8025048aad41@mail.gmail.com>

Hi,
I was wondering whether there is any bioperl module I can use to handle nexus files for MrBayes.

MrBayes is a program for bayesian estimation of phylogeny, which uses an alignment file in a customized nexus format as input.
I would need a module to prepare these files, doing tasks like:
- joining more than one alignment in a single file/line
- substituting matching chars with '.'
- customizing parameters displayed in the headers of the output nexus file
- managing mrbayes' extensions, like adding information about partitions and taxa
- adding batch instructions for the mrbayes interpreter
- and similar stuff.

I have tried Bio::AlignIO but I see it doesn't handle all of this; what I am looking for is a bit more specific.
Moreover, I found a bug in the way mrbayes recognizes the output from Bio::AlignIO (https://sourceforge.net/tracker/index.php?func=detail&aid=1990655&group_id=129302&atid=714418).

Is there any existing module already available?
If there is not, I am going to have to write such scripts anyway.
I would like to contribute them to bioperl, even if I am not a very good perl programmer, I prefer python. -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From jason at bioperl.org Wed Jun 11 11:13:20 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 11 Jun 2008 08:13:20 -0700 Subject: [Bioperl-l] preparing nexus files for Mr. Bayes In-Reply-To: <5aa3b3570806110246naa24dcaj462d8025048aad41@mail.gmail.com> References: <5aa3b3570806110246naa24dcaj462d8025048aad41@mail.gmail.com> Message-ID: <32D2B523-8C68-44B3-80B4-E8BB82FB4232@bioperl.org> Giovanni - Bits and pieces already exist, there is little to handle the text blocks that exist in NEXUS files because BioPerl has has been focused on the alignment data not the analysis. Arlin Stolzfus has started a Bio::NEXUS project that is supposed to fully parse all NEXUS data in and out but I don't know what the status is right now. If you look at the Bio::AlignIO::nexus documentation you need to turn off symbols and endblock when writing for MrBayes NEXUS. my $out = Bio::AlignIO->new(-format => 'nexus', -show_symbols => 0, -show_endblock => 0); # after you have written out the alignment $out->write_aln; # you can then print out whatever execution blocks you want with standard print statements. print $out->_fh "begin mrbayes;\n"; ... As for the other things, the Bio::SimpleAlign module lets you swap the match character (see map_char and gap_char methods). For joining alignments and setting up partitions, you can join multiple alignments by making a new Bio::SimpleAlign and adding concatenated sequences together. 
use Bio::SimpleAlign;
use Bio::LocatableSeq;

# @alns holds the Bio::SimpleAlign objects to join
my %matrix;
for my $aln ( @alns ) {
    for my $seq ( $aln->each_seq ) {
        # append this alignment's residues to the row for this id
        my $id = $seq->id;
        $matrix{$id} .= $seq->seq;
    }
}
# build one alignment from the concatenated rows
my $bigaln = Bio::SimpleAlign->new;
while( my ($id,$seq) = each %matrix ) {
    $bigaln->add_seq(Bio::LocatableSeq->new(-id => $id, -seq => $seq));
}

In general there is not a single solution to a lot of these tasks (although there should be one that concatenates a set of alignments) so there are no ready-to-use functions. I have my own custom code that I use to join datasets and establish partitions, but much of it is specific to how I organize the data on my system so I don't know how informative it would be.

I'm not sure python code would be much good here... =) If you jump ship you should talk to Frank Kauff, who has written some python code to manipulate alignments for biopython.

-jason

On Jun 11, 2008, at 2:46 AM, Giovanni Marco Dall'Olio wrote:

> Hi,
> I was wondering whether there is any bioperl module I can use to
> handle nexus files for Mr.Bayes.
>
> mr. Bayes is a program for bayesian estimation of phylogeny, which
> uses an alignment file in a customized nexus format as input.
> I would need a module to prepare these files, doing tasks like:
> - joining more than an alignment in a single file/line
> - substitute matching chars with '.'
> - customizing parameters displayed in the headers of the output
> nexus file
> - manage mrbayes' extensions, like adding information about
> partitions and taxas
> - adding batch instructions for the mr bayes interpreter
> - and similar stuff.
>
> I have tried Bio::AlignIO but I see it doesn't handle all of this,
> what I am looing for is bit more specific.
> Moreover, I found a bug in the way mrbayes recognizes the output from
> Bio::AlignIO (https://sourceforge.net/tracker/index.php?
> func=detail&aid=1990655&group_id=129302&atid=714418).
>
> Is there any existing module already available?
> If there is not, I am going to have to write such scripts anyway.
> I would like to contribute them to bioperl, even if I am not a very > good perl programmer, I prefer python. > > -- > ----------------------------------------------------------- > > My Blog on Bioinformatics (italian): http://bioinfoblog.it > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From MEC at stowers-institute.org Wed Jun 11 11:18:11 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Wed, 11 Jun 2008 10:18:11 -0500 Subject: [Bioperl-l] Position Announcement - Programmer Analyst / Bioinformatics - Stowers Institute for Medical Research - Kansas City, Missouri Message-ID: Programmer Analyst The Stowers Institute for Medical Research has an opening for a Programmer Analyst to support the Bioinformatics Center. Responsibilities include development and analysis of software requirements; designing, evaluating, coding, documenting, deploying, and supporting custom and third-party software and bioinformatics applications; and assisting in maintenance of biological sequence databases. In addition to excellent communication skills the successful candidate will have experience with at least one scripting language (Perl, Python, Unix shell, etc); web applications development; source code management systems; Unix/Linux and Windows systems administration; relational database applications (MySQL, Postgresql, etc.); pipeline/ workflow systems; cluster computing; a variety of DNA/ protein sequence analysis applications; and either Bioperl, EnsembleAPI, GMOD/GBrowse, or NCBI C++ toolkit. Experience with biological sequence database management, customizing genome browsers display using track formats such as GFF and bed, and Vector NTI is preferred. 
The minimum requirements include an undergraduate degree in computer science, bioinformatics, physical science, mathematics, or a related field; and at least two years experience in development, utilization, or support of a group utilizing bioinformatics applications. Apply at Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri From cjfields at uiuc.edu Wed Jun 11 17:01:10 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Jun 2008 16:01:10 -0500 Subject: [Bioperl-l] treexml In-Reply-To: <925346D6-B9B5-4573-991F-0D3F5436FE63@BERKELEY.EDU> References: <925346D6-B9B5-4573-991F-0D3F5436FE63@BERKELEY.EDU> Message-ID: <57DFF483-3159-4BC1-BB48-EE56B0F7498C@uiuc.edu> On Jun 11, 2008, at 3:50 PM, Jason Stajich wrote: > Hey Mira - > > Looks like things are going well. I just wanted to check and see if > it is totally necessary to create new classes or if you can use the > get/set tag/value pair interface that already exists? > > These are the functions that are present in Bio::Tree::TreeI and > Bio::Tree::NodeI : > add_tag_value > remove_tag > get_all_tags > get_tag_values > has_tag > > These are the same functions we use in SeqFeatureI interface as > well. It is just possible to re-use these rather than making a new > function for every data type - this way we don't have to change the > interface for different richness of data. I agree; it simplifies the interface, even if it sacrifices small conveniences. > BTW (and this may be me who did it, but maybe Sendu remembers) - I > am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does anyone > know? Trees are containers around a set of Nodes which are linked > together to form a tree and the Tree object holds a pointer to the > root node. Not sure myself; maybe this was part of Sendu's Taxonomy overhaul a while back? 
> -jason > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.or -c Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From JASON_STAJICH at BERKELEY.EDU Wed Jun 11 16:50:34 2008 From: JASON_STAJICH at BERKELEY.EDU (Jason Stajich) Date: Wed, 11 Jun 2008 13:50:34 -0700 Subject: [Bioperl-l] treexml Message-ID: <925346D6-B9B5-4573-991F-0D3F5436FE63@BERKELEY.EDU> Hey Mira - Looks like things are going well. I just wanted to check and see if it is totally necessary to create new classes or if you can use the get/set tag/value pair interface that already exists? These are the functions that are present in Bio::Tree::TreeI and Bio::Tree::NodeI : add_tag_value remove_tag get_all_tags get_tag_values has_tag These are the same functions we use in SeqFeatureI interface as well. It is just possible to re-use these rather than making a new function for every data type - this way we don't have to change the interface for different richness of data. BTW (and this may be me who did it, but maybe Sendu remembers) - I am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does anyone know? Trees are containers around a set of Nodes which are linked together to form a tree and the Tree object holds a pointer to the root node. 
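The tag/value functions listed above can be sketched roughly as follows. This is only an illustration, assuming a BioPerl installation in which Bio::Tree::Node provides these methods; the node id, the tag names (confidence, taxonomy_code) and their values are made up for the example and are not taken from any PhyloXML code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Node;

# Illustrative node; the id and the tag names below are assumptions.
my $node = Bio::Tree::Node->new(-id => 'taxonA');

# Attach arbitrary data without adding a new accessor per data type.
$node->add_tag_value('confidence', 0.95);
$node->add_tag_value('taxonomy_code', 'ECOLI');
$node->add_tag_value('taxonomy_code', 'ECOLX');   # one tag, several values

# List every tag together with all of its values.
for my $tag (sort $node->get_all_tags) {
    my @values = $node->get_tag_values($tag);
    print "$tag: @values\n";
}

# has_tag tests for presence; remove_tag drops the tag and its values.
$node->remove_tag('confidence') if $node->has_tag('confidence');
print "confidence removed\n" unless $node->has_tag('confidence');
```

Because every kind of data goes through the same five calls, richer formats only add new tag names rather than new methods, which seems to be the point about not changing the interface.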
-jason -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org From miraceti at gmail.com Wed Jun 11 20:50:22 2008 From: miraceti at gmail.com (miraceti) Date: Wed, 11 Jun 2008 20:50:22 -0400 Subject: [Bioperl-l] treexml In-Reply-To: <57DFF483-3159-4BC1-BB48-EE56B0F7498C@uiuc.edu> References: <925346D6-B9B5-4573-991F-0D3F5436FE63@BERKELEY.EDU> <57DFF483-3159-4BC1-BB48-EE56B0F7498C@uiuc.edu> Message-ID: I can remove the TreePhyloXML class, And use the tag functions for Bio::Tree::TreeI. Yeah, it seems like TreePhyloXML is overkill. I don't know about the node though. I think it would be cleaner to have a NodePhyloXML, Since there are much more than scalar values that will be connected to the node, Such as annotations, sequences, etc. On Wed, Jun 11, 2008 at 5:01 PM, Chris Fields wrote: > > On Jun 11, 2008, at 3:50 PM, Jason Stajich wrote: > > Hey Mira - >> >> Looks like things are going well. I just wanted to check and see if it is >> totally necessary to create new classes or if you can use the get/set >> tag/value pair interface that already exists? >> >> These are the functions that are present in Bio::Tree::TreeI and >> Bio::Tree::NodeI : >> add_tag_value >> remove_tag >> get_all_tags >> get_tag_values >> has_tag >> >> These are the same functions we use in SeqFeatureI interface as well. It >> is just possible to re-use these rather than making a new function for every >> data type - this way we don't have to change the interface for different >> richness of data. >> > > I agree; it simplifies the interface, even if it sacrifices small > conveniences. > > BTW (and this may be me who did it, but maybe Sendu remembers) - I am not >> sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does anyone know? Trees are >> containers around a set of Nodes which are linked together to form a tree >> and the Tree object holds a pointer to the root node. 
>> > > Not sure myself; maybe this was part of Sendu's Taxonomy overhaul a while > back? > > -jason >> -- >> Jason Stajich >> Miller Research Fellow >> University of California, Berkeley >> lab: 510.642.8441 >> http://pmb.berkeley.edu/~taylor/people/js.html >> http://fungalgenomes.or >> > > -c > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Jun 11 20:56:26 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 11 Jun 2008 17:56:26 -0700 Subject: [Bioperl-l] treexml In-Reply-To: References: Message-ID: <9EC82D14-EDFE-4734-AA72-E65911411628@bioperl.org> On Jun 11, 2008, at 5:48 PM, Han, Mira wrote: > > I can remove the TreePhyloXML class, > And use the tag functions for Bio::Tree::TreeI. > Yeah, it seems like TreePhyloXML is overkill. > I don't know about the node though. > I think it would be cleaner to have a NodePhyloXML, > Since there are much more than scalar values that will be connected > to the node, > Such as annotations, sequences, etc. Mira - That sounds quite fine to me. I agree that the NodePhyloXML is probably going to be different enough from the generic Node object to justify making a sub-class. You may want to inherit from the interfaces like Bio::AnnotatableI in the NodePhyloXML if you are attaching annotations, etc. For example I do this for the Bio::Tree::AlleleNode which are nodes in a tree and also Individuals from a Population for popgen analyses. -jason > > > On 6/11/08 4:50 PM, "Jason Stajich" > wrote: > > Hey Mira - > > Looks like things are going well. I just wanted to check and see if > it is totally necessary to create new classes or if you can use the > get/set tag/value pair interface that already exists? 
> > These are the functions that are present in Bio::Tree::TreeI and > Bio::Tree::NodeI : > add_tag_value > remove_tag > get_all_tags > get_tag_values > has_tag > > These are the same functions we use in SeqFeatureI interface as > well. It is just possible to re-use these rather than making a new > function for every data type - this way we don't have to change the > interface for different richness of data. > > BTW (and this may be me who did it, but maybe Sendu remembers) - I > am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does anyone > know? Trees are containers around a set of Nodes which are linked > together to form a tree and the Tree object holds a pointer to the > root node. > > -jason > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org > > > > From yezhiqiang at gmail.com Thu Jun 12 05:06:32 2008 From: yezhiqiang at gmail.com (Zhi-Qiang Ye) Date: Thu, 12 Jun 2008 17:06:32 +0800 Subject: [Bioperl-l] EMBL format field In-Reply-To: <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> Message-ID: <34198fe40806120206i31cee6f5xfe6f57b3b9e2a84b@mail.gmail.com> Hi, Jason I used exactly your code, and the result is still 'unknown id'. Where can I get the version of bioperl? I used ubuntu gutsy, the version in ubuntu's package management system is 1.4-1. I installed BioPerl 1.4 on another computer, IA64 with redhat linux. It has the same problem. In the process of installation using CPAN, make test always failed. So I used 'force install ....'. I am not sure it is the reason. Thanks. Zhi-Qiang Ye 2008/6/11 Jason Stajich : > What version of bioperl? 
It works for me using this code I get 'CB271253' > printed out. > > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-format => 'embl', -file => shift); > while( my $seq = $in->next_seq ) { > print $seq->id,"\n"; > } > > On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote: > >> That's weird. I also met this problem. I tried a embl-format file like >> this: >> >> ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. >> XX >> AC CB271253; >> XX >> DT 24-FEB-2003 (Rel. 74, Created) >> DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) >> XX >> DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to >> DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. >> >> from: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw >> >> the $seq object's ->id, ->display_id are "unkown id" ... >> >> >> >> ZQ Ye >> >> 2008/6/9 Hilmar Lapp : >>> >>> If this is the case with the latest version of BioPerl it should be filed >>> as >>> a bug report for the embl parser. The ID ought to be reported in >>> $seq->get_secondary_accessions() (which returns an array). If it doesn't, >>> it >>> sounds like a bug to me. >>> >>> -hilmar >>> >>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>> >>>> Hi Wen, >>>> A dump of that sequence object (Data::Dumper is your friend !) reveals >>>> that the PA EMBL field is not saved into the object. However, you will >>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely >>>> the seqid of the location object. I don't know whether that is always >>>> the case, but it is in your particular example. 
>>>> So, to get your hands on that value you have to do: >>>> >>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures; >>>> my $parent_id = $cds->location->seq_id; >>>> >>>> HTH, >>>> Marc >>>> >>>> Marc Logghe >>>> Senior Bioinformatician >>>> Ablynx nv >>>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>> To: bioperl-l at lists.open-bio.org >>>>> Subject: [Bioperl-l] EMBL format field >>>>> >>>>> Hi all, >>>>> >>>>> I have a EMBL file that I want to extract one of the line >>>>> >>>>> ###file### >>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>> XX >>>>> PA AB000170.1 >>>>> XX >>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>> XX >>>>> OS Sus scrofa (pig) >>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; >>>>> Mammalia; >>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus. >>>>> OX NCBI_TaxID=9823; >>>>> ......... >>>>> >>>>> I want the accession number in the line that starts with PA, AB000170 >>>>> in this example. >>>>> >>>>> Can anybody kindly help, tell me which module and method I should use? >>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>> get_secondary_id, etc.. they did not work... >>>>> >>>>> Thanks a lot! 
>>>>> >>>>> Wen >>>>> _______________________________________________ From bix at sendu.me.uk Thu Jun 12 05:54:04 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 12 Jun 2008 10:54:04 +0100 Subject: [Bioperl-l] treexml In-Reply-To: <57DFF483-3159-4BC1-BB48-EE56B0F7498C@uiuc.edu> References: <925346D6-B9B5-4573-991F-0D3F5436FE63@BERKELEY.EDU> <57DFF483-3159-4BC1-BB48-EE56B0F7498C@uiuc.edu> Message-ID: <4850F23C.4070809@sendu.me.uk> Chris Fields wrote: > On Jun 11, 2008, at 3:50 PM, Jason Stajich wrote: >> BTW (and this may be me who did it, but maybe Sendu remembers) - I am >> not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does anyone know? >> Trees are containers around a set of Nodes which are linked together >> to form a tree and the Tree object holds a pointer to the root node. > > Not sure myself; maybe this was part of Sendu's Taxonomy overhaul a > while back? I don't think it was me. It's worth noting that if you change TreeI to inherit from RootI instead of NodeI, it doesn't seem to cause any problems in the obviously related test scripts. Maybe it's there as a convenience thing? I don't know. From cjfields at uiuc.edu Thu Jun 12 08:21:31 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Jun 2008 07:21:31 -0500 Subject: [Bioperl-l] treexml In-Reply-To: <4850F23C.4070809@sendu.me.uk> References: <925346D6-B9B5-4573-991F-0D3F5436FE63@BERKELEY.EDU> <57DFF483-3159-4BC1-BB48-EE56B0F7498C@uiuc.edu> <4850F23C.4070809@sendu.me.uk> Message-ID: <3CDC860D-05D8-4D6F-B5FC-D76309C510C3@uiuc.edu> Looks like it has been there for a while: http://code.open-bio.org/svnweb/index.cgi/bioperl/diff/bioperl-live/trunk/Bio/Tree/TreeI.pm?rev1=2900;rev2=2901 Jason was the committer, back in 2001. Maybe (as Sendu indicates) it should be changed to RootI. 
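For anyone who wants to see what their own installation does, a small check along these lines may help. It is a sketch only, assuming Bio::Tree::Tree can be loaded; it reports the inheritance it finds and does not assume any particular answer, since that depends on the BioPerl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Tree;

# Report which of the interfaces discussed in this thread the concrete
# tree class inherits from in the installed BioPerl.
for my $iface (qw(Bio::Tree::TreeI Bio::Tree::NodeI Bio::Root::RootI)) {
    my $answer = Bio::Tree::Tree->isa($iface) ? 'yes' : 'no';
    print "Bio::Tree::Tree ISA $iface: $answer\n";
}
```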
chris On Jun 12, 2008, at 4:54 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Jun 11, 2008, at 3:50 PM, Jason Stajich wrote: >>> BTW (and this may be me who did it, but maybe Sendu remembers) - I >>> am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does >>> anyone know? Trees are containers around a set of Nodes which are >>> linked together to form a tree and the Tree object holds a pointer >>> to the root node. >> Not sure myself; maybe this was part of Sendu's Taxonomy overhaul a >> while back? > > I don't think it was me. It's worth noting that if you change TreeI > to inherit from RootI instead of NodeI, it doesn't seem to cause any > problems in the obviously related test scripts. > > Maybe it's there as a convenience thing? I don't know. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From hlapp at gmx.net Thu Jun 12 09:58:25 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 12 Jun 2008 09:58:25 -0400 Subject: [Bioperl-l] treexml In-Reply-To: <9EC82D14-EDFE-4734-AA72-E65911411628@bioperl.org> References: <9EC82D14-EDFE-4734-AA72-E65911411628@bioperl.org> Message-ID: <922F0238-F390-4A33-99F6-8473CDF22980@gmx.net> I agree with implementing Bio::AnnotatableI. I'm not sure we need to have a node class specifically tailored to PhyloXML but maybe we do. But even if we do, the annotatable capability sounds generic enough for having a Bio::Tree::AnnotatableNode, from which Bio::Tree::NodePhyloXML would inherit and do all the things that are really specific to PhyloXML? -hilmar On Jun 11, 2008, at 8:56 PM, Jason Stajich wrote: > > On Jun 11, 2008, at 5:48 PM, Han, Mira wrote: > >> >> I can remove the TreePhyloXML class, >> And use the tag functions for Bio::Tree::TreeI. 
>> Yeah, it seems like TreePhyloXML is overkill. >> I don't know about the node though. >> I think it would be cleaner to have a NodePhyloXML, >> Since there are much more than scalar values that will be >> connected to the node, >> Such as annotations, sequences, etc. > > Mira - > That sounds quite fine to me. I agree that the NodePhyloXML is > probably going to be different enough from the generic Node object > to justify making a sub-class. You may want to inherit from the > interfaces like Bio::AnnotatableI in the NodePhyloXML if you are > attaching annotations, etc. > > For example I do this for the Bio::Tree::AlleleNode which are nodes > in a tree and also Individuals from a Population for popgen analyses. > > -jason >> >> >> On 6/11/08 4:50 PM, "Jason Stajich" >> wrote: >> >> Hey Mira - >> >> Looks like things are going well. I just wanted to check and see >> if it is totally necessary to create new classes or if you can use >> the get/set tag/value pair interface that already exists? >> >> These are the functions that are present in Bio::Tree::TreeI and >> Bio::Tree::NodeI : >> add_tag_value >> remove_tag >> get_all_tags >> get_tag_values >> has_tag >> >> These are the same functions we use in SeqFeatureI interface as >> well. It is just possible to re-use these rather than making a >> new function for every data type - this way we don't have to >> change the interface for different richness of data. >> >> BTW (and this may be me who did it, but maybe Sendu remembers) - I >> am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does >> anyone know? Trees are containers around a set of Nodes which are >> linked together to form a tree and the Tree object holds a pointer >> to the root node. 
>> >> -jason >> >> -- >> Jason Stajich >> Miller Research Fellow >> University of California, Berkeley >> lab: 510.642.8441 >> http://pmb.berkeley.edu/~taylor/people/js.html >> http://fungalgenomes.org >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mirhan at indiana.edu Thu Jun 12 11:44:08 2008 From: mirhan at indiana.edu (Han, Mira) Date: Thu, 12 Jun 2008 11:44:08 -0400 Subject: [Bioperl-l] treexml In-Reply-To: <922F0238-F390-4A33-99F6-8473CDF22980@gmx.net> Message-ID: I'll do that, I was thinking of having an annotationCollectionI reference as a variable, But inheriting from AnnotatableI sounds better. Do you mean I should make a more generic AnnotatableNode instead of NodePhyloXML? Mira On 6/12/08 9:58 AM, "Hilmar Lapp" wrote: I agree with implementing Bio::AnnotatableI. I'm not sure we need to have a node class specifically tailored to PhyloXML but maybe we do. But even if we do, the annotatable capability sounds generic enough for having a Bio::Tree::AnnotatableNode, from which Bio::Tree::NodePhyloXML would inherit and do all the things that are really specific to PhyloXML? -hilmar On Jun 11, 2008, at 8:56 PM, Jason Stajich wrote: > > On Jun 11, 2008, at 5:48 PM, Han, Mira wrote: > >> >> I can remove the TreePhyloXML class, >> And use the tag functions for Bio::Tree::TreeI. >> Yeah, it seems like TreePhyloXML is overkill. >> I don't know about the node though. >> I think it would be cleaner to have a NodePhyloXML, >> Since there are much more than scalar values that will be >> connected to the node, >> Such as annotations, sequences, etc. > > Mira - > That sounds quite fine to me. 
I agree that the NodePhyloXML is > probably going to be different enough from the generic Node object > to justify making a sub-class. You may want to inherit from the > interfaces like Bio::AnnotatableI in the NodePhyloXML if you are > attaching annotations, etc. > > For example I do this for the Bio::Tree::AlleleNode which are nodes > in a tree and also Individuals from a Population for popgen analyses. > > -jason >> >> >> On 6/11/08 4:50 PM, "Jason Stajich" >> wrote: >> >> Hey Mira - >> >> Looks like things are going well. I just wanted to check and see >> if it is totally necessary to create new classes or if you can use >> the get/set tag/value pair interface that already exists? >> >> These are the functions that are present in Bio::Tree::TreeI and >> Bio::Tree::NodeI : >> add_tag_value >> remove_tag >> get_all_tags >> get_tag_values >> has_tag >> >> These are the same functions we use in SeqFeatureI interface as >> well. It is just possible to re-use these rather than making a >> new function for every data type - this way we don't have to >> change the interface for different richness of data. >> >> BTW (and this may be me who did it, but maybe Sendu remembers) - I >> am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does >> anyone know? Trees are containers around a set of Nodes which are >> linked together to form a tree and the Tree object holds a pointer >> to the root node. 
>> >> -jason >> >> -- >> Jason Stajich >> Miller Research Fellow >> University of California, Berkeley >> lab: 510.642.8441 >> http://pmb.berkeley.edu/~taylor/people/js.html >> http://fungalgenomes.org >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From Kevin.M.Brown at asu.edu Thu Jun 12 11:22:11 2008 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 12 Jun 2008 08:22:11 -0700 Subject: [Bioperl-l] EMBL format field In-Reply-To: <34198fe40806120206i31cee6f5xfe6f57b3b9e2a84b@mail.gmail.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com><03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com><34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com><46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> <34198fe40806120206i31cee6f5xfe6f57b3b9e2a84b@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B404F2B374@EX02.asurite.ad.asu.edu> See the following links for where to get a more current version. 1.4 is years old and lots of parts are non-functional due to website and file format changes. http://www.bioperl.org/wiki/Installing_BioPerl http://www.bioperl.org/wiki/Installing_BioPerl_on_Ubuntu_Server > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Zhi-Qiang Ye > Sent: Thursday, June 12, 2008 2:07 AM > To: Jason Stajich > Cc: bioperl list > Subject: Re: [Bioperl-l] EMBL format field > > Hi, Jason > > I used exactly your code, and the result is still 'unknown id'. > Where can I get the version of bioperl? > I used ubuntu gutsy, the version in ubuntu's package > management system is 1.4-1. 
> > I installed BioPerl 1.4 on another computer, IA64 with redhat > linux. It has the same problem. > In the process of installation using CPAN, make test always failed. So > I used 'force install ....'. > I am not sure it is the reason. > > Thanks. > Zhi-Qiang Ye > > 2008/6/11 Jason Stajich : > > What version of bioperl? It works for me using this code I > get 'CB271253' > > printed out. > > > > #!/usr/bin/perl -w > > use strict; > > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-format => 'embl', -file => shift); > > while( my $seq = $in->next_seq ) { > > print $seq->id,"\n"; > > } > > > > On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote: > > > >> That's weird. I also met this problem. I tried a > embl-format file like > >> this: > >> > >> ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. > >> XX > >> AC CB271253; > >> XX > >> DT 24-FEB-2003 (Rel. 74, Created) > >> DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) > >> XX > >> DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA > 3' similar to > >> DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. > >> > >> from: > http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw > >> > >> the $seq object's ->id, ->display_id are "unkown id" ... > >> > >> > >> > >> ZQ Ye > >> > >> 2008/6/9 Hilmar Lapp : > >>> > >>> If this is the case with the latest version of BioPerl it > should be filed > >>> as > >>> a bug report for the embl parser. The ID ought to be reported in > >>> $seq->get_secondary_accessions() (which returns an > array). If it doesn't, > >>> it > >>> sounds like a bug to me. > >>> > >>> -hilmar > >>> > >>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: > >>>> > >>>> Hi Wen, > >>>> A dump of that sequence object (Data::Dumper is your > friend !) reveals > >>>> that the PA EMBL field is not saved into the object. > However, you will > >>>> find the string 'AB000170.1' in the embedded CDS > feature, more precisely > >>>> the seqid of the location object. 
I don't know whether > that is always > >>>> the case, but it is in your particular example. > >>>> So, to get your hands on that value you have to do: > >>>> > >>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} > $seq->get_SeqFeatures; > >>>> my $parent_id = $cds->location->seq_id; > >>>> > >>>> HTH, > >>>> Marc > >>>> > >>>> Marc Logghe > >>>> Senior Bioinformatician > >>>> Ablynx nv > >>>>> > >>>>> -----Original Message----- > >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang > >>>>> Sent: Monday, June 09, 2008 5:28 AM > >>>>> To: bioperl-l at lists.open-bio.org > >>>>> Subject: [Bioperl-l] EMBL format field > >>>>> > >>>>> Hi all, > >>>>> > >>>>> I have a EMBL file that I want to extract one of the line > >>>>> > >>>>> ###file### > >>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. > >>>>> XX > >>>>> PA AB000170.1 > >>>>> XX > >>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 > >>>>> XX > >>>>> OS Sus scrofa (pig) > >>>>> OC Eukaryota; Metazoa; Chordata; Craniata; > Vertebrata; Euteleostomi; > >>>>> Mammalia; > >>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; > Suidae; Sus. > >>>>> OX NCBI_TaxID=9823; > >>>>> ......... > >>>>> > >>>>> I want the accession number in the line that starts > with PA, AB000170 > >>>>> in this example. > >>>>> > >>>>> Can anybody kindly help, tell me which module and > method I should use? > >>>>> I tried various things like $seq_obj -> primary_id, display_id, > >>>>> get_secondary_id, etc.. they did not work... > >>>>> > >>>>> Thanks a lot! 
> >>>>> > >>>>> Wen > >>>>> _______________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jun 12 13:03:55 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Jun 2008 12:03:55 -0500 Subject: [Bioperl-l] treexml In-Reply-To: <922F0238-F390-4A33-99F6-8473CDF22980@gmx.net> References: <9EC82D14-EDFE-4734-AA72-E65911411628@bioperl.org> <922F0238-F390-4A33-99F6-8473CDF22980@gmx.net> Message-ID: <4EAAD507-FF89-4F9A-909A-1E1C7B0187C8@uiuc.edu> Yes, that sounds correct to me. I could see this being useful for other rich tree file formats so it probably should be genericized. chris On Jun 12, 2008, at 8:58 AM, Hilmar Lapp wrote: > I agree with implementing Bio::AnnotatableI. I'm not sure we need to > have a node class specifically tailored to PhyloXML but maybe we do. > > But even if we do, the annotatable capability sounds generic enough > for having a Bio::Tree::AnnotatableNode, from which > Bio::Tree::NodePhyloXML would inherit and do all the things that are > really specific to PhyloXML? > > -hilmar > > On Jun 11, 2008, at 8:56 PM, Jason Stajich wrote: >> >> On Jun 11, 2008, at 5:48 PM, Han, Mira wrote: >> >>> >>> I can remove the TreePhyloXML class, >>> And use the tag functions for Bio::Tree::TreeI. >>> Yeah, it seems like TreePhyloXML is overkill. >>> I don't know about the node though. >>> I think it would be cleaner to have a NodePhyloXML, >>> Since there are much more than scalar values that will be >>> connected to the node, >>> Such as annotations, sequences, etc. >> >> Mira - >> That sounds quite fine to me. I agree that the NodePhyloXML is >> probably going to be different enough from the generic Node object >> to justify making a sub-class. You may want to inherit from the >> interfaces like Bio::AnnotatableI in the NodePhyloXML if you are >> attaching annotations, etc. 
>> >> For example I do this for the Bio::Tree::AlleleNode which are nodes >> in a tree and also Individuals from a Population for popgen analyses. >> >> -jason >>> >>> >>> On 6/11/08 4:50 PM, "Jason Stajich" >>> wrote: >>> >>> Hey Mira - >>> >>> Looks like things are going well. I just wanted to check and see >>> if it is totally necessary to create new classes or if you can use >>> the get/set tag/value pair interface that already exists? >>> >>> These are the functions that are present in Bio::Tree::TreeI and >>> Bio::Tree::NodeI : >>> add_tag_value >>> remove_tag >>> get_all_tags >>> get_tag_values >>> has_tag >>> >>> These are the same functions we use in SeqFeatureI interface as >>> well. It is just possible to re-use these rather than making a >>> new function for every data type - this way we don't have to >>> change the interface for different richness of data. >>> >>> BTW (and this may be me who did it, but maybe Sendu remembers) - I >>> am not sure why Bio::Tree::TreeI ISA Bio::Tree::NodeI. Does >>> anyone know? Trees are containers around a set of Nodes which are >>> linked together to form a tree and the Tree object holds a pointer >>> to the root node. >>> >>> -jason >>> >>> -- >>> Jason Stajich >>> Miller Research Fellow >>> University of California, Berkeley >>> lab: 510.642.8441 >>> http://pmb.berkeley.edu/~taylor/people/js.html >>> http://fungalgenomes.org >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From vbhonagiri at yahoo.com Thu Jun 12 17:18:20 2008 From: vbhonagiri at yahoo.com (veena bhonagiri) Date: Thu, 12 Jun 2008 14:18:20 -0700 (PDT) Subject: [Bioperl-l] Hi Message-ID: <267026.87405.qm@web35304.mail.mud.yahoo.com> Hi, I was looking for a parser to convert gff3 to gff2 format. Is there one available? thanks, Veena From barry.moore at genetics.utah.edu Thu Jun 12 17:44:29 2008 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 12 Jun 2008 15:44:29 -0600 Subject: [Bioperl-l] Hi In-Reply-To: <267026.87405.qm@web35304.mail.mud.yahoo.com> References: <267026.87405.qm@web35304.mail.mud.yahoo.com> Message-ID: Did you really want to go backwards from gff3 to gff2 or did you mean from gff2 to gff3? If you want to go from GTF/GFF2 forward to GFF3 there is a script available through the Sequence Ontology for doing this (http://song.cvs.sourceforge.net/song/software/scripts/gtf2gff3/). It handles a wide variety of GTF and GFF2 files; however, since there is a lot of variability among implementations of those formats, I can't guarantee that it will handle every flavor. For example, it does not work on GTF from the UCSC table browser because those files use the same ID for gene and transcript, so it is impossible to group multiple transcripts to a gene. The released version has been tested on Ensembl and Twinscan GTF. I happen to be working on extending that script right now to handle a GFF2/GFF3 hybrid found in the WS180 release from WormBase, so if it doesn't handle your case it may soon. If you use it and have feedback I'd appreciate hearing from you. Barry Barry Moore Senior Research Specialist Eccles Institute of Human Genetics Dept.
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 On Jun 12, 2008, at 3:18 PM, veena bhonagiri wrote: > Hi, > I was looking for a parser to convert gff3 to gff2 format. Is there > one available? > > thanks, > Veena > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Fri Jun 13 07:33:21 2008 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 13 Jun 2008 13:33:21 +0200 Subject: [Bioperl-l] preparing nexus files for Mr. Bayes In-Reply-To: <32D2B523-8C68-44B3-80B4-E8BB82FB4232@bioperl.org> References: <5aa3b3570806110246naa24dcaj462d8025048aad41@mail.gmail.com> <32D2B523-8C68-44B3-80B4-E8BB82FB4232@bioperl.org> Message-ID: <200806131333.21902.heikki@sanbi.ac.za> FYI, According to http://search.cpan.org/dist/Bio-NEXUS/, the Bio::NEXUS modules have been recently updated (Apr 23 2008). It would definitely be worth testing them and possibly creating subclasses to handle PAUP and MrBayes command blocks. -Heikki On Wednesday 11 June 2008 17:13:20 Jason Stajich wrote: > Giovanni - > > Bits and pieces already exist, there is little to handle the text > blocks that exist in NEXUS files because BioPerl has been focused > on the alignment data not the analysis. Arlin Stoltzfus has started a > Bio::NEXUS project that is supposed to fully parse all NEXUS data in > and out but I don't know what the status is right now. > > If you look at the Bio::AlignIO::nexus documentation you need to turn > off symbols and endblock when writing for MrBayes NEXUS. > > my $out = Bio::AlignIO->new(-format => 'nexus', > -show_symbols => 0, > -show_endblock => 0); > > > # write out the alignment first ($aln stands for your alignment object) > $out->write_aln($aln); > # you can then print out whatever execution blocks you want with > # standard print statements. > print { $out->_fh } "begin mrbayes;\n"; ...
> > > As for the other things, the Bio::SimpleAlign module lets you swap > the match character (see map_char and gap_char methods). > > For joining alignments and setting up partitions, you can join > multiple alignments by making a new Bio::SimpleAlign and adding > concatenated sequences together. > > my %matrix; > for my $aln ( @alns ) { > for my $seq ( $aln->each_seq ) { > my $id = $seq->id; > $matrix{$id} .= $seq->seq; > } > } > my $bigaln = Bio::SimpleAlign->new; > while( my ($id,$seq) = each %matrix ) { > $bigaln->add_seq(Bio::LocatableSeq->new(-id => $id, > -seq => $seq)); > } > > > In general there is not a single solution to a lot of these tasks > (although there should be one that concatenates a set of alignments) > so there are not ready-to-use functions. I have my custom code I use > to join datasets and establish partitions but much is specific to how > I organize the data on my system so I don't know how informative it > would be. > > I'm not sure python code would be much good here... =) if you jump > ship you should talk to Frank Kauff who has written some python code > to manipulate alignments for biopython. > > -jason > > On Jun 11, 2008, at 2:46 AM, Giovanni Marco Dall'Olio wrote: > > Hi, > > I was wondering whether there is any bioperl module I can use to > > handle nexus files for Mr.Bayes. > > > > mr. Bayes is a program for bayesian estimation of phylogeny, which > > uses an alignment file in a customized nexus format as input. > > I would need a module to prepare these files, doing tasks like: > > - joining more than an alignment in a single file/line > > - substitute matching chars with '.' > > - customizing parameters displayed in the headers of the output > > nexus file > > - manage mrbayes' extensions, like adding information about > > partitions and taxas > > - adding batch instructions for the mr bayes interpreter > > - and similar stuff. 
> > > > I have tried Bio::AlignIO but I see it doesn't handle all of this, > > what I am looing for is bit more specific. > > Moreover, I found a bug in the way mrbayes recognizes the output from > > Bio::AlignIO (https://sourceforge.net/tracker/index.php? > > func=detail&aid=1990655&group_id=129302&atid=714418). > > > > Is there any existing module already available? > > If there is not, I am going to have to write such scripts anyway. > > I would like to contribute them to bioperl, even if I am not a very > > good perl programmer, I prefer python. > > > > -- > > ----------------------------------------------------------- > > > > My Blog on Bioinformatics (italian): http://bioinfoblog.it > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From yezhiqiang at gmail.com Sat Jun 14 09:39:45 2008 From: yezhiqiang at gmail.com (Zhi-Qiang Ye) Date: Sat, 14 Jun 2008 21:39:45 +0800 Subject: [Bioperl-l] EMBL format field In-Reply-To: <1A4207F8295607498283FE9E93B775B404F2B374@EX02.asurite.ad.asu.edu> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> 
<34198fe40806120206i31cee6f5xfe6f57b3b9e2a84b@mail.gmail.com> <1A4207F8295607498283FE9E93B775B404F2B374@EX02.asurite.ad.asu.edu> Message-ID: <34198fe40806140639g1c2edc3aqe8fce6d03a102642@mail.gmail.com> Thank all of you. I finally get the newest version of bioperl installed and solved the problem. I noticed that ensembl API still uses bioperl-1.2.3, which misleaded me that bioperl-1.4 is very up-to-date ... Regards, Zhi-Qiang 2008/6/12 Kevin Brown : > See the following links for where to get a more current version. 1.4 is > years old and lots of parts are non-functional due to website and file > format changes. > > http://www.bioperl.org/wiki/Installing_BioPerl > > http://www.bioperl.org/wiki/Installing_BioPerl_on_Ubuntu_Server > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Zhi-Qiang Ye >> Sent: Thursday, June 12, 2008 2:07 AM >> To: Jason Stajich >> Cc: bioperl list >> Subject: Re: [Bioperl-l] EMBL format field >> >> Hi, Jason >> >> I used exactly your code, and the result is still 'unknown id'. >> Where can I get the version of bioperl? >> I used ubuntu gutsy, the version in ubuntu's package >> management system is 1.4-1. >> >> I installed BioPerl 1.4 on another computer, IA64 with redhat >> linux. It has the same problem. >> In the process of installation using CPAN, make test always failed. So >> I used 'force install ....'. >> I am not sure it is the reason. >> >> Thanks. >> Zhi-Qiang Ye >> >> 2008/6/11 Jason Stajich : >> > What version of bioperl? It works for me using this code I >> get 'CB271253' >> > printed out. >> > >> > #!/usr/bin/perl -w >> > use strict; >> > use Bio::SeqIO; >> > my $in = Bio::SeqIO->new(-format => 'embl', -file => shift); >> > while( my $seq = $in->next_seq ) { >> > print $seq->id,"\n"; >> > } >> > >> > On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote: >> > >> >> That's weird. I also met this problem. 
I tried a >> embl-format file like >> >> this: >> >> >> >> ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. >> >> XX >> >> AC CB271253; >> >> XX >> >> DT 24-FEB-2003 (Rel. 74, Created) >> >> DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) >> >> XX >> >> DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA >> 3' similar to >> >> DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. >> >> >> >> from: >> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw >> >> >> >> the $seq object's ->id, ->display_id are "unkown id" ... >> >> >> >> >> >> >> >> ZQ Ye >> >> >> >> 2008/6/9 Hilmar Lapp : >> >>> >> >>> If this is the case with the latest version of BioPerl it >> should be filed >> >>> as >> >>> a bug report for the embl parser. The ID ought to be reported in >> >>> $seq->get_secondary_accessions() (which returns an >> array). If it doesn't, >> >>> it >> >>> sounds like a bug to me. >> >>> >> >>> -hilmar >> >>> >> >>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >> >>>> >> >>>> Hi Wen, >> >>>> A dump of that sequence object (Data::Dumper is your >> friend !) reveals >> >>>> that the PA EMBL field is not saved into the object. >> However, you will >> >>>> find the string 'AB000170.1' in the embedded CDS >> feature, more precisely >> >>>> the seqid of the location object. I don't know whether >> that is always >> >>>> the case, but it is in your particular example. 
>> >>>> So, to get your hands on that value you have to do: >> >>>> >> >>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} >> $seq->get_SeqFeatures; >> >>>> my $parent_id = $cds->location->seq_id; >> >>>> >> >>>> HTH, >> >>>> Marc >> >>>> >> >>>> Marc Logghe >> >>>> Senior Bioinformatician >> >>>> Ablynx nv >> >>>>> >> >>>>> -----Original Message----- >> >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> >>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >> >>>>> Sent: Monday, June 09, 2008 5:28 AM >> >>>>> To: bioperl-l at lists.open-bio.org >> >>>>> Subject: [Bioperl-l] EMBL format field >> >>>>> >> >>>>> Hi all, >> >>>>> >> >>>>> I have a EMBL file that I want to extract one of the line >> >>>>> >> >>>>> ###file### >> >>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >> >>>>> XX >> >>>>> PA AB000170.1 >> >>>>> XX >> >>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >> >>>>> XX >> >>>>> OS Sus scrofa (pig) >> >>>>> OC Eukaryota; Metazoa; Chordata; Craniata; >> Vertebrata; Euteleostomi; >> >>>>> Mammalia; >> >>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >> Suidae; Sus. >> >>>>> OX NCBI_TaxID=9823; >> >>>>> ......... >> >>>>> >> >>>>> I want the accession number in the line that starts >> with PA, AB000170 >> >>>>> in this example. >> >>>>> >> >>>>> Can anybody kindly help, tell me which module and >> method I should use? >> >>>>> I tried various things like $seq_obj -> primary_id, display_id, >> >>>>> get_secondary_id, etc.. they did not work... >> >>>>> >> >>>>> Thanks a lot! 
>> >>>>> >> >>>>> Wen >> >>>>> _______________________________________________ >> _______________________________________________ >> Bioperl-l mailing list From cjfields at uiuc.edu Sat Jun 14 22:25:26 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 14 Jun 2008 21:25:26 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: <34198fe40806140639g1c2edc3aqe8fce6d03a102642@mail.gmail.com> References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> <34198fe40806120206i31cee6f5xfe6f57b3b9e2a84b@mail.gmail.com> <1A4207F8295607498283FE9E93B775B404F2B374@EX02.asurite.ad.asu.edu> <34198fe40806140639g1c2edc3aqe8fce6d03a102642@mail.gmail.com> Message-ID: See this FAQ question, as well as the one following it. chris On Jun 14, 2008, at 8:39 AM, Zhi-Qiang Ye wrote: > Thank all of you. I finally get the newest version of bioperl > installed and solved the problem. > > I noticed that ensembl API still uses bioperl-1.2.3, which > misleaded me that bioperl-1.4 is very up-to-date ... > > > Regards, > Zhi-Qiang > > > 2008/6/12 Kevin Brown : >> See the following links for where to get a more current version. >> 1.4 is >> years old and lots of parts are non-functional due to website and >> file >> format changes. >> >> http://www.bioperl.org/wiki/Installing_BioPerl >> >> http://www.bioperl.org/wiki/Installing_BioPerl_on_Ubuntu_Server >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Zhi-Qiang Ye >>> Sent: Thursday, June 12, 2008 2:07 AM >>> To: Jason Stajich >>> Cc: bioperl list >>> Subject: Re: [Bioperl-l] EMBL format field >>> >>> Hi, Jason >>> >>> I used exactly your code, and the result is still 'unknown id'. >>> Where can I get the version of bioperl? 
>>> I used ubuntu gutsy, the version in ubuntu's package >>> management system is 1.4-1. >>> >>> I installed BioPerl 1.4 on another computer, IA64 with redhat >>> linux. It has the same problem. >>> In the process of installation using CPAN, make test always >>> failed. So >>> I used 'force install ....'. >>> I am not sure it is the reason. >>> >>> Thanks. >>> Zhi-Qiang Ye >>> >>> 2008/6/11 Jason Stajich : >>>> What version of bioperl? It works for me using this code I >>> get 'CB271253' >>>> printed out. >>>> >>>> #!/usr/bin/perl -w >>>> use strict; >>>> use Bio::SeqIO; >>>> my $in = Bio::SeqIO->new(-format => 'embl', -file => shift); >>>> while( my $seq = $in->next_seq ) { >>>> print $seq->id,"\n"; >>>> } >>>> >>>> On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote: >>>> >>>>> That's weird. I also met this problem. I tried a >>> embl-format file like >>>>> this: >>>>> >>>>> ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. >>>>> XX >>>>> AC CB271253; >>>>> XX >>>>> DT 24-FEB-2003 (Rel. 74, Created) >>>>> DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) >>>>> XX >>>>> DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA >>> 3' similar to >>>>> DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. >>>>> >>>>> from: >>> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw >>>>> >>>>> the $seq object's ->id, ->display_id are "unkown id" ... >>>>> >>>>> >>>>> >>>>> ZQ Ye >>>>> >>>>> 2008/6/9 Hilmar Lapp : >>>>>> >>>>>> If this is the case with the latest version of BioPerl it >>> should be filed >>>>>> as >>>>>> a bug report for the embl parser. The ID ought to be reported in >>>>>> $seq->get_secondary_accessions() (which returns an >>> array). If it doesn't, >>>>>> it >>>>>> sounds like a bug to me. >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>>> >>>>>>> Hi Wen, >>>>>>> A dump of that sequence object (Data::Dumper is your >>> friend !) reveals >>>>>>> that the PA EMBL field is not saved into the object. 
>>> However, you will >>>>>>> find the string 'AB000170.1' in the embedded CDS >>> feature, more precisely >>>>>>> the seqid of the location object. I don't know whether >>> that is always >>>>>>> the case, but it is in your particular example. >>>>>>> So, to get your hands on that value you have to do: >>>>>>> >>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} >>> $seq->get_SeqFeatures; >>>>>>> my $parent_id = $cds->location->seq_id; >>>>>>> >>>>>>> HTH, >>>>>>> Marc >>>>>>> >>>>>>> Marc Logghe >>>>>>> Senior Bioinformatician >>>>>>> Ablynx nv >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I have a EMBL file that I want to extract one of the line >>>>>>>> >>>>>>>> ###file### >>>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. >>>>>>>> XX >>>>>>>> PA AB000170.1 >>>>>>>> XX >>>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>>> XX >>>>>>>> OS Sus scrofa (pig) >>>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; >>> Vertebrata; Euteleostomi; >>>>>>>> Mammalia; >>>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >>> Suidae; Sus. >>>>>>>> OX NCBI_TaxID=9823; >>>>>>>> ......... >>>>>>>> >>>>>>>> I want the accession number in the line that starts >>> with PA, AB000170 >>>>>>>> in this example. >>>>>>>> >>>>>>>> Can anybody kindly help, tell me which module and >>> method I should use? >>>>>>>> I tried various things like $seq_obj -> primary_id, display_id, >>>>>>>> get_secondary_id, etc.. they did not work... >>>>>>>> >>>>>>>> Thanks a lot! 
>>>>>>>> >>>>>>>> Wen >>>>>>>> _______________________________________________ >>> _______________________________________________ >>> Bioperl-l mailing list > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Jun 15 01:17:10 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 15 Jun 2008 00:17:10 -0500 Subject: [Bioperl-l] EMBL format field In-Reply-To: References: <3A0B2CCF-6EA7-485D-ABC9-6A0F83B3F4B0@gmail.com> <03C512635899144083CADB0EE222018901C8072B@alpaca.lan.ablynx.com> <34198fe40806100443g6c18f47dj881d68c0bf14ba8f@mail.gmail.com> <46E681F5-CB56-4ADA-BEB9-00083CFA78F9@bioperl.org> <34198fe40806120206i31cee6f5xfe6f57b3b9e2a84b@mail.gmail.com> <1A4207F8295607498283FE9E93B775B404F2B374@EX02.asurite.ad.asu.edu> <34198fe40806140639g1c2edc3aqe8fce6d03a102642@mail.gmail.com> Message-ID: <1898E656-CC34-4BBE-8026-7266A6CC036D@uiuc.edu> Um, should have attached a link there: http://www.bioperl.org/wiki/FAQ#I_am_using_Ensembl.__How_do_I_do_XYZ.3F chris On Jun 14, 2008, at 9:25 PM, Chris Fields wrote: > See this FAQ question, as well as the one following it. > > chris > > On Jun 14, 2008, at 8:39 AM, Zhi-Qiang Ye wrote: > >> Thank all of you. I finally get the newest version of bioperl >> installed and solved the problem. >> >> I noticed that ensembl API still uses bioperl-1.2.3, which >> misleaded me that bioperl-1.4 is very up-to-date ... >> >> >> Regards, >> Zhi-Qiang >> >> >> 2008/6/12 Kevin Brown : >>> See the following links for where to get a more current version. >>> 1.4 is >>> years old and lots of parts are non-functional due to website and >>> file >>> format changes. 
>>> >>> http://www.bioperl.org/wiki/Installing_BioPerl >>> >>> http://www.bioperl.org/wiki/Installing_BioPerl_on_Ubuntu_Server >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>> Zhi-Qiang Ye >>>> Sent: Thursday, June 12, 2008 2:07 AM >>>> To: Jason Stajich >>>> Cc: bioperl list >>>> Subject: Re: [Bioperl-l] EMBL format field >>>> >>>> Hi, Jason >>>> >>>> I used exactly your code, and the result is still 'unknown id'. >>>> Where can I get the version of bioperl? >>>> I used ubuntu gutsy, the version in ubuntu's package >>>> management system is 1.4-1. >>>> >>>> I installed BioPerl 1.4 on another computer, IA64 with redhat >>>> linux. It has the same problem. >>>> In the process of installation using CPAN, make test always >>>> failed. So >>>> I used 'force install ....'. >>>> I am not sure it is the reason. >>>> >>>> Thanks. >>>> Zhi-Qiang Ye >>>> >>>> 2008/6/11 Jason Stajich : >>>>> What version of bioperl? It works for me using this code I >>>> get 'CB271253' >>>>> printed out. >>>>> >>>>> #!/usr/bin/perl -w >>>>> use strict; >>>>> use Bio::SeqIO; >>>>> my $in = Bio::SeqIO->new(-format => 'embl', -file => shift); >>>>> while( my $seq = $in->next_seq ) { >>>>> print $seq->id,"\n"; >>>>> } >>>>> >>>>> On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote: >>>>> >>>>>> That's weird. I also met this problem. I tried a >>>> embl-format file like >>>>>> this: >>>>>> >>>>>> ID CB271253; SV 1; linear; mRNA; EST; INV; 591 BP. >>>>>> XX >>>>>> AC CB271253; >>>>>> XX >>>>>> DT 24-FEB-2003 (Rel. 74, Created) >>>>>> DT 24-FEB-2003 (Rel. 74, Last updated, Version 1) >>>>>> XX >>>>>> DE taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA >>>> 3' similar to >>>>>> DE SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence. >>>>>> >>>>>> from: >>>> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw >>>>>> >>>>>> the $seq object's ->id, ->display_id are "unkown id" ... 
>>>>>> >>>>>> >>>>>> >>>>>> ZQ Ye >>>>>> >>>>>> 2008/6/9 Hilmar Lapp : >>>>>>> >>>>>>> If this is the case with the latest version of BioPerl it >>>> should be filed >>>>>>> as >>>>>>> a bug report for the embl parser. The ID ought to be reported in >>>>>>> $seq->get_secondary_accessions() (which returns an >>>> array). If it doesn't, >>>>>>> it >>>>>>> sounds like a bug to me. >>>>>>> >>>>>>> -hilmar >>>>>>> >>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote: >>>>>>>> >>>>>>>> Hi Wen, >>>>>>>> A dump of that sequence object (Data::Dumper is your >>>> friend !) reveals >>>>>>>> that the PA EMBL field is not saved into the object. >>>> However, you will >>>>>>>> find the string 'AB000170.1' in the embedded CDS >>>> feature, more precisely >>>>>>>> the seqid of the location object. I don't know whether >>>> that is always >>>>>>>> the case, but it is in your particular example. >>>>>>>> So, to get your hands on that value you have to do: >>>>>>>> >>>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} >>>> $seq->get_SeqFeatures; >>>>>>>> my $parent_id = $cds->location->seq_id; >>>>>>>> >>>>>>>> HTH, >>>>>>>> Marc >>>>>>>> >>>>>>>> Marc Logghe >>>>>>>> Senior Bioinformatician >>>>>>>> Ablynx nv >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang >>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM >>>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>>> Subject: [Bioperl-l] EMBL format field >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I have a EMBL file that I want to extract one of the line >>>>>>>>> >>>>>>>>> ###file### >>>>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP. 
>>>>>>>>> XX >>>>>>>>> PA AB000170.1 >>>>>>>>> XX >>>>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1 >>>>>>>>> XX >>>>>>>>> OS Sus scrofa (pig) >>>>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; >>>> Vertebrata; Euteleostomi; >>>>>>>>> Mammalia; >>>>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; >>>> Suidae; Sus. >>>>>>>>> OX NCBI_TaxID=9823; >>>>>>>>> ......... >>>>>>>>> >>>>>>>>> I want the accession number in the line that starts >>>> with PA, AB000170 >>>>>>>>> in this example. >>>>>>>>> >>>>>>>>> Can anybody kindly help, tell me which module and >>>> method I should use? >>>>>>>>> I tried various things like $seq_obj -> primary_id, >>>>>>>>> display_id, >>>>>>>>> get_secondary_id, etc.. they did not work... >>>>>>>>> >>>>>>>>> Thanks a lot! >>>>>>>>> >>>>>>>>> Wen >>>>>>>>> _______________________________________________ >>>> _______________________________________________ >>>> Bioperl-l mailing list >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Marie-Claude Hofmann > College of Veterinary Medicine > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From nsh9351 at rit.edu Mon Jun 16 10:17:41 2008 From: nsh9351 at rit.edu (Nathan Haseley (RIT Student)) Date: Mon, 16 Jun 2008 10:17:41 -0400 Subject: [Bioperl-l] Limiting organism scope on bioperl BLAST searches Message-ID: <7D75703BC8E1C149BF78A1E79AAAB169035BC511@svits28.main.ad.rit.edu> Hello! I am trying to write a perl script comparing proteins from a group of mammals. I am only interested in 5 organisms. 
Is there a way to limit my bioperl BLAST search to specific genome databases? Do you have any other suggestions for the most effective way to achieve this filtering? Thanks! Nathan From akarger at CGR.Harvard.edu Mon Jun 16 12:18:30 2008 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 16 Jun 2008 12:18:30 -0400 Subject: [Bioperl-l] Installing bioperl with Strawberry Message-ID: <72AF30DC2881964CB911FD08E57157E7012188EE@lsdiv-msxbe-001.nucleus.harvard.edu> I got a new Windows box and was going to install Bioperl. Questions: has anyone had experience, positive or negative, with using Strawberry Perl + Bioperl? I was thinking of going the Strawberry route because they claim it acts more like UNIX Perl, which would be nice. But if SP doesn't work with Bioperl, or is just generally horrible, I don't want to install it. SP claims it supports PPM, but I don't know whether I should use that or the CPAN bioperl install. Any thoughts? -Amir Karger From Kevin.M.Brown at asu.edu Mon Jun 16 12:47:43 2008 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Jun 2008 09:47:43 -0700 Subject: [Bioperl-l] Installing bioperl with Strawberry In-Reply-To: <72AF30DC2881964CB911FD08E57157E7012188EE@lsdiv-msxbe-001.nucleus.harvard.edu> References: <72AF30DC2881964CB911FD08E57157E7012188EE@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: <1A4207F8295607498283FE9E93B775B404FCCAB4@EX02.asurite.ad.asu.edu> No experiences with SP, but if it supports PPM, then I would use those rather than cpan for getting BioPerl installed on your system. http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger > Sent: Monday, June 16, 2008 9:19 AM > To: bioperl-l at portal.open-bio.org > Subject: [Bioperl-l] Installing bioperl with Strawberry > > I got a new Windows box and was going to install Bioperl. 
> > Questions: has anyone had experience, positive or negative, with using > Strawberry Perl + Bioperl? I was thinking of going the > Strawberry route > because they claim it acts more like UNIX Perl, which would > be nice. But > if SP doesn't work with Bioperl, or is just generally > horrible, I don't > want to install it. > > SP claims it supports PPM, but I don't know whether I should > use that or > the CPAN bioperl install. > > Any thoughts? > > -Amir Karger > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Mon Jun 16 13:35:53 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 16 Jun 2008 19:35:53 +0200 Subject: [Bioperl-l] Limiting organism scope on bioperl BLAST searches In-Reply-To: <7D75703BC8E1C149BF78A1E79AAAB169035BC511@svits28.main.ad.rit.edu> References: <7D75703BC8E1C149BF78A1E79AAAB169035BC511@svits28.main.ad.rit.edu> Message-ID: <628aabb70806161035i57995291w935b3ae3904bdecc@mail.gmail.com> Hi Nathan, It's not clear from your script whether you are doing your BLAST search locally with the database on your own computer or whether you're doing a netblast (blastcl3) where the database is on the NCBI server. If the latter, you can use the -u option to limit your search to the results of an Entrez query. See the blastcl3 manual on NCBI's website for details; I think there's even an example for this very purpose. If the former, you could create a database with sequences from the 5 mammalian species. There are lots of ways to do that, but I usually do an Entrez search crafted to my purpose and download the FASTA sequences. Or you could do the same thing at Ensembl, either via the website or via the Perl API. Dave From lamq at usal.es Wed Jun 18 07:50:16 2008 From: lamq at usal.es (Luis A. M. 
Quintales) Date: Wed, 18 Jun 2008 13:50:16 +0200 Subject: [Bioperl-l] bp_genbank2gff3 and method "binomial" problem Message-ID: <4858F678.4070504@usal.es> Hello! I want to convert a set of EMBL files (genome of S. pombe, version Sep 04) to GFF files. First I tried bp_genbank2gff3.pl --format EMBL chromosome1.contig.embl and I got Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl line 674, line 103671 So I converted the EMBL file to a GenBank file using a perl program with Bio::SeqIO methods. This works. Then I retried bp_genbank2gff3.pl chromosome1.gb and ... I got the same error again Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl line 674, line 103496 Any help? Thank you. -- ================================================== Luis Antonio Miguel Quintales Departamento de Informática y Automática Facultad de Ciencias Universidad de Salamanca Plaza de la Merced s/n 37008-SALAMANCA SPAIN ================================================== Tel.: +34-923-294400(ext.1513) Fax.: +34-923-294584 E-mail: lamq at usal.es ================================================== From David.Messina at sbc.su.se Wed Jun 18 09:20:10 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 18 Jun 2008 15:20:10 +0200 Subject: [Bioperl-l] bp_genbank2gff3 and method "binomial" problem In-Reply-To: <4858F678.4070504@usal.es> References: <4858F678.4070504@usal.es> Message-ID: <628aabb70806180620j6e8c6614sde2661b5cd0aee00@mail.gmail.com> Hi Luis, This is the line that you're dying on: my ($species)= $seq->annotation->get_Annotations("species") || ( $seq->can('species') ? $seq->species()->binomial() : undef ); So it looks like there might be a problem with the species field in one of the Genbank entries. Could you extract from your input file the record that it is choking on? We need it to reproduce the error.
And then, to help us keep track of the problem, could you submit a copy of that record and your description of what happened to our bug tracker?

http://bugzilla.open-bio.org/

Please be sure to indicate which version of BioPerl you are using. Line 674 in the current version of genbank2gff3.pl is a comment (the above is line 672), so I suspect you may be using an outdated version. The problem may already have been fixed in the current version.

More instructions here if needed:

http://www.bioperl.org/wiki/Bugs

Thanks,
Dave

From jajams at utu.fi Wed Jun 18 18:52:47 2008
From: jajams at utu.fi (Joonas Jämsen)
Date: Thu, 19 Jun 2008 01:52:47 +0300
Subject: [Bioperl-l] hmmpfam
Message-ID: <485991BF.90307@utu.fi>

How to parse hmmpfam output to get gi/sequence of specified domain, score, E-value?

From heikki at sanbi.ac.za Thu Jun 19 06:08:32 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 19 Jun 2008 12:08:32 +0200
Subject: [Bioperl-l] hmmpfam
In-Reply-To: <485991BF.90307@utu.fi>
References: <485991BF.90307@utu.fi>
Message-ID: <200806191208.33177.heikki@sanbi.ac.za>

Joonas,

I am not familiar with hmmpfam, but it is part of the HMMER toolkit, so it should be possible to use Bio::SearchIO modules for parsing the output. In addition, it is possible to run HMMER programs directly from BioPerl (the bioperl-run repository is needed):

See:

http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bio/Tools/Run/Hmmer.html
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SearchIO/hmmer.html

for code examples, and SearchIO HOWTO

http://bio.perl.org/wiki/HOWTO%3ASearchIO

for an introduction to Bio::SearchIO.

-Heikki

On Thursday 19 June 2008 00:52:47 Joonas Jämsen wrote:
> How to parse hmmpfam output to get gi/sequence of specified domain,
> score, E-value?
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From bix at sendu.me.uk Thu Jun 19 06:44:43 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 19 Jun 2008 11:44:43 +0100 Subject: [Bioperl-l] hmmpfam In-Reply-To: <200806191208.33177.heikki@sanbi.ac.za> References: <485991BF.90307@utu.fi> <200806191208.33177.heikki@sanbi.ac.za> Message-ID: <485A389B.4000700@sendu.me.uk> Heikki Lehvaslaiho wrote: > Joonas, > > I am not familiar with hmmpfam, but is is part of the HMMERtoolkit, so it > should be possible to use Bio::SearchIO modules for parsing the output. In > addition, it is possible to run HMMER programs directly from BioPerl (the > bioperl-run repositoty is needed): > > See: > > http://doc.bioperl.org/releases/bioperl-current/bioperl-run/Bio/Tools/Run/Hmmer.html > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SearchIO/hmmer.html You may also prefer hmmer_pull: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SearchIO/hmmer_pull.html Which is faster and might behave more like you expect. From govind.chandra at bbsrc.ac.uk Thu Jun 19 09:42:23 2008 From: govind.chandra at bbsrc.ac.uk (Govind Chandra) Date: Thu, 19 Jun 2008 14:42:23 +0100 Subject: [Bioperl-l] Bio::Ontology help solicited In-Reply-To: References: Message-ID: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> Hi, In the code snippet below it seems I get the Bio::Ontology::Ontology object $ontology but $ontology->find_terms is failing. 
It will help greatly if someone points out what is wrong with this code and suggests a correction. That the obo file is being read is obvious from the time it takes from invocation to output.

I wrote another version of this script which reads the "go" format and it works fine. I cannot use the go format because it is deprecated and not frequently updated. Even if I were willing to use the go format, the process.ontology file (from geneontology.org) has an error in it which causes Bio::OntologyIO->next_ontology() to fail. It would be best if I could make the obo format work for me.

Cheers

Govind

### code begins ###

use strict;
use Bio::OntologyIO;

my $file='gene_ontology.1_2.obo';
my $parser = Bio::OntologyIO->new(
    -format => "obo",
    -file   => $file
);
my $ontology = $parser->next_ontology();
print(ref($ontology), "\n");
my ($term)=$ontology->find_terms(-name => "GO:0000072");
print($term->definition(), "\n");
print("$term\n");
print(ref($term),"\n");

### code ends ###

### output begins ###
Bio::Ontology::Ontology
Can't call method "definition" on an undefined value at ontology_check.pl line 26, line 288014.
### output ends ###

Other Possibly relevant information:
------------------------------------

Perl is: v5.8.8 built for x86_64-linux-thread-multi

BioPerl is: bioperl-1.5.2_102

Output of uname -a: Linux n61347 2.6.18-92.1.1.el5 #1 SMP Thu May 22 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Contents of /etc/redhat-release: Red Hat Enterprise Linux Server release 5.2 (Tikanga)

From byuhobbes85 at yahoo.com Thu Jun 19 09:43:32 2008
From: byuhobbes85 at yahoo.com (byuhobbes)
Date: Thu, 19 Jun 2008 06:43:32 -0700 (PDT)
Subject: [Bioperl-l] StandAloneBlast
Message-ID: <18009126.post@talk.nabble.com>

I'm running blastall with StandAloneBlast for the first time.
There is quite a bit of documentation about how to run the blast, and how to get your blast report object, but I am having trouble finding documentation on what values you can retrieve from the report object and how you do so. Any tips? Thanks. ----- -------------------- -- View this message in context: http://www.nabble.com/StandAloneBlast-tp18009126p18009126.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From hlapp at gmx.net Thu Jun 19 10:07:39 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 19 Jun 2008 10:07:39 -0400 Subject: [Bioperl-l] Bio::Ontology help solicited In-Reply-To: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> References: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> I think you want to use -identifier rather than -name if you are querying by an identifier. Let us know if that doesn't work either. -hilmar On Jun 19, 2008, at 9:42 AM, Govind Chandra wrote: > Hi, > > In the code snippet below it seems I get the Bio::Ontology::Ontology > object $ontology but $ontology->find_terms is failing. It will help > greatly if someone points out what is wrong with this code and > suggest a > correction. That the obo file is being read is obvious from the time > it > takes from invocation to output. > > I wrote another version of this script which read the "go" format > and it > works fine. I cannot use the go format because it is deprecated and > not > frequently updated. Even if I am willing to use the go format, the > process.ontology file (from geneontology.org) has an error in it which > causes Bio::OntologyIO->next_ontology() to fail. It will be best if I > could make the obo format work for me. 
> > Cheers > > Govind > > ### code begins ### > > use strict; > use Bio::OntologyIO; > > my $file='gene_ontology.1_2.obo'; > my $parser = Bio::OntologyIO->new( > -format => "obo", > -file => $file > ); > my $ontology = $parser->next_ontology(); > print(ref($ontology), "\n"); > my ($term)=$ontology->find_terms(-name => "GO:0000072"); > print($term->definition(), "\n"); > print("$term\n"); > print(ref($term),"\n"); > > ### code ends ### > > ### output begins ### > Bio::Ontology::Ontology > > > Can't call method "definition" on an undefined value at > ontology_check.pl line 26, line 288014. > ### output ends ### > > > Other Possibly relevant information: > ------------------------------------ > > Perl is: v5.8.8 built for x86_64-linux-thread-multi > > BioPerl is: bioperl-1.5.2_102 > > Output of uname -a: Linux n61347 2.6.18-92.1.1.el5 #1 SMP Thu May 22 > 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > > Contents of /etc/redhat-release: Red Hat Enterprise Linux Server > release > 5.2 (Tikanga) > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From govind.chandra at bbsrc.ac.uk Thu Jun 19 10:27:21 2008 From: govind.chandra at bbsrc.ac.uk (Govind Chandra) Date: Thu, 19 Jun 2008 15:27:21 +0100 Subject: [Bioperl-l] Bio::Ontology help solicited In-Reply-To: <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> References: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> Message-ID: <1213885641.7713.54.camel@jic51958.jic.bbsrc.ac.uk> Hi Hilmar, Thanks for the very quick response. 
I changed the line below my ($term)=$ontology->find_terms(-name => "GO:0000072"); to my ($term)=$ontology->find_terms(-identifier => "GO:0000072"); but it does not make any difference. From perldoc I gathered that -name and -identifier should do the same thing anyway. Cheers Govind On Thu, 2008-06-19 at 10:07 -0400, Hilmar Lapp wrote: > I think you want to use -identifier rather than -name if you are > querying by an identifier. > > Let us know if that doesn't work either. > > -hilmar > > On Jun 19, 2008, at 9:42 AM, Govind Chandra wrote: > > > Hi, > > > > In the code snippet below it seems I get the Bio::Ontology::Ontology > > object $ontology but $ontology->find_terms is failing. It will help > > greatly if someone points out what is wrong with this code and > > suggest a > > correction. That the obo file is being read is obvious from the time > > it > > takes from invocation to output. > > > > I wrote another version of this script which read the "go" format > > and it > > works fine. I cannot use the go format because it is deprecated and > > not > > frequently updated. Even if I am willing to use the go format, the > > process.ontology file (from geneontology.org) has an error in it which > > causes Bio::OntologyIO->next_ontology() to fail. It will be best if I > > could make the obo format work for me. > > > > Cheers > > > > Govind > > > > ### code begins ### > > > > use strict; > > use Bio::OntologyIO; > > > > my $file='gene_ontology.1_2.obo'; > > my $parser = Bio::OntologyIO->new( > > -format => "obo", > > -file => $file > > ); > > my $ontology = $parser->next_ontology(); > > print(ref($ontology), "\n"); > > my ($term)=$ontology->find_terms(-name => "GO:0000072"); > > print($term->definition(), "\n"); > > print("$term\n"); > > print(ref($term),"\n"); > > > > ### code ends ### > > > > ### output begins ### > > Bio::Ontology::Ontology > > > > > > Can't call method "definition" on an undefined value at > > ontology_check.pl line 26, line 288014. 
> > ### output ends ### > > > > > > Other Possibly relevant information: > > ------------------------------------ > > > > Perl is: v5.8.8 built for x86_64-linux-thread-multi > > > > BioPerl is: bioperl-1.5.2_102 > > > > Output of uname -a: Linux n61347 2.6.18-92.1.1.el5 #1 SMP Thu May 22 > > 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > > > > Contents of /etc/redhat-release: Red Hat Enterprise Linux Server > > release > > 5.2 (Tikanga) > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jun 19 10:46:04 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 19 Jun 2008 09:46:04 -0500 Subject: [Bioperl-l] StandAloneBlast In-Reply-To: <18009126.post@talk.nabble.com> References: <18009126.post@talk.nabble.com> Message-ID: See the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO chris On Jun 19, 2008, at 8:43 AM, byuhobbes wrote: > > I'm running blastall with StandAloneBlast for the first time. There > is quite > a bit of documentation about how to run the blast, and how to get > your blast > report object, but I am having trouble finding documentation on what > values > you can retrieve from the report object and how you do so. > > Any tips? > > Thanks. > > ----- > -------------------- > > -- > View this message in context: http://www.nabble.com/StandAloneBlast-tp18009126p18009126.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From hlapp at gmx.net Thu Jun 19 10:50:57 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 19 Jun 2008 10:50:57 -0400 Subject: [Bioperl-l] Bio::Ontology help solicited In-Reply-To: <1213885641.7713.54.camel@jic51958.jic.bbsrc.ac.uk> References: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> <1213885641.7713.54.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: <5C819020-9094-416C-9B3B-DD085AB01602@gmx.net> On Jun 19, 2008, at 10:27 AM, Govind Chandra wrote: > I changed the line below > > my ($term)=$ontology->find_terms(-name => "GO:0000072"); > > to > > my ($term)=$ontology->find_terms(-identifier => "GO:0000072"); That's odd. Did you convince yourself that a term with this identifier is indeed in the ontology? If you iterate over all terms (using e.g. $ontology->get_all_terms()), is there a term with this identifier? E.g.: print join("\n", map { $_->identifier; } (grep { $_->identifier =~ /0000072/; } $ontology->get_all_terms())),"\n"; Does this print anything? > but it does not make any difference. From perldoc I gathered that - > name > and -identifier should do the same thing anyway. I'm not sure where you gathered that from. Could you point me to the location that says that? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Thu Jun 19 10:32:05 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 19 Jun 2008 15:32:05 +0100 Subject: [Bioperl-l] StandAloneBlast In-Reply-To: <18009126.post@talk.nabble.com> References: <18009126.post@talk.nabble.com> Message-ID: <485A6DE5.4060503@sendu.me.uk> byuhobbes wrote: > I'm running blastall with StandAloneBlast for the first time. 
There is quite > a bit of documentation about how to run the blast, and how to get your blast > report object, but I am having trouble finding documentation on what values > you can retrieve from the report object and how you do so. Look at the documentation for the objects you retrieve. http://docs.bioperl.org/bioperl-live/Bio/SearchIO/blast.html $searchio->next_result returns a Bio::Search::Result::ResultI, actually a Bio::Search::Result::BlastResult: http://docs.bioperl.org/bioperl-live/Bio/Search/Result/BlastResult.html which isa: http://docs.bioperl.org/bioperl-live/Bio/Search/Result/GenericResult.html These return $hit = $result->next_hit() Bio::Search::Hit::HitI objects: http://docs.bioperl.org/bioperl-live/Bio/Search/Hit/BlastHit.html http://docs.bioperl.org/bioperl-live/Bio/Search/Hit/GenericHit.html which in turn will give you HSP objects: http://docs.bioperl.org/bioperl-live/Bio/Search/HSP/BlastHSP.html http://docs.bioperl.org/bioperl-live/Bio/Search/HSP/GenericHSP.html From michael.watson at bbsrc.ac.uk Thu Jun 19 10:51:41 2008 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 19 Jun 2008 15:51:41 +0100 Subject: [Bioperl-l] Working with SeqWithQuality Message-ID: <8975119BCD0AC5419D61A9CF1A923E9506D873FE@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I have fasta files and separate quality fasta files. I understand all about constructing a SeqWithQuality object, that's the easy bit, but are there no functions for actually manipulating the sequence based on the quality values? For example, trimming the ends where a moving window average quality does not go above a certain value, or masking areas with low quality? Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From michael.watson at bbsrc.ac.uk Thu Jun 19 11:28:24 2008 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 19 Jun 2008 16:28:24 +0100 Subject: [Bioperl-l] Working with SeqWithQuality In-Reply-To: <485A7AAD.1010604@cam.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9506D873FE@iahce2ksrv1.iah.bbsrc.ac.uk> <485A7AAD.1010604@cam.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9506D873FF@iahce2ksrv1.iah.bbsrc.ac.uk> Hi Roy That's exactly what I want, thanks, it just wasn't where I was expecting it Mick -----Original Message----- From: Roy Chaudhuri [mailto:rrc22 at cam.ac.uk] Sent: 19 June 2008 16:27 To: michael watson (IAH-C) Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Working with SeqWithQuality Hi Mick, I think you want Bio::Tools::Alignment::Trim. There are lots of caveats in the POD but as I recall it works fine. Here is a small script I wrote to return contigs over a certain length and a certain minimum quality (not exactly what you asked for but I'm sure it could be adapted). 

#!/usr/bin/perl
use warnings;
use strict;
use Bio::Seq;    # loaded explicitly because Bio::Seq->new is called below
use Bio::SeqIO;
use Bio::Tools::Alignment::Trim;
use Getopt::Long;

die "Usage: trimqual fasta_file qual_file -qual 20 -window 10 -min 500\n"
    unless $ARGV[1];

# Defaults: phred cutoff, window size, minimum trimmed length
my $phred=20;
my $window=10;
my $min=500;
GetOptions ('phred|qual=i'=>\$phred,
            'window=i'=>\$window,
            'minimum=i'=>\$min) or die "Unrecognised option\n";

my $fasta=Bio::SeqIO->newFh(-file=>$ARGV[0], -format=>'fasta');
my $quality=Bio::SeqIO->newFh(-file=>$ARGV[1], -format=>'qual');
my $out=Bio::SeqIO->newFh(-format=>'fasta');
my $count=0;

# Read the sequence and quality files in parallel, one record at a time
while (<$fasta>) {
    my $seq=$_;
    my $qual=<$quality>;
    my $trim=Bio::Tools::Alignment::Trim->new(-phreds=>$phred,
                                              -windowsize=>$window);
    my ($start,$end) = @{$trim->trim_singlet($seq->seq,
                                             join(' ',@{$qual->qual}),
                                             $seq->display_id,'singlet')};
    next if $end==0;
    my $trimmed_sequence=substr($seq->seq, $start, $end+1-$start);
    my $length=length $trimmed_sequence;
    print STDERR $seq->display_id, ": $start-$end [$length bp]";
    if ($length<$min) {
        print STDERR " - skipped\n";
        next;
    } else {
        print STDERR "\n";
    }
    my $trimseq=Bio::Seq->new(-seq=>$trimmed_sequence,
                              -id=>$seq->display_id);
    print $out $trimseq;
    $count++;
}
print STDERR "\nFinished. $count contigs larger than $min bp. were found.\n";

Cheers,
Roy.

--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

michael watson (IAH-C) wrote:
> Hi
>
> I have fasta files and separate quality fasta files. I understand all
> about constructing a SeqWithQuality object, that's the easy bit, but are
> there no functions for actually manipulating the sequence based on the
> quality values? For example, trimming the ends where a moving window
> average quality does not go above a certain value, or masking areas with
> low quality?
>
> Thanks
> Mick
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful.
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute.
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers-institute.org Thu Jun 19 11:37:20 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 19 Jun 2008 10:37:20 -0500
Subject: [Bioperl-l] Working with SeqWithQuality
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9506D873FE@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9506D873FE@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID:

Michael,

If you happen to be working with ABI chromatogram files, you can use Perl's Bio::Trace::ABIF (see CPAN) to do this directly and easily, without needing SeqWithQuality objects.

--
Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C)
Sent: Thursday, June 19, 2008 9:52 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] Working with SeqWithQuality

Hi

I have fasta files and separate quality fasta files. I understand all about constructing a SeqWithQuality object, that's the easy bit, but are there no functions for actually manipulating the sequence based on the quality values? For example, trimming the ends where a moving window average quality does not go above a certain value, or masking areas with low quality?
Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From govind.chandra at bbsrc.ac.uk Thu Jun 19 11:54:20 2008 From: govind.chandra at bbsrc.ac.uk (Govind Chandra) Date: Thu, 19 Jun 2008 16:54:20 +0100 Subject: [Bioperl-l] Bio::Ontology help solicited In-Reply-To: <5C819020-9094-416C-9B3B-DD085AB01602@gmx.net> References: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> <1213885641.7713.54.camel@jic51958.jic.bbsrc.ac.uk> <5C819020-9094-416C-9B3B-DD085AB01602@gmx.net> Message-ID: <1213890860.7713.87.camel@jic51958.jic.bbsrc.ac.uk> Hi Hilmar, Below is an extract from perldoc Bio::Ontology::Ontology from which I _wrongly_ concluded that -identifier and -name do the same thing. find_terms Title : find_terms Usage : ($term) = $oe->find_terms(-identifier => "SO:0000263"); Function: Find term instances matching queries for their attributes. An implementation may not support querying for arbitrary attributes, but can generally be expected to accept -identifier and -name as queries. If both are provided, they are implicitly intersected. 

   Example :
   Returns : an array of zero or more Bio::Ontology::TermI objects
   Args    : Named parameters. The following parameters should be recognized
             by any implementations:

                -identifier    query by the given identifier
                -name          query by the given name

I had been messing with this script for a few hours before posting for help and had tried various combinations of -identifier and -name. But neither -identifier => "GO:0000072" nor -name => "M phase specific microtubule process" works. I know that the GO id exists because I can find it in the obo file. There is something more fundamentally wrong in my script than just the syntax. As you suggested below, I called get_all_terms(), but the list @allterms is empty.

### code begins ###
use strict;
use Bio::OntologyIO;

my $file='gene_ontology.1_2.obo';
my $parser = Bio::OntologyIO->new(
    -format => "obo",
    -file   => $file
);
my $ontology = $parser->next_ontology();
print(ref($ontology), "\n");
my @allterms = $ontology->get_all_terms();
print(scalar(@allterms), "\n");
my $limiter;
foreach my $term (@allterms) {
    print($term->name(),"\n");
    print($term->definition(),"\n");
    if($limiter++ > 30) {last;}
}
exit;
### code ends ###

### output begins ###
Bio::Ontology::Ontology
0
### output ends ###

Given that the script works with the "go" format files (except process.ontology, in which case it complains and exits), I suspect the obo file, but I am surprised that Bio::OntologyIO->next_ontology() does not complain. Would it be too much to ask you to get the obo file I am trying to parse from

ftp.geneontology.org/pub/go/ontology/obo_format_1_2/gene_ontology.1_2.obo

and see if it works for you?

Thanks

Govind

On Thu, 2008-06-19 at 10:50 -0400, Hilmar Lapp wrote:
> On Jun 19, 2008, at 10:27 AM, Govind Chandra wrote:
>
> > I changed the line below
> >
> > my ($term)=$ontology->find_terms(-name => "GO:0000072");
> >
> > to
> >
> > my ($term)=$ontology->find_terms(-identifier => "GO:0000072");
>
> That's odd.
Did you convince yourself that a term with this identifier > is indeed in the ontology? If you iterate over all terms (using e.g. > $ontology->get_all_terms()), is there a term with this identifier? > > E.g.: print join("\n", map { $_->identifier; } > (grep { $_->identifier =~ /0000072/; } > $ontology->get_all_terms())),"\n"; > > Does this print anything? > > > but it does not make any difference. From perldoc I gathered that - > > name > > and -identifier should do the same thing anyway. > > > I'm not sure where you gathered that from. Could you point me to the > location that says that? > > -hilmar From roy.chaudhuri at gmail.com Thu Jun 19 11:44:37 2008 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 19 Jun 2008 16:44:37 +0100 Subject: [Bioperl-l] Working with SeqWithQuality Message-ID: <485A7EE5.3090805@gmail.com> Hi Mick, I think you want Bio::Tools::Alignment::Trim. There are lots of caveats in the POD but as I recall it works fine. Here is a small script I wrote to return contigs over a certain length and a certain minimum quality (not exactly what you asked for but I'm sure it could be adapted). 

#!/usr/bin/perl
use warnings;
use strict;
use Bio::Seq;    # loaded explicitly because Bio::Seq->new is called below
use Bio::SeqIO;
use Bio::Tools::Alignment::Trim;
use Getopt::Long;

die "Usage: trimqual fasta_file qual_file -qual 20 -window 10 -min 500\n"
    unless $ARGV[1];

# Defaults: phred cutoff, window size, minimum trimmed length
my $phred=20;
my $window=10;
my $min=500;
GetOptions ('phred|qual=i'=>\$phred,
            'window=i'=>\$window,
            'minimum=i'=>\$min) or die "Unrecognised option\n";

my $fasta=Bio::SeqIO->newFh(-file=>$ARGV[0], -format=>'fasta');
my $quality=Bio::SeqIO->newFh(-file=>$ARGV[1], -format=>'qual');
my $out=Bio::SeqIO->newFh(-format=>'fasta');
my $count=0;

# Read the sequence and quality files in parallel, one record at a time
while (<$fasta>) {
    my $seq=$_;
    my $qual=<$quality>;
    my $trim=Bio::Tools::Alignment::Trim->new(-phreds=>$phred,
                                              -windowsize=>$window);
    my ($start,$end) = @{$trim->trim_singlet($seq->seq,
                                             join(' ',@{$qual->qual}),
                                             $seq->display_id,'singlet')};
    next if $end==0;
    my $trimmed_sequence=substr($seq->seq, $start, $end+1-$start);
    my $length=length $trimmed_sequence;
    print STDERR $seq->display_id, ": $start-$end [$length bp]";
    if ($length<$min) {
        print STDERR " - skipped\n";
        next;
    } else {
        print STDERR "\n";
    }
    my $trimseq=Bio::Seq->new(-seq=>$trimmed_sequence,
                              -id=>$seq->display_id);
    print $out $trimseq;
    $count++;
}
print STDERR "\nFinished. $count contigs larger than $min bp. were found.\n";

Cheers,
Roy.

--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

michael watson (IAH-C) wrote:
> Hi
>
> I have fasta files and separate quality fasta files. I understand all
> about constructing a SeqWithQuality object, that's the easy bit, but are
> there no functions for actually manipulating the sequence based on the
> quality values? For example, trimming the ends where a moving window
> average quality does not go above a certain value, or masking areas with
> low quality?
>
> Thanks
> Mick
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately.
> Unauthorised use, disclosure, copying or alteration of this message is > forbidden & may be unlawful. > The contents of this e-mail are the views of the sender and do not > necessarily represent the views of the Institute. > This email and associated attachments has been checked locally for > viruses but we can accept no responsibility once it has left our > systems. > Communications on Institute computers are monitored to secure the > effective operation of the systems and for other lawful purposes. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu Jun 19 12:06:26 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 19 Jun 2008 12:06:26 -0400 Subject: [Bioperl-l] Bio::Ontology help solicited In-Reply-To: <1213890860.7713.87.camel@jic51958.jic.bbsrc.ac.uk> References: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> <1213885641.7713.54.camel@jic51958.jic.bbsrc.ac.uk> <5C819020-9094-416C-9B3B-DD085AB01602@gmx.net> <1213890860.7713.87.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: <72EB057E-9E60-46B4-9E99-2117271AF135@gmx.net> Hi Govind, this looks like a bug (though it could indeed be a problem with the obo file - that has happened in the past too). Would you mind posting it at bugzilla.open-bio.org? -hilmar On Jun 19, 2008, at 11:54 AM, Govind Chandra wrote: > Hi Hilmar, > > Below is an extract from perldoc Bio::Ontology::Ontology from which I > _wrongly_ concluded that -identifier and -name do the same thing. > > > > find_terms > > Title : find_terms > Usage : ($term) = $oe->find_terms(-identifier => "SO: > 0000263"); > Function: Find term instances matching queries for their > attributes. > > An implementation may not support querying for > arbitrary > attributes, but can generally be expected to accept > -identifier and -name as queries. 
If both are > provided, > they are implicitly intersected. > > Example : > Returns : an array of zero or more Bio::Ontology::TermI objects > Args : Named parameters. The following parameters should > be recognized > by any implementations: > > -identifier query by the given identifier > -name query by the given name > > > > I had been messing with this script for a few hours before posting for > help and had tried various combinations of -identifier and -name. But > neither -identifier => "GO:0000072" nor -name => "M phase specific > microtubule process" work. I know that the GOid exists because I can > find it in the obo file. There is something more fundamentally wrong > in > my script than just the syntax. Like you suggested, below, I > get_all_terms() but the list @allterms is empty. > > ### code begins ### > use strict; > use Bio::OntologyIO; > > my $file='gene_ontology.1_2.obo'; > my $parser = Bio::OntologyIO->new( > -format => "obo", > -file => $file > ); > my $ontology = $parser->next_ontology(); > print(ref($ontology), "\n"); > my @allterms = $ontology->get_all_terms(); > print(scalar(@allterms), "\n"); > my $limiter; > foreach my $term (@allterms) { > print($term->name(),"\n"); > print($term->definition(),"\n"); > if($limiter++ > 30) {last;} > } > exit; > ### code ends ### > > ### output begins ### > Bio::Ontology::Ontology > 0 > ### output ends ### > > Given that the script works with the "go" format files (except > process.ontology in which case it complains and exits) I suspect the > obo > file but am surprised that Bio::OntologyIO->next_ontology() does not > complain. Will it be asking for too much to request you to get the obo > file I am trying to parse from > > ftp.geneontology.org/pub/go/ontology/obo_format_1_2/gene_ontology.1_2.obo > > and see if it works for you? 
> > Thanks > > Govind > > > > > > On Thu, 2008-06-19 at 10:50 -0400, Hilmar Lapp wrote: >> On Jun 19, 2008, at 10:27 AM, Govind Chandra wrote: >> >>> I changed the line below >>> >>> my ($term)=$ontology->find_terms(-name => "GO:0000072"); >>> >>> to >>> >>> my ($term)=$ontology->find_terms(-identifier => "GO:0000072"); >> >> That's odd. Did you convince yourself that a term with this >> identifier >> is indeed in the ontology? If you iterate over all terms (using e.g. >> $ontology->get_all_terms()), is there a term with this identifier? >> >> E.g.: print join("\n", map { $_->identifier; } >> (grep { $_->identifier =~ /0000072/; } >> $ontology->get_all_terms())),"\n"; >> >> Does this print anything? >> >>> but it does not make any difference. From perldoc I gathered that - >>> name >>> and -identifier should do the same thing anyway. >> >> >> I'm not sure where you gathered that from. Could you point me to the >> location that says that? >> >> -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From johnsonm at gmail.com Thu Jun 19 15:38:32 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 19 Jun 2008 14:38:32 -0500 Subject: [Bioperl-l] Bio::FeatureIO::gff Message-ID: I recently had to do some gff3 generation/munging, so I took a look at Bio::FeatureIO::gff. I ran into a few issues: - The _handle_feature method (called by next_feature) attaches Dbxref attributes using 'Dbxref' as the key. However, _write_feature_3 uses 'dblink' for the key when looking for Dbxref attributes. I changed _handle_feature to use 'dblink' also, but I'm not sure that's any more (or less) correct than changing _write_feature_3 to use 'Dbxref'. Anybody have any strong opinions one way or the other? - Sendu made some changes to _write_feature_25 and _write_feature_3, but missed a line in _write_feature_3. 
I think line 890 should be

    my $phase = defined($feature->phase)
        ? (ref($feature->phase) ? $feature->phase->value : $feature->phase)
        : '.';

instead of

    my $phase = $feature->phase->value;

to be consistent.

- Also in _write_feature_3, the Dbxref attributes are wrapped in a call to uri_escape(). This generates a mangled gff3 that Apollo, at least, does not like. Also, looking at the gff3 spec, I do not believe this is correct behaviour. Quoting http://www.sequenceontology.org/gff3.shtml:

    The value of both Ontology_term and Dbxref is the ID of the cross
    referenced object in the form "DBTAG:ID". The DBTAG indicates which
    database the referenced object can be found in, and ID indicates the
    identifier of the object within that database. IDs can contain
    unescaped colons but DBTAGs cannot, so parsing code should split on
    the first colon encountered in the attribute value.

So the key (DBTAG) should be escaped, but not the value (ID). The code presently escapes both:

    my $vstring = join ',', map {uri_escape($_->database .':'. $_->primary_id)} @v;

which should probably be something like

    my $vstring = join ',', map {uri_escape($_->database) .':'. $_->primary_id} @v;

So, what are the plans for Bio::FeatureIO? I find it kind of handy, so unless it's going to be scrapped in favor of something else, any objection to lobbing a ticket and patch at Bugzilla?
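
The DBTAG:ID handling described in the message above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the Bio::FeatureIO::gff code: escape_dbtag, format_dbxref and parse_dbxref are hypothetical helpers, and the set of characters that escape_dbtag encodes is an assumption rather than something taken from the GFF3 spec. The sketch escapes only the DBTAG, leaves the ID alone as the message proposes, and splits on the first colon when reading the value back.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode characters that would confuse a
# parser if they appeared inside a DBTAG (the character set is an assumption).
sub escape_dbtag {
    my ($tag) = @_;
    $tag =~ s/([:;=,%&\s])/sprintf('%%%02X', ord $1)/ge;
    return $tag;
}

# Build "DBTAG:ID", escaping only the DBTAG and leaving the ID untouched.
sub format_dbxref {
    my ($dbtag, $id) = @_;
    return escape_dbtag($dbtag) . ':' . $id;
}

# Split on the first colon only, so colons inside the ID survive.
sub parse_dbxref {
    my ($value) = @_;
    my ($dbtag, $id) = split /:/, $value, 2;
    return ($dbtag, $id);
}

my $value = format_dbxref('GO', 'GO:0000072');
print "$value\n";

my ($dbtag, $id) = parse_dbxref($value);
print "$dbtag\t$id\n";
```

With a limit of 2, split stops after the first colon, which is what lets an ID such as GO:0000072 keep its own colon intact.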
From sidd.basu at gmail.com Thu Jun 19 15:55:10 2008 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 19 Jun 2008 14:55:10 -0500 Subject: [Bioperl-l] Re: Bio::Ontology help solicited In-Reply-To: <1213890860.7713.87.camel@jic51958.jic.bbsrc.ac.uk> References: <1213882943.7713.43.camel@jic51958.jic.bbsrc.ac.uk> <39E954A9-72BB-466B-AEA3-1003B77CD49D@gmx.net> <1213885641.7713.54.camel@jic51958.jic.bbsrc.ac.uk> <5C819020-9094-416C-9B3B-DD085AB01602@gmx.net> <1213890860.7713.87.camel@jic51958.jic.bbsrc.ac.uk> Message-ID: <485ab9c6.3ff0220a.6b13.1589@mx.google.com> Hi Govind, There are few ways by which you could search and find the term... On Thu, 19 Jun 2008, Govind Chandra wrote: > Hi Hilmar, > > Below is an extract from perldoc Bio::Ontology::Ontology from which I > _wrongly_ concluded that -identifier and -name do the same thing. > > > > find_terms > > Title : find_terms > Usage : ($term) = $oe->find_terms(-identifier => "SO:0000263"); > Function: Find term instances matching queries for their attributes. > > An implementation may not support querying for arbitrary > attributes, but can generally be expected to accept > -identifier and -name as queries. If both are provided, > they are implicitly intersected. > > Example : > Returns : an array of zero or more Bio::Ontology::TermI objects > Args : Named parameters. The following parameters should be recognized > by any implementations: > > -identifier query by the given identifier > -name query by the given name > > > > I had been messing with this script for a few hours before posting for > help and had tried various combinations of -identifier and -name. But > neither -identifier => "GO:0000072" nor -name => "M phase specific > microtubule process" work. I know that the GOid exists because I can > find it in the obo file. There is something more fundamentally wrong in > my script than just the syntax. Like you suggested, below, I > get_all_terms() but the list @allterms is empty. 
>
> ### code begins ###
> use strict;
> use Bio::OntologyIO;
>
> my $file='gene_ontology.1_2.obo';
> my $parser = Bio::OntologyIO->new(
>     -format => "obo",
>     -file => $file
> );
> my $ontology = $parser->next_ontology();

1. You need to get the ontology object to which the term belongs. If you don't know it offhand, you have to search for it.

    my $id = 'GO:0000072';
    my $term;
    ONT:
    while ( my $ontrow = $parser->next_ontology() ) {
        my ($newterm) = $ontrow->find_terms( -identifier => $id );
        next ONT if !$newterm;
        $term = $newterm;
        last ONT;
    }
    if ($term) {
        print 'First try: ', $term->identifier, "\t", $term->name, "\t",
            $term->ontology->name(), "\n";
    }

2. You know the ontology to which the term belongs.

    my $id = 'GO:0000072';
    my $name = 'biological_process';
    my $ont;
    ONT:
    while ( my $ontrow = $parser->next_ontology() ) {
        if ($ontrow->name() eq $name) {
            $ont = $ontrow;
            last ONT;
        }
    }
    # $ont stays undefined if no ontology with that name was found
    my ($term) = $ont ? $ont->find_terms(-identifier => $id) : ();
    if ($term) {
        print 'Second try: ', $term->identifier, "\t", $term->name, "\t",
            $term->ontology->name(), "\n";
    }

3. Get the engine and search through it.

    my $id = 'GO:0000072';
    my $eng = $parser->next_ontology->engine();
    my ($term) = $eng->find_terms(-identifier => $id);
    if ($term) {
        print 'Third try: ', $term->identifier, "\t", $term->name, "\t",
            $term->ontology->name(), "\n";
    }

Hope that helps.

-siddhartha

> print(ref($ontology), "\n");
> my @allterms = $ontology->get_all_terms();
> print(scalar(@allterms), "\n");
> my $limiter;
> foreach my $term (@allterms) {
>     print($term->name(),"\n");
>     print($term->definition(),"\n");
>     if($limiter++ > 30) {last;}
> }
> exit;
> ### code ends ###
>
> ### output begins ###
> Bio::Ontology::Ontology
> 0
> ### output ends ###
>
> Given that the script works with the "go" format files (except
> process.ontology in which case it complains and exits) I suspect the obo
> file but am surprised that Bio::OntologyIO->next_ontology() does not
> complain.
Will it be asking for too much to request you to get the obo > file I am trying to parse from > > ftp.geneontology.org/pub/go/ontology/obo_format_1_2/gene_ontology.1_2.obo > > and see if it works for you? > > Thanks > > Govind > > > > > > On Thu, 2008-06-19 at 10:50 -0400, Hilmar Lapp wrote: > > On Jun 19, 2008, at 10:27 AM, Govind Chandra wrote: > > > > > I changed the line below > > > > > > my ($term)=$ontology->find_terms(-name => "GO:0000072"); > > > > > > to > > > > > > my ($term)=$ontology->find_terms(-identifier => "GO:0000072"); > > > > That's odd. Did you convince yourself that a term with this identifier > > is indeed in the ontology? If you iterate over all terms (using e.g. > > $ontology->get_all_terms()), is there a term with this identifier? > > > > E.g.: print join("\n", map { $_->identifier; } > > (grep { $_->identifier =~ /0000072/; } > > $ontology->get_all_terms())),"\n"; > > > > Does this print anything? > > > > > but it does not make any difference. From perldoc I gathered that - > > > name > > > and -identifier should do the same thing anyway. > > > > > > I'm not sure where you gathered that from. Could you point me to the > > location that says that? > > > > -hilmar > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Jun 19 16:19:13 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 19 Jun 2008 15:19:13 -0500 Subject: [Bioperl-l] Bio::FeatureIO::gff In-Reply-To: References: Message-ID: On Jun 19, 2008, at 2:38 PM, Mark Johnson wrote: > I recently had to do some gff3 generation/munging, so I took a look at > Bio::FeatureIO::gff. I ran into a few issues: > > - The _handle_feature method (called by next_feature) attaches Dbxref > attributes using 'Dbxref' as the key. However, _write_feature_3 uses > 'dblink' for the key when looking for Dbxref attributes. 
I changed > _handle_feature to use 'dblink' also, but I'm not sure that's any more > (or less) correct than changing _write_feature_3 to use 'Dbxref'. > Anybody have any strong opinions one way or the other? > > - Sendu made some changes to _write_feature_25 and _write_feature_3, > but missed a line in _write_feature_3. I think line 890 should be > > my $phase = defined($feature->phase) ? (ref($feature->phase) ? > $feature->phase->value : $feature->phase) : '.'; > > instead of > > my $phase = $feature->phase->value; > > to be consistent. > > - Also in _write_feature_3, the Dbxref attributes are wrapped in a > call to uri_escape(). This generates a mangled gff3 that Apollo, at > least, does not like. Also, looking at the gff3 spec, I do not > believe this is correct behaviour. Quoting > http://www.sequenceontology.org/gff3.shtml: > > The value of both Ontology_term and Dbxref is the ID of the cross > referenced object in the form "DBTAG:ID". The DBTAG indicates which > database the referenced object can be found in, and ID indicates the > identifier of the object within that database. IDs can contain > unescaped colons but DBTAGs cannot, so parsing code should split on > the first colon encountered in the attribute value. > > So the key (DBTAG) should be escaped, but not the value (ID). The > code presently escapes both: > > my $vstring = join ',', map {uri_escape($_->database .':'. > $_->primary_id)} @v; > > which should probably be something like > > my $vstring = join ',', map {uri_escape($_->database) .':'. > $_->primary_id} @v; > > > So, what are the plans for Bio::FeatureIO? I find it kind of handy, > so unless it's going to be scrapped in favor of something else, any > objection to lobbing a ticket and patch at Bugzilla? I think the general idea of Bio::FeatureIO will remain (read/write feature data) but it will definitely undergo significant reimplementation. 
The typed SeqFeatureI class (Bio::SeqFeature::Annotated) would be deprecated in favor of something more lightweight. However, I don't see that happening until after 1.6 is released unless someone wants to take it on. chris From johnsonm at gmail.com Thu Jun 19 16:47:57 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 19 Jun 2008 15:47:57 -0500 Subject: [Bioperl-l] Bio::FeatureIO::gff In-Reply-To: References: Message-ID: On Thu, Jun 19, 2008 at 3:19 PM, Chris Fields wrote: > I think the general idea of Bio::FeatureIO will remain (read/write feature > data) but it will definitely undergo significant reimplementation. The > typed SeqFeatureI class (Bio::SeqFeature::Annotated) would be deprecated in > favor of something more lightweight. > > However, I don't see that happening until after 1.6 is released unless > someone wants to take it on. > > chris I suspect the 'something' you mention is going to require a fair bit of discussion and design. The best time for that would be after 1.6. Otherwise, we're likely to end up with something just as problematic as Bio::SeqFeature::Annotated. However, people are using Bio::FeatureIO as it stands. Unless the plan is to remove it from source control (as opposed to just not packaging it up as part of the 1.6 release), would there be any objection to patching up the existing implementation? If nothing else, it will be the starting point for the reimplementation. Might as well correct obvious defects, so they *don't* get reimplemented. From mirhan at indiana.edu Thu Jun 19 17:07:28 2008 From: mirhan at indiana.edu (Han, Mira) Date: Thu, 19 Jun 2008 17:07:28 -0400 Subject: [Bioperl-l] Bio::Tree::AnnotatableNode Message-ID: Hi, I have a few questions regarding the design of the AnnotatableNode. 1. add_tag_value or Annotation::SimpleValue? I have property tags for nodes that can be defined by the user, that contains generally simple scalar values, I'm currently using the Annotation::SimpleValue to contain them in the node. 
What I'm wondering is.. should I use the tag/value implementation already in NodeI for these instead? 2. I have to also maintain a unique identifier for each node called id_source. First I was looking to use the internal_id to store the unique ids. But now I realized that we cannot set the internal_id to arbitrary ids, We can only get the values already determined. So now I'm wondering again should I set this id as a tag/value? Or a SimpleAnnotation? Or can I modify the code so that we can set the internal_id? Mira From mirhan at indiana.edu Thu Jun 19 17:23:40 2008 From: mirhan at indiana.edu (Han, Mira) Date: Thu, 19 Jun 2008 17:23:40 -0400 Subject: [Bioperl-l] Bio::Tree::AnnotatableNode In-Reply-To: Message-ID: My current position is that I will override the internal_id method to store the id_source ids instead of the creation_id. And I will use Annotation::SimpleValue instead of the _tags hash in the Tree::Node I guess in that case I'll have to implement the add_tag_value etc to use Annotation::SimpleValue internally? Another quick questions.. Which is the general style, to use hyphenated tags or just normal words for keys like this? Mira On 6/19/08 5:07 PM, "Mira Han" wrote: > > Hi, > I have a few questions regarding the design of the AnnotatableNode. > > 1. add_tag_value or Annotation::SimpleValue? > I have property tags for nodes that can be defined by the user, that contains > generally simple scalar values, > I'm currently using the Annotation::SimpleValue to contain them in the node. > What I'm wondering is.. > should I use the tag/value implementation already in NodeI for these instead? > > 2. I have to also maintain a unique identifier for each node called id_source. > First I was looking to use the internal_id to store the unique ids. > But now I realized that we cannot set the internal_id to arbitrary ids, > We can only get the values already determined. > So now I'm wondering again should I set this id as a tag/value? Or a > SimpleAnnotation? 
> Or can I modify the code so that we can set the internal_id? > > Mira From johnsonm at gmail.com Thu Jun 19 17:29:01 2008 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 19 Jun 2008 16:29:01 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Genemark Message-ID: It seems that GeneMark.hmm returns 0 even when the license is expired. You can probably guess how I discovered this. I created a bug and attached a fix: http://bugzilla.open-bio.org/show_bug.cgi?id=2523 If you've built Bio::Tools::Run::Genemark into a gene prediction pipeline (as I have), you really want this fix. Otherwise, if your license expires, you may not realize it until you wonder why GeneMark has suddenly stopped predicting genes... From cjfields at uiuc.edu Thu Jun 19 17:29:45 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 19 Jun 2008 16:29:45 -0500 Subject: [Bioperl-l] Bio::FeatureIO::gff In-Reply-To: References: Message-ID: <568794E1-29DE-4077-89CF-2FCB47BB8C1D@uiuc.edu> On Jun 19, 2008, at 3:47 PM, Mark Johnson wrote: > On Thu, Jun 19, 2008 at 3:19 PM, Chris Fields > wrote: > >> I think the general idea of Bio::FeatureIO will remain (read/write >> feature >> data) but it will definitely undergo significant reimplementation. >> The >> typed SeqFeatureI class (Bio::SeqFeature::Annotated) would be >> deprecated in >> favor of something more lightweight. >> >> However, I don't see that happening until after 1.6 is released >> unless >> someone wants to take it on. >> >> chris > > I suspect the 'something' you mention is going to require a fair bit > of discussion and design. The best time for that would be after 1.6. > Otherwise, we're likely to end up with something just as problematic > as Bio::SeqFeature::Annotated. > > However, people are using Bio::FeatureIO as it stands. Unless the > plan is to remove it from source control (as opposed to just not > packaging it up as part of the 1.6 release), would there be any > objection to patching up the existing implementation? 
If nothing
> else, it will be the starting point for the reimplementation.  Might
> as well correct obvious defects, so they *don't* get reimplemented.

I think patching the current implementation is fine. When we plan on making changes we'll try posting plans on the wiki. Speaking of which, I need to look at the FeatureIO API to see what is expected. Hopefully changes can occur with minimal to no API changes.

chris

From cjfields at uiuc.edu Thu Jun 19 17:36:40 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 19 Jun 2008 16:36:40 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Genemark
In-Reply-To:
References:
Message-ID:

Just committed the patch to svn, so updating the local bioperl-run from subversion should work.

chris

On Jun 19, 2008, at 4:29 PM, Mark Johnson wrote:

> It seems that GeneMark.hmm returns 0 even when the license is expired.
> You can probably guess how I discovered this.  I created a bug and
> attached a fix:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2523
>
> If you've built Bio::Tools::Run::Genemark into a gene prediction
> pipeline (as I have), you really want this fix.  Otherwise, if your
> license expires, you may not realize it until you wonder why GeneMark
> has suddenly stopped predicting genes...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Thu Jun 19 18:24:21 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 19 Jun 2008 17:24:21 -0500
Subject: [Bioperl-l] Bio::Tree::AnnotatableNode
In-Reply-To:
References:
Message-ID: <6EE3DE6A-2EB0-4561-8CD1-9428A9D5AC88@uiuc.edu>

On Jun 19, 2008, at 4:23 PM, Han, Mira wrote:

> My current position is that
> I will override the internal_id method to store the id_source ids
> instead of
> the creation_id.

It looks like you can set the internal_id() with the _creation_id() private method.
> And I will use Annotation::SimpleValue instead of the _tags hash in > the > Tree::Node > I guess in that case I'll have to implement the add_tag_value etc to > use > Annotation::SimpleValue internally? It makes sense as you wouldn't want a mixed bag of scalar tags and AnnotationI, though this sounds eerily similar to what I just rolled back in SeqFeature/Annotatable. I think it's okay in this case. With that in mind, you'll obviously need to reimplement the other 'tag' methods similarly so they replicate behavior indicated in the NodeI API and Node implementation (i.e. if args are accepted or values returned they must be scalar values and not Annotation::SimpleValue). > Another quick questions.. > Which is the general style, to use hyphenated tags or just normal > words for > keys like this? > > Mira Normal words, though documenting these is best. For consistency you may want to see what other TreeIO parsers use for tag names and (if there are similarities) use the same tag names. chris > On 6/19/08 5:07 PM, "Mira Han" wrote: > >> >> Hi, >> I have a few questions regarding the design of the AnnotatableNode. >> >> 1. add_tag_value or Annotation::SimpleValue? >> I have property tags for nodes that can be defined by the user, >> that contains >> generally simple scalar values, >> I'm currently using the Annotation::SimpleValue to contain them in >> the node. >> What I'm wondering is.. >> should I use the tag/value implementation already in NodeI for >> these instead? >> >> 2. I have to also maintain a unique identifier for each node called >> id_source. >> First I was looking to use the internal_id to store the unique ids. >> But now I realized that we cannot set the internal_id to arbitrary >> ids, >> We can only get the values already determined. >> So now I'm wondering again should I set this id as a tag/value? Or a >> SimpleAnnotation? >> Or can I modify the code so that we can set the internal_id? 
>> >> Mira > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu Jun 19 20:59:02 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 19 Jun 2008 20:59:02 -0400 Subject: [Bioperl-l] Bio::FeatureIO::gff In-Reply-To: References: Message-ID: <513D9B7B-D5F0-43C0-8084-CEFBA3AE420D@gmx.net> On Jun 19, 2008, at 3:38 PM, Mark Johnson wrote: > So, what are the plans for Bio::FeatureIO? I find it kind of handy, > so unless it's going to be scrapped in favor of something else, any > objection to lobbing a ticket and patch at Bugzilla? Not at all, that would be great in fact. ChrisF mentioned the caveats due to the need for re-implementation, so before you go overboard with fixing design etc you may want to check here. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From koski at cenix-bioscience.com Fri Jun 20 03:14:59 2008 From: koski at cenix-bioscience.com (Liisa Koski) Date: Fri, 20 Jun 2008 09:14:59 +0200 Subject: [Bioperl-l] svn checkout error bioperl-live Message-ID: <485B58F3.6000603@cenix-bioscience.com> Hi, I'm trying to checkout bioperl-live svn co svn://open-bio.org/bioperl/bioperl-live/trunk bioperl-live but get the following error: svn: Can't connect to host 'open-bio.org': Connection refused Any suggestions? Thanks, Liisa From arareko at campus.iztacala.unam.mx Fri Jun 20 11:04:16 2008 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 20 Jun 2008 10:04:16 -0500 Subject: [Bioperl-l] svn checkout error bioperl-live In-Reply-To: <485B58F3.6000603@cenix-bioscience.com> References: <485B58F3.6000603@cenix-bioscience.com> Message-ID: <485BC6F0.1040101@campus.iztacala.unam.mx> Hi Lisa, There was a problem in the server hosting the SVN service. 
ChrisD has fixed that and it should be working now, please try again.

Regards,
Mauricio.

Liisa Koski wrote:
> Hi,
>
> I'm trying to checkout bioperl-live
>
> svn co svn://open-bio.org/bioperl/bioperl-live/trunk bioperl-live
>
> but get the following error:
>
> svn: Can't connect to host 'open-bio.org': Connection refused
>
> Any suggestions?
>
> Thanks,
> Liisa
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Fri Jun 20 14:35:07 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 20 Jun 2008 13:35:07 -0500
Subject: [Bioperl-l] BioPerl Infernal tools
Message-ID:

For those interested, the first release candidate for Infernal 1.0 (v 1.0rc1) is out now. I am planning on updating all Infernal-related modules (Bio::SearchIO::infernal and Bio::Tools::Run::Infernal) to use the latest release, probably within the next couple of weeks.

It is possible I will need to drop support for pre-0.8 Infernal releases if discerning output between different versions becomes too problematic (you should probably be running the latest version anyway, but I digress...). If so I will announce that when I commit the final changes.

chris

From jason at bioperl.org Mon Jun 23 11:02:56 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 23 Jun 2008 08:02:56 -0700
Subject: [Bioperl-l] Fwd: PAML through BIOPERL - parsing error
References: <263150.43884.qm@web36407.mail.mud.yahoo.com>
Message-ID: <10D5F46A-B9C3-4525-A66F-20B0D8786FB3@bioperl.org>

You'll have to report what version of BioPerl and PAML you are using; only certain versions work together because the report output from PAML changes in each version.
We have tried to fix it for PAML4 in the latest code in SVN, and I believe PAML 3.15 should work with what is in the 1.5.2 release of BioPerl, but I'm not sure.

Please direct your questions to the mailing list to ensure someone has a chance to look at it.

-jason

Begin forwarded message:

> From: Tannistha
> Date: June 22, 2008 10:43:41 PM PDT
> To: jason at bioperl.org
> Subject: PAML through BIOPERL - parsing error
> Reply-To: tannistha3 at yahoo.com
>
>
> Hi Jason,
>
> I am using PAML through BIOPERL. My input in multiple CDS sequence.
> I am getting an error while parsing my codeml result.
> The error is:
> Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/
> site_perl/5.8.5/Bio/Tools/Phylo/PAML.pm line 615, line 90.
>
> Please suggest how to eliminate this error.
>
> Thanking you
>
> Regards
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com
>
>

From nsh9351 at rit.edu Mon Jun 23 15:32:08 2008
From: nsh9351 at rit.edu (Nathan Haseley (RIT Student))
Date: Mon, 23 Jun 2008 15:32:08 -0400
Subject: [Bioperl-l] BioPerl help with 2D arrays
Message-ID: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu>

Hello,
I am writing a script that returns an array of array references to Bio::Search::HSP::GenericHSP objects (2D array of Bio::Search::HSP::GenericHSP objects). Whenever I try to call functions such as ->num_identical I get an error message:
Can't call method "num_identical" without a package or object reference.

Below are the segments of the code that are giving me problems to give you a general idea of what I'm doing. Is there a way around this? What am I doing wrong? Thanks!
Sincerely,
Nathan

my $in = new Bio::SearchIO( -format => 'blast',
                            -file => $file );
my $ j = -1;
while( $result = $in->next_result and ref($result)) {
    ++$j;
    while( $has_species == 0) {
        if( my $hit = $result->next_hit) {
            if( $hit -> description =~ /$species_names[$i]/i) {
                $has_species = 1;
                $temp[$i] = $hit->next_hsp;
                $result->rewind;
            }
        }
    }
    $homologs[$j] = [@temp];
.
.
.
return $homologs
.
.
. (eventually in a separate function)
$temp = $homologs[$i]->[$j];
$temp->num_identical;

From jajams at utu.fi Tue Jun 24 04:23:35 2008
From: jajams at utu.fi (Joonas Jämsen)
Date: Tue, 24 Jun 2008 11:23:35 +0300
Subject: [Bioperl-l] hmmpfam
Message-ID:

Hello,

Is it possible to just get the FASTA sequence of a specified domain (by name) above a certain threshold found with hmmpfam? I could not find this functionality in the packages I was referred to recently.

Joonas Jämsen.

From David.Messina at sbc.su.se Tue Jun 24 10:35:50 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Jun 2008 16:35:50 +0200
Subject: [Bioperl-l] BioPerl help with 2D arrays
In-Reply-To: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu>
References: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu>
Message-ID: <628aabb70806240735o60a4d292l93d620f1e829f92c@mail.gmail.com>

Hi Nathan,

I'm not sure, but I suspect that when you come back and try to examine one of your saved objects, the class defining that object isn't loaded.

If you add the line

    use Bio::Search::HSP::GenericHSP;

to the top of the file where you are getting the error, does that help?
Dave From bix at sendu.me.uk Tue Jun 24 11:23:50 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 24 Jun 2008 16:23:50 +0100 Subject: [Bioperl-l] BioPerl help with 2D arrays In-Reply-To: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu> References: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu> Message-ID: <48611186.1020908@sendu.me.uk> Nathan Haseley (RIT Student) wrote: > Hello, I am writing a script that returns an array of array > references to Bio::Search::HSP::GenericHSP objects (2D array of > Bio::Search::HSP::GenericHSP objects). Whenever I try to call > functions such as ->num_identical I get an error message: Can't call > method "num_identical" without a package or object reference. > > Below are the segments of the code that are giving me problems to > give you a general idea of what I'm doing. Is there a way around > this? What am I doing wrong? Thanks! Sincerely, Nathan I think we need to see more of your code. If I fill in enough of the missing blanks to make your example run, it works fine. So your problem is somewhere in your code that you haven't shown us. Best thing to do is standard debugging to come up with a minimal but working script that demonstrates the problem, then give us that. From cjfields at uiuc.edu Tue Jun 24 11:39:47 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 24 Jun 2008 10:39:47 -0500 Subject: [Bioperl-l] BioPerl help with 2D arrays In-Reply-To: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu> References: <7D75703BC8E1C149BF78A1E79AAAB169035BC531@svits28.main.ad.rit.edu> Message-ID: On Jun 23, 2008, at 2:32 PM, Nathan Haseley (RIT Student) wrote: > Hello, > I am writing a script that returns an array of array references > to Bio::Search::HSP::GenericHSP objects (2D array of > Bio::Search::HSP::GenericHSP objects). 
Whenever I try to call
> functions such as ->num_identical I get an error message:
> Can't call method "num_identical" without a package or object
> reference.
>
> Below are the segments of the code that are giving me problems to
> give you a general idea of what I'm doing.  Is there a way around
> this?  What am I doing wrong?  Thanks!
> Sincerely,
> Nathan
>
> my $in = new Bio::SearchIO( -format => 'blast',
>                             -file => $file );
> my $ j = -1;
> while( $result = $in->next_result and ref($result)) {

$result should always be assigned an object ref or undef, the latter of which ends the loop, so '... and ref($result)' isn't needed.

> ++$j;
> while( $has_species == 0) {
>     if( my $hit = $result->next_hit) {
>         if( $hit -> description =~ /$species_names[$i]/i) {
>             $has_species = 1;
>             $temp[$i] = $hit->next_hsp;
>             $result->rewind;
>         }
>     }
> }

I think you want to incorporate a simple hash construct for your species names. However...

You have left out a significant bit of code, so I am finding it hard to determine what you are trying to accomplish. For instance, from the above it seems you always want to match the same species name, as $i never changes (so $species_names[$i] also never changes). The while loop as represented above is pretty dangerous, as you can feasibly enter an infinite loop unless you get both a hit AND the hit description matches the regex. Also, you're rewinding the main $result iterator ($result->rewind) while inside the $result iterator loop; not sure why you are doing that.

Note that '$hit->next_hsp;' can also be dangerous under some circumstances, as this can give you undef in cases where no HSP alignment is returned (e.g. the hit data is only in the hit table). That may be one source of your problems.

> $homologs[$j] = [@temp];
...
> return $homologs

Shouldn't that be @homologs? You use '$homologs[$j]' above, which is an element of the array @homologs.
If you aren't using 'strict' and 'warnings', this refers to a separate, undeclared scalar $homologs and returns its value (undef), which could be the problem. Using 'strict' and 'warnings' should catch that.

> .
> .
> . (eventually in a separate function)
> $temp = $homologs[$i]->[$j];
> $temp->num_identical;

I would go about this by creating a lookup table (hash or array) of species names, then iterate through each BLAST report to either

(1) grab the species name generically and check it against the lookup table using 'exists',

    my %lookup = map {$_ => 1} ('species1', 'species2');

    # if species is in brackets, match inside the brackets
    if ($hit->description =~ /\[([^\]]+)\]/) {
        my $sp = $1;
        if (exists $lookup{$sp}) {
            # do something
        }
        else {
            # do something on failure
        }
    }

or (2) check each name (the key in the lookup table) against the description using a 'grep', such as:

    my %lookup = map {$_ => 1} ('species1', 'species2');

    if (grep { $desc =~ /$_/ } keys %lookup) {
        ...
    }
    else {
        # do something on failure
    }
    # could use array of names as a lookup as well

Depending on how varied the descriptions are, it may only be feasible to try the latter one.

From there you could store the HSP data in a hash, using the species name:

    # 'or' binds more loosely than '=', so the HSP (or undef) is stored
    # before the warning is issued
    $temp{$species} = $hit->next_hsp or warn 'No HSP returned';

and store that by the report # or description:

    # array of reports
    push @reports, \%temp;

    # hash of reports, by query name
    $report{$result->query_name} = \%temp;

chris

From pmiguel at purdue.edu Tue Jun 24 19:29:30 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Tue, 24 Jun 2008 19:29:30 -0400
Subject: [Bioperl-l] The way to extract a segment of an EMBL record?
Message-ID: <4861835A.3010700@purdue.edu>

When I use the trunc method to pull a segment of a sequence object derived from a large EMBL file, the annotations are not propagated. Do I need to specify a heavyweight sequence object somehow for annotation in the EMBL file to be carried along?

Phillip

PS The code I'm talking about is below.
This produces what appears to be a valid EMBL file, but only bare-bones annotation is carried along, or generated on the fly: ID NC_003070; SV 1; linear; unassigned DNA; STD; UNC; 70001 BP. XX AC NC_003070; XX DE Arabidopsis thaliana chromosome 1, complete sequence. XX XX FH Key Location/Qualifiers FH XX SQ Sequence 70001 BP; 20783 A; 14173 C; 13896 G; 21149 T; 0 other; Here is the code: use Bio::SeqIO; use Bio::Seq; my $rh_opts =parseOptions(); #create input sequence object my $in = Bio::SeqIO->new(-file => $rh_opts->{INFILE} , -format => $rh_opts->{F_IN} ); #pull in sequence. (Ignores all but first sequence) my $seq = $in->next_seq() || die "Hey, $rh_opts-{INFILE} contains no sequence of format $rh_opts->{F_IN}!"; $in->close(); #xtract segment of sequence to output my $end =(defined $rh_opts->{END}) ? $rh_opts->{END} : $seq->length(); my $begin = $rh_opts->{BEGIN}; my $segment =$seq->trunc($begin, $end); #create output object my $out = Bio::SeqIO->new(-file => ">$rh_opts->{OUTFILE}" , -format => $rh_opts->{F_OUT}); $out->write_seq($segment); From cjfields at uiuc.edu Tue Jun 24 21:33:15 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 24 Jun 2008 20:33:15 -0500 Subject: [Bioperl-l] The way to extract a segment of an EMBL record? In-Reply-To: <4861835A.3010700@purdue.edu> References: <4861835A.3010700@purdue.edu> Message-ID: <2EE95EFF-DF52-40DB-A9F6-EA9A5530CC94@uiuc.edu> The trunc() methods are not completely implemented; Bio::Seq::trunc() doesn't carry over either annotations or seqfeatures AFAIK. I think Bio::SeqUtils::trunc_with_features does this though. chris On Jun 24, 2008, at 6:29 PM, Phillip San Miguel wrote: > When I use the trunc method to pull a segment of a sequence object > derived from large EMBL file, the annotations are not propagated. Do > I need to specify a heavy weight sequence object somehow for > annotation in the EMBL file to be carried along? > > Phillip > > PS The code I'm talking about is below. 
This produces what appears > to be a valid EMBL file, but only bare-bones annotation is carried > along, or generated on the fly: > > ID NC_003070; SV 1; linear; unassigned DNA; STD; UNC; 70001 BP. > XX > AC NC_003070; > XX > DE Arabidopsis thaliana chromosome 1, complete sequence. > XX > XX > FH Key Location/Qualifiers > FH > XX > SQ Sequence 70001 BP; 20783 A; 14173 C; 13896 G; 21149 T; 0 other; > > > Here is the code: > > use Bio::SeqIO; > use Bio::Seq; > > my $rh_opts =parseOptions(); > > #create input sequence object > my $in = Bio::SeqIO->new(-file => $rh_opts->{INFILE} , -format > => $rh_opts->{F_IN} ); > > #pull in sequence. (Ignores all but first sequence) > my $seq = $in->next_seq() > || die "Hey, $rh_opts-{INFILE} contains no sequence of format > $rh_opts->{F_IN}!"; > $in->close(); > > #xtract segment of sequence to output > my $end =(defined $rh_opts->{END}) > ? $rh_opts->{END} > : $seq->length(); > my $begin = $rh_opts->{BEGIN}; > my $segment =$seq->trunc($begin, $end); > > #create output object > my $out = Bio::SeqIO->new(-file => ">$rh_opts->{OUTFILE}" , -format > => $rh_opts->{F_OUT}); > $out->write_seq($segment); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann College of Veterinary Medicine University of Illinois Urbana-Champaign From jason at bioperl.org Wed Jun 25 01:24:02 2008 From: jason at bioperl.org (Jason Stajich) Date: Wed, 25 Jun 2008 14:24:02 +0900 Subject: [Bioperl-l] Fwd: hmmpfam In-Reply-To: <8273f6c20806242223v4f6d1442n4a96233c6aff3112@mail.gmail.com> References: <8273f6c20806242223v4f6d1442n4a96233c6aff3112@mail.gmail.com> Message-ID: <8273f6c20806242224g4d3b4579gd89619e34423358b@mail.gmail.com> forgot to cc list From: Jason Stajich Date: Jun 25, 2008 2:23 PM Subject: Re: [Bioperl-l] hmmpfam To: Joonas Jämsen you can get the sequence as a string, but it will have the gaps and other symbols inserted. See the SEARCHIO Howto for getting query and hit sequence as a string or a multiple-alignment of the query and hit pair (get_aln). -jason On 6/24/08, Joonas Jämsen wrote: > > Hello, > > Is it possible to just get the FASTA sequence of a specified domain (by > name) above a certain threshold found with hmmpfam? I could not find this > functionality in the packages I was referred to recently. > > Joonas Jämsen.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From hlapp at gmx.net Wed Jun 25 18:07:09 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 25 Jun 2008 18:07:09 -0400 Subject: [Bioperl-l] Problems when trying to persist a sequence in my BioSQL database using, BioPerl In-Reply-To: <48611792.2060906@gmx.net> References: <48611792.2060906@gmx.net> Message-ID: <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> Hi Gabrielle, (note that I have changed the mailing list to bioperl - whoever replies please cut biosql from the cc list, assuming that this indeed isn't BioSQL's fault) given the error message below, Bio::SeqIO::genbank can be found, but it fails to load because it requires Bio::Species, which in turn imports support for weak references from Scalar::Util. The last step fails, causing loading Bio::Species to fail, which in turn causes Bio::SeqIO::genbank to fail to load. The real question is why your version of Perl doesn't seem to have support for weak references (the reason for Scalar::Util failing to load). Could you give details on your OS version and your version of Perl (output of 'perl -V'). The question for BioPerl is whether there is a fall-back mechanism we might want to support if weak references aren't supported, rather than rendering the genbank parser unusable. Sendu or Chris - any thoughts on this? -hilmar On Jun 24, 2008, at 10:49 AM, Gabrielle Doan wrote: > Hi all, > > I am new to BioPerl and BioSQL so please excuse me if my question is > a bit simple. I followed the installation files in the current > version of BioPerl very strictly (I used the Bioperl 1.5.2, > Developer Release from the bioperl website). 
After successful > installation I tried to persist a genbank file in my BioSQL > database, which runs on a database server and is accessible using > the mysql command shell. When using bioperl I receive the following > error message: > > ================ > > $ /usr/bin/bp_load_seqdatabase.pl --host radb --dbname bioseqdb -- > dbuser myuser --dbpass mypasswd --namespace GenBank /home/doan/db- > data/ref_chr1.gbk > Loading /local/doan/db-daten/ref_chr1.gbk ... > Bio::SeqIO: genbank cannot be found > Exception ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::SeqIO::genbank. Weak references are > not implemented in the version of perl at /usr/lib/perl5/site_perl/ > 5.8.8/Bio/Species.pm line 91 > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/ > Bio/Species.pm line 91. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/ > SeqIO/genbank.pm line 172. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/ > Bio/SeqIO/genbank.pm line 172. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/ > Root/Root.pm line 425. > > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/ > Root/Root.pm:359 > STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/ > Bio/Root/Root.pm:427 > STACK: Bio::SeqIO::_load_format_module /usr/lib/perl5/site_perl/ > 5.8.8/Bio/SeqIO.pm:555 > STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376 > STACK: /usr/bin/bp_load_seqdatabase.pl:541 > ----------------------------------------------------------- > > For more information about the SeqIO system please see the SeqIO docs. > This includes ways of checking for formats at compile time, not run > time > Can't call method "next_seq" on an undefined value at /usr/bin/ > bp_load_seqdatabase.pl line 565. > > ================ > > Unfortunately, even Google does not provide any hints when searching > for the particular message. 
It seems that for some reason the path > to the Bio::SeqIO::genbank module cannot be found. I am greateful > for any hint! > > Cheers, > Gabrielle > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Thu Jun 26 00:37:41 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Jun 2008 13:37:41 +0900 Subject: [Bioperl-l] hmmpfam In-Reply-To: <4862A7B8.5020302@utu.fi> References: <8273f6c20806242223v4f6d1442n4a96233c6aff3112@mail.gmail.com> <8273f6c20806242224g4d3b4579gd89619e34423358b@mail.gmail.com> <4862A7B8.5020302@utu.fi> Message-ID: <64CE7D40-9ED3-48F2-B46B-441368F201A6@bioperl.org> yeah - you get it as a string, you make a sequence object, you write it out as fasta. I'm sure someone on the list can help if you can't figure out how to do this. Read the sequence howto, you want to use Bio::Seq to make the sequence and Bio::SeqIO to write the sequence. I don't know if you want the whole sequence or just what was matched by the profile search. Please continue to reply to questions to the mailing list. -jason On Jun 26, 2008, at 5:16 AM, Joonas J?msen wrote: > Thanks, > > Is there no way to get the sequence out in fasta format? I have > like 25000 sequences to extract. I have no clue as to how to do > this at the moment. > > Best regards, > Joonas. > > Jason Stajich wrote: >> forgot to cc list From: *Jason Stajich* > > >> Date: Jun 25, 2008 2:23 PM >> Subject: Re: [Bioperl-l] hmmpfam >> To: Joonas J?msen > >> you can get the sequence as a string, but it will have the gaps >> and other symbols inserted. >> See the SEARCHIO Howto for getting query and hit sequence as a >> string or a multiple-alignment of the query and hit pair >> (get_aln). 
-jason >> On 6/24/08, *Joonas J?msen* > > wrote: >> Hello, >> Is it possible to just get the FASTA sequence of a specified >> domain >> (by name) above a certain threshold found with hmmpfam? I >> could not >> find this functionality in the packages I was referred to >> recently. >> Joonas J?msen. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org > bio.org> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> Jason Stajich >> jason at bioperl.org >> http://bioperl.org/wiki/User:Jason >> -- >> Jason Stajich >> jason at bioperl.org >> http://bioperl.org/wiki/User:Jason From bix at sendu.me.uk Thu Jun 26 06:02:38 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 26 Jun 2008 11:02:38 +0100 Subject: [Bioperl-l] Problems when trying to persist a sequence in my BioSQL database using, BioPerl In-Reply-To: <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> References: <48611792.2060906@gmx.net> <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> Message-ID: <4863693E.2080706@sendu.me.uk> Hilmar Lapp wrote: > The real question is why your version of Perl doesn't seem to have > support for weak references (the reason for Scalar::Util failing to > load). Could you give details on your OS version and your version of > Perl (output of 'perl -V'). Given that it seems to be perl 5.8.8, I'm guessing this is the RedHat/Fedora issue: http://search.cpan.org/~adamk/Task-Weaken-1.02/lib/Task/Weaken.pm Solutions: First try installing the latest version of Scalar::Util yourself: perl -MCPAN -e shell force install Scalar::Util (and see that it gets installed in a place that is checked before the Fedora version, or overwrites the Fedora version) If that doesn't work, you'll have to download and compile Perl yourself from source (don't use Fedora's installation system). 
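Before and after reinstalling, it may help to confirm whether the Scalar::Util that perl actually loads can create weak references. The following is only a minimal sketch of such a check; the helper name weaken_supported is made up for this example and is not part of BioPerl or Scalar::Util.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if the installed
# Scalar::Util can create weak references on this perl, 0 otherwise.
sub weaken_supported {
    my $ok = eval {
        require Scalar::Util;
        # On a pure-Perl Scalar::Util build this import dies with
        # "Weak references are not implemented in the version of perl".
        Scalar::Util->import(qw(weaken isweak));
        my $target = {};
        my $copy   = $target;
        Scalar::Util::weaken($copy);
        Scalar::Util::isweak($copy) ? 1 : 0;
    };
    return $ok ? 1 : 0;
}

print weaken_supported()
    ? "weaken() is available\n"
    : "weaken() is NOT available with this Scalar::Util\n";
```

If this reports that weaken() is not available after the forced install, the older copy of Scalar::Util is probably still found first in @INC.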
> The question for BioPerl is whether there is a fall-back mechanism we > might want to support if weak references aren't supported, rather than > rendering the genbank parser unusable. Sendu or Chris - any thoughts on > this? Firstly, Task::Weaken should get added to Build.pl as a requirement, so people get better error messages. If we do that, however, the whole of BioPerl doesn't get installed, never mind just the genbank parser not being usable. As for a fall-back mechanism, I'm not really sure how that would work. The easiest thing to do would be to just not deal with the species lines if Bio::Species doesn't work. Is that an acceptable fall-back? If not, more thought and discussion is needed. Make a proposal: what would you want to happen? > On Jun 24, 2008, at 10:49 AM, Gabrielle Doan wrote: > >> Hi all, >> >> I am new to BioPerl and BioSQL so please excuse me if my question is a >> bit simple. I followed the installation files in the current version >> of BioPerl very strictly (I used the Bioperl 1.5.2, Developer Release >> from the bioperl website). After successful installation I tried to >> persist a genbank file in my BioSQL database, which runs on a database >> server and is accessible using the mysql command shell. When using >> bioperl I receive the following error message: >> >> ================ >> >> $ /usr/bin/bp_load_seqdatabase.pl --host radb --dbname bioseqdb >> --dbuser myuser --dbpass mypasswd --namespace GenBank >> /home/doan/db-data/ref_chr1.gbk >> Loading /local/doan/db-daten/ref_chr1.gbk ... >> Bio::SeqIO: genbank cannot be found >> Exception ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Failed to load module Bio::SeqIO::genbank. 
Weak references are >> not implemented in the version of perl at >> /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91 From cjfields at uiuc.edu Thu Jun 26 08:44:52 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 26 Jun 2008 07:44:52 -0500 Subject: [Bioperl-l] Problems when trying to persist a sequence in my BioSQL database using, BioPerl In-Reply-To: <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> References: <48611792.2060906@gmx.net> <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> Message-ID: We could change that in Bio::Species (maybe with DESTROY), but Scalar::Util is also used several other BioPerl modules and is used in a few BioPerl dependencies, I believe. cjfields:bioperl-live cjfields$ ack Scalar::Util Bio/DB/SeqFeature/Store.pm 231:use Scalar::Util 'blessed'; Bio/FeatureIO/bed.pm 71:use Scalar::Util qw(looks_like_number); Bio/Map/Position.pm 89:use Scalar::Util qw(looks_like_number); Bio/Map/PositionI.pm 81:use Scalar::Util qw(looks_like_number); Bio/Map/Relative.pm 89:use Scalar::Util qw(looks_like_number); Bio/SeqFeature/Gene/GeneStructure.pm 68: eval "use Scalar::Util qw(weaken);"; Bio/Species.pm 91:use Scalar::Util qw(weaken isweak); Bio/Tree/Node.pm 76:use Scalar::Util qw(weaken isweak); chris On Jun 25, 2008, at 5:07 PM, Hilmar Lapp wrote: > Hi Gabrielle, > > (note that I have changed the mailing list to bioperl - whoever > replies please cut biosql from the cc list, assuming that this > indeed isn't BioSQL's fault) > > given the error message below, Bio::SeqIO::genbank can be found, but > it fails to load because it requires Bio::Species, which in turn > imports support for weak references from Scalar::Util. The last step > fails, causing loading Bio::Species to fail, which in turn causes > Bio::SeqIO::genbank to fail to load. > > The real question is why your version of Perl doesn't seem to have > support for weak references (the reason for Scalar::Util failing to > load). 
Could you give details on your OS version and your version of > Perl (output of 'perl -V'). > > The question for BioPerl is whether there is a fall-back mechanism > we might want to support if weak references aren't supported, rather > than rendering the genbank parser unusable. Sendu or Chris - any > thoughts on this? > > -hilmar > > On Jun 24, 2008, at 10:49 AM, Gabrielle Doan wrote: > >> Hi all, >> >> I am new to BioPerl and BioSQL so please excuse me if my question >> is a bit simple. I followed the installation files in the current >> version of BioPerl very strictly (I used the Bioperl 1.5.2, >> Developer Release from the bioperl website). After successful >> installation I tried to persist a genbank file in my BioSQL >> database, which runs on a database server and is accessible using >> the mysql command shell. When using bioperl I receive the following >> error message: >> >> ================ >> >> $ /usr/bin/bp_load_seqdatabase.pl --host radb --dbname bioseqdb -- >> dbuser myuser --dbpass mypasswd --namespace GenBank /home/doan/db- >> data/ref_chr1.gbk >> Loading /local/doan/db-daten/ref_chr1.gbk ... >> Bio::SeqIO: genbank cannot be found >> Exception ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Failed to load module Bio::SeqIO::genbank. Weak references are >> not implemented in the version of perl at /usr/lib/perl5/site_perl/ >> 5.8.8/Bio/Species.pm line 91 >> BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/ >> Bio/Species.pm line 91. >> Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/ >> SeqIO/genbank.pm line 172. >> BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/ >> Bio/SeqIO/genbank.pm line 172. >> Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/ >> Root/Root.pm line 425. 
>> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/ >> Root/Root.pm:359 >> STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/ >> Bio/Root/Root.pm:427 >> STACK: Bio::SeqIO::_load_format_module /usr/lib/perl5/site_perl/ >> 5.8.8/Bio/SeqIO.pm:555 >> STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm: >> 376 >> STACK: /usr/bin/bp_load_seqdatabase.pl:541 >> ----------------------------------------------------------- >> >> For more information about the SeqIO system please see the SeqIO >> docs. >> This includes ways of checking for formats at compile time, not run >> time >> Can't call method "next_seq" on an undefined value at /usr/bin/ >> bp_load_seqdatabase.pl line 565. >> >> ================ >> >> Unfortunately, even Google does not provide any hints when >> searching for the particular message. It seems that for some reason >> the path to the Bio::SeqIO::genbank module cannot be found. I am >> greateful for any hint! >> >> Cheers, >> Gabrielle >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Marie-Claude Hofmann Institute for Genomic Biology/College of Veterinary Medicine University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Jun 26 10:58:29 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 26 Jun 2008 09:58:29 -0500 Subject: [Bioperl-l] Problems when trying to persist a sequence in my BioSQL database using, BioPerl In-Reply-To: <4863693E.2080706@sendu.me.uk> References: <48611792.2060906@gmx.net> <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> <4863693E.2080706@sendu.me.uk> Message-ID: <1CECB9E9-1CB2-433D-9AE6-9A8EC3B26E63@uiuc.edu> On Jun 26, 2008, at 5:02 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> The real question is why your version of Perl doesn't seem to have >> support for weak references (the reason for Scalar::Util failing to >> load). Could you give details on your OS version and your version >> of Perl (output of 'perl -V'). > > Given that it seems to be perl 5.8.8, I'm guessing this is the > RedHat/Fedora issue: > > http://search.cpan.org/~adamk/Task-Weaken-1.02/lib/Task/Weaken.pm > > Solutions: > First try installing the latest version of Scalar::Util yourself: > > perl -MCPAN -e shell > force install Scalar::Util > > (and see that it gets installed in a place that is checked before > the Fedora version, or overwrites the Fedora version) > > If that doesn't work, you'll have to download and compile Perl > yourself from source (don't use Fedora's installation system). The last option isn't really viable. BioPerl is hard enough to install w/o having to rebuild perl from scratch. For the time being, we should add Task::Weaken to the requirements and describe the issue within the installation notes, or at least link to a reference to it. >> The question for BioPerl is whether there is a fall-back mechanism >> we might want to support if weak references aren't supported, >> rather than rendering the genbank parser unusable. Sendu or Chris - >> any thoughts on this? 
> > Firstly, Task::Weaken should get added to Build.pl as a requirement, > so people get better error messages. If we do that, however, the > whole of BioPerl doesn't get installed, never mind just the genbank > parser not being usable. > > As for a fall-back mechanism, I'm not really sure how that would > work. The easiest thing to do would be to just not deal with the > species lines if Bio::Species doesn't work. Is that an acceptable > fall-back? If not, more thought and discussion is needed. Make a > proposal: what would you want to happen? Skipping the Species isn't an option; it's an integral part of the main BioPerl core and would be a PITA to deal with in bioperl-live, let alone bioperl-db. We would have to wrap every Species-related call in an eval{} and fallback to something else. Could we just set DESTROY or a root cleanup callback to delete the child/parent node references? chris From ymc at shgc.stanford.edu Thu Jun 26 14:13:04 2008 From: ymc at shgc.stanford.edu (Yee Man Chan) Date: Thu, 26 Jun 2008 11:13:04 -0700 (PDT) Subject: [Bioperl-l] IUPAC support for DNA alignment Message-ID: Hi all I am the owner of Bio::Tools::dpAlign. A user emailed me to add support for IUPAC nucleotide codes. I am ok to add this feature but I would like to know what are the conventions to handle these IUPAC codes. Suppose match is +3 and mismatch is -1. Then what should be the score when T matches with U, A with W, A with D, A with N and A with X? Does anyone know the conventions? Thanks a lot. 
Yee Man From bix at sendu.me.uk Thu Jun 26 14:47:18 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 26 Jun 2008 19:47:18 +0100 Subject: [Bioperl-l] Problems when trying to persist a sequence in my BioSQL database using, BioPerl In-Reply-To: <1CECB9E9-1CB2-433D-9AE6-9A8EC3B26E63@uiuc.edu> References: <48611792.2060906@gmx.net> <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> <4863693E.2080706@sendu.me.uk> <1CECB9E9-1CB2-433D-9AE6-9A8EC3B26E63@uiuc.edu> Message-ID: <4863E436.7020505@sendu.me.uk> Chris Fields wrote: >> As for a fall-back mechanism, I'm not really sure how that would work. >> The easiest thing to do would be to just not deal with the species >> lines if Bio::Species doesn't work. Is that an acceptable fall-back? >> If not, more thought and discussion is needed. Make a proposal: what >> would you want to happen? > > Skipping the Species isn't an option; it's an integral part of the main > BioPerl core and would be a PITA to deal with in bioperl-live, let alone > bioperl-db. We would have to wrap every Species-related call in an > eval{} and fallback to something else. Could we just set DESTROY or a > root cleanup callback to delete the child/parent node references? Just to remind everyone, this all first came up here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13623 And the associated bug report is here: http://bugzilla.open-bio.org/show_bug.cgi?id=2149 I really can't remember now, but there might be problems with your suggestion. But by all means try your idea and see if it works. I'll take a look at it myself soon. 
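For readers following the weak-reference discussion, here is a small, self-contained sketch of why parent/child links need weaken(). Demo::Node is a hypothetical class written for this illustration only; it is not Bio::Species or Bio::Tree::Node, and the method names are assumptions.

```perl
use strict;
use warnings;

package Demo::Node;    # hypothetical class, for illustration only
use Scalar::Util qw(weaken);

my $destroyed = 0;     # counts DESTROY calls across all nodes

sub new {
    my ($class) = @_;
    return bless { parent => undef, children => [] }, $class;
}

# Link $child under $self; weaken the back-reference only if asked to.
sub add_child {
    my ($self, $child, $use_weaken) = @_;
    push @{ $self->{children} }, $child;
    $child->{parent} = $self;
    weaken( $child->{parent} ) if $use_weaken;
    return $child;
}

sub DESTROY { $destroyed++ }
sub destroyed_count { return $destroyed }

package main;

# Build a two-node tree, let it go out of scope, and report how many
# nodes were destroyed at that point.
sub nodes_freed {
    my ($use_weaken) = @_;
    my $before = Demo::Node::destroyed_count();
    {
        my $root = Demo::Node->new();
        $root->add_child( Demo::Node->new(), $use_weaken );
    }
    return Demo::Node::destroyed_count() - $before;
}

printf "strong parent link: %d node(s) freed\n", nodes_freed(0);   # 0
printf "weak parent link:   %d node(s) freed\n", nodes_freed(1);   # 2
```

With a strong back-reference the root and child keep each other alive, so neither DESTROY runs when the tree goes out of scope; with a weakened back-reference both nodes are freed immediately.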
From cjfields at uiuc.edu Thu Jun 26 16:41:08 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 26 Jun 2008 15:41:08 -0500 Subject: [Bioperl-l] Problems when trying to persist a sequence in my BioSQL database using, BioPerl In-Reply-To: <4863E436.7020505@sendu.me.uk> References: <48611792.2060906@gmx.net> <1DB86640-D4D5-4081-8DE5-E3181DD9D6A3@gmx.net> <4863693E.2080706@sendu.me.uk> <1CECB9E9-1CB2-433D-9AE6-9A8EC3B26E63@uiuc.edu> <4863E436.7020505@sendu.me.uk> Message-ID: <50497C21-4187-4D0E-B695-CA88862F5FD8@uiuc.edu> On Jun 26, 2008, at 1:47 PM, Sendu Bala wrote: > Chris Fields wrote: >>> As for a fall-back mechanism, I'm not really sure how that would >>> work. The easiest thing to do would be to just not deal with the >>> species lines if Bio::Species doesn't work. Is that an acceptable >>> fall-back? If not, more thought and discussion is needed. Make a >>> proposal: what would you want to happen? >> Skipping the Species isn't an option; it's an integral part of the >> main BioPerl core and would be a PITA to deal with in bioperl-live, >> let alone bioperl-db. We would have to wrap every Species-related >> call in an eval{} and fallback to something else. Could we just >> set DESTROY or a root cleanup callback to delete the child/parent >> node references? > > Just to remind everyone, this all first came up here: > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13623 > And the associated bug report is here: > http://bugzilla.open-bio.org/show_bug.cgi?id=2149 > > I really can't remember now, but there might be problems with your > suggestion. But by all means try your idea and see if it works. I'll > take a look at it myself soon. Double-checked, DESTROY wouldn't be called unless all references to the root node were removed. 
weaken() is really the best option; it might be possible to wrap everything in a proxy object if weaken doesn't work (though it's a bit of a hack): http://www.perl.com/pub/a/2002/08/07/proxyobject.html?page=3 chris From apapanicolaou at ice.mpg.de Fri Jun 27 06:02:08 2008 From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou) Date: Fri, 27 Jun 2008 12:02:08 +0200 Subject: [Bioperl-l] IUPAC support for DNA alignment In-Reply-To: References: Message-ID: <4864BAA0.1000103@ice.mpg.de> Hello I'm the user who asked for it. I don't know of any conventions, but perhaps people can help on this? I'm not an expert at all, but here is my opinion: If you don't know the codon position (or even whether the sequence is coding) then you can't estimate the codon degeneracy. If you don't know the frequency of the bases represented in the degenerate site then you can't model it at the DNA level either. So any solution will be ad hoc. Regarding 2-base degenerate positions: My suggestion is that when aligning, say, a population that is polymorphic at that site against one that is not, and the user is interested in the distance between the populations, it would make sense to score the position as a full match. Regarding 3 bases: I don't really know (see N below) but I'd go for a full match again, assuming the user built the consensus. Regarding N: I think this is more likely to be missing data. I doubt you can have a SNP occurring four times in the same position (three times are expected under infinite sites, too for that matter). Or the consensus is derived from very diverged sequences. I therefore wouldn't score N. Regarding X: One shouldn't find that in a DNA alignment unless it is a mask. I'd expect no score there as well. My /practical/ suggestion would be to have the user define it, as you allow for the other options, perhaps even allowing 2-fold and 3-fold degenerate IUPAC codes to be given different scores. That might save you (the owner) some future work when the user wants it...
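To make the user-defined idea concrete, here is a rough Perl sketch. It is not the Bio::Tools::dpAlign interface; the function name iupac_score and its options (match, mismatch, degenerate, unknown) are assumptions made up for illustration. Codes whose base sets overlap score as a full match unless the user supplies a different score for 2-fold or 3-fold degenerate codes, and N or X gets a separate "unknown" score.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases they stand for (U treated as T).
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Hypothetical scoring function (NOT the Bio::Tools::dpAlign API).
# Options, all assumptions of this sketch:
#   match, mismatch - scores for pairs that do / do not share a base
#   degenerate      - hashref { 2 => score, 3 => score } overriding 'match'
#                     when the more ambiguous code is 2- or 3-fold degenerate
#   unknown         - score whenever N or X is involved
sub iupac_score {
    my ($x, $y, %opt) = @_;
    my %o = (match => 3, mismatch => -1, unknown => 0, degenerate => {}, %opt);
    my ($cx, $cy) = (uc $x, uc $y);

    # N (missing data) and X (mask) are not scored as real bases.
    return $o{unknown} if grep { $_ eq 'N' || $_ eq 'X' } ($cx, $cy);

    my ($sx, $sy) = ($IUPAC{$cx}, $IUPAC{$cy});
    die "Unknown nucleotide code '$x' or '$y'\n"
        unless defined $sx && defined $sy;

    # Count the bases the two codes have in common.
    my $overlap = grep { index($sy, $_) >= 0 } split //, $sx;
    return $o{mismatch} unless $overlap;

    # Degeneracy of the more ambiguous of the two codes (1, 2 or 3 here).
    my $deg = length($sx) > length($sy) ? length($sx) : length($sy);
    return exists $o{degenerate}{$deg} ? $o{degenerate}{$deg} : $o{match};
}
```

With the defaults, iupac_score('A', 'W') is a full match (3); passing degenerate => { 2 => 2, 3 => 1 } would instead give 2 for A/W and 1 for A/D.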
many thanks to anyone who can help, alexie ps. Yee Man had cleverly suggested a workaround: one can use the Protein Matrix to create a scoring matrix. Might require some caution, remembering resetting the alphabet though? Yee Man Chan wrote: > Hi all > > I am the owner of Bio::Tools::dpAlign. A user emailed me to add > support for IUPAC nucleotide codes. I am ok to add this feature but I > would like to know what are the conventions to handle these IUPAC codes. > > Suppose match is +3 and mismatch is -1. Then what should be the > score when T matches with U, A with W, A with D, A with N and A with X? > Does anyone know the conventions? > > Thanks a lot. > Yee Man > > -- "You can't find a hermit to teach you herming, because of course that rather spoils the whole thing." -- (Terry Pratchett, Small Gods) Alexie Papanicolaou Department of Entomology, Max Planck Institute for Chemical Ecology, Hans-Knoell-Strasse 8, D-07745 Jena, Germany. From hlapp at gmx.net Fri Jun 27 08:47:12 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 27 Jun 2008 08:47:12 -0400 Subject: [Bioperl-l] IUPAC support for DNA alignment In-Reply-To: <4864BAA0.1000103@ice.mpg.de> References: <4864BAA0.1000103@ice.mpg.de> Message-ID: <06F151CF-9994-42FD-BA8C-5F93AA285DF4@gmx.net> Hi Alexie, On Jun 27, 2008, at 6:02 AM, Alexie Papanicolaou wrote: > Hello > > I'm the user who asked for it. I don't know of any conventions but > perhaps people can help on this? > > I'm not an expert at all but here is my opinion: > If you don't know the codon position (or even if it is coding) then > you can't estimate the codon degeneracy. If you don't know the > frequency of the bases representated in the degenerate site then you > can't model it either on the DNA level. So any solution will be ad- > hoc. 
> > Regarding 2 base degenerate positions: My suggestion is that in a > situation of alignment between, say a polymorphic and non > polymorphic population for that site, and the user is interested in > the distance between the populations, it would make sense to have > the score to the full match. > > Regarding 3 bases: I don't really know (see N below) but I 'd go for > a full match again, assuming the user build the consensus. Are you suggesting that a determined and a degenerate site aligned pairwise should score as much as two determined sites? My (possibly naive) default would be to average over all possibilities, each weighted by base frequency (if base frequencies are assumed unequal or independent), thus integrating out the uncertainty. (For standard matrices, I think this would also result in N receiving zero score.) In the end though, maybe there should be an option for a user to just provide a substitution matrix? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From apapanicolaou at ice.mpg.de Fri Jun 27 10:13:57 2008 From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou) Date: Fri, 27 Jun 2008 16:13:57 +0200 Subject: [Bioperl-l] IUPAC support for DNA alignment In-Reply-To: <06F151CF-9994-42FD-BA8C-5F93AA285DF4@gmx.net> References: <4864BAA0.1000103@ice.mpg.de> <06F151CF-9994-42FD-BA8C-5F93AA285DF4@gmx.net> Message-ID: <4864F5A5.9000601@ice.mpg.de> Hello I guess I didn't give enough info... (also sorry Yee Man, I forgot to CC you before) Scenario 1 - polymorphic allele vs non-polymorphic one. E.g. let [A/G] be a SNP with two alleles in population A, while the one fixed allele [G] is in population B. In this scenario we want to calculate the distance at one locus between two populations, thus a degenerate site is not the result of uncertainty but of reality.
Obviously the best method is to provide a matrix (if the user can be bothered) but Yee Man already allows this option. Personally, I wouldn't really use an alignment score to measure distance though... The application here is: we first want to align those two sequences and there should be no penalty because there is a SNP in one population (then estimate distance with another algorithm). Scenario 2 - uncertainty If the scenario is that [A/G] is the result of uncertainty then I gladly agree with you! I'm also perplexed about how to score IUPAC codes allowing for three nucleotides (i.e. there might not be a SNP after all... but then again infinite sites doesn't have to hold - in some species less than others...) Scenario 3 - a type of profile alignment to a consensus In my particular case, I'm doing something different: I have the consensus of an alignment of multiple sequences (dozens to hundreds depending on dataset) with some mismatches including a SNP, say [A/G]. A third sequence that I wish to align has A in that position. So obviously, it shouldn't be penalized. So it really depends on the application and the user should be able to decide in the end... (Yee Man already provides the option for a protein substitution matrix). It would be nice if we had the option of specifying it much more easily though (a simple switch) so I can use it for scenario 3. a ps. sorry, my English is going down the drain... Hilmar Lapp wrote: > Hi Alexie, > > On Jun 27, 2008, at 6:02 AM, Alexie Papanicolaou wrote: > >> Hello >> >> I'm the user who asked for it. I don't know of any conventions but >> perhaps people can help on this? >> >> I'm not an expert at all but here is my opinion: >> If you don't know the codon position (or even if it is coding) then >> you can't estimate the codon degeneracy. If you don't know the >> frequency of the bases representated in the degenerate site then you >> can't model it either on the DNA level. So any solution will be ad-hoc.
>> >> Regarding 2 base degenerate positions: My suggestion is that in a >> situation of alignment between, say a polymorphic and non polymorphic >> population for that site, and the user is interested in the distance >> between the populations, it would make sense to have the score to the >> full match. >> >> Regarding 3 bases: I don't really know (see N below) but I 'd go for >> a full match again, assuming the user build the consensus. > > are you suggesting that a determined and a degenerate site aligned > pairwise should score as much as two determined sites? > > My (possibly naive) default would be to average over all > possibilities, each weighted by base frequency (if base frequencies > are assumed unequal or independent), thus integrating out the > uncertainty. (For standard matrices, I think this would also result in > N receiving zero score.) > > In the end though, maybe there should be an option for a user to just > provide a substitution matrix? > > -hilmar > -- -- "Eppur si evolve" ("And yet it evolves") -Galileo Jr (ca 21st century) -- Alexie Papanicolaou Entomology Max Planck Institute for Chemical Ecology Hans Knoell Str 8 Jena 07745 Germany Email apapanicolaou at ice.mpg.de Tel +493641571561 From hlapp at gmx.net Fri Jun 27 13:33:46 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 27 Jun 2008 13:33:46 -0400 Subject: [Bioperl-l] IUPAC support for DNA alignment In-Reply-To: <4864F5A5.9000601@ice.mpg.de> References: <4864BAA0.1000103@ice.mpg.de> <06F151CF-9994-42FD-BA8C-5F93AA285DF4@gmx.net> <4864F5A5.9000601@ice.mpg.de> Message-ID: So instead of the user choosing a special matrix you would like to have a simple argument (that would probably under the hood do exactly that)? BTW Scenarios #1 and #3 sound more or less the same to me (i.e., you believe the degenerate code to reflect site polymorphism, not sequence uncertainty). -hilmar On Jun 27, 2008, at 10:13 AM, Alexie Papanicolaou wrote: > Hello > > I guess I didn't give enough info... 
(also sorry Yee Man, forget to > CC you before) > > Scenario 1 - polymorphic allele vs non-polymorphic one. e.g. > Let [A/G] be SNP in two alleles in population A and the one fixed > allele [G] is in population B. > > In this scenario we want to calculate the distance between one locus > between two populations ,thus a degenerate site is not the result of > uncertaintly but of reality. Obviously the best method is to provide > a matrix (if the user can be bothered) but Yee Man already allows > this option. Personally, I wouldn't really using an alignment score > to measure distance though... The application here is: we first want > to align those two sequences and there should be no penalty because > there is a SNP in one population (then estimate distance with > another algorithm). > > Scenario 2 - uncertainty > If the scenario is that [A/G] is the result of uncertainty then I > gladly agree with you! I'm also perplexed how to score IUPAC codes > allowing for three nucleotides (i.e. there might not be a SNP after > all... but then again infinitite sites doesn't have to hold - in > some species less than others...) > > Scenario 3 - a type profile alignment to a consensus > In my particular case, I'm doing something different: I have the > consensus of an alignment of multiple sequences (dozens to hundrends > depending on dataset) with some mismatches including a SNP say [A/ > G]. A third sequence that I wish to align has A in that position. So > obviously, it shouldn't be penalized. > > So it really depends on application and the user should be able to > decide in the end... (Yee Man already provides the option for a > protein substitution matrix). It would be nice if we had the option > of specifying it though much more easily (a simple switch) so i can > use for scenario 3. > a > ps. sorry, my english is going the drain... 
> > > Hilmar Lapp wrote: >> Hi Alexie, >> >> On Jun 27, 2008, at 6:02 AM, Alexie Papanicolaou wrote: >> >>> Hello >>> >>> I'm the user who asked for it. I don't know of any conventions but >>> perhaps people can help on this? >>> >>> I'm not an expert at all but here is my opinion: >>> If you don't know the codon position (or even if it is coding) >>> then you can't estimate the codon degeneracy. If you don't know >>> the frequency of the bases representated in the degenerate site >>> then you can't model it either on the DNA level. So any solution >>> will be ad-hoc. >>> >>> Regarding 2 base degenerate positions: My suggestion is that in a >>> situation of alignment between, say a polymorphic and non >>> polymorphic population for that site, and the user is interested >>> in the distance between the populations, it would make sense to >>> have the score to the full match. >>> >>> Regarding 3 bases: I don't really know (see N below) but I 'd go >>> for a full match again, assuming the user build the consensus. >> >> are you suggesting that a determined and a degenerate site aligned >> pairwise should score as much as two determined sites? >> >> My (possibly naive) default would be to average over all >> possibilities, each weighted by base frequency (if base frequencies >> are assumed unequal or independent), thus integrating out the >> uncertainty. (For standard matrices, I think this would also result >> in N receiving zero score.) >> >> In the end though, maybe there should be an option for a user to >> just provide a substitution matrix? 
>>
>> -hilmar
>>
>
> --
> --
> "Eppur si evolve" ("And yet it evolves")
> -Galileo Jr (ca 21st century)
>
> --
> Alexie Papanicolaou
> Entomology
> Max Planck Institute for Chemical Ecology
> Hans Knoell Str 8
> Jena 07745
> Germany
> Email apapanicolaou at ice.mpg.de
> Tel +493641571561

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From apapanicolaou at ice.mpg.de Fri Jun 27 13:40:34 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Fri, 27 Jun 2008 19:40:34 +0200
Subject: [Bioperl-l] IUPAC support for DNA alignment
In-Reply-To:
References: <4864BAA0.1000103@ice.mpg.de> <06F151CF-9994-42FD-BA8C-5F93AA285DF4@gmx.net> <4864F5A5.9000601@ice.mpg.de>
Message-ID: <48652612.2010709@ice.mpg.de>

Yes, I don't want to use a special scoring for what I'm doing now. The option would give a C or an A aligned with M the same score as specified in -match. I guess it would be quicker if I just made my own matrix, but there is a TODO line on IUPAC codes so I thought I'd push it a bit.

Yes, from a computational point of view Scenarios 1 & 3 are the same.

a

Hilmar Lapp wrote:
> So instead of the user choosing a special matrix you would like to
> have a simple argument (that would probably under the hood do exactly
> that)?
>
> BTW Scenarios #1 and #3 sound more or less the same to me (i.e., you
> believe the degenerate code to reflect site polymorphism, not sequence
> uncertainty).
>
> -hilmar
>
> On Jun 27, 2008, at 10:13 AM, Alexie Papanicolaou wrote:
>
>> Hello
>>
>> I guess I didn't give enough info... (also sorry Yee Man, forget to
>> CC you before)
>>
>> Scenario 1 - polymorphic allele vs non-polymorphic one. e.g.
>> Let [A/G] be SNP in two alleles in population A and the one fixed
>> allele [G] is in population B.
>> >> In this scenario we want to calculate the distance between one locus >> between two populations ,thus a degenerate site is not the result of >> uncertaintly but of reality. Obviously the best method is to provide >> a matrix (if the user can be bothered) but Yee Man already allows >> this option. Personally, I wouldn't really using an alignment score >> to measure distance though... The application here is: we first want >> to align those two sequences and there should be no penalty because >> there is a SNP in one population (then estimate distance with another >> algorithm). >> >> Scenario 2 - uncertainty >> If the scenario is that [A/G] is the result of uncertainty then I >> gladly agree with you! I'm also perplexed how to score IUPAC codes >> allowing for three nucleotides (i.e. there might not be a SNP after >> all... but then again infinitite sites doesn't have to hold - in some >> species less than others...) >> >> Scenario 3 - a type profile alignment to a consensus >> In my particular case, I'm doing something different: I have the >> consensus of an alignment of multiple sequences (dozens to hundrends >> depending on dataset) with some mismatches including a SNP say [A/G]. >> A third sequence that I wish to align has A in that position. So >> obviously, it shouldn't be penalized. >> >> So it really depends on application and the user should be able to >> decide in the end... (Yee Man already provides the option for a >> protein substitution matrix). It would be nice if we had the option >> of specifying it though much more easily (a simple switch) so i can >> use for scenario 3. >> a >> ps. sorry, my english is going the drain... >> >> >> Hilmar Lapp wrote: >>> Hi Alexie, >>> >>> On Jun 27, 2008, at 6:02 AM, Alexie Papanicolaou wrote: >>> >>>> Hello >>>> >>>> I'm the user who asked for it. I don't know of any conventions but >>>> perhaps people can help on this? 
>>>> >>>> I'm not an expert at all but here is my opinion: >>>> If you don't know the codon position (or even if it is coding) then >>>> you can't estimate the codon degeneracy. If you don't know the >>>> frequency of the bases representated in the degenerate site then >>>> you can't model it either on the DNA level. So any solution will be >>>> ad-hoc. >>>> >>>> Regarding 2 base degenerate positions: My suggestion is that in a >>>> situation of alignment between, say a polymorphic and non >>>> polymorphic population for that site, and the user is interested in >>>> the distance between the populations, it would make sense to have >>>> the score to the full match. >>>> >>>> Regarding 3 bases: I don't really know (see N below) but I 'd go >>>> for a full match again, assuming the user build the consensus. >>> >>> are you suggesting that a determined and a degenerate site aligned >>> pairwise should score as much as two determined sites? >>> >>> My (possibly naive) default would be to average over all >>> possibilities, each weighted by base frequency (if base frequencies >>> are assumed unequal or independent), thus integrating out the >>> uncertainty. (For standard matrices, I think this would also result >>> in N receiving zero score.) >>> >>> In the end though, maybe there should be an option for a user to >>> just provide a substitution matrix? 
>>>
>>> -hilmar
>>>
>>
>> --
>> --
>> "Eppur si evolve" ("And yet it evolves")
>> -Galileo Jr (ca 21st century)
>>
>> --
>> Alexie Papanicolaou
>> Entomology
>> Max Planck Institute for Chemical Ecology
>> Hans Knoell Str 8
>> Jena 07745
>> Germany
>> Email apapanicolaou at ice.mpg.de
>> Tel +493641571561
>

--
--
"Eppur si evolve" ("And yet it evolves")
-Galileo Jr (ca 21st century)

--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561

From aaron.j.mackey at gsk.com Fri Jun 27 10:09:57 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 27 Jun 2008 10:09:57 -0400
Subject: [Bioperl-l] IUPAC support for DNA alignment
In-Reply-To:
Message-ID:

You could replicate what they do here with EST_GENOME (re-engineered to accept ambiguity codes):

http://www.genome.org/cgi/content/short/17/2/212

But I think the answer is user-dependent -- some might want the "full score" (as in the above case), others might want the "(probabilistically) averaged score", etc. So, let the scoring matrix be subclass-able (or mix-able), so that users can specify the exact desired behavior via a handful of predefined (and useful) behaviors.

-Aaron

From ymc at shgc.stanford.edu Fri Jun 27 13:35:39 2008
From: ymc at shgc.stanford.edu (Yee Man Chan)
Date: Fri, 27 Jun 2008 10:35:39 -0700 (PDT)
Subject: [Bioperl-l] IUPAC support for DNA alignment
In-Reply-To:
References:
Message-ID:

Hi guys

What about providing two switches; one for full score and one for probabilistic score?

Assume match is +3 and mismatch -1

Full score version:
1) T - U = +3 (I assume U is the same as T for alignment purpose, right?)
2) A - W = +3
3) A - D = +3
4) A - N = +3
5) A - X = -1 (not so sure about this one)

Probabilistic score version:
1) T - U = +3
2) A - W = +3/2-1/2 = +1
3) A - D = +3/3-1*2/3 = +1/3
4) A - N = +3/4-1*3/4 = 0
5) A - X = -1

What do you think?
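The two proposed switches can be sketched as below. This is only an illustration in Python, not BioPerl code: the function name `pair_score` and its `mode` argument are hypothetical, U is treated as T, any symbol outside the IUPAC table (such as X) is simply scored as a mismatch, and scoring two ambiguous codes against each other by averaging over all base pairs is an assumption that goes beyond the cases listed above.

```python
from fractions import Fraction

# IUPAC nucleotide codes and the bases each one stands for (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pair_score(x, y, match=3, mismatch=-1, mode="full"):
    """Score one aligned pair of symbols under the 'full' or 'prob' scheme."""
    if mode not in ("full", "prob"):
        raise ValueError("mode must be 'full' or 'prob'")
    if x not in IUPAC or y not in IUPAC:
        # Assumption: unknown symbols such as 'X' count as a mismatch.
        return Fraction(mismatch)
    a, b = IUPAC[x], IUPAC[y]
    if mode == "full":
        # Full score: any shared base is rewarded like an exact match.
        return Fraction(match if set(a) & set(b) else mismatch)
    # Probabilistic score: average over all base combinations,
    # every constituent base taken as equally likely.
    total = sum(match if p == q else mismatch for p in a for q in b)
    return Fraction(total, len(a) * len(b))


for other in ("W", "D", "N", "X"):
    print("A -", other,
          "full:", pair_score("A", other, mode="full"),
          "prob:", pair_score("A", other, mode="prob"))
```

With match +3 and mismatch -1 the loop reproduces the numbers listed above (for example A - D gives 3 under the full scheme and 1/3 under the probabilistic one).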
Yee Man

On Fri, 27 Jun 2008 aaron.j.mackey at gsk.com wrote:

> You could replicate what they do here with EST_GENOME (re-engineered to
> accept ambiguity codes):
>
> http://www.genome.org/cgi/content/short/17/2/212
>
> But I think the answer is user-dependent -- some might want the "full
> score" (as in the above case), others might want the "(probabilistically)
> averaged score", etc. So, let the scoring matrix be subclass-able (or
> mix-able), so that users can specify the exact desired behavior via a
> handful of predefined (and useful) behaviors.
>
> -Aaron
>

From hlapp at gmx.net Fri Jun 27 17:31:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 27 Jun 2008 17:31:55 -0400
Subject: [Bioperl-l] IUPAC support for DNA alignment
In-Reply-To:
References:
Message-ID: <134AD76F-E008-4288-91BB-CBFA6042F54A@gmx.net>

On Jun 27, 2008, at 1:35 PM, Yee Man Chan wrote:

>
> Hi guys
>
> What about providing two switches; one for full score and one for
> probabilistic score?
>
> Assume match is +3 and mismatch -1
>
> Full score version:
> 1) T - U = +3 (I assume U is the same as T for alignment purpose,
> right?)

Right.

>
> 2) A - W = +3
> 3) A - D = +3
> 4) A - N = +3
> 5) A - X = -1 (not so sure about this one)
>
> Probabilistic score version:
> 1) T - U = +3
> 2) A - W = +3/2-1/2 = +1
> 3) A - D = +3/3-1*2/3 = +1/3
> 4) A - N = +3/4-1*3/4 = 0
> 5) A - X = -1

Note that there are also M, R, V, and H, and their complements (which by definition would not match your example of 'A').

Note also that the above implicitly assumes 50% GC content or equal likelihood of the code-constituent bases, which in reality for most coding sequences is not true.

Also, if you have a known polymorphism at the site, for 3-letter ambiguities not all 3 may be equally likely. For example, if you have letter D for a [A/G] SNP, one may not want to give 1/3 of weight to possibility T.
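The equal-likelihood assumption noted above can be relaxed by weighting each base an ambiguity code allows by a supplied frequency. The following Python sketch is only an illustration under stated assumptions: `weighted_score` and the frequency tables are hypothetical names and made-up numbers, the first argument must be a determined base (A, C, G or T), and the weights of the allowed bases are renormalised so that they sum to one.

```python
from fractions import Fraction

# IUPAC nucleotide codes and the bases each one stands for (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weighted_score(base, code, freqs, match=3, mismatch=-1):
    """Expected score of a determined base against an IUPAC code.

    freqs maps bases to weights (bases missing from it get weight 0);
    the weights of the bases allowed by `code` are renormalised.
    """
    allowed = IUPAC[code]
    total = sum(freqs.get(b, 0) for b in allowed)
    if total == 0:
        raise ValueError("no allowed base has a non-zero weight")
    if base in allowed:
        p_match = Fraction(freqs.get(base, 0)) / total
    else:
        p_match = Fraction(0)
    return match * p_match + mismatch * (1 - p_match)


# Made-up AT-rich background frequencies, for illustration only.
at_rich = {"A": Fraction(3, 10), "C": Fraction(1, 5),
           "G": Fraction(1, 5), "T": Fraction(3, 10)}
# Site-specific weights for a known [A/G] SNP written as D: T gets no weight.
snp_ag = {"A": Fraction(1, 2), "G": Fraction(1, 2)}

print(weighted_score("A", "D", at_rich))  # 1/2
print(weighted_score("A", "D", snp_ag))   # 1
```

Passing a per-site table such as `snp_ag` is one way to avoid giving weight to T in the D example; with uniform weights the function falls back to the +1/3 of the equal-likelihood version.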
I would at least allow for the possibility to assign expected base frequencies and weight the ambiguous possibilities by those.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From pmiguel at purdue.edu Sun Jun 29 19:44:31 2008
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 29 Jun 2008 19:44:31 -0400
Subject: [Bioperl-l] How to read seq and quals into Bio::Seq::Quality object?
Message-ID: <48681E5F.2090607@purdue.edu>

What would be the accepted method to read the seq and qual values in from two files (a fasta and a qual file) and put them into a Bio::Seq::Quality object? Is there a Bio::SeqIO method that would do that?

Phillip

From pmiguel at purdue.edu Mon Jun 30 15:13:47 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Mon, 30 Jun 2008 15:13:47 -0400
Subject: [Bioperl-l] How to read seq and quals into Bio::Seq::Quality object?
In-Reply-To: <48681E5F.2090607@purdue.edu>
References: <48681E5F.2090607@purdue.edu>
Message-ID: <4869306B.1010503@purdue.edu>

Anyone? Sorry if I'm being a jerk. But on the basis of the existence of Bio::Seq::Quality it seems Bio::Seq::SeqWithQuality was deprecated. But the latter has a clear methodology to get from .fasta and .fasta.qual files into a new object. Once the new object is populated, it looks like the former might be superior to use.

But at the moment the only way I'm seeing to populate the ::Quality object from two files is to bring each file in with SeqIO and then use the primary seq and primary qual objects. Thus created, I'd export their sequence and quals as text and use that to create the ::Quality object. If that is the way to go, fine. But I feel like I must be missing something.
Phillip
Purdue Genomics Core Facility

Phillip SanMiguel wrote:
> What would be the accepted method to read the seq and qual values in
> from two files (a fasta and a qual file) and put them into
> Bio::Seq::Quality object. Is there a Bio::SeqIO method that would do
> that?
>
> Phillip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From florent.angly at gmail.com Mon Jun 30 15:34:00 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 30 Jun 2008 12:34:00 -0700
Subject: [Bioperl-l] How to read seq and quals into Bio::Seq::Quality object?
In-Reply-To: <4869306B.1010503@purdue.edu>
References: <48681E5F.2090607@purdue.edu> <4869306B.1010503@purdue.edu>
Message-ID: <48693528.1060700@gmail.com>

I have looked at the issue before and haven't found an existing way to nicely import a FASTA and a QUAL file at the same time. It would be valuable in my opinion, but at the moment I think you have to create two SeqIO objects and go through all the sequence records in both of them simultaneously.

Florent

Phillip San Miguel wrote:
> Anyone? Sorry if I'm being a jerk. But on the basis of the
> existence of Bio::Seq::Quality it seems Bio::Seq::SeqWithQuality was
> deprecated. But the latter has a clear methodology to get from .fasta
> and .fasta.qual files into a new object. Once the new object is
> populated, looks like the former might be superior to use.
> But at the moment the only way I'm seeing to populate the ::Quality
> object from two files is to bring each file in with SeqIO and then use
> the primary seq and primary qual objects. Thus created I'd export
> their sequence and quals as text and use that create the ::Quality
> object. If that is the way to go, fine. But I feel like I must be
> missing something.
> > Phillip > Purdue Genomics Core Facility > > Phillip SanMiguel wrote: >> What would be the accepted method to read the seq and qual values in >> from two files (a fasta and a qual file) and put them into >> Bio::Seq::Quality object. Is there a Bio::SeqIO method that would do >> that? >> >> Phillip >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Mon Jun 30 17:25:32 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 30 Jun 2008 17:25:32 -0400 Subject: [Bioperl-l] How to read seq and quals into Bio::Seq::Quality object? In-Reply-To: <4869306B.1010503@purdue.edu> References: <48681E5F.2090607@purdue.edu> <4869306B.1010503@purdue.edu> Message-ID: Apologies if this is a stupid comment, but have you looked at the Bio::SeqIO::qual format parser? Purportedly, it returns Bio::Seq::Quality objects. -hilmar On Jun 30, 2008, at 3:13 PM, Phillip San Miguel wrote: > Anyone? Sorry if I'm being a jerk. But on the basis of the > existence of Bio::Seq::Quality it seems Bio::Seq::SeqWithQuality was > deprecated. But the latter has a clear methodology to get > from .fasta and .fasta.qual files into a new object. Once the new > object is populated, looks like the former might be superior to use. > But at the moment the only way I'm seeing to populate > the ::Quality object from two files is to bring each file in with > SeqIO and then use the primary seq and primary qual objects. Thus > created I'd export their sequence and quals as text and use that > create the ::Quality object. If that is the way to go, fine. But I > feel like I must be missing something. 
> > Phillip > Purdue Genomics Core Facility > > Phillip SanMiguel wrote: >> What would be the accepted method to read the seq and qual values >> in from two files (a fasta and a qual file) and put them into >> Bio::Seq::Quality object. Is there a Bio::SeqIO method that would >> do that? >> >> Phillip >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
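
The lockstep approach mentioned earlier in this thread (two readers walked through simultaneously) can be sketched as follows. This is a language-neutral illustration in plain Python and does not use BioPerl: `read_records` and `paired_seq_qual` are hypothetical helper names, the two files are assumed to list the same records in the same order, and in BioPerl itself the records would come from two Bio::SeqIO streams instead.

```python
import io
from itertools import zip_longest


def read_records(handle):
    """Yield (id, body_lines) for each '>' record of a FASTA-style stream."""
    name, lines = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, lines
            header = line[1:].split()
            name, lines = (header[0] if header else ""), []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, lines


def paired_seq_qual(fasta_handle, qual_handle):
    """Walk both streams in lockstep, yielding (id, sequence, qualities)."""
    pairs = zip_longest(read_records(fasta_handle), read_records(qual_handle))
    for fasta_rec, qual_rec in pairs:
        if fasta_rec is None or qual_rec is None:
            raise ValueError("FASTA and QUAL hold different numbers of records")
        (seq_id, seq_lines), (qual_id, qual_lines) = fasta_rec, qual_rec
        if seq_id != qual_id:
            raise ValueError(f"record order differs: {seq_id!r} vs {qual_id!r}")
        seq = "".join(seq_lines)
        quals = [int(q) for q in " ".join(qual_lines).split()]
        if len(seq) != len(quals):
            raise ValueError(f"{seq_id}: {len(seq)} bases, {len(quals)} quals")
        yield seq_id, seq, quals


# Tiny in-memory example standing in for a .fasta and a .fasta.qual file.
fasta = io.StringIO(">r1 demo read\nACGT\nAC\n>r2\nGG\n")
qual = io.StringIO(">r1\n10 20 30 40\n50 60\n>r2\n5 6\n")
for seq_id, seq, quals in paired_seq_qual(fasta, qual):
    print(seq_id, seq, quals)
```

Each yielded triple carries what a combined sequence-plus-quality object needs (an id, the bases, and one quality value per base); checking ids and lengths at this point catches files that have drifted out of sync.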