From limericksean at gmail.com Wed Jun 1 07:43:03 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Wed Jun 1 07:41:57 2005
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID: <462784640506010443329e049d@mail.gmail.com>

Hi,

I was wondering how Bio::SearchIO::hmmer parses hmmpfam/hmmsearch result files. I have a set of hmmpfam result hits without alignments in a file which I generated using hmmpfam locally (-A0 option on the command line). Is this considered a valid result by Bio::SearchIO::hmmer? If so, what might be wrong with the code below (which gets as far as the first while loop and doesn't enter any other):

use strict;
use Bio::SearchIO;

my $inhmmfile = "test-hmm.smart";
my $outputfilename = "HMM-test.hmmer.parsed";
my $fastafilename = "$outputfilename".".fasta";
my $inevalue =1;
my $inlength =20;
my ($myresult,$myhit,$myhsp,$mysignificance,$mylength,
    $mynohit,$mylasthit,$mylastresult,$mypercent) = 0;

unless (open(PARSEDFILE, ">$outputfilename")) {
    print "Could not open file $outputfilename !\n";
    exit;
}
unless (open(FASTAFILE, ">$fastafilename")) {
    print "Could not open file $fastafilename !\n";
    exit;
}

my $in = new Bio::SearchIO(-format => 'hmmer',
                           -file => $inhmmfile);
while(my $result = $in->next_result ) {
    $myresult++;
    while (my $hit = $result->next_hit ) {
        $myhit++;
        while (my $hsp = $hit->next_hsp ) {
            $myhsp++;
            if( $hsp->length('total') >= $inlength ) {
                $mylength++;
                if ( $hit->significance <= $inevalue ) {
                    $mysignificance++;
                    print PARSEDFILE
                        $result->query_name,"\t",
                        $result->query_description,"\t",
                        $result->query_length, "\t",
                        $hit->description, "\t",
                        $hit->accession, "\t",
                        $hit->bits, "\t",
                        $hit->significance, "\t",
                        $hsp->num_identical, "\t",
                        $hsp->num_conserved,"\t",
                        $hsp->start('query'),"\t",
                        $hsp->end('query'),"\t",
                        $hsp->start('hit'),"\t",
                        $hsp->end('hit'),"\n";
                    print FASTAFILE "> ", $hit->description,"\n",
                        $hsp->hit_string,"\n";
                }
            }
        }
    }
    if ($myhit == 0) {
        $mynohit++;
    }
    $myhit = 0;
}
$mypercent = $mynohit*100 / $myresult;
print "\n\n",
    $myresult, " query sequence(s)\n";
print "\n", $myhsp, " HSP sequence(s) \n";
print "\n", $mylength, " hit(s) presenting the minimum requested length\n";
print "\n", $mysignificance, " hit(s) presenting the minimum requested E-value\n";
print "\n", $mynohit, " query sequence(s) presenting < NO HITS > ";
close (PARSEDFILE);
close (FASTAFILE);
exit;

Output is:

Use of uninitialized value in numeric eq (==) at ../hmm-test-parse.pl line 58, line 40126.

1 query sequence(s)
Use of uninitialized value in print at ../hmm-test-parse.pl line 68, line 40126.

HSP sequence(s)
Use of uninitialized value in print at ../hmm-test-parse.pl line 69, line 40126.

hit(s) presenting the minimum requested length
Use of uninitialized value in print at ../hmm-test-parse.pl line 70, line 40126.

hit(s) presenting the minimum requested E-value

1 query sequence(s) presenting < NO HITS > (100.00 %)

Thanks very much,
Sean.

From Sean.Maceachern at dpi.vic.gov.au Thu Jun 2 03:51:15 2005
From: Sean.Maceachern at dpi.vic.gov.au (Sean.Maceachern@dpi.vic.gov.au)
Date: Thu Jun 2 03:44:04 2005
Subject: [Bioperl-l] bioperl-ext1.4 installation problem (Sean MacEachern)
Message-ID:

Hello,

I am trying to install bioperl on a machine running fedora core 3 and I have run across a few problems installing bioperl-ext-1.4. I have installed all of the appropriate Inline and io_lib files however whenever I try to execute the perl Makefile.PL command I get the following error:

[root@fred bioperl-ext-1.4] # perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Bio::Ext::Align
Found Staden io_lib "libread" in /usr/local/lib ...
Automatically using the Read.h found in /usr/local/include/io_lib ...
Writing Makefile for Bio::SeqIO::staden::read
Writing Makefile for Bio
One or more DATA sections were not processed by Inline.

I then attempted to make the staden::read file independently, however I came across another error which I believe may have something to do with a line out of place in one of the io_lib *.h files but was unable to determine which file it was (Read.h os.h etc...). I have attached some of the error mesage below, hopefully someone will be able to help me identify the problem.

[root@fred read] # perl Makefile.PL
Writing Makefile for Bio::SeqIO::staden::read
[root@fred read] # make
gcc -c -I/usr/local/genomics/bioperl-ext-1.4/Bio/SeqIO/staden -I/usr/local/include/io_lib -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" -fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" read.c
In file included from /usr/local/include/io_lib/Read.h:43,
                 from read.xs:5:
/usr/local/include/io_lib/os.h:9:1: warning: "INT_MAX" redefined
In file included from /usr/include/sys/param.h:22,
                 from /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/perl.h:446,
                 from read.xs:2:
/usr/lib/gcc/i386-redhat-linux/3.4.3/include/limits.h:74:1: warning: this is the location of the previous definition
In file included from /usr/local/include/io_lib/Read.h:43,
                 from read.xs:5:
/usr/local/include/io_lib/os.h:10:1: warning: "SHRT_MAX" redefined
In file included from /usr/include/sys/param.h:22,
                 from /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/perl.h:446,
                 from read.xs:2:
/usr/lib/gcc/i386-redhat-linux/3.4.3/include/limits.h:60:1: warning: this is the location of the previous definition
In file included from /usr/local/include/io_lib/Read.h:43,
                 from read.xs:5:
/usr/local/include/io_lib/os.h:24:4: #error No 2-byte integer type found.
/usr/local/include/io_lib/os.h:34:4: #error No 4-byte integer type found.
In file included from /usr/local/include/io_lib/Read.h:43,
                 from read.xs:5:
/usr/local/include/io_lib/os.h:40: error: syntax error before "int_2"
/usr/local/include/io_lib/os.h:40: warning: data definition has no type or storage class
/usr/local/include/io_lib/os.h:41: error: syntax error before "uint_2"
/usr/local/include/io_lib/os.h:41: warning: data definition has no type or storage class
/usr/local/include/io_lib/os.h:42: error: syntax error before "int_4"
/usr/local/include/io_lib/os.h:42: warning: data definition has no type or storage class
/usr/local/include/io_lib/os.h:43: error: syntax error before "uint_4"
/usr/local/include/io_lib/os.h:43: warning: data definition has no type or storage class

Regards,
Sean

From Richard.Adams at ed.ac.uk Thu Jun 2 05:20:12 2005
From: Richard.Adams at ed.ac.uk (Richard Adams)
Date: Thu Jun 2 05:15:29 2005
Subject: [Bioperl-l] please help me to check why this perl script does not work!
Message-ID: <429ECF4C.4080909@ed.ac.uk>

Hi Fei Li,

I've written a BioPerl module for submitting sequences to NetPhos at the same site, Bio/Tools/Analysis/Protein/NetPhos. You can use that as a template for your code - you just have to alter the form fields and the result parser.
See http://bioperl.org/HOWTOs/SimpleWebAnalysis/index.html

Or, if you would be willing to contribute your ChloroP-specific code, I could wrap it up into a new BioPerl module.

Regards
Richard

--
Dr Richard Adams
Psychiatric Genetics Group, Medical Genetics,
Molecular Medicine Centre, Western General Hospital,
Crewe Rd West, Edinburgh UK EH4 2XU
Tel: 44 131 651 1084
richard.adams@ed.ac.uk

From michael.watson at bbsrc.ac.uk Thu Jun 2 08:53:45 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu Jun 2 08:48:06 2005
Subject: [Bioperl-l] Removing SeqFeatures
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D5AE@iahce2knas1.iah.bbsrc.reserved>

Hi

I want to take an EMBL sequence and remove some of the features, but keep all of the other information (accession, id, sequence, comments etc).

So I have got some code which gives me the features that I want and the ones I don't want, but I can't figure out how to "remove" a feature. The only way I can think of doing this is to create a new sequence and transfer the data across from the old sequence bit by bit - surely there's a better way?

Many thanks
Mick

From Marc.Logghe at devgen.com Thu Jun 2 09:23:34 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Jun 2 09:18:14 2005
Subject: [Bioperl-l] Removing SeqFeatures
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E7F8@ANTARESIA.be.devgen.com>

Hi Mick,

I think the only way is to run remove_SeqFeatures() and catch the removed features in an array. Filter the array, keeping the features you want to keep, and add them back to your sequence object using add_SeqFeature(@features_to_keep).

HTH,
Marc

From MEC at Stowers-Institute.org Thu Jun 2 10:45:19 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Thu Jun 2 10:38:02 2005
Subject: [Bioperl-l] Removing SeqFeatures
Message-ID: <200506021437.j52EbJfY024658@portal.open-bio.org>

FWIW:

In days past I had the following exchange with Ewan/Hilmar/Matthew on exactly this topic, in which I proposed a patch to Bio::Seq to define 'delete_SeqFeature' and was subsequently converted against the approach:

http://bioperl.org/pipermail/bioperl-l/2001-May/005624.html

Cheers,
malcolm

From Annie.Law at nrc-cnrc.gc.ca Thu Jun 2 11:59:22 2005
From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie)
Date: Thu Jun 2 11:51:24 2005
Subject: [Bioperl-l] Entrez Gene parser questions
Message-ID: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>

Hi,

I would appreciate help with the following. First of all, I would like to say it's great that an Entrez Gene parser was written. I just have some questions to get me started.

A) I already have bioperl 1.4 installed. I would like to know if I can install the Entrez Gene parser and Stefan Kirov's entrezgene.pm without reinstalling bioperl. If this is not advised then I would reinstall bioperl.

I was going to 1.
install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc...)
2. just take only the entrezgene.pm and util.pm files and put them in the appropriate directory (which util.pm should I use? I searched CPAN and got many results for util.pm; is it Biblio::Util, Boulder::Util, or something else?)

B) For each Entrez Gene id, how do I access the associated Unigene id, accession numbers, gene symbol, gene name, and GO IDs?

Also, I have been looking at the code and the POD within it. Are there some other places that I can look for documentation?

Thanks so much!!
Annie.

From mingyi.liu at gpc-biotech.com Thu Jun 2 12:59:28 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Thu Jun 2 12:53:30 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>
References: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>
Message-ID: <429F3AF0.4000207@gpc-biotech.com>

Hi, Annie,

Stefan's more familiar with the requirements of entrezgene.pm. Based on my experience installing entrezgene.pm for bioperl 1.5 (I don't have 1.4), there's no need for a util.pm. I suspect the reason you mentioned a util.pm is that my Bio::ASN1::EntrezGene previously needed one, but that was eliminated since V1.02 or so, before the Bio::ASN1 space was created on CPAN. Maybe Stefan had not updated the documentation to reflect this change?

You do need to install Bio::ASN1::EntrezGene first, then copy both entrezgene.pm and the latest Bio::Annotation::DBLink from bioperl CVS, since entrezgene.pm uses the url() in DBLink, which is not present even in bioperl 1.5. For me, I also had to do a force install of Graph::Directed, but that might not be a problem for you if your system already has it (my bioperl installation was old and incomplete before).

Best,
Mingyi

From skirov at utk.edu Thu Jun 2 13:39:21 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Jun 2 13:31:28 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>
References: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>
Message-ID: <429F4449.8000801@utk.edu>

Annie,

I am sorry to say there is no good documentation yet, since I am still evaluating and debugging the code and I have a bunch of other stuff to deal with right now, so I apologize, but there will not be comprehensive docs for at least another 2 weeks.

First, install: I think you need to install Bio::SeqIO::entrezgene.pm and Bio::ASN1::EntrezGene. You will have to update Bio::Annotation::DBLink.
Make sure you also have Bio::Cluster::SequenceFamily and all modules in Bio::SeqFeature::Gene. These also may need updating.

When you do

my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                            -file   => $file,
                            -debug  => 'on');
my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;

$gene->accession_number will give you the Entrezgene id and $gene->id will give you the gene symbol. If you go through the annotations:

my $gid=$gene->accession_number; # assumption: $gid below is the Entrez Gene id
my $ann=$gene->annotation; # where most of the data is
my @dblinks= $ann->get_Annotations('DBLink');
foreach my $dblink (@dblinks) {
    print 'Unigene id for this gene is '.$dblink->id
        if (lc($dblink->database) eq 'unigene');
}
my @nameann=$ann->get_Annotations('Official Full Name');
print 'Gene name is ',$nameann[0]->as_text,"\n";
my @go=$ann->get_Annotations('OntologyTerm');
foreach my $go (@go) {
    next if ($go->authority eq 'STS marker'); # Unless you want STS markers...
    my @refs=$go->term->references; # you should get just one
    print join(',',$gid,$go->ontology->name,$go->identifier,$go->name,'GO:'.$go->identifier,$refs[0]->medline);
}
my @associated_seq=$genestructure->get_members();
foreach my $contig (@associated_seq) {
    if ($contig->namespace eq 'refseq') { # Only refseq, there is also mrna, product and genomic as options...
        my @prod=$contig->annotation->get_Annotations('product');
        my @transvar=$contig->annotation->get_Annotations('simple');
        my $transvar='';
        foreach my $sv (@transvar) {
            $transvar=$sv->value if ($sv->tagname eq 'Transcriptional Variant');
        }
        my $assembly;
        foreach my $ann ($contig->annotation->get_Annotations('dblink')) {
            $assembly.=$ann->primary_id . '-' if ($ann->optional_id eq 'Source Sequence');
        }
        chop $assembly;
        my $prod;
        foreach my $p (@prod) {
            if ($p) {
                $prod=$p->value;
                last;
            }
        }
    }
}

Hope this helps and will get you started at least. Let me know if you have more questions.

Stefan

From skirov at utk.edu Thu Jun 2 13:44:47 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Jun 2 13:36:47 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>
References: <10C94843061E094A98C02EB77CFC328722FF48@nrcmrdex1d.imsb.nrc.ca>
Message-ID: <429F458F.1020201@utk.edu>

Annie,

Also, please let me know if you find anything weird. I think GIs might have a tab at the end. Also note that (at least as of April 25th) the EntrezGene ASN file does not have Gene Ontology for Drosophila. So far I have not heard from the NCBI help desk on that.

Stefan

From golharam at umdnj.edu Thu Jun 2 15:39:06 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Jun 2 15:31:22 2005
Subject: [Bioperl-l] AlignIO and bl2seq
Message-ID: <000501c567aa$bd414600$e6028a0a@GOLHARMOBILE1>

Hi all,

I'm trying to parse some alignments performed using bl2seq. My code is as follows:

my $output = `bl2seq -p blastn -i seq1 -j seq2`;
my $in = Bio::AlignIO->new(-fh => new IO::String($output),
                           -format => 'bl2seq');
my $aln = $in->next_aln();
if (defined($aln)) {
    print "Score: ", $aln->score, "\n";
} else {
    print "n/a";
}

When it comes to printing the score, nothing gets printed out, which makes sense because blast gives a list of HSPs, or none if there aren't any. So, how do I get the first HSP from the output using AlignIO?

I know I should be using SearchIO to get the HSPs, but I thought I would try it with AlignIO as it's documented, but I can't get it working. Any ideas???

Ryan

From micheleen at mail.utexas.edu Fri Jun 3 02:03:19 2005
From: micheleen at mail.utexas.edu (Micheleen Harris)
Date: Fri Jun 3 02:01:22 2005
Subject: [Bioperl-l] getting t_coffee to run with bioperl in linux first time
Message-ID: <6.2.1.2.2.20050603004311.01e902f0@mail.utexas.edu>

I just installed t_coffee into my home directory on a linux machine and it is working fine. Bioperl is also functioning, as I tried querying genbank for sequence data successfully.

I then followed the following instructions to make a wrapper for t_coffee:

1. Make sure the t_coffee executable is in your path so that

       which t_coffee

   returns a t_coffee executable on your system.

2. Define an environmental variable TCOFFEEDIR which is a dir which contains the 't_coffee' app:

   In bash

       export TCOFFEEDIR=/home/username/progs/T-COFFEE_distribution_Version_1.37/bin

   In csh/tcsh

       setenv TCOFFEEDIR /home/username/progs/T-COFFEE_distribution_Version_1.37/bin

3. Include a definition of an environmental variable TCOFFEEDIR in every script that will use this TCoffee wrapper module.

       BEGIN { $ENV{TCOFFEDIR} = '/home/username/progs/T-COFFEE_distribution_Version_1.37/bin' }
       use Bio::Tools::Run::Alignment::TCoffee;

I placed the path to t_coffee and defined TCOFFEEDIR in my .bash_profile file as follows:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin
PATH=$PATH:/home/micheleen/progs/T-COFFEE_distribution_Version_1.37/bin/
export PATH
export TCOFFEEDIR = /home/micheleen/progs/T-COFFEE_distribution_Version_1.37/bin
unset USERNAME

I ran the following script:

#!/bin/perl -w

BEGIN { $ENV{TCOFFEDIR} = '/home/micheleen/progs/T-COFFEE_distribution_Version_1.37/bin' };
use Bio::Tools::Run::Alignment::TCoffee;
use Bio::Seq;
use Bio::DB::GenBank;

$coffeefound = Bio::Tools::Run::Alignment::TCoffee->exists_tcoffee();

And get the error:

Can't locate Bio/Tools/Run/Alignment/TCoffee.pm in @INC (@INC contains: /etc/perl /usr/lib/perl5/site_perl/5.8.5/i686-linux /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.2/i686-linux /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/i686-linux /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.5/i686-linux /usr/lib/perl5/5.8.5 /usr/local/lib/site_perl /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.2/i686-linux /usr/lib/perl5/site_perl/5.8.4 .) at my_scripts/find_pol.pl line 4.
BEGIN failed--compilation aborted at my_scripts/find_pol.pl line 4.

Please advise! I'm certain I have the paths correct. Is this a simple fix, did I miss something obvious?

Thanks,
Micheleen

From taerwin at tpg.com.au Fri Jun 3 03:39:58 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Fri Jun 3 03:35:11 2005
Subject: [Bioperl-l] getting t_coffee to run with bioperl in linux first time
In-Reply-To: <6.2.1.2.2.20050603004311.01e902f0@mail.utexas.edu>
References: <6.2.1.2.2.20050603004311.01e902f0@mail.utexas.edu>
Message-ID: <1117784398.9865.13.camel@bacp4>

Hi Micheleen,

This error is to do with the BioPerl package and not the program t_coffee.
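
A minimal sketch of the mechanism behind that diagnosis, using only core Perl. The "Can't locate ... in @INC" message comes from Perl's own module search, which runs before any wrapper code can look for an external program. The module name My::Demo::Wrapper and the temporary directory are hypothetical stand-ins (assumptions for illustration, not BioPerl names) for the missing module and the directory that holds it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Build a throwaway library tree holding one stand-in module.
# My::Demo::Wrapper is a hypothetical name, not a BioPerl module.
my $libdir = tempdir(CLEANUP => 1);
my $moddir = File::Spec->catdir($libdir, 'My', 'Demo');
make_path($moddir);
my $modfile = File::Spec->catfile($moddir, 'Wrapper.pm');
open(my $fh, '>', $modfile) or die "Cannot write $modfile: $!";
print {$fh} "package My::Demo::Wrapper;\nsub hello { return 'found' }\n1;\n";
close($fh) or die "Cannot close $modfile: $!";

# First attempt: the directory is not in @INC, so require dies with
# "Can't locate My/Demo/Wrapper.pm in \@INC ...".
my $loaded_before = eval { require My::Demo::Wrapper; 1 } ? 1 : 0;
my $error_before  = $@;

# Second attempt: prepend the directory to @INC at run time. `use lib`
# and PERL5LIB have the same effect, but before the script body runs.
unshift @INC, $libdir;
my $loaded_after = eval { require My::Demo::Wrapper; 1 } ? 1 : 0;

print "before: loaded=$loaded_before\n";
print "error:  ", (split /\n/, $error_before)[0], "\n" unless $loaded_before;
print "after:  loaded=$loaded_after, hello() says ",
      My::Demo::Wrapper::hello(), "\n" if $loaded_after;
```

The first require fails and the second succeeds with nothing changed except the contents of @INC, which is why no environment variable pointing at the t_coffee binary can affect this particular error.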
You will need to set the environment variable PERL5LIB to use your bioperl libraries (I am also assuming you have installed the bioperl-run package), i.e.

export PERL5LIB=/somedir/bioperl-run-1.4/

To test that you have set the PERL5LIB variable correctly, you should be able to type perldoc Bio::Tools::Run::Alignment::TCoffee, and if you get the documentation you should be good to go.

You can also set the location of the modules inside your script with:

BEGIN { unshift @INC,"/somedir/bioperl-run-1.4"; };

or

use lib "somedir";

Regards,
Tim

From michael.watson at bbsrc.ac.uk Fri Jun 3 08:53:10 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jun 3 08:45:33 2005
Subject: [Bioperl-l] Bio::Graphics quickie!
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D5D9@iahce2knas1.iah.bbsrc.reserved>

Hi

How do I change the font size of a feature label in tracks of a Bio::Graphics::Panel object?

Thanks
Mick

From nit_822001 at yahoo.com Fri Jun 3 07:50:09 2005
From: nit_822001 at yahoo.com (NITIN kumar)
Date: Fri Jun 3 09:56:07 2005
Subject: [Bioperl-l] How to Install
Message-ID: <20050603115009.51952.qmail@web52607.mail.yahoo.com>

Hi,

I am a student of bioinformatics and I have a PC running Linux Enterprise Edition with Perl 5.8. Please help me install bioperl on my PC.

From brian_osborne at cognia.com Fri Jun 3 10:28:01 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jun 3 10:23:29 2005
Subject: [Bioperl-l] How to Install
In-Reply-To: <20050603115009.51952.qmail@web52607.mail.yahoo.com>
Message-ID:

Nitin,

Have you been using the INSTALL instructions?

http://bioperl.org/Core/Latest/INSTALL

Brian O.

From jason.stajich at duke.edu Fri Jun 3 10:31:57 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jun 3 10:25:27 2005
Subject: [Bioperl-l] Reroot Tree ?
In-Reply-To:
References:
Message-ID:

Of course it looks at leaves - those can be the only labeled nodes in the tree sometimes. So your problem is getting a node by name, not the re-root itself?
See the test in t/Tree.t as to how it is often used, but basically:

my ($node) = $tree->find_node(-id => 'node1');

You can also do it this way:

my @nodes = grep { $_->id eq 'node1' } $tree->get_nodes;

I did update the re-root code since bioperl 1.4 so you may also want to grab all the latest modules in the Bio/Tree and Bio/TreeIO directories out of CVS.

--jason

On May 30, 2005, at 7:36 AM, Ferdinand Marlétaz wrote:

> Hi,
>
> Well, I'm trying to reroot a population of trees and I can't manage
> to do it! In fact, the reroot function asks for $node and I'd like to
> reroot with a specific taxon of my tree, so I don't know the node name
> on which I must reroot. I've tried find_node but it doesn't seem
> to work (I don't think it looks at leaves).
> So, what should I do?
>
>
> Thanks
>
> Ferdi
>
> _____________________________
> Ferdinand Marlétaz
> Evolution et phylogénie des métazoaires
> UMR 6540 DIMAR
> Rue Batterie des Lions
> 13007 MARSEILLE
> Tel. 33 (0)4 91 04 16 54
> Port. 33 (0)6 30 35 58 49
> Mail. Ferdinand.Marletaz@ens-lyon.fr

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From akarger at CGR.Harvard.edu Fri Jun 3 12:43:22 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri Jun 3 12:51:15 2005
Subject: [Bioperl-l] New GO file format
Message-ID: <339D68B133EAD311971E009027DC47970301D00E@montecarlo.cgr.harvard.edu>

Hi.

Do the bioperl GO tools handle the new OBO file format? All I want to do is find every GO term's parent. (I found go-perl, which I believe can do it: I'm just wondering if I can do it within Bioperl.)

Thanks,

-Amir Karger

From lstein at cshl.edu Fri Jun 3 13:21:51 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Jun 3 13:15:37 2005
Subject: [Bioperl-l] Bio::Graphics quickie!
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D5D9@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D5D9@iahce2knas1.iah.bbsrc.reserved> Message-ID: <200506031321.52193.lstein@cshl.edu> You can control the font size of each track by supplying -font to the track definition. Values are gdSmallFont, gdMediumFont, gdMediumBoldFont and gdLargeFont. You can also pass a GD::Font object created from your favorite font file, as described in the GD documentation. Sadly, TrueType fonts are not supported at this time. Lincoln On Friday 03 June 2005 08:53 am, michael watson (IAH-C) wrote: > Hi > > How do I change the font size of a feature label in tracks of a > Bio::Graphics::Panel object? > > Thanks > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From zhoujie at fudan.edu.cn Sat Jun 4 20:48:53 2005 From: zhoujie at fudan.edu.cn (Jie Zhou) Date: Sat Jun 4 08:44:00 2005 Subject: [Bioperl-l] A problem when converting Genbank->GFF->png using script Message-ID: <1117932533.13493.13.camel@debian.ibsfu> Hi all, I wanted to convert a genbank file into a png file, trying to show the features presented in the genbank file in a graph. I used the following two scripts in bioperl: "feature_draw.PLS", "genbank2gff3.PLS". And I got some error message as follows: ----------------------------------------------------------------- jzhou@debian:~/results/GFF$ perl genbank2gff3.PLS test Processing file test... Use of uninitialized value in concatenation (.) or string at genbank2gff3.PLS line 172. working on contig NM_005476... 
GFF3 saved to ./.gff jzhou@debian:~/results/GFF$ mv .gff test.gff jzhou@debian:~/results/GFF$ perl feature_draw.PLS test.gff >test.png Can't use string ("Bio::DB::GFF") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.8.4/Bio/DB/GFF.pm line 2198, <> line 4. ----------------------------------------------------------------- I checked the gff file, it looks like this: ----------------------------------------------------------------- ##gff-version 3 ##sequence-region NM_005476 1 3760 ##source bp_genbank2gff3.pl NM_005476 GenBank region 1 3760 . . . ID=NM_005476 NM_005476 GenBank databank_entry 1 3760 . + . ID=GenBank:databank_entry:NM_005476:1:3760;mol_type=mRNA;db_xref=taxon:9606;map=9p13.2;chromosome=9;organism=Homo+sapiens NM_005476 GenBank gene 1 3760 . + . ID=gene:GNE;db_xref=GeneID:10020,HPRD:HPRD_04825,MIM:603824;gene=GNE;note=synonyms:+NM%2C+DMRV%2C+IBM2%2C+Uae1%2C+GLCNE NM_005476 GenBank mRNA 127 2295 . + . ID=mRNA:GNE.t01;Parent=gene:GNE;gene=GNE NM_005476 GenBank CDS 127 2295 . + . Parent=mRNA:GNE.t01;protein_id=NP_005467.1;gene=GNE;note=N-acylmannosamine . .(omitted) . NM_005476 GenBank exon 127 2295 . + . Parent=mRNA:GNE.t01;gene=GNE NM_005476 GenBank polyA_signal 3722 3727 . + . Parent=gene:GNE;gene=GNE NM_005476 GenBank polyA_site 3745 3745 . + . Parent=gene:GNE;gene=GNE >NM_005476 GCTCTGCCTGCTTCGTGGCGCTTGGTTCGTCCCTCGCCCGAGGAGCGCGGTGGCGGCGTG ...(ommited) ------------------------------------------------------------------ I also checked the genbank file I input, it's a correct one. I wonder, is the procedure right? Why does the error happen? Thanks very much for any help. Regards, Jie From amackey at pcbi.upenn.edu Sat Jun 4 14:03:14 2005 From: amackey at pcbi.upenn.edu (Aaron J. 
Mackey) Date: Sat Jun 4 13:55:31 2005 Subject: [Bioperl-l] bioperl-ext1.4 installation problem (Sean MacEachern) In-Reply-To: References: Message-ID: <6BC45CE1-CB14-4414-9CBE-52C3D09BD190@pcbi.upenn.edu> You need to copy the config.h and os.h files into the io_lib include installation directory. -Aaron On Jun 2, 2005, at 3:51 AM, Sean.Maceachern@dpi.vic.gov.au wrote: > however I > came across another error which I believe may have something to do > with a > line out of place in one of the io_lib *.h files but was unable to > determine which file it was (Read.h os.h etc...). -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: amackey@pcbi.upenn.edu office: 215-898-1205 fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. University Avenue Philadelphia, PA 19104-6017 From hlapp at gmx.net Sun Jun 5 20:29:53 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Jun 5 20:22:02 2005 Subject: [Bioperl-l] New GO file format In-Reply-To: <339D68B133EAD311971E009027DC47970301D00E@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC47970301D00E@montecarlo.cgr.harvard.edu> Message-ID: <42eeee65cc9731cf2b7be11713eef7a0@gmx.net> I've been writing an adaptor layer in bioperl that will transparently adapt go-perl into the bioperl object model. I'm 90% there, what's left is mostly adding tests (and then probably debugging why it doesn't work ;). -hilmar On Jun 3, 2005, at 9:43 AM, Amir Karger wrote: > Hi. > > Do the bioperl GO tools handle the new OBO file format? All I want to > do is > find every GO term's parent. (I found go-perl, which I believe can do > it: > I'm just wondering if I can do it within Bioperl.) 
> > Thanks, > > -Amir Karger > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From xuying at sibs.ac.cn Mon Jun 6 03:45:31 2005 From: xuying at sibs.ac.cn (xuying) Date: Mon Jun 6 03:39:22 2005 Subject: [Bioperl-l] staden package building problem Message-ID: <20050606074648.EE71610DFD5@smtp.sibsnet.org> Hi all: When I was compiling the staden package, it aborted with the follow messages. ............ Makefile:73: warning: overriding commands for target `clean' mk/global.mk:256: warning: ignoring old commands for target `clean' Makefile:88: warning: overriding commands for target `depend' mk/global.mk:515: warning: ignoring old commands for target `depend' Makefile:128: warning: overriding commands for target `distsrc' mk/global.mk:494: warning: ignoring old commands for target `distsrc' cd Misc && make all make[1]: Entering directory `/home/xuying/staden-src-1-5-3/src/Misc' make[1]: Nothing to be done for `all'. 
make[1]: Leaving directory `/home/xuying/staden-src-1-5-3/src/Misc' cd io_lib && make all make[1]: Entering directory `/home/xuying/staden-src-1-5-3/src/io_lib' Makefile:129: dependencies: No such file or directory Makefile:161: warning: overriding commands for target `distsrc' ../mk/global.mk:494: warning: ignoring old commands for target `distsrc' cc -L../lib/linux-binaries -shared -o ../lib/linux-binaries/libread.so read/linux-binaries/Read.o read/linux-binaries/translate.o read/linux-binaries/scf_extras.o utils/linux-binaries/find.o utils/linux-binaries/mach-io.o utils/linux-binaries/traceType.o utils/linux-binaries/read_alloc.o utils/linux-binaries/compress.o utils/linux-binaries/open_trace_file.o scf/linux-binaries/read_scf.o scf/linux-binaries/write_scf.o scf/linux-binaries/misc_scf.o exp_file/linux-binaries/expFileIO.o plain/linux-binaries/seqIOPlain.o abi/linux-binaries/fpoint.o abi/linux-binaries/seqIOABI.o alf/linux-binaries/seqIOALF.o ctf/linux-binaries/ctfCompress.o ctf/linux-binaries/seqIOCTF.o ztr/linux-binaries/compression.o ztr/linux-binaries/ztr_translate.o ztr/linux-binaries/ztr.o -lz -lmisc cd progs && make make[2]: Entering directory `/home/xuying/staden-src-1-5-3/src/io_lib/progs' Makefile:84: dependencies: No such file or directory make[2]: Nothing to be done for `all'. make[2]: Leaving directory `/home/xuying/staden-src-1-5-3/src/io_lib/progs' make[1]: Leaving directory `/home/xuying/staden-src-1-5-3/src/io_lib' cd tk_utils && make all make[1]: Entering directory `/home/xuying/staden-src-1-5-3/src/tk_utils' Makefile:123: warning: overriding commands for target `distsrc' ../mk/global.mk:494: warning: ignoring old commands for target `distsrc' ln -s ../licence/boxes.h . ln: `./boxes.h': File exists make[1]: *** [.links] Error 1 make[1]: Leaving directory `/home/xuying/staden-src-1-5-3/src/tk_utils' make: *** [tk_utils] Error 2 What's the problem? I just installed the exact external package required manually. Please help!!! Best Regards!        
 xuying         xuying@sibs.ac.cn           2005-06-06 From amackey at pcbi.upenn.edu Mon Jun 6 10:38:26 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Jun 6 10:35:12 2005 Subject: [Bioperl-l] staden package building problem In-Reply-To: <20050606074648.EE71610DFD5@smtp.sibsnet.org> References: <20050606074648.EE71610DFD5@smtp.sibsnet.org> Message-ID: Unfortunately, I think you need to ask the staden package developers, as the source of your errors doesn't have anything to do with BioPerl. But, it looks like a symlink is trying to be made over an existing file; I'd suggest investigating this, and possibly starting over with a clean slate. -Aaron On Jun 6, 2005, at 3:45 AM, xuying wrote: > Hi all: > When I was compiling the staden package, it aborted with the > follow messages. > ............ > Makefile:73: warning: overriding commands for target `clean' > mk/global.mk:256: warning: ignoring old commands for target `clean' > Makefile:88: warning: overriding commands for target `depend' > mk/global.mk:515: warning: ignoring old commands for target `depend' > Makefile:128: warning: overriding commands for target `distsrc' > mk/global.mk:494: warning: ignoring old commands for target `distsrc' > cd Misc && make all > make[1]: Entering directory `/home/xuying/staden-src-1-5-3/src/Misc' > make[1]: Nothing to be done for `all'. 
> make[1]: Leaving directory `/home/xuying/staden-src-1-5-3/src/Misc' > cd io_lib && make all > make[1]: Entering directory `/home/xuying/staden-src-1-5-3/src/io_lib' > Makefile:129: dependencies: No such file or directory > Makefile:161: warning: overriding commands for target `distsrc' > ../mk/global.mk:494: warning: ignoring old commands for target > `distsrc' > cc -L../lib/linux-binaries -shared -o ../lib/linux-binaries/ > libread.so read/linux-binaries/Read.o read/linux-binaries/ > translate.o read/linux-binaries/scf_extras.o utils/linux-binaries/ > find.o utils/linux-binaries/mach-io.o utils/linux-binaries/ > traceType.o utils/linux-binaries/read_alloc.o utils/linux-binaries/ > compress.o utils/linux-binaries/open_trace_file.o scf/linux- > binaries/read_scf.o scf/linux-binaries/write_scf.o scf/linux- > binaries/misc_scf.o exp_file/linux-binaries/expFileIO.o plain/linux- > binaries/seqIOPlain.o abi/linux-binaries/fpoint.o abi/linux- > binaries/seqIOABI.o alf/linux-binaries/seqIOALF.o ctf/linux- > binaries/ctfCompress.o ctf/linux-binaries/seqIOCTF.o ztr/linux- > binaries/compression.o ztr/linux-binaries/ztr_translate.o ztr/linux- > binaries/ztr.o -lz -lmisc > cd progs && make > make[2]: Entering directory `/home/xuying/staden-src-1-5-3/src/ > io_lib/progs' > Makefile:84: dependencies: No such file or directory > make[2]: Nothing to be done for `all'. > make[2]: Leaving directory `/home/xuying/staden-src-1-5-3/src/ > io_lib/progs' > make[1]: Leaving directory `/home/xuying/staden-src-1-5-3/src/io_lib' > cd tk_utils && make all > make[1]: Entering directory `/home/xuying/staden-src-1-5-3/src/ > tk_utils' > Makefile:123: warning: overriding commands for target `distsrc' > ../mk/global.mk:494: warning: ignoring old commands for target > `distsrc' > ln -s ../licence/boxes.h . 
> ln: `./boxes.h': File exists
> make[1]: *** [.links] Error 1
> make[1]: Leaving directory `/home/xuying/staden-src-1-5-3/src/tk_utils'
> make: *** [tk_utils] Error 2
>
>
> What's the problem? I just installed the exact external package
> required manually. Please help!!!
>
>
> Best Regards!
>
>
> xuying
> xuying@sibs.ac.cn
> 2005-06-06

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: amackey@pcbi.upenn.edu
office: 215-898-1205
fax: 215-746-6697
postal: Penn Genomics Institute
Goddard Labs 212
415 S. University Avenue
Philadelphia, PA 19104-6017

From Jonathan_Epstein at nih.gov Mon Jun 6 10:35:41 2005
From: Jonathan_Epstein at nih.gov (Jonathan Epstein)
Date: Mon Jun 6 10:39:31 2005
Subject: [Bioperl-l] 1-year fellowship opportunity at NIH for youngish BioPerl-er
Message-ID: <6.2.1.2.2.20050606102705.02d05970@nihexchange4.nih.gov>

Hi,

We have an opportunity for a good Perl programmer with some BioPerl experience for a one-year appointment. This is suitable, e.g., for someone just out of college or who has just obtained a master's degree. See:
http://www.training.nih.gov/student/Pre-IRTA/irtamanualpostbac.asp

It doesn't pay particularly well and there is a requirement of US citizenship or US permanent residency ("green card"), but it is a great opportunity IMO. It's located in suburban Washington DC (Maryland suburbs).

Contact me directly if you're interested.

[ apologies if this post was inappropriate; I didn't see anything that prohibited it, but clarification would be helpful ]

Jonathan

Jonathan Epstein Jonathan_Epstein@nih.gov
Head, Unit on Biologic Computation (301)402-4563
Office of the Scientific Director Bldg 31, Room 2A47
Nat. Inst.
of Child Health & Human Development 31 Center Drive
National Institutes of Health Bethesda, MD 20892

From jason.stajich at duke.edu Mon Jun 6 15:58:22 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jun 6 15:50:36 2005
Subject: [Bioperl-l] AlignIO and bl2seq
In-Reply-To: <000501c567aa$bd414600$e6028a0a@GOLHARMOBILE1>
References: <000501c567aa$bd414600$e6028a0a@GOLHARMOBILE1>
Message-ID: <989FD4FC-753D-45F3-8895-6ADE7C76309F@duke.edu>

> I know I should be using SearchIO to get the HSPs

Not to be trite, so why don't you? The AlignIO bl2seq parsing is just a hack to delegate to the SearchIO objects for the parsing. I only updated it to use SearchIO when we stopped supporting BPbl2seq parsing. Look at the code and you'll see what it is doing I hope. I would be more in favor of removing AlignIO::bl2seq anyways but I am a big believer in keeping the API as stable as possible - at least not removing functionality without good reason.

To answer your question - probably because when the aln object is made from the HSP object we don't initialize the score field. The question would be which score would you want - bit score or some people might expect e-value (even if it isn't a score).

The Search objects are just going to be richer wrt the pairwise aln data so I would start with SearchIO - you can always get Bio::SimpleAlign objects back out with the $hsp->get_aln method.

HTH,
-jason

On Jun 2, 2005, at 3:39 PM, Ryan Golhar wrote:

> Hi all,
>
> I'm trying to parse some alignments performed using bl2seq. My
> code is
> as follows:
>
> my $output = `bl2seq -p blastn -i seq1 -j seq2`;
> my $in = Bio::AlignIO->new(-fh => new IO::String($output), -format =>
> 'bl2seq');
> my $aln = $in->next_aln();
>
> if (defined($aln)) {
> print "Score: ", $aln->score, "\n";
> } else {
> print "n/a";
> }
>
> When it comes to printing the score, nothing gets printed out, which
> makes sense because blast gives a list of HSPs or none if there aren't
> any.
So, how do I get the first HSP from the output using AlignIO? > > I know I should be using SearchIO to get the HSPs, but I thought I > would > try it with AlignIO as its documented, but I can't get it working. > Any > ideas??? > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From golharam at umdnj.edu Mon Jun 6 16:15:42 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon Jun 6 16:08:10 2005 Subject: [Bioperl-l] AlignIO and bl2seq In-Reply-To: <989FD4FC-753D-45F3-8895-6ADE7C76309F@duke.edu> Message-ID: <002301c56ad4$842119e0$3500a8c0@GOLHARMOBILE1> I went for AlignIO because I came across it first and it looked like the most simply way of doing it. I think what you just wrote in your message should either be in the docs or tutorial as a recommendation indicating we should use SearchIO instead. Ryan -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Monday, June 06, 2005 3:58 PM To: golharam@umdnj.edu Cc: 'Bioperl List' Subject: Re: [Bioperl-l] AlignIO and bl2seq I know I should be using SearchIO to get the HSPs Not to be trite, so why don't you? The AlignIO bl2seq parsing is just a hack to delegate to the SearchIO objects for the parsing. I only updated it to use SearchIO when we stopped supporting BPbl2seq parsing. Look at the code and you'll see what it is doing I hope. I would be more in favor of removing AlignIO::bl2seq anyways but I am a big believer in keeping the API as stable as possible - at least not removing functionality without good reason. To answer you question - probably because when the aln object is made from the HSP object we don't initialize the score field. The question would be which score would you want - bit score or some people might expect e-value (even if it isn't a score). 
The Search objects are just going to be richer wrt the pairwise aln data so I would start with SearchIO - you can always get Bio::SimpleAlign objects back out with the $hsp->get_aln method. HTH, -jason On Jun 2, 2005, at 3:39 PM, Ryan Golhar wrote: Hi all, I'm trying to parse some alignments performed using bl2seq. My code is as follows: my $output = `bl2seq -p blastn -i seq1 -j seq2`; my $in = Bio::AlignIO->new(-fh => new IO::String($output), -format => 'bl2seq'); my $aln = $in->next_aln(); if (defined($aln)) { print "Score: ", $aln->score, "\n"; } else { print "n/a"; } When it comes to printing the score, nothing gets printed out, which makes sense because blast gives a list of HSPs or none if there aren't any. So, how do I get the first HSP from the output using AlignIO? I know I should be using SearchIO to get the HSPs, but I thought I would try it with AlignIO as its documented, but I can't get it working. Any ideas??? Ryan _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From lifei03 at gmail.com Mon Jun 6 20:43:59 2005 From: lifei03 at gmail.com (Frank Lee) Date: Mon Jun 6 20:36:51 2005 Subject: [Bioperl-l] how to use bioperl to do Z-scores test Message-ID: <1e3d81a105060617437e25966a@mail.gmail.com> Can anybody give me some ideas about how to use bioperl module to do Z-scores test (Say, Wilcoxon).? Thanks very much -- Do not guess who I am. I am not Bush in BlackHouse From sdavis2 at mail.nih.gov Mon Jun 6 21:16:00 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon Jun 6 21:09:08 2005 Subject: [Bioperl-l] how to use bioperl to do Z-scores test References: <1e3d81a105060617437e25966a@mail.gmail.com> Message-ID: <008001c56afe$77a3fb40$5179f345@WATSON> Frank, I don't think there is a bioperl module to do a t-test, but I could be wrong. 
But you might want to look here: http://search.cpan.org/~yunfang/Statistics-TTest-1.1.0/TTest.pm Also, the open-source statistical package R (http://www.r-project.org) is quite good for such things and can be scripted or used from perl. What are you trying to do? Sean ----- Original Message ----- From: "Frank Lee" To: Sent: Monday, June 06, 2005 8:43 PM Subject: [Bioperl-l] how to use bioperl to do Z-scores test > Can anybody give me some ideas about how to use bioperl module to do > Z-scores test (Say, Wilcoxon).? > > Thanks very much > > -- > Do not guess who I am. I am not Bush in BlackHouse > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From lifei03 at gmail.com Tue Jun 7 00:25:54 2005 From: lifei03 at gmail.com (Frank Lee) Date: Tue Jun 7 00:19:10 2005 Subject: [Bioperl-l] how to use bioperl to do Z-scores test In-Reply-To: <008001c56afe$77a3fb40$5179f345@WATSON> References: <1e3d81a105060617437e25966a@mail.gmail.com> <008001c56afe$77a3fb40$5179f345@WATSON> Message-ID: <1e3d81a10506062125441e4171@mail.gmail.com> Hi, Sean What I am doing now is like this: I got a number from my data. Say 122. Then I generate one thansand numbers randomly according to similar criteria, Say, (7, 19, 45,199,......................). I wish to tell whether the result 122 is random distributed in the random dataset or it is small or large. And I wish to caculate the p-vlaue as a cutoff since I have thousands of such data(set). Can you give me some suggestions? Thanks! Frank On 6/7/05, Sean Davis wrote: > Frank, > > I don't think there is a bioperl module to do a t-test, but I could be > wrong. But you might want to look here: > > http://search.cpan.org/~yunfang/Statistics-TTest-1.1.0/TTest.pm > > Also, the open-source statistical package R (http://www.r-project.org) is > quite good for such things and can be scripted or used from perl. What are > you trying to do? 
>
> Sean
>
> ----- Original Message -----
> From: "Frank Lee"
> To:
> Sent: Monday, June 06, 2005 8:43 PM
> Subject: [Bioperl-l] how to use bioperl to do Z-scores test
>
>
> > Can anybody give me some ideas about how to use bioperl module to do
> > Z-scores test (Say, Wilcoxon).?
> >
> > Thanks very much
> >
> > --
> > Do not guess who I am. I am not Bush in BlackHouse
>

--
Do not guess who I am. I am not Bush in BlackHouse

From sdavis2 at mail.nih.gov Tue Jun 7 06:29:56 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jun 7 06:26:29 2005
Subject: [Bioperl-l] how to use bioperl to do Z-scores test
In-Reply-To: <1e3d81a10506062125441e4171@mail.gmail.com>
References: <1e3d81a105060617437e25966a@mail.gmail.com> <008001c56afe$77a3fb40$5179f345@WATSON> <1e3d81a10506062125441e4171@mail.gmail.com>
Message-ID: <3fa0f3811320a583cdfe4111d6983817@mail.nih.gov>

On Jun 7, 2005, at 12:25 AM, Frank Lee wrote:

> Hi, Sean
>
> What I am doing now is like this:
>
> I got a number from my data. Say 122. Then I generate one
> thansand numbers randomly according to similar criteria, Say, (7, 19,
> 45,199,......................). I wish to tell whether the result
> 122 is random distributed in the random dataset or it is small or
> large. And I wish to caculate the p-vlaue as a cutoff since I have
> thousands of such data(set).
>
> Can you give me some suggestions? Thanks!
>

You could try using code like:

#!/usr/bin/perl
use strict;
use warnings;

# Observed data
my $datapoint = 90;

# Generate 1000 random integers (from 0 to 99)
my @j;
for my $i (1..1000) {
    $j[$i-1] = int(rand(100));
}

# These lines return the number of permutation values > (<=) the
# observed value.
my $count_greater = grep { $_ > $datapoint } @j;
my $count_less    = grep { $_ <= $datapoint } @j;

# Output the result.
# Your mileage may vary depending on whether you want a one-sided or
# two-sided test.
print "Original Data Point: $datapoint\n";
print "Permutation values greater than Data Point: $count_greater (p=".($count_greater/1000).")\n";
print "Permutation values less than or equal to Data Point: $count_less (p=".($count_less/1000).")\n";

If you are working with a large dataset, you may really want to consider moving over to a statistical package like R, which has many facilities for doing all kinds of testing like this (and more).

Sean

> On 6/7/05, Sean Davis wrote:
>> Frank,
>>
>> I don't think there is a bioperl module to do a t-test, but I could be
>> wrong. But you might want to look here:
>>
>> http://search.cpan.org/~yunfang/Statistics-TTest-1.1.0/TTest.pm
>>
>> Also, the open-source statistical package R
>> (http://www.r-project.org) is
>> quite good for such things and can be scripted or used from perl.
>> What are
>> you trying to do?
>>
>> Sean
>>
>> ----- Original Message -----
>> From: "Frank Lee"
>> To:
>> Sent: Monday, June 06, 2005 8:43 PM
>> Subject: [Bioperl-l] how to use bioperl to do Z-scores test
>>
>>
>>> Can anybody give me some ideas about how to use bioperl module to do
>>> Z-scores test (Say, Wilcoxon).?
>>>
>>> Thanks very much
>>>
>>> --
>>> Do not guess who I am. I am not Bush in BlackHouse
>>>
>>
>>
>>
>
>
> --
> Do not guess who I am.
I am not Bush in BlackHouse From sdavis2 at mail.nih.gov Tue Jun 7 07:08:41 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Jun 7 07:02:16 2005 Subject: [Bioperl-l] how to use bioperl to do Z-scores test In-Reply-To: <3fa0f3811320a583cdfe4111d6983817@mail.nih.gov> References: <1e3d81a105060617437e25966a@mail.gmail.com> <008001c56afe$77a3fb40$5179f345@WATSON> <1e3d81a10506062125441e4171@mail.gmail.com> <3fa0f3811320a583cdfe4111d6983817@mail.nih.gov> Message-ID: <73f27eab5fa23176e65f1bdcd6d2c6d7@mail.nih.gov> On Jun 7, 2005, at 6:29 AM, Sean Davis wrote: > > On Jun 7, 2005, at 12:25 AM, Frank Lee wrote: > >> Hi, Sean >> >> What I am doing now is like this: >> >> I got a number from my data. Say 122. Then I generate one >> thansand numbers randomly according to similar criteria, Say, (7, 19, >> 45,199,......................). I wish to tell whether the result >> 122 is random distributed in the random dataset or it is small or >> large. And I wish to caculate the p-vlaue as a cutoff since I have >> thousands of such data(set). >> >> Can you give me some suggestions? Thanks! >> > > You could try using code like: > > #!/usr/bin/perl > use strict; > use warnings; > > # Observed data > my $datapoint=90; > # generate 1000 random numbers (from 1 to 100) > my @j; > for my $i ((1..1000)) { > $j[$i-1] = int(rand(100)); > } > > # these lines return the number of permutation values > (<=) the > observed > # value. 
> my $count_greater = grep {$_>$datapoint} @j; > my $count_less = grep {$_<=$datapoint} @j; > > # output the result > # You're mileage may vary depending on if you want a one-sided test or > two-sided > print "Original Data Point: $datapoint\n"; > print "Permutation values greater than Data Point: $count_greater > (p=".($count_greater/1000).") \n"; > print "Permutation values less than Data Point: $count_less > (p=".($count_less/1000).") \n"; > > > If you are working with a large dataset, you may really want to > consider moving over to a statistical package like R, which has many > facilities for doing all kinds of testing like this (and more). > And I didn't mention--the perl rand function generates numbers from a uniform distribution, which may or may not be what you want. Sean From Shilpa.Dixit at s1.com Tue Jun 7 06:49:13 2005 From: Shilpa.Dixit at s1.com (Shilpa Dixit) Date: Tue Jun 7 07:56:55 2005 Subject: [Bioperl-l] About suggestions and volunteer Message-ID: Hello, I am interested to be a volunteer for perl community. What can I do Human Genome Project? I have bit worked on Perl, C, C++, Oracle(Database). Also Can you tell where can I put suggestions / ask queries related to DNA/Human Genome Project? Thanks & Regards, Shilpa From lifei03 at gmail.com Tue Jun 7 08:46:49 2005 From: lifei03 at gmail.com (Frank Lee) Date: Tue Jun 7 08:46:11 2005 Subject: [Bioperl-l] how to use bioperl to do Z-scores test In-Reply-To: <3fa0f3811320a583cdfe4111d6983817@mail.nih.gov> References: <1e3d81a105060617437e25966a@mail.gmail.com> <008001c56afe$77a3fb40$5179f345@WATSON> <1e3d81a10506062125441e4171@mail.gmail.com> <3fa0f3811320a583cdfe4111d6983817@mail.nih.gov> Message-ID: <1e3d81a1050607054611eb3514@mail.gmail.com> Thanks very much. I will try the R-project. On 6/7/05, Sean Davis wrote: > > On Jun 7, 2005, at 12:25 AM, Frank Lee wrote: > > > Hi, Sean > > > > What I am doing now is like this: > > > > I got a number from my data. Say 122. 
> >> > >> Sean > >> > >> ----- Original Message ----- > >> From: "Frank Lee" > >> To: > >> Sent: Monday, June 06, 2005 8:43 PM > >> Subject: [Bioperl-l] how to use bioperl to do Z-scores test > >> > >> > >>> Can anybody give me some ideas about how to use bioperl module to do > >>> Z-scores test (Say, Wilcoxon).? > >>> > >>> Thanks very much > >>> > >>> -- > >>> Do not guess who I am. I am not Bush in BlackHouse > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> > >> > > > > > > -- > > Do not guess who I am. I am not Bush in BlackHouse > > -- Do not guess who I am. I am not Bush in BlackHouse
From Annie.Law at nrc-cnrc.gc.ca Tue Jun 7 16:25:56 2005 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Tue Jun 7 16:21:21 2005 Subject: [Bioperl-l] Entrez Gene parser questions Message-ID: <10C94843061E094A98C02EB77CFC328722FF4A@nrcmrdex1d.imsb.nrc.ca> Hi, Thanks for all of the replies. I am using bioperl 1.4 and I have done the following: 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, make test, make install) 2. got a copy of the entrezgene.pm from bioperl-live and put it in the Bio/SeqIO directory 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in the corresponding directory 4. the Bio::Cluster::SequenceFamily file was already up to date 5. also have all the most recent bioperl-live Bio::SeqFeature::Gene modules I grabbed the ASN file from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ I then wrote a simple perl script which includes: #!/usr/bin/perl -w use strict; use Bio::ASN1::EntrezGene; use Bio::SeqIO; use Bio::Annotation::DBLink; use Bio::Cluster::SequenceFamily; my $file = '/var/lib/mysql/Homo_sapiens'; my $seqio = Bio::SeqIO->new(-format => 'entrezgene', -file => $file, -debug => 'on'); my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; After I run this script I get the following errors: Useless use of hash element in void context at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148.
Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150.

Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.

Can't locate object method "add_tag_value" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234.

I'm not sure what I am missing.

Thanks so much,
Annie.

-----Original Message-----
From: Stefan Kirov [mailto:skirov@utk.edu]
Sent: Thursday, June 02, 2005 1:39 PM
To: Law, Annie
Cc: 'bioperl-l@bioperl.org'
Subject: Re: [Bioperl-l] Entrez Gene parser questions

Annie,
I am sorry to say there is no good documentation yet, since I am still evaluating and debugging the code and I have a bunch of other stuff to deal with right now. My apologies, but there will not be comprehensive docs for at least another 2 weeks.

First, install:
I think you need to install Bio::SeqIO::entrezgene.pm and Bio::ASN1::EntrezGene. You will have to update Bio::Annotation::DBLink. Make sure you also have Bio::Cluster::SequenceFamily and all modules in Bio::SeqFeature::Gene. These also may need updating.

When you do

    my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                                -file   => $file,
                                -debug  => 'on');
    my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;

$gene->accession_number will give you the Entrezgene id and $gene->id will give you the gene symbol. If you go through the annotations:

    my $ann=$gene->annotation; # where most of the data is

    my @dblinks=$ann->get_Annotations('DBLink');
    foreach my $dblink (@dblinks) {
        print 'Unigene id for this gene is '.$dblink->id if (lc($dblink->database) eq 'unigene');
    }

    my @nameann=$ann->get_Annotations('Official Full Name');
    print 'Gene name is ',$nameann[0]->as_text,"\n";

    my $gid=$gene->accession_number; # Entrez Gene id (assuming this is what $gid stood for)
    my @go=$ann->get_Annotations('OntologyTerm');
    foreach my $go (@go) {
        next if ($go->authority eq 'STS marker'); # Unless you want STS markers...
        my @refs=$go->term->references; # you should get just one
        print join(',',$gid,$go->ontology->name,$go->identifier,$go->name,'GO:'.$go->identifier,$refs[0]->medline),"\n";
    }

    # $genestructure is the second value returned by next_seq above
    my @associated_seq=$genestructure->get_members();

    foreach my $contig (@associated_seq) {
        if ($contig->namespace eq 'refseq') { # Only refseq, there is also mrna, product and genomic as options...
            my @prod=$contig->annotation->get_Annotations('product');
            my @transvar=$contig->annotation->get_Annotations('simple');
            my $transvar='';
            foreach my $sv (@transvar) {
                $transvar=$sv->value if ($sv->tagname eq 'Transcriptional Variant');
            }
            my $assembly='';
            foreach my $ann ($contig->annotation->get_Annotations('dblink')) {
                $assembly.=$ann->primary_id . '-' if ($ann->optional_id eq 'Source Sequence');
            }
            chop $assembly; # drop the trailing '-'
            my $prod;
            foreach my $p (@prod) {
                if ($p) {
                    $prod=$p->value;
                    last;
                }
            }
        }
    }

Hope this helps and will get you started at least. Let me know if you have more questions.
Stefan

Law, Annie wrote:

>Hi,
>
>I would appreciate help with the following. First of all, I would like
>to say it's great that an Entrez gene parser was written. I just have some questions to get me started.
>
>A) I already have bioperl 1.4 installed. I would like to know if I can
>install the Entrez gene parser and
>Stefan Kirov's entrezgene.pm without reinstalling bioperl. If this is not advised then I would reinstall bioperl.
>
>I was going to 1. install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc...)
>               2. just take only the entrezgene.pm and util.pm file and put it in the appropriate directory
>                  (which util.pm should I use?? I searched CPAN and got many results for util.pm
>                  is it Biblio::Util, Boulder::Util, or something else??)
>
>B) For each Entrez gene id how do I access the associated Unigene id,
>accession numbers, gene symbol, gene name, and GO IDs.
>
>Also, I have been looking at the code and the POD within it. Are there
>some other places that I can look for documentation?
>
>Thanks so much!!
>Annie.
> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From skirov at utk.edu Tue Jun 7 16:47:11 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Jun 7 16:39:11 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4A@nrcmrdex1d.imsb.nrc.ca> References: <10C94843061E094A98C02EB77CFC328722FF4A@nrcmrdex1d.imsb.nrc.ca> Message-ID: <42A607CF.8040601@utk.edu> Hi Annie, Few more things to update: Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason your script does not work. Update also Bio::SeqIO::entrezegene- I added some fixed so you will not see those nasty warning when using strict (nothing critical though). I am not sure we have the same versions for Bio::ASN1::EntrezGene. I have 1.0.7. Where did you get it from (maybe mine is older)? Stefan Law, Annie wrote: >Hi, > >Thanks for all of the replies. >I am using bioperl 1.4 and I have done the following: >1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, make test, make install) >2. got a copy of the entrezgene.pm from bioperl-live and put it in the Bio/SeqIO directory >3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in the corresponding directory >4. the Bio::Cluster::SequenceFamily file was already up to date >5. 
also have all the most recent bioperl-live Bio::SeqFeature::Gene modules > >I grabbed the ASN file from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ > >I then wrote a simple perl script which includes: > >#!/usr/bin/perl -w > >use strict; >use Bio::ASN1::EntrezGene; >use Bio::SeqIO; >use Bio::Annotation::DBLink; >use Bio::Cluster::SequenceFamily; > >my $file = '/var/lib/mysql/Homo_sapiens'; > >my $seqio = Bio::SeqIO->new(-format => 'entrezgene', > -file => $file, > -debug => 'on'); > >my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; > >After I run this script I get the following errors: > >Useless use of hash element in void context at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. > >Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. > >Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. > >Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. > >Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. > >Can't locate object method "add_tag_value" via package "Bio::SeqFeature::Generic" at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. > >I'm not sure what I am missing? > >Thanks so much, >Annie. > > > > >-----Original Message----- >From: Stefan Kirov [mailto:skirov@utk.edu] >Sent: Thursday, June 02, 2005 1:39 PM >To: Law, Annie >Cc: 'bioperl-l@bioperl.org' >Subject: Re: [Bioperl-l] Entrez Gene parser questions > > >Annie, >I am sorry to say there is no good documentation yet, since I am still >evaluating and debugging the code and I have buch of other stuff to deal >with ir tight now, so appolozie but there will not be a comprehensive >docs for at least another 2 weeks. >First install: >I think you need to install Bio::SeqIO::entrezgene.pm and >Bio::ASN1::EnztrezGene. 
You will have to update Bio::Annotation::DBLink. >Make sure you also have Bio::Cluster::SequenceFamily and all modules in >Bio::SeqFeature::Gene. These also may need updating. >When you do > my $seqio = Bio::SeqIO->new(-format => 'entrezgene', > -file => $file, > -debug => 'on'); > my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; $gene->accession_number will give you the Entrezgene id and $gene->id will give you the gene symbol. If you go through the >annotations: >my $ann=$gene->annotation; (where most of the data is) > >my @dblinks= $ann->get_Annotations('DBLink') >foreach my $dblink (@dblinks) { > print 'Unigene id for this gene is '.$dblink->id if (lc($dblink->database) eq 'unigene'); } > >my @nameann=$ann->get_Annotations('Official Full Name') >print 'Gene name is ',$nameann[0]->as_text,"\n"; > >my @go=$ann->get_Annotations('OntologyTerm'); >foreach my $go (@go) { >next if ($go->authority eq 'STS marker'); #Unless you want STS markers... my @refs=$go->term->references;#you should get just one print join(',',$gid,$go->ontology->name,$go->identifier,$go->name,'GO:'.$go->identifier,$refs[0]->medline); >} >my @associated_seq=$struct->get_members(); > >foreach my $seq (@associated_seq) { > if ($contig->namespace eq 'refseq') {#Only refseq, there is also mrna, product and genomic as options... > my @prod=$contig->annotation->get_Annotations('product'); > my @transvar=$contig->annotation->get_Annotations('simple'); > my $transvar=''; > foreach my $sv (@transvar) { > $transvar=$sv->value if ($sv->tagname eq 'Transcriptional Variant'); > } > my $assembly; > foreach my $ann ($contig->annotation->get_Annotations('dblink')) { > $assembly.=$ann->primary_id . '-' if ($ann->optional_id eq 'Source Sequence'); > } > chop $assembly; > my $prod; > foreach my $p (@prod) { > if ($p) { > $prod=$p->value; > last; > } > } >} > >Hope this helps and will get you started at least. Let me know if you >have more question. 
>Stefan > >Law, Annie wrote: > > > >>Hi, >> >>I would appreciate help with the following. First of all, I would like >>to say it's great that a Entrez gene parser was written. I just have some questions to get me started. >> >>A) I already have bioperl 1.4 installed. I would like to know if I can >>install the Entrez gene parser and >>Stefan Kirov's entrezgene.pm with out reinstalling bioperl. If this is not advised then I would reinstall bioperl. >> >>I was going to 1. install the Bio::ASN1::EntrezGene module (perl Makefile.PL, make, make test, make install, etc...) >> 2. just take only the entrezgene.pm and util.pm file and put it in the appropriate directory >> (which util.pm shold I use?? I searched CPAN and got many results for util.pm >> is it Biblio::Util, Boulder::Util, or something else??) >> >>B) For each Entrez gene id how do I access the associated Unigene id, >>accession numbers, gene symbol, gene name, And GO IDs. >> >>Also, I have been looking at the code and the POD within it. Are there >>some other places that I can look for documentation? >> >>Thanks so much!! >>Annie. >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164 USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From mingyi.liu at gpc-biotech.com Tue Jun 7 19:06:28 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Tue Jun 7 18:58:41 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <42A607CF.8040601@utk.edu>
References: <10C94843061E094A98C02EB77CFC328722FF4A@nrcmrdex1d.imsb.nrc.ca> <42A607CF.8040601@utk.edu>
Message-ID: <42A62874.6060408@gpc-biotech.com>

Hi, Annie

Annie must've been using the Bio::ASN1::EntrezGene V1.09, which worked fine with your entrezgene.pm in bioperl-live at least a couple of days ago, when I replied to Annie. V1.09 also did not have any significant changes in parser code vs. 1.07.

I guess it has something to do with Bioperl V1.4, since the sample script you used below runs perfectly fine on my Bioperl 1.5 + entrezgene.pm. The procedure you used was also correct (there's no need to 'use Bio::ASN1::EntrezGene;' though). Try following Stefan's suggestion first, although I don't quite understand why Bio::ASN1::EntrezGene complained about a non-numeric operand, which can only result from calling the function incorrectly. Stefan's entrezgene.pm calls it correctly and it runs perfectly for me. I was using entrezgene.pm CVS version 1.9.

Keep us posted on your progress.

BTW, Stefan, you could call Bio::ASN1::EntrezGene with either '-file' (bioperl style) or 'file' (I allowed both when I implemented this switch, I think). I usually just use file, which might explain your note in the comment in your script. I just noticed the comment.

Mingyi

Stefan Kirov wrote:

> Hi Annie,
> Few more things to update:
> Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason
> your script does not work.
Update also Bio::SeqIO::entrezegene- I > added some fixed so you will not see those nasty warning when using > strict (nothing critical though). > I am not sure we have the same versions for Bio::ASN1::EntrezGene. I > have 1.0.7. Where did you get it from (maybe mine is older)? > Stefan > > Law, Annie wrote: > >> Hi, >> >> Thanks for all of the replies. >> I am using bioperl 1.4 and I have done the following: >> 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, >> make test, make install) >> 2. got a copy of the entrezgene.pm from bioperl-live and put it in >> the Bio/SeqIO directory >> 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in >> the corresponding directory >> 4. the Bio::Cluster::SequenceFamily file was already up to date >> 5. also have all the most recent bioperl-live Bio::SeqFeature::Gene >> modules >> >> I grabbed the ASN file from >> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ >> >> I then wrote a simple perl script which includes: >> >> #!/usr/bin/perl -w >> >> use strict; >> use Bio::ASN1::EntrezGene; >> use Bio::SeqIO; >> use Bio::Annotation::DBLink; >> use Bio::Cluster::SequenceFamily; >> >> my $file = '/var/lib/mysql/Homo_sapiens'; >> >> my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >> -file => $file, >> -debug => 'on'); >> >> my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; >> >> After I run this script I get the following errors: >> >> Useless use of hash element in void context at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. >> >> Argument "-trimopt" isn't numeric in numeric eq (==) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >> >> Pseudo-hashes are deprecated at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. >> >> Use of uninitialized value in string ne at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. 
>> >> Pseudo-hashes are deprecated at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >> >> Can't locate object method "add_tag_value" via package >> "Bio::SeqFeature::Generic" at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. >> >> I'm not sure what I am missing? >> >> Thanks so much, >> Annie. > From skirov at utk.edu Tue Jun 7 20:38:53 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Jun 7 20:30:46 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: <42A62874.6060408@gpc-biotech.com> References: <10C94843061E094A98C02EB77CFC328722FF4A@nrcmrdex1d.imsb.nrc.ca> <42A607CF.8040601@utk.edu> <42A62874.6060408@gpc-biotech.com> Message-ID: <42A63E1D.3090307@utk.edu> I am pretty sure it is the old Bio::SeqFeature::Generic that breaks the code (actually the underlying modules I mentioned...). Thanks for the -file option... There are also some leftover commenst :-) . I need to clean the code a bit... Stefan Mingyi Liu wrote: > Hi, Annie > > Annie must've been using the Bio::ASN1::EntrezGene V1.09, which worked > fine with your entrezgene.pm in bioperl-live at least a couple days > ago, when I replied to Annie. V1.09 also did not have any significant > changes in parser code vs. 1.07. > > I guess it has something to do with V1.4 since the sample script you > used below runs perfectly fine on my Bioperl 1.5 + entrezgene.pm. The > procedure you used was also correct (there's no need to 'use > Bio::ASN1::EntrezGene;' though). Try follow Stefan's suggestion first > (although I don't quite understand why Bio::ASN1::EntrezGene > complained about non numerica operon, which can only result from > incorrect calling of the function. But Stefan's entrezgene was using > correct calling and it runs perfectly for me. I was using > entrezgene.pm CVS version 1.9. > > Keep us posted on your progress. 
> > BTW, Stefan, you could call Bio::ASN1::EntrezGene with either '-file' > (bioperl style) or 'file' (I allowed both when I implemented this > switch I think. I usually just use file, which might explain your note > in the comment in your script. I just noticed the comment. > > Mingyi > > Stefan Kirov wrote: > >> Hi Annie, >> Few more things to update: >> Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason >> your script does not work. Update also Bio::SeqIO::entrezegene- I >> added some fixed so you will not see those nasty warning when using >> strict (nothing critical though). >> I am not sure we have the same versions for Bio::ASN1::EntrezGene. I >> have 1.0.7. Where did you get it from (maybe mine is older)? >> Stefan >> >> Law, Annie wrote: >> >>> Hi, >>> >>> Thanks for all of the replies. >>> I am using bioperl 1.4 and I have done the following: >>> 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, >>> make test, make install) >>> 2. got a copy of the entrezgene.pm from bioperl-live and put it in >>> the Bio/SeqIO directory >>> 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in >>> the corresponding directory >>> 4. the Bio::Cluster::SequenceFamily file was already up to date >>> 5. 
also have all the most recent bioperl-live Bio::SeqFeature::Gene >>> modules >>> >>> I grabbed the ASN file from >>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ >>> >>> I then wrote a simple perl script which includes: >>> >>> #!/usr/bin/perl -w >>> >>> use strict; >>> use Bio::ASN1::EntrezGene; >>> use Bio::SeqIO; >>> use Bio::Annotation::DBLink; >>> use Bio::Cluster::SequenceFamily; >>> >>> my $file = '/var/lib/mysql/Homo_sapiens'; >>> >>> my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >>> -file => $file, >>> -debug => 'on'); >>> >>> my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; >>> >>> After I run this script I get the following errors: >>> >>> Useless use of hash element in void context at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. >>> >>> Argument "-trimopt" isn't numeric in numeric eq (==) at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >>> >>> Pseudo-hashes are deprecated at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. >>> >>> Use of uninitialized value in string ne at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. >>> >>> Pseudo-hashes are deprecated at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >>> >>> Can't locate object method "add_tag_value" via package >>> "Bio::SeqFeature::Generic" at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. >>> >>> I'm not sure what I am missing? >>> >>> Thanks so much, >>> Annie. >> >> > > From amackey at virginia.edu Tue Jun 7 11:30:15 2005 From: amackey at virginia.edu (Aaron J. 
Mackey) Date: Wed Jun 8 07:47:30 2005 Subject: [Bioperl-l] Fwd: BioPerl Blast Tools References: <20050607154259.owcoogs4kgwgw0cs@webmail1.manchester.ac.uk> Message-ID: <4D57A819-E562-4CC6-8756-BA4A72271016@virginia.edu> Begin forwarded message: > From: Paul Fisher > Date: June 7, 2005 10:42:59 AM EDT > Subject: RE: BioPerl Blast Tools > I am currently working through my final research project for my MSc > Bioinformatics degree, which is to develop a comparison tool for > comparing > two blast files. > > What i require now is a means of filtering the results held in a > blast file > depending on species and/or chromosome, i.e the hits that are > present in > Homo sapiens and/or chromosome 3, then showing on the new hits from > the > blast file (when compared to the original file) to the user. > > Is there any such implementation at present within the Bioperl > modules or > any assistance given by the NCBI staff and their API's. > > > Any help is much appreciated. > Thanks, > Paul Fisher. From dbastar at yahoo.com Wed Jun 8 07:43:41 2005 From: dbastar at yahoo.com (Duangdaow Kanhasiri) Date: Wed Jun 8 07:47:44 2005 Subject: [Bioperl-l] Error loading sequence with load_seqdatabase.pl Message-ID: <20050608114341.29861.qmail@web40728.mail.yahoo.com> Hi, I've used the bioperl script load_seqdatabase.pl (came with the biosql' scripts) to load the bacterial sequence in genbank format(*.gbk) into PostgreSQL 8.0 database on Linux machine as: $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk & Where under the /export/Bacteria/ path are the Bacteria's name path e.g. Acinetobacter_sp_ADP1 and the file name are like NC_006824.gbk. 
Previously it loaded some sequences into some tables in the biosql database (count from table bioentry):

bioseq=# select count(*) from bioentry;
 count
-------
    33
(1 row)

However, after a while it stopped with the error:

[1]+  Segmentation fault      perl load_seqdatabase.pl /export/Bacteria/*/*.gbk &

I then checked and removed the *.gbk files that had already been loaded into the table, leaving only the unloaded ones, and ran the script again. It continued to work for some time and stopped again. I repeated the process several times until 173 sequences were loaded into the table:

bioseq=# select count(*) from bioentry;
 count
-------
   173
(1 row)

The program then stopped again, and this time it wouldn't run anymore even when I tried with only one file. The error is still the same:

$ perl load_seqdatabase.pl /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk
Segmentation fault
$

Now I can't load the rest of my sequences into the database. I would very much appreciate it if anyone knows how to solve the "Segmentation fault" problem.

Regards,
Davina

-------------- next part --------------
A non-text attachment was scrubbed...
Name: load_seqdatabase.pl
Type: application/octet-stream
Size: 22486 bytes
Desc: 3434098052-load_seqdatabase.pl
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050608/1c6b46ab/load_seqdatabase-0001.obj

From sdavis2 at mail.nih.gov Wed Jun 8 09:43:03 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jun 8 09:35:58 2005
Subject: [Bioperl-l] Bio::SeqIO::entrezgene refseq question
Message-ID: <6c528f4e127499b0431bfc524aaaa88f@mail.nih.gov>

Stefan and Mingyi,

I don't think I have officially thanked you both for a nice contribution for those of us using annotation heavily. I am just picking up with using the new parsers.
Just a quick question: How would I get something like the gene2refseq file out of the parser? In particular, I would like to get all rna/protein pairs (as pairs) where they exist, as well as rna and protein that do not have pairs. I guess I haven't waded into the object structure far enough, yet.

Thanks,
Sean

From skirov at utk.edu Wed Jun 8 10:12:32 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Jun 8 10:06:00 2005
Subject: [Bioperl-l] Bio::SeqIO::entrezgene refseq question
In-Reply-To: <6c528f4e127499b0431bfc524aaaa88f@mail.nih.gov>
References: <6c528f4e127499b0431bfc524aaaa88f@mail.nih.gov>
Message-ID: <42A6FCD0.1010207@utk.edu>

Sean,
Here it is:

    my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
                                -file   => $file,
                                -debug  => 'on');
    my ($gene,$genestructure,$uncaptured) = $seqio->next_seq;

The object you need is $genestructure: it has all sequence-related data (genomic contigs, Refseq, GenBank).
Then to get the data you ask for:

    my @seqobj=$genestructure->get_members();
    foreach my $contig (@seqobj) {
        # skip unless refseq transcript; alternatively you can read the
        # protein as well and put it in a hash/array
        next unless (($contig->namespace eq 'refseq')&&($contig->authority=~/mrna/i));
        my @prod=$contig->annotation->get_Annotations('product');
        # I would expect only one product....
        my $protid=$prod[0] ? $prod[0]->value : undef;
    }

Then if you wish you can get the whole record for the protein out of @seqobj.
It is not your fault, just the docs are not there yet (any moment now :-) )...
Let me know if this helps and if you need anything else.
Stefan

Sean Davis wrote:

> Stefan and Mingyi,
>
> I don't think I have officially thanked you both for a nice
> contribution for those of us using annotation heavily. I am just
> picking up with using the new parsers. Just a quick question: How
> would I get something like the gene2refseq file out of the parser?
In > particular, I would like to get all rna/protein pairs (as pairs) where > they exist, as well as rna and protein that do not have pairs. I > guess I haven't waded into the object structure far enough, yet. > > Thanks, > Sean > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From Annie.Law at nrc-cnrc.gc.ca Wed Jun 8 11:40:24 2005 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Wed Jun 8 11:32:20 2005 Subject: [Bioperl-l] Entrez Gene parser questions Message-ID: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> Hi, I went to bioperl-live and copied the newest entrezgene.pm and Bio::AnnotatableI, Bio::SeqFeatureI modules and I got the following output (same output with entrezgene.pm saved on May 11 as well). The complaint about the module Generic.pm is gone now. I am using EntrezGene V 1.09 picked up from CPAN. I'm not sure what is wrong. I guess I could delete the Bio/ASN1 directory and install EntrezGene V.1.07 but that is not supposed to make a difference? Here's a wild stab but are the additional messages related to a statement I read somewhere about the first record being empty? Is my next alternative to install Bioperl 1.5 and try again? If so, am I correct in thinking that to 'uninstall' there is no such Uninstall command but I would just delete the /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then Use CPAN to install bioperl 1.5 Thanks so much, Annie. Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 543. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554. 
Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554. Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555. Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 295. Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 323. -----Original Message----- From: Stefan Kirov [mailto:skirov@utk.edu] Sent: Tuesday, June 07, 2005 8:39 PM To: Mingyi Liu Cc: Law, Annie; Bioperl list Subject: Re: [Bioperl-l] Entrez Gene parser questions I am pretty sure it is the old Bio::SeqFeature::Generic that breaks the code (actually the underlying modules I mentioned...). Thanks for the -file option... There are also some leftover commenst :-) . I need to clean the code a bit... Stefan Mingyi Liu wrote: > Hi, Annie > > Annie must've been using the Bio::ASN1::EntrezGene V1.09, which worked > fine with your entrezgene.pm in bioperl-live at least a couple days > ago, when I replied to Annie. V1.09 also did not have any significant > changes in parser code vs. 1.07. > > I guess it has something to do with V1.4 since the sample script you > used below runs perfectly fine on my Bioperl 1.5 + entrezgene.pm. The > procedure you used was also correct (there's no need to 'use > Bio::ASN1::EntrezGene;' though). 
Try follow Stefan's suggestion first > (although I don't quite understand why Bio::ASN1::EntrezGene > complained about non numerica operon, which can only result from > incorrect calling of the function. But Stefan's entrezgene was using > correct calling and it runs perfectly for me. I was using > entrezgene.pm CVS version 1.9. > > Keep us posted on your progress. > > BTW, Stefan, you could call Bio::ASN1::EntrezGene with either '-file' > (bioperl style) or 'file' (I allowed both when I implemented this > switch I think. I usually just use file, which might explain your note > in the comment in your script. I just noticed the comment. > > Mingyi > > Stefan Kirov wrote: > >> Hi Annie, >> Few more things to update: >> Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason >> your script does not work. Update also Bio::SeqIO::entrezegene- I >> added some fixed so you will not see those nasty warning when using >> strict (nothing critical though). >> I am not sure we have the same versions for Bio::ASN1::EntrezGene. I >> have 1.0.7. Where did you get it from (maybe mine is older)? >> Stefan >> >> Law, Annie wrote: >> >>> Hi, >>> >>> Thanks for all of the replies. >>> I am using bioperl 1.4 and I have done the following: >>> 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, >>> make test, make install) >>> 2. got a copy of the entrezgene.pm from bioperl-live and put it in >>> the Bio/SeqIO directory >>> 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in >>> the corresponding directory >>> 4. the Bio::Cluster::SequenceFamily file was already up to date >>> 5. 
also have all the most recent bioperl-live Bio::SeqFeature::Gene >>> modules >>> >>> I grabbed the ASN file from >>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ >>> >>> I then wrote a simple perl script which includes: >>> >>> #!/usr/bin/perl -w >>> >>> use strict; >>> use Bio::ASN1::EntrezGene; >>> use Bio::SeqIO; >>> use Bio::Annotation::DBLink; >>> use Bio::Cluster::SequenceFamily; >>> >>> my $file = '/var/lib/mysql/Homo_sapiens'; >>> >>> my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >>> -file => $file, >>> -debug => 'on'); >>> >>> my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; >>> >>> After I run this script I get the following errors: >>> >>> Useless use of hash element in void context at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. >>> >>> Argument "-trimopt" isn't numeric in numeric eq (==) at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >>> >>> Pseudo-hashes are deprecated at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. >>> >>> Use of uninitialized value in string ne at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. >>> >>> Pseudo-hashes are deprecated at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >>> >>> Can't locate object method "add_tag_value" via package >>> "Bio::SeqFeature::Generic" at >>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. >>> >>> I'm not sure what I am missing? >>> >>> Thanks so much, >>> Annie. >> >> > > From skirov at utk.edu Wed Jun 8 11:52:37 2005 From: skirov at utk.edu (Stefan Kirov) Date: Wed Jun 8 11:44:30 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> Message-ID: <42A71445.9010308@utk.edu> Annie, Can you just disable use strict for now and tell me again what you see? 
Stefan

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From Annie.Law at nrc-cnrc.gc.ca Wed Jun 8 12:01:52 2005
From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie)
Date: Wed Jun 8 11:53:48 2005
Subject: [Bioperl-l] Entrez Gene parser questions
Message-ID: <10C94843061E094A98C02EB77CFC328722FF4C@nrcmrdex1d.imsb.nrc.ca>

Hi Stefan,

I commented out the use strict and I think I got the same output.
Here's what I got.

Thanks,
Annie.

Argument "-trimopt" isn't numeric in numeric eq (==) at /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450.
Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 543.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554.
Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554.
Use of uninitialized value in substitution (s///) at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555.
Pseudo-hashes are deprecated at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 295.
Use of uninitialized value in string eq at /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 323.

From sdavis2 at mail.nih.gov Wed Jun 8 12:25:30 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jun 8 12:18:45 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <42A71445.9010308@utk.edu>
References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> <42A71445.9010308@utk.edu>
Message-ID: <414778d6b5d2e32b9a0c20192a9afe3f@mail.nih.gov>

Just for reference, I am using CVS bioperl-live, Bio::ASN1::entrezgene 1.09 without any difficulty.

Sean

From skirov at utk.edu Wed Jun 8 12:10:36 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Jun 8 12:44:16 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4C@nrcmrdex1d.imsb.nrc.ca>
References: <10C94843061E094A98C02EB77CFC328722FF4C@nrcmrdex1d.imsb.nrc.ca>
Message-ID: <42A7187C.3090902@utk.edu>

Change
#!/usr/bin/perl -w
to
#!/usr/bin/perl
and let me know again...
Thanks!

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From mingyi.liu at gpc-biotech.com Wed Jun 8 16:44:30 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Wed Jun 8 16:36:28 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca>
References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca>
Message-ID: <42A758AE.70304@gpc-biotech.com>

Hi,

Stefan's right in suggesting you turn off -w, which would make your
script work. But thanks for finding this bug. I just noticed that
entrezgene.pm was actually calling the Bio::ASN1::EntrezGene->next_seq
incorrectly (probably my documentation was a bit confusing & my module
did not follow standard hash-based parameter passing of subroutines).
It should be called like ->next_seq(1), but entrezgene.pm called using
->next_seq(-trimopt => 1). This worked for all of us who do not use
-w, as it would fall back to option '1' in my next_seq function
(exactly as Stefan's calling function wanted). Therefore this bug
went unnoticed until you turned on -w (I guess we were all spoiled by
the easy (and sometimes messy) life of weak datatyping in Perl).

Stefan would you please change the calling of next_seq to
next_seq(1)? This would fix the error messages Annie was seeing.

Thanks,

Mingyi

From skirov at utk.edu Wed Jun 8 16:50:05 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Jun 8 16:41:54 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <42A758AE.70304@gpc-biotech.com>
References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> <42A758AE.70304@gpc-biotech.com>
Message-ID: <42A759FD.7000504@utk.edu>

Sure. Thanks for letting me know.
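
The calling mismatch Mingyi describes (a named-style argument list passed
to a routine that expects one positional, numeric option) can be
reproduced in a few lines. This is only a minimal sketch under stated
assumptions: next_seq_sketch and call_and_collect are hypothetical names
made up for illustration, the 'mode 1'/'mode 2' return values are
invented, and none of it is the actual Bio::ASN1::EntrezGene code. It
just shows how such a call can fall back to the default silently and
only complain once warnings are enabled.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # lexical counterpart of -w for this file

# Hypothetical stand-in for a method that expects ONE positional,
# numeric option. This is NOT the real Bio::ASN1::EntrezGene code.
sub next_seq_sketch {
    my ($option) = @_;
    $option = 1 unless defined $option;
    return 'mode 2' if $option == 2;    # numeric comparison on the first argument
    return 'mode 1';                    # anything else falls back to mode 1
}

# Call the sketch and collect any warnings raised during the call.
sub call_and_collect {
    my @args = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = next_seq_sketch(@args);
    return ( $result, scalar @warnings, @warnings );
}

# Named-style call: the first argument seen by the sub is the string "-trimopt".
my ( $named_result, $named_count, @named_warnings ) =
    call_and_collect( -trimopt => 1 );

# Positional call: the first argument is the number 1.
my ( $plain_result, $plain_count ) = call_and_collect(1);

print "named call:      $named_result, $named_count warning(s)\n";
print "  $named_warnings[0]" if $named_count;
print "positional call: $plain_result, $plain_count warning(s)\n";
```

Both calls end up in the same fallback branch, so the result is the same
either way; the only visible difference is the "isn't numeric" warning on
the named-style call, which fits the account that the mismatch went
unnoticed until -w was switched on.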
Annie, does it work for you now?
Stefan

Mingyi Liu wrote:

> Hi,
>
> Stefan's right in suggesting you turn off -w, which would make your
> script work. But thanks for finding this bug. I just noticed that
> entrezgene.pm was actually calling the Bio::ASN1::EntrezGene->next_seq
> incorrectly (probably my documentation was a bit confusing & my module
> did not follow standard hash-based parameter passing of subroutines).
> It should be called like ->next_seq(1), but entrezgene.pm called using
> ->next_seq(-trimopt => 1). This worked for all of us who do not use
> -w, as it would fall back to option '1' in my next_seq function
> (exactly as Stefan's calling function wanted). Therefore this bug
> went unnoticed until you turned on -w (I guess we were all spoiled by
> the easy (and sometimes messy) life of weak datatyping in Perl).
>
> Stefan would you please change the calling of next_seq to
> next_seq(1)? This would fix the error messages Annie was seeing.
>
> Thanks,
>
> Mingyi
>

From hlapp at gmx.net Wed Jun 8 22:28:47 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jun 8 22:21:43 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca>
References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca>
Message-ID: <7f77f6e5803451c7236755c5550e1023@gmx.net>

Quite frankly, mixing versions in bioperl has never worked really well
and nobody really has time to support this, so I don't think it's good
advice to people to suggest plugging in single modules from a recent
version into an older one. Just update to current cvs and you should be
at a state where at least you're not totally on your own.

As for the output, these are obviously warnings, and instead of asking
you to turn off -w (L.Wall: Known bugs: -w is not on by default.) Stefan
and Mingyi should fix whatever is needed to silence those warnings.
(No offense please)

-hilmar

On Jun 8, 2005, at 11:40 PM, Law, Annie wrote:

> Hi,
>
> I went to bioperl-live and copied the newest entrezgene.pm and
> Bio::AnnotatableI, Bio::SeqFeatureI modules and
> I got the following output (same output with entrezgene.pm saved on
> May 11 as well). The complaint about the module
> Generic.pm is gone now.
> I am using EntrezGene V 1.09 picked up from CPAN.
>
> I'm not sure what is wrong. I guess I could delete the Bio/ASN1
> directory and install EntrezGene V.1.07 but that is not supposed to
> make a difference?
> Here's a wild stab but are the additional messages related to a
> statement I read somewhere about the first record being empty?
>
> Is my next alternative to install Bioperl 1.5 and try again? If so, am
> I correct in thinking that to 'uninstall' there is no such
> Uninstall command but I would just delete the
> /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then
> Use CPAN to install bioperl 1.5
>
> Thanks so much,
> Annie.
>
>
> Argument "-trimopt" isn't numeric in numeric eq (==) at
> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450.
> Pseudo-hashes are deprecated at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.
> Use of uninitialized value in string eq at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 543.
> Use of uninitialized value in string eq at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555.
> Use of uninitialized value in string eq at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555.
> Pseudo-hashes are deprecated at
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416.
> Use of uninitialized value in string eq at > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. > Use of uninitialized value in string eq at > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. > Use of uninitialized value in string eq at > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. > Use of uninitialized value in string eq at > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 295. > Use of uninitialized value in string eq at > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 323. > > > -----Original Message----- > From: Stefan Kirov [mailto:skirov@utk.edu] > Sent: Tuesday, June 07, 2005 8:39 PM > To: Mingyi Liu > Cc: Law, Annie; Bioperl list > Subject: Re: [Bioperl-l] Entrez Gene parser questions > > > I am pretty sure it is the old Bio::SeqFeature::Generic that breaks the > code (actually the underlying modules I mentioned...). > Thanks for the -file option... There are also some leftover commenst > :-) > . I need to clean the code a bit... > Stefan > > Mingyi Liu wrote: > >> Hi, Annie >> >> Annie must've been using the Bio::ASN1::EntrezGene V1.09, which worked >> fine with your entrezgene.pm in bioperl-live at least a couple days >> ago, when I replied to Annie. V1.09 also did not have any significant >> changes in parser code vs. 1.07. >> >> I guess it has something to do with V1.4 since the sample script you >> used below runs perfectly fine on my Bioperl 1.5 + entrezgene.pm. The >> procedure you used was also correct (there's no need to 'use >> Bio::ASN1::EntrezGene;' though). Try follow Stefan's suggestion first >> (although I don't quite understand why Bio::ASN1::EntrezGene >> complained about non numerica operon, which can only result from >> incorrect calling of the function. But Stefan's entrezgene was using >> correct calling and it runs perfectly for me. I was using >> entrezgene.pm CVS version 1.9. >> >> Keep us posted on your progress. 
>> >> BTW, Stefan, you could call Bio::ASN1::EntrezGene with either '-file' >> (bioperl style) or 'file' (I allowed both when I implemented this >> switch I think. I usually just use file, which might explain your note >> in the comment in your script. I just noticed the comment. >> >> Mingyi >> >> Stefan Kirov wrote: >> >>> Hi Annie, >>> Few more things to update: >>> Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason >>> your script does not work. Update also Bio::SeqIO::entrezegene- I >>> added some fixed so you will not see those nasty warning when using >>> strict (nothing critical though). >>> I am not sure we have the same versions for Bio::ASN1::EntrezGene. I >>> have 1.0.7. Where did you get it from (maybe mine is older)? >>> Stefan >>> >>> Law, Annie wrote: >>> >>>> Hi, >>>> >>>> Thanks for all of the replies. >>>> I am using bioperl 1.4 and I have done the following: >>>> 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, >>>> make test, make install) >>>> 2. got a copy of the entrezgene.pm from bioperl-live and put it in >>>> the Bio/SeqIO directory >>>> 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in >>>> the corresponding directory >>>> 4. the Bio::Cluster::SequenceFamily file was already up to date >>>> 5. 
also have all the most recent bioperl-live Bio::SeqFeature::Gene >>>> modules >>>> >>>> I grabbed the ASN file from >>>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ >>>> >>>> I then wrote a simple perl script which includes: >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use strict; >>>> use Bio::ASN1::EntrezGene; >>>> use Bio::SeqIO; >>>> use Bio::Annotation::DBLink; >>>> use Bio::Cluster::SequenceFamily; >>>> >>>> my $file = '/var/lib/mysql/Homo_sapiens'; >>>> >>>> my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >>>> -file => $file, >>>> -debug => 'on'); >>>> >>>> my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; >>>> >>>> After I run this script I get the following errors: >>>> >>>> Useless use of hash element in void context at >>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. >>>> >>>> Argument "-trimopt" isn't numeric in numeric eq (==) at >>>> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >>>> >>>> Pseudo-hashes are deprecated at >>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. >>>> >>>> Use of uninitialized value in string ne at >>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. >>>> >>>> Pseudo-hashes are deprecated at >>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >>>> >>>> Can't locate object method "add_tag_value" via package >>>> "Bio::SeqFeature::Generic" at >>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. >>>> >>>> I'm not sure what I am missing? >>>> >>>> Thanks so much, >>>> Annie. >>> >>> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From skirov at utk.edu Wed Jun 8 23:01:11 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Jun 8 22:53:17 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <7f77f6e5803451c7236755c5550e1023@gmx.net>
References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> <7f77f6e5803451c7236755c5550e1023@gmx.net>
Message-ID: <42A7B0F7.6020107@utk.edu>

Hilmar Lapp wrote:

> Quite frankly, mixing versions in bioperl has never worked really well
> and nobody really has time to support this, so I don't think it's good
> advice to people to suggest plugging in single modules from a recent
> version into an older one. Just update to current cvs and you should
> be at a state where at least you're not totally on your own.

Agreed, but there may be a need for a transitional period. Annie, why do
you need bioperl 1.4? Is there something that lacks backward
compatibility? I think in general most modules try to provide it...

>
> As for the output, these are obviously warnings, and instead of asking
> you to turn off -w (L.Wall: Known bugs: -w is not on by default.)
> Stefan and Mingyi should fix whatever is needed to silence those
> warnings. (No offense please)

None taken. But this is not even a developer's release and I believe
such bugs are usual and normal (as long as they are not there forever).
I am creating tests and I will fix the warnings when I commit these, but
I believe there are no deadlines I am missing. Please correct me if I am
wrong and I can probably spend some more time fixing the warnings
produced by the entrezgene parser.
Overall I don't believe it is a bad idea for people to explore
particular code without strict (provided this does not go into
production). That's why the term 'unstable' exists, does it not?
Stefan > > -hilmar > > On Jun 8, 2005, at 11:40 PM, Law, Annie wrote: > >> Hi, >> >> I went to bioperl-live and copied the newest entrezgene.pm and >> Bio::AnnotatableI, Bio::SeqFeatureI modules and >> I got the following output (same output with entrezgene.pm saved on >> May 11 as well). The complaint about the module >> Generic.pm is gone now. >> I am using EntrezGene V 1.09 picked up from CPAN. >> >> I'm not sure what is wrong. I guess I could delete the Bio/ASN1 >> directory and install EntrezGene V.1.07 but that is not supposed to >> make a difference? >> Here's a wild stab but are the additional messages related to a >> statement I read somewhere about the first record being empty? >> >> Is my next alternative to install Bioperl 1.5 and try again? If so, >> am I correct in thinking that to 'uninstall' there is no such >> Uninstall command but I would just delete the >> /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then >> Use CPAN to install bioperl 1.5 >> >> Thanks so much, >> Annie. >> >> >> Argument "-trimopt" isn't numeric in numeric eq (==) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >> Pseudo-hashes are deprecated at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 543. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554. >> Use of uninitialized value in substitution (s///) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554. >> Use of uninitialized value in substitution (s///) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555. >> Pseudo-hashes are deprecated at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. 
>> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 295. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 323. >> >> >> -----Original Message----- >> From: Stefan Kirov [mailto:skirov@utk.edu] >> Sent: Tuesday, June 07, 2005 8:39 PM >> To: Mingyi Liu >> Cc: Law, Annie; Bioperl list >> Subject: Re: [Bioperl-l] Entrez Gene parser questions >> >> >> I am pretty sure it is the old Bio::SeqFeature::Generic that breaks the >> code (actually the underlying modules I mentioned...). >> Thanks for the -file option... There are also some leftover commenst :-) >> . I need to clean the code a bit... >> Stefan >> >> Mingyi Liu wrote: >> >>> Hi, Annie >>> >>> Annie must've been using the Bio::ASN1::EntrezGene V1.09, which worked >>> fine with your entrezgene.pm in bioperl-live at least a couple days >>> ago, when I replied to Annie. V1.09 also did not have any significant >>> changes in parser code vs. 1.07. >>> >>> I guess it has something to do with V1.4 since the sample script you >>> used below runs perfectly fine on my Bioperl 1.5 + entrezgene.pm. The >>> procedure you used was also correct (there's no need to 'use >>> Bio::ASN1::EntrezGene;' though). Try follow Stefan's suggestion first >>> (although I don't quite understand why Bio::ASN1::EntrezGene >>> complained about non numerica operon, which can only result from >>> incorrect calling of the function. But Stefan's entrezgene was using >>> correct calling and it runs perfectly for me. I was using >>> entrezgene.pm CVS version 1.9. 
>>> >>> Keep us posted on your progress. >>> >>> BTW, Stefan, you could call Bio::ASN1::EntrezGene with either '-file' >>> (bioperl style) or 'file' (I allowed both when I implemented this >>> switch I think. I usually just use file, which might explain your note >>> in the comment in your script. I just noticed the comment. >>> >>> Mingyi >>> >>> Stefan Kirov wrote: >>> >>>> Hi Annie, >>>> Few more things to update: >>>> Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason >>>> your script does not work. Update also Bio::SeqIO::entrezegene- I >>>> added some fixed so you will not see those nasty warning when using >>>> strict (nothing critical though). >>>> I am not sure we have the same versions for Bio::ASN1::EntrezGene. I >>>> have 1.0.7. Where did you get it from (maybe mine is older)? >>>> Stefan >>>> >>>> Law, Annie wrote: >>>> >>>>> Hi, >>>>> >>>>> Thanks for all of the replies. >>>>> I am using bioperl 1.4 and I have done the following: >>>>> 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, >>>>> make test, make install) >>>>> 2. got a copy of the entrezgene.pm from bioperl-live and put it in >>>>> the Bio/SeqIO directory >>>>> 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in >>>>> the corresponding directory >>>>> 4. the Bio::Cluster::SequenceFamily file was already up to date >>>>> 5. 
also have all the most recent bioperl-live Bio::SeqFeature::Gene >>>>> modules >>>>> >>>>> I grabbed the ASN file from >>>>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ >>>>> >>>>> I then wrote a simple perl script which includes: >>>>> >>>>> #!/usr/bin/perl -w >>>>> >>>>> use strict; >>>>> use Bio::ASN1::EntrezGene; >>>>> use Bio::SeqIO; >>>>> use Bio::Annotation::DBLink; >>>>> use Bio::Cluster::SequenceFamily; >>>>> >>>>> my $file = '/var/lib/mysql/Homo_sapiens'; >>>>> >>>>> my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >>>>> -file => $file, >>>>> -debug => 'on'); >>>>> >>>>> my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; >>>>> >>>>> After I run this script I get the following errors: >>>>> >>>>> Useless use of hash element in void context at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. >>>>> >>>>> Argument "-trimopt" isn't numeric in numeric eq (==) at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >>>>> >>>>> Pseudo-hashes are deprecated at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. >>>>> >>>>> Use of uninitialized value in string ne at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. >>>>> >>>>> Pseudo-hashes are deprecated at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >>>>> >>>>> Can't locate object method "add_tag_value" via package >>>>> "Bio::SeqFeature::Generic" at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. >>>>> >>>>> I'm not sure what I am missing? >>>>> >>>>> Thanks so much, >>>>> Annie. 
>>>>
>>>>
>>>>
>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>

From hlapp at gmx.net Wed Jun 8 23:28:25 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jun 8 23:20:51 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <42A7B0F7.6020107@utk.edu>
References: <10C94843061E094A98C02EB77CFC328722FF4B@nrcmrdex1d.imsb.nrc.ca> <7f77f6e5803451c7236755c5550e1023@gmx.net> <42A7B0F7.6020107@utk.edu>
Message-ID: <2c1d295d825d53b71de8eec964defea2@gmx.net>

On Jun 9, 2005, at 11:01 AM, Stefan Kirov wrote:

>> As for the output, these are obviously warnings, and instead of
>> asking you to turn off -w (L.Wall: Known bugs: -w is not on by
>> default.) Stefan and Mingyi should fix whatever is needed to silence
>> those warnings. (No offense please)
>
> None taken. But this is not even a developer's release and I believe
> such bugs are usual and normal (as long as they are not there for
> ever). I am creating tests and I will fix the warnings when I commit
> these, but I believe there are no deadlines I am missing. Please
> correct me if I am wrong and I can probably spend some more time
> fixing the warnings produced by entrezgene parser.
> Overall I don't believe it is a bad idea for people to explore
> particular code without strict (provided this does not go into
> production). That's why the term 'unstable' exists, does it not?
>

Sure - I haven't said you should fix it ASAP or should have fixed it
already, have I? Developers who are annoyed can BTW fix it themselves
:-) The thing is just that not everybody is a developer in this regard.

-hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Wed Jun 8 23:46:10 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jun 8 23:38:05 2005
Subject: [Bioperl-l] Error loading sequence with load_seqdatabase.pl
In-Reply-To: <20050609033924.11682.qmail@web40708.mail.yahoo.com>
References: <20050609033924.11682.qmail@web40708.mail.yahoo.com>
Message-ID: <7c10e9211ba9b2d13026002a5e1f4390@gmx.net>

I've never heard about Rocks Cluster as an OS ... 2 GB seems plenty; even
if your db is on the same machine it shouldn't be a problem (unless
you've given most of the memory to Pg's shared pool). You need to use a
monitoring tool like 'top' to see where your resources go and whether
something exhausts the memory.

If you write a simple script that creates a simple dummy table and
inserts 1 million random rows, does that raise any problem? (Like, does
it accumulate memory or even crash too?)

-hilmar

On Jun 9, 2005, at 11:39 AM, Duangdaow Kanhasiri wrote:

> The
>
> OS: Rocks Cluster v 3.3
> Total Memory: 2 GB
> DBD::Pg version: 1.42
> DBI version: 1.48
>
>
> --- Hilmar Lapp wrote:
>
>> What OS are you running this on? How much memory have you got on the
>> machine on which you run the script, and on the machine on which you
>> run the database? Are these the same or not? Which version of DBI and
>> DBD::Pg?
>>
>> This hasn't been reported by anyone else really so I suspect it's
>> either due to too limited memory, or a problem in the DBD driver or in
>> the DBI compiled code. Can you watch the process (using, e.g., top) and
>> see how fast it increases in memory consumption? Since you can continue
>> when you restart it's not something specific to one sequence that would
>> trigger the problem; rather it appears whenever you have run through a
>> certain number of entries the process dies.
>> >> -hilmar >> >> On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri >> wrote: >> >>> Hi, >>> >>> I've used the bioperl script load_seqdatabase.pl >> (came >>> with the biosql' scripts) to load the bacterial >>> sequence in genbank format(*.gbk) into PostgreSQL >> 8.0 >>> database on Linux machine as: >>> >>> $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk >> & >>> >>> Where under the /export/Bacteria/ path are the >>> Bacteria's name path e.g. Acinetobacter_sp_ADP1 >> and >>> the file name are like NC_006824.gbk. >>> >>> Previously it used to load some sequences in to >> some >>> tables in biosql database (count from table >> bioentry) >>> >>> bioseq=# select count(*) from bioentry; >>> count >>> ------- >>> 33 >>> (1 row) >>> >>> >>> However, after a while it then stopped with the >> the >>> error: >>> >>> [1]+ Segmentation fault perl >> load_seqdatabase.pl >>> /export/Bacteria/*/*.gbk & >>> >>> I then checked and removed the *.gbk file that >> have >>> already been loaded in to the table, leaving only >> the >>> unloaded ones and ran the scripted again. It >>> continued to work for some times and stopped >> again. I >>> repeated the process several times until 173 >> sequences >>> were loaded into the table: >>> >>> bioseq=# select count(*) from bioentry; >>> count >>> ------- >>> 173 >>> (1 row) >>> >>> The program then stopped again and this time it >>> wouldn't run anymore even I tried with only on >> file. >>> The error is still the same like: >>> >>> $ perl load_seqdatabase.pl >>> >> > /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk >>> Segmentation fault >>> $ >>> >>> Now I couldn't load the rest of my sequences into >> the >>> database anymore. I would be very apprecialed if >> any >>> one knows how to solve the "Segmentation fault" >>> problem? >>> >>> Regards, >>> >>> Davina >>> >>> >>> >>> __________________________________ >>> Discover Yahoo! >>> Have fun online with music videos, cool games, IM >> and more. Check it >>> out! 
>>> http://discover.yahoo.com/online.html
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>
> __________________________________
> Discover Yahoo!
> Get on-the-go sports scores, stock quotes, news and more. Check it out!
> http://discover.yahoo.com/mobile.html
>

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Thu Jun 9 02:15:14 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Jun 9 02:12:02 2005
Subject: [Bioperl-l] Error loading sequence with load_seqdatabase.pl
In-Reply-To: <20050609035557.45275.qmail@web40727.mail.yahoo.com>
References: <20050609035557.45275.qmail@web40727.mail.yahoo.com>
Message-ID:

First off, I wouldn't continue this thread on biosql-l; it's for schema
questions and this clearly isn't one. The script is a perl/bioperl
script and some people on the bioperl-l list may happen to be on a
platform similar to yours.

Second, you're saying now the script wouldn't run at all anymore? With
what error message? You might try to supply --debug as an option if the
script does at least something before it dies. The output will
potentially be extensive, so be sure to capture it in a file, then send
it to me. If the script dies immediately I'm afraid I can't do anything.

My gut feeling is that there is something with your compiler, C runtime
library, or DBI/DBD compiled code that doesn't play well with your perl.
Did you do a binary install of perl or did you compile it from source?
Really your best bet is to find someone who's on the same or a similar
platform and see whether he/she has had similar problems or none of
these.

-hilmar

On Jun 9, 2005, at 11:55 AM, Duangdaow Kanhasiri wrote:

> The system I use has the following configs:
>
> CPU: 2 @ AthlonXP2000
> OS: Rocks Cluster v 3.3
> Total Memory: 2 GB
> DBD::Pg version: 1.42
> DBI version: 1.48
>
> I've attached the output of the top command (top.txt)
> with this mail. Unfortunately the script
> load_seqdatabase.pl wouldn't run anymore, no matter
> how many times I tried running it; therefore, I
> couldn't measure how much it consumes the resource
> (cpu, memory) on the machine.
>
> Regards,
>
> Davina
>
>
> --- Hilmar Lapp wrote:
>
>> What OS are you running this on? How much memory have you got on the
>> machine on which you run the script, and on the machine on which you
>> run the database? Are these the same or not? Which version of DBI and
>> DBD::Pg?
>>
>> This hasn't been reported by anyone else really so I suspect it's
>> either due to too limited memory, or a problem in the DBD driver or in
>> the DBI compiled code. Can you watch the process (using, e.g., top) and
>> see how fast it increases in memory consumption? Since you can continue
>> when you restart it's not something specific to one sequence that would
>> trigger the problem; rather it appears whenever you have run through a
>> certain number of entries the process dies.
>>
>> -hilmar
>>
>> On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri wrote:
>>
>>> Hi,
>>>
>>> I've used the bioperl script load_seqdatabase.pl (came
>>> with the biosql' scripts) to load the bacterial
>>> sequence in genbank format(*.gbk) into PostgreSQL 8.0
>>> database on Linux machine as:
>>>
>>> $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk &
>>>
>>> Where under the /export/Bacteria/ path are the
>>> Bacteria's name path e.g. Acinetobacter_sp_ADP1 and
>>> the file name are like NC_006824.gbk.
>>> >>> Previously it used to load some sequences in to >> some >>> tables in biosql database (count from table >> bioentry) >>> >>> bioseq=# select count(*) from bioentry; >>> count >>> ------- >>> 33 >>> (1 row) >>> >>> >>> However, after a while it then stopped with the >> the >>> error: >>> >>> [1]+ Segmentation fault perl >> load_seqdatabase.pl >>> /export/Bacteria/*/*.gbk & >>> >>> I then checked and removed the *.gbk file that >> have >>> already been loaded in to the table, leaving only >> the >>> unloaded ones and ran the scripted again. It >>> continued to work for some times and stopped >> again. I >>> repeated the process several times until 173 >> sequences >>> were loaded into the table: >>> >>> bioseq=# select count(*) from bioentry; >>> count >>> ------- >>> 173 >>> (1 row) >>> >>> The program then stopped again and this time it >>> wouldn't run anymore even I tried with only on >> file. >>> The error is still the same like: >>> >>> $ perl load_seqdatabase.pl >>> >> > /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk >>> Segmentation fault >>> $ >>> >>> Now I couldn't load the rest of my sequences into >> the >>> database anymore. I would be very apprecialed if >> any >>> one knows how to solve the "Segmentation fault" >>> problem? >>> >>> Regards, >>> >>> Davina >>> >>> >>> >>> __________________________________ >>> Discover Yahoo! >>> Have fun online with music videos, cool games, IM >> and more. Check it >>> out! >>> http://discover.yahoo.com/ >>> >> > online.html_______________________________________ >> >>> ________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> >> > http://portal.open-bio.org/mailman/listinfo/bioperl-l >> -- >> > ------------------------------------------------------------- >> Hilmar Lapp email: lapp >> at gnf.org >> GNF, San Diego, Ca. 
92121 phone: >> +1-858-812-1757 >> > ------------------------------------------------------------- >> >> >> > > > > __________________________________ > Discover Yahoo! > Get on-the-go sports scores, stock quotes, news and more. Check it out! > http://discover.yahoo.com/mobile.html[root@biogenome root]# top > 10:31:14 up 27 days, 21:20, 5 users, load average: 0.00, 0.02, 0.03 > 193 processes: 192 sleeping, 1 running, 0 zombie, 0 stopped > CPU states: cpu user nice system irq softirq iowait > idle > total 1.8% 0.0% 0.0% 0.0% 0.0% 0.0% > 198.0% > cpu00 1.9% 0.0% 0.0% 0.0% 0.0% 0.0% > 98.0% > cpu01 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% > 100.0% > Mem: 2057220k av, 1556640k used, 500580k free, 0k shrd, > 167096k buff > 1101048k actv, 266692k in_d, 39936k in_c > Swap: 4192956k av, 91620k used, 4101336k free > 1196752k cached > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU > COMMAND > 16683 root 23 0 1288 1288 844 R 1.9 0.0 0:00 0 top > 1 root 15 0 520 516 456 S 0.0 0.0 0:29 0 init > 2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 > migration/0 > 3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 > migration/1 > 4 root 15 0 0 0 0 SW 0.0 0.0 0:00 1 > keventd > 5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 > ksoftirqd/0 > 6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 > ksoftirqd/1 > 9 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 > bdflush > 7 root 15 0 0 0 0 SW 0.0 0.0 0:37 0 > kswapd > 8 root 15 0 0 0 0 SW 0.0 0.0 0:24 0 > kscand > 10 root 15 0 0 0 0 SW 0.0 0.0 0:19 0 > kupdated > 11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 > mdrecoveryd > 17 root 25 0 0 0 0 SW 0.0 0.0 0:00 1 > scsi_eh_0 > 18 root 25 0 0 0 0 SW 0.0 0.0 0:00 1 > aacraid > 20 root 25 0 0 0 0 SW 0.0 0.0 0:00 1 > scsi_eh_0 > 23 root 15 0 0 0 0 SW 0.0 0.0 1:29 1 > kjournald > 70 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd > 1165 root 15 0 0 0 0 SW 0.0 0.0 0:49 0 > kjournald > 1418 root 15 0 0 0 0 SW 0.0 0.0 0:00 1 eth0 > 1543 root 15 0 620 608 524 S 0.0 0.0 0:59 0 > syslogd > 1547 root 15 0 484 424 420 S 0.0 0.0 0:00 0 klogd > 1557 root 15 0 456 448 392 S 0.0 0.0 2:04 0 
> irqbalance
>  1565 rpc       15   0   572  548   500 S     0.0  0.0   0:00   0 portmap
>  1584 rpcuser   25   0   716  632   628 S     0.0  0.0   0:00   1 rpc.statd
>  1595 root      15   0   404  388   344 S     0.0  0.0   0:06   0 mdadm
>  1619 root      RT   0   556  456   424 S     0.0  0.0   0:16   1 auditd
>  1629 nobody    15   0  1180 1016   724 S     0.0  0.0  24:57   1 gmetad
>  1658 root      15   0   472  424   400 S     0.0  0.0   0:01   0 pvfsd
>
>
> [root@biogenome DBD]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             5.8G  3.6G  1.9G  66% /
> /dev/sda3             125G   24G   95G  21% /export
> none                 1005M     0 1005M   0% /dev/shm
> tmpfs                 503M  3.5M  499M   1% /var/lib/ganglia/rrds
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From priesel at caesar.de Thu Jun 9 03:19:31 2005
From: priesel at caesar.de (Saskia Priesel)
Date: Thu Jun 9 03:15:05 2005
Subject: [Bioperl-l] Parsing PDB Files
Message-ID: <42A7ED15.4010902@caesar.de>

Hello to all,

I want to parse the sequence from a PDB file. For this I found the
module Bio::Structure::Entry, but I don't understand it. There are only
explanations of the methods but no synopsis. I need help. Does anyone
have an idea?

Regards,
Saskia

From yuichiyoshi at gmail.com Thu Jun 9 05:39:08 2005
From: yuichiyoshi at gmail.com (Yoshida Yuichi)
Date: Thu Jun 9 05:30:59 2005
Subject: [Bioperl-l] Problem in parsing GenBank flatfile
Message-ID:

Dear all,

I am trying to parse a GenBank flatfile (accession number NT_015926)
using the Bio::SeqIO modules, but I cannot.

- - - - - - - - - - - - - Perl program code - - - - - - - - - - - - -
#!/usr/bin/perl

use Bio::SeqIO;

$gbk_filename = shift @ARGV;
$seqin = Bio::SeqIO->new(-file=>$gbk_filename, -format=>'Genbank');
while ($seqobj = $seqin->next_seq) {
    $accession = $seqobj->accession_number,"\n";
    foreach my $feat ($seqobj->get_SeqFeatures()){
        if ($feat->primary_tag eq 'mRNA'){
            $db_gene_name = join(' ',$feat->get_tag_values('gene'));
            $db_transcript_id = join(' ',$feat->get_tag_values('transcript_id'));
            $start = $feat->start;
            $end = $feat->end;
            print $db_transcript_id,"\t",$db_gene_name,"\t",$accession,"\t";
            print $start,"\t",$end,"\n";
        }
    }
}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The following error message is shown.

- - - - - - - - - - - - - error message - - - - - - - - - - - - -
-------------------- WARNING ---------------------
MSG: cannot see new qualifier in feature CDS: aa:OTHER)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: cannot see new qualifier in feature CDS: aa:OTHER)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: cannot see new qualifier in feature CDS: aa:OTHER)
---------------------------------------------------
out of memory
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The part that (I guess) causes the error is shown below.

- - - - - GenBank partial flatfile (NT_015926) - - - - -
     CDS             complement(join(4528741..4528932,4543408..4543490,
                     4581809..4582043,4616648..4616817,4632093..4632236,
                     4643148..4643301))
                     /gene="FLJ21820"
                     /note="go_function: catalytic activity [goid 0003824]
                     [evidence IEA]; go_process: lipid metabolism [goid
                     0006629] [evidence IEA]"
                     /codon_start=1
                     /product="hypothetical protein FLJ21820"
                     /protein_id="NP_068744.1"
                     /db_xref="GI:11345458"
                     /db_xref="GeneID:60526"
                     /db_xref="LocusID:60526"
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Would you please tell me how to solve this problem?

-- 
Yuichi Yoshida

From priesel at caesar.de Thu Jun 9 06:48:51 2005
From: priesel at caesar.de (Saskia Priesel)
Date: Thu Jun 9 06:45:11 2005
Subject: [Bioperl-l] Bioperl Module
Message-ID: <42A81E29.1030005@caesar.de>

Hi,

could someone explain the Bioperl module Bio::Structure::Entry to me in
detail? I need it to parse the sequences from 30000 PDB files. I want to
analyse the sequences with respect to length, metal ions and cysteines.

Best regards,
Saskia

From brian_osborne at cognia.com Thu Jun 9 07:51:09 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jun 9 07:43:29 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <7f77f6e5803451c7236755c5550e1023@gmx.net>
Message-ID:

Bioperl,

> Stefan and Mingyi should fix whatever is needed to silence those
> warnings. (No offense please)

Yes, I should support Hilmar here. When we are working through the full
test suite before a formal release our goal has always been to eliminate
all errors *and* warnings, not just errors.

Brian O.


On 6/8/05 10:28 PM, "Hilmar Lapp" wrote:

> Quite frankly, mixing versions in bioperl has never worked really well
> and nobody really has time to support this, so I don't think it's good
> advice to people to suggest plugging in single modules from a recent
> version into an older one.
Just update to current cvs and you should be > at a state where at least you're not totally on your own. > > As for the output, these are obviously warnings, and instead of asking > you to turn off -w (L.Wall: Known bugs: -w is not on by default.) > Stefan and Mingyi should fix whatever is needed to silence those > warnings. (No offense please) > > -hilmar > > On Jun 8, 2005, at 11:40 PM, Law, Annie wrote: > >> Hi, >> >> I went to bioperl-live and copied the newest entrezgene.pm and >> Bio::AnnotatableI, Bio::SeqFeatureI modules and >> I got the following output (same output with entrezgene.pm saved on >> May 11 as well). The complaint about the module >> Generic.pm is gone now. >> I am using EntrezGene V 1.09 picked up from CPAN. >> >> I'm not sure what is wrong. I guess I could delete the Bio/ASN1 >> directory and install EntrezGene V.1.07 but that is not supposed to >> make a difference? >> Here's a wild stab but are the additional messages related to a >> statement I read somewhere about the first record being empty? >> >> Is my next alternative to install Bioperl 1.5 and try again? If so, am >> I correct in thinking that to 'uninstall' there is no such >> Uninstall command but I would just delete the >> /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then >> Use CPAN to install bioperl 1.5 >> >> Thanks so much, >> Annie. >> >> >> Argument "-trimopt" isn't numeric in numeric eq (==) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >> Pseudo-hashes are deprecated at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 543. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554. >> Use of uninitialized value in substitution (s///) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555. 
>> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 554. >> Use of uninitialized value in substitution (s///) at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 555. >> Pseudo-hashes are deprecated at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 604. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 295. >> Use of uninitialized value in string eq at >> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 323. >> >> >> -----Original Message----- >> From: Stefan Kirov [mailto:skirov@utk.edu] >> Sent: Tuesday, June 07, 2005 8:39 PM >> To: Mingyi Liu >> Cc: Law, Annie; Bioperl list >> Subject: Re: [Bioperl-l] Entrez Gene parser questions >> >> >> I am pretty sure it is the old Bio::SeqFeature::Generic that breaks the >> code (actually the underlying modules I mentioned...). >> Thanks for the -file option... There are also some leftover commenst >> :-) >> . I need to clean the code a bit... >> Stefan >> >> Mingyi Liu wrote: >> >>> Hi, Annie >>> >>> Annie must've been using the Bio::ASN1::EntrezGene V1.09, which worked >>> fine with your entrezgene.pm in bioperl-live at least a couple days >>> ago, when I replied to Annie. V1.09 also did not have any significant >>> changes in parser code vs. 1.07. >>> >>> I guess it has something to do with V1.4 since the sample script you >>> used below runs perfectly fine on my Bioperl 1.5 + entrezgene.pm. The >>> procedure you used was also correct (there's no need to 'use >>> Bio::ASN1::EntrezGene;' though). 
Try follow Stefan's suggestion first >>> (although I don't quite understand why Bio::ASN1::EntrezGene >>> complained about non numerica operon, which can only result from >>> incorrect calling of the function. But Stefan's entrezgene was using >>> correct calling and it runs perfectly for me. I was using >>> entrezgene.pm CVS version 1.9. >>> >>> Keep us posted on your progress. >>> >>> BTW, Stefan, you could call Bio::ASN1::EntrezGene with either '-file' >>> (bioperl style) or 'file' (I allowed both when I implemented this >>> switch I think. I usually just use file, which might explain your note >>> in the comment in your script. I just noticed the comment. >>> >>> Mingyi >>> >>> Stefan Kirov wrote: >>> >>>> Hi Annie, >>>> Few more things to update: >>>> Bio::AnnotatableI and Bio::SeqFeatureI. I believe this is the reason >>>> your script does not work. Update also Bio::SeqIO::entrezegene- I >>>> added some fixed so you will not see those nasty warning when using >>>> strict (nothing critical though). >>>> I am not sure we have the same versions for Bio::ASN1::EntrezGene. I >>>> have 1.0.7. Where did you get it from (maybe mine is older)? >>>> Stefan >>>> >>>> Law, Annie wrote: >>>> >>>>> Hi, >>>>> >>>>> Thanks for all of the replies. >>>>> I am using bioperl 1.4 and I have done the following: >>>>> 1. installed Bio::ASN1::EnztrezGene (using perl Makefile.PL, make, >>>>> make test, make install) >>>>> 2. got a copy of the entrezgene.pm from bioperl-live and put it in >>>>> the Bio/SeqIO directory >>>>> 3. copied the Bio::Annotation::DBLink DBLink.pm file and put them in >>>>> the corresponding directory >>>>> 4. the Bio::Cluster::SequenceFamily file was already up to date >>>>> 5. 
also have all the most recent bioperl-live Bio::SeqFeature::Gene >>>>> modules >>>>> >>>>> I grabbed the ASN file from >>>>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN/Mammalia/ >>>>> >>>>> I then wrote a simple perl script which includes: >>>>> >>>>> #!/usr/bin/perl -w >>>>> >>>>> use strict; >>>>> use Bio::ASN1::EntrezGene; >>>>> use Bio::SeqIO; >>>>> use Bio::Annotation::DBLink; >>>>> use Bio::Cluster::SequenceFamily; >>>>> >>>>> my $file = '/var/lib/mysql/Homo_sapiens'; >>>>> >>>>> my $seqio = Bio::SeqIO->new(-format => 'entrezgene', >>>>> -file => $file, >>>>> -debug => 'on'); >>>>> >>>>> my ($gene,$genestructure,$uncaptured) = $seqio->next_seq; >>>>> >>>>> After I run this script I get the following errors: >>>>> >>>>> Useless use of hash element in void context at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 317. >>>>> >>>>> Argument "-trimopt" isn't numeric in numeric eq (==) at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/ASN1/EntrezGene.pm line 450. >>>>> >>>>> Pseudo-hashes are deprecated at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 148. >>>>> >>>>> Use of uninitialized value in string ne at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 150. >>>>> >>>>> Pseudo-hashes are deprecated at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/entrezgene.pm line 416. >>>>> >>>>> Can't locate object method "add_tag_value" via package >>>>> "Bio::SeqFeature::Generic" at >>>>> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm line 234. >>>>> >>>>> I'm not sure what I am missing? >>>>> >>>>> Thanks so much, >>>>> Annie. 
>>>> >>>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> From brian_osborne at cognia.com Thu Jun 9 07:53:44 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jun 9 07:46:19 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: <42A7B0F7.6020107@utk.edu> Message-ID: Stefan, Of course you're right. There are no deadlines and your efforts in this regard are very much appreciated, by all. Brian O. On 6/8/05 11:01 PM, "Stefan Kirov" wrote: > Hilmar Lapp wrote: > >> Quite frankly, mixing versions in bioperl has never worked really well >> and nobody really has time to support this, so I don't think it's good >> advice to people to suggest plugging in single modules from a recent >> version into an older one. Just update to current cvs and you should >> be at a state where at least you're not totally on your own. > > Agree, but there may be a need for a transitional period. Annie, why do > you need bioperl 1.4? Is there something that lacks backcompatibility? I > think in general most modules try to provide such... > >> >> As for the output, these are obviously warnings, and instead of asking >> you to turn off -w (L.Wall: Known bugs: -w is not on by default.) >> Stefan and Mingyi should fix whatever is needed to silence those >> warnings. (No offense please) > > None taken. But this is not even a developer's release and I believe > such bugs are usual and normal (as long as they are not there for ever). > I am creating tests and I will fix the warnings when I commit these, but > I believe there are no deadlines I am missing. Please correct me if I am > wrong and I can probably spend some more time fixing the warnings > produced by entrezgene parser. > Overall I don't believe it is a bad idea for people to explore > particular code without strict (provided this does not go into > production). 
That's why the term 'unstable' exists, does it not? > Stefan > >> >> -hilmar >>
>>>>> >>>>> >>>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Thu Jun 9 08:02:26 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 9 07:54:28 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: References: Message-ID: <42A82FD2.7030908@utk.edu> Brian, I also agree on that, but I don't think there is a formal release coming soon, is there? Really, when is the next relase due? Stefan Brian Osborne wrote: >Bioperl, > > > >>Stefan and Mingyi should fix whatever is needed to silence those >>warnings. (No offense please) >> >> > >Yes, I should support Hilmar here. When we are working through the full test >suite before a formal release our goal has always been to eliminate all >errors *and* warnings, not just errors. > >Brian O. > > >On 6/8/05 10:28 PM, "Hilmar Lapp" wrote: > > > >>Quite frankly, mixing versions in bioperl has never worked really well >>and nobody really has time to support this, so I don't think it's good >>advice to people to suggest plugging in single modules from a recent >>version into an older one. Just update to current cvs and you should be >>at a state where at least you're not totally on your own. >> >>As for the output, these are obviously warnings, and instead of asking >>you to turn off -w (L.Wall: Known bugs: -w is not on by default.) >>Stefan and Mingyi should fix whatever is needed to silence those >>warnings. 
(No offense please) >> >>-hilmar

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From mingyi.liu at gpc-biotech.com Thu Jun 9 09:55:56 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Thu Jun 9 09:47:52 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: References: Message-ID: <42A84A6C.4000509@gpc-biotech.com> Brian Osborne wrote: >Bioperl, > > > >>Stefan and Mingyi should fix whatever is needed to silence those >>warnings. (No offense please) >> >> > >Yes, I should support Hilmar here. When we are working through the full test >suite before a formal release our goal has always been to eliminate all >errors *and* warnings, not just errors. > >Brian O. > > > Hi, Brian & Hilmar, I think both of you misunderstood (or missed part of) the previous email exchanges between Stefan and me and jumped to conclusion too soon. In my previous email, it began with "Stefan's right in suggesting you turn off -w, which would make your script work.", which likely led to your worries. But that's totally unnecessary - this is just a temporary solution suggested to end user. It is not a suggestion that we will not fix it or end user should change their programming habit. In fact, if you please read on, in the latter part of the same message, I suggested a fix that would fix the problem end user saw, which Stefan immediately responded that he'd incorporate it. I'm sure Stefan would get rid of those annoying "use of unitialized value" warnings too. That's the long term solution we proposed and (to be) implemented to fix the issue raised by end user. So truly there's no reason to worry or take sides here. 
We don't really have a difference in our user support approaches, despite our apparently different attitudes towards the importance of the '-w' switch. :)

Mingyi

From cjm at fruitfly.org Thu Jun 9 14:45:48 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Thu Jun 9 14:38:00 2005
Subject: [Bioperl-l] Bio::RangeI::union
Message-ID:

The pod docs for union() state that this is valid:

  my $newrange = Bio::RangeI->union(@ranges);

In the subroutine body, this gets called:

  my $self = shift;
  ...
  $self->new(...)

Since $self is equal to the string "Bio::RangeI", rather than an object implementing this interface, this will result in a call to

  Bio::Root::RootI->new("Bio::RangeI",...)

This works fine in bp1.4, but in recent bioperl revisions this results in a warning message that Bio::Root::RootI->new is deprecated, and a delegation to Bio::Root::Root, **omitting the name of the class to be created**, thus creating a Bio::Root::Root object, which is useless and will inevitably break any code calling the union() method.

I think this delegation is completely wrong and should be removed, with the warning message switched to an error; OR the method should be undeprecated and the original behaviour restored.

If we decide that RootI->new is truly deprecated, then Bio::RangeI should have to do some $self examination and use the correct object instantiation method, rather than $self->new. I don't really know what the correct object instantiation method is - perhaps just Bio::Range->new()? Or should a factory be used?

Personally, I would prefer it if Bio::Root::RootI->new were undeprecated and the original behaviour restored. Deprecating would make perfect sense if bioperl interfaces really were interfaces, which they are not.
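The failure mode described above can be reproduced in a few lines of plain Perl. This is only a sketch: the My::* packages are hypothetical stand-ins for Bio::Root::Root, Bio::Root::RootI, Bio::RangeI and Bio::Range, and the bodies are assumptions written for illustration, not the actual BioPerl source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Root::Root: an ordinary constructor.
package My::Root;
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Hypothetical stand-in for Bio::Root::RootI with the delegation described
# above: the requested class name is dropped on the way through.
package My::RootI;
sub new {
    my ($class, @args) = @_;
    warn "My::RootI->new is deprecated\n";
    return My::Root->new(@args);    # $class is never passed on
}

# Hypothetical stand-in for Bio::RangeI: an "interface" that carries code.
package My::RangeI;
our @ISA = ('My::RootI');
sub union {
    my ($self, @ranges) = @_;
    # Called as My::RangeI->union(...), $self is the string 'My::RangeI'.
    my ($start) = sort { $a <=> $b } map { $_->{-start} } @ranges;
    my ($end)   = sort { $b <=> $a } map { $_->{-end} }   @ranges;
    return $self->new(-start => $start, -end => $end);
}

# Hypothetical stand-in for Bio::Range: the concrete implementation.
package My::Range;
our @ISA = ('My::Root', 'My::RangeI');

package main;

my @ranges = (
    My::Range->new(-start => 1, -end => 10),
    My::Range->new(-start => 5, -end => 20),
);

my $via_class     = My::Range->union(@ranges);     # concrete class name
my $via_interface = My::RangeI->union(@ranges);    # interface name

print ref($via_class),     "\n";
print ref($via_interface), "\n";
```

In this sketch the call through the concrete class yields a My::Range object, while the call through the interface name yields a bare My::Root object that has lost its range methods, which is the breakage being reported.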
Cheers Chris From Annie.Law at nrc-cnrc.gc.ca Thu Jun 9 14:57:40 2005 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Thu Jun 9 14:49:47 2005 Subject: [Bioperl-l] Entrez Gene parser questions Message-ID: <10C94843061E094A98C02EB77CFC328722FF4E@nrcmrdex1d.imsb.nrc.ca> Hi, Thanks for everybody's responses. Yes, if I turn off the warnings then the scripts works. I was sticking with bioperl 1.4 because I think I read somewhere that the even extensions are the stable ones and Also I was happy with how 1.4 was working for me (and it seems that 1.4 works okay with the Entrez gene parser now) but if it is unadvised to plug in new modules to an older version then I will look into installing Bioperl 1.5. It seems to be fine for a lot of people :) Just some questions about getting rid of bioperl 1.4. Am I correct in thinking that to 'uninstall' there is no such Uninstall command but I would just delete the /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then Use CPAN to install bioperl 1.5 I am running my simple script and some print statements and it takes about 10 minutes to print 2000 Entrez gene Ids. How long does it generally take the parser to finish for example the Homo sapiens file? My real goal is not just to print but I just wanted to do a test run. Is this about the same performance others are getting or are there some other options to speed it up? Thanks again for all of your efforts! Annie. -----Original Message----- From: Stefan Kirov [mailto:skirov@utk.edu] Sent: Wednesday, June 08, 2005 4:50 PM To: Mingyi Liu Cc: Law, Annie; Bioperl list Subject: Re: [Bioperl-l] Entrez Gene parser questions Sure. Thanks for letting me know. Annie, does it work for you now? Stefan Mingyi Liu wrote: > Hi, > > Stefan's right in suggesting you turn off -w, which would make your > script work. But thanks for finding this bug. 
I just noticed that > entrezgene.pm was actually calling the Bio::ASN1::EntrezGene->next_seq > incorrectly (probably my documentation was a bit confusing & my module > did not follow standard hash-based parameter passing of subroutines). > It should be called like ->next_seq(1), but entrezgene.pm called using > ->next_seq(-trimopt => 1). This worked for all of us who do not use > -w, as it would fall back to option '1' in my next_seq function > (exactly as Stefan's calling function wanted). Therefore this bug > went unnoticed until you turned on -w (I guess we were all spoiled by > the easy (and sometimes messy) life of weak datatyping in Perl). > > Stefan would you please change the calling of next_seq to next_seq(1)? > This would fix the error messages Annie was seeing. > > Thanks, > > Mingyi > From skirov at utk.edu Thu Jun 9 15:08:31 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 9 15:00:19 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4E@nrcmrdex1d.imsb.nrc.ca> References: <10C94843061E094A98C02EB77CFC328722FF4E@nrcmrdex1d.imsb.nrc.ca> Message-ID: <42A893AF.7030508@utk.edu> It is slow- there is a lot of data and it goes into many bioperl objects. Performance is not the idea of this parser. If you need really high performance you may want to stick to the flat files. One suggestion is to get rid of the debug option (it won't make big difference, but still...). The whole human file takes about an hour on my machine, depends what you have to do the analysis. Also i Oh, by the way you need bioperl-live, not 1.5. No need for uninstall- just install bioperl-live, it should overwrite the old stuff. Stefan Law, Annie wrote: >Hi, > >Thanks for everybody's responses. Yes, if I turn off the warnings then the scripts works. 
I was sticking with bioperl 1.4 because I think I read somewhere that the even extensions are the stable ones and Also I was happy with how 1.4 was working for me (and it seems that 1.4 works okay with the Entrez gene parser now) but if it is unadvised to plug in new modules to an older version then I will look into installing Bioperl 1.5. It seems to be fine for a lot of people :) > >Just some questions about getting rid of bioperl 1.4. Am I correct in thinking that to 'uninstall' there is no such Uninstall command but I would just delete the /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then Use CPAN to install bioperl 1.5 > >I am running my simple script and some print statements and it takes about 10 minutes to print 2000 Entrez gene Ids. How long does it generally take the parser to finish for example the Homo sapiens file? My real goal is not just to print but I just wanted to do a test run. Is this about the same performance others are getting or are there some other options to speed it up? > >Thanks again for all of your efforts! >Annie. > > >-----Original Message----- >From: Stefan Kirov [mailto:skirov@utk.edu] >Sent: Wednesday, June 08, 2005 4:50 PM >To: Mingyi Liu >Cc: Law, Annie; Bioperl list >Subject: Re: [Bioperl-l] Entrez Gene parser questions > > >Sure. Thanks for letting me know. >Annie, does it work for you now? >Stefan > >Mingyi Liu wrote: > > > >>Hi, >> >>Stefan's right in suggesting you turn off -w, which would make your >>script work. But thanks for finding this bug. I just noticed that >>entrezgene.pm was actually calling the Bio::ASN1::EntrezGene->next_seq >>incorrectly (probably my documentation was a bit confusing & my module >>did not follow standard hash-based parameter passing of subroutines). >>It should be called like ->next_seq(1), but entrezgene.pm called using >>->next_seq(-trimopt => 1). 
This worked for all of us who do not use >>-w, as it would fall back to option '1' in my next_seq function >>(exactly as Stefan's calling function wanted). Therefore this bug >>went unnoticed until you turned on -w (I guess we were all spoiled by >>the easy (and sometimes messy) life of weak datatyping in Perl). >> >>Stefan would you please change the calling of next_seq to next_seq(1)? >>This would fix the error messages Annie was seeing. >> >>Thanks, >> >>Mingyi >> >> >> From mingyi.liu at gpc-biotech.com Thu Jun 9 15:11:19 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Thu Jun 9 15:04:32 2005 Subject: [Bioperl-l] Entrez Gene parser questions In-Reply-To: <10C94843061E094A98C02EB77CFC328722FF4E@nrcmrdex1d.imsb.nrc.ca> References: <10C94843061E094A98C02EB77CFC328722FF4E@nrcmrdex1d.imsb.nrc.ca> Message-ID: <42A89457.1000102@gpc-biotech.com> Law, Annie wrote: >Just some questions about getting rid of bioperl 1.4. Am I correct in thinking that to 'uninstall' there is no such Uninstall command but I would just delete the /usr/lib/perl5/site_perl/5.8.0/Bio ie. The Bio directory and then Use CPAN to install bioperl 1.5 > > > AFAIK, bioperl installer does not support uninstall. So just delete the directory containing the modules and install 1.5 should work for you. >I am running my simple script and some print statements and it takes about 10 minutes to print 2000 Entrez gene Ids. How long does it generally take the parser to finish for example the Homo sapiens file? My real goal is not just to print but I just wanted to do a test run. Is this about the same performance others are getting or are there some other options to speed it up? > > That sounds exceedingly slow. What type of machine are you using? Using Bio::ASN1::EntrezGene version 1.09 on my Intel Xeon 2.4 GHz machine it takes about 17 minutes to process all human records (~145000 records). 
Stefan had stated that Bio::SeqIO::entrezgene.pm is about 4-fold slower after all the bioperl object initiations etc., so that should translate to roughly 21,000 records per 10 minutes (145,000 records / 17 minutes / 4, times 10). If the genes you processed all have 'live' status, the speed would be lower since they have more content on average, but 2000 in 10 minutes still sounds slow. There might be something in your script that can be improved, or scripts just run slowly on your machine? Or the genes you selected have, on average, much more content than the human records overall.

BTW, just got Stefan's mail which confirmed the speed stats above.

Mingyi

From amackey at pcbi.upenn.edu Thu Jun 9 15:39:37 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Jun 9 15:31:55 2005
Subject: [Bioperl-l] Bio::RangeI::union
In-Reply-To:
References:
Message-ID: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu>

Or, how about deprecating the Bio::RangeI->union() construct (since although we've supplied implementations in the interface, we don't need to encourage people to use them)? This is just a weird way to do $newrange = Bio::Range->union(@ranges), right? (which saves a whole capitalized keystroke!)

-Aaron

On Jun 9, 2005, at 2:45 PM, Chris Mungall wrote:

>
> The pod docs for union() state that this is valid:
>
> my $newrange = Bio::RangeI->union(@ranges);
>
> In the subroutine body, this gets called:
>
> my $self = shift;
> ...
> $self->new(...)
>
> Since $self is equal to the string "Bio::RangeI", rather than an object
> implementing this interface, this will result in a call to
>
> Bio::Root::RootI->new("Bio::RangeI",...)
>
> This works fine in bp1.4, but in recent bioperl revisions this results in
> a warning message that Bio::Root::RootI->new is deprecated, and a
> delegation to Bio::Root::Root, **omitting the name of the class to be
> created**, thus creating a Bio::Root::Root object, which is useless and
> will inevitably break any code calling the union() method.
> > I think this delegation is completely wrong, and should be removed, > and > the warning message switched to an error; OR it should be > undeprecated and > the original behaviour behaviour restored > > If we decide that RootI->new is truly deprecated, then Bio::RangeI > should > have to do some $self examination, and use the correct object > instantiation method, rather than $self->new. I don't really know > what the > correct object instantiation method is - perhaps just Bio::Range- > >new()? > Or should a factory be used? > > Personally, I would prefer it if Bio::RootI->new were undeprecated > and the > original behaviour restored. deprecating would make perfect sense if > bioperl interfaces really were interfaces, which they are not. > > Cheers > Chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: amackey@pcbi.upenn.edu office: 215-898-1205 fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. University Avenue Philadelphia, PA 19104-6017 From jeremy_just at netcourrier.com Thu Jun 9 08:58:59 2005 From: jeremy_just at netcourrier.com (=?ISO-8859-15?Q?J=E9r=E9my?= JUST) Date: Thu Jun 9 18:06:45 2005 Subject: [Bioperl-l] Putative bug in Bio/SearchIO/blast.pm Message-ID: <20050609145859.0000248d@pearson.infobiogen.fr> Hello, I think I've found a little bug in the Blast parser. On that Blast result (BLASTN 2.2.9 [May-01-2004]): <<<<<< Score E Sequences producing significant alignments: (bits) Value gi|42592260|ref|NC_003070.5| Arabidopsis thaliana chromosome 1, ... 
75.8 8e-13

>gi|42592260|ref|NC_003070.5| Arabidopsis thaliana chromosome 1, complete sequence
Length = 30432563

 Score = 75.8 bits (38), Expect = 8e-13
 Identities = 104/126 (82%)
 Strand = Plus / Minus
>>>>>>

the score is read as « 8 » instead of « 75.8 ».

I attach a tiny patch against Bioperl-1.5.0.
($Id: blast.pm,v 1.84 2004/10/28 20:40:12 jason Exp $)

Cheers.

--
Jérémy JUST
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_score_format.diff
Type: application/octet-stream
Size: 663 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050609/9c1e4f60/blast_score_format.obj

From hlapp at gmx.net Thu Jun 9 20:55:17 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Jun 9 20:47:12 2005
Subject: [Bioperl-l] Entrez Gene parser questions
In-Reply-To: <42A84A6C.4000509@gpc-biotech.com>
References: <42A84A6C.4000509@gpc-biotech.com>
Message-ID: <43ab7b9c98ed5009ae49de43798dd1fb@gmx.net>

Guys, don't over-inflate this. I said no offense intended, Stefan said none taken, so why don't we discontinue this thread and you guys just fix it at some point. OK?

 -hilmar

On Jun 9, 2005, at 9:55 PM, Mingyi Liu wrote:

> Brian Osborne wrote:
>
>> Bioperl,
>>
>>> Stefan and Mingyi should fix whatever is needed to silence those
>>> warnings. (No offense please)
>>
>> Yes, I should support Hilmar here. When we are working through the
>> full test suite before a formal release our goal has always been to
>> eliminate all errors *and* warnings, not just errors.
>>
>> Brian O.
>>
> Hi, Brian & Hilmar,
>
> I think both of you misunderstood (or missed part of) the previous
> email exchanges between Stefan and me and jumped to conclusion too
> soon. In my previous email, it began with "Stefan's right in
> suggesting you turn off -w, which would make your script work.", which
> likely led to your worries. But that's totally unnecessary - this is
> just a temporary solution suggested to end user.
It is not a > suggestion that we will not fix it or end user should change their > programming habit. > > In fact, if you please read on, in the latter part of the same > message, I suggested a fix that would fix the problem end user saw, > which Stefan immediately responded that he'd incorporate it. I'm sure > Stefan would get rid of those annoying "use of unitialized value" > warnings too. That's the long term solution we proposed and (to be) > implemented to fix the issue raised by end user. So truly there's no > reason to worry or take sides here. We don't really have a difference > in our user support approaches, despite our apparent different > attitudes towards the importance of '-w' switch. :) > > Mingyi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From skirov at utk.edu Thu Jun 9 21:30:54 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 9 21:24:01 2005 Subject: [Bioperl-l] Entrez Gene parser questions and next release?? In-Reply-To: <43ab7b9c98ed5009ae49de43798dd1fb@gmx.net> References: <42A84A6C.4000509@gpc-biotech.com> <43ab7b9c98ed5009ae49de43798dd1fb@gmx.net> Message-ID: <42A8ED4E.4010300@utk.edu> Hilmar Lapp wrote: > Guys, don't over-inflate this. I said no offense intended, Stefan said > none taken, so why don't we discontinue this thread and you guys just > fix it at some point. OK? Yup :-) . But I really want to know when the next release is going to happen. Some of my code is a mess right now, I have few additional parsers done I have not tested well yet... So if it is going to happen soon I am in trouble. 
I know there was some discussion about this in the recent past, but please help a lazy guy out here and let me know... Stefan > > -hilmar > > On Jun 9, 2005, at 9:55 PM, Mingyi Liu wrote: > >> Brian Osborne wrote: >> >>> Bioperl, >>> >>> >>>> Stefan and Mingyi should fix whatever is needed to silence those >>>> warnings. (No offense please) >>>> >>> >>> Yes, I should support Hilmar here. When we are working through the >>> full test >>> suite before a formal release our goal has always been to eliminate all >>> errors *and* warnings, not just errors. >>> >>> Brian O. >>> >>> >> Hi, Brian & Hilmar, >> >> I think both of you misunderstood (or missed part of) the previous >> email exchanges between Stefan and me and jumped to conclusion too >> soon. In my previous email, it began with "Stefan's right in >> suggesting you turn off -w, which would make your script work.", >> which likely led to your worries. But that's totally unnecessary - >> this is just a temporary solution suggested to end user. It is not a >> suggestion that we will not fix it or end user should change their >> programming habit. >> >> In fact, if you please read on, in the latter part of the same >> message, I suggested a fix that would fix the problem end user saw, >> which Stefan immediately responded that he'd incorporate it. I'm >> sure Stefan would get rid of those annoying "use of unitialized >> value" warnings too. That's the long term solution we proposed and >> (to be) implemented to fix the issue raised by end user. So truly >> there's no reason to worry or take sides here. We don't really have >> a difference in our user support approaches, despite our apparent >> different attitudes towards the importance of '-w' switch. :) >> >> Mingyi >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From hlapp at gmx.net Thu Jun 9 21:14:49 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jun 9 22:13:59 2005 Subject: [Bioperl-l] Bio::RangeI::union In-Reply-To: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> Message-ID: <65b843706f46188c71430f1969a4effb@gmx.net> Sounds good to me. Interfaces may have decorating methods but we wanted to deprecate that interfaces are instantiated. So if RangeI tries to instantiate itself then that's not good; no decorating method on an interface should ever attempt to instantiate the interface. -hilmar On Jun 10, 2005, at 3:39 AM, Aaron J. Mackey wrote: > Or, how about deprecating the Bio::RangeI->union() construct (since > although we've supplied implementations in interface, we don't need to > encourage people to use them)? This is just a weird way to do > $newrange = Bio::Range->union(@ranges), right? (which saves a whole > capitalized keystroke!) > > -Aaron > > On Jun 9, 2005, at 2:45 PM, Chris Mungall wrote: > >> >> The pod docs for union() state that this is is valid: >> >> my $newrange = Bio::RangeI->union(@ranges); >> >> In the subroutine body, this gets called: >> >> my $self = shift; >> ... >> $self->new(...) >> >> Since $self is equal to the string "Bio::RangeI", rather than an >> object >> implementing this interface, this will result in a call to >> >> Bio::Root::RootI->new("Bio::RangeI",...) 
>> >> This works fine in bp1.4, but in recent bioperl revisions this >> results in >> a warning message that Bio::Root::RootI->new is deprecated, and a >> delegation to Bio::Root::Root, **omitting the name of the class to be >> created**, thus creating a Bio::Root::Root object, which is useless >> and >> will inevitably break any code calling the union() method. >> >> I think this delegation is completely wrong, and should be removed, >> and >> the warning message switched to an error; OR it should be >> undeprecated and >> the original behaviour behaviour restored >> >> If we decide that RootI->new is truly deprecated, then Bio::RangeI >> should >> have to do some $self examination, and use the correct object >> instantiation method, rather than $self->new. I don't really know >> what the >> correct object instantiation method is - perhaps just >> Bio::Range->new()? >> Or should a factory be used? >> >> Personally, I would prefer it if Bio::RootI->new were undeprecated >> and the >> original behaviour restored. deprecating would make perfect sense if >> bioperl interfaces really were interfaces, which they are not. >> >> Cheers >> Chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Aaron J. Mackey, Ph.D. > Project Manager, ApiDB Bioinformatics Resource Center > Penn Genomics Institute, University of Pennsylvania > email: amackey@pcbi.upenn.edu > office: 215-898-1205 > fax: 215-746-6697 > postal: Penn Genomics Institute > Goddard Labs 212 > 415 S. University Avenue > Philadelphia, PA 19104-6017 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Thu Jun 9 22:17:48 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jun 9 22:16:41 2005 Subject: [Bioperl-l] release date when? In-Reply-To: <42A8ED4E.4010300@utk.edu> References: <42A84A6C.4000509@gpc-biotech.com> <43ab7b9c98ed5009ae49de43798dd1fb@gmx.net> <42A8ED4E.4010300@utk.edu> Message-ID: Aside from some tests failing on some platforms and a variety of modules emitting an excessive amount of perl-triggered warning messages, the last agreed upon consensus was to write up a couple of tests that verify that the traditional SeqFeatureI interface behaviour is maintained. Doing a stable release doesn't make much sense before that's taken place and the thereby identified problems have been ironed out because otherwise it'd suffer from the almost the same problems as 1.5 has. -hilmar On Jun 10, 2005, at 9:30 AM, Stefan Kirov wrote: > > > Hilmar Lapp wrote: > >> Guys, don't over-inflate this. I said no offense intended, Stefan >> said none taken, so why don't we discontinue this thread and you guys >> just fix it at some point. OK? > > Yup :-) . But I really want to know when the next release is going to > happen. Some of my code is a mess right now, I have few additional > parsers done I have not tested well yet... So if it is going to happen > soon I am in trouble. I know there was some discussion about this in > the recent past, but please help a lazy guy out here and let me > know... > Stefan > >> >> -hilmar >> >> On Jun 9, 2005, at 9:55 PM, Mingyi Liu wrote: >> >>> Brian Osborne wrote: >>> >>>> Bioperl, >>>> >>>> >>>>> Stefan and Mingyi should fix whatever is needed to silence those >>>>> warnings. (No offense please) >>>>> >>>> >>>> Yes, I should support Hilmar here. When we are working through the >>>> full test >>>> suite before a formal release our goal has always been to eliminate >>>> all >>>> errors *and* warnings, not just errors. 
>>>> >>>> Brian O. >>>> >>>> >>> Hi, Brian & Hilmar, >>> >>> I think both of you misunderstood (or missed part of) the previous >>> email exchanges between Stefan and me and jumped to conclusion too >>> soon. In my previous email, it began with "Stefan's right in >>> suggesting you turn off -w, which would make your script work.", >>> which likely led to your worries. But that's totally unnecessary - >>> this is just a temporary solution suggested to end user. It is not >>> a suggestion that we will not fix it or end user should change their >>> programming habit. >>> >>> In fact, if you please read on, in the latter part of the same >>> message, I suggested a fix that would fix the problem end user saw, >>> which Stefan immediately responded that he'd incorporate it. I'm >>> sure Stefan would get rid of those annoying "use of unitialized >>> value" warnings too. That's the long term solution we proposed and >>> (to be) implemented to fix the issue raised by end user. So truly >>> there's no reason to worry or take sides here. We don't really have >>> a difference in our user support approaches, despite our apparent >>> different attitudes towards the importance of '-w' switch. :) >>> >>> Mingyi >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > -- > Stefan Kirov, Ph.D. > University of Tennessee/Oak Ridge National Laboratory > 5700 bldg, PO BOX 2008 MS6164 > Oak Ridge TN 37831-6164 > USA > tel +865 576 5120 > fax +865-576-5332 > e-mail: skirov@utk.edu > sao@ornl.gov > > "And the wars go on with brainwashed pride > For the love of God and our human rights > And all these things are swept aside" > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From dbastar at yahoo.com Fri Jun 10 00:32:42 2005
From: dbastar at yahoo.com (Duangdaow Kanhasiri)
Date: Fri Jun 10 00:25:07 2005
Subject: [Bioperl-l] Error loading sequence with load_seqdatabase.pl
In-Reply-To:
Message-ID: <20050610043242.93697.qmail@web40711.mail.yahoo.com>

Thank you very much for your advice, Hilmar...

After I had restarted the perl script load_seqdatabase.pl several times, it would not run any more and failed with this error:

  $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk &
  [1]+ Segmentation fault perl load_seqdatabase.pl /export/Bacteria/*/*.gbk &

Finally I couldn't even run any perl command anymore:

  $ perl --version
  Segmentation fault

I asked for help around here, and someone told me that the command I used might have passed too many sequence file names at one time for the perl script load_seqdatabase.pl to handle.

I worked around this problem by rebooting the machine (in order to make perl work again) and dividing the genbank sequence files into many small groups, which I now load with load_seqdatabase.pl one small chunk at a time.

It has started to run again, and this time I'm praying that it continues working; if not, my other prayer is that I can restart it again after it fails.

Regards,

Davina

--- Hilmar Lapp wrote:

> First off, I wouldn't continue this thread on biosql-l, it's for schema
> questions and this clearly isn't one. The script is a perl/bioperl
> script and some people on the bioperl-l list may happen to be on a
> platform similar to yours.
>
> Second, you're saying now the script wouldn't run at all anymore? With
> what error message? You might try to supply --debug as an option if the
> script does at least something before it dies. The output will be
> potentially extensive so be sure to capture in a file, then send it me.
> > If the script dies immediately I'm afraid I can't do > anything. My gut > feeling is that there is something with your > compiler, C runtime > library, or DBI/DBD compiled code that doesn't play > well with your > perl. Did you do a binary install of perl or did you > compile it from > source? > > Really your best bet is to find someone who's on the > same or a similar > platform and see whether he/she has had similar > problems or none of > these. > > -hilmar > > On Jun 9, 2005, at 11:55 AM, Duangdaow Kanhasiri > wrote: > > > The system I use hase following configs: > > > > CPU: 2 @ AthlonXP2000 > > OS: Rocks Cluster v 3.3 > > Total Memory: 2 GB > > DBD::Pg version: 1.42 > > DBI version: 1.48 > > > > I've attached the out put of the top command > (top.txt) > > with this mail. Unfortunately that the script > > load_seqdatabase.pl wouldn't run anymore, no > matter > > how many time I tried running it, therefore, I > > couldn't measure how much it consumes the resource > > (cpu, memory) on the machine. > > > > Regards, > > > > Davina > > > > > > --- Hilmar Lapp wrote: > > > >> What OS are you running this on? How much memory > >> have you got on the > >> machine on which you run the script, and on the > >> machine on which you > >> run the database? Are these the same or not? > Which > >> version of DBI and > >> DBD::Pg? > >> > >> This hasn't been reported by anyone else really > so I > >> suspect it's > >> either due to too limited memory, or a problem in > >> the DBD driver or in > >> the DBI compiled code. Can you watch the process > >> (using, e.g., top) and > >> see how fast it increases in memory consumption? > >> Since you can continue > >> when you restart it's not something specific to > one > >> sequence that would > >> trigger the problem; rather it appears whenever > you > >> have run through a > >> certain number of entries the process dies. 
> >> > >> -hilmar > >> > >> On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri > >> wrote: > >> > >>> Hi, > >>> > >>> I've used the bioperl script load_seqdatabase.pl > >> (came > >>> with the biosql' scripts) to load the bacterial > >>> sequence in genbank format(*.gbk) into > PostgreSQL > >> 8.0 > >>> database on Linux machine as: > >>> > >>> $perl load_seqdatabase.pl > /export/Bacteria/*/*.gbk > >> & > >>> > >>> Where under the /export/Bacteria/ path are the > >>> Bacteria's name path e.g. Acinetobacter_sp_ADP1 > >> and > >>> the file name are like NC_006824.gbk. > >>> > >>> Previously it used to load some sequences in to > >> some > >>> tables in biosql database (count from table > >> bioentry) > >>> > >>> bioseq=# select count(*) from bioentry; > >>> count > >>> ------- > >>> 33 > >>> (1 row) > >>> > >>> > >>> However, after a while it then stopped with the > >> the > >>> error: > >>> > >>> [1]+ Segmentation fault perl > >> load_seqdatabase.pl > >>> /export/Bacteria/*/*.gbk & > >>> > >>> I then checked and removed the *.gbk file that > >> have > >>> already been loaded in to the table, leaving > only > >> the > >>> unloaded ones and ran the scripted again. It > >>> continued to work for some times and stopped > >> again. I > >>> repeated the process several times until 173 > >> sequences > >>> were loaded into the table: > >>> > >>> bioseq=# select count(*) from bioentry; > >>> count > >>> ------- > >>> 173 > >>> (1 row) > >>> > >>> The program then stopped again and this time it > >>> wouldn't run anymore even I tried with only on > >> file. > >>> The error is still the same like: > >>> > >>> $ perl load_seqdatabase.pl > >>> > >> > > > /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk > >>> Segmentation fault > >>> $ > >>> > >>> Now I couldn't load the rest of my sequences > into > >> the > >>> database anymore. I would be very apprecialed > if > >> any > >>> one knows how to solve the "Segmentation fault" > >>> problem? 
> >>> > >>> Regards, > >>> > >>> Davina > >>> > >>> > >>> > >>> __________________________________ > >>> Discover Yahoo! > >>> Have fun online with music videos, cool games, > IM > >> and more. Check it > >>> out! > >>> http://discover.yahoo.com/ > >>> > >> > > > online.html_______________________________________ > >> > >>> ________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> > >> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> -- > >> > > > ------------------------------------------------------------- > === message truncated === __________________________________ Discover Yahoo! Use Yahoo! to plan a weekend, have fun online and more. Check it out! http://discover.yahoo.com/ From chad at dieselwurks.com Fri Jun 10 01:40:00 2005 From: chad at dieselwurks.com (Chad Matsalla) Date: Fri Jun 10 01:31:58 2005 Subject: [Bioperl-l] genbank2gff and how do I use gff3 with Bio::DB::GFF? Message-ID: Greetings! I have a genbank file that I would like to convert to GFF so that I can view it in a GBrowse. I would also like to use the normal Bio::DB::GFF (memory adaptor) interface to work with this data. I took a genbank entry (in this case t/data/AE003644_Adh-genomic.gb) and ran the bp_genbank2gff.pl program like this: /usr/bin/bp_genbank2gff.pl --stdout --file t/data/AE003644_Adh-genomic.gb > csm.gff3 Now, I want to get the segment represented by AE003644. If this was gff2, I would successfully get that like this - I've chosen the gffv2 file t/data/biodbgff/test.gff my $db2 = Bio::DB::GFF->new( -adaptor => 'memory', -file => 't/data/biodbgff/test.gff' ); my $segment2 = $db2->segment(-name => 'Contig1'); my @features = $segment2->features(); print("There are this many features on Contig1 (".scalar(@features).")\n"); However, with gff3 I'm not able to get the segment. ($segment is undef) Am I missing something? Can someone push me in the right direction? 
I also want to use gff from genbank2gff in gbrowse - I can't seem to get the thing to appear and I assume it's related to this. Is this an issue with the reference class? AE003644 Genbank region 1 263309 . . . ID=AE003644;Alias=AE002690,AE014134 Chad Matsalla From dwrice at indiana.edu Fri Jun 10 01:41:46 2005 From: dwrice at indiana.edu (Danny Rice) Date: Fri Jun 10 01:33:35 2005 Subject: [Bioperl-l] avoiding feature parsing Message-ID: <42A9281A.1020102@indiana.edu> I'm cranking through a bunch of genbank or fasta files named by their ncbi gi. The large genbank files take a huge amount of time to parse all the feature info but I am only interested in the sequence. I've looked at the modules and read the docs but haven't found good documentation on how to read a genbank file without parsing all the feature info. I tried my $seqio = Bio::SeqIO->new(-file => "$dir/$gi", -format => "fasta"); and to my surprise it seems to parse the genbank files correctly but only gets the sequence, which seems to solve the problem. My only question is "Is this the expected behavior and can I rely on this working? And. Is their any documentation on this behavior?". I suppose this figures out that I mean: "I'm only interested in the sequence but go ahead and figure out the format of the input file if it isn't already in fasta format." If there is a more standard or faster way to just get the sequence from a genbank file I'd be interested in that also. -Danny From Marc.Logghe at devgen.com Fri Jun 10 05:05:30 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Fri Jun 10 04:57:53 2005 Subject: [Bioperl-l] avoiding feature parsing Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E82F@ANTARESIA.be.devgen.com> Hi Danny, > my $seqio = Bio::SeqIO->new(-file => "$dir/$gi", -format => "fasta"); > > and to my surprise it seems to parse the genbank files > correctly but only gets the sequence, which seems to solve > the problem. 
My only question is "Is this the expected
> behavior and can I rely on this working? And. Is their any

Problem is, bioperl actually has parsed the complete genbank record,
*including* the feature table. Of course, they do not show up in the
fasta dump, but all features are attached to the sequence object
anyways. This is the default behaviour.
But you can say to the builder that you are not interested in the
features.
From the perldoc of Bio::Seq::SeqBuilder:

  my $builder = $seqio->sequence_builder();
  # if you need only sequence, id, and description (e.g. for
  # conversion to FASTA format):
  $builder->want_none();
  $builder->add_wanted_slot('display_id','desc','seq');

  # if you want everything except the sequence and features
  $builder->want_all(1); # this is the default if it's untouched
  $builder->add_unwanted_slot('seq','features');

HTH,
Marc

From hlapp at gmx.net Fri Jun 10 05:48:25 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Jun 10 05:41:12 2005
Subject: [Bioperl-l] avoiding feature parsing
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E82F@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E82F@ANTARESIA.be.devgen.com>
Message-ID:

Excellent answer Marc. I was starting to look up the docs again but you
were much faster.

-hilmar

On Jun 10, 2005, at 5:05 PM, Marc Logghe wrote:

> Hi Danny,
>
>> my $seqio = Bio::SeqIO->new(-file => "$dir/$gi", -format => "fasta");
>>
>> and to my surprise it seems to parse the genbank files
>> correctly but only gets the sequence, which seems to solve
>> the problem. My only question is "Is this the expected
>> behavior and can I rely on this working? And. Is their any
>
> Problem is, bioperl actually has parsed the complete genbank record,
> *including* the feature table. Of course, they do not show up in the
> fasta dump, but all features are attached to the sequence object
> anyways. This is the default behaviour.
> But you can say to the builder that you are not interested in the
> features.
> From the perldoc of Bio::Seq::SeqBuilder:
>
> my $builder = $seqio->sequence_builder();
> # if you need only sequence, id, and description (e.g. for
> # conversion to FASTA format):
> $builder->want_none();
> $builder->add_wanted_slot('display_id','desc','seq');
>
> # if you want everything except the sequence and features
> $builder->want_all(1); # this is the default if it's untouched
> $builder->add_unwanted_slot('seq','features');
>
> HTH,
> Marc
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From cjm at fruitfly.org Fri Jun 10 12:36:25 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Fri Jun 10 12:27:22 2005
Subject: [Bioperl-l] Bio::RangeI::union
In-Reply-To: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu>
References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu>
Message-ID:

You're right, I could just do

  $newrange = Bio::Range->union(@ranges)

And it would work fine

However, the calling context is *another* "decorated" interface in RangeI
(disconnected_ranges) - it seems very odd to have an interface "method"
calling a method in a class that implements that interface

I'm not sure exactly what you're proposing when you say "deprecating the
Bio::RangeI->union() construct" - how would this work? Would it just throw
a warning if $self eq "Bio::RangeI"?

On Thu, 9 Jun 2005, Aaron J. Mackey wrote:

> Or, how about deprecating the Bio::RangeI->union() construct (since
> although we've supplied implementations in interface, we don't need
> to encourage people to use them)?
This is just a weird way to do > $newrange = Bio::Range->union(@ranges), right? (which saves a whole > capitalized keystroke!) > > -Aaron > > On Jun 9, 2005, at 2:45 PM, Chris Mungall wrote: > > > > > The pod docs for union() state that this is is valid: > > > > my $newrange = Bio::RangeI->union(@ranges); > > > > In the subroutine body, this gets called: > > > > my $self = shift; > > ... > > $self->new(...) > > > > Since $self is equal to the string "Bio::RangeI", rather than an > > object > > implementing this interface, this will result in a call to > > > > Bio::Root::RootI->new("Bio::RangeI",...) > > > > This works fine in bp1.4, but in recent bioperl revisions this > > results in > > a warning message that Bio::Root::RootI->new is deprecated, and a > > delegation to Bio::Root::Root, **omitting the name of the class to be > > created**, thus creating a Bio::Root::Root object, which is useless > > and > > will inevitably break any code calling the union() method. > > > > I think this delegation is completely wrong, and should be removed, > > and > > the warning message switched to an error; OR it should be > > undeprecated and > > the original behaviour behaviour restored > > > > If we decide that RootI->new is truly deprecated, then Bio::RangeI > > should > > have to do some $self examination, and use the correct object > > instantiation method, rather than $self->new. I don't really know > > what the > > correct object instantiation method is - perhaps just Bio::Range- > > >new()? > > Or should a factory be used? > > > > Personally, I would prefer it if Bio::RootI->new were undeprecated > > and the > > original behaviour restored. deprecating would make perfect sense if > > bioperl interfaces really were interfaces, which they are not. > > > > Cheers > > Chris > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Aaron J. 
Mackey, Ph.D. > Project Manager, ApiDB Bioinformatics Resource Center > Penn Genomics Institute, University of Pennsylvania > email: amackey@pcbi.upenn.edu > office: 215-898-1205 > fax: 215-746-6697 > postal: Penn Genomics Institute > Goddard Labs 212 > 415 S. University Avenue > Philadelphia, PA 19104-6017 > > From brian_osborne at cognia.com Fri Jun 10 12:56:53 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jun 10 12:49:05 2005 Subject: [Bioperl-l] Problem in parsing GenBank flatfile In-Reply-To: Message-ID: Yoshida, Whatever the problem is it appears to be fixed in bioperl-live, in CVS. I downloaded the most recent NT_015926 and it's parsed correctly. Brian O. On 6/9/05 5:39 AM, "Yoshida Yuichi" wrote: > Dear all, > > I am trying to parse GenBank flatfile (accession num is NT_015926) > by calling Bio::SeqIO modules, but I can not. > > - - - - - - - - - - - - - Perl program code - - - - - - - - - - - - - > #!/usr/bin/perl > use Bio::SeqIO; > > $gbk_filename = shift @ARGV; > $seqin = Bio::SeqIO->new(-file=>$gbk_filename, -format=>'Genbank'); > > while ($seqobj = $seqin->next_seq) { > $accession = $seqobj->accession_number,"\n"; > foreach my $feat ($seqobj->get_SeqFeatures()){ > if ($feat->primary_tag eq 'mRNA'){ > $db_gene_name = join(' ',$feat->get_tag_values('gene')); > $db_transcript_id = join(' > ',$feat->get_tag_values('transcript_id')); > $start = $feat->start; $end = $feat->end; > print $db_transcript_id,"\t",$db_gene_name,"\t",$accession,"\t"; > print $start,"\t",$end,"\n"; > } > } > } > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > The following error message is shown. 
> > - - - - - - - - - - - - - error message - - - - - - - - - - - - - > -------------------- WARNING --------------------- > MSG: cannot see new qualifier in feature CDS: aa:OTHER) > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: cannot see new qualifier in feature CDS: aa:OTHER) > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: cannot see new qualifier in feature CDS: aa:OTHER) > --------------------------------------------------- > out of memory > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > The parts which cause the error (I guess) is shown as the followings. > > - - - - - GenBank partial flatfile (NT_015926) - - - - - > CDS complement(join(4528741..4528932,4543408..4543490, > 4581809..4582043,4616648..4616817,4632093..4632236, > 4643148..4643301)) > /gene="FLJ21820" > /note="go_function: catalytic activity [goid 0003824] > [evidence IEA]; > go_process: lipid metabolism [goid 0006629] [evidence > IEA]" > /codon_start=1 > /product="hypothetical protein FLJ21820" > /protein_id="NP_068744.1" > /db_xref="GI:11345458" > /db_xref="GeneID:60526" > /db_xref="LocusID:60526" > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Would you please tell me the way to solve this problem? > > -- > Yuichi Yoshida > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From amackey at pcbi.upenn.edu Fri Jun 10 13:02:43 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Fri Jun 10 12:54:30 2005 Subject: [Bioperl-l] Bio::RangeI::union In-Reply-To: References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> Message-ID: <5EEB59C1-E7F1-45AC-9887-F618F3D3C584@pcbi.upenn.edu> Yep (using whatever deprecation method [warn, throw, etc] we use elsewhere). 
But I guess we'd need to check that Bio::RangeI itself doesn't ever do this. -Aaron On Jun 10, 2005, at 12:36 PM, Chris Mungall wrote: > I'm not sure exactly what you're proposing when you say > "deprecating the > Bio::RangeI->union() construct" - how would this work? Would it > just throw > a warning if $self eq "Bio::RangeI"? -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: amackey@pcbi.upenn.edu office: 215-898-1205 fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. University Avenue Philadelphia, PA 19104-6017 From cjm at fruitfly.org Fri Jun 10 15:53:07 2005 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Jun 10 15:43:39 2005 Subject: [Bioperl-l] Bio::RangeI::union In-Reply-To: <5EEB59C1-E7F1-45AC-9887-F618F3D3C584@pcbi.upenn.edu> References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> <5EEB59C1-E7F1-45AC-9887-F618F3D3C584@pcbi.upenn.edu> Message-ID: What about Bio::RangeI->disconnected_ranges(), which currently calls Bio::RangeI->union()? It seems highly egregious for a decorated interface to call an implementing class. Personally I'd rather we gave up the pretense that RangeI is an interface, and just admit it's a class like any other, and allow static method calls like RangeI->union() (and avoiding forcing people to change code that conforms perfectly to the documentation). I'm afraid This whole decorated interface concept makes no sense whatsoever to me. On Fri, 10 Jun 2005, Aaron J. Mackey wrote: > Yep (using whatever deprecation method [warn, throw, etc] we use > elsewhere). But I guess we'd need to check that Bio::RangeI itself > doesn't ever do this. > > -Aaron > > On Jun 10, 2005, at 12:36 PM, Chris Mungall wrote: > > > I'm not sure exactly what you're proposing when you say > > "deprecating the > > Bio::RangeI->union() construct" - how would this work? Would it > > just throw > > a warning if $self eq "Bio::RangeI"? 
> > -- > Aaron J. Mackey, Ph.D. > Project Manager, ApiDB Bioinformatics Resource Center > Penn Genomics Institute, University of Pennsylvania > email: amackey@pcbi.upenn.edu > office: 215-898-1205 > fax: 215-746-6697 > postal: Penn Genomics Institute > Goddard Labs 212 > 415 S. University Avenue > Philadelphia, PA 19104-6017 > > -- Chris From cjm at fruitfly.org Fri Jun 10 16:25:58 2005 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Jun 10 16:17:33 2005 Subject: [Bioperl-l] Bio::RangeI::union In-Reply-To: References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> <5EEB59C1-E7F1-45AC-9887-F618F3D3C584@pcbi.upenn.edu> Message-ID: On Fri, 10 Jun 2005, Chris Mungall wrote: > What about Bio::RangeI->disconnected_ranges(), which currently calls > Bio::RangeI->union()? It seems highly egregious for a decorated interface > to call an implementing class. Ah, of course, disconnected_ranges can just call $self->union(), and provided the caller respects the convention of not calling static methods on an interface, then everything should be fine.. > Personally I'd rather we gave up the pretense that RangeI is an interface, > and just admit it's a class like any other, and allow static method calls > like RangeI->union() (and avoiding forcing people to change code that > conforms perfectly to the documentation). I'm afraid This whole decorated > interface concept makes no sense whatsoever to me. > > On Fri, 10 Jun 2005, Aaron J. Mackey wrote: > > > Yep (using whatever deprecation method [warn, throw, etc] we use > > elsewhere). But I guess we'd need to check that Bio::RangeI itself > > doesn't ever do this. > > > > -Aaron > > > > On Jun 10, 2005, at 12:36 PM, Chris Mungall wrote: > > > > > I'm not sure exactly what you're proposing when you say > > > "deprecating the > > > Bio::RangeI->union() construct" - how would this work? Would it > > > just throw > > > a warning if $self eq "Bio::RangeI"? > > > > -- > > Aaron J. Mackey, Ph.D. 
> > Project Manager, ApiDB Bioinformatics Resource Center > > Penn Genomics Institute, University of Pennsylvania > > email: amackey@pcbi.upenn.edu > > office: 215-898-1205 > > fax: 215-746-6697 > > postal: Penn Genomics Institute > > Goddard Labs 212 > > 415 S. University Avenue > > Philadelphia, PA 19104-6017 > > > > > > -- Chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Chris From amackey at pcbi.upenn.edu Fri Jun 10 16:26:24 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Fri Jun 10 16:18:21 2005 Subject: [Bioperl-l] Bio::RangeI::union In-Reply-To: References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> <5EEB59C1-E7F1-45AC-9887-F618F3D3C584@pcbi.upenn.edu> Message-ID: <66687491-B24C-4A49-9B6F-EC8AE5B7735E@pcbi.upenn.edu> Right. I was just composing an email to that effect. -Aaron On Jun 10, 2005, at 4:25 PM, Chris Mungall wrote: > Ah, of course, disconnected_ranges can just call $self->union(), and > provided the caller respects the convention of not calling static > methods > on an interface, then everything should be fine.. > -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: amackey@pcbi.upenn.edu office: 215-898-1205 fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. University Avenue Philadelphia, PA 19104-6017 From cjm at fruitfly.org Fri Jun 10 20:07:22 2005 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Jun 10 19:57:53 2005 Subject: [Bioperl-l] Bio::RangeI::union In-Reply-To: <66687491-B24C-4A49-9B6F-EC8AE5B7735E@pcbi.upenn.edu> References: <2D0A6A21-409F-45C7-9625-EC4E64831A6C@pcbi.upenn.edu> <5EEB59C1-E7F1-45AC-9887-F618F3D3C584@pcbi.upenn.edu> <66687491-B24C-4A49-9B6F-EC8AE5B7735E@pcbi.upenn.edu> Message-ID: OK, I've fixed RangeI. 
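A minimal sketch of the behaviour discussed in this thread, not the code that was committed: a method supplied by an interface-style package notices that it was invoked on the package name itself, warns, and delegates to a concrete class. The package names My::RangeI and My::Range, the warning text, and the simplified union() are hypothetical stand-ins for Bio::RangeI and Bio::Range.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an interface that also ships implementations.
package My::RangeI;

sub union {
    my ($self, @ranges) = @_;

    # Called as My::RangeI->union(@ranges): $self is the plain string
    # "My::RangeI", not an object, so warn and delegate to the concrete class.
    if (!ref($self) && $self eq __PACKAGE__) {
        warn "calling static methods on an interface is deprecated; "
           . "use My::Range->union instead\n";
        $self = 'My::Range';
    }

    # Called on an object: the object itself takes part in the union.
    unshift @ranges, $self if ref $self;
    return undef unless @ranges;

    my ($start, $end) = ($ranges[0]->start, $ranges[0]->end);
    for my $r (@ranges) {
        $start = $r->start if $r->start < $start;
        $end   = $r->end   if $r->end   > $end;
    }

    # Instantiate through the concrete class (or the class of the caller).
    my $class = ref($self) || $self;
    return $class->new(-start => $start, -end => $end);
}

# Hypothetical concrete class implementing the interface.
package My::Range;
our @ISA = ('My::RangeI');

sub new {
    my ($class, %args) = @_;
    return bless { start => $args{-start}, end => $args{-end} }, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package main;

my @ranges = (
    My::Range->new(-start => 5,  -end => 10),
    My::Range->new(-start => 20, -end => 30),
);

my $via_class     = My::Range->union(@ranges);   # preferred form
my $via_interface = My::RangeI->union(@ranges);  # warns, then delegates

printf "%d..%d (%s)\n", $via_class->start, $via_class->end, ref $via_class;
printf "%d..%d (%s)\n", $via_interface->start, $via_interface->end, ref $via_interface;
```

Both calls return a My::Range object spanning 5..30; only the second one emits the warning. Code that calls $self->union() on an object never reaches the delegation branch.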
Static method calls on RangeI now result in a warning message being thrown,
and the call is delegated to Bio::Range.

Tests pass, committed to cvs.

And no more moaning about decorated interfaces from me!

On Fri, 10 Jun 2005, Aaron J. Mackey wrote:

> Right. I was just composing an email to that effect. -Aaron
>
> On Jun 10, 2005, at 4:25 PM, Chris Mungall wrote:
>
> > Ah, of course, disconnected_ranges can just call $self->union(), and
> > provided the caller respects the convention of not calling static
> > methods
> > on an interface, then everything should be fine..
> >
>
> --
> Aaron J. Mackey, Ph.D.
> Project Manager, ApiDB Bioinformatics Resource Center
> Penn Genomics Institute, University of Pennsylvania
> email: amackey@pcbi.upenn.edu
> office: 215-898-1205
> fax: 215-746-6697
> postal: Penn Genomics Institute
> Goddard Labs 212
> 415 S. University Avenue
> Philadelphia, PA 19104-6017
>
>

-- Chris

From avilella at gmail.com Sat Jun 11 15:18:04 2005
From: avilella at gmail.com (Albert Vilella)
Date: Sat Jun 11 15:12:45 2005
Subject: [Bioperl-l] patch xyplot.pm negative values Bio::Graphics/GMOD
Message-ID: <1118517484.8515.12.camel@localhost.localdomain>

Hi all,

xyplot negative values:

Lincoln and Guenther: these are very hacky modifications (I still haven't
completely familiarized myself with Bio::Graphics).

I took a look at the code in xyplot.pm, and tested it on a local png
file (I don't have a GBrowse setup to test). Please check if what has
been done is correct.

It seems to me that _draw_boxes needs to be modified, so that the boxes
aren't wrongly plotted, but _draw_histogram, _draw_line and _draw_points
don't need anything special, apart from the existence of the scale.
In _draw_boxes, negative boxes need to be plotted from the middle of the
track down to their score, so I modified it in a similar fashion to what
is done in _draw_scale.

--- xyplot.pm.~1.15.~ 2005-06-11 08:03:02.000000000 +0200
+++ xyplot.pm 2005-06-11 21:04:37.895586696 +0200
@@ -64,7 +64,8 @@
   my $type = $self->option('graph_type') || $self->option('graphtype') || 'boxes';
   $self->_draw_histogram($gd,$x,$y) if $type eq 'histogram';
-  $self->_draw_boxes($gd,$x,$y) if $type eq 'boxes';
+  #we need $dx and $dy for calculating $half for negative boxes
+  $self->_draw_boxes($gd,$x,$y,$dx,$dy) if $type eq 'boxes';
   $self->_draw_line ($gd,$x,$y) if $type eq 'line' or $type eq 'linepoints';
   $self->_draw_points($gd,$x,$y) if $type eq 'points'
@@ -141,7 +142,7 @@
 sub _draw_boxes {
   my $self = shift;
-  my ($gd,$left,$top) = @_;
+  my ($gd,$left,$top,$dx,$dy) = @_;
   my @parts = $self->parts;
   my $fgcolor = $self->fgcolor;
@@ -152,8 +153,17 @@
   for (my $i = 0; $i < @parts; $i++) {
     my $part = $parts[$i];
     my $next = $parts[$i+1];
+    my ($dummy1,$dummy2,$dummy3,$zero) = $part->calculate_boundaries($left,0);
     my ($x1,$y1,$x2,$y2) = $part->calculate_boundaries($left,$top);
-    $self->filled_box($gd,$x1,$part->{_y_position},$x2,$y2,$bgcolor, $fgcolor);
+    # If negative box
+    if ($zero < ($part->{_y_position})) {
+      my ($dummyx1,$ny1,$dummyx2,$ny2) = $self->calculate_boundaries($dx,$dy);
+      my $half = ($ny1+$ny2)/2;
+      $self->filled_box($gd,$x1,$part->{_y_position},$x2,$half, $bgcolor,$fgcolor);
+    # Normal positive box
+    } else {
+      $self->filled_box($gd,$x1,$part->{_y_position},$x2,$y2, $bgcolor,$fgcolor);
+    }
     next unless $next;
     my ($x3,$y3,$x4,$y4) = $next->calculate_boundaries($left,$top);
     $gd->line($x2,$y2,$x3,$y4,$fgcolor) if $x2 < $x3;

This results in boxes being plotted upward if positive and downward if
negative from the middle of the track (the midpoint between $min_score and
$max_score). I don't know if it is the best thing to do, but it makes
sense to me.
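The intended geometry, boxes growing upward from the middle of the track for positive scores and downward for negative ones, can be sketched independently of Bio::Graphics. The helper box_span() and its arguments are hypothetical and are not taken from xyplot.pm. The sketch assumes a linear scale with $min_score at the bottom edge and $max_score at the top edge, so that the midline corresponds to the midpoint between them.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Graphics.
# Pixel y coordinates grow downward, so $top < $bottom.
# Assumes $max_score > $min_score.
sub box_span {
    my ($score, $min_score, $max_score, $top, $bottom) = @_;

    # Clamp the score to the displayed range.
    $score = $min_score if $score < $min_score;
    $score = $max_score if $score > $max_score;

    # Linear scale: $max_score maps to $top, $min_score maps to $bottom.
    my $fraction = ($score - $min_score) / ($max_score - $min_score);
    my $y        = $bottom - $fraction * ($bottom - $top);

    # Middle of the track.
    my $half = ($top + $bottom) / 2;

    # Return (upper edge, lower edge) of the box.
    return $y < $half ? ($y, $half) : ($half, $y);
}

for my $score (10, 5, -5, -10) {
    my ($y1, $y2) = box_span($score, -10, 10, 0, 100);
    printf "score %3d: box from y=%d to y=%d\n", $score, $y1, $y2;
}
```

With a track running from y=0 to y=100 and scores from -10 to 10, a score of 5 gives a box from 25 to 50 and a score of -5 gives a box from 50 to 75. If the score range is not symmetric around zero, the midline no longer corresponds to a score of zero, which is one reason to treat this as a sketch only.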
Also, negative boxes will have a different color than positive boxes.
It also makes sense to me.

If you can check if it is ok and test it in a running GBrowse server,
that would be great.

Bests,

Albert.

PS: I haven't done anything about the log coordinates.

From jason.stajich at duke.edu Sun Jun 12 16:15:36 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jun 12 16:09:13 2005
Subject: [Bioperl-l] Putative bug in Bio/SearchIO/blast.pm
In-Reply-To: <20050609145859.0000248d@pearson.infobiogen.fr>
References: <20050609145859.0000248d@pearson.infobiogen.fr>
Message-ID: <56F7C7EB-10CD-4D2F-B9E7-6B701392EB60@duke.edu>

I'm pretty sure this is already fixed with the code in CVS. Can you
try it with that code -- the regexp was already corrected in
connection to very large bitscores (Scott Markel's bug) - it is done
slightly differently from your suggested fix as we need to handle sci-
value in that field as well.

Also, please post bugs and diffs to http://bugzilla.open-bio.org so
that they can be tracked - otherwise folks have to follow a mailing
list thread to find out the resolution for a bug.

-jason

On Jun 9, 2005, at 8:58 AM, Jérémy JUST wrote:

>
> Hello,
>
> I think I've found a little bug in the Blast parser.
> On that Blast result (BLASTN 2.2.9 [May-01-2004]):
>
> <<<<<<
>
>                                                          Score    E
> Sequences producing significant alignments:            (bits) Value
>
> gi|42592260|ref|NC_003070.5| Arabidopsis thaliana chromosome 1, ...  75.8  8e-13
>
> >gi|42592260|ref|NC_003070.5| Arabidopsis thaliana chromosome 1, complete
>           sequence
>           Length = 30432563
>
>  Score = 75.8 bits (38), Expect = 8e-13
>  Identities = 104/126 (82%)
>  Strand = Plus / Minus
>
> >>>>>>
>
> the score is read as « 8 » instead of « 75.8 ».
>
>
>
> I attach a tiny patch against Bioperl-1.5.0.
> ($Id: blast.pm,v 1.84 2004/10/28 20:40:12 jason Exp $)
>
>
> Cheers.
>
>
> --
> Jérémy JUST
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From hartzell at kestrel.alerce.com Sun Jun 12 18:09:52 2005
From: hartzell at kestrel.alerce.com (George Hartzell)
Date: Sun Jun 12 18:03:20 2005
Subject: [Bioperl-l] Bio::Tools::Primer3.pm bug: code or doc, you decide???
Message-ID: <17068.45744.979958.907829@satchel.alerce.com>

The pod for Bio::Tools::Primer3::number_of_results says for Function:

  "Retrieve the number of primers returned from Primer3"

and for its Notes:, it says

  "Returns the maximum number of primers returned from Primer3."

In fact, it's returning the maximum offset of the array of results,
also known as one *less* than the number of results that primer3
returned.... (see Bio::Tools::Primer3::_separate() and its use of
$maxlocation and $self->{'maximum_primers_returned'}).

Given the method's *name*, I figured it'd tell me how many primers I'd
gotten back from primer3.

Given the method's Function doc, ditto....

Given the method's Notes, it's a bit fuzzier, but it still evokes a
count.

I'd like to fix *something*, so that I'll feel better about wasting an
hour walking the primer3 C code and pestering the authors about an
off-by-one bug.....

So, should I change the method to add one, or change the documentation
to specify that it's the maximum offset into the results array (even
though the name's then misleading).

I'd rather fix the code, but *I* don't have to be backward
compatible....

Comments?

g.

From brian_osborne at cognia.com Sun Jun 12 18:35:28 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sun Jun 12 18:27:12 2005
Subject: [Bioperl-l] Bio::Tools::Primer3.pm bug: code or doc, you decide???
In-Reply-To: <17068.45744.979958.907829@satchel.alerce.com> Message-ID: George, You're presenting a good argument that it's the code that should change. Brian O. On 6/12/05 6:09 PM, "George Hartzell" wrote: > > The pod for Bio::Tools::Primer3::number_of_results says for Function: > > "Retrieve the number of primers returned from Primer3" > > and for it's Notes:, it says > > "Returns the maximum number of primers returned from Primer3." > > In fact, it's returning the maximum offset of the array of results, > also known as one *less* than the number of results that primer3 > returned.... (see Bio::Tools::Primer3::_separate() and it's use of > $maxlocation and $self->{'maximum_primers_returned'}). > > Given the method's *name*, I figured it'd tell me how many primers I'd > gotten back from primer3. > > Given the method's Function doc, ditto.... > > Given the method's Notes, it's a bit fuzzier, but it still evokes a > count. > > I'd like to fix *something*, so that I'll feel better about wasting an > hour walking the primer3 C code and pestering the authors about an off > by one bug..... > > So, should I change the method to add one, or change the documentation > to specify that it's the maximum offset into the results array (even > though the names then misleading). > > I'd rather fix the code, but *I* don't have to be backward > compatible.... > > Comments? > > g. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hartzell at kestrel.alerce.com Sun Jun 12 20:52:24 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Sun Jun 12 20:43:58 2005 Subject: [Bioperl-l] A possible fix for Bio::Tools::Run::Primer3::run(). Message-ID: <17068.55496.742709.93166@satchel.alerce.com> Me again, Can someone review this patch? 
Bio::Tools::Primer3::_separate refers to $self->{input_options}, which it expects to be a reference to a hash containing the arguments that were passed to primer3, to help it figure out what is a result and what's not. Unfortunately, it doesn't look like it's ever set in the results object that's passed back, so if you use $results->primer_results(0), you end up with the input arguments mixed in. I think that this fixes it. I'm sure there's a way to do it in one line..... g. *** Bio/Tools/Run/Primer3.pm.orig Tue Mar 1 18:26:31 2005 --- Bio/Tools/Run/Primer3.pm Sun Jun 12 17:41:51 2005 *************** *** 452,457 **** --- 452,466 ---- $self->{results_obj} = new Bio::Tools::Primer3; $self->{results_obj}->_set_variable('results', $self->{results}); $self->{results_obj}->_set_variable('seqobject', $self->{seqobject}); + # Bio::Tools::Primer3::_separate needs a hash of the primer3 arguments, + # with the arg as the key and the value as the value (surprise!). + my %input_hash; + foreach my $line (@{$self->{'primer3_input'}}) { + my ($key, $value) = split '=', $line; + $input_hash{$key} = $value; + } + $self->{results_obj}->_set_variable('input_options', \%input_hash); + $self->{results_separated}= $self->{results_obj}->_separate(); return $self->{results_obj}; } From hartzell at kestrel.alerce.com Sun Jun 12 21:23:00 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Sun Jun 12 21:16:29 2005 Subject: [Bioperl-l] A possible fix for Bio::Tools::Run::Primer3::run(). In-Reply-To: <17068.55496.742709.93166@satchel.alerce.com> References: <17068.55496.742709.93166@satchel.alerce.com> Message-ID: <17068.57332.657860.423494@satchel.alerce.com> George Hartzell writes: > > Me again, > > Can someone review this patch? > [...] Sorry to respond to my own post. Here's a better patch, knocks the original fix down to two lines and adds another fix so that the result object doesn't have an empty element in it (caused by processing the boulder io record terminator). 
Still, comments welcome. g. *** Bio/Tools/Run/Primer3.pm.orig Tue Mar 1 18:26:31 2005 --- Bio/Tools/Run/Primer3.pm Sun Jun 12 18:17:55 2005 *************** *** 437,442 **** --- 437,443 ---- print OUT $_; } chomp; + next if ($_ eq "="); # skip over the boulderio record terminator. my ($return, $value) = split('=',$_); $self->{'results'}->{$return} = $value; } *************** *** 452,457 **** --- 453,462 ---- $self->{results_obj} = new Bio::Tools::Primer3; $self->{results_obj}->_set_variable('results', $self->{results}); $self->{results_obj}->_set_variable('seqobject', $self->{seqobject}); + # Bio::Tools::Primer3::_separate needs a hash of the primer3 arguments, + # with the arg as the key and the value as the value (surprise!). + my %input_hash = map {split '='} @{$self->{'primer3_input'}}; + $self->{results_obj}->_set_variable('input_options', \%input_hash); $self->{results_separated}= $self->{results_obj}->_separate(); return $self->{results_obj}; } From avilella at gmail.com Mon Jun 13 05:56:27 2005 From: avilella at gmail.com (Albert Vilella) Date: Mon Jun 13 05:48:26 2005 Subject: [Bioperl-l] Re: patch xyplot.pm negative values Bio::Graphics/GMOD In-Reply-To: <1118655041.15945.40.camel@mango.techgate.insilico.com> References: <1118517484.8515.12.camel@localhost.localdomain> <1118655041.15945.40.camel@mango.techgate.insilico.com> Message-ID: <1118656587.8243.24.camel@localhost.localdomain> > I also have little experience on Bio::Graphics, but from my > understanding I also think that draw_boxes and draw_scale needs to be > modified. > > I did this as well and you can see my experiments on > http://genome.insilico.at/cgi-bin/gbrowse/nimblegen > > Though my approach is very hacky, I can share it. I made a copy to my > private module gwxyplot.pm which is an overkill but good for > experiments. I think this function should go into a new graph_type > option in xyplot. 
> > I think my solution is similar to yours, If you want I can test your > version on our gbrowse installation. This morning (with more caffeine in my veins) I found out that my previous patch was wrong: I confounded the middle-point with the zero-point in the vertical axis... > > I still miss features to draw the scale: > -) all over the detail view > -) also if no features are available, the empty scale should be drawn. > -) do correct padding if scale is drawn. Is this in only in GMOD side or does it also affect Bio::Graphics? > > In my approach I need to configure the min_score max_score otherwise the > scale is not done correct which is not very good ... So, basically, min_score, max_score and draw_scale parts need some work so that gwxyplot changes can be merged to xyplot. Is that right? Albert. > > PD: I haven't done anything about the log coordinates. From fernan at iib.unsam.edu.ar Mon Jun 13 12:22:07 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Mon Jun 13 12:15:24 2005 Subject: [Bioperl-l] pubmed not used in Bio::Annotation::Reference if medline is present Message-ID: <20050613162207.GT953@iib.unsam.edu.ar> Hi! I'm parsing GenBank files using bioperl-1.4: my $bpSeqIoObj = Bio::SeqIO->new ( -format => 'GenBank', -file => $dataFile ); while ( my $bpSeqObj = $bpSeqIoObj->next_seq() ) { my $bpAnnotObj = $bpSeqObj->annotation(); my @bpRefObjs = $bpAnnotObj->get_Annotations('reference'); my $bpRefObj = shift(@bpRefObjs); my $pubmed = $bpRefObj->{'pubmed'}; my $medline= $bpRefObj->{'medline'}; ... } The problem that I'm seeing is that whenever a genbank record has both PUBMED and MEDLINE IDs, $bpRefObj->{'pubmed'} is empty. It would be OK if ->{'medline'} was empty, since MEDLINE has been replaced by pubmed, but this is not the case. Anyone can confirm this? I haven't found any bugs open related to this issue. Perhaps it's already fixed in 1.5 or in CVS? 
I can fix it if I knew who is loading the Bio::Annotation object upon parsing the GenBank file ... Fernan From brian_osborne at cognia.com Mon Jun 13 12:38:41 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Jun 13 12:32:33 2005 Subject: [Bioperl-l] pubmed not used in Bio::Annotation::Reference if medline is present In-Reply-To: <20050613162207.GT953@iib.unsam.edu.ar> Message-ID: Fernan, What is the $dataFile or id? Brian O. On 6/13/05 12:22 PM, "Fernan Aguero" wrote: > Hi! > > I'm parsing GenBank files using bioperl-1.4: > > my $bpSeqIoObj = Bio::SeqIO->new > ( -format => 'GenBank', -file => $dataFile ); > > while ( my $bpSeqObj = $bpSeqIoObj->next_seq() ) { > my $bpAnnotObj = $bpSeqObj->annotation(); > my @bpRefObjs = $bpAnnotObj->get_Annotations('reference'); > my $bpRefObj = shift(@bpRefObjs); > > my $pubmed = $bpRefObj->{'pubmed'}; > my $medline= $bpRefObj->{'medline'}; > > ... > } > > The problem that I'm seeing is that whenever a genbank > record has both PUBMED and MEDLINE IDs, > $bpRefObj->{'pubmed'} is empty. > > It would be OK if ->{'medline'} was empty, since MEDLINE has > been replaced by pubmed, but this is not the case. > > Anyone can confirm this? I haven't found any bugs open > related to this issue. Perhaps it's already fixed in 1.5 or > in CVS? I can fix it if I knew who is loading the > Bio::Annotation object upon parsing the GenBank file ... 
> > Fernan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From fernan at iib.unsam.edu.ar Mon Jun 13 13:33:12 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Mon Jun 13 13:28:00 2005 Subject: [Bioperl-l] pubmed not used in Bio::Annotation::Reference if medline is present In-Reply-To: References: <20050613162207.GT953@iib.unsam.edu.ar> Message-ID: <20050613173312.GU953@iib.unsam.edu.ar> +----[ Brian Osborne (13.Jun.2005 13:44): | | Fernan, | | What is the $dataFile or id? | | Brian O. | +----] OK, so your question prompted me to look again in GenBank, because I'm using GenBank files that were downloaded some time ago ... if you go to GenBank now, there are no MEDLINE lines. But they are in my copy of the file (I'm using the entry for accession number AI563039 as an example, attached). So the bug is there. However, I don't know if it's still relevant or worth to fix it, as it seems that NCBI has pulled the MEDLINE line from the GenBank records. Anyway, I'll have to download all my datasets again :( Fernan LOCUS AI563039 530 bp mRNA linear EST 26-MAR-1999 DEFINITION TENS2196 T. cruzi epimastigote normalized cDNA Library Trypanosoma cruzi cDNA clone 2196 5', mRNA sequence. ACCESSION AI563039 VERSION AI563039.1 GI:4521421 KEYWORDS EST. SOURCE Trypanosoma cruzi ORGANISM Trypanosoma cruzi Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma; Schizotrypanum. REFERENCE 1 (bases 1 to 530) AUTHORS Verdun,R.E., Di Paolo,N.C., Urmenyi,T.P., Rondinelli,E., Frasch,A.C.C. and Sanchez,D.O. TITLE Gene discovery through expressed sequence tag sequencing in trypanosoma cruzi JOURNAL Infect. Immun. 66 (11), 5393-5398 (1998) MEDLINE 99003155 PUBMED 9784549 COMMENT Contact: Sanchez D.O. ... ... 
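For anyone who wants to confirm what a saved GenBank file actually contains before digging into the parser, here is a minimal sketch in core Perl only. It does not use BioPerl and is not how Bio::SeqIO's GenBank parser works internally; the helper name reference_ids and the layout of the returned hashes are made up for illustration. It just walks a record and reports the MEDLINE and PUBMED lines of each REFERENCE block, using the AI563039 reference quoted above as sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return one hash per REFERENCE
# block of a GenBank flat-file record, holding the reference number and
# whichever of the MEDLINE / PUBMED identifiers the block lists.
sub reference_ids {
    my ($record) = @_;
    my @refs;
    my $current;    # the REFERENCE block being read, if any
    for my $line (split /\n/, $record) {
        if ($line =~ /^REFERENCE\s+(\d+)/) {
            $current = { number => $1 };
            push @refs, $current;
        }
        elsif ($line =~ /^\S/) {
            # Any other top-level keyword (COMMENT, FEATURES, ...) ends the block.
            undef $current;
        }
        elsif ($current && $line =~ /^\s+(MEDLINE|PUBMED)\s+(\d+)/) {
            $current->{ lc $1 } = $2;
        }
    }
    return @refs;
}

# Sample data: the reference from the AI563039 entry, laid out as a flat file.
my $record = <<'END';
REFERENCE   1  (bases 1 to 530)
  AUTHORS   Verdun,R.E., Di Paolo,N.C., Urmenyi,T.P., Rondinelli,E.,
            Frasch,A.C.C. and Sanchez,D.O.
  TITLE     Gene discovery through expressed sequence tag sequencing in
            trypanosoma cruzi
  JOURNAL   Infect. Immun. 66 (11), 5393-5398 (1998)
  MEDLINE   99003155
   PUBMED   9784549
COMMENT     Contact: Sanchez D.O.
END

for my $ref (reference_ids($record)) {
    printf "reference %d: medline=%s pubmed=%s\n",
        $ref->{number},
        defined $ref->{medline} ? $ref->{medline} : '(none)',
        defined $ref->{pubmed}  ? $ref->{pubmed}  : '(none)';
}
# prints: reference 1: medline=99003155 pubmed=9784549
```

If both identifiers show up here but the 'pubmed' slot is still empty after parsing, the loss is happening in the parser rather than in the file. Inside BioPerl it is probably also safer to go through the accessor methods of Bio::Annotation::Reference (medline and pubmed, if I read the POD right) than to reach into the object's hash directly.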
From skirov at utk.edu Mon Jun 13 15:13:19 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Jun 13 15:05:27 2005 Subject: [Bioperl-l] some notes on entrezgene parser performance Message-ID: <42ADDACF.5040607@utk.edu> This is brief comparison of locuslink (march download) v.s. entrezgene. While this is not too comprehensive it shows to some extent how entrezgene parser performs. The procedure: both locuslink and entrezgene files are parsed and loaded into a relational database (genereg.ornl.gov/gkdb) and then the number of records is compared between the locuslink and entrezgene tables. Locuslink parser has been used for years and I know it works well. So if there is a nasty bug in entrezgene, that missed some of the data, I expect to see a decrease in the number of records in entrezgene vs. locuslink (some fluctuation is normal). Here are two reports (for Gene Ontology and RefSeq, if interested please ask and I will send additional reports). I think the parser is functioning normally so far. The only significant deviation is GO data for Drosophila, but that is because the entrezgene file simply does not contain this data. This is quite a restricted approach, so I can't guarantee the parser is not missing something. Please let me know if you notice weird behavior. 
Stefan REFSEQ report: For organism Apis mellifera, entrez gene had 18 records more (less) than locuslink And table ll_refseq_nm had 14 records more (less) than locuslink For organism Bos taurus, entrez gene had 37113 records more (less) than locuslink And table ll_refseq_nm had 362 records more (less) than locuslink For organism Caenorhabditis elegans, entrez gene had -799 records more (less) than locuslink And table ll_refseq_nm had -383 records more (less) than locuslink For organism Canis familiaris, entrez gene had 42 records more (less) than locuslink And table ll_refseq_nm had 45 records more (less) than locuslink For organism Danio rerio, entrez gene had 2208 records more (less) than locuslink And table ll_refseq_nm had 305 records more (less) than locuslink For organism Drosophila melanogaster, entrez gene had -8814 records more (less) than locuslink And table ll_refseq_nm had 1249 records more (less) than locuslink For organism Gallus gallus, entrez gene had 16 records more (less) than locuslink And table ll_refseq_nm had 192 records more (less) than locuslink For organism Homo sapiens, entrez gene had 107382 records more (less) than locuslink And table ll_refseq_nm had 151 records more (less) than locuslink For organism Human immunodeficiency virus 1, entrez gene had -9 records more (less) than locuslink And table ll_refseq_nm had -24 records more (less) than locuslink For organism Mus musculus, entrez gene had 57427 records more (less) than locuslink And table ll_refseq_nm had 49 records more (less) than locuslink For organism Pan troglodytes, entrez gene had 19 records more (less) than locuslink And table ll_refseq_nm had 18 records more (less) than locuslink For organism Rattus norvegicus, entrez gene had 25821 records more (less) than locuslink And table ll_refseq_nm had 681 records more (less) than locuslink For organism Strongylocentrotus purpuratus, entrez gene had -29 records more (less) than locuslink And table ll_refseq_nm had -8 records more 
(less) than locuslink For organism Sus scrofa, entrez gene had 12 records more (less) than locuslink And table ll_refseq_nm had 3 records more (less) than locuslink For organism Takifugu rubripes, entrez gene had -238 records more (less) than locuslink And table ll_refseq_nm had -7 records more (less) than locuslink For organism Xenopus tropicalis, entrez gene had -2612 records more (less) than locuslink And table ll_refseq_nm had -2403 records more (less) than locuslink GeneOntology report For organism Danio rerio, entrez gene had 2208 records more (less) than locuslink And table ll_go had 4291 records more (less) than locuslink For organism Drosophila melanogaster, entrez gene had -8814 records more (less) than locuslink And table ll_go had -44725 records more (less) than locuslink For organism Homo sapiens, entrez gene had 107382 records more (less) than locuslink And table ll_go had 6716 records more (less) than locuslink For organism Mus musculus, entrez gene had 57427 records more (less) than locuslink And table ll_go had 2972 records more (less) than locuslink For organism Rattus norvegicus, entrez gene had 25821 records more (less) than locuslink And table ll_go had 4927 records more (less) than locuslink From jeremy_just at netcourrier.com Mon Jun 13 18:50:07 2005 From: jeremy_just at netcourrier.com (=?ISO-8859-15?Q?J=E9r=E9my?= JUST) Date: Mon Jun 13 18:52:33 2005 Subject: [Bioperl-l] Putative bug in Bio/SearchIO/blast.pm In-Reply-To: <56F7C7EB-10CD-4D2F-B9E7-6B701392EB60@duke.edu> References: <20050609145859.0000248d@pearson.infobiogen.fr> <56F7C7EB-10CD-4D2F-B9E7-6B701392EB60@duke.edu> Message-ID: <20050614005007.67bb8d41@norbert.inapg.inra.fr> On Sun, 12 Jun 2005 16:15:36 -0400 Jason Stajich wrote: > I'm pretty sure this is already fixed with the code in CVS. Can you > try it with that code You're right. I should have looked at CVS before posting. 
> Also, please post bugs and diffs to http://bugzilla.open-bio.org so > that they can be tracked OK, next time, I'll use bugzilla. Each project has its development habits, and I hadn't looked at Bioperl and its development for nearly three years. Cheers. -- Jérémy JUST From eelhaik at uh.edu Mon Jun 13 18:38:19 2005 From: eelhaik at uh.edu (EranEl) Date: Tue Jun 14 09:02:47 2005 Subject: [Bioperl-l] Volunteers to contributions of Perl code Message-ID: <0II100BECISALK@mail.uh.edu> Dear sir, I read your message in the bioperl forum regarding the conversion from phylip to fasta. I was wondering if you have the perl code that does that. Thank you very much ____________________________________ Eran Elhaik: Lab Phone: (713) 743-2312 Doctoral Student University of Houston http://nsm.uh.edu:16080/~dgraur/eran/main.htm ____________________________________ From ChaosMK2 at gmx.net Mon Jun 13 16:58:31 2005 From: ChaosMK2 at gmx.net (ChaosMK2) Date: Tue Jun 14 09:03:17 2005 Subject: [Bioperl-l] Bioperl module: Bio::Tools::Run::Alignment::TCoffee Message-ID: <42ADF377.6060906@gmx.net> I would like to use your TCoffee module but I encountered some problems. I'll post the code first, then my questions.

#!/usr/bin/perl;
use strict;
use warnings;
use Bio::Tools::Run::Alignment::TCoffee;

print "Enter full name of directory containing t_*fasta files:\n";
chomp(my $dir = <>);
chdir $dir;

#my @params = ("ktuple" => 2, "matrix" => "BLOSUM", "tg-mode" => 0, "output" => "fasta_aln", "score_html");
my @params = ("ktuple" => 2, "matrix" => "BLOSUM", "output" => "fasta_aln",);
my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);

my @multizFiles = grep(!-d, glob("t_MULTIZ_BLAT_*fasta"));
for my $multizFile (@multizFiles) {
    open(INPUT, "<", $multizFile);
    open(FILE, ">", "Temp");
    for (<INPUT>) {
        s/^(\w\w)\t([-acgtu]+)$/>$1\n$2/i;
        print FILE;
    }
    close FILE;
    close INPUT;
    my $file = File::Spec->catfile($dir, "Temp");
    my $aln = $factory->align($file);
    open(FILE, ">", "TCoffee_$multizFile");
    print FILE $aln;
    close FILE;
    unlink $file;
}
exit;

First: if I use tg-mode I get the exception "unallowed parameter TG-MODE !"

Second: I get the same exception with score_html. (Going by your description I can have more than one output format. Or is it my fault and I have to write it another way?)

Third and main problem: I am working on a Windows XP system, so I guess that's the reason, but here is what the exception tells me: TCoffee call crashed: 256 [command -in=...] Before that, Windows reports an error: Command "-in" was not found...

How can I resolve that? Do I have to use Linux? I would have the possibility, but not the admin rights to install all the modules; that's my problem though, and would not be an obstacle. Are there any other solutions for my problem, or is my code buggy?

Thank you for any help provided.

Sebastian

From jason.stajich at duke.edu Tue Jun 14 09:32:50 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jun 14 09:28:11 2005
Subject: [Bioperl-l] Volunteers to contributions of Perl code
In-Reply-To: <0II100BECISALK@mail.uh.edu>
References: <0II100BECISALK@mail.uh.edu>
Message-ID: <8902037E-D3BA-40A6-AA81-73F11D52C952@duke.edu>

See Bio::AlignIO module.
On Jun 13, 2005, at 6:38 PM, EranEl wrote: > Dear sir > > I read your message in the bioperl forum regarding the conversion from > phylip to fasta > > I was wondering if you have the perl code that does that > > Thank you very much > > ____________________________________ > > Eran Elhaik: Lab Phone: (713) 743-2312 > > Doctoral Student > > University of Houston > > http://nsm.uh.edu:16080/~dgraur/eran/main.htm > > ____________________________________ -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From ram at i122server.vu-wien.ac.at Tue Jun 14 10:18:06 2005 From: ram at i122server.vu-wien.ac.at (Rambabu Gudavalli) Date: Tue Jun 14 10:22:08 2005 Subject: [Bioperl-l] how can i get GO terms Message-ID: Dear All, I am trying to get the Gene Ontology annotations for the melanogaster genome. Does anyone know how to get them? I have the melanogaster gene names (e.g. CG17178) and I want the biological function, biological process, etc. from Gene Ontology. Thank you. Ram From chad at dieselwurks.com Tue Jun 14 13:02:36 2005 From: chad at dieselwurks.com (Chad Matsalla) Date: Tue Jun 14 12:54:17 2005 Subject: [Bioperl-l] genbank2gff and how do I use gff3 with Bio::DB::GFF? Message-ID: Greetings, It's a bit cheesy to follow up on my own post but due to the underwhelming response I thought I'd tell people that everything works fine if you use bp_genbank2gff3.pl. Since bp_genbank2gff.pl doesn't work quite properly and hasn't been changed for over a year, can it just be moved to the attic to prevent other people from trying[1] to work with it? Thanks! 
Chad Matsalla [1] For some strange reason. From lstein at cshl.edu Tue Jun 14 14:00:36 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Jun 14 13:52:57 2005 Subject: [Bioperl-l] Re: patch xyplot.pm negative values Bio::Graphics/GMOD In-Reply-To: <1118517484.8515.12.camel@localhost.localdomain> References: <1118517484.8515.12.camel@localhost.localdomain> Message-ID: <200506141400.37019.lstein@cshl.edu> Hi Albert, Much obliged to you on this. Unfortunately the patch got wordwrapped in transit. Could you send the patch as an attachment? Sometimes you need to gzip the attachment first. Lincoln On Saturday 11 June 2005 03:18 pm, Albert Vilella wrote: > Hi all, > > xyplot negative values: > > Lincoln and Guenther: this are very hacky modifications (I still haven't > completely familiarized myself with Bio::Graphics) > > I took a look at the code in xyplot.pm, and tested it on a local png > file (I don't have a GBrowse setup to test). Please check if what > has been done is correct. > > It seems to me that _draw_boxes needs to be modified, so that the > boxes aren't wrongly plotted, but _draw_histogram, _draw_line and > _draw_points don't need anything special, a part from the existence of > the scale. 
> > In _draw_boxes, negative boxes need to be plotted from the middle of > the track down to their score, so I modified it in a similar fashion > as it is done in _draw_scale > > --- xyplot.pm.~1.15.~ 2005-06-11 08:03:02.000000000 +0200 > +++ xyplot.pm 2005-06-11 21:04:37.895586696 +0200 > @@ -64,7 +64,8 @@ > > my $type = $self->option('graph_type') || $self->option('graphtype') > > || 'boxes'; > > $self->_draw_histogram($gd,$x,$y) if $type eq 'histogram'; > - $self->_draw_boxes($gd,$x,$y) if $type eq 'boxes'; > + #we need $dx and $dy for calculating $half for negative boxes > + $self->_draw_boxes($gd,$x,$y,$dx,$dy) if $type eq 'boxes'; > $self->_draw_line ($gd,$x,$y) if $type eq 'line' > or $type eq 'linepoints'; > $self->_draw_points($gd,$x,$y) if $type eq 'points' > @@ -141,7 +142,7 @@ > > sub _draw_boxes { > my $self = shift; > - my ($gd,$left,$top) = @_; > + my ($gd,$left,$top,$dx,$dy) = @_; > > my @parts = $self->parts; > my $fgcolor = $self->fgcolor; > @@ -152,8 +153,17 @@ > for (my $i = 0; $i < @parts; $i++) { > my $part = $parts[$i]; > my $next = $parts[$i+1]; > + my ($dummy1,$dummy2,$dummy3,$zero) = > $part->calculate_boundaries($left,0); > my ($x1,$y1,$x2,$y2) = $part->calculate_boundaries($left,$top); > - $self->filled_box($gd,$x1,$part->{_y_position},$x2,$y2,$bgcolor, > $fgcolor); > + # If negative box > + if ($zero < ($part->{_y_position})) { > + my ($dummyx1,$ny1,$dummyx2,$ny2) = > $self->calculate_boundaries($dx,$dy); > + my $half = ($ny1+$ny2)/2; > + $self->filled_box($gd,$x1,$part->{_y_position},$x2,$half, > $bgcolor,$fgcolor); > + # Normal positive box > + } else { > + $self->filled_box($gd,$x1,$part->{_y_position},$x2,$y2, > $bgcolor,$fgcolor); > + } > next unless $next; > my ($x3,$y3,$x4,$y4) = $next->calculate_boundaries($left,$top); > $gd->line($x2,$y2,$x3,$y4,$fgcolor) if $x2 < $x3; > > This results in boxes being plot upward if positive and downward if > negative from the middle of the track, (midpoint between $min_score > and $max_score). 
I don't know if it is the best thing to do, but it > makes sense to me. > > Also, negative boxes will have a different color than positive boxes. It > also makes sense to me. > > If you can check if it is ok and test it in a running GBrowse server, > that would be great. > > Bests, > > Albert. > > PD: I haven't done anything about the log coordinates. -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From hartzell at kestrel.alerce.com Tue Jun 14 13:48:48 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Tue Jun 14 14:39:07 2005 Subject: [Bioperl-l] Bio::Tools::Primer3.pm bug: code or doc, you decide??? In-Reply-To: References: <17068.45744.979958.907829@satchel.alerce.com> Message-ID: <17071.6272.927626.702576@satchel.alerce.com> Brian Osborne writes: > George, > > You're presenting a good argument that it's the code that should change. > > Brian O. > > > On 6/12/05 6:09 PM, "George Hartzell" wrote: > > > > > The pod for Bio::Tools::Primer3::number_of_results says for Function: > > > > "Retrieve the number of primers returned from Primer3" > > > > and for it's Notes:, it says > > > > "Returns the maximum number of primers returned from Primer3." > > > > In fact, it's returning the maximum offset of the array of results, > > also known as one *less* than the number of results that primer3 > > returned.... (see Bio::Tools::Primer3::_separate() and it's use of > > $maxlocation and $self->{'maximum_primers_returned'}). > > > > Given the method's *name*, I figured it'd tell me how many primers I'd > > gotten back from primer3. > > > > Given the method's Function doc, ditto.... > > > > Given the method's Notes, it's a bit fuzzier, but it still evokes a > > count. > > > > I'd like to fix *something*, so that I'll feel better about wasting an > > hour walking the primer3 C code and pestering the authors about an off > > by one bug..... 
> > > > So, should I change the method to add one, or change the documentation > > to specify that it's the maximum offset into the results array (even > > though the names then misleading). > > > > I'd rather fix the code, but *I* don't have to be backward > > compatible.... > > > > Comments? Here's a patch that fixes it. It changes the behaviour, code that used to work will now have an off by one error. There should be a "Headsup! We just made a non-backward-compatible change" warning if when this gets committed. g. -------------- next part -------------- A non-text attachment was scrubbed... Name: patch-Primer3.pm Type: application/octet-stream Size: 942 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050614/c27aff49/patch-Primer3.obj From hlapp at gnf.org Tue Jun 14 14:55:38 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jun 14 14:46:09 2005 Subject: [Bioperl-l] Re: memory error while loading SwissProt into Oracle using bioperl-db In-Reply-To: References: Message-ID: <3ba087a1f2d128f023b94d871b0366fa@gnf.org> On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote: > Hi, > > I would like to load SwissProt data into my Oracle 9.2 database with > BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got > two > problems: > > 1) I get many (about 1300) warnings stating integrity constraint > errors: > > ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent > key not found (DBD ERROR: OCIStmtExecute) > > ORA-01400: cannot insert NULL into > ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS") > (DBD ERROR: OCIStmtExecute) If there is indeed no authors for the respective reference in the respective SwissProt entries then this is expected because Reference.Authors may not be NULL. 
You should, however, see more than just the error message above; supposedly there is a warning message following or preceding it that informs about not all foreign keys succeeded to insert, and the message should give the primary key. This should be the primary key for the bioentry that should have gotten the reference attached. Using SQL you should then be able to identify which record it is and then you can look it up on the Swissprot site or in your Swissprot source file. If the bioentry itself fails to load because of this problem then you should see an error message to this effect, with full stack trace. Otherwise the bioentry did load, just the reference didn't, and if you don't really need this particular reference, you don't need to worry about it. You may also want to consider trying to upgrade to a CVS snapshot from either the 1.4 branch or the main trunk. There have been a few fixes to modules that I believe include the swissprot parser. > > 2) The script stops after 2 hours (34500 tuples in table BioEntry) with > message: Out of memory! > > I guess problem 1 causes problem 2. Is this reasonable or do I have two > separated problems? The one before may not even be a real problem, see above. It is extremely unlikely that it causes the memory problem. Swissprot is is a large, very diverse, and richly annotated data source, and because bioperl-db caches a lot of stuff like ontology terms, references, and dbxrefs the loader process will eventually use up anywhere between 500MB and 1.3GB of RAM. Given the amount of memory you have this shouldn't be a limitation though at all, unless maybe if you gave all the memory to Oracle running on the same machine. I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be seeing a similar problem. Try watching the loader process in top and see how fast the memory consumption grows. 
It will grow due to the object cache filling up, but if you see it eating up more than 1GB before 100,000 records loaded you're likely to have hit a memory leak. If that's the case you'll have to rebuild your own perl from source with multi-threading disabled. -hilmar > > I run Oracle and the load script on the same machine with: > Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM > perl 5.8.1, built for i586-linux-thread-multi > bioperl 1.4 > bioperl-db 0.1 BTW I'm assuming this is not correct; otherwise the latest BioSQL schema wouldn't be supported, let alone the Oracle version of it. You probably obtained a snapshot from CVS? > DBI 1.48 > DBD::Oracle 1.16 > Oracle 9.2 > BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on > 6th > June 2005) > > Thanks for any suggestions, > Jana > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cain at cshl.edu Tue Jun 14 16:53:29 2005 From: cain at cshl.edu (Scott Cain) Date: Tue Jun 14 16:45:29 2005 Subject: [Bioperl-l] genbank2gff and how do I use gff3 with Bio::DB::GFF? In-Reply-To: References: Message-ID: <1118782409.3813.34.camel@localhost.localdomain> Hi Chad, Sorry I haven't reproduced your results; I've been dragged down by other matters. If I can recap to make sure I have it right: bp_genbank2gff.pl will correctly generate gff2 bp_genbank2gff3.pl will correctly generate gff3 If I have that right, I would be reluctant to pitch bp_genbank2gff.pl as there are still plenty of people (that is, the overwhelming majority) that use gff2. I nearly certain (without even looking) that they could be documented better. 
What would be really great would be if you could submit this as a bug on bioperl's bugzilla, so I won't forget about it. I have a few software releases I really want to get out in the next week or so, so I know I won't get to it real soon. Thanks, Scott On Tue, 2005-06-14 at 11:02 -0600, Chad Matsalla wrote: > Greetings, > > It's a bit cheesy to follow up on my own post but due to the > underwhelming response I thought I'd tell people that evertything works > fine if you use bp_genbank2gff3.pl. > > Since bp_genbank2gff.pl doesn't work quite properly and hasn't been > changed for over a year can it just be moved to the attic to prevent > other people from trying[1] to work with it? > > Thanks! > > Chad Matsalla > > > [1] For some strange reason. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From tuantran167 at gmail.com Tue Jun 14 20:03:47 2005 From: tuantran167 at gmail.com (Tuan A. Tran) Date: Tue Jun 14 19:55:50 2005 Subject: [Bioperl-l] extract info from .game.xml Message-ID: Hi, I am trying to extract some information from a file filename.game.xml (I got this file from flybase.org). I wrote a simple script to test it. However, I keep getting the following message ------------- EXCEPTION ------------- MSG: No annotations STACK Bio::SeqIO::game::gameHandler::load /usr/local/share/perl/5.8.4/Bio/SeqIO/game/gameHandler.pm:121 STACK Bio::SeqIO::game::_getseqs /usr/local/share/perl/5.8.4/Bio/SeqIO/game.pm:156 STACK Bio::SeqIO::game::next_seq /usr/local/share/perl/5.8.4/Bio/SeqIO/game.pm:101 STACK toplevel fetchseq_game_xml.pl:64 I have no idea why. Can anyone help? 
Thanks in advance,
TAT
---------------------------------
My simple script is

#!/usr/local/lib/perl
use strict;
sub NULL () {0};
use Bio::Seq;
use Bio::SeqIO;
#use Bio::SeqIO::game;
#use Bio::Annotation;
use Bio::SearchIO;
use Bio::AlignIO;
use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Tools::Run::Alignment::Clustalw;
use Getopt::Long;
use Bio::DB::GenBank;
use Bio::DB::Flat::BDB;
#use Bio::Index::GenBank;
use Bio::Index::Fasta;
use Bio::SeqFeature::Generic;
use DBI;

my $infile = shift;
my $in = Bio::SeqIO->new( -file=> $infile, -format=>'game');

while (my $query = $in->next_seq() ) {
    print $query->id,"\n";
}

From chad at dieselwurks.com  Tue Jun 14 22:53:02 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Tue Jun 14 22:44:44 2005
Subject: [Bioperl-l] genbank2gff and how do I use gff3 with Bio::DB::GFF?
In-Reply-To: <1118782409.3813.34.camel@localhost.localdomain>
References: <1118782409.3813.34.camel@localhost.localdomain>
Message-ID: 

On Tue, 14 Jun 2005, Scott Cain wrote:

> Sorry I haven't reproduced your results; I've been dragged down by other
> matters. If I can recap to make sure I have it right:
>
> bp_genbank2gff.pl will correctly generate gff2

I don't know how to ask it to produce gff2. As far as I can tell from the
source it generates v3 only and under default use it produces gff3. And the
gff3 it produces returns undef to the segment() call in [1].

> bp_genbank2gff3.pl will correctly generate gff3

Yes, it generates gff that I can successfully bind to using code I provided
earlier and included here[1]. I'm not at all a gff expert so I was hoping
for someone else[2] to tell me the difference between the gff3 produced by
genbank2gff and genbank2gff3. Whew, that's a mouthful to say out loud.

> If I have that right, I would be reluctant to pitch bp_genbank2gff.pl as
> there are still plenty of people (that is, the overwhelming majority)
> that use gff2.
I nearly certain (without even looking) that they could > be documented better. I'd be pleased for information on how to make either script produce gff2. Am I missing something? > What would be really great would be if you could submit this as a bug on > bioperl's bugzilla, so I won't forget about it. I have a few software > releases I really want to get out in the next week or so, so I know I > won't get to it real soon. I'd love to report a bug but I'm not clear on what I'm reporting. Is it that the gff3 produced by genbank2gff is incomplete in some way? Or is it that the gff3 produced by genbank2gff requires some special technique to use as in [1]? Or is it that genbank2gff should be producing gff2? Thanks for your advice on how to treat this! These scripts provide a great start to working with these systems. Chad Matsalla [1] my $db2 = Bio::DB::GFF->new( -adaptor => 'memory', -file => './t/data/AE003644_Adh-genomic.gff' ); my $segment2 = $db2->segment(-name => 'AE003644'); my @features = $segment2->features(); print("There are this many features on AE003644 (".scalar(@features).")\n"); [2] wink wink, nudge nudge From priesel at caesar.de Wed Jun 15 02:53:30 2005 From: priesel at caesar.de (Saskia Priesel) Date: Wed Jun 15 02:45:06 2005 Subject: [Bioperl-l] Module: Bio::Structure::IO Message-ID: <42AFCFFE.6080605@caesar.de> Hello to all, I have a problem with very much Files from the PDB (Protein Data Bank). I want to analyse 30000 PDB Files. For this I take the Bioperl Module Bio::Structure::IO for reading the whole entry. Below I give you the source code. My Problem is now that I will have to much open entries in the memory. Is there a method or so in the module which can handle this? 

sub filter_data {
    my $pdb_files_ref = shift;
    my @pdb_files = @$pdb_files_ref;
    #print join ("\n",@pdb_files);

    #initialize variables
    my @file_data = ();
    my $min_length = 0;
    my $max_length = 100;

    for(my $i=0;$i<=$#pdb_files;$i++) {
        my $data = $pdb_files[$i];
        #print "$data\n";
        my $structio = Bio::Structure::IO->new(-file => "$data", '-format' => 'pdb');
        my $structure = $structio->next_structure();
        print "Structure",$structure->id,"\n";
        my @chain_list = $structure->get_chains();
        my $length = scalar @chain_list;
        #print "Length: $length\n";
        #print "Last element: $chain_list[-1]\n";

        for(my $i=0;$i<=$#chain_list;$i++) {
            my $chain = $chain_list[$i];
            #print "Chain: $chain\n";
            my $chainid = $chain->id;
            #print "Chain: $chainid\n";

            if($chainid =~ m/default/) {
                # declared with "my" so the sub also compiles under "use strict"
                my $pseq = $structure->seqres();
                my $default_seq = $pseq->seq();
                #print "$default_seq\n";
                if(length($default_seq) >= $min_length && length($default_seq) <= $max_length) {
                    # report only sequences with fewer than three cysteines
                    if($default_seq =~ m/.*C.*C.*C/i == 0) {
                        print "Structure",$structure->id,"\n";
                        print "Chain: $chainid\n";
                        print "$default_seq\n";
                        print "Length: ",length($default_seq),"\n";
                    }
                }
                next;
            }

            my $pseq = $structure->seqres($chainid);
            if (!$pseq){
                last;
            }
            my $sequence = $pseq->seq();
            #print "$sequence\n";
            if(length($sequence) >= $min_length && length($sequence) <= $max_length) {
                if($sequence =~ m/.*C.*C.*C/i == 0) {
                    print "Structure",$structure->id,"\n";
                    print "Chain: $chainid\n";
                    print "$sequence\n";
                    print "Length: ",length($sequence),"\n";
                    next;
                }
            }
        }
    }
}

From michael.watson at bbsrc.ac.uk  Wed Jun 15 10:14:27 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Jun 15 10:07:54 2005
Subject: [Bioperl-l] locus_tag of features in Bio::SeqFeature->gff_string
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D73D@iahce2knas1.iah.bbsrc.reserved>

Hi

I've noticed that some locus tags are reported with "'s round them and
some are not when gff_string is called on a feature.

E.g.
in NC_002755.gbk, MT2384.1 comes out as "MT2384.1", and MT4034 comes out as MT4034. The dot seems to be the deciding factor. Not a massive issue, by any means... :-S Mick From jason.stajich at duke.edu Wed Jun 15 11:42:26 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jun 15 11:35:23 2005 Subject: [Bioperl-l] locus_tag of features in Bio::SeqFeature->gff_string In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D73D@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D73D@iahce2knas1.iah.bbsrc.reserved> Message-ID: <7A7F0AC8-DA95-4587-BBAC-D64664F68D84@duke.edu> http://www.sanger.ac.uk/Software/formats/GFF/ GFF_Spec.shtml#attribute_field I see the a Sequence attribute field with strings containing "." being escaped. I assume that is why you see this in the current gff_string implementation. I don't know if this persists in GFF3 as things are slightly different in terms of what needs to be escaped/protected in quotes. I think things have really grown organically wrt GFF input/output - I don't know if the Bio::FeatureIO system provides a more uniform solution for the future? -jason On Jun 15, 2005, at 10:14 AM, michael watson ((IAH-C)) wrote: > Hi > > I've noticed that some locus tags are reported with "'s round them and > some are not when gff_string is called on a feature. > > E.g. in NC_002755.gbk, MT2384.1 comes out as "MT2384.1", and MT4034 > comes out as MT4034. The dot seems to be the deciding factor. > > Not a massive issue, by any means... 
:-S
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From asu at gnf.org  Wed Jun 15 15:32:35 2005
From: asu at gnf.org (Andrew Su)
Date: Wed Jun 15 15:24:19 2005
Subject: [Bioperl-l] aligned sequence from hmmer output
Message-ID: <26AA69A5942B374388D123E5B8D121BE1D758F@EXCHCLUSTER01.lj.gnf.org>

Hello,

Does anyone know of an easy way to get the sequence from an alignment in
hmmer output? I think from blast output, it's something like
$hsp->seq('query') or $hsp->seq('target'), but when I try this for a
hmmer hsp, it throws an exception complaining about trying to create a
seq object with a sequence that has "." and "+" and such (pasted below).

for example, my program looks something like this:

$in = new Bio::SearchIO( -format => 'hmmer', -file => $ARGV[0] );
while( my $result = $in->next_result ) {
    while( $hit = $result->next_hit ) {
        while( $hsp = $hit->next_domain ) {
            print $hit->name,",",$hsp->evalue,",",$hsp->score, "\n";
        }
    }
}

... and I'd like to output the matched sequence as well. Anyone have
any thoughts on how I would accomplish this?

(apologies if this has previously been addressed; couldn't find the
right combination of search strings to fish the answer out of the
archives...)
thanks, -andrew ------------- EXCEPTION ------------- MSG: Attempting to set the sequence to [aggttaa.a.cggtcaa.aa<-*] which does not look healthy STACK Bio::PrimarySeq::seq /depts/CompDisc/lib/perl/Bio/PrimarySeq.pm:267 STACK Bio::PrimarySeq::new /depts/CompDisc/lib/perl/Bio/PrimarySeq.pm:217 STACK Bio::LocatableSeq::new /depts/CompDisc/lib/perl/Bio/LocatableSeq.pm:100 STACK Bio::Search::HSP::HSPI::seq /depts/CompDisc/lib/perl/Bio/Search/HSP/HSPI.pm:572 STACK toplevel ./parse_hmmer.pl:17 From jason.stajich at gmail.com Wed Jun 15 15:08:10 2005 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed Jun 15 15:30:33 2005 Subject: [Bioperl-l] Fwd: query References: <20050615185742.84407.qmail@web50608.mail.yahoo.com> Message-ID: Begin forwarded message: > From: Maitry Kothari > Date: June 15, 2005 2:57:42 PM EDT > To: jason@bioperl.org > Subject: query > > > hi, > I am student of MS (computer) , in this semester i have sub bio- > informatics & i m doing my project & in that i have to compare two > text files & have to get one out put file. how can i do this with > perl? > thanks > maitry > > Send instant messages to your online friends http:// > uk.messenger.yahoo.com > -- Jason Stajich jason.stajich-at-gmail.com or jason-at-bioperl.org http://jason.open-bio.org From jason.stajich at duke.edu Wed Jun 15 15:51:33 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jun 15 15:44:17 2005 Subject: [Bioperl-l] aligned sequence from hmmer output In-Reply-To: <26AA69A5942B374388D123E5B8D121BE1D758F@EXCHCLUSTER01.lj.gnf.org> References: <26AA69A5942B374388D123E5B8D121BE1D758F@EXCHCLUSTER01.lj.gnf.org> Message-ID: <9E3F9E4D-35EC-4257-ADF2-286A42B4CF2D@duke.edu> You can get them as strings: $hsp->hit_string and $hsp->query_string. $hsp->homology_string gives you the middle. -jason On Jun 15, 2005, at 3:32 PM, Andrew Su wrote: > Hello, > > Does anyone know of an easy way to get the sequence from an > alignment in > hmmer output? 
I think from blast output, it's something like > $hsp->seq('query') or $hsp->seq('target'), but when I try this for a > hmmer hsp, it throws an exception complaining about trying to create a > seq object with a sequence that has "." and "+" and such (pasted > below). > > > for example, my program looks something like this: > > $in = new Bio::SearchIO( -format => 'hmmer', -file => $ARGV[0] ); > while( my $result = $in->next_result ) { > while( $hit = $result->next_hit ) { > while( $hsp = $hit->next_domain ) { > print > $hit->name,",",$hsp->evalue,",",$hsp->score, "\n"; > } > } > } > > ... and I'd like to output the matched sequence as well. Anyone have > any thoughts on how I would accomplish this? > > (apologies if this has previously been addressed; couldn't find the > right combination of search strings to fish the answer out of the > archives...) > > thanks, > -andrew > > > ------------- EXCEPTION ------------- > MSG: Attempting to set the sequence to [aggttaa.a.cggtcaa.aa<-*] which > does not look healthy > STACK Bio::PrimarySeq::seq > /depts/CompDisc/lib/perl/Bio/PrimarySeq.pm:267 > STACK Bio::PrimarySeq::new > /depts/CompDisc/lib/perl/Bio/PrimarySeq.pm:217 > STACK Bio::LocatableSeq::new > /depts/CompDisc/lib/perl/Bio/LocatableSeq.pm:100 > STACK Bio::Search::HSP::HSPI::seq > /depts/CompDisc/lib/perl/Bio/Search/HSP/HSPI.pm:572 > STACK toplevel ./parse_hmmer.pl:17 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From hartzell at kestrel.alerce.com Wed Jun 15 20:03:59 2005 From: hartzell at kestrel.alerce.com (George Hartzell) Date: Wed Jun 15 19:55:30 2005 Subject: [Bioperl-l] Bio::Graphics.pm example/documentation fix Message-ID: <17072.49647.402834.351364@satchel.alerce.com> I couldn't get the example from the Bio::Graphics.pm pod doc to work as written. 
I think that the second add_track should take $wholeseq as its first
argument, not $seq.

I also think that the comment at the top of the example, talking about
a script named red_and_blue.pl, is an historical artifact.

Here's a patch that fixes both of these.

Can someone more familiar with how this works tell me whether, after
applying this patch, the example generates the glyphs it should?

Thanks,

g.

*** Bio/Graphics.pm.orig	Wed Jun 15 16:54:30 2005
--- Bio/Graphics.pm	Wed Jun 15 16:58:01 2005
***************
*** 12,21 ****
  =head1 SYNOPSIS
  
-  # This script generates a PNG picture of a 10K region containing a
-  # set of red features and a set of blue features. Call it like this:
-  #         red_and_blue.pl > redblue.png
-  # you can now view the picture with your favorite image application
- 
- 
   # This script parses a GenBank or EMBL file named on the command
   # line and produces a PNG rendering of it.  Call it like this:
--- 12,15 ----
***************
*** 54,58 ****
                      -tick    => 2);
  
!   $panel->add_track($seq,
                      -glyph   => 'generic',
                      -bgcolor => 'blue',
--- 48,52 ----
                      -tick    => 2);
  
!   $panel->add_track($wholeseq,
                      -glyph   => 'generic',
                      -bgcolor => 'blue',

From schuerer at genomining.com  Thu Jun 16 04:26:14 2005
From: schuerer at genomining.com (Katja Schuerer)
Date: Thu Jun 16 04:19:52 2005
Subject: [Bioperl-l] Course in informatics for biology 2006 at Institut Pasteur
Message-ID: <1118910373.4577.44.camel@dhcp-142.genomining.net>

Course in informatics for biology 2006 at Institut Pasteur

In the series of courses offered at the Pasteur Institute, a course
will be offered in informatics in biology. The next session will take
place from January to end of April 2006.

The main goal of this course is to provide researchers in biology an
initial exposure to informatics. Admittance to the course is reserved
for those with a degree in biology or a related discipline.
With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able both to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software. This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods), and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed. Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, provided with tools for biology and enabling both prototyping or scripting and the building of important software systems. Learning of additional languages (perl and C) will be available for interested students. Learning during the course will be reinforced with computing exercises, and effective training will be provided by a 2 month research project. The working language of the course is French. For further information, please consult: http://www.pasteur.fr/formation/infobio-en.html *** Registration will be closed on October 15 2005. *** Sincerely, -- Catherine Letondal, Institut Pasteur & Katja Schuerer, Genomining Course informatics for biology -- www.pasteur.fr/formation/infobio From avilella at gmail.com Thu Jun 16 05:30:52 2005 From: avilella at gmail.com (Albert Vilella) Date: Thu Jun 16 05:22:59 2005 Subject: [Bioperl-l] scales in xyplot.pm negative values Bio::Graphics for GBrowse In-Reply-To: <200506141400.37019.lstein@cshl.edu> References: <1118517484.8515.12.camel@localhost.localdomain> <200506141400.37019.lstein@cshl.edu> Message-ID: <1118914253.8228.23.camel@localhost.localdomain> Hi all, I post this here for suggestions: How would we like to set the scale or xyplots with respect to min_value and max_value in cases where negative values exist. 
For example:

Ex1:
min_value=-1
max_value=5

Set: 0 to the middle of the plot, then plot from +5 to -5:

 +5 +
    |
    |
    |
  0 +-----------------------------------------
    |
    |
    |
 -5 +

Set: the scale to plot from +5 to -1, which will fill up all the plot
(as in UCSC wiggle tracks):

 +5 +
    |
    |
    |
    |
    |
  0 +-----------------------------------------
    |
 -1 +

Comments?

Bests,

Albert.

From jcsanchez at cib.csic.es  Thu Jun 16 05:11:57 2005
From: jcsanchez at cib.csic.es (Juan Carlos Sanchez Ferrero)
Date: Thu Jun 16 10:02:44 2005
Subject: [Bioperl-l] pfam query
Message-ID: <42B1425D.3030109@cib.csic.es>

hello,

Does anyone know if it is possible to make a batch query using bioperl
to the pfam database ?
i tried using Bio::Tools::WWW,
but i am only able to get by the swisspfam accession number,
and what i am interested is in making new queries....

Thanks in advance for any help

jc

From sdavis2 at mail.nih.gov  Thu Jun 16 10:40:03 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Jun 16 10:33:02 2005
Subject: [Bioperl-l] pfam query
In-Reply-To: <42B1425D.3030109@cib.csic.es>
References: <42B1425D.3030109@cib.csic.es>
Message-ID: 

On Jun 16, 2005, at 5:11 AM, Juan Carlos Sanchez Ferrero wrote:

>
>
> hello,
>
> Does anyone know if it is possible to make a batch query using bioperl
> to the pfam database ?
> i tried using Bio::Tools::WWW,
> but i am only able to get by the swisspfam accession number,
> and what i am interested is in making new queries....

I'm not sure what you are trying to do, but have you looked at ensembl
mart? You can access the MySQL database running the ensembl mart
directly to query for genes with certain pfam domains or all pfam
domains for given (or all genes).

Sean

From jcsanchez at cib.csic.es  Thu Jun 16 11:33:54 2005
From: jcsanchez at cib.csic.es (Juan Carlos Sanchez Ferrero)
Date: Thu Jun 16 14:05:04 2005
Subject: [Bioperl-l] pfam query
In-Reply-To: 
References: <42B1425D.3030109@cib.csic.es>
Message-ID: <42B19BE2.8030206@cib.csic.es>

No, it is not that. What I am more interested in is getting the
graphical representation of the domains of each of my proteins.

Here is an example. I can do this:

#! /usr/bin/perl -w
use Bio::Tools::WWW qw(:obj);
$pfam=$BioWWW->search_url('pfam_sp_uk');
print STDERR "$pfam\n";
$query= $pfam.'Q9P902';
system "wget -p $query";
exit;

and it works fine. In contrast, if I modify it in order to search with
a sequence, changing this

$pfam=$BioWWW->search_url('pfam_sp_uk');

to

$pfam=$BioWWW->search_url('pfam_seq_uk');

and the $query to something like

$query=$pfam.'MTASDFPFCVLLIDFNPD..........'

I get an error saying that I haven't given a query sequence.

Maybe I am doing something wrong, or maybe it is not possible to do it
this way; I don't know.

jc

Sean Davis wrote:

>
> On Jun 16, 2005, at 5:11 AM, Juan Carlos Sanchez Ferrero wrote:
>
>>
>>
>> hello,
>>
>> Does anyone know if it is possible to make a batch query using
>> bioperl to the pfam database ?
>> i tried using Bio::Tools::WWW,
>> but i am only able to get by the swisspfam accession number,
>> and what i am interested is in making new queries....
>
>
> I'm not sure what you are trying to do, but have you looked at ensembl
> mart? You can access the MySQL database running the ensembl mart
> directly to query for genes with certain pfam domains or all pfam
> domains for given (or all genes).
>
> Sean
>

From tuantran167 at gmail.com  Sat Jun 18 01:42:41 2005
From: tuantran167 at gmail.com (Tuan A.
Tran) Date: Sat Jun 18 01:34:54 2005 Subject: [Bioperl-l] parsing blast output Message-ID: Hi, When I blasted my query sequence against a database, the got the following line (for example) >3R type=chromosome; loc=3R:1..27905053; ID=3R; release=r4.1; species=dmel Using bioperl module, $blast_report = $factory->blastall($query); I can extract some information like ID = 3R using $blast_report->next_result->hits()->name; If I want to keep the entire line as show above what should I do? Is there a module in bioperl? I really appreciate if someone can tell me how to do it. Thanks, TAT From iak13000 at gmail.com Sat Jun 18 06:41:13 2005 From: iak13000 at gmail.com (Irshad Khan) Date: Sat Jun 18 06:33:38 2005 Subject: [Bioperl-l] parsing blast output In-Reply-To: <7851868c05061803392128884a@mail.gmail.com> References: <7851868c05061803392128884a@mail.gmail.com> Message-ID: <7851868c050618034136f5ae31@mail.gmail.com> On 6/18/05, Irshad Khan wrote: > Hi, > > If it is in the description part may be you can try this > > $blast_report->next_result->hits()->description; > > let me know if it works > > Irshad > > On 6/18/05, Tuan A. Tran wrote: > > Hi, > > > > When I blasted my query sequence against a database, the got the > > following line (for example) > > > > >3R type=chromosome; loc=3R:1..27905053; ID=3R; release=r4.1; species=dmel > > > > Using bioperl module, > > $blast_report = $factory->blastall($query); > > I can extract some information like ID = 3R using > > $blast_report->next_result->hits()->name; > > > > If I want to keep the entire line as show above what should I do? Is > > there a module in bioperl? I really appreciate if someone can tell me > > how to do it. 
> > > > Thanks, > > TAT > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From jason.stajich at duke.edu Sat Jun 18 09:34:26 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Jun 18 09:27:19 2005 Subject: [Bioperl-l] parsing blast output In-Reply-To: <7851868c050618034136f5ae31@mail.gmail.com> References: <7851868c05061803392128884a@mail.gmail.com> <7851868c050618034136f5ae31@mail.gmail.com> Message-ID: <84833BC2-0C70-4978-9719-9F2D6C052594@duke.edu> Except that hits() returns a list. while( my $result = $blast_report->next_result ) { while( my $hit= $result->next_hit ) { print $hit->name, " ", $hit->description, "\n"; } } See the HOWTO as well. On Jun 18, 2005, at 6:41 AM, Irshad Khan wrote: > On 6/18/05, Irshad Khan wrote: > >> Hi, >> >> If it is in the description part may be you can try this >> >> $blast_report->next_result->hits()->description; >> >> let me know if it works >> >> Irshad >> >> On 6/18/05, Tuan A. Tran wrote: >> >>> Hi, >>> >>> When I blasted my query sequence against a database, the got the >>> following line (for example) >>> >>> >>>> 3R type=chromosome; loc=3R:1..27905053; ID=3R; release=r4.1; >>>> species=dmel >>>> >>> >>> Using bioperl module, >>> $blast_report = $factory->blastall($query); >>> I can extract some information like ID = 3R using >>> $blast_report->next_result->hits()->name; >>> >>> If I want to keep the entire line as show above what should I >>> do? Is >>> there a module in bioperl? I really appreciate if someone can >>> tell me >>> how to do it. 
>>> >>> Thanks, >>> TAT >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From iak13000 at gmail.com Sat Jun 18 10:15:19 2005 From: iak13000 at gmail.com (Irshad Khan) Date: Sat Jun 18 10:07:48 2005 Subject: [Bioperl-l] parsing blast output In-Reply-To: References: <7851868c05061803392128884a@mail.gmail.com> <7851868c050618034136f5ae31@mail.gmail.com> Message-ID: <7851868c05061807151a27ab22@mail.gmail.com> hi Tuan, To parse blast reports i use to use Bio::SearchIO; so Try jason's loop by using SearchIO putting together you can write like use Bio::SearchIO; while( my $result = $blast_report->next_result ) { while( my $hit= $result->next_hit ) { print $hit->name, " ", $hit->description, "\n"; } } let me know if it works. On 6/18/05, Tuan A. Tran wrote: > Hi Irshad, > > I did try it before and got an error > Can't locate object method "description" via package > "Bio::Search::HSP::GenericHSP" > > I also looked at instructions of the above package. I could not find > method "description". Do you know where I can find an example? > > Tuan > > > On 6/18/05, Irshad Khan wrote: > > On 6/18/05, Irshad Khan wrote: > > > Hi, > > > > > > If it is in the description part may be you can try this > > > > > > $blast_report->next_result->hits()->description; > > > > > > let me know if it works > > > > > > Irshad > > > > > > On 6/18/05, Tuan A. 
Tran wrote:
> > > > Hi,
> > > >
> > > > When I blasted my query sequence against a database, the got the
> > > > following line (for example)
> > > >
> > > > >3R type=chromosome; loc=3R:1..27905053; ID=3R; release=r4.1; species=dmel
> > > >
> > > > Using bioperl module,
> > > > $blast_report = $factory->blastall($query);
> > > > I can extract some information like ID = 3R using
> > > > $blast_report->next_result->hits()->name;
> > > >
> > > > If I want to keep the entire line as show above what should I do? Is
> > > > there a module in bioperl? I really appreciate if someone can tell me
> > > > how to do it.
> > > >
> > > > Thanks,
> > > > TAT
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l@portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> >
>

From tuantran167 at gmail.com  Sun Jun 19 09:49:41 2005
From: tuantran167 at gmail.com (Tuan A. Tran)
Date: Sun Jun 19 09:42:38 2005
Subject: [Bioperl-l] about Bio::DB::GFF
Message-ID: 

Hi,

I read instructions about how to use Bio::DB::GFF. I quoted the last
part of it. At the end of it, there is a comment

"This limitation will be corrected in the next version of Bio::DB::GFF"

I just wonder if there is a new version in which this limitation is
corrected.

Tuan
----------------------------------------------------------------------------------
This module will accept GFF3 files, as described at
http://song.sourceforge.net/gff3.shtml. However, the implementation
has some limitations.

1. GFF version string is required

The GFF file must contain the version comment:

##gff-version 3

Unless this version string is present at the top of the GFF file, the
loader will attempt to parse the file in GFF2 format, with
less-than-desirable results.

2. Only one level of nesting allowed

A major restriction is that Bio::DB::GFF only allows one level of
nesting of features.
For nesting, the Target tag will be used preferentially followed by
the ID tag, followed by the Parent tag. This means that if genes are
represented like this:

XXXX XXXX gene XXXX XXXX XXXX ID=myGene
XXXX XXXX mRNA XXXX XXXX XXXX ID=myTranscript;Parent=myGene
XXXX XXXX exon XXXX XXXX XXXX Parent=myTranscript
XXXX XXXX exon XXXX XXXX XXXX Parent=myTranscript

Then there will be one group called myGene containing the "gene"
feature and one group called myTranscript containing the mRNA, and two
exons.

You can work around this restriction to some extent by using the Alias
attribute literally:

XXXX XXXX gene XXXX XXXX XXXX ID=myGene
XXXX XXXX mRNA XXXX XXXX XXXX ID=myTranscript;Parent=myGene;Alias=myGene
XXXX XXXX exon XXXX XXXX XXXX Parent=myTranscript;Alias=myGene
XXXX XXXX exon XXXX XXXX XXXX Parent=myTranscript;Alias=myGene

This limitation will be corrected in the next version of Bio::DB::GFF.

From lstein at cshl.edu  Sun Jun 19 13:12:07 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun Jun 19 13:05:09 2005
Subject: [Bioperl-l] Drawing sequences in the "other" direction
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D55A@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E950172D55A@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <200506191312.07629.lstein@cshl.edu>

Hi Michael,

Have you tried passing -flip=>1 to Bio::Graphics::Panel->new()?

Lincoln

On Tuesday 24 May 2005 08:16 am, michael watson (IAH-C) wrote:
> Hi
>
> I'm trying to draw images of bits of aligned bacterial genomes with the
> genes marked on as features. Reasonably often a gene in one species is
> on the +1 strand, and in another species it's on the -1 strand. I want
> to draw an image of these genes "aligned", one on top of the other, both
> facing in the same direction (obviously those that I have flipped I will
> annotate as such).
> > I have been drawing images using Bio::Graphics::Panel and the add_track > method, but I can't figure out how to draw the sequence, and all it's > features, running in the opposite direction. In fact, I doubt there is > one unless someone can point it out? > > I did think of drawing them in the right orientation and using the linux > "convert" command to flip the image, but then all the text is backwards! > > Any help appreciated > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From michael.watson at bbsrc.ac.uk Sun Jun 19 14:18:07 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Sun Jun 19 14:10:18 2005 Subject: [Bioperl-l] Drawing sequences in the "other" direction Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020679A8@iahce2knas1.iah.bbsrc.reserved> Hi Lincoln, List Yeah, flip works great on the whole panel, but I have my top track which is a genome in the +1 direction, then I want to add another track which is a second genome, aligned to the first, but it runs in the -1 direction, I want to flip the second track but not the first... Would running through all the features and re-jigging the co-ordinates of all the features work? I was hoping to avoid it... though I guess somewhere buried in the guts of Bio::Graphics::Panel, this code must already be written...? Mick -----Original Message----- From: Lincoln Stein [mailto:lstein@cshl.edu] Sent: Sun 19/06/2005 6:12 PM To: bioperl-l@portal.open-bio.org Cc: michael watson (IAH-C) Subject: Re: [Bioperl-l] Drawing sequences in the "other" direction Hi Michael, Have you tried passing -flip=>1 to Bio::Graphics::Panel->new()? 
Lincoln On Tuesday 24 May 2005 08:16 am, michael watson (IAH-C) wrote: > Hi > > I'm trying to draw images of bits of aligned bacterial genomes with the > genes marked on as features. Reasonably often a gene in one species is > on the +1 strand, and in another species it's on the -1 strand. I want > to draw an image of these genes "aligned", one on top of the other, both > facing in the same direction (obviously those that I have flipped I will > annotate as such). > > I have been drawing images using Bio::Graphics::Panel and the add_track > method, but I can't figure out how to draw the sequence, and all it's > features, running in the opposite direction. In fact, I doubt there is > one unless someone can point it out? > > I did think of drawing them in the right orientation and using the linux > "convert" command to flip the image, but then all the text is backwards! > > Any help appreciated > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From hz5 at njit.edu Sun Jun 19 22:19:51 2005 From: hz5 at njit.edu (hz5@njit.edu) Date: Sun Jun 19 22:12:35 2005 Subject: [Bioperl-l] Drawing sequences in the "other" direction In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95020679A8@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E95020679A8@iahce2knas1.iah.bbsrc.reserved> Message-ID: <1119233991.42b627c7b995b@webmail.njit.edu> Michael, This is what I did to solve this problem: I will have one panel render +1 seq, then use another panel to render the -1 seq in the "flipped" direction, then use copy to join the 2 panel into one picture. 
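
The two-panel approach above needs Bio::Graphics and GD. The other option raised in this thread, re-jigging the feature coordinates so that a -1 genome can be drawn as an ordinary track, can be sketched in plain Perl. This is only an illustration under stated assumptions: flip_features and the hash layout (name, start, end, strand) are hypothetical and not part of Bio::Graphics, and coordinates are taken to be 1-based and inclusive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): mirror features about the
# midpoint of a segment of $seq_length bases. Coordinates are assumed to be
# 1-based and inclusive; each feature is a plain hash reference.
sub flip_features {
    my ($seq_length, @features) = @_;
    my @flipped;
    for my $f (@features) {
        push @flipped, {
            %$f,                                   # keep name and any other keys
            start  => $seq_length - $f->{end}   + 1,
            end    => $seq_length - $f->{start} + 1,
            strand => -1 * $f->{strand},           # strand is reversed as well
        };
    }
    # Return the mirrored features in left-to-right order.
    my @sorted = sort { $a->{start} <=> $b->{start} } @flipped;
    return @sorted;
}

# Made-up example data on a 1000 bp segment.
my @genes = (
    { name => 'geneA', start => 101, end => 400, strand =>  1 },
    { name => 'geneB', start => 701, end => 950, strand => -1 },
);

for my $g (flip_features(1000, @genes)) {
    printf "%s\t%d\t%d\t%+d\n", @{$g}{qw(name start end strand)};
}
```

The mirrored values could then be used to build new feature objects for the second track, for example with Bio::SeqFeature::Generic, while the first track is left untouched; whether that is less work than joining two panels depends on how many features need converting.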
haibo Quoting "michael watson (IAH-C)" : > > > Hi Lincoln, List > > Yeah, flip works great on the whole panel, but I have my top track which > is a genome in the +1 direction, then I want to add another track which > is a second genome, aligned to the first, but it runs in the -1 > direction, I want to flip the second track but not the first... > > Would running through all the features and re-jigging the co-ordinates > of all the features work? I was hoping to avoid it... though I guess > somewhere buried in the guts of Bio::Graphics::Panel, this code must > already be written...? > > Mick > > > > -----Original Message----- > From: Lincoln Stein [mailto:lstein@cshl.edu] > Sent: Sun 19/06/2005 6:12 PM > To: bioperl-l@portal.open-bio.org > Cc: michael watson (IAH-C) > Subject: Re: [Bioperl-l] Drawing sequences in the "other" direction > > Hi Michael, > > Have you tried passing -flip=>1 to Bio::Graphics::Panel->new()? > > Lincoln > > On Tuesday 24 May 2005 08:16 am, michael watson (IAH-C) wrote: > > Hi > > > > I'm trying to draw images of bits of aligned bacterial genomes with > the > > genes marked on as features. Reasonably often a gene in one species > is > > on the +1 strand, and in another species it's on the -1 strand. I > want > > to draw an image of these genes "aligned", one on top of the other, > both > > facing in the same direction (obviously those that I have flipped I > will > > annotate as such). > > > > I have been drawing images using Bio::Graphics::Panel and the > add_track > > method, but I can't figure out how to draw the sequence, and all > it's > > features, running in the opposite direction. In fact, I doubt there > is > > one unless someone can point it out? > > > > I did think of drawing them in the right orientation and using the > linux > > "convert" command to flip the image, but then all the text is > backwards! 
> > > > Any help appreciated > > > > Mick > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ========================================================= Haibo Zhang, PhD Computational Biology http://www.cyberpostdoc.org/ Share postdoc information in cyberspace. Welcome your stories, suggestions and advice! From whs at ebi.ac.uk Mon Jun 20 03:46:30 2005 From: whs at ebi.ac.uk (Will Spooner) Date: Mon Jun 20 03:38:40 2005 Subject: [Bioperl-l] parsing blast output In-Reply-To: <84833BC2-0C70-4978-9719-9F2D6C052594@duke.edu> Message-ID: Hi Jason, I have, in the past, wanted to work with HSP objects out of context of the Hit object. There is currently no way to fetch the description in this case. How easy would it be to propogate the $hit->description to the $hsp->seqdesc attribute during the report parsing? Will On Sat, 18 Jun 2005, Jason Stajich wrote: > Except that hits() returns a list. > while( my $result = $blast_report->next_result ) { > while( my $hit= $result->next_hit ) { > print $hit->name, " ", $hit->description, "\n"; > } > } > > See the HOWTO as well. > > On Jun 18, 2005, at 6:41 AM, Irshad Khan wrote: > > > On 6/18/05, Irshad Khan wrote: > > > >> Hi, > >> > >> If it is in the description part may be you can try this > >> > >> $blast_report->next_result->hits()->description; > >> > >> let me know if it works > >> > >> Irshad > >> > >> On 6/18/05, Tuan A. 
Tran wrote: > >> > >>> Hi, > >>> > >>> When I blasted my query sequence against a database, the got the > >>> following line (for example) > >>> > >>> > >>>> 3R type=chromosome; loc=3R:1..27905053; ID=3R; release=r4.1; > >>>> species=dmel > >>>> > >>> > >>> Using bioperl module, > >>> $blast_report = $factory->blastall($query); > >>> I can extract some information like ID = 3R using > >>> $blast_report->next_result->hits()->name; > >>> > >>> If I want to keep the entire line as show above what should I > >>> do? Is > >>> there a module in bioperl? I really appreciate if someone can > >>> tell me > >>> how to do it. > >>> > >>> Thanks, > >>> TAT > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From bqcao at physics.uc.edu Mon Jun 20 15:53:40 2005 From: bqcao at physics.uc.edu (Baoqiang Cao) Date: Mon Jun 20 15:45:35 2005 Subject: [Bioperl-l] how to get entries from swiss-prot Message-ID: Dear All, I'd like to download all entries in swiss-prot with keywords "phage organelle", any package I can use for this purpose? Thanks. Best, B. 
Cao From jason.stajich at duke.edu Mon Jun 20 19:51:56 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jun 20 19:43:28 2005 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <4627846405060106084c917610@mail.gmail.com> References: <4627846405060106084c917610@mail.gmail.com> Message-ID: <8BA71CE8-2DE3-4DE0-AF3C-44C3A4DA5C98@duke.edu> It really wasn't designed to parse A0 format. I think we've slowly tried to plug the gaps. Okay I've fixed it, give the latest code from CVS a whirl. And I don't see how your code is going to work wrt printing out the hit_string if you don't have any alignments in the file. -jason On Jun 1, 2005, at 9:08 AM, Sean O'Keeffe wrote: > Hi, > I was wondering how Bio::SearchIO::hmmer parses hmmpfam/hmmsearch > result files. I have a set of hmmpam result hits without alignments in > a file which I generated using hmmpfam locally (-A0 option on the > command line). Is this considered a valid result by > Bio::SearchIO::hmmer? If so, what might be wrong with the code below > (which gets as far as the first while loop and doesn't enter any > other): > > use strict; > use Bio::SearchIO; > > my $inhmmfile = "test-hmm.smart"; > my $outputfilename = "HMM-test.hmmer.parsed"; > my $fastafilename = "$outputfilename".".fasta"; > my $inevalue =1; > my $inlength =20; > my ($myresult,$myhit,$myhsp,$mysignificance,$mylength, > $mynohit,$mylasthit,$mylastresult,$mypercent) = 0; > > unless (open(PARSEDFILE, ">$outputfilename")) { > print "Could not open file $outputfilename !\n"; > exit; > } > > unless (open(FASTAFILE, ">$fastafilename")) { > print "Could not open file $fastafilename !\n"; > exit; > } > > my $in = new Bio::SearchIO(-format => 'hmmer', -file => $inhmmfile); > > while(my $result = $in->next_result ) { > $myresult++; > while (my $hit = $result->next_hit ) { > $myhit++; > while (my $hsp = $hit->next_hsp ) { > $myhsp++; > if( $hsp->length('total') >= $inlength ) { > $mylength++; > if ( $hit->significance <= > $inevalue ) { > 
$mysignificance++; > > print PARSEDFILE > $result->query_name,"\t", > $result- > >query_description,"\t", > $result->query_length, "\t", > $hit->description, "\t", > $hit->accession, "\t", > $hit->bits, "\t", > $hit->significance, "\t", > $hsp->num_identical, "\t", > $hsp->num_conserved,"\t", > $hsp->start('query'),"\t", > $hsp->end('query'),"\t", > $hsp->start('hit'),"\t", > $hsp->end('hit'),"\n"; > > print FASTAFILE "> ", > $hit->description,"\n", > $hsp->hit_string,"\n"; > } > } > } > } > > if ($myhit == 0) { > $mynohit++; > } > > $myhit = 0; > } > > $mypercent = $mynohit*100 / $myresult; > > print "\n\n", $myresult, " query sequence(s)\n"; > print "\n", $myhsp, " HSP sequence(s) \n"; > print "\n", $mylength, " hit(s) presenting the minimum requested > length\n"; > print "\n", $mysignificance, " hit(s) presenting the minimum requested > E-value\n"; > print "\n", $mynohit, " query sequence(s) presenting < NO HITS > "; > close (PARSEDFILE); > close (FASTAFILE); > exit; > > > Output is: > Use of uninitialized value in numeric eq (==) at ../hmm-test-parse.pl > line 58, line 40126. > > > 1 query sequence(s) > > Use of uninitialized value in print at ../hmm-test-parse.pl line 68, > line 40126. > HSP sequence(s) > > Use of uninitialized value in print at ../hmm-test-parse.pl line 69, > line 40126. > hit(s) presenting the minimum requested length > > Use of uninitialized value in print at ../hmm-test-parse.pl line 70, > line 40126. > hit(s) presenting the minimum requested E-value > > 1 query sequence(s) presenting < NO HITS > (100.00 %) > > > Thanks very much, > Sean. 
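A side note on the warnings in the quoted output above: in Perl, assigning a single 0 to a `my (...)` list only sets the first variable and leaves the rest undefined. That would be consistent with the "Use of uninitialized value" messages and with only the query count printing a number. A minimal sketch of one way to initialise the counters, reusing the variable names from the quoted script; the guard against a zero result count is an addition for illustration, not part of the original.

```perl
use strict;
use warnings;

# List assignment from a single scalar only fills the first slot:
my ($first, $second) = 0;    # $first is 0, $second is undef

# Initialise every counter explicitly instead.
my ($myresult, $myhit, $myhsp, $mysignificance, $mylength, $mynohit) = (0) x 6;

# Guard the percentage against a report with no results at all.
my $mypercent = $myresult ? $mynohit * 100 / $myresult : 0;
printf "%d query sequence(s), %.2f %% without hits\n", $myresult, $mypercent;
```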
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Mon Jun 20 19:51:58 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jun 20 19:43:42 2005 Subject: [Bioperl-l] parsing blast output In-Reply-To: References: Message-ID: <8F6ACB1D-FD6A-4BE8-B222-91330901AAD1@duke.edu> done. Propigating query name and desc as well Try: $hsp->query->seq_id; $hsp->query->seqdesc; $hsp->hit->seq_id; $hsp->hit->seqdesc; -jason On Jun 20, 2005, at 3:46 AM, Will Spooner wrote: > Hi Jason, > > I have, in the past, wanted to work with HSP objects out of context > of the > Hit object. There is currently no way to fetch the description in this > case. How easy would it be to propogate the $hit->description to the > $hsp->seqdesc attribute during the report parsing? > > Will > > On Sat, 18 Jun 2005, Jason Stajich wrote: > > >> Except that hits() returns a list. >> while( my $result = $blast_report->next_result ) { >> while( my $hit= $result->next_hit ) { >> print $hit->name, " ", $hit->description, "\n"; >> } >> } >> >> See the HOWTO as well. >> >> On Jun 18, 2005, at 6:41 AM, Irshad Khan wrote: >> >> >>> On 6/18/05, Irshad Khan wrote: >>> >>> >>>> Hi, >>>> >>>> If it is in the description part may be you can try this >>>> >>>> $blast_report->next_result->hits()->description; >>>> >>>> let me know if it works >>>> >>>> Irshad >>>> >>>> On 6/18/05, Tuan A. 
Tran wrote: >>>> >>>> >>>>> Hi, >>>>> >>>>> When I blasted my query sequence against a database, the got the >>>>> following line (for example) >>>>> >>>>> >>>>> >>>>>> 3R type=chromosome; loc=3R:1..27905053; ID=3R; release=r4.1; >>>>>> species=dmel >>>>>> >>>>>> >>>>> >>>>> Using bioperl module, >>>>> $blast_report = $factory->blastall($query); >>>>> I can extract some information like ID = 3R using >>>>> $blast_report->next_result->hits()->name; >>>>> >>>>> If I want to keep the entire line as show above what should I >>>>> do? Is >>>>> there a module in bioperl? I really appreciate if someone can >>>>> tell me >>>>> how to do it. >>>>> >>>>> Thanks, >>>>> TAT >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12/ >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From hz5 at njit.edu Mon Jun 20 22:58:30 2005 From: hz5 at njit.edu (hz5@njit.edu) Date: Mon Jun 20 22:50:28 2005 Subject: [Bioperl-l] Drawing sequences in the "other" direction In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D778@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D778@iahce2knas1.iah.bbsrc.reserved> Message-ID: <1119322710.42b782560eb1c@webmail.njit.edu> Hi Michael, I have the corresponding code here, hope it will help: foreach my $p(@allpanel){ my $tim = GD::Image->new($p->width(), $p->height()); $p->gd($tim); $gdImg->copy($tim, 0, $hid, 0, 0, 
$p->width(), $p->height()); $hid += $p->height(); } my $pngData = $gdImg->png(); haibo //cheers Quoting "michael watson (IAH-C)" : > Hi Haibo > > Sorry, is "copy" a bioperl function or a linux/windows one? I tried > this using an external linux command, but it had trouble figuring out > what size the images were > > Mick > > -----Original Message----- > From: hz5@njit.edu [mailto:hz5@njit.edu] > Sent: 20 June 2005 03:20 > To: michael watson (IAH-C) > Cc: bioperl-l@portal.open-bio.org > Subject: RE: [Bioperl-l] Drawing sequences in the "other" direction > > > Michael, > This is what I did to solve this problem: > I will have one panel render +1 seq, then use another panel to render > the -1 > seq in the "flipped" direction, then use copy to join the 2 panel into > one > picture. > > haibo > > Quoting "michael watson (IAH-C)" : > > > > > > > Hi Lincoln, List > > > > Yeah, flip works great on the whole panel, but I have my top track > > which is a genome in the +1 direction, then I want to add another > > track which is a second genome, aligned to the first, but it runs in > > > the -1 direction, I want to flip the second track but not the > first... > > > > Would running through all the features and re-jigging the > co-ordinates > > > of all the features work? I was hoping to avoid it... though I guess > > > somewhere buried in the guts of Bio::Graphics::Panel, this code must > > > already be written...? > > > > Mick > > > > > > > > -----Original Message----- > > From: Lincoln Stein [mailto:lstein@cshl.edu] > > Sent: Sun 19/06/2005 6:12 PM > > To: bioperl-l@portal.open-bio.org > > Cc: michael watson (IAH-C) > > Subject: Re: [Bioperl-l] Drawing sequences in the "other" > direction > > > > Hi Michael, > > > > Have you tried passing -flip=>1 to Bio::Graphics::Panel->new()? 
> > > > Lincoln > > > > On Tuesday 24 May 2005 08:16 am, michael watson (IAH-C) wrote: > > > Hi > > > > > > I'm trying to draw images of bits of aligned bacterial genomes > with > > the > > > genes marked on as features. Reasonably often a gene in one > species > > is > > > on the +1 strand, and in another species it's on the -1 strand. I > > want > > > to draw an image of these genes "aligned", one on top of the > other, > > both > > > facing in the same direction (obviously those that I have flipped > I > > will > > > annotate as such). > > > > > > I have been drawing images using Bio::Graphics::Panel and the > > add_track > > > method, but I can't figure out how to draw the sequence, and all > > it's > > > features, running in the opposite direction. In fact, I doubt > there > > is > > > one unless someone can point it out? > > > > > > I did think of drawing them in the right orientation and using the > > linux > > > "convert" command to flip the image, but then all the text is > > backwards! > > > > > > Any help appreciated > > > > > > Mick > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > ========================================================= > Haibo Zhang, PhD > Computational Biology > http://www.cyberpostdoc.org/ > Share postdoc information in cyberspace. Welcome your stories, > suggestions and > advice! > ========================================================= Haibo Zhang, PhD Computational Biology http://www.cyberpostdoc.org/ Share postdoc information in cyberspace. 
Welcome your stories, suggestions and advice! From heikki at ebi.ac.uk Tue Jun 21 05:02:01 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue Jun 21 05:16:39 2005 Subject: [Bioperl-l] Announce: SeqHound access modules Message-ID: <200506211002.01147.heikki@ebi.ac.uk> I've just committed two files into bioperl-live: Bio::DB::SeqHound t/SeqHound_DB.t They give and test sequence retrieval from the SeqHound database: http://www.blueprint.org/seqhound/ The code is written by Rong Yao, Hao Lieu and Ian Donaldson. Enjoy, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From michael.watson at bbsrc.ac.uk Tue Jun 21 05:56:51 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue Jun 21 05:52:39 2005 Subject: [Bioperl-l] how to get entries from swiss-prot Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020679AF@iahce2knas1.iah.bbsrc.reserved> I just did an SRS all text search of swissprot at the EBI: Query "(([swissprot-AllText:phage*] & [swissprot-AllText:organelle*]) | [swissprot-AllText:phage organelle*]) " found 1 entries Even searching the whole of uniprot only throws up two entries. So either SRS isn't working as expected, or you don't really have much of a problem here.... 
Failing that, you can use the Bio::DB::Query interface to search the protein database at the NCBI - doing a quick search, entrez throws out 9 results for phage organelle :-) -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org on behalf of Baoqiang Cao Sent: Mon 20/06/2005 8:53 PM To: bioperl-l@portal.open-bio.org Cc: Subject: [Bioperl-l] how to get entries from swiss-prot Dear All, I'd like to download all entries in swiss-prot with keywords "phage organelle", any package I can use for this purpose? Thanks. Best, B. Cao _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Tue Jun 21 08:41:25 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Jun 21 08:34:49 2005 Subject: [Bioperl-l] how to get entries from swiss-prot In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95020679AF@iahce2knas1.iah.bbsrc.reserved> Message-ID: B., You mean "phage" or "organelle", yes? Phages, or bacteriophages, do not have organelles so combining these 2 terms with AND is going to retrieve very few entries. As Michael suggested, you could try Bio::DB::Query, it's described in the Beginner's HOWTO: http://bioperl.org/HOWTOs Brian O. On 6/21/05 5:56 AM, "michael watson (IAH-C)" wrote: > I just did an SRS all text search of swissprot at the EBI: > > Query "(([swissprot-AllText:phage*] & [swissprot-AllText:organelle*]) | > [swissprot-AllText:phage organelle*]) " found 1 entries > > Even searching the whole of uniprot only throws up two entries. > > So either SRS isn't working as expected, or you don't really have much of a > problem here.... 
> > Failing that, you can use the Bio::DB::Query interface to search the protein > database at the NCBI - doing a quick search, entrez throws out 9 results for > phage organelle :-) > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org on behalf of Baoqiang Cao > Sent: Mon 20/06/2005 8:53 PM > To: bioperl-l@portal.open-bio.org > Cc: > Subject: [Bioperl-l] how to get entries from swiss-prot > Dear All, > > I'd like to download all entries in swiss-prot with keywords "phage > organelle", any package I can use for this purpose? Thanks. > > Best, > B. Cao > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From michael.watson at bbsrc.ac.uk Tue Jun 21 08:52:15 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue Jun 21 08:44:14 2005 Subject: [Bioperl-l] how to get entries from swiss-prot Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020679C1@iahce2knas1.iah.bbsrc.reserved> Actually, the phrase "phage organelle", when enclosed in speech marks, is almost a googlewhack ;-) http://www.google.co.uk/search?hl=en&q=%22phage+organelle%22&btnG=Google+Search&meta= However, there is one paper that refers to a phage tail as an organelle: http://jb.asm.org/cgi/content/full/185/14/4022 -----Original Message----- From: Brian Osborne [mailto:brian_osborne@cognia.com] Sent: Tue 21/06/2005 1:41 PM To: michael watson (IAH-C); Baoqiang Cao; bioperl-l@portal.open-bio.org Cc: Subject: Re: [Bioperl-l] how to get entries from swiss-prot B., You mean "phage" or "organelle", yes? Phages, or bacteriophages, do not have organelles so combining these 2 terms with AND is going to retrieve very few entries. 
As Michael suggested, you could try Bio::DB::Query, it's described in the Beginner's HOWTO: http://bioperl.org/HOWTOs Brian O. On 6/21/05 5:56 AM, "michael watson (IAH-C)" wrote: > I just did an SRS all text search of swissprot at the EBI: > > Query "(([swissprot-AllText:phage*] & [swissprot-AllText:organelle*]) | > [swissprot-AllText:phage organelle*]) " found 1 entries > > Even searching the whole of uniprot only throws up two entries. > > So either SRS isn't working as expected, or you don't really have much of a > problem here.... > > Failing that, you can use the Bio::DB::Query interface to search the protein > database at the NCBI - doing a quick search, entrez throws out 9 results for > phage organelle :-) > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org on behalf of Baoqiang Cao > Sent: Mon 20/06/2005 8:53 PM > To: bioperl-l@portal.open-bio.org > Cc: > Subject: [Bioperl-l] how to get entries from swiss-prot > Dear All, > > I'd like to download all entries in swiss-prot with keywords "phage > organelle", any package I can use for this purpose? Thanks. > > Best, > B. Cao > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Tue Jun 21 09:07:31 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Jun 21 08:59:20 2005 Subject: [Bioperl-l] how to get entries from swiss-prot In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95020679C1@iahce2knas1.iah.bbsrc.reserved> Message-ID: Michael, Jonathan King! That's funny. 
An unconventional thinker who was way ahead of his time in some ways, he had this notion of a protein folding intermediate many years before it was accepted, and he studied these intermediates as a phage geneticist, not as a crystallographer or structural biologist. Brian O. On 6/21/05 8:52 AM, "michael watson (IAH-C)" wrote: > Actually, the phrase "phage organelle", when enclosed in speech marks, is > almost a googlewhack ;-) > > http://www.google.co.uk/search?hl=en&q=%22phage+organelle%22&btnG=Google+Searc > h&meta= > > However, there is one paper that refers to a phage tail as an organelle: > > http://jb.asm.org/cgi/content/full/185/14/4022 > > > -----Original Message----- > From: Brian Osborne [mailto:brian_osborne@cognia.com] > Sent: Tue 21/06/2005 1:41 PM > To: michael watson (IAH-C); Baoqiang Cao; bioperl-l@portal.open-bio.org > Cc: > Subject: Re: [Bioperl-l] how to get entries from swiss-prot > B., > > You mean "phage" or "organelle", yes? Phages, or bacteriophages, do not have > organelles so combining these 2 terms with AND is going to retrieve very few > entries. > > As Michael suggested, you could try Bio::DB::Query, it's described in the > Beginner's HOWTO: > > http://bioperl.org/HOWTOs > > > Brian O. > > > On 6/21/05 5:56 AM, "michael watson (IAH-C)" > wrote: > >> I just did an SRS all text search of swissprot at the EBI: >> >> Query "(([swissprot-AllText:phage*] & [swissprot-AllText:organelle*]) | >> [swissprot-AllText:phage organelle*]) " found 1 entries >> >> Even searching the whole of uniprot only throws up two entries. >> >> So either SRS isn't working as expected, or you don't really have much of a >> problem here.... 
>> >> Failing that, you can use the Bio::DB::Query interface to search the protein >> database at the NCBI - doing a quick search, entrez throws out 9 results for >> phage organelle :-) >> >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org on behalf of Baoqiang Cao >> Sent: Mon 20/06/2005 8:53 PM >> To: bioperl-l@portal.open-bio.org >> Cc: >> Subject: [Bioperl-l] how to get entries from swiss-prot >> Dear All, >> >> I'd like to download all entries in swiss-prot with keywords "phage >> organelle", any package I can use for this purpose? Thanks. >> >> Best, >> B. Cao >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > From michael.spitzer at uni-muenster.de Tue Jun 21 09:27:40 2005 From: michael.spitzer at uni-muenster.de (Michael Spitzer) Date: Tue Jun 21 09:24:23 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? Message-ID: <42B815CC.3090009@uni-muenster.de> Dear All, For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the taxon ID as given in the corresponding full GenBank record. Which is the easiest way to accomplish this task automatically? Does Bioperl help? Can one access this function via the NCBI website (possibly, using Bioperl)? Or, does one have to download the whole GenBank database? All I could find out is that there is a function 'gi2taxid' in the NCBI toolkit, but I have no experience with using the toolkit, and I hope that there is an easier 'Bioperl' way to solve the problem - could BIO::DB::NCBIHelper be the way to go? Any help or hints are greatly appreciated! 
Kind regards, Michael Spitzer From michael.watson at bbsrc.ac.uk Tue Jun 21 10:27:36 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue Jun 21 11:40:49 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020679D9@iahce2knas1.iah.bbsrc.reserved> Bio::DB::Query::GenBank can be used to query GenBank, and Bio::DB::GenBank can be used to retrieve records. After that it depends where the taxon id is stored - if it is stored in the feature table, as in: /mol_type="mRNA" /cultivar="Nipponbare" /db_xref="taxon:39947" /clone="R2345" /dev_stage="seedling" Then once you have the Bio::Seq object from Bio::DB::GenBank you can iterate through the feature table and look at each tag-value pair (using the "has_tag" and "each_tag_value" methods) to look for something like db_xref="taxon:39947" HTH Mick -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org on behalf of Michael Spitzer Sent: Tue 21/06/2005 2:27 PM To: bioperl-l@bioperl.org Cc: Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? Dear All, For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the taxon ID as given in the corresponding full GenBank record. Which is the easiest way to accomplish this task automatically? Does Bioperl help? Can one access this function via the NCBI website (possibly, using Bioperl)? Or, does one have to download the whole GenBank database? All I could find out is that there is a function 'gi2taxid' in the NCBI toolkit, but I have no experience with using the toolkit, and I hope that there is an easier 'Bioperl' way to solve the problem - could BIO::DB::NCBIHelper be the way to go? Any help or hints are greatly appreciated! 
Kind regards, Michael Spitzer _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Tue Jun 21 12:02:30 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jun 21 11:54:06 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95020679D9@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E95020679D9@iahce2knas1.iah.bbsrc.reserved> Message-ID: <5529868E-3C19-4D30-9720-62A0AEAF8709@duke.edu> There is also a gi2taxonid file that you can download and index locally if you are going to do this a lot. DB_File is useful for this as you can tie a hash to the file and re-use the index. ftp://ftp.ncbi.nih.gov/pub/taxonomy/ AFAIK there is no direct NCBI utility to query with a gi and get the taxonid easily for every record - the download of the sequence record and then parsing is fine but will be slow if you have to do this over many many records. The Bio::DB::Taxonomy modules are useful if you want to walk up and down the taxonomy hierarchy and get sub-sections and/or query for the least common node. I use it in conjunction with the gi2taxid file (indexed) to identify DB search results by taxonomic groups. -jason On Jun 21, 2005, at 10:27 AM, michael watson ((IAH-C)) wrote: > Bio::DB::Query::GenBank can be used to query GenBank, and > Bio::DB::GenBank can be used to retrieve records. 
> > After that it depends where the taxon id is stored - if it is > stored in the feature table, as in: > > /mol_type="mRNA" > /cultivar="Nipponbare" > /db_xref="taxon:39947" > /clone="R2345" > /dev_stage="seedling" > > Then once you have the Bio::Seq object from Bio::DB::GenBank you > can iterate through the feature table and look at each tag-value > pair (using the "has_tag" and "each_tag_value" methods) to look for > something like db_xref="taxon:39947" > > HTH > > Mick > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org on behalf of Michael > Spitzer > Sent: Tue 21/06/2005 2:27 PM > To: bioperl-l@bioperl.org > Cc: > Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? > Dear All, > > For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the > taxon > ID as given in the corresponding full GenBank record. Which is the > easiest way to accomplish this task automatically? Does Bioperl help? > Can one access this function via the NCBI website (possibly, using > Bioperl)? Or, does one have to download the whole GenBank database? > > All I could find out is that there is a function 'gi2taxid' in the > NCBI > toolkit, but I have no experience with using the toolkit, and I hope > that there is an easier 'Bioperl' way to solve the problem - could > BIO::DB::NCBIHelper be the way to go? Any help or hints are greatly > appreciated! 
> > Kind regards, > > Michael Spitzer > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From khufaz83 at yahoo.com Wed Jun 22 02:44:18 2005 From: khufaz83 at yahoo.com (hafiz yusof) Date: Wed Jun 22 02:35:43 2005 Subject: [Bioperl-l] cannot find path to blastall Message-ID: <20050622064418.15683.qmail@web52507.mail.yahoo.com> I'm trying to set up a standalone blast and I'm getting an error message; MSG: cannot find path to blastall and i have running this code under Linux and i also haven't find blastall file in /usr/local/bin/, what should i do? code: #!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; my $Seq_in = Bio::SeqIO->new (-file =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); my $query = $Seq_in->next_seq(); my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => 'ecoli.nt', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n";} Send instant messages to your online friends http://uk.messenger.yahoo.com From rob at salmonella.org Wed Jun 22 02:46:23 2005 From: rob at salmonella.org (Rob Edwards) Date: Wed Jun 22 02:37:46 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? In-Reply-To: <42B815CC.3090009@uni-muenster.de> References: <42B815CC.3090009@uni-muenster.de> Message-ID: <1003840011d840032395bf3ed408185b@salmonella.org> Possibly the easiest way to do this is using the eutils facilities. e.g. 
This url will retrieve the tax id for gi 1234 http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi? dbfrom=nucleotide&db=taxonomy&id=1234 will return: nucleotide 1234 taxonomy nucleotide_taxonomy 9940 Rob On Jun 21, 2005, at 6:27 AM, Michael Spitzer wrote: > Dear All, > > For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the > taxon ID as given in the corresponding full GenBank record. Which is > the easiest way to accomplish this task automatically? Does Bioperl > help? Can one access this function via the NCBI website (possibly, > using Bioperl)? Or, does one have to download the whole GenBank > database? > > All I could find out is that there is a function 'gi2taxid' in the > NCBI toolkit, but I have no experience with using the toolkit, and I > hope that there is an easier 'Bioperl' way to solve the problem - > could BIO::DB::NCBIHelper be the way to go? Any help or hints are > greatly appreciated! > > Kind regards, > > Michael Spitzer > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From l.douchy at gmail.com Wed Jun 22 03:03:08 2005 From: l.douchy at gmail.com (Laurent DOUCHY) Date: Wed Jun 22 02:54:32 2005 Subject: [Bioperl-l] cannot find path to blastall In-Reply-To: <20050622064418.15683.qmail@web52507.mail.yahoo.com> References: <20050622064418.15683.qmail@web52507.mail.yahoo.com> Message-ID: <2fb209dd0506220003c4c4613@mail.gmail.com> try with this on the top of your code. #!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; } ... 2005/6/22, hafiz yusof : > I'm trying to set up a standalone blast and I'm > getting an error message; > > MSG: cannot find path to blastall > > and i have running this code under Linux and i also > haven't find blastall file in /usr/local/bin/, what > should i do? 
> > code: > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > my $Seq_in = Bio::SeqIO->new (-file > =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > my $factory = > Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > > 'database' => 'ecoli.nt', > > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), " > significance: ", > > $hit->significance(), "\n";} > > Send instant messages to your online friends http://uk.messenger.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From khufaz83 at yahoo.com Wed Jun 22 03:10:36 2005 From: khufaz83 at yahoo.com (hafiz yusof) Date: Wed Jun 22 03:02:02 2005 Subject: [Bioperl-l] cannot find path to blastall In-Reply-To: <2fb209dd0506220003c4c4613@mail.gmail.com> Message-ID: <20050622071037.20860.qmail@web52502.mail.yahoo.com> i have done running this code, and i getting this error; -------------------- WARNING --------------------- MSG: cannot find path to blastall --------------------------------------------------- Can't call method "next_result" on an undefined value at blast1.pl line 22, line 1. BEGIN failed--compilation aborted at blast1.pl line 29. 
#!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; my $Seq_in = Bio::SeqIO->new (-file =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); my $query = $Seq_in->next_seq(); my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => 'ecoli.nt', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n";} } --- Laurent DOUCHY wrote: > try with this on the top of your code. > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > BEGIN > { > > $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; > } > > ... > > 2005/6/22, hafiz yusof : > > I'm trying to set up a standalone blast and I'm > > getting an error message; > > > > MSG: cannot find path to blastall > > > > and i have running this code under Linux and i > also > > haven't find blastall file in /usr/local/bin/, > what > > should i do? 
> >
> > code:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Tools::Run::StandAloneBlast;
> >
> > my $Seq_in = Bio::SeqIO->new (-file =>"/home/hafiz/bioperl/fasta", -format => 'fasta');
> > my $query = $Seq_in->next_seq();
> >
> > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
> >     'database' => 'ecoli.nt',
> >     _READMETHOD => "Blast"
> > );
> > my $blast_report = $factory->blastall($query);
> > my $result = $blast_report->next_result;
> >
> > while( my $hit = $result->next_hit()) {
> >     print "\thit name: ", $hit->name(), " significance: ",
> >         $hit->significance(), "\n";}
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >

From taerwin at tpg.com.au Wed Jun 22 03:14:52 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Wed Jun 22 03:10:08 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To: <20050622064418.15683.qmail@web52507.mail.yahoo.com>
References: <20050622064418.15683.qmail@web52507.mail.yahoo.com>
Message-ID: <1119424493.5442.68.camel@bacp4>

Hi Hafiz

You could try setting the executable with the code:

$factory->executable('/some_dir/blastall');

Otherwise you could use a symbolic link in /usr/local/bin to point to your blastall executable, i.e.

$> cd /usr/local/bin
$> ln -s /some_dir/blastall .

Regards,
Tim

On Wed, 2005-06-22 at 07:44 +0100, hafiz yusof wrote:
> I'm trying to set up a standalone blast and I'm
> getting an error message;
>
> MSG: cannot find path to blastall
>
> and i have running this code under Linux and i also
> haven't find blastall file in /usr/local/bin/, what
> should i do?
> > > > code: > > > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > my $Seq_in = Bio::SeqIO->new (-file > =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > > my $factory = > Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > > 'database' => 'ecoli.nt', > > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), " > significance: ", > > $hit->significance(), "\n";} > > > Send instant messages to your online friends http://uk.messenger.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From l.douchy at gmail.com Wed Jun 22 03:22:21 2005 From: l.douchy at gmail.com (Laurent DOUCHY) Date: Wed Jun 22 03:14:01 2005 Subject: [Bioperl-l] cannot find path to blastall In-Reply-To: <20050622071037.20860.qmail@web52502.mail.yahoo.com> References: <2fb209dd0506220003c4c4613@mail.gmail.com> <20050622071037.20860.qmail@web52502.mail.yahoo.com> Message-ID: <2fb209dd05062200223aeae94a@mail.gmail.com> hum ... sory but you must replace the path to blast program. I think that the database path is important to. 
#!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}=":yourPath/Blast/blast-2.2.10/bin/:"; } my $Seq_in = Bio::SeqIO->new (-file =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); my $query = $Seq_in->next_seq(); my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => 'yourPath/ecoli.nt', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n";} 2005/6/22, hafiz yusof : > i have done running this code, and i getting this > error; > > -------------------- WARNING --------------------- > MSG: cannot find path to blastall > --------------------------------------------------- > Can't call method "next_result" on an undefined value > at blast1.pl line 22, line 1. > BEGIN failed--compilation aborted at blast1.pl line > 29. > > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > BEGIN > { > > $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; > > my $Seq_in = Bio::SeqIO->new (-file > =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > my $factory = > Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > > 'database' => 'ecoli.nt', > > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), " > significance: ", > > $hit->significance(), "\n";} > > } > > --- Laurent DOUCHY wrote: > > > try with this on the top of your code. > > > > #!/usr/bin/perl > > > > use strict; > > use Bio::SeqIO; > > use Bio::Tools::Run::StandAloneBlast; > > > > BEGIN > > { > > > > > $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; > > } > > > > ... 
> > > > 2005/6/22, hafiz yusof : > > > I'm trying to set up a standalone blast and I'm > > > getting an error message; > > > > > > MSG: cannot find path to blastall > > > > > > and i have running this code under Linux and i > > also > > > haven't find blastall file in /usr/local/bin/, > > what > > > should i do? > > > > > > code: > > > > > > #!/usr/bin/perl > > > > > > use strict; > > > use Bio::SeqIO; > > > use Bio::Tools::Run::StandAloneBlast; > > > > > > my $Seq_in = Bio::SeqIO->new (-file > > > =>"/home/hafiz/bioperl/fasta", -format => > > 'fasta'); > > > my $query = $Seq_in->next_seq(); > > > > > > my $factory = > > > Bio::Tools::Run::StandAloneBlast->new('program' > > => > > > 'blastp', > > > > > > 'database' => 'ecoli.nt', > > > > > > _READMETHOD => "Blast" > > > > > ); > > > my $blast_report = $factory->blastall($query); > > > my $result = $blast_report->next_result; > > > > > > while( my $hit = $result->next_hit()) { > > > print "\thit name: ", $hit->name(), " > > > significance: ", > > > > > > $hit->significance(), "\n";} > > > > > > Send instant messages to your online friends > > http://uk.messenger.yahoo.com > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > Send instant messages to your online friends http://uk.messenger.yahoo.com > From n.haigh at sheffield.ac.uk Wed Jun 22 03:51:30 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Wed Jun 22 03:42:58 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? In-Reply-To: <42B815CC.3090009@uni-muenster.de> Message-ID: Atlas Web tools also provide a web based service that is capable of doing this: http://bioinformatics.ubc.ca/atlas/webtools/gi2tax.php http://bioinformatics.ubc.ca/atlas/webtools/ They use the gi2taxonid file from ncbi as mentioned by Jason. 
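A local lookup of this kind could be sketched roughly as follows. This is only an illustration: it assumes the GI-to-taxon file is plain text with one "GI<TAB>taxon ID" pair per line (the layout is an assumption here), the sample values and the helper name load_gi2taxid are made up for the example, and an in-memory string stands in for the real download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for a downloaded GI-to-taxon dump:
# one "GI<TAB>taxon ID" pair per line (layout assumed, values illustrative).
my $dump = "1234\t9940\n5678\t39947\n";

# Read the pairs from a filehandle into a hash keyed by GI number.
sub load_gi2taxid {
    my ($fh) = @_;
    my %taxid_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /^(\d+)\t(\d+)$/;   # skip malformed lines
        $taxid_for{$1} = $2;
    }
    return \%taxid_for;
}

# Open the string as an in-memory file so the sketch needs no external data.
open my $fh, '<', \$dump or die "cannot open in-memory dump: $!";
my $taxid_for = load_gi2taxid($fh);
close $fh;

for my $gi ( 1234, 5678, 42 ) {
    my $taxid = exists $taxid_for->{$gi} ? $taxid_for->{$gi} : 'unknown';
    print "gi $gi -> taxon $taxid\n";
}
```

For a full-size dump a plain hash may not fit comfortably in memory; tying the hash to an on-disk store is one way to keep the same lookup code while persisting the mapping between runs.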
I personally create a DB_File of the gi2taxonid file. If you're working with a species for which new sequences are being deposited rapidly, you'll have to download the file on a regular basis to ensure that new GI numbers are mapped to a taxon ID.

Nath

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Michael Spitzer
Sent: 21 June 2005 14:28
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl?

Dear All,

For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the taxon ID as given in the corresponding full GenBank record. Which is the easiest way to accomplish this task automatically? Does Bioperl help? Can one access this function via the NCBI website (possibly, using Bioperl)? Or, does one have to download the whole GenBank database?

All I could find out is that there is a function 'gi2taxid' in the NCBI toolkit, but I have no experience with using the toolkit, and I hope that there is an easier 'Bioperl' way to solve the problem - could BIO::DB::NCBIHelper be the way to go? Any help or hints are greatly appreciated!

Kind regards,

Michael Spitzer
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From ro_phls2 at dh.gov.hk Wed Jun 22 05:11:32 2005
From: ro_phls2 at dh.gov.hk (Andrew Leung)
Date: Wed Jun 22 05:01:53 2005
Subject: [Bioperl-l] Which version of standalone BLAST binary to use?
Message-ID: <20050622090958.FQU7020.pimx07@Leungkcro>

Hello,

My workstation is running with an "Intel 3.2GHz/1MB Xeon (EM64T) Processor" and "Redhat Linux Enterprise WS for AMD64/EM64T". Which binary version of standalone BLAST should I install? The various likely options from NCBI are:

1. linux-ia32
2. linux-x64
3. linux-ia64

Thank you for your advice.
Andrew From jason.stajich at duke.edu Wed Jun 22 09:42:04 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jun 22 09:34:14 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? In-Reply-To: <1003840011d840032395bf3ed408185b@salmonella.org> References: <42B815CC.3090009@uni-muenster.de> <1003840011d840032395bf3ed408185b@salmonella.org> Message-ID: <4DC08AC1-230D-424E-B3C6-7A2A745A020F@duke.edu> cool - i don't remember that being part of the eutils interface before - I'll see about adding it to the Bio::DB::Taxonomy module now. -jason On Jun 22, 2005, at 2:46 AM, Rob Edwards wrote: > Possibly the easiest way to do this is using the eutils facilities. > > e.g. > > This url will retrieve the tax id for gi 1234 > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi? > dbfrom=nucleotide&db=taxonomy&id=1234 > > will return: > > > nucleotide > > 1234 > > > taxonomy > nucleotide_taxonomy > > 9940 > > > > > > Rob > > > On Jun 21, 2005, at 6:27 AM, Michael Spitzer wrote: > > >> Dear All, >> >> For a list of approx. 20 GI numbers (NCBI GenBank IDs) I need the >> taxon ID as given in the corresponding full GenBank record. Which >> is the easiest way to accomplish this task automatically? Does >> Bioperl help? Can one access this function via the NCBI website >> (possibly, using Bioperl)? Or, does one have to download the whole >> GenBank database? >> >> All I could find out is that there is a function 'gi2taxid' in the >> NCBI toolkit, but I have no experience with using the toolkit, and >> I hope that there is an easier 'Bioperl' way to solve the problem >> - could BIO::DB::NCBIHelper be the way to go? Any help or hints >> are greatly appreciated! 
>>
>> Kind regards,
>>
>> Michael Spitzer
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu Wed Jun 22 09:45:31 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jun 22 09:37:48 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To: <2fb209dd05062200223aeae94a@mail.gmail.com>
References: <2fb209dd0506220003c4c4613@mail.gmail.com> <20050622071037.20860.qmail@web52502.mail.yahoo.com> <2fb209dd05062200223aeae94a@mail.gmail.com>
Message-ID: <8B55F5EC-819D-4686-9907-ACD926D501EE@duke.edu>

It looks like you are completely resetting the PATH variable - you probably want to append to it instead.

Alternatively, you can also say

BEGIN { $ENV{'BLASTDIR'} = '/path/to/BLAST/dir'; }
use Bio::Tools::Run::StandAloneBlast;

I think you want the BEGIN block in this instance before the use statement, to ensure the variable is set before the module is included (we set up some of that stuff at include time, since those initializations are also in a BEGIN block).

-jason

On Jun 22, 2005, at 3:22 AM, Laurent DOUCHY wrote:

> hum ... sory but you must replace the path to blast program. I think
> that the database path is important to.
> > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > > BEGIN > { > $ENV{PATH}=":yourPath/Blast/blast-2.2.10/bin/:"; > } > > my $Seq_in = Bio::SeqIO->new (-file > =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > my $factory = > Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > > 'database' => 'yourPath/ecoli.nt', > > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), " > significance: ", > > $hit->significance(), "\n";} > > > > 2005/6/22, hafiz yusof : > >> i have done running this code, and i getting this >> error; >> >> -------------------- WARNING --------------------- >> MSG: cannot find path to blastall >> --------------------------------------------------- >> Can't call method "next_result" on an undefined value >> at blast1.pl line 22, line 1. >> BEGIN failed--compilation aborted at blast1.pl line >> 29. >> >> >> #!/usr/bin/perl >> >> use strict; >> use Bio::SeqIO; >> use Bio::Tools::Run::StandAloneBlast; >> >> BEGIN >> { >> >> $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; >> >> my $Seq_in = Bio::SeqIO->new (-file >> =>"/home/hafiz/bioperl/fasta", -format => 'fasta'); >> my $query = $Seq_in->next_seq(); >> >> my $factory = >> Bio::Tools::Run::StandAloneBlast->new('program' => >> 'blastp', >> >> 'database' => 'ecoli.nt', >> >> _READMETHOD => "Blast" >> ); >> my $blast_report = $factory->blastall($query); >> my $result = $blast_report->next_result; >> >> while( my $hit = $result->next_hit()) { >> print "\thit name: ", $hit->name(), " >> significance: ", >> >> $hit->significance(), "\n";} >> >> } >> >> --- Laurent DOUCHY wrote: >> >> >>> try with this on the top of your code. 
>>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use Bio::SeqIO; >>> use Bio::Tools::Run::StandAloneBlast; >>> >>> BEGIN >>> { >>> >>> >>> >> $ENV{PATH}=":/home/bioinfo/Laurent/Utils/Blast/blast-2.2.10/bin/:"; >> >>> } >>> >>> ... >>> >>> 2005/6/22, hafiz yusof : >>> >>>> I'm trying to set up a standalone blast and I'm >>>> getting an error message; >>>> >>>> MSG: cannot find path to blastall >>>> >>>> and i have running this code under Linux and i >>>> >>> also >>> >>>> haven't find blastall file in /usr/local/bin/, >>>> >>> what >>> >>>> should i do? >>>> >>>> code: >>>> >>>> #!/usr/bin/perl >>>> >>>> use strict; >>>> use Bio::SeqIO; >>>> use Bio::Tools::Run::StandAloneBlast; >>>> >>>> my $Seq_in = Bio::SeqIO->new (-file >>>> =>"/home/hafiz/bioperl/fasta", -format => >>>> >>> 'fasta'); >>> >>>> my $query = $Seq_in->next_seq(); >>>> >>>> my $factory = >>>> Bio::Tools::Run::StandAloneBlast->new('program' >>>> >>> => >>> >>>> 'blastp', >>>> >>>> 'database' => 'ecoli.nt', >>>> >>>> _READMETHOD => "Blast" >>>> >>>> >>> ); >>> >>>> my $blast_report = $factory->blastall($query); >>>> my $result = $blast_report->next_result; >>>> >>>> while( my $hit = $result->next_hit()) { >>>> print "\thit name: ", $hit->name(), " >>>> significance: ", >>>> >>>> $hit->significance(), "\n";} >>>> >>>> Send instant messages to your online friends >>>> >>> http://uk.messenger.yahoo.com >>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> >>>> >>> >>> >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >>>> >>>> >>> >>> >> >> Send instant messages to your online friends http:// >> uk.messenger.yahoo.com >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From amackey at pcbi.upenn.edu Wed Jun 22 09:46:09 2005 From: amackey at 
pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Jun 22 09:38:13 2005 Subject: [Bioperl-l] More unresolved issues with Bio::AnnotatableI Message-ID: Because AnnotatableI has implementations for add_tag and get_tag that invoke Bio::Annotation::OntologyTerm, and therefore Graph::Directed, which relies on Scalar::Util::weaken(), therefore I cannot even use basic Bio::Seq functionality on any perl that doesn't have weak references (oddly, this cropped up in a 5.8.0 install via an RPM that was evidently compiled without support for weak references, so this isn't just an "ancient perl" problem). This is something of a showstopper for any 1.6; in effect, we'd need to disable Annotation::OntologyTerm use for any Perl without weak reference support. We've said it before, and we need to say it again: the changes made to the feature/annotation object model are seriously impeding our ability to move forward to a release (and frighteningly, the GBrowse distribution now includes those parts of 1.5 that it relies on, so a user's BioPerl install could be a hodge-podge of 1.4/1.5 code). This seems important to all GMOD projects, so why hasn't there been any work on it? 
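As a rough illustration of the kind of guard this would take, a script can probe for weak-reference support at run time before touching code that depends on it. The sketch below is an assumption-laden example, not BioPerl code: the helper name perl_has_weak_refs is made up, and it relies only on Scalar::Util::weaken() failing on builds that lack the feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util ();

# Probe for weak-reference support by weakening a throwaway reference.
# On a build without support the call is expected to die (or be undefined),
# and the eval traps that so the probe simply reports 0.
sub perl_has_weak_refs {
    return eval {
        my $target = {};
        my $copy   = $target;
        Scalar::Util::weaken($copy);
        1;
    } ? 1 : 0;
}

my $has_weaken = perl_has_weak_refs();
print $has_weaken
    ? "weak references available\n"
    : "no weak references: skip code paths that need them\n";
```

A module could consult such a flag once at load time and fall back to a simpler code path when it is false.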
Thanks,

-Aaron

From khoueiry at ibdm.univ-mrs.fr Wed Jun 22 06:07:48 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed Jun 22 09:39:50 2005
Subject: [Bioperl-l] Pattern search with gap
Message-ID: <1119434868.14595.14.camel@DavidLinux>

Skipped content of type multipart/alternative
-------------- next part --------------
while($i<=length($seqstring) - $winSize){

    my $nucCount = 0;
    my $substring = substr($seqstring, $i, $winSize);

    #If the substring doesn't contain nuc or begins with a gap
    if($substring !~ /[AGCT]/ or $substring =~ /^-/){
        $i++;
        next;
    }

    #if the substring doesn't contain gap
    if($substring !~ /-/){
        #print $substring."\n";
        if($substring eq $qseq){
            print "$qseq found on $i..".($i+length($qseq))."\n";
            last;
        }
        $i++;
    }

    if($substring =~ /-/){
        $nucCount++ while $substring =~ /[AGCT]/g;
        my $first = $i;
        my $j = 1;
        while($nucCount < length($qseq)){
            $substring = substr($seqstring, $i, $winSize+$j);
            $j++;
            $nucCount = 0;
            $nucCount++ while $substring =~ /[AGCT]/g;
        }
        my $last = ($i + $winSize+$j) - 1;

        #numeric comparison (==); a single = would assign and always be true
        if($nucCount == length($qseq)){
            my $gapCount = $substring =~ s/-//g;
            if($substring eq $qseq){
                print "$qseq found on $i..";
                print "$last\t"."with $gapCount Gap(s)\n";
                last;
            }
        }

        $i++;

        print $substring."\t";
        print $first."..".$last."\t".$nucCount."\n";
        $nucCount = 0;
    }
}

From walsh at cenix-bioscience.com Wed Jun 22 10:04:20 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Wed Jun 22 09:55:53 2005
Subject: [Bioperl-l] Pattern search with gap
In-Reply-To: <1119434868.14595.14.camel@DavidLinux>
References: <1119434868.14595.14.camel@DavidLinux>
Message-ID: <42B96FE4.3040400@cenix-bioscience.com>

Hello,

You could try substituting N's for your gaps and then use WU-BLAST with a wordsize equal to the length of your input sequence (setting penalties for indels very high to ensure perfect matches). I imagine that would be faster than scanning the sequence as you are doing.
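The gap-to-N substitution on its own might look like the sketch below. It covers only the masking step (no BLAST invocation or options are shown, since those depend on the local setup), and the FASTA header name is a placeholder. Because tr/// swaps characters one for one, positions in the masked string still correspond to positions in the gapped original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Gapped sequence taken from the original question.
my $seqstring =
    "--------------------CAAAATAAATAGGTTATACAGAAACA---------------------AGATAAAAATTACA";

# Copy the sequence and replace every gap character with N.
# tr/// is a one-for-one substitution, so the length (and therefore
# every coordinate) is unchanged.
( my $masked = $seqstring ) =~ tr/-/N/;

# Emit a minimal FASTA record that a search tool could read;
# "gapped_as_N" is a placeholder identifier.
print ">gapped_as_N\n$masked\n";
```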
Hope that helps, Andrew khoueiry wrote: > Hello, > > I want to parse a gapped sequence and search for a pattern in it... What > is important for me is to get the Position of the pattern Start and End > taking gaps into account: > > i.e : > my $seqstring = > "--------------------CAAAATAAATAGGTTATACAGAAACA---------------------AGATAAAAATTACA"; > my $qseq = "CAAGATA"; > > so the result should give me : start = 61 and End = 89 > > I wrote a program to do that.. It works well but when working with very > large sequences (And I have a lot of them), it take a lot of time.... > > In fact, my program parse the sequence with a sliding window equal the > length of the pattern... > > the while loop is attached : > > Any suggestion will be appreciated.... > > > Pierre > > > > > > ------------------------------------------------------------------------ > > while($i<=length($seqstring) - $winSize){ > > my $nucCount = 0; > my $substring = substr($seqstring, $i, $winSize); > > #If the substring doesn't contain nuc or begins with a gap > if($substring !~ /[AGCT]/ or $substring =~ /^-/){ > $i++; > next; > } > > #if the substring doesn't contain gap > if($substring !~ /-/){ > #print $substring."\n"; > if($substring eq $qseq){ > print "$qseq found on $i..".($i+length($qseq))."\n"; > last; > } > $i++; > } > > if($substring =~ /-/){ > $nucCount++ while $substring =~ /[AGCT]/g; > my $first = $i; > my $j = 1; > while($nucCount < length($qseq)){ > $substring = substr($seqstring, $i, $winSize+$j); > $j++; > $nucCount = 0; > $nucCount++ while $substring =~ /[AGCT]/g; > } > my $last = ($i + $winSize+$j) - 1; > > if($nucCount = length($qseq)){ > my $gapCount = $substring =~ s/-//g; > if($substring eq $qseq){ > print "$qseq found on $i.."; > print "$last\t"."with $gapCount Gap(s)\n"; > last; > } > } > > $i++; > > print $substring."\t"; > print $first."..".$last."\t".$nucCount."\n"; > $nucCount = 0; > } > } > > > ------------------------------------------------------------------------ > > 
_______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax +49-351-4173 109

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From hotafin at gmail.com Wed Jun 22 10:13:31 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Jun 22 10:05:00 2005
Subject: [Bioperl-l] Pattern search with gap
In-Reply-To: <1119434868.14595.14.camel@DavidLinux>
References: <1119434868.14595.14.camel@DavidLinux>
Message-ID:

Here's a much simpler code:

#!/usr/bin/perl

#           10        20        30        40   45   50        60        7072
#   123456789012345678901234567890123456789012345678901234567890123456789012345678901
my $seqstring =
   "--------------------CAAAATAAATAGGTTATACAGAAACA---------------------AGATAAAAATTACA";
my $qseq = "CAAGATA";

my @qqq = split (//,$qseq);
my $pat = join('-*',@qqq);

my $pat_rege = qr/$pat/;

$seqstring =~ /$pat_rege/;
my $before = $`;
my $match_seq = $&;
my $before_length = length $before;
my $mseq_length = length $match_seq;
my $start = 1 + $before_length;
my $end = $before_length + $mseq_length;
print "Start:$start End:$end\n";

#Start:45 End:72

It should be quite fast. Try it out, and let me know if it works well for you!

Hota

On 6/22/05, khoueiry wrote:
> Hello,
>
> I want to parse a gapped sequence and search for a pattern in it... What
> is important for me is to get the Position of the pattern Start and End
> taking gaps into account:
>
> i.e :
> my $seqstring =
> "--------------------CAAAATAAATAGGTTATACAGAAACA---------------------AGATAAAAATTACA";
> my $qseq = "CAAGATA";
>
> so the result should give me : start = 61 and End = 89
>
> I wrote a program to do that..
It works well but when working with very > large sequences (And I have a lot of them), it take a lot of time.... > > In fact, my program parse the sequence with a sliding window equal the > length of the pattern... > > the while loop is attached : > > Any suggestion will be appreciated.... > > > Pierre > > > > -- > ========================== > Pierre Khoueiry > LGPD/IBDM > Campus de Luminy, Case 907 > 13288 Marseille cedex 9, France > Tel : +33 (0)4 91 82 94 18 > Fax : +33 (0)4 91 82 06 82 > > ========================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From hotafin at gmail.com Wed Jun 22 10:26:52 2005 From: hotafin at gmail.com (Tamas Horvath) Date: Wed Jun 22 10:18:48 2005 Subject: [Bioperl-l] Pattern search with gap In-Reply-To: References: <1119434868.14595.14.camel@DavidLinux> Message-ID: And a slightly different code for multiple recognition: my $seqstring ="--------------------CAAAATAAATAGGTTATACAGAAACA---------------------AGATAAAAATTACA--CAAG-AT-A----"; my $qseq = "CAAGATA"; my @qqq = split (//,$qseq); my $pat = join('-*',@qqq); my $pat_rege = qr/$pat/; while ($seqstring =~ /$pat_rege/g) { my $before = $`; my $match_seq = $&; my $before_length = length $before; my $mseq_length = length $match_seq; my $start = 1 + $before_length; my $end = $before_length + $mseq_length; print "Start:$start End:$end\n"; } #Start:45 End:72 #Start:84 End:92 From patrick at bennour.de Wed Jun 22 11:43:29 2005 From: patrick at bennour.de (Patrick Bennour) Date: Wed Jun 22 11:34:58 2005 Subject: [Bioperl-l] Looking for an Application to visualize Promoter Prediction results Message-ID: <002301c57741$23c1f320$2101a8c0@windowsxp> Dear All, I am looking for an application that does at least some of the following. 
Input: results of different promoter prediction analysis programs (like CpgProD, Eponine, FirstEF, McPromoter)

The application should then
- automatically parse the results
- visualize the results in a graphical diagram that contains the input sequence
- visualize the different predictions in a comparative diagram
- combine some predictions to improve prediction quality

Thanks for your suggestions

From khoueiry at ibdm.univ-mrs.fr Wed Jun 22 12:02:32 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed Jun 22 11:52:53 2005
Subject: [Bioperl-l] Pattern search with gap
In-Reply-To:
References: <1119434868.14595.14.camel@DavidLinux>
Message-ID: <1119456152.14595.24.camel@DavidLinux>

Thanks,

In fact, after posting my mail, I tried another loop. It detects the position (via pos()) in the sequence without the gaps, and then I loop over the gapped sequence, counting nucleotides and gaps until I reach the pos() already detected; the new start is then the number of nucleotides and gaps counted.

Tamas, I tried your loop and it is a lot faster than my new one. Thanks, I will go for it.

Pierre

Le mercredi 22 juin 2005 à
16:26 +0200, Tamas Horvath a écrit :
> And a slightly different code for multiple recognition:
>
> my $seqstring ="--------------------CAAAATAAATAGGTTATACAGAAACA---------------------AGATAAAAATTACA--CAAG-AT-A----";
> my $qseq = "CAAGATA";
>
> my @qqq = split (//,$qseq);
> my $pat = join('-*',@qqq);
>
> my $pat_rege = qr/$pat/;
>
> while ($seqstring =~ /$pat_rege/g) {
>     my $before = $`;
>     my $match_seq = $&;
>     my $before_length = length $before;
>     my $mseq_length = length $match_seq;
>     my $start = 1 + $before_length;
>     my $end = $before_length + $mseq_length;
>     print "Start:$start End:$end\n";
> }
> #Start:45 End:72
> #Start:84 End:92

From hotafin at gmail.com Wed Jun 22 12:30:20 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Jun 22 12:24:27 2005
Subject: [Bioperl-l] mystery
Message-ID:

while ($pdb_data =~ /(REMARK 999.*)/g) {
    die "die2";
}
die"die1" if $pdb_data =~ /(REMARK 999.*)/;

The code above terminates with "die1". Does anyone know why?
$pdb_data is a string holding a PDB entry (1PHK).

It has the following REMARK 999 lines:

REMARK 999
REMARK 999 SEQUENCE
REMARK 999 1PHK SWS P00518 1 - 14 NOT IN ATOMS LIST
REMARK 999 1PHK SWS P00518 292 - 386 NOT IN ATOMS LIST

From laurichj at bioinfo.ucr.edu Wed Jun 22 13:02:10 2005
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Wed Jun 22 12:53:32 2005
Subject: [Bioperl-l] mystery
In-Reply-To:
References:
Message-ID: <20050622170210.GA1203@bioinfo.ucr.edu>

On Wed 06/22/05 18:30, Tamas Horvath wrote:
> while ($pdb_data =~ /(REMARK 999.*)/g) {
> die "die2";
> }
> die"die1" if $pdb_data =~ /(REMARK 999.*)/;
>
> the following code terminates with "die1". Does anyone know why?
> the $pdb_data is a string of a pdb entry (1PHK)

Odd, it seems to work here (as expected, "die2"). Check the odd perl variables that change the behavior of things (man perlvar).

--
------------------------------------------------------
| Josh Lauricha | Ford, you're turning |
| laurichj@bioinfo.ucr.edu | into a penguin.
Stop | | Bioinformatics, UCR | it | |----------------------------------------------------| | OpenPG: | | 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 | |----------------------------------------------------| | Geek Code: Version 3.12 | | GAT/CS$/IT$ d+ s-: a-->--- C++++$ UL++++$ P++ L++++| | $E--- W+ N o? K? w--(---) O? M+(++) V? PS++ PE-(--)| | Y+ PGP+++ t--- 5+++ X+ R tv DI++ D--- G++ | | e++ h- r++ z? | |----------------------------------------------------|

From hotafin at gmail.com Wed Jun 22 13:03:24 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Jun 22 12:54:46 2005
Subject: [Bioperl-l] mystery
In-Reply-To: <42B998BB.5040601@virchow.uni-wuerzburg.de>
References: <42B998BB.5040601@virchow.uni-wuerzburg.de>
Message-ID:

Well, I have some other loops in my script, which all work fine. Maybe this is some sort of mysterious bug in the IDE's debugger... Thanks anyway.

PS: Afterwards I tested it "outside" the IDE, and it worked fine for me too...

On 6/22/05, Andreas Boehm wrote:
> Hi,
>
> it works fine with my perl version:
> perl5 (revision 5.0 version 8 subversion 0)
>
> Maybe you have a buggy compilation, that has a problem with the /g in
> your while-loop?
>
> regards,
> Andreas Boehm
>
> Tamas Horvath wrote:
> > while ($pdb_data =~ /(REMARK 999.*)/g) {
> >     die "die2";
> > }
> > die"die1" if $pdb_data =~ /(REMARK 999.*)/;
> >
> > the following code terminates with "die1". Does anyone know why?
> > the $pdb_data is a string of a pdb entry (1PHK)
> >
> > it has the following REMARK 999 lines:
> >
> > REMARK 999
> > REMARK 999 SEQUENCE
> > REMARK 999 1PHK SWS P00518 1 - 14 NOT IN ATOMS LIST
> > REMARK 999 1PHK SWS P00518 292 - 386 NOT IN ATOMS LIST

From raoul.bonnal at itb.cnr.it Wed Jun 22 15:29:31 2005
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Wed Jun 22 15:21:15 2005
Subject: [Bioperl-l] Read alignment produced by blast's option -m (4|6)
Message-ID:

Hi,
do you know a way to load the output produced by blastn using the option -m 4 or 6, so that I can then handle it as a multiple alignment, for example with Bio::Align?

RJP

From rvosa at sfu.ca Wed Jun 22 15:37:14 2005
From: rvosa at sfu.ca (Rutger Vos)
Date: Wed Jun 22 15:28:45 2005
Subject: [Bioperl-l] CORBA::ORBit on Win32 ActivePerl v5.8.4
Message-ID: <42B9BDEA.7010301@sfu.ca>

Dear fellow BioPerlers,

has anyone been able to get CORBA::ORBit to install on native Win32 (ActivePerl)? How did you do it (compiler, flags, libs)?

Thanks!

Rutger

--
++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
++++++++++++++++++++++++++++++++++++++++++++

From bioperlanand at yahoo.com Wed Jun 22 16:17:16 2005
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Wed Jun 22 16:10:03 2005
Subject: [Bioperl-l] How to convert GFF to GAME XML
Message-ID: <20050622201716.73996.qmail@web32905.mail.mud.yahoo.com>

Hi,

Does anybody know if there is a BioPerl way of converting a GFF file to "GAME XML" format? If the answer is no, can anybody suggest some tools?

Thanks in advance.
Anand __________________________________ Discover Yahoo! Use Yahoo! to plan a weekend, have fun online and more. Check it out! http://discover.yahoo.com/ From lstein at cshl.edu Wed Jun 22 17:17:50 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jun 22 17:10:03 2005 Subject: [Bioperl-l] Drawing sequences in the "other" direction In-Reply-To: <1119233991.42b627c7b995b@webmail.njit.edu> References: <8975119BCD0AC5419D61A9CF1A923E95020679A8@iahce2knas1.iah.bbsrc.reserved> <1119233991.42b627c7b995b@webmail.njit.edu> Message-ID: <200506221717.50991.lstein@cshl.edu> That's the exact solution I would have proposed. Lincoln On Sunday 19 June 2005 10:19 pm, hz5@njit.edu wrote: > Michael, > This is what I did to solve this problem: > I will have one panel render +1 seq, then use another panel to render the > -1 seq in the "flipped" direction, then use copy to join the 2 panel into > one picture. > > haibo > > Quoting "michael watson (IAH-C)" : > > Hi Lincoln, List > > > > Yeah, flip works great on the whole panel, but I have my top track which > > is a genome in the +1 direction, then I want to add another track which > > is a second genome, aligned to the first, but it runs in the -1 > > direction, I want to flip the second track but not the first... > > > > Would running through all the features and re-jigging the co-ordinates > > of all the features work? I was hoping to avoid it... though I guess > > somewhere buried in the guts of Bio::Graphics::Panel, this code must > > already be written...? > > > > Mick > > > > > > > > -----Original Message----- > > From: Lincoln Stein [mailto:lstein@cshl.edu] > > Sent: Sun 19/06/2005 6:12 PM > > To: bioperl-l@portal.open-bio.org > > Cc: michael watson (IAH-C) > > Subject: Re: [Bioperl-l] Drawing sequences in the "other" direction > > > > Hi Michael, > > > > Have you tried passing -flip=>1 to Bio::Graphics::Panel->new()? 
> > > > Lincoln > > > > On Tuesday 24 May 2005 08:16 am, michael watson (IAH-C) wrote: > > > Hi > > > > > > I'm trying to draw images of bits of aligned bacterial genomes with > > > > the > > > > > genes marked on as features. Reasonably often a gene in one species > > > > is > > > > > on the +1 strand, and in another species it's on the -1 strand. I > > > > want > > > > > to draw an image of these genes "aligned", one on top of the other, > > > > both > > > > > facing in the same direction (obviously those that I have flipped I > > > > will > > > > > annotate as such). > > > > > > I have been drawing images using Bio::Graphics::Panel and the > > > > add_track > > > > > method, but I can't figure out how to draw the sequence, and all > > > > it's > > > > > features, running in the opposite direction. In fact, I doubt there > > > > is > > > > > one unless someone can point it out? > > > > > > I did think of drawing them in the right orientation and using the > > > > linux > > > > > "convert" command to flip the image, but then all the text is > > > > backwards! > > > > > Any help appreciated > > > > > > Mick > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > ========================================================= > Haibo Zhang, PhD > Computational Biology > http://www.cyberpostdoc.org/ > Share postdoc information in cyberspace. Welcome your stories, suggestions > and advice! 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Wed Jun 22 17:21:14 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jun 22 17:13:07 2005 Subject: [Bioperl-l] scales in xyplot.pm negative values Bio::Graphics for GBrowse In-Reply-To: <1118914253.8228.23.camel@localhost.localdomain> References: <1118517484.8515.12.camel@localhost.localdomain> <200506141400.37019.lstein@cshl.edu> <1118914253.8228.23.camel@localhost.localdomain> Message-ID: <200506221721.15784.lstein@cshl.edu> I like the second choice better! Lincoln On Thursday 16 June 2005 05:30 am, Albert Vilella wrote: > Hi all, > > I post this here for suggestions: > > How would we like to set the scale or xyplots with respect to > min_value and max_value in cases where negative values exist. For > example: > > Ex1: min_value=-1 max_value=5 > > Set: 0 to the middle of the plot, then plot from +5 to -5: > > +5 + > > > > 0 +----------------------------------------- > > > > -5 + > > Set: the scale to plot from +5 to -1, which will fill up all the > plot (as in UCSC wiggle tracks): > > +5 + > > > > > > 0 +----------------------------------------- > > -1 + > > Comments? > > Bests, > > Albert. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Wed Jun 22 17:25:22 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jun 22 17:17:02 2005 Subject: [Bioperl-l] Problems with Bio/Graphics/Feature.pm In-Reply-To: <0DCE537D-CFDA-43B6-A8D8-687BAC352794@duke.edu> References: <8975119BCD0AC5419D61A9CF1A923E950121BB7B@iahce2knas1.iah.bbsrc.reserved> <0DCE537D-CFDA-43B6-A8D8-687BAC352794@duke.edu> Message-ID: <200506221725.23138.lstein@cshl.edu> Thanks to Jason for fixing that. I'm afraid that I've been partly responsible for the drift.
Lincoln On Monday 09 May 2005 03:54 pm, Jason Stajich wrote: > yes I know - that is why I fixed that script in CVS just now. 'you' > was meant generally - fault of the script and API drifting apart so > not your (Mick) fault at all. > > -j > > On May 9, 2005, at 3:11 PM, michael watson ((IAH-C)) wrote: > > Hi > > > > I didn't deliberately pass a RichSeq object - a call in the > > render_sequence.pl script did i.e. render_sequence.pl doesn't work > > "out of the box". > > > > I think it's this piece of code that breaks it: > > > > $panel->add_track(arrow => $seq, > > -bump => 0, > > -double=>1, > > -tick => 2); > > > > Mick > > > > :-) > > > > -----Original Message----- > > From: Jason Stajich [mailto:jason.stajich@duke.edu] > > Sent: Mon 09/05/2005 7:27 PM > > To: michael watson (IAH-C) > > Cc: bioperl-l@portal.open-bio.org > > Subject: Re: [Bioperl-l] Problems with Bio/Graphics/Feature.pm > > > > you need to pass in a SeqFeature::Generic or Graphics::Feature obj > > instead of Sequence object. > > > > I updated the code in CVS: > > > > http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > > examples/biographics/render_sequence.pl?rev=1.2&cvsroot=bioperl > > > > [fyi Bio::DB Bio::Graphics developers] > > There was something else weird about how this was working - somehow > > Bio::Location objects are getting passed to the description and label > > functions. I don't quite understand, might be my local code playing > > too. > > > > -jason > > > > On May 9, 2005, at 6:29 AM, michael watson ((IAH-C)) wrote: > >> Hi > >> > >> I'm hacking around with the render_sequence.pl example script and > >> keep > >> getting errors: > >> > >> Can't locate object method "seq_id" via package > >> "Bio::Seq::RichSeq" at > >> /usr/local/bioperl-1.5.0/Bio/Graphics/Feature.pm line 269, > >> line > >> 191. 
> >>
> >> I also get a similar message about not being able to locate object
> >> method "start", which is called on the next line of
> >> Bio::Graphics::Feature.pm
> >>
> >> I vaguely recall asking about this previously - was a solution ever
> >> presented?
> >>
> >> Many thanks in advance
> >>
> >> Mick

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

From sm_middha at yahoo.com Wed Jun 22 18:24:23 2005
From: sm_middha at yahoo.com (sumit middha)
Date: Wed Jun 22 18:16:28 2005
Subject: [Bioperl-l] FASTA.pm issue
Message-ID: <20050622222423.9075.qmail@web30712.mail.mud.yahoo.com>

Hello,

I have trouble using the fasta module.

I use the required statements:

use Bio::DB::Fasta;
use Bio::Seq;

The error was:

AnyDBM_File doesn't define an EXISTS method at /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm line 577

thanks,
sm

From taerwin at tpg.com.au Wed Jun 22 19:47:53 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Wed Jun 22 19:43:25 2005
Subject: [Bioperl-l] Which version of standalone BLAST binary to use?
In-Reply-To: <20050622090958.FQU7020.pimx07@Leungkcro>
References: <20050622090958.FQU7020.pimx07@Leungkcro>
Message-ID: <1119484073.5442.73.camel@bacp4>

Hi Andrew,

The linux-ia32 build should work. The linux-ia64 build is for the Intel Itanium, and the x64 build is for x86_64 (AMD; I am not sure how this would go on the Xeon). The Xeon (EM64T) should be able to run both 64- and 32-bit binaries, so you shouldn't have any problems with the ia32 build.
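
As a rough illustration of the advice above, the choice of build could be sketched in Perl along these lines. The helper name `blast_build_for` and the mapping from `uname -m` strings to the NCBI archive names are assumptions drawn from this thread, not an official NCBI table.

```perl
use strict;
use warnings;
use POSIX qw(uname);

# Hypothetical helper: map a machine string (as reported by `uname -m`)
# to one of the build names discussed in this thread. The mapping is an
# assumption based on the advice above, not an official NCBI table.
sub blast_build_for {
    my ($machine) = @_;
    return 'linux-ia64' if $machine eq 'ia64';        # Intel Itanium
    return 'linux-x64'  if $machine eq 'x86_64';      # AMD64 / EM64T
    return 'linux-ia32' if $machine =~ /^i[3-6]86$/;  # 32-bit x86
    return;                                           # unknown: decide by hand
}

# Field 4 of POSIX::uname() is the machine hardware name.
my $machine = (uname())[4];
my $build   = blast_build_for($machine);
print defined $build
    ? "machine=$machine, suggested build: $build\n"
    : "machine=$machine, no suggestion\n";
```

On a machine that reports x86_64 this suggests linux-x64; as noted above, the ia32 build should also run there, so it remains a reasonable fallback.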
Regards, Tim On Wed, 2005-06-22 at 17:11 +0800, Andrew Leung wrote: > Hello, > My workstation is running with an "Intel 3.2GHz/1MB Xeon (EM64T) Processor" > and "Redhat Linux Enterprise WS for AMD64/EM64T". Which binary version of > standalone BLAST should I install? > The various likely options from NCBI are: > 1. linux-ia32 > 2. linux-x64 > 3. linux-ia64 > Thank you for your advice. > Andrew > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From allenday at ucla.edu Wed Jun 22 20:43:44 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Jun 22 20:35:12 2005 Subject: [Bioperl-l] Re: More unresolved issues with Bio::AnnotatableI In-Reply-To: References: Message-ID: Hi, Where is the dependency on Graph::Directed introduced? A grep through Bio/* in bioperl-live on HEAD reveals several references in the POD to Graph.pm, but I don't see it anywhere in the code. I don't know if Chris Mungall's GO-Perl API removes the dependency on Graph::Directed, but it is certainly easier to use than Bio::OntologyIO as a means to access the OBO ontologies. I'm willing to look into converting Bio::Annotation::OntologyTerm to use GO::Model::* instead of Bio::Ontology::Term, but it may interfere with other projects using the class (e.g. bioperl-db). Hilmar, I know you were looking at the GO-Perl codebase recently, can you comment on any of the above? -Allen On Wed, 22 Jun 2005, Aaron J. Mackey wrote: > > Because AnnotatableI has implementations for add_tag and get_tag that > invoke Bio::Annotation::OntologyTerm, and therefore Graph::Directed, > which relies on Scalar::Util::weaken(), therefore I cannot even use > basic Bio::Seq functionality on any perl that doesn't have weak > references (oddly, this cropped up in a 5.8.0 install via an RPM that > was evidently compiled without support for weak references, so this > isn't just an "ancient perl" problem). 
> > This is something of a showstopper for any 1.6; in effect, we'd need > to disable Annotation::OntologyTerm use for any Perl without weak > reference support. > > We've said it before, and we need to say it again: the changes made > to the feature/annotation object model are seriously impeding our > ability to move forward to a release (and frighteningly, the GBrowse > distribution now includes those parts of 1.5 that it relies on, so a > user's BioPerl install could be a hodge-podge of 1.4/1.5 code). This > seems important to all GMOD projects, so why hasn't there been any > work on it? > > Thanks, > > -Aaron > From amackey at pcbi.upenn.edu Wed Jun 22 20:45:08 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Jun 22 20:40:37 2005 Subject: [Bioperl-l] Re: More unresolved issues with Bio::AnnotatableI In-Reply-To: References: Message-ID: <42BA0614.90102@pcbi.upenn.edu> I think Graph::Directed comes with Graph (since Graph is just a wrapper for Graph::Directed and Graph::Undirected) -Aaron Allen Day wrote: > Hi, > > Where is the dependency on Graph::Directed introduced? A grep through > Bio/* in bioperl-live on HEAD reveals several references in the POD to > Graph.pm, but I don't see it anywhere in the code. > > I don't know if Chris Mungall's GO-Perl API removes the dependency on > Graph::Directed, but it is certainly easier to use than Bio::OntologyIO as > a means to access the OBO ontologies. I'm willing to look into converting > Bio::Annotation::OntologyTerm to use GO::Model::* instead of > Bio::Ontology::Term, but it may interfere with other projects using the > class (e.g. bioperl-db). > > Hilmar, I know you were looking at the GO-Perl codebase recently, can you > comment on any of the above? > > -Allen > > > On Wed, 22 Jun 2005, Aaron J. 
Mackey wrote: > > >>Because AnnotatableI has implementations for add_tag and get_tag that >>invoke Bio::Annotation::OntologyTerm, and therefore Graph::Directed, >>which relies on Scalar::Util::weaken(), therefore I cannot even use >>basic Bio::Seq functionality on any perl that doesn't have weak >>references (oddly, this cropped up in a 5.8.0 install via an RPM that >>was evidently compiled without support for weak references, so this >>isn't just an "ancient perl" problem). >> >>This is something of a showstopper for any 1.6; in effect, we'd need >>to disable Annotation::OntologyTerm use for any Perl without weak >>reference support. >> >>We've said it before, and we need to say it again: the changes made >>to the feature/annotation object model are seriously impeding our >>ability to move forward to a release (and frighteningly, the GBrowse >>distribution now includes those parts of 1.5 that it relies on, so a >>user's BioPerl install could be a hodge-podge of 1.4/1.5 code). This >>seems important to all GMOD projects, so why hasn't there been any >>work on it? >> >>Thanks, >> >>-Aaron >> From hlapp at gmx.net Thu Jun 23 01:41:20 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jun 23 01:33:07 2005 Subject: [Bioperl-l] Re: More unresolved issues with Bio::AnnotatableI In-Reply-To: References: Message-ID: <1a67060b1ade16d12f992949ed0ca5ae@gmx.net> Graph::Directed is part of the Graph.pm package. In fact, the graph used by SimpleGOEngine is an instance of Graph::Directed. If accessing $seq->annotation would now need Graph::Directed installed and functional, then the reason almost certainly is due to the changes Aaron pinpoints. More precisely, any ontology term needs to be in an ontology, and the ontology needs a backing ontology engine to support the graph-based algorithms like traversal etc. 
SimpleGOEngine is the default engine being used if you don't specify another one, and it itself is merely a wrapper around an instance Graph.pm (the superclass of Graph::Directed). This breaks down on a seemingly simple use case because the modules using Graph.pm have been programmed (by ChrisZ and myself) with the concept that if you use them you'll almost always also want the engine API functional. Now with the transition to a much more pervasive use of ontology terms, you easily have a situation in which all you'll ever want from a term's ontology is its name. So I suggest that I (or any taker is welcome) make this more robust by loading Graph.pm and friends only on demand within SimpleGOEngine when the methods depending on it are actually used. As for adapting the go-perl API to Bioperl, yes that's what I've done but have yet to test. As soon as I'm convinced that it works I'll commit it though. Note that this isn't really the panacea to all ontology-related problems in Bioperl though. The issue Aaron's hit is due to assumptions being made too quick about what a user has installed and what she's going to call, hence can be fixed accordingly. Also, Graph.pm is not a necessarily dispensible dependency; it implements many algorithms on graphs (connected subgraphs, shortest path, etc) that go-perl doesn't but which can be very useful. I decided to adapt go-perl to Bioperl primarily to finally delegate responsibility for dealing with the oddities of the dag-edit and obo-family of file formats to those who claim they solved all that :) -hilmar On Jun 22, 2005, at 8:43 PM, Allen Day wrote: > Hi, > > Where is the dependency on Graph::Directed introduced? A grep through > Bio/* in bioperl-live on HEAD reveals several references in the POD to > Graph.pm, but I don't see it anywhere in the code. 
> > I don't know if Chris Mungall's GO-Perl API removes the dependency on > Graph::Directed, but it is certainly easier to use than > Bio::OntologyIO as > a means to access the OBO ontologies. I'm willing to look into > converting > Bio::Annotation::OntologyTerm to use GO::Model::* instead of > Bio::Ontology::Term, but it may interfere with other projects using the > class (e.g. bioperl-db). > > Hilmar, I know you were looking at the GO-Perl codebase recently, can > you > comment on any of the above? > > -Allen > > > On Wed, 22 Jun 2005, Aaron J. Mackey wrote: > >> >> Because AnnotatableI has implementations for add_tag and get_tag that >> invoke Bio::Annotation::OntologyTerm, and therefore Graph::Directed, >> which relies on Scalar::Util::weaken(), therefore I cannot even use >> basic Bio::Seq functionality on any perl that doesn't have weak >> references (oddly, this cropped up in a 5.8.0 install via an RPM that >> was evidently compiled without support for weak references, so this >> isn't just an "ancient perl" problem). >> >> This is something of a showstopper for any 1.6; in effect, we'd need >> to disable Annotation::OntologyTerm use for any Perl without weak >> reference support. >> >> We've said it before, and we need to say it again: the changes made >> to the feature/annotation object model are seriously impeding our >> ability to move forward to a release (and frighteningly, the GBrowse >> distribution now includes those parts of 1.5 that it relies on, so a >> user's BioPerl install could be a hodge-podge of 1.4/1.5 code). This >> seems important to all GMOD projects, so why hasn't there been any >> work on it? >> >> Thanks, >> >> -Aaron >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From sdavis2 at mail.nih.gov Thu Jun 23 07:06:55 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Jun 23 06:57:59 2005 Subject: [Bioperl-l] GEO SOFT format parsing Message-ID: I saw a couple of posts about GEO SOFT format parsing from about a year ago and wondered if anyone had gone on to complete one (for GDS, as my main priority). If so, is it available somewhere? Thanks, Sean From sm_middha at yahoo.com Thu Jun 23 13:07:17 2005 From: sm_middha at yahoo.com (sumit middha) Date: Thu Jun 23 12:58:40 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: Message-ID: <20050623170717.41518.qmail@web30702.mail.mud.yahoo.com> Thanks for the reply Brian. Changing it to Bio::Index::Fasta helped, but gave another problem in my script, which I dont have a clue. ------------- EXCEPTION ------------- MSG: Can't open 'SDBM_File' dbm file '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or directory STACK Bio::Index::Abstract::open_dbm /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 STACK Bio::Index::Abstract::new /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 STACK Bio::Index::AbstractSeq::new /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 STACK toplevel get_ortho.pl:31 I know that the file exists, and has been formatted as a database to use BLAST search. sumit --- Brian Osborne wrote: > Sumit, > > In perl 5.8 a module that's using a tied hash is > supposed to have an EXISTS > method, but it appears that AnyDBM_File doesn't. You > could try using > Bio::Index::Fasta instead, or Bio::DB::Flat. > > Brian O. 
> > > On 6/22/05 6:24 PM, "sumit middha" > wrote: > > > > > Hello, > > > > I have a trouble with using fasta module > > > > I use the required statements > > > > use Bio::DB::Fasta; > > use Bio::Seq; > > > > The error was: > > > > AnyDBM_File doesn't define an EXISTS method at > > > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm > > line 577 > > > > thanks, > > sm > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From khh103 at york.ac.uk Thu Jun 23 13:30:31 2005 From: khh103 at york.ac.uk (Kat Hull) Date: Thu Jun 23 13:24:09 2005 Subject: [Bioperl-l] Getting nucleotide seq from protein accession Message-ID: <42BAF1B7.10109@york.ac.uk> Hi there, I was wondering whether anyone has a solution to my problem. I have a list of protein assession numbers and want to retrieve the corresponding nucleotide sequences automatically. I thought it would be possible to do this by changing the NCBI url, but this doesn't seem to be the case. Is there a bio-perl module that can do this? Kind regards, Kat From brian_osborne at cognia.com Thu Jun 23 14:38:06 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jun 23 14:30:07 2005 Subject: [Bioperl-l] FASTA.pm issue In-Reply-To: <20050623170717.41518.qmail@web30702.mail.mud.yahoo.com> Message-ID: Sumit, You'll have to show us the code that gives you the error, I think. Brian O. On 6/23/05 1:07 PM, "sumit middha" wrote: > > Thanks for the reply Brian. > Changing it to Bio::Index::Fasta helped, but gave > another problem in my script, which I dont have a > clue. 
> > ------------- EXCEPTION ------------- > MSG: Can't open 'SDBM_File' dbm file > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or > directory > STACK Bio::Index::Abstract::open_dbm > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392 > STACK Bio::Index::Abstract::new > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150 > STACK Bio::Index::AbstractSeq::new > /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91 > STACK toplevel get_ortho.pl:31 > > I know that the file exists, and has been formatted as > a database to use BLAST search. > > sumit > > --- Brian Osborne wrote: > >> Sumit, >> >> In perl 5.8 a module that's using a tied hash is >> supposed to have an EXISTS >> method, but it appears that AnyDBM_File doesn't. You >> could try using >> Bio::Index::Fasta instead, or Bio::DB::Flat. >> >> Brian O. >> >> >> On 6/22/05 6:24 PM, "sumit middha" >> wrote: >> >>> >>> Hello, >>> >>> I have a trouble with using fasta module >>> >>> I use the required statements >>> >>> use Bio::DB::Fasta; >>> use Bio::Seq; >>> >>> The error was: >>> >>> AnyDBM_File doesn't define an EXISTS method at >>> >> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm >>> line 577 >>> >>> thanks, >>> sm >>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> >> > http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com From nandita at uga.edu Thu Jun 23 15:16:01 2005 From: nandita at uga.edu (Nandita Mullapudi) Date: Thu Jun 23 15:07:39 2005 Subject: [Bioperl-l] help with parsing meme output Message-ID: <792dff48.110fccbf.8442900@punts2.cc.uga.edu> Hi, I am trying to use Bio::Matrix::PSM::IO to parse meme output. I need to extract the values corresponding to length of the sequence, seq id, and motif id, start and significance/score. I can get the last three using foreach my $instance (@{ $instances }) { my $start = $instance -> start; my $score = $instance -> score; But i cannot find out how to get the seq id and seq length. any ideas? thanks -nandita *************************************************** Graduate Student, Kissinger Lab. Dept. of Genetics UGA, Athens GA 30602 USA lab phone: 706-542-6563 cell phone: 706-254-2444 Lab add: C318 Life Sciences **************************************************** From james.wasmuth at ed.ac.uk Thu Jun 23 15:33:04 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Thu Jun 23 15:29:17 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: <792dff48.110fccbf.8442900@punts2.cc.uga.edu> References: <792dff48.110fccbf.8442900@punts2.cc.uga.edu> Message-ID: <42BB0E70.1030802@ed.ac.uk> Hi Nandita does "my $id=$instance->primary_id;" do what you want? Is it the length from the input sequence that you want? my %length= $header->length(); Function: Returns the length of the input sequence or motifs as a hash, indexed by a sequence ID (motif id or accession number) james Nandita Mullapudi wrote: >Hi, >I am trying to use Bio::Matrix::PSM::IO to parse meme output. >I need to extract the values corresponding to length of the >sequence, seq id, and motif id, start and significance/score. 
>I can get the last three using > >foreach my $instance (@{ $instances }) { > my $start = $instance -> start; > my $score = $instance -> score; > >But i cannot find out how to get the seq id and seq length. >any ideas? >thanks >-nandita > >*************************************************** >Graduate Student, Kissinger Lab. >Dept. of Genetics >UGA, Athens GA 30602 USA >lab phone: 706-542-6563 >cell phone: 706-254-2444 >Lab add: C318 Life Sciences >**************************************************** >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- http://www.nematodes.org/~james "Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...." --Thomas Edison Blaxter Nematode Genomics Group | Institute of Evolutionary Biology | Ashworth Laboratories, KB | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From skirov at utk.edu Thu Jun 23 15:52:58 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 23 15:44:24 2005 Subject: [Bioperl-l] Getting nucleotide seq from protein accession In-Reply-To: <42BAF1B7.10109@york.ac.uk> References: <42BAF1B7.10109@york.ac.uk> Message-ID: <42BB131A.2090403@utk.edu> Kat, If you are familiar with Bioperl it is kind of easy- look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on how to get the protein record Go through the dblinks and find the appropriate accession number (where the database method returns GenBank). Then retrieve this accession number(s) through Bio::DB::GenBank. If you are not familiar with Bioperl- read the docs for Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and Bio::Annotation::DBLink). Hope this helps, Stefan Kat Hull wrote: > Hi there, > I was wondering whether anyone has a solution to my problem. 
I have a > list of protein assession numbers and want to retrieve the > corresponding nucleotide sequences automatically. I thought it would > be possible to do this by changing the NCBI url, but this doesn't seem > to be the case. > Is there a bio-perl module that can do this? > > Kind regards, > Kat > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l Stefan From skirov at utk.edu Thu Jun 23 15:55:53 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 23 15:47:14 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: <792dff48.110fccbf.8442900@punts2.cc.uga.edu> References: <792dff48.110fccbf.8442900@punts2.cc.uga.edu> Message-ID: <42BB13C9.8030807@utk.edu> Use $instance->accession_number to get the sequence id; the length is given by $instance->length. Stefan Nandita Mullapudi wrote: >Hi, >I am trying to use Bio::Matrix::PSM::IO to parse meme output. >I need to extract the values corresponding to length of the >sequence, seq id, and motif id, start and significance/score. >I can get the last three using > >foreach my $instance (@{ $instances }) { > my $start = $instance -> start; > my $score = $instance -> score; > >But i cannot find out how to get the seq id and seq length. >any ideas? >thanks >-nandita > >*************************************************** >Graduate Student, Kissinger Lab. >Dept.
of Genetics >UGA, Athens GA 30602 USA >lab phone: 706-542-6563 >cell phone: 706-254-2444 >Lab add: C318 Life Sciences >**************************************************** >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From james.wasmuth at ed.ac.uk Thu Jun 23 15:51:48 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Thu Jun 23 15:47:24 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: References: Message-ID: <42BB12D4.5050705@ed.ac.uk> Nandita The BioPerl module $header->length() comes from is PSM/PsmHeader.pm This should be inherited when you "use Bio::Matrix::PSM::IO" have a look http://doc.bioperl.org/releases/bioperl-1.4/Bio/Matrix/PSM/IO.html What you want should be covered there. Otherwise shout and someone will answer -james Nandita Mullapudi wrote: >thanks James, > > > >>Is it the length from the input sequence that you want? >> >>my %length= $header->length(); >>Function: Returns the length of the input sequence or motifs >> >> >as a hash, indexed > > >>by a sequence ID (motif id or accession number) >> >> >> > >yes, i want the length from the input sequence. I am not sure >i can use the above without specifying which module / package >it refers to? > >also , where can i find this info? :) >thanks, >-nandita > > > > >>james >> >> >>Nandita Mullapudi wrote: >> >> >> >>>Hi, >>>I am trying to use Bio::Matrix::PSM::IO to parse meme output. >>>I need to extract the values corresponding to length of the >>>sequence, seq id, and motif id, start and significance/score. >>>I can get the last three using >>> >>>foreach my $instance (@{ $instances }) { >>> my $start = $instance -> start; >>> my $score = $instance -> score; >>> >>>But i cannot find out how to get the seq id and seq length. >>>any ideas? >>>thanks >>>-nandita >>> >>>*************************************************** >>>Graduate Student, Kissinger Lab. 
>>>Dept. of Genetics >>>UGA, Athens GA 30602 USA >>>lab phone: 706-542-6563 >>>cell phone: 706-254-2444 >>>Lab add: C318 Life Sciences >>>**************************************************** >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>-- >>http://www.nematodes.org/~james >> >>"Until man duplicates a blade of grass, nature can laugh at >> >> >his so-called scientific knowledge...." > > >> --Thomas Edison >> >>Blaxter Nematode Genomics Group | >>Institute of Evolutionary Biology | >>Ashworth Laboratories, KB | tel: +44 131 650 7403 >>University of Edinburgh | web: www.nematodes.org >>Edinburgh | >>EH9 3JT | >>UK | >> >> >> >> > >*************************************************** >Graduate Student, Kissinger Lab. >Dept. of Genetics >UGA, Athens GA 30602 USA >lab phone: 706-542-6563 >cell phone: 706-254-2444 >Lab add: C318 Life Sciences >**************************************************** > > -- http://www.nematodes.org/~james "Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...." --Thomas Edison Blaxter Nematode Genomics Group | Institute of Evolutionary Biology | Ashworth Laboratories, KB | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From skirov at utk.edu Thu Jun 23 16:14:21 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 23 16:06:06 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: <37c6b8a0.1114add7.82dba00@punts2.cc.uga.edu> References: <37c6b8a0.1114add7.82dba00@punts2.cc.uga.edu> Message-ID: <42BB181D.7040006@utk.edu> The error you get is because you did not declare $header prior to usage with 'my' (or you can use it as a global, but then you should do something like $main::header). Another way is not to use strict, but this is not generally recommended if you want to write something reliable. 
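A minimal sketch of the strict error described above, in plain Perl without BioPerl. It assumes two things that are general Perl behaviour rather than anything specific to Bio::Matrix::PSM: under "use strict", %header and $header are separate variables, so declaring one does not declare the other; and a method call is not interpolated inside a double-quoted string, so it has to be made outside the quotes. The Demo::Header package and its length method are hypothetical stand-ins invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a header object; NOT a BioPerl class.
package Demo::Header;
sub new    { my ($class, %lengths) = @_; return bless { lengths => {%lengths} }, $class; }
sub length { my ($self) = @_; return %{ $self->{lengths} }; }

package main;

# Compile a snippet under strict and return the error text ('' if it compiled).
sub strict_error {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? '' : $@;
}

# %header is declared but $header is not, so strict rejects the scalar.
my $err = strict_error('my %header = (a => 1); print $header;');
print "undeclared scalar: $err";

# Declaring the scalar with 'my' removes the error.
my $fixed = strict_error('my $header = 1; my $copy = $header;');
print "declared scalar compiles\n" if $fixed eq '';

# Method calls do not interpolate inside double quotes; call them outside.
my $header  = Demo::Header->new(seq1 => 120, seq2 => 95);
my %lengths = $header->length;
foreach my $id (sort keys %lengths) {
    print "Sequence $id length is ", $lengths{$id}, "\n";
}
```

The first check reproduces the "Global symbol ... requires explicit package name" message from the thread; the last loop shows the pattern of calling the method first and printing the returned hash afterwards.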
And yes, $instance->length will give back the hit length, so you may need what James suggested. Stefan Nandita Mullapudi wrote: >i should've tried to be clearer, i'm looking for for length of >the input sequence. i tried James' suggestion of print >"$header->length"; but the error i get back is > >Global symbol "$header" requires explicit package name at >parsememe2.pl line 15. > > >altho i am using use Bio::Matrix::PSM::IO; > >thanks for your help >-nandita > >---- Original message ---- > > >>Date: Thu, 23 Jun 2005 15:55:53 -0400 >>From: Stefan Kirov >>Subject: Re: [Bioperl-l] help with parsing meme output >>To: Nandita Mullapudi >>Cc: bioperl-l@portal.open-bio.org >> >>$instance->accession_number to get the sequence id >>and >>lenght is given by $instance->length >>Stefan >> >>Nandita Mullapudi wrote: >> >> >> >>>Hi, >>>I am trying to use Bio::Matrix::PSM::IO to parse meme output. >>>I need to extract the values corresponding to length of the >>>sequence, seq id, and motif id, start and significance/score. >>>I can get the last three using >>> >>>foreach my $instance (@{ $instances }) { >>> my $start = $instance -> start; >>> my $score = $instance -> score; >>> >>>But i cannot find out how to get the seq id and seq length. >>>any ideas? >>>thanks >>>-nandita >>> >>>*************************************************** >>>Graduate Student, Kissinger Lab. >>>Dept. of Genetics >>>UGA, Athens GA 30602 USA >>>lab phone: 706-542-6563 >>>cell phone: 706-254-2444 >>>Lab add: C318 Life Sciences >>>**************************************************** >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> > >*************************************************** >Graduate Student, Kissinger Lab. >Dept. 
of Genetics >UGA, Athens GA 30602 USA >lab phone: 706-542-6563 >cell phone: 706-254-2444 >Lab add: C318 Life Sciences >**************************************************** > > From james.wasmuth at ed.ac.uk Thu Jun 23 16:36:03 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Thu Jun 23 16:31:43 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: <5527172c.11156e5d.81a0500@punts2.cc.uga.edu> References: <5527172c.11156e5d.81a0500@punts2.cc.uga.edu> Message-ID: <42BB1D33.4070009@ed.ac.uk> Does this behave itself? while (my %header=$psmIO->header) { for (my $i=0; $i<=$#{$header{instances}};$i++) { print $header{instances}->[$i],"\t",$header{lengths}->[$i],"\n"; } } I don't use these modules but having looked at the docs this should work. Although the notes in Bio::Matrix::PSM::IO for this method say it should be obsolete. If you still get no joy then attach a copy of the output file to an email. This should provide people with an example. Nandita Mullapudi wrote: >ok i've got to be missing something here. > >this is my code: > >use strict; >use warnings; >use Bio::Matrix::PSM::IO; >use Bio::Matrix::PSM::InstanceSite; > >my $psmIO = new Bio::Matrix::PSM::IO( -file => 'memeout.txt', > -format => 'meme'); > >while (my %header=$psmIO->header) { > foreach my $seqid (@{$header{instances}}) { > print "$header->length"; > >} >} > > >and the error i get is " Global symbol "$header" requires >explicit package name at parsememe2.pl line 15. >Execution of parsememe2.pl aborted due to compilation errors. > > >thanks for your help. 
>-n > > -- http://www.nematodes.org/~james "Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...."
--Thomas Edison Blaxter Nematode Genomics Group | Institute of Evolutionary Biology | Ashworth Laboratories, KB | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From james.wasmuth at ed.ac.uk Thu Jun 23 17:23:15 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Thu Jun 23 17:20:53 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: References: Message-ID: <42BB2843.4020307@ed.ac.uk> It would appear that $psmIO->header is not implemented in PSM/IO.pm. Does anyone know if this is to be done? Nandita Mullapudi wrote: >thanks James, >this one gives the error > >Can't use an undefined value as an ARRAY reference at >/usr/lib/perl5/site_perl/5.6.1/Bio/Matrix/PSM/IO/meme.pm line >159, line 43. > >i've attached the text output i am trying to parse > >-nandita > > >---- Original message ---- > > >>Date: Thu, 23 Jun 2005 21:36:03 +0100 >>From: James Wasmuth >>Subject: Re: [Bioperl-l] help with parsing meme output >>To: Nandita Mullapudi >>Cc: bioperl-l@portal.open-bio.org >> >>Does this behave itself? >> >>while (my %header=$psmIO->header) { >> for (my $i=0; $i<=$#{$header{instances}};$i++) { >> print >> >> >$header{instances}->[$i],"\t",$header{lengths}->[$i],"\n"; > > >> } >>} >> >> >>I don't use these modules but having looked at the docs this >> >> >should work. Although the notes in Bio::Matrix::PSM::IO for >this method say it should be obsolete. > > >>If you still get no joy then attach a copy of the output file >> >> >to an email. This should provide people with an example. > > >> >> >> >> >> >> >> >>Nandita Mullapudi wrote: >> >> >> >>>ok i've got to be missing something here. 
>*************************************************** >Graduate Student, Kissinger Lab. >Dept.
of Genetics
>UGA, Athens GA 30602 USA
>lab phone: 706-542-6563
>cell phone: 706-254-2444
>Lab add: C318 Life Sciences
>****************************************************
>
>

From skirov at utk.edu Thu Jun 23 21:12:35 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Jun 23 21:07:59 2005
Subject: [Bioperl-l] help with parsing meme output
In-Reply-To: <42BB2843.4020307@ed.ac.uk>
References: <42BB2843.4020307@ed.ac.uk>
Message-ID: <42BB5E03.5000206@utk.edu>

James,
you were correct in your first suggestion (to use $psmIO->length), but
quite incorrect in your next one: the code

while (my %header=$psmIO->header) {

makes no sense.
What you probably want to do is:

while (my $psm = $psmIO->next_psm) {
    my %header = $psm->header;   # Bio::Matrix::PSM::Psm method
    # Do something with the hash
}

But header has a different purpose: it contains data about a particular
prediction, such as the number of sites, the width of the motif, etc.
If you need the lengths of the initial sequences, it is precisely as you
suggested:

my %lengths = $psmIO->length;
foreach my $id (keys %lengths) {
    print "Initial sequence $id length is ", $lengths{$id}, "\n";
}

C'est tout!
To get the length of a particular hit (that is, the sequence on which a
predicted motif is based):

while (my $psm = $psmIO->next_psm) {
    my $instances = $psm->instances;
    foreach my $instance (@{ $instances }) {
        print "Hit ", $instance->accession_number, " is ", $instance->length, " long\n";
    }
}

Let me know if there are further questions
Stefan

James Wasmuth wrote:

> It would appear that $psmIO->header is not implemented in PSM/IO.pm.
> Does anyone know if this is to be done?
>
> Nandita Mullapudi wrote:
>
>> thanks James,
>> this one gives the error
>>
>> Can't use an undefined value as an ARRAY reference at
>> /usr/lib/perl5/site_perl/5.6.1/Bio/Matrix/PSM/IO/meme.pm line
>> 159, line 43.
>> >> i've attached the text output i am trying to parse >> >> -nandita >> >> *************************************************** >> Graduate Student, Kissinger Lab. >> Dept.
of Genetics >> UGA, Athens GA 30602 USA >> lab phone: 706-542-6563 >> cell phone: 706-254-2444 >> Lab add: C318 Life Sciences >> **************************************************** >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From nandita at uga.edu Thu Jun 23 22:40:44 2005 From: nandita at uga.edu (Nandita Mullapudi) Date: Thu Jun 23 22:32:18 2005 Subject: [Bioperl-l] help with parsing meme output In-Reply-To: <42BB5E03.5000206@utk.edu> References: <42BB2843.4020307@ed.ac.uk> <42BB5E03.5000206@utk.edu> Message-ID: Thanks Stefan, the line that didn't make sense was mine, not James'. :p but i see what was wrong. i'll try this now. -nandita >James, >while you were correct in your first suggestion (to use >$psmIO->length), but quite incorrect at your next suggestion: >the code >while (my %header=$psmIO->header) { >makes no sense. >What you probably want to do is: >while (my $psm=$psmIO->next_psm) { >my %header=$psm->header; #Bio::Matrix::PSM::Psm method >#Do something with the has >} >But header has different purpose- it contains data about particular >prediction, such as number of sites, width of the motif, etc. >If you need the initial sequences lengths it is precisely as you suggested: >my %lengths=$psmIO->length; >foreach my $id (keys %lengths) { >print "Initial sequence $id length is ",$lengths{$id},"\n"; >} >C'est tout! >To get a length of a particular hit (that is the sequence, on which >a predicted motif is based): >while (my $psm=$psmIO->next_psm) { >my $instances=$psm->instances; >foreach my..... { >print "Hits ... is long", $instance->length.... >} >Let me know if there are further questions >Stefan > >James Wasmuth wrote: > >>It would appear that $psmIO->header is not implemented in PSM/IO.pm. >>Does anyone know if this is to be done? 
>> >>>*************************************************** >>>Graduate Student, Kissinger Lab. >>>Dept.
of Genetics
>>>UGA, Athens GA 30602 USA
>>>lab phone: 706-542-6563
>>>cell phone: 706-254-2444
>>>Lab add: C318 Life Sciences
>>>****************************************************
>>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
****************************************************
Nandita Mullapudi
Graduate student, Kissinger Lab.
Dept. of Genetics. UGA Athens GA 30602
Lab Address: C-318 Life Sciences.
Lab phone: 706-542-6563
Cell phone: 706-254-2444
****************************************************

From skirov at utk.edu Thu Jun 23 23:13:58 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Jun 23 23:06:23 2005
Subject: [Bioperl-l] help with parsing meme output
In-Reply-To:
References: <42BB2843.4020307@ed.ac.uk> <42BB5E03.5000206@utk.edu>
Message-ID: <42BB7A76.1000408@utk.edu>

Sorry James :-[ ... I thought it was your suggestion...
Nandita, let me know if you have more questions.
Stefan

Nandita Mullapudi wrote:

> Thanks Stefan,
> the line that didn't make sense was mine, not James'. :p
> but i see what was wrong. i'll try this now.
> -nandita
>
>> James,
>> you were correct in your first suggestion (to use
>> $psmIO->length), but quite incorrect in your next suggestion:
>> the code
>> while (my %header=$psmIO->header) {
>> makes no sense.
>> What you probably want to do is:
>> while (my $psm=$psmIO->next_psm) {
>>     my %header=$psm->header; #Bio::Matrix::PSM::Psm method
>>     #Do something with the hash
>> }
>> But header has a different purpose: it contains data about a particular
>> prediction, such as number of sites, width of the motif, etc.
>> If you need the initial sequence lengths it is precisely as you
>> suggested:
>> my %lengths=$psmIO->length;
>> foreach my $id (keys %lengths) {
>>     print "Initial sequence $id length is ",$lengths{$id},"\n";
>> }
>> C'est tout!
>> To get a length of a particular hit (that is the sequence, on which a >> predicted motif is based): >> while (my $psm=$psmIO->next_psm) { >> my $instances=$psm->instances; >> foreach my..... { >> print "Hits ... is long", $instance->length.... >> } >> Let me know if there are further questions >> Stefan >> >> James Wasmuth wrote: >> >>> It would appear that $psmIO->header is not implemented in PSM/IO.pm. >>> Does anyone know if this is to be done? >>> >>> Nandita Mullapudi wrote: >>> >>>> thanks James, >>>> this one gives the error >>>> >>>> Can't use an undefined value as an ARRAY reference at >>>> /usr/lib/perl5/site_perl/5.6.1/Bio/Matrix/PSM/IO/meme.pm line >>>> 159, line 43. >>>> >>>> i've attached the text output i am trying to parse >>>> >>>> -nandita >>>> >>>> >>>> ---- Original message ---- >>>> >>>>> Date: Thu, 23 Jun 2005 21:36:03 +0100 >>>>> From: James Wasmuth Subject: Re: >>>>> [Bioperl-l] help with parsing meme output To: Nandita Mullapudi >>>>> >>>>> Cc: bioperl-l@portal.open-bio.org >>>>> >>>>> Does this behave itself? >>>>> while (my %header=$psmIO->header) { >>>>> for (my $i=0; $i<=$#{$header{instances}};$i++) { >>>>> print >>>>> >>>> >>>> $header{instances}->[$i],"\t",$header{lengths}->[$i],"\n"; >>>> >>>>> } >>>>> } >>>>> >>>>> >>>>> I don't use these modules but having looked at the docs this >>>>> >>>> >>>> should work. Although the notes in Bio::Matrix::PSM::IO for >>>> this method say it should be obsolete. >>>> >>>>> If you still get no joy then attach a copy of the output file >>>>> >>>> >>>> to an email. This should provide people with an example. >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Nandita Mullapudi wrote: >>>>> >>>>> >>>>>> ok i've got to be missing something here. 
>>>>>> >>>>>> this is my code: >>>>>> >>>>>> use strict; >>>>>> use warnings; >>>>>> use Bio::Matrix::PSM::IO; >>>>>> use Bio::Matrix::PSM::InstanceSite; >>>>>> >>>>>> my $psmIO = new Bio::Matrix::PSM::IO( -file => 'memeout.txt', >>>>>> -format => 'meme'); >>>>>> >>>>>> while (my %header=$psmIO->header) { >>>>>> foreach my $seqid (@{$header{instances}}) { >>>>>> print "$header->length"; >>>>>> >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> and the error i get is " Global symbol "$header" requires >>>>>> explicit package name at parsememe2.pl line 15. >>>>>> Execution of parsememe2.pl aborted due to compilation errors. >>>>>> >>>>>> >>>>>> thanks for your help. >>>>>> -n >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ---- Original message ---- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Date: Thu, 23 Jun 2005 20:51:48 +0100 >>>>>>> From: James Wasmuth Subject: Re: >>>>>>> [Bioperl-l] help with parsing meme output To: Nandita Mullapudi >>>>>>> >>>>>>> Cc: bioperl-l@bioperl.org >>>>>>> >>>>>>> Nandita >>>>>>> >>>>>>> The BioPerl module $header->length() comes from is >>>>>>> >>>>>> >>>>>> PSM/PsmHeader.pm >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> This should be inherited when you "use Bio::Matrix::PSM::IO" >>>>>>> >>>>>>> have a look >>>>>>> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Matrix/PSM/IO.html >>>>>>> >>>>>>> What you want should be covered there. Otherwise shout and >>>>>>> >>>>>> >>>>>> someone will >>>>>> >>>>>> >>>>>> >>>>>>> answer >>>>>>> >>>>>>> -james >>>>>>> >>>>>>> >>>>>>> >>>>>>> Nandita Mullapudi wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> thanks James, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Is it the length from the input sequence that you want? 
>>>>>>>>> >>>>>>>>> my %length= $header->length(); >>>>>>>>> Function: Returns the length of the input sequence or motifs >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> as a hash, indexed >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> by a sequence ID (motif id or accession number) >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> yes, i want the length from the input sequence. I am not sure >>>>>>>> i can use the above without specifying which module / package >>>>>>>> it refers to? >>>>>>>> also , where can i find this info? :) >>>>>>>> thanks, >>>>>>>> -nandita >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> james >>>>>>>>> >>>>>>>>> >>>>>>>>> Nandita Mullapudi wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> I am trying to use Bio::Matrix::PSM::IO to parse meme >>>>>>>>>> >>>>>>>>> >>>> output. >>>> >>>>>>>>>> I need to extract the values corresponding to length of the >>>>>>>>>> sequence, seq id, and motif id, start and >>>>>>>>>> >>>>>>>>> >>>> significance/score. >>>> >>>>>>>>>> I can get the last three using >>>>>>>>>> foreach my $instance (@{ $instances }) { >>>>>>>>>> my $start = $instance -> start; >>>>>>>>>> my $score = $instance -> score; >>>>>>>>>> >>>>>>>>>> But i cannot find out how to get the seq id and seq length. >>>>>>>>>> any ideas? >>>>>>>>>> thanks >>>>>>>>>> -nandita >>>>>>>>>> >>>>>>>>>> *************************************************** >>>>>>>>>> Graduate Student, Kissinger Lab. >>>>>>>>>> Dept. 
of Genetics >>>>>>>>>> UGA, Athens GA 30602 USA >>>>>>>>>> lab phone: 706-542-6563 >>>>>>>>>> cell phone: 706-254-2444 >>>>>>>>>> Lab add: C318 Life Sciences >>>>>>>>>> **************************************************** >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l@portal.open-bio.org >>>>>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> http://www.nematodes.org/~james >>>>>>>>> >>>>>>>>> "Until man duplicates a blade of grass, nature can laugh at >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> his so-called scientific knowledge...." >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> --Thomas Edison >>>>>>>>> Blaxter Nematode Genomics Group | >>>>>>>>> Institute of Evolutionary Biology | >>>>>>>>> Ashworth Laboratories, KB | tel: +44 131 650 7403 >>>>>>>>> University of Edinburgh | web: www.nematodes.org >>>>>>>>> Edinburgh | >>>>>>>>> EH9 3JT | >>>>>>>>> UK | >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> *************************************************** >>>>>>>> Graduate Student, Kissinger Lab. >>>>>>>> Dept. of Genetics >>>>>>>> UGA, Athens GA 30602 USA >>>>>>>> lab phone: 706-542-6563 >>>>>>>> cell phone: 706-254-2444 >>>>>>>> Lab add: C318 Life Sciences >>>>>>>> **************************************************** >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> http://www.nematodes.org/~james >>>>>>> >>>>>>> "Until man duplicates a blade of grass, nature can laugh at >>>>>>> >>>>>> >>>>>> his so-called scientific knowledge...." 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> --Thomas Edison >>>>>>> Blaxter Nematode Genomics Group | >>>>>>> Institute of Evolutionary Biology | >>>>>>> Ashworth Laboratories, KB | tel: +44 131 650 7403 >>>>>>> University of Edinburgh | web: www.nematodes.org >>>>>>> Edinburgh | >>>>>>> EH9 3JT | >>>>>>> UK | >>>>>>> >>>>>>> >>>>>> >>>>>> *************************************************** >>>>>> Graduate Student, Kissinger Lab. >>>>>> Dept. of Genetics >>>>>> UGA, Athens GA 30602 USA >>>>>> lab phone: 706-542-6563 >>>>>> cell phone: 706-254-2444 >>>>>> Lab add: C318 Life Sciences >>>>>> **************************************************** >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> http://www.nematodes.org/~james >>>>> >>>>> "Until man duplicates a blade of grass, nature can laugh at >>>>> >>>> >>>> his so-called scientific knowledge...." >>>> >>>>> --Thomas Edison >>>>> Blaxter Nematode Genomics Group | >>>>> Institute of Evolutionary Biology | >>>>> Ashworth Laboratories, KB | tel: +44 131 650 7403 >>>>> University of Edinburgh | web: www.nematodes.org >>>>> Edinburgh | >>>>> EH9 3JT | >>>>> UK | >>>>> >>>> >>>> >>>> *************************************************** >>>> Graduate Student, Kissinger Lab. >>>> Dept. 
of Genetics >>>> UGA, Athens GA 30602 USA >>>> lab phone: 706-542-6563 >>>> cell phone: 706-254-2444 >>>> Lab add: C318 Life Sciences >>>> **************************************************** >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > From heikki at ebi.ac.uk Fri Jun 24 02:49:11 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Fri Jun 24 02:40:32 2005 Subject: [Bioperl-l] Fwd: Bio::DB::BioDB Message-ID: <200506240749.11341.heikki@ebi.ac.uk> ---------- Forwarded Message ---------- Subject: Bio::DB::BioDB Date: Thursday 23 June 2005 19:47 From: madhuri battu To: jason@bioperl.org, brian_osborne@cognia.com, heikki@ebi.ac.uk Hi, My name is Madhuri Battu. I am working at UT Southwestern in US as an intern. I have a problem with installing Bio::DB::BioDB. It was saying it is 5 years old on downloads page and recently updated on View cvs page. Can you please tell whether the module is being used and how can i install it. If this module is not being used what module can be used in place of that. Thanks, Madhuri. __________________________________________________________ Free antispam, antivirus and 1GB to save all your messages Only in Yahoo! 
Mail: http://in.mail.yahoo.com ------------------------------------------------------- -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From Richard.Adams at ed.ac.uk Fri Jun 24 06:38:22 2005 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Fri Jun 24 06:31:23 2005 Subject: [Bioperl-l] Getting nucleotide seq from protein accession Message-ID: <42BBE29E.109@ed.ac.uk> Kat, You can try Ensmart at www.ensembl.org as well, it's very useful for this sort of thing. Richard -- Dr Richard Adams Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From michael.spitzer at uni-muenster.de Fri Jun 24 08:03:38 2005 From: michael.spitzer at uni-muenster.de (Michael Spitzer) Date: Fri Jun 24 07:56:57 2005 Subject: [Bioperl-l] NCBI Genbank ID to TaxonID via Bioperl? In-Reply-To: <1003840011d840032395bf3ed408185b@salmonella.org> References: <42B815CC.3090009@uni-muenster.de> <1003840011d840032395bf3ed408185b@salmonella.org> Message-ID: <42BBF69A.2030602@uni-muenster.de> Rob Edwards wrote: > Possibly the easiest way to do this is using the eutils facilities. > [...] Thanks, this looks exactly what I'm in need for. The API seems to be very simple and easy from what I got from a quick glance at the example Perlscripts there. I think I'll go for the EUtils approach. 
Thank you very much,
Michael

From skirov at utk.edu Fri Jun 24 12:06:54 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jun 24 11:58:22 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42BC298F.8000100@york.ac.uk>
References: <42BAF1B7.10109@york.ac.uk> <42BB131A.2090403@utk.edu> <42BC298F.8000100@york.ac.uk>
Message-ID: <42BC2F9E.20607@utk.edu>

my $seqio = $gb->get_Stream_by_id(['13474692']);
while( my $seq = $seqio->next_seq ) {
    print "seq is ", $seq->display_id, "\n";
    my $ann=$seq->annotation; #This gives you the annotation of the retrieved sequence object
    foreach my $dblink ($ann->get_Annotations('DBLink')) {
        if ($dblink->database =~/refseq/i) {
            print $dblink->primary_id, " is the mRNA accession number\n";
        }
    }
}

However, the gene you are looking at is not associated with any NM_ sequence, but rather comes from NC_. Therefore the above will not work for you. You will have to descend through the sequence features and find the feature that says 'coded_by':

use Bio::DB::GenPept;
my $gb=new Bio::DB::GenPept;
my $seqio = $gb->get_Stream_by_id(['13474692']);
while( my $seq = $seqio->next_seq ) {
    print "seq is ", $seq->display_id, "\n";
    my @f=$seq->get_SeqFeatures; #This gives you the features of the retrieved sequence object
    foreach my $feat (@f) {
        my $ann=$feat->annotation;
        next unless ($ann->get_Annotations('coded_by'));
        my @coded=$ann->get_Annotations('coded_by');
        foreach my $location (@coded) {
            print $location->value, " is the location that codes this protein\n";
        }
    }
}

No guarantees the code is typo free :-)
Stefan

Kat Hull wrote:

> Hi Stefan,
> Thanks for your advice but i'm still struggling! I have used
> Bio::DB::GenPept to get the protein accession number given the protein
> gi number. However, I don't understand how Bio::Annotation::DBLink
> works. Does it fetch the url of a link on the web-site?
Basically, > if I could use this (or something else) to get the url of the CDS link > for my protein of interest, I can get the corresponding nucleotide > accession from this, as it is encoded in the url. > Do you know how to use this module? Is this what you were suggesting > I try yesterday (I didn't really understand what you were getting at). > Many thanks, > > Kat > > ps. Here's where i'm at so far: > > > use Bio::Annotation::DBLink; > use Bio::DB::GenBank; > use Bio::DB::GenPept; > $gb = new Bio::DB::GenPept; > > > # given the gi number, this returns the accession > my $seqio = $gb->get_Stream_by_id(['13474692']); > while( my $seq = $seqio->next_seq ) { > print "seq is ", $seq->display_id, "\n"; > } > # not sure what i'm doing here > > > $link2 = new Bio::Annotation::DBLink(); > $link2->database('dbSNP'); > $link2->primary_id('2367'); > > > > > > Stefan Kirov wrote: > >> Kat, >> If you are familiar with Bioperl it is kind of easy- >> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on how >> to get the protein record >> Go through the dblinks and find the appropriate accession number >> (where the database method returns GenBank). >> Then retrieve this accession number(s) through Bio::DB::GenBank. If >> you are not familiar with Bioperl- read the docs for >> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and >> Bio::Annotation::DBLink). >> Hope this helps, >> Stefan >> >> Kat Hull wrote: >> >>> Hi there, >>> I was wondering whether anyone has a solution to my problem. I have >>> a list of protein assession numbers and want to retrieve the >>> corresponding nucleotide sequences automatically. I thought it >>> would be possible to do this by changing the NCBI url, but this >>> doesn't seem to be the case. >>> Is there a bio-perl module that can do this? 
>>>
>>> Kind regards,
>>> Kat
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>>
>> Stefan
>
>
>

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From skirov at utk.edu Fri Jun 24 14:54:46 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jun 24 14:46:13 2005
Subject: [Bioperl-l] Bio::DB::CUTG behaving weird
Message-ID: <42BC56F6.7070302@utk.edu>

Guys,
This very simple code gives a nasty warning and actually does not retrieve anything:

my $db = Bio::DB::CUTG->new(-sp =>'Homo sapiens');
my $cdtable = $db->get_request();

It does not run properly under SUSE 9.3, but it does just fine under Ubuntu 5. The code and the bioperl versions are the same (bioperl-live, updated today, even though CUTG goes back quite a while).

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------

If you go to http://www.kazusa.or.jp/codon/cgi-bin/spsearch.cgi?species=Homo+sapiens&c=s, you will see where the problem is. However, my Ubuntu machine gives a warning that there are multiple choices and selects Homo sapiens. It may be the LWP::UserAgent. I am obviously missing something. Any ideas?
Stefan

From hlapp at gmx.net Sat Jun 25 00:17:54 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jun 25 00:09:48 2005
Subject: [Bioperl-l] Fwd: Bio::DB::BioDB
In-Reply-To: <200506240749.11341.heikki@ebi.ac.uk>
References: <200506240749.11341.heikki@ebi.ac.uk>
Message-ID:

The module is being used, and in fact in quite extensive ways in certain projects.
You need to download the latest revision from CVS, not CPAN or any tarball out there that says it is version 0.10. Also, you will need the latest version of the BioSQL schema, also downloadable from CVS. There are installation documents that should explain the necessary steps, depending on your choice of RDBMS (we support MySQL, PostgreSQL, HSQLDB, and Oracle).

-hilmar

On Jun 24, 2005, at 2:49 AM, Heikki Lehvaslaiho wrote:

>
>
> ---------- Forwarded Message ----------
>
> Subject: Bio::DB::BioDB
> Date: Thursday 23 June 2005 19:47
> From: madhuri battu
> To: jason@bioperl.org, brian_osborne@cognia.com, heikki@ebi.ac.uk
>
> Hi,
> My name is Madhuri Battu. I am working at UT
> Southwestern in US as an intern. I have a problem with
> installing Bio::DB::BioDB. It was saying it is 5 years
> old on downloads page and recently updated on View cvs
> page. Can you please tell whether the module is being
> used and how can i install it. If this module is not
> being used what module can be used in place of that.
> Thanks,
> Madhuri.
>
>
>
>
>
>
> __________________________________________________________
> Free antispam, antivirus and 1GB to save all your messages
> Only in Yahoo!
Mail: http://in.mail.yahoo.com > > ------------------------------------------------------- > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambridge, CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lehvasla at ebi.ac.uk Sat Jun 25 02:41:37 2005 From: lehvasla at ebi.ac.uk (lehvasla@ebi.ac.uk) Date: Sat Jun 25 20:50:55 2005 Subject: [Bioperl-l] Fwd: Re: Bio::DB::BioDB Message-ID: <37588.83.151.196.59.1119681697.squirrel@webmail.ebi.ac.uk> -------------- next part -------------- An embedded message was scrubbed... From: madhuri battu Subject: Re: Bio::DB::BioDB Date: Fri, 24 Jun 2005 21:33:46 +0100 (BST) Size: 4402 Url: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050625/a06256e9/BioDB.eml From nandita at uga.edu Thu Jun 23 16:56:47 2005 From: nandita at uga.edu (Nandita Mullapudi) Date: Sat Jun 25 20:51:24 2005 Subject: [Bioperl-l] help with parsing meme output Message-ID: thanks James, this one gives the error Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.6.1/Bio/Matrix/PSM/IO/meme.pm line 159, line 43. 
i've attached the text output i am trying to parse -nandita ---- Original message ---- >Date: Thu, 23 Jun 2005 21:36:03 +0100 >From: James Wasmuth >Subject: Re: [Bioperl-l] help with parsing meme output >To: Nandita Mullapudi >Cc: bioperl-l@portal.open-bio.org > >Does this behave itself? > >while (my %header=$psmIO->header) { > for (my $i=0; $i<=$#{$header{instances}};$i++) { > print $header{instances}->[$i],"\t",$header{lengths}->[$i],"\n"; > } >} > > >I don't use these modules but having looked at the docs this should work. Although the notes in Bio::Matrix::PSM::IO for this method say it should be obsolete. > >If you still get no joy then attach a copy of the output file to an email. This should provide people with an example. > > > > > > > > > >Nandita Mullapudi wrote: > >>ok i've got to be missing something here. >> >>this is my code: >> >>use strict; >>use warnings; >>use Bio::Matrix::PSM::IO; >>use Bio::Matrix::PSM::InstanceSite; >> >>my $psmIO = new Bio::Matrix::PSM::IO( -file => 'memeout.txt', >> -format => 'meme'); >> >>while (my %header=$psmIO->header) { >> foreach my $seqid (@{$header{instances}}) { >> print "$header->length"; >> >>} >>} >> >> >>and the error i get is " Global symbol "$header" requires >>explicit package name at parsememe2.pl line 15. >>Execution of parsememe2.pl aborted due to compilation errors. >> >> >>thanks for your help. >>-n >> >> >> >> >>---- Original message ---- >> >> >>>Date: Thu, 23 Jun 2005 20:51:48 +0100 >>>From: James Wasmuth >>>Subject: Re: [Bioperl-l] help with parsing meme output >>>To: Nandita Mullapudi >>>Cc: bioperl-l@bioperl.org >>> >>>Nandita >>> >>>The BioPerl module $header->length() comes from is >>> >>> >>PSM/PsmHeader.pm >> >> >>>This should be inherited when you "use Bio::Matrix::PSM::IO" >>> >>>have a look >>>http://doc.bioperl.org/releases/bioperl-1.4/Bio/Matrix/PSM/IO.html >>> >>>What you want should be covered there. 
Otherwise shout and >>> >>> >>someone will >> >> >>>answer >>> >>>-james >>> >>> >>> >>>Nandita Mullapudi wrote: >>> >>> >>> >>>>thanks James, >>>> >>>> >>>> >>>> >>>> >>>>>Is it the length from the input sequence that you want? >>>>> >>>>>my %length= $header->length(); >>>>>Function: Returns the length of the input sequence or motifs >>>>> >>>>> >>>>> >>>>> >>>>as a hash, indexed >>>> >>>> >>>> >>>> >>>>>by a sequence ID (motif id or accession number) >>>>> >>>>> >>>>> >>>>> >>>>> >>>>yes, i want the length from the input sequence. I am not sure >>>>i can use the above without specifying which module / package >>>>it refers to? >>>> >>>>also , where can i find this info? :) >>>>thanks, >>>>-nandita >>>> >>>> >>>> >>>> >>>> >>>> >>>>>james >>>>> >>>>> >>>>>Nandita Mullapudi wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>Hi, >>>>>>I am trying to use Bio::Matrix::PSM::IO to parse meme output. >>>>>>I need to extract the values corresponding to length of the >>>>>>sequence, seq id, and motif id, start and significance/score. >>>>>>I can get the last three using >>>>>> >>>>>>foreach my $instance (@{ $instances }) { >>>>>> my $start = $instance -> start; >>>>>> my $score = $instance -> score; >>>>>> >>>>>>But i cannot find out how to get the seq id and seq length. >>>>>>any ideas? >>>>>>thanks >>>>>>-nandita >>>>>> >>>>>>*************************************************** >>>>>>Graduate Student, Kissinger Lab. >>>>>>Dept. 
of Genetics >>>>>>UGA, Athens GA 30602 USA >>>>>>lab phone: 706-542-6563 >>>>>>cell phone: 706-254-2444 >>>>>>Lab add: C318 Life Sciences >>>>>>**************************************************** >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l@portal.open-bio.org >>>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>-- >>>>>http://www.nematodes.org/~james >>>>> >>>>>"Until man duplicates a blade of grass, nature can laugh at >>>>> >>>>> >>>>> >>>>> >>>>his so-called scientific knowledge...." >>>> >>>> >>>> >>>> >>>>> --Thomas Edison >>>>> >>>>>Blaxter Nematode Genomics Group | >>>>>Institute of Evolutionary Biology | >>>>>Ashworth Laboratories, KB | tel: +44 131 650 7403 >>>>>University of Edinburgh | web: www.nematodes.org >>>>>Edinburgh | >>>>>EH9 3JT | >>>>>UK | >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>*************************************************** >>>>Graduate Student, Kissinger Lab. >>>>Dept. of Genetics >>>>UGA, Athens GA 30602 USA >>>>lab phone: 706-542-6563 >>>>cell phone: 706-254-2444 >>>>Lab add: C318 Life Sciences >>>>**************************************************** >>>> >>>> >>>> >>>> >>>-- >>>http://www.nematodes.org/~james >>> >>>"Until man duplicates a blade of grass, nature can laugh at >>> >>> >>his so-called scientific knowledge...." >> >> >>> --Thomas Edison >>> >>>Blaxter Nematode Genomics Group | >>>Institute of Evolutionary Biology | >>>Ashworth Laboratories, KB | tel: +44 131 650 7403 >>>University of Edinburgh | web: www.nematodes.org >>>Edinburgh | >>>EH9 3JT | >>>UK | >>> >>> >>> >>> >> >>*************************************************** >>Graduate Student, Kissinger Lab. >>Dept. 
of Genetics >>UGA, Athens GA 30602 USA >>lab phone: 706-542-6563 >>cell phone: 706-254-2444 >>Lab add: C318 Life Sciences >>**************************************************** >> >> > >-- >http://www.nematodes.org/~james > >"Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...." > --Thomas Edison > >Blaxter Nematode Genomics Group | >Institute of Evolutionary Biology | >Ashworth Laboratories, KB | tel: +44 131 650 7403 >University of Edinburgh | web: www.nematodes.org >Edinburgh | >EH9 3JT | >UK | > > *************************************************** Graduate Student, Kissinger Lab. Dept. of Genetics UGA, Athens GA 30602 USA lab phone: 706-542-6563 cell phone: 706-254-2444 Lab add: C318 Life Sciences **************************************************** -------------- next part -------------- ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 3.0 (Release date: 2002/04/02 00:11:59) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.sdsc.edu. This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.sdsc.edu. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 
28-36, AAAI Press, Menlo Park, California, 1994. ******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= /scratch/nandita/Sullivan/all14-trunc-uw ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ G6PD_TRUNC 1.0000 1500 GAPDH_TRUNC 1.0000 1500 PDIpro-trunc 1.0000 1500 Camp-kin-cat-trunc 1.0000 1450 cAMP_kinase_reg-trunc 1.0000 1500 PK4-TRUNC 1.0000 1500 Calmodulin-TRUNC 1.0000 1500 MIC2-TRUNC 1.0000 1500 MIC4-TRUNC 1.0000 1500 MIC5-TRUNC 1.0000 1500 MIC6-TRUNC 1.0000 1500 ROP9-TRUNC 1.0000 1500 GRA7-TRUNC 1.0000 1500 GRA2pre-TRUNC 1.0000 1500 RiboP-TRUNCATED 1.0000 1501 MIC11-PRO-TRUNC 1.0000 1500 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. 
command: meme /scratch/nandita/Sullivan/all14-trunc-uw -dna -minw 6 -maxw 20 -mod oops -nmotifs 20

model:  mod=          oops    nmotifs=        20    evt=           inf
object function=  E-value of product of p-values
width:  minw=            6    maxw=           20    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=       16    maxsites=       16    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=           23951    N=              16
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.218 C 0.269 G 0.246 T 0.267
Background letter frequencies (from dataset with add-one prior applied):
A 0.218 C 0.269 G 0.246 T 0.267
********************************************************************************

********************************************************************************
MOTIF 1 width = 11 sites = 16 llr = 159 E-value = 5.1e+001
********************************************************************************
--------------------------------------------------------------------------------
Motif 1 Description
--------------------------------------------------------------------------------
Simplified          A    2:::9:8:611
pos.-specific       C    1:381:1::9:
probability         G    1131:a1a::8
matrix              T    6951::::4:1

bits                2.2
                    2.0       * *
                    1.8      ** *
                    1.5      ** * *
Information         1.3   *  **** *
content             1.1   * ********
(14.4 bits)         0.9   * ********
                    0.7   * ********
                    0.4  ***********
                    0.2  ***********
                    0.0  -----------

Multilevel               TTTCAGAGACG
consensus                  C     T
sequence                   G
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name         Start   P-value            Site
-------------         ----- ---------            -----------
GRA7-TRUNC             1203  2.13e-07 CTCACCGGGT TTTCAGAGACG CGCGAGATCC
MIC2-TRUNC             1282  2.13e-07 GCGACGTCGA TTTCAGAGACG ACGCCGCATG
ROP9-TRUNC             1092  1.06e-06 TCAGCGACGC ATTCAGAGACG AGTTTGTTGC
G6PD_TRUNC              908  1.30e-06
TGTCCAAAAG TTGCAGAGTCG ACTTCCATCT
PDIpro-trunc            357  1.56e-06 CCAGCAGTAG TTCCAGAGTCG TACAGCCGTA
MIC5-TRUNC              880  3.56e-06 GTCGCGGAAG GTTCAGAGACG CCCCAGTCGT
MIC4-TRUNC              445  4.81e-06 CCGCTTTCCG TTGGAGAGACG CCGGCTAGGG
PK4-TRUNC                36  4.81e-06 GCATCGCTAG ATCCAGAGTCG CTCTTTAACT
GAPDH_TRUNC             994  1.31e-05 CACTGTGTTT TTGCAGAGACT TGTTTCCACT
cAMP_kinase_reg-trunc  1211  1.65e-05 TCGCTCTCAT TTCCAGCGTCG ATCGCGCCTG
MIC6-TRUNC             1196  2.16e-05 CGCTAGTGTG TGTCAGCGACG CGGCAGTCGA
Calmodulin-TRUNC        368  2.55e-05 GGTCCCCCGT ATGCAGAGTCA AATAGCCAAC
GRA2pre-TRUNC          1324  2.74e-05 ACAGAGACGA CGCCAGAGACG CAAAATGAAC
RiboP-TRUNCATED        1042  5.17e-05 CAGGCGGTTT CTTCAGGGTCG AATGGGAGTC
MIC11-PRO-TRUNC        1068  7.72e-05 TGTTTTCCTT TTTTCGAGACG TCTGTCGAGA
Camp-kin-cat-trunc      237  1.11e-04 GCGGCGTTTC TTTGAGAGAAA CTTAAACAGG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
GRA7-TRUNC                        2.1e-07  1202_[1]_287
MIC2-TRUNC                        2.1e-07  1281_[1]_208
ROP9-TRUNC                        1.1e-06  1091_[1]_398
G6PD_TRUNC                        1.3e-06  907_[1]_582
PDIpro-trunc                      1.6e-06  356_[1]_1133
MIC5-TRUNC                        3.6e-06  879_[1]_610
MIC4-TRUNC                        4.8e-06  444_[1]_1045
PK4-TRUNC                         4.8e-06  35_[1]_1454
GAPDH_TRUNC                       1.3e-05  993_[1]_496
cAMP_kinase_reg-trunc             1.6e-05  1210_[1]_279
MIC6-TRUNC                        2.2e-05  1195_[1]_294
Calmodulin-TRUNC                  2.6e-05  367_[1]_1122
GRA2pre-TRUNC                     2.7e-05  1323_[1]_166
RiboP-TRUNCATED                   5.2e-05  1041_[1]_449
MIC11-PRO-TRUNC                   7.7e-05  1067_[1]_422
Camp-kin-cat-trunc                0.00011  236_[1]_1203
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=11 seqs=16
GRA7-TRUNC               ( 1203)
TTTCAGAGACG  1
MIC2-TRUNC               ( 1282) TTTCAGAGACG  1
ROP9-TRUNC               ( 1092) ATTCAGAGACG  1
G6PD_TRUNC               (  908) TTGCAGAGTCG  1
PDIpro-trunc             (  357) TTCCAGAGTCG  1
MIC5-TRUNC               (  880) GTTCAGAGACG  1
MIC4-TRUNC               (  445) TTGGAGAGACG  1
PK4-TRUNC                (   36) ATCCAGAGTCG  1
GAPDH_TRUNC              (  994) TTGCAGAGACT  1
cAMP_kinase_reg-trunc    ( 1211) TTCCAGCGTCG  1
MIC6-TRUNC               ( 1196) TGTCAGCGACG  1
Calmodulin-TRUNC         (  368) ATGCAGAGTCA  1
GRA2pre-TRUNC            ( 1324) CGCCAGAGACG  1
RiboP-TRUNCATED          ( 1042) CTTCAGGGTCG  1
MIC11-PRO-TRUNC          ( 1068) TTTTCGAGACG  1
Camp-kin-cat-trunc       (  237) TTTGAGAGAAA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 5.1e+001
   -22  -110  -198   123
 -1064 -1064   -98   171
 -1064   -10     2    91
 -1064   160   -98  -209
   210  -210 -1064 -1064
 -1064 -1064   202 -1064
   189  -110  -198 -1064
 -1064 -1064   202 -1064
   152 -1064 -1064    49
  -180   180 -1064 -1064
   -80 -1064   172  -209
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 11 n= 23791 E= 5.1e+001
 0.187519  0.125090  0.062615  0.624776
 0.000136  0.000168  0.125076  0.874620
 0.000136  0.250012  0.249998  0.499854
 0.000136  0.812160  0.125076  0.062628
 0.937051  0.062629  0.000154  0.000167
 0.000136  0.000168  0.999529  0.000167
 0.812129  0.125090  0.062615  0.000167
 0.000136  0.000168  0.999529  0.000167
 0.624746  0.000168  0.000154  0.374932
 0.062597  0.937082  0.000154  0.000167
 0.125058  0.000168  0.812146  0.062628
--------------------------------------------------------------------------------

Time 41.39 secs.
******************************************************************************** ******************************************************************************** MOTIF 2 width = 8 sites = 16 llr = 143 E-value = 9.3e+001 ******************************************************************************** -------------------------------------------------------------------------------- Motif 2 Description -------------------------------------------------------------------------------- Simplified A 869a9a81 pos.-specific C :4::::1: probability G 211:1::9 matrix T ::::::1: bits 2.2 * * 2.0 * * 1.8 ** * 1.5 * **** * Information 1.3 * ****** content 1.1 * ****** (12.9 bits) 0.9 ******** 0.7 ******** 0.4 ******** 0.2 ******** 0.0 -------- Multilevel AAAAAAAG consensus C sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- -------- GRA2pre-TRUNC 251 5.84e-06 GTGGAAACGA AAAAAAAG GGTTTCAGGA MIC2-TRUNC 126 5.84e-06 GTACATGGTC AAAAAAAG CCGTAATGCA PK4-TRUNC 476 5.84e-06 AGACGTTGGG AAAAAAAG TTATCGGCGT cAMP_kinase_reg-trunc 429 5.84e-06 TATTCGTGTG AAAAAAAG TGGAAGTCCC GRA7-TRUNC 490 1.30e-05 TAGGAGATTC ACAAAAAG TTCAGGAAAA PDIpro-trunc 728 1.96e-05 GCTCGTCCAC GAAAAAAG TGTTTCGTGA MIC5-TRUNC 813 3.85e-05 GATTCAAGGA AAAAAATG CATTGAACAA G6PD_TRUNC 58 3.85e-05 AGGACCTCCG AAAAAATG TCACTCTTCG MIC11-PRO-TRUNC 660 4.66e-05 ATCACCAAAT GCAAAAAG CGAGCCTTCA GAPDH_TRUNC 783 5.32e-05 TTAGGAATGG AGAAAAAG GTAGTTTACA RiboP-TRUNCATED 1117 5.96e-05 CAGGAGTCTG ACAAAAAA GTGAAGTTTT Calmodulin-TRUNC 454 5.96e-05 CATCTACCAT ACAAAAAA ATCATGTGGG ROP9-TRUNC 425 6.76e-05 GGAACAGCAA ACAAGAAG ACGGGCAATA MIC6-TRUNC 1089 6.76e-05 AAACGCTGCA ACAAGAAG CACAGTCCAG Camp-kin-cat-trunc 315 9.02e-05 TGGCGAGTGG AAGAAAAG CTGTTTCTCT 
MIC4-TRUNC 881 2.11e-04 CACTAAAACC GAAAAACG GAAGTCAGTA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- GRA2pre-TRUNC 5.8e-06 250_[2]_1242 MIC2-TRUNC 5.8e-06 125_[2]_1367 PK4-TRUNC 5.8e-06 475_[2]_1017 cAMP_kinase_reg-trunc 5.8e-06 428_[2]_1064 GRA7-TRUNC 1.3e-05 489_[2]_1003 PDIpro-trunc 2e-05 727_[2]_765 MIC5-TRUNC 3.8e-05 812_[2]_680 G6PD_TRUNC 3.8e-05 57_[2]_1435 MIC11-PRO-TRUNC 4.7e-05 659_[2]_833 GAPDH_TRUNC 5.3e-05 782_[2]_710 RiboP-TRUNCATED 6e-05 1116_[2]_377 Calmodulin-TRUNC 6e-05 453_[2]_1039 ROP9-TRUNC 6.8e-05 424_[2]_1068 MIC6-TRUNC 6.8e-05 1088_[2]_404 Camp-kin-cat-trunc 9e-05 314_[2]_1128 MIC4-TRUNC 0.00021 880_[2]_612 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 2 width=8 seqs=16 GRA2pre-TRUNC ( 251) AAAAAAAG 1 MIC2-TRUNC ( 126) AAAAAAAG 1 PK4-TRUNC ( 476) AAAAAAAG 1 cAMP_kinase_reg-trunc ( 429) AAAAAAAG 1 GRA7-TRUNC ( 490) ACAAAAAG 1 PDIpro-trunc ( 728) GAAAAAAG 1 MIC5-TRUNC ( 813) AAAAAATG 1 G6PD_TRUNC ( 58) AAAAAATG 1 MIC11-PRO-TRUNC ( 660) GCAAAAAG 1 GAPDH_TRUNC ( 783) AGAAAAAG 1 RiboP-TRUNCATED ( 1117) ACAAAAAA 1 Calmodulin-TRUNC ( 454) ACAAAAAA 1 ROP9-TRUNC ( 425) ACAAGAAG 1 MIC6-TRUNC ( 1089) ACAAGAAG 1 Camp-kin-cat-trunc ( 315) AAGAAAAG 1 MIC4-TRUNC ( 881) GAAAAACG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 position-specific scoring matrix 
-------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 8 n= 23839 bayes= 10.5401 E= 9.3e+001 189 -1064 -39 -1064 136 48 -198 -1064 210 -1064 -198 -1064 219 -1064 -1064 -1064 200 -1064 -98 -1064 219 -1064 -1064 -1064 189 -210 -1064 -109 -80 -1064 183 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 8 n= 23839 E= 9.3e+001 0.812129 0.000168 0.187537 0.000167 0.562285 0.374934 0.062615 0.000167 0.937051 0.000168 0.062615 0.000167 0.999512 0.000168 0.000154 0.000167 0.874590 0.000168 0.125076 0.000167 0.999512 0.000168 0.000154 0.000167 0.812129 0.062629 0.000154 0.125088 0.125058 0.000168 0.874607 0.000167 -------------------------------------------------------------------------------- Time 82.95 secs. 
******************************************************************************** ******************************************************************************** MOTIF 3 width = 11 sites = 16 llr = 157 E-value = 2.2e+002 ******************************************************************************** -------------------------------------------------------------------------------- Motif 3 Description -------------------------------------------------------------------------------- Simplified A ::1:1:::::8 pos.-specific C 3::31241:a: probability G ::::4:4:a:: matrix T 8a974829::3 bits 2.2 2.0 * ** 1.8 * ** 1.5 ** ** Information 1.3 ** * **** content 1.1 **** * **** (14.1 bits) 0.9 **** * **** 0.7 **** * **** 0.4 **** ****** 0.2 *********** 0.0 ----------- Multilevel TTTTTTCTGCA consensus C CG G T sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------- GRA7-TRUNC 347 3.72e-07 CTCTGCTCTG TTTTTTCTGCA CTGCACCGCC MIC5-TRUNC 47 1.37e-06 GTGCAATTGC TTTTGTGTGCA ACCGACAGGC PDIpro-trunc 100 3.47e-06 ATATTGTCGC TTTTGTTTGCA CTCCTGTACA GAPDH_TRUNC 662 3.47e-06 CGTCTCTCTT TTTCGTGTGCA TCTCATTCTT Camp-kin-cat-trunc 723 3.77e-06 GACGCGATTT TTTTATCTGCA CCACTCGCCG RiboP-TRUNCATED 1344 8.86e-06 TCTTCTTGTT TTTTGCGTGCA GGTTAAAGTG GRA2pre-TRUNC 1404 8.86e-06 CATTAAACGA TTTCTTTTGCA ATTCGCGTCG MIC4-TRUNC 1157 1.44e-05 ACATCGACCA TTTTCTGTGCA TCTGTGCTGC MIC6-TRUNC 924 1.64e-05 ATTAAACGGG TTTCGTCTGCT TTAGATGTTT MIC2-TRUNC 1225 1.64e-05 GACCGTCCAG CTTTATCTGCA AGCAACGCCA cAMP_kinase_reg-trunc 1451 2.10e-05 GGCGTGTCCC CTTTTTCTGCT TCTTTTTGCC G6PD_TRUNC 11 2.10e-05 CCAGACCCCT CTTTTTCTGCT GCTTACACGG Calmodulin-TRUNC 418 2.27e-05 ACAAGATCGG TTATTTGTGCA CCCATTTCTA PK4-TRUNC 640 4.79e-05 AGGCCTTCCT 
CTTCGCCTGCA ACCTTCAGAG ROP9-TRUNC 1470 5.09e-05 TTCAAACACG TTTTTCGCGCA GTGCTTCCCT MIC11-PRO-TRUNC 759 1.06e-04 CAATAGCGGG TTTCTTTCGCT AGTTCGCGAC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- GRA7-TRUNC 3.7e-07 346_[3]_1143 MIC5-TRUNC 1.4e-06 46_[3]_1443 PDIpro-trunc 3.5e-06 99_[3]_1390 GAPDH_TRUNC 3.5e-06 661_[3]_828 Camp-kin-cat-trunc 3.8e-06 722_[3]_717 RiboP-TRUNCATED 8.9e-06 1343_[3]_147 GRA2pre-TRUNC 8.9e-06 1403_[3]_86 MIC4-TRUNC 1.4e-05 1156_[3]_333 MIC6-TRUNC 1.6e-05 923_[3]_566 MIC2-TRUNC 1.6e-05 1224_[3]_265 cAMP_kinase_reg-trunc 2.1e-05 1450_[3]_39 G6PD_TRUNC 2.1e-05 10_[3]_1479 Calmodulin-TRUNC 2.3e-05 417_[3]_1072 PK4-TRUNC 4.8e-05 639_[3]_850 ROP9-TRUNC 5.1e-05 1469_[3]_20 MIC11-PRO-TRUNC 0.00011 758_[3]_731 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 3 width=11 seqs=16 GRA7-TRUNC ( 347) TTTTTTCTGCA 1 MIC5-TRUNC ( 47) TTTTGTGTGCA 1 PDIpro-trunc ( 100) TTTTGTTTGCA 1 GAPDH_TRUNC ( 662) TTTCGTGTGCA 1 Camp-kin-cat-trunc ( 723) TTTTATCTGCA 1 RiboP-TRUNCATED ( 1344) TTTTGCGTGCA 1 GRA2pre-TRUNC ( 1404) TTTCTTTTGCA 1 MIC4-TRUNC ( 1157) TTTTCTGTGCA 1 MIC6-TRUNC ( 924) TTTCGTCTGCT 1 MIC2-TRUNC ( 1225) CTTTATCTGCA 1 cAMP_kinase_reg-trunc ( 1451) CTTTTTCTGCT 1 G6PD_TRUNC ( 11) CTTTTTCTGCT 1 Calmodulin-TRUNC ( 418) TTATTTGTGCA 1 PK4-TRUNC ( 640) CTTCGCCTGCA 1 ROP9-TRUNC ( 1470) TTTTTCGCGCA 1 MIC11-PRO-TRUNC ( 759) TTTCTTTCGCT 1 // -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- Motif 3 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 2.2e+002 -1064 -10 -1064 149 -1064 -1064 -1064 191 -180 -1064 -1064 181 -1064 22 -1064 137 -80 -210 61 71 -1064 -52 -1064 161 -1064 70 61 -51 -1064 -110 -1064 171 -1064 -1064 202 -1064 -1064 190 -1064 -1064 178 -1064 -1064 -9 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 n= 23791 E= 2.2e+002 0.000136 0.250012 0.000154 0.749698 0.000136 0.000168 0.000154 0.999542 0.062597 0.000168 0.000154 0.937081 0.000136 0.312473 0.000154 0.687237 0.125058 0.062629 0.374920 0.437393 0.000136 0.187551 0.000154 0.812159 0.000136 0.437395 0.374920 0.187549 0.000136 0.125090 0.000154 0.874620 0.000136 0.000168 0.999529 0.000167 0.000136 0.999543 0.000154 0.000167 0.749668 0.000168 0.000154 0.250010 -------------------------------------------------------------------------------- Time 123.42 secs. 
******************************************************************************** ******************************************************************************** MOTIF 4 width = 11 sites = 16 llr = 158 E-value = 7.7e+001 ******************************************************************************** -------------------------------------------------------------------------------- Motif 4 Description -------------------------------------------------------------------------------- Simplified A ::::::1::2: pos.-specific C :::::634::8 probability G 1a::::1:511 matrix T 9:aaa456582 bits 2.2 2.0 **** 1.8 **** 1.5 ***** Information 1.3 ***** content 1.1 ***** (14.3 bits) 0.9 ****** **** 0.7 ****** **** 0.4 ****** **** 0.2 *********** 0.0 ----------- Multilevel TGTTTCTTGTC consensus TCCT sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------- Calmodulin-TRUNC 1392 4.20e-07 TCACATTCGG TGTTTCTTGTC CGGGGGTGTT MIC6-TRUNC 1443 1.74e-06 ATATATGATG TGTTTTTTTTC GATTGGATGA GRA2pre-TRUNC 1436 3.05e-06 TATCGCACGT TGTTTCTCGTC CCACGAATAG GAPDH_TRUNC 319 3.92e-06 TTGCCCTCTG TGTTTTCTGTC GCTGAGTGCA cAMP_kinase_reg-trunc 1310 4.80e-06 AGCTCGCGTT TGTTTTCTTTC TGACAGTCCT G6PD_TRUNC 1275 6.14e-06 TTTCGTTTCC TGTTTCCCTTC GATGACCGCT MIC2-TRUNC 921 1.07e-05 ACGTCACAGT TGTTTTTTTAC GGGAAAATTC PK4-TRUNC 1086 1.34e-05 CGTACGTCTC TGTTTCTCGAC CTTTTTCGAT ROP9-TRUNC 1280 1.68e-05 GCCGGACTCT TGTTTCTCGTT AGCGCGTAGG MIC4-TRUNC 1462 1.98e-05 GTTGTGGATG TGTTTTCTTTT GTGACCGCTC Camp-kin-cat-trunc 133 1.98e-05 TGGCAGTTCC TGTTTCGTTTC AACGTCACTT PDIpro-trunc 913 3.00e-05 TAAGCGAGCT TGTTTTATTAC CTGGCTTTCG MIC11-PRO-TRUNC 1172 3.39e-05 CTTTTCGTAC GGTTTTTTGTC TAGCATAAAC RiboP-TRUNCATED 658 3.76e-05 GACAGCGCGT 
TGTTTCCTTGC GGCTGCGCCT MIC5-TRUNC 1410 3.76e-05 ATGTAGTTTC TGTTTCTCGTG AGACCGTGAA GRA7-TRUNC 1273 4.06e-05 CTTTGGAACG TGTTTCACGTT TGAGTTGCAC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- Calmodulin-TRUNC 4.2e-07 1391_[4]_98 MIC6-TRUNC 1.7e-06 1442_[4]_47 GRA2pre-TRUNC 3e-06 1435_[4]_54 GAPDH_TRUNC 3.9e-06 318_[4]_1171 cAMP_kinase_reg-trunc 4.8e-06 1309_[4]_180 G6PD_TRUNC 6.1e-06 1274_[4]_215 MIC2-TRUNC 1.1e-05 920_[4]_569 PK4-TRUNC 1.3e-05 1085_[4]_404 ROP9-TRUNC 1.7e-05 1279_[4]_210 MIC4-TRUNC 2e-05 1461_[4]_28 Camp-kin-cat-trunc 2e-05 132_[4]_1307 PDIpro-trunc 3e-05 912_[4]_577 MIC11-PRO-TRUNC 3.4e-05 1171_[4]_318 RiboP-TRUNCATED 3.8e-05 657_[4]_833 MIC5-TRUNC 3.8e-05 1409_[4]_80 GRA7-TRUNC 4.1e-05 1272_[4]_217 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 4 width=11 seqs=16 Calmodulin-TRUNC ( 1392) TGTTTCTTGTC 1 MIC6-TRUNC ( 1443) TGTTTTTTTTC 1 GRA2pre-TRUNC ( 1436) TGTTTCTCGTC 1 GAPDH_TRUNC ( 319) TGTTTTCTGTC 1 cAMP_kinase_reg-trunc ( 1310) TGTTTTCTTTC 1 G6PD_TRUNC ( 1275) TGTTTCCCTTC 1 MIC2-TRUNC ( 921) TGTTTTTTTAC 1 PK4-TRUNC ( 1086) TGTTTCTCGAC 1 ROP9-TRUNC ( 1280) TGTTTCTCGTT 1 MIC4-TRUNC ( 1462) TGTTTTCTTTT 1 Camp-kin-cat-trunc ( 133) TGTTTCGTTTC 1 PDIpro-trunc ( 913) TGTTTTATTAC 1 MIC11-PRO-TRUNC ( 1172) GGTTTTTTGTC 1 RiboP-TRUNCATED ( 658) TGTTTCCTTGC 1 MIC5-TRUNC ( 1410) TGTTTCTCGTG 1 GRA7-TRUNC ( 1273) TGTTTCACGTT 1 // -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- Motif 4 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 7.7e+001 -1064 -1064 -198 181 -1064 -1064 202 -1064 -1064 -1064 -1064 191 -1064 -1064 -1064 191 -1064 -1064 -1064 191 -1064 107 -1064 71 -80 22 -198 91 -1064 48 -1064 123 -1064 -1064 102 91 -22 -1064 -198 149 -1064 148 -198 -51 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 n= 23791 E= 7.7e+001 0.000136 0.000168 0.062615 0.937081 0.000136 0.000168 0.999529 0.000167 0.000136 0.000168 0.000154 0.999542 0.000136 0.000168 0.000154 0.999542 0.000136 0.000168 0.000154 0.999542 0.000136 0.562316 0.000154 0.437393 0.125058 0.312473 0.062615 0.499854 0.000136 0.374934 0.000154 0.624776 0.000136 0.000168 0.499842 0.499854 0.187519 0.000168 0.062615 0.749698 0.000136 0.749699 0.062615 0.187549 -------------------------------------------------------------------------------- Time 163.67 secs. 
******************************************************************************** ******************************************************************************** MOTIF 5 width = 8 sites = 16 llr = 142 E-value = 4.8e+002 ******************************************************************************** -------------------------------------------------------------------------------- Motif 5 Description -------------------------------------------------------------------------------- Simplified A 3:8:aaa3 pos.-specific C :9:1:::7 probability G 81:9:::: matrix T :12::::1 bits 2.2 *** 2.0 *** 1.8 *** 1.5 ***** Information 1.3 ******* content 1.1 ******* (12.8 bits) 0.9 ******** 0.7 ******** 0.4 ******** 0.2 ******** 0.0 -------- Multilevel GCAGAAAC consensus A A sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- -------- RiboP-TRUNCATED 1 9.96e-06 . 
GCAGAAAC TGCGGGGAAT GRA7-TRUNC 949 9.96e-06 GTTCGCGCCT GCAGAAAC CCACGAGCCC ROP9-TRUNC 449 9.96e-06 AATATATCTG GCAGAAAC ATCTAAGCAC MIC5-TRUNC 1112 9.96e-06 CTGGAGAGAC GCAGAAAC TTAGATCCAG MIC2-TRUNC 613 9.96e-06 CCTTTGTAAC GCAGAAAC TGAAAATAAC Camp-kin-cat-trunc 1397 1.81e-05 TTGTTCGACG GCAGAAAA GGCTTCTTGT G6PD_TRUNC 376 2.69e-05 TCGCCGACTT ACAGAAAC CATTGACGAC GRA2pre-TRUNC 315 3.91e-05 GACAGATCCC GCTGAAAC CTCTAAACAC cAMP_kinase_reg-trunc 123 3.91e-05 AGATGCTTCC GCTGAAAC CTACGAATCT PDIpro-trunc 453 4.62e-05 ACCGCAATCC ACAGAAAA CCACACAACA MIC11-PRO-TRUNC 1140 5.71e-05 GGGCAACGTG GCACAAAC GCAGTCGTTA MIC4-TRUNC 993 6.70e-05 ATCCTGACTA GCAGAAAT TCGTTCACCC PK4-TRUNC 1469 8.60e-05 GTTGTTTGTT GGAGAAAC AGCTTTTCTA GAPDH_TRUNC 1018 8.60e-05 TTCCACTGTC GCTGAAAA GGGAGAGTAT Calmodulin-TRUNC 1097 1.25e-04 TTGTTGTAAC ACACAAAC CGCTAGTGCT MIC6-TRUNC 549 2.56e-04 ATTGTTTTCC ATAGAAAA CACTACTGGA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- RiboP-TRUNCATED 1e-05 [5]_1493 GRA7-TRUNC 1e-05 948_[5]_544 ROP9-TRUNC 1e-05 448_[5]_1044 MIC5-TRUNC 1e-05 1111_[5]_381 MIC2-TRUNC 1e-05 612_[5]_880 Camp-kin-cat-trunc 1.8e-05 1396_[5]_46 G6PD_TRUNC 2.7e-05 375_[5]_1117 GRA2pre-TRUNC 3.9e-05 314_[5]_1178 cAMP_kinase_reg-trunc 3.9e-05 122_[5]_1370 PDIpro-trunc 4.6e-05 452_[5]_1040 MIC11-PRO-TRUNC 5.7e-05 1139_[5]_353 MIC4-TRUNC 6.7e-05 992_[5]_500 PK4-TRUNC 8.6e-05 1468_[5]_24 GAPDH_TRUNC 8.6e-05 1017_[5]_475 Calmodulin-TRUNC 0.00013 1096_[5]_396 MIC6-TRUNC 0.00026 548_[5]_944 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 in BLOCKS format 
-------------------------------------------------------------------------------- BL MOTIF 5 width=8 seqs=16 RiboP-TRUNCATED ( 1) GCAGAAAC 1 GRA7-TRUNC ( 949) GCAGAAAC 1 ROP9-TRUNC ( 449) GCAGAAAC 1 MIC5-TRUNC ( 1112) GCAGAAAC 1 MIC2-TRUNC ( 613) GCAGAAAC 1 Camp-kin-cat-trunc ( 1397) GCAGAAAA 1 G6PD_TRUNC ( 376) ACAGAAAC 1 GRA2pre-TRUNC ( 315) GCTGAAAC 1 cAMP_kinase_reg-trunc ( 123) GCTGAAAC 1 PDIpro-trunc ( 453) ACAGAAAA 1 MIC11-PRO-TRUNC ( 1140) GCACAAAC 1 MIC4-TRUNC ( 993) GCAGAAAT 1 PK4-TRUNC ( 1469) GGAGAAAC 1 GAPDH_TRUNC ( 1018) GCTGAAAA 1 Calmodulin-TRUNC ( 1097) ACACAAAC 1 MIC6-TRUNC ( 549) ATAGAAAA 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 8 n= 23839 bayes= 10.5401 E= 4.8e+002 19 -1064 161 -1064 -1064 170 -198 -209 189 -1064 -1064 -51 -1064 -110 183 -1064 219 -1064 -1064 -1064 219 -1064 -1064 -1064 219 -1064 -1064 -1064 19 136 -1064 -209 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 8 n= 23839 E= 4.8e+002 0.249980 0.000168 0.749685 0.000167 0.000136 0.874621 0.062615 0.062628 0.812129 0.000168 0.000154 0.187549 0.000136 0.125090 0.874607 0.000167 0.999512 0.000168 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.249980 0.687238 0.000154 0.062628 -------------------------------------------------------------------------------- Time 204.07 secs. 
******************************************************************************** ******************************************************************************** MOTIF 6 width = 11 sites = 16 llr = 152 E-value = 8.1e+004 ******************************************************************************** -------------------------------------------------------------------------------- Motif 6 Description -------------------------------------------------------------------------------- Simplified A ::::::::::1 pos.-specific C 33:8::a2a:2 probability G 65a:81:1:17 matrix T 12:239:7:91 bits 2.2 2.0 * * * 1.8 * * * 1.5 * * ** Information 1.3 * ** ** content 1.1 ***** ** (13.7 bits) 0.9 ***** ** 0.7 * ********* 0.4 *********** 0.2 *********** 0.0 ----------- Multilevel GGGCGTCTCTG consensus CC T sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------- MIC11-PRO-TRUNC 1307 3.33e-07 CAGCGCACAA GGGCGTCTCTG GTTGCCAGAC MIC6-TRUNC 1488 6.97e-07 GCTTCCTGCC GCGCGTCTCTG CT GAPDH_TRUNC 220 1.78e-06 CGCGACAACT GGGCTTCTCTG TAAGCCAAGT MIC5-TRUNC 1042 1.01e-05 ACGCCGATCC GGGCGTCTCTT TGTTTTCCTT RiboP-TRUNCATED 1408 1.12e-05 AGCTTCCCCT CGGTGTCTCTG CGTTTTCCTG PDIpro-trunc 1128 1.19e-05 AGTGTCGAGC GGGCGTCTCGG TGTCGCCGTC GRA7-TRUNC 1129 1.48e-05 ACGAGGAGAC GCGCGTCTCTA GAGAGACCCG MIC4-TRUNC 633 1.77e-05 TCCATTGACG GTGCGGCTCTG CAGAATATGT MIC2-TRUNC 1414 2.10e-05 GCCCGCCCTT TGGCGTCTCTC ATTTTGGGTG Calmodulin-TRUNC 597 2.46e-05 GAGCAATAAA GCGTGTCCCTG TTTCTCCGTT PK4-TRUNC 546 3.46e-05 CCGACGCGCA CGGCTTCTCTC CGCAATCGCT cAMP_kinase_reg-trunc 292 3.46e-05 GAATTTCCAT GCGCTGCTCTG CGAGGTGCCT GRA2pre-TRUNC 1353 3.80e-05 ACAGCGGAAC CTGCGTCGCTG TCTGTCCTGC Camp-kin-cat-trunc 472 3.97e-05 GACCGCGAGA 
CGGTGTCCCTG AATAGTTCGC G6PD_TRUNC 538 4.33e-05 CCCACATGTG TCGCGTCGCTG TTGCTTGACA ROP9-TRUNC 1481 9.32e-05 TTTTCGCGCA GTGCTTCCCTC GTTTCTCGG -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 3.3e-07 1306_[6]_183 MIC6-TRUNC 7e-07 1487_[6]_2 GAPDH_TRUNC 1.8e-06 219_[6]_1270 MIC5-TRUNC 1e-05 1041_[6]_448 RiboP-TRUNCATED 1.1e-05 1407_[6]_83 PDIpro-trunc 1.2e-05 1127_[6]_362 GRA7-TRUNC 1.5e-05 1128_[6]_361 MIC4-TRUNC 1.8e-05 632_[6]_857 MIC2-TRUNC 2.1e-05 1413_[6]_76 Calmodulin-TRUNC 2.5e-05 596_[6]_893 PK4-TRUNC 3.5e-05 545_[6]_944 cAMP_kinase_reg-trunc 3.5e-05 291_[6]_1198 GRA2pre-TRUNC 3.8e-05 1352_[6]_137 Camp-kin-cat-trunc 4e-05 471_[6]_968 G6PD_TRUNC 4.3e-05 537_[6]_952 ROP9-TRUNC 9.3e-05 1480_[6]_9 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 6 width=11 seqs=16 MIC11-PRO-TRUNC ( 1307) GGGCGTCTCTG 1 MIC6-TRUNC ( 1488) GCGCGTCTCTG 1 GAPDH_TRUNC ( 220) GGGCTTCTCTG 1 MIC5-TRUNC ( 1042) GGGCGTCTCTT 1 RiboP-TRUNCATED ( 1408) CGGTGTCTCTG 1 PDIpro-trunc ( 1128) GGGCGTCTCGG 1 GRA7-TRUNC ( 1129) GCGCGTCTCTA 1 MIC4-TRUNC ( 633) GTGCGGCTCTG 1 MIC2-TRUNC ( 1414) TGGCGTCTCTC 1 Calmodulin-TRUNC ( 597) GCGTGTCCCTG 1 PK4-TRUNC ( 546) CGGCTTCTCTC 1 cAMP_kinase_reg-trunc ( 292) GCGCTGCTCTG 1 GRA2pre-TRUNC ( 1353) CTGCGTCGCTG 1 Camp-kin-cat-trunc ( 472) CGGTGTCCCTG 1 G6PD_TRUNC ( 538) TCGCGTCGCTG 1 ROP9-TRUNC ( 1481) GTGCTTCCCTC 1 // -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- Motif 6 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 8.1e+004 -1064 -10 134 -109 -1064 22 102 -51 -1064 -1064 202 -1064 -1064 160 -1064 -51 -1064 -1064 161 -9 -1064 -1064 -98 171 -1064 190 -1064 -1064 -1064 -52 -98 137 -1064 190 -1064 -1064 -1064 -1064 -198 181 -180 -52 148 -209 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 n= 23791 E= 8.1e+004 0.000136 0.250012 0.624763 0.125088 0.000136 0.312473 0.499842 0.187549 0.000136 0.000168 0.999529 0.000167 0.000136 0.812160 0.000154 0.187549 0.000136 0.000168 0.749685 0.250010 0.000136 0.000168 0.125076 0.874620 0.000136 0.999543 0.000154 0.000167 0.000136 0.187551 0.125076 0.687237 0.000136 0.999543 0.000154 0.000167 0.000136 0.000168 0.062615 0.937081 0.062597 0.187551 0.687224 0.062628 -------------------------------------------------------------------------------- Time 244.55 secs. 
******************************************************************************** ******************************************************************************** MOTIF 7 width = 11 sites = 16 llr = 157 E-value = 3.5e+002 ******************************************************************************** -------------------------------------------------------------------------------- Motif 7 Description -------------------------------------------------------------------------------- Simplified A 6889:49:::: pos.-specific C 1::1a118::4 probability G 332::3:3:36 matrix T :::::31:a7: bits 2.2 2.0 * * 1.8 ** * 1.5 *** * * Information 1.3 **** * * content 1.1 **** **** (14.2 bits) 0.9 ***** ***** 0.7 ***** ***** 0.4 ***** ***** 0.2 *********** 0.0 ----------- Multilevel AAAACAACTTG consensus GG G G GC sequence T -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------- MIC6-TRUNC 485 7.36e-07 CAGCCGATCC AAAACTACTTG CCCACTTCCG PDIpro-trunc 75 1.18e-06 CACAGGAGGC AAAACGACTTC CACAATATTG GRA2pre-TRUNC 1075 2.11e-06 AGGCTTGCGA AAAACAAGTTC GTCGCAAAAG G6PD_TRUNC 502 3.66e-06 GATCTGTGTA GAAACAACTGG TGAATAGCTG PK4-TRUNC 116 5.16e-06 CGTTGGTGAA AAAACGAGTTC CACCCGGCGG ROP9-TRUNC 781 7.99e-06 CCTGCTCGAA GAAACGACTGG ATTTTACTTC MIC11-PRO-TRUNC 429 1.04e-05 TACACATTAC AGAACTACTGG AATGCTCCAG MIC2-TRUNC 318 1.14e-05 GACATGCAGA AAGACAACTGC TGAAGGAATC GRA7-TRUNC 1393 1.44e-05 TTGCAGCGGC AAAACATCTTG TGTAAAATTC MIC5-TRUNC 156 1.44e-05 CGGAAAATAT GGAACTACTTG GAACAAAATG Camp-kin-cat-trunc 651 1.56e-05 GCGCTCCAGA AAGACAAGTTC AACTGCTGTA GAPDH_TRUNC 1043 1.56e-05 TATGCAAAAT AGAACTACTGC GCTTATCGGA Calmodulin-TRUNC 890 2.18e-05 GCTGCGATGG CAGACAACTTG GTTTTGATCC MIC4-TRUNC 471 2.62e-05 TAGGGATTGT 
CGAACGACTTG CGAAGCGACC cAMP_kinase_reg-trunc 731 6.40e-05 TTCCTCGGCC GAAACACGTTG AGGCGGCTTG RiboP-TRUNCATED 625 9.30e-05 TAATCGCTGG AAACCCACTTC TGAAGCTGCA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC6-TRUNC 7.4e-07 484_[7]_1005 PDIpro-trunc 1.2e-06 74_[7]_1415 GRA2pre-TRUNC 2.1e-06 1074_[7]_415 G6PD_TRUNC 3.7e-06 501_[7]_988 PK4-TRUNC 5.2e-06 115_[7]_1374 ROP9-TRUNC 8e-06 780_[7]_709 MIC11-PRO-TRUNC 1e-05 428_[7]_1061 MIC2-TRUNC 1.1e-05 317_[7]_1172 GRA7-TRUNC 1.4e-05 1392_[7]_97 MIC5-TRUNC 1.4e-05 155_[7]_1334 Camp-kin-cat-trunc 1.6e-05 650_[7]_789 GAPDH_TRUNC 1.6e-05 1042_[7]_447 Calmodulin-TRUNC 2.2e-05 889_[7]_600 MIC4-TRUNC 2.6e-05 470_[7]_1019 cAMP_kinase_reg-trunc 6.4e-05 730_[7]_759 RiboP-TRUNCATED 9.3e-05 624_[7]_866 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 7 width=11 seqs=16 MIC6-TRUNC ( 485) AAAACTACTTG 1 PDIpro-trunc ( 75) AAAACGACTTC 1 GRA2pre-TRUNC ( 1075) AAAACAAGTTC 1 G6PD_TRUNC ( 502) GAAACAACTGG 1 PK4-TRUNC ( 116) AAAACGAGTTC 1 ROP9-TRUNC ( 781) GAAACGACTGG 1 MIC11-PRO-TRUNC ( 429) AGAACTACTGG 1 MIC2-TRUNC ( 318) AAGACAACTGC 1 GRA7-TRUNC ( 1393) AAAACATCTTG 1 MIC5-TRUNC ( 156) GGAACTACTTG 1 Camp-kin-cat-trunc ( 651) AAGACAAGTTC 1 GAPDH_TRUNC ( 1043) AGAACTACTGC 1 Calmodulin-TRUNC ( 890) CAGACAACTTG 1 MIC4-TRUNC ( 471) CGAACGACTTG 1 cAMP_kinase_reg-trunc ( 731) GAAACACGTTG 1 RiboP-TRUNCATED ( 625) AAACCCACTTC 1 // -------------------------------------------------------------------------------- 
--------------------------------------------------------------------------------
Motif 7 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 3.5e+002
   152   -110      2  -1064
   178  -1064      2  -1064
   189  -1064    -39  -1064
   210   -210  -1064  -1064
 -1064    190  -1064  -1064
   100   -210      2     -9
   200   -210  -1064   -209
 -1064    148      2  -1064
 -1064  -1064  -1064    191
 -1064  -1064     34    137
 -1064     70    119  -1064
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 7 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 11 n= 23791 E= 3.5e+002
 0.624746  0.125090  0.249998  0.000167
 0.749668  0.000168  0.249998  0.000167
 0.812129  0.000168  0.187537  0.000167
 0.937051  0.062629  0.000154  0.000167
 0.000136  0.999543  0.000154  0.000167
 0.437363  0.062629  0.249998  0.250010
 0.874590  0.062629  0.000154  0.062628
 0.000136  0.749699  0.249998  0.000167
 0.000136  0.000168  0.000154  0.999542
 0.000136  0.000168  0.312459  0.687237
 0.000136  0.437395  0.562302  0.000167
--------------------------------------------------------------------------------

Time 284.02 secs.

******************************************************************************** ******************************************************************************** MOTIF 8 width = 19 sites = 16 llr = 191 E-value = 3.7e+001 ******************************************************************************** -------------------------------------------------------------------------------- Motif 8 Description -------------------------------------------------------------------------------- Simplified A 39:51:::1::32312:33 pos.-specific C 718::11a:13432:3117 probability G 1:21934:918:537184: matrix T :::4:65::9:3:324211 bits 2.2 2.0 * 1.8 * * * 1.5 * * ** Information 1.3 ** * *** content 1.1 ** * **** (17.2 bits) 0.9 *** * **** * * * 0.7 ****** **** * * * 0.4 ************* * * * 0.2 ************* * *** 0.0 ------------------- Multilevel CACAGTTCGTGCGTGTGGC consensus A T GG CACA C AA sequence T G -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------------------- Calmodulin-TRUNC 1279 1.16e-09 GCGGTATACC CACTGTGCGTGCGTGCGGC TCTTTTCTGA MIC5-TRUNC 573 1.03e-08 GCTATAATCA CACAGTTCGTCCCTGTGAC GCATTTGCTA GAPDH_TRUNC 821 4.22e-08 AGTCCCCATG CACAGTGCGTGAGTACGGA CTCTAGCGTC MIC2-TRUNC 157 1.97e-07 TTACACCAGG AACGGGTCGTGTGGGAGGC CGTCAGCGCT RiboP-TRUNCATED 712 3.46e-07 AGGAGACAGA CACAGCTCGTCCGGGAGAA GAATCTACCT PK4-TRUNC 503 5.23e-07 TAGTTCTCCA CACAGGCCGTGCGGGGGTC CTCAAGCGTC G6PD_TRUNC 610 7.05e-07 CCGACCGTGC CAGAGGGCGTCAGAGTGTC GCAACGCGAC MIC4-TRUNC 329 1.13e-06 ACGAGAAGCT CAGTGTTCGTGTACATGAC CAATGCCGAC ROP9-TRUNC 81 1.47e-06 CCTGCAGACG AACTGTTCGTGACTTGTGC TGCTTATGTG GRA7-TRUNC 522 2.64e-06 GTTTCCGAAC AACGGTTCGTCCCCGCTGC ACTGGCCGGC GRA2pre-TRUNC 815 2.85e-06 CGAGTACGAC CACTGTGCGTGTACTTCGC 
CAAAAGGAAA Camp-kin-cat-trunc 213 4.50e-06 CTGAACGGGC AACAGTCCACGAGAGCGGC GTTTCTTTGA MIC11-PRO-TRUNC 1087 4.84e-06 CGTCTGTCGA GAGAGTTCGTGTCAGTTAA ACTTGAAGCT cAMP_kinase_reg-trunc 604 5.98e-06 GGCTCCTTCC CACAAGGCGTGAAGGAGCA GAGAAACACC PDIpro-trunc 168 1.15e-05 CTTCTCGGCA CACTGCTCGGGCCTGCGAT CTCACGTGCC MIC6-TRUNC 17 1.46e-05 GACCGGGGTG CCCTGTGCATGTGATTGCC GTTTCCAGTT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- Calmodulin-TRUNC 1.2e-09 1278_[8]_203 MIC5-TRUNC 1e-08 572_[8]_909 GAPDH_TRUNC 4.2e-08 820_[8]_661 MIC2-TRUNC 2e-07 156_[8]_1325 RiboP-TRUNCATED 3.5e-07 711_[8]_771 PK4-TRUNC 5.2e-07 502_[8]_979 G6PD_TRUNC 7e-07 609_[8]_872 MIC4-TRUNC 1.1e-06 328_[8]_1153 ROP9-TRUNC 1.5e-06 80_[8]_1401 GRA7-TRUNC 2.6e-06 521_[8]_960 GRA2pre-TRUNC 2.9e-06 814_[8]_667 Camp-kin-cat-trunc 4.5e-06 212_[8]_1219 MIC11-PRO-TRUNC 4.8e-06 1086_[8]_395 cAMP_kinase_reg-trunc 6e-06 603_[8]_878 PDIpro-trunc 1.1e-05 167_[8]_1314 MIC6-TRUNC 1.5e-05 16_[8]_1465 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 8 width=19 seqs=16 Calmodulin-TRUNC ( 1279) CACTGTGCGTGCGTGCGGC 1 MIC5-TRUNC ( 573) CACAGTTCGTCCCTGTGAC 1 GAPDH_TRUNC ( 821) CACAGTGCGTGAGTACGGA 1 MIC2-TRUNC ( 157) AACGGGTCGTGTGGGAGGC 1 RiboP-TRUNCATED ( 712) CACAGCTCGTCCGGGAGAA 1 PK4-TRUNC ( 503) CACAGGCCGTGCGGGGGTC 1 G6PD_TRUNC ( 610) CAGAGGGCGTCAGAGTGTC 1 MIC4-TRUNC ( 329) CAGTGTTCGTGTACATGAC 1 ROP9-TRUNC ( 81) AACTGTTCGTGACTTGTGC 1 GRA7-TRUNC ( 522) AACGGTTCGTCCCCGCTGC 1 GRA2pre-TRUNC ( 815) 
CACTGTGCGTGTACTTCGC 1 Camp-kin-cat-trunc ( 213) AACAGTCCACGAGAGCGGC 1 MIC11-PRO-TRUNC ( 1087) GAGAGTTCGTGTCAGTTAA 1 cAMP_kinase_reg-trunc ( 604) CACAAGGCGTGAAGGAGCA 1 PDIpro-trunc ( 168) CACTGCTCGGGCCTGCGAT 1 MIC6-TRUNC ( 17) CCCTGTGCATGTGATTGCC 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 19 n= 23663 bayes= 10.5294 E= 3.7e+001 19 136 -198 -1064 210 -210 -1064 -1064 -1064 160 -39 -1064 119 -1064 -98 49 -180 -1064 193 -1064 -1064 -110 2 123 -1064 -110 61 91 -1064 190 -1064 -1064 -80 -1064 183 -1064 -1064 -210 -198 171 -1064 -10 161 -1064 52 48 -1064 23 -22 22 102 -1064 19 -52 2 23 -80 -1064 148 -51 -22 22 -98 49 -1064 -210 161 -51 52 -110 83 -109 19 136 -1064 -209 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 19 n= 23663 E= 3.7e+001 0.249980 0.687238 0.062615 0.000167 0.937051 0.062629 0.000154 0.000167 0.000136 0.812160 0.187537 0.000167 0.499824 0.000168 0.125076 0.374932 0.062597 0.000168 0.937068 0.000167 0.000136 0.125090 0.249998 0.624776 0.000136 0.125090 0.374920 0.499854 0.000136 0.999543 0.000154 0.000167 0.125058 0.000168 0.874607 0.000167 0.000136 0.062629 0.062615 0.874620 0.000136 0.250012 0.749685 0.000167 0.312441 0.374934 0.000154 0.312471 0.187519 0.312473 0.499842 0.000167 0.249980 0.187551 0.249998 0.312471 0.125058 0.000168 0.687224 0.187549 0.187519 0.312473 0.125076 0.374932 0.000136 0.062629 0.749685 0.187549 0.312441 0.125090 0.437381 0.125088 0.249980 0.687238 0.000154 0.062628 
-------------------------------------------------------------------------------- Time 323.49 secs. ******************************************************************************** ******************************************************************************** MOTIF 9 width = 11 sites = 16 llr = 158 E-value = 1.2e+002 ******************************************************************************** -------------------------------------------------------------------------------- Motif 9 Description -------------------------------------------------------------------------------- Simplified A 58136a8a:3: pos.-specific C 5:7:3:1:a35 probability G :238::1::45 matrix T ::::1::::1: bits 2.2 * * 2.0 * ** 1.8 * ** 1.5 * * ** Information 1.3 * * **** content 1.1 ** * **** (14.3 bits) 0.9 ********* * 0.7 ********* * 0.4 ********* * 0.2 *********** 0.0 ----------- Multilevel AACGAAAACGC consensus C GAC AG sequence C -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------- MIC6-TRUNC 883 5.46e-07 AGTATTCCGC CACGAAAACGC GCACCGCAAG GRA2pre-TRUNC 1117 2.25e-06 ACGCACCACG AAGGAAAACGC GTATCACGTC ROP9-TRUNC 1125 2.25e-06 CGTAGCAGGC AACGCAAACGC CAGGCATTGT MIC5-TRUNC 681 3.14e-06 CTGTAGCAAG CACGCAAACGC AGCCCCGTTT GRA7-TRUNC 500 4.49e-06 ACAAAAAGTT CAGGAAAACAG TGTTTCCGAA PK4-TRUNC 432 5.44e-06 GTCCACTTCC AGCGAAAACGG AGTTCACCCT MIC2-TRUNC 878 6.97e-06 CATCGCACCC CACAAAAACCG TTGCCAAGAA RiboP-TRUNCATED 962 7.67e-06 GAAGAAGCGA AAGAAAAACGG AGTAGAGAGG MIC11-PRO-TRUNC 1026 1.25e-05 CACAAACTAA AACACAAACAC TTCCATTTAT cAMP_kinase_reg-trunc 266 1.69e-05 ACAACACTCC AAAGAAAACAC AACTCGAATT GAPDH_TRUNC 1276 1.69e-05 GGACAGGGTG CACACAAACCG CACACGCGTG MIC4-TRUNC 1482 2.34e-05 TGTGACCGCT CACGAACACCC CACGCAAA 
Camp-kin-cat-trunc 1146 3.21e-05 CAGCCACATC AACGAAGACCC CTTTTCGGTC Calmodulin-TRUNC 1199 4.17e-05 CTACGCTGCT CGCGAAAACTG TGGTGGTTTT G6PD_TRUNC 935 4.44e-05 ATCTCTGGAT AAGGTAAACAG CAGCACTGTG PDIpro-trunc 1174 8.13e-05 CCACACCATT CGCGCACACGG CACACAACAG -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC6-TRUNC 5.5e-07 882_[9]_607 GRA2pre-TRUNC 2.3e-06 1116_[9]_373 ROP9-TRUNC 2.3e-06 1124_[9]_365 MIC5-TRUNC 3.1e-06 680_[9]_809 GRA7-TRUNC 4.5e-06 499_[9]_990 PK4-TRUNC 5.4e-06 431_[9]_1058 MIC2-TRUNC 7e-06 877_[9]_612 RiboP-TRUNCATED 7.7e-06 961_[9]_529 MIC11-PRO-TRUNC 1.2e-05 1025_[9]_464 cAMP_kinase_reg-trunc 1.7e-05 265_[9]_1224 GAPDH_TRUNC 1.7e-05 1275_[9]_214 MIC4-TRUNC 2.3e-05 1481_[9]_8 Camp-kin-cat-trunc 3.2e-05 1145_[9]_294 Calmodulin-TRUNC 4.2e-05 1198_[9]_291 G6PD_TRUNC 4.4e-05 934_[9]_555 PDIpro-trunc 8.1e-05 1173_[9]_316 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 9 width=11 seqs=16 MIC6-TRUNC ( 883) CACGAAAACGC 1 GRA2pre-TRUNC ( 1117) AAGGAAAACGC 1 ROP9-TRUNC ( 1125) AACGCAAACGC 1 MIC5-TRUNC ( 681) CACGCAAACGC 1 GRA7-TRUNC ( 500) CAGGAAAACAG 1 PK4-TRUNC ( 432) AGCGAAAACGG 1 MIC2-TRUNC ( 878) CACAAAAACCG 1 RiboP-TRUNCATED ( 962) AAGAAAAACGG 1 MIC11-PRO-TRUNC ( 1026) AACACAAACAC 1 cAMP_kinase_reg-trunc ( 266) AAAGAAAACAC 1 GAPDH_TRUNC ( 1276) CACACAAACCG 1 MIC4-TRUNC ( 1482) CACGAACACCC 1 Camp-kin-cat-trunc ( 1146) AACGAAGACCC 1 Calmodulin-TRUNC ( 1199) CGCGAAAACTG 1 G6PD_TRUNC ( 935) AAGGTAAACAG 1 PDIpro-trunc ( 1174) 
CGCGCACACGG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 1.2e+002 119 90 -1064 -1064 189 -1064 -39 -1064 -180 136 2 -1064 19 -1064 161 -1064 152 22 -1064 -209 219 -1064 -1064 -1064 189 -110 -198 -1064 219 -1064 -1064 -1064 -1064 190 -1064 -1064 19 -10 83 -209 -1064 90 102 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 n= 23791 E= 1.2e+002 0.499824 0.499855 0.000154 0.000167 0.812129 0.000168 0.187537 0.000167 0.062597 0.687238 0.249998 0.000167 0.249980 0.000168 0.749685 0.000167 0.624746 0.312473 0.000154 0.062628 0.999512 0.000168 0.000154 0.000167 0.812129 0.125090 0.062615 0.000167 0.999512 0.000168 0.000154 0.000167 0.000136 0.999543 0.000154 0.000167 0.249980 0.250012 0.437381 0.062628 0.000136 0.499855 0.499842 0.000167 -------------------------------------------------------------------------------- Time 362.92 secs. 
******************************************************************************** ******************************************************************************** MOTIF 10 width = 11 sites = 16 llr = 151 E-value = 5.8e+004 ******************************************************************************** -------------------------------------------------------------------------------- Motif 10 Description -------------------------------------------------------------------------------- Simplified A 21:1:a2246: pos.-specific C 19:2a::8:12 probability G 1:13::8:638 matrix T 6:94::::::: bits 2.2 * 2.0 ** 1.8 ** 1.5 * ** Information 1.3 ** **** * content 1.1 ** ***** * (13.6 bits) 0.9 ** ******* 0.7 ** ******* 0.4 *** ******* 0.2 *********** 0.0 ----------- Multilevel TCTTCAGCGAG consensus G AG sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------- cAMP_kinase_reg-trunc 1127 2.62e-07 TTTTCGACTC TCTTCAGCGAG CGAGAAAAGG G6PD_TRUNC 726 4.94e-07 CGAGATTATA TCTTCAGCAAG TCACTAAGCC GRA2pre-TRUNC 798 1.51e-06 GTGGAACAGA TCTCCAGCGAG TACGACCACT MIC5-TRUNC 312 1.51e-06 GGTACTACCT TCTGCAGCAAG TGTTCCGTTC RiboP-TRUNCATED 942 2.92e-06 TCGACTTCAT TCTGCAGCGGG AAGAAGCGAA Calmodulin-TRUNC 26 1.74e-05 CGTTCTCCCC ACTTCAGAAAG TTCACAGTAT PK4-TRUNC 680 1.92e-05 GCGAACGTGT TCGCCAGCGAG GAGCTGCGAA Camp-kin-cat-trunc 107 1.92e-05 GTCAAACTGT TCTTCAGCAGC AGGATTGGCA MIC11-PRO-TRUNC 736 2.06e-05 CCCCTCTAGA ACTACAGCGGG GTCAATAGCG GRA7-TRUNC 1032 3.80e-05 CGCGGTTCCA TCGTCAGCGAC GAGGTTCGAC PDIpro-trunc 942 4.16e-05 CGGTTTTCTC GCTTCAACGAG TCTGGCAGCG ROP9-TRUNC 38 5.15e-05 ACAGGTGATC TCTTCAAAAGG GAGCGAAATT GAPDH_TRUNC 493 5.91e-05 AATGACGGAT AATGCAGCGAG TCTGACAGCT MIC4-TRUNC 513 6.75e-05 AAGGGAGGAA 
CCTACAGCGAC TACCACAAAT MIC6-TRUNC 1223 9.17e-05 TCGATCCGAT CCTGCAACAGG CAGAGGTGTG MIC2-TRUNC 1068 9.17e-05 ATTAGTGTCC TCTCCAGAGCG ACCTTAAATC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- cAMP_kinase_reg-trunc 2.6e-07 1126_[10]_363 G6PD_TRUNC 4.9e-07 725_[10]_764 GRA2pre-TRUNC 1.5e-06 797_[10]_692 MIC5-TRUNC 1.5e-06 311_[10]_1178 RiboP-TRUNCATED 2.9e-06 941_[10]_549 Calmodulin-TRUNC 1.7e-05 25_[10]_1464 PK4-TRUNC 1.9e-05 679_[10]_810 Camp-kin-cat-trunc 1.9e-05 106_[10]_1333 MIC11-PRO-TRUNC 2.1e-05 735_[10]_754 GRA7-TRUNC 3.8e-05 1031_[10]_458 PDIpro-trunc 4.2e-05 941_[10]_548 ROP9-TRUNC 5.1e-05 37_[10]_1452 GAPDH_TRUNC 5.9e-05 492_[10]_997 MIC4-TRUNC 6.7e-05 512_[10]_977 MIC6-TRUNC 9.2e-05 1222_[10]_267 MIC2-TRUNC 9.2e-05 1067_[10]_422 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 10 width=11 seqs=16 cAMP_kinase_reg-trunc ( 1127) TCTTCAGCGAG 1 G6PD_TRUNC ( 726) TCTTCAGCAAG 1 GRA2pre-TRUNC ( 798) TCTCCAGCGAG 1 MIC5-TRUNC ( 312) TCTGCAGCAAG 1 RiboP-TRUNCATED ( 942) TCTGCAGCGGG 1 Calmodulin-TRUNC ( 26) ACTTCAGAAAG 1 PK4-TRUNC ( 680) TCGCCAGCGAG 1 Camp-kin-cat-trunc ( 107) TCTTCAGCAGC 1 MIC11-PRO-TRUNC ( 736) ACTACAGCGGG 1 GRA7-TRUNC ( 1032) TCGTCAGCGAC 1 PDIpro-trunc ( 942) GCTTCAACGAG 1 ROP9-TRUNC ( 38) TCTTCAAAAGG 1 GAPDH_TRUNC ( 493) AATGCAGCGAG 1 MIC4-TRUNC ( 513) CCTACAGCGAC 1 MIC6-TRUNC ( 1223) CCTGCAACAGG 1 MIC2-TRUNC ( 1068) TCTCCAGAGCG 1 // -------------------------------------------------------------------------------- 
--------------------------------------------------------------------------------
Motif 10 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 5.8e+004
   -22   -110   -198    123
  -180    180  -1064  -1064
 -1064  -1064    -98    171
   -80    -52      2     71
 -1064    190  -1064  -1064
   219  -1064  -1064  -1064
   -22  -1064    172  -1064
   -22    160  -1064  -1064
    78  -1064    134  -1064
   152   -210     34  -1064
 -1064    -52    172  -1064
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 10 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 11 n= 23791 E= 5.8e+004
 0.187519  0.125090  0.062615  0.624776
 0.062597  0.937082  0.000154  0.000167
 0.000136  0.000168  0.125076  0.874620
 0.125058  0.187551  0.249998  0.437393
 0.000136  0.999543  0.000154  0.000167
 0.999512  0.000168  0.000154  0.000167
 0.187519  0.000168  0.812146  0.000167
 0.187519  0.812160  0.000154  0.000167
 0.374902  0.000168  0.624763  0.000167
 0.624746  0.062629  0.312459  0.000167
 0.000136  0.187551  0.812146  0.000167
--------------------------------------------------------------------------------

Time 401.80 secs.

******************************************************************************** ******************************************************************************** MOTIF 11 width = 8 sites = 16 llr = 133 E-value = 1.0e+007 ******************************************************************************** -------------------------------------------------------------------------------- Motif 11 Description -------------------------------------------------------------------------------- Simplified A :::::::: pos.-specific C 2:1:1:98 probability G :9:::::: matrix T 819a9a12 bits 2.2 2.0 * * 1.8 * * 1.5 * *** Information 1.3 ******* content 1.1 ******** (12.0 bits) 0.9 ******** 0.7 ******** 0.4 ******** 0.2 ******** 0.0 -------- Multilevel TGTTTTCC consensus sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 11 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- -------- MIC11-PRO-TRUNC 1058 2.40e-05 TATCATATTA TGTTTTCC TTTTTTCGAG RiboP-TRUNCATED 372 2.40e-05 ATTTCTGGTT TGTTTTCC GGGTGTTGAA MIC6-TRUNC 541 2.40e-05 AGGCATTCAT TGTTTTCC ATAGAAAACA MIC5-TRUNC 1053 2.40e-05 GGCGTCTCTT TGTTTTCC TTGGAGTTCC G6PD_TRUNC 873 2.40e-05 TGAGGAACCG TGTTTTCC CCACATTTTC GRA7-TRUNC 136 7.19e-05 TCTTTCGATT TGTTTTCT ATTCCTTCCG ROP9-TRUNC 1315 7.19e-05 GTTGTGCGAC TGTTTTCT TCCGCCTCTT cAMP_kinase_reg-trunc 1005 7.19e-05 TTCAGTCGCG CGTTTTCC ACCCTCCGGT PDIpro-trunc 1224 7.19e-05 GCTAGCGATG CGTTTTCC CAAGAAAATT PK4-TRUNC 783 1.20e-04 CCCAGGGGTC TGTTTTTC CGGCTCTGTG Camp-kin-cat-trunc 424 1.20e-04 CGAGGTAGGG TGCTTTCC GTCCAAGGTG GRA2pre-TRUNC 131 1.46e-04 GTCGAAAACA TTTTTTCC GTTAAGCGTA MIC2-TRUNC 1395 1.46e-04 TTATCATTTG TTTTTTCC GGCCCGCCCT Calmodulin-TRUNC 1409 1.70e-04 TGTCCGGGGG TGTTCTCC TGGCCCCGAT MIC4-TRUNC 436 2.90e-04 CAGATAGACC CGCTTTCC GTTGGAGAGA 
GAPDH_TRUNC 1357 2.90e-04 CCTCGGCGTT TGTTTTTT AGTTATTTTT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 11 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 2.4e-05 1057_[11]_435 RiboP-TRUNCATED 2.4e-05 371_[11]_1122 MIC6-TRUNC 2.4e-05 540_[11]_952 MIC5-TRUNC 2.4e-05 1052_[11]_440 G6PD_TRUNC 2.4e-05 872_[11]_620 GRA7-TRUNC 7.2e-05 135_[11]_1357 ROP9-TRUNC 7.2e-05 1314_[11]_178 cAMP_kinase_reg-trunc 7.2e-05 1004_[11]_488 PDIpro-trunc 7.2e-05 1223_[11]_269 PK4-TRUNC 0.00012 782_[11]_710 Camp-kin-cat-trunc 0.00012 423_[11]_1019 GRA2pre-TRUNC 0.00015 130_[11]_1362 MIC2-TRUNC 0.00015 1394_[11]_98 Calmodulin-TRUNC 0.00017 1408_[11]_84 MIC4-TRUNC 0.00029 435_[11]_1057 GAPDH_TRUNC 0.00029 1356_[11]_136 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 11 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 11 width=8 seqs=16 MIC11-PRO-TRUNC ( 1058) TGTTTTCC 1 RiboP-TRUNCATED ( 372) TGTTTTCC 1 MIC6-TRUNC ( 541) TGTTTTCC 1 MIC5-TRUNC ( 1053) TGTTTTCC 1 G6PD_TRUNC ( 873) TGTTTTCC 1 GRA7-TRUNC ( 136) TGTTTTCT 1 ROP9-TRUNC ( 1315) TGTTTTCT 1 cAMP_kinase_reg-trunc ( 1005) CGTTTTCC 1 PDIpro-trunc ( 1224) CGTTTTCC 1 PK4-TRUNC ( 783) TGTTTTTC 1 Camp-kin-cat-trunc ( 424) TGCTTTCC 1 GRA2pre-TRUNC ( 131) TTTTTTCC 1 MIC2-TRUNC ( 1395) TTTTTTCC 1 Calmodulin-TRUNC ( 1409) TGTTCTCC 1 MIC4-TRUNC ( 436) CGCTTTCC 1 GAPDH_TRUNC ( 1357) TGTTTTTT 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 11 position-specific scoring matrix 
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 8 n= 23839 bayes= 10.5401 E= 1.0e+007
 -1064    -52  -1064    161
 -1064  -1064    183   -109
 -1064   -110  -1064    171
 -1064  -1064  -1064    191
 -1064   -210  -1064    181
 -1064  -1064  -1064    191
 -1064    170  -1064   -109
 -1064    160  -1064    -51
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 11 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 8 n= 23839 E= 1.0e+007
 0.000136  0.187551  0.000154  0.812159
 0.000136  0.000168  0.874607  0.125088
 0.000136  0.125090  0.000154  0.874620
 0.000136  0.000168  0.000154  0.999542
 0.000136  0.062629  0.000154  0.937081
 0.000136  0.000168  0.000154  0.999542
 0.000136  0.874621  0.000154  0.125088
 0.000136  0.812160  0.000154  0.187549
--------------------------------------------------------------------------------

Time 440.66 secs.

******************************************************************************** ******************************************************************************** MOTIF 12 width = 8 sites = 16 llr = 135 E-value = 1.6e+006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 12 Description -------------------------------------------------------------------------------- Simplified A :::::::1 pos.-specific C :38:1::6 probability G :::a:::3 matrix T a83:9aa: bits 2.2 2.0 * * ** 1.8 * * ** 1.5 * **** Information 1.3 * **** content 1.1 ******* (12.2 bits) 0.9 ******* 0.7 ******** 0.4 ******** 0.2 ******** 0.0 -------- Multilevel TTCGTTTC consensus CT G sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 12 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- -------- RiboP-TRUNCATED 1181 2.40e-05 GAAGACCTCG TTCGTTTC TCCCCCCCAC MIC5-TRUNC 1151 2.40e-05 GTGATTGCAG TTCGTTTC CTCAATTTAC PK4-TRUNC 731 2.40e-05 CCTGCTACGC TTCGTTTC CCGAGAGTCG cAMP_kinase_reg-trunc 1069 2.40e-05 AGTTCTTCGC TTCGTTTC GCCCGGCAGC G6PD_TRUNC 1266 2.40e-05 TTTTAAGATT TTCGTTTC CTGTTTCCCT MIC2-TRUNC 1344 4.59e-05 GCTCATGAGT TTCGTTTG TCAGGTGCAA PDIpro-trunc 874 4.59e-05 TAGAGGCTTC TTCGTTTG TCGCGAGAGA MIC11-PRO-TRUNC 631 9.39e-05 GCCTTTTTCT TCCGTTTC GCAAAACTTG MIC6-TRUNC 693 9.39e-05 GTAAGCATCC TTTGTTTC CGTTTAAAAT Calmodulin-TRUNC 612 9.39e-05 TCCCTGTTTC TCCGTTTC ATAGTCAATT GRA7-TRUNC 1229 1.13e-04 GATCCCTGAT TTCGTTTA CCATTGACGC GRA2pre-TRUNC 201 1.57e-04 GAATACATCT TTTGTTTG CGTCCTGCAC Camp-kin-cat-trunc 82 1.57e-04 CCGTAACGCA TCCGTTTG TGCCCGCGTC GAPDH_TRUNC 622 1.81e-04 GAAGCTCGTG TCTGTTTC CTATGGTTTT ROP9-TRUNC 1197 2.20e-04 ATGTCAAATG TTTGTTTA 
ATTTATTGTC MIC4-TRUNC 1283 2.44e-04 AGTACAATCA TTCGCTTC TGACAATCGC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 12 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- RiboP-TRUNCATED 2.4e-05 1180_[12]_313 MIC5-TRUNC 2.4e-05 1150_[12]_342 PK4-TRUNC 2.4e-05 730_[12]_762 cAMP_kinase_reg-trunc 2.4e-05 1068_[12]_424 G6PD_TRUNC 2.4e-05 1265_[12]_227 MIC2-TRUNC 4.6e-05 1343_[12]_149 PDIpro-trunc 4.6e-05 873_[12]_619 MIC11-PRO-TRUNC 9.4e-05 630_[12]_862 MIC6-TRUNC 9.4e-05 692_[12]_800 Calmodulin-TRUNC 9.4e-05 611_[12]_881 GRA7-TRUNC 0.00011 1228_[12]_264 GRA2pre-TRUNC 0.00016 200_[12]_1292 Camp-kin-cat-trunc 0.00016 81_[12]_1361 GAPDH_TRUNC 0.00018 621_[12]_871 ROP9-TRUNC 0.00022 1196_[12]_296 MIC4-TRUNC 0.00024 1282_[12]_210 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 12 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 12 width=8 seqs=16 RiboP-TRUNCATED ( 1181) TTCGTTTC 1 MIC5-TRUNC ( 1151) TTCGTTTC 1 PK4-TRUNC ( 731) TTCGTTTC 1 cAMP_kinase_reg-trunc ( 1069) TTCGTTTC 1 G6PD_TRUNC ( 1266) TTCGTTTC 1 MIC2-TRUNC ( 1344) TTCGTTTG 1 PDIpro-trunc ( 874) TTCGTTTG 1 MIC11-PRO-TRUNC ( 631) TCCGTTTC 1 MIC6-TRUNC ( 693) TTTGTTTC 1 Calmodulin-TRUNC ( 612) TCCGTTTC 1 GRA7-TRUNC ( 1229) TTCGTTTA 1 GRA2pre-TRUNC ( 201) TTTGTTTG 1 Camp-kin-cat-trunc ( 82) TCCGTTTG 1 GAPDH_TRUNC ( 622) TCTGTTTC 1 ROP9-TRUNC ( 1197) TTTGTTTA 1 MIC4-TRUNC ( 1283) TTCGCTTC 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 12 position-specific scoring matrix 
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 8 n= 23839 bayes= 10.5401 E= 1.6e+006
 -1064  -1064  -1064    191
 -1064    -10  -1064    149
 -1064    148  -1064     -9
 -1064  -1064    202  -1064
 -1064   -210  -1064    181
 -1064  -1064  -1064    191
 -1064  -1064  -1064    191
   -80    122      2  -1064
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 12 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 8 n= 23839 E= 1.6e+006
 0.000136  0.000168  0.000154  0.999542
 0.000136  0.250012  0.000154  0.749698
 0.000136  0.749699  0.000154  0.250010
 0.000136  0.000168  0.999529  0.000167
 0.000136  0.062629  0.000154  0.937081
 0.000136  0.000168  0.000154  0.999542
 0.000136  0.000168  0.000154  0.999542
 0.125058  0.624777  0.249998  0.000167
--------------------------------------------------------------------------------

Time 479.31 secs.

******************************************************************************** ******************************************************************************** MOTIF 13 width = 6 sites = 16 llr = 123 E-value = 1.4e+007 ******************************************************************************** -------------------------------------------------------------------------------- Motif 13 Description -------------------------------------------------------------------------------- Simplified A :1a::: pos.-specific C a::a:9 probability G :9::a1 matrix T :::::: bits 2.2 * 2.0 * *** 1.8 * *** 1.5 ****** Information 1.3 ****** content 1.1 ****** (11.1 bits) 0.9 ****** 0.7 ****** 0.4 ****** 0.2 ****** 0.0 ------ Multilevel CGACGC consensus sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 13 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------ MIC11-PRO-TRUNC 776 2.57e-04 CGCTAGTTCG CGACGC TCAACCGAGT RiboP-TRUNCATED 19 2.57e-04 TGCGGGGAAT CGACGC TGAGACTCCG GRA7-TRUNC 784 2.57e-04 CGGTGTCACT CGACGC GTTGAGAACG ROP9-TRUNC 1086 2.57e-04 CAGCAGTCAG CGACGC ATTCAGAGAC MIC6-TRUNC 726 2.57e-04 CCAGATGGCA CGACGC CGTCTGGTTT MIC5-TRUNC 947 2.57e-04 ACGTGTTATT CGACGC AGTCTGTTGA MIC4-TRUNC 1311 2.57e-04 ATCGACTGAG CGACGC GTTGATCGTC MIC2-TRUNC 284 2.57e-04 TGGTGAATAA CGACGC AGCCAGCACG PK4-TRUNC 537 2.57e-04 GCGTCCGAGC CGACGC GCACGGCTTC Camp-kin-cat-trunc 1068 2.57e-04 GTAAGGACTG CGACGC CGCTCTCACG PDIpro-trunc 1063 2.57e-04 GGTCCAAAAG CGACGC CGTTATTCTC GAPDH_TRUNC 540 2.57e-04 GATTCTCAAC CGACGC TTCTTAGGGA G6PD_TRUNC 595 2.57e-04 CTGAGAGTCG CGACGC CGACCGTGCC GRA2pre-TRUNC 55 4.85e-04 GCAGAATGCT CAACGC GGGCAGCACT cAMP_kinase_reg-trunc 1046 4.85e-04 GTAGCAGCTG CAACGC GAGTCCAAGT Calmodulin-TRUNC 200 7.20e-04 TGCGTTCCCA CGACGG GGATACGGAC 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 13 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 0.00026 775_[13]_719 RiboP-TRUNCATED 0.00026 18_[13]_1477 GRA7-TRUNC 0.00026 783_[13]_711 ROP9-TRUNC 0.00026 1085_[13]_409 MIC6-TRUNC 0.00026 725_[13]_769 MIC5-TRUNC 0.00026 946_[13]_548 MIC4-TRUNC 0.00026 1310_[13]_184 MIC2-TRUNC 0.00026 283_[13]_1211 PK4-TRUNC 0.00026 536_[13]_958 Camp-kin-cat-trunc 0.00026 1067_[13]_377 PDIpro-trunc 0.00026 1062_[13]_432 GAPDH_TRUNC 0.00026 539_[13]_955 G6PD_TRUNC 0.00026 594_[13]_900 GRA2pre-TRUNC 0.00048 54_[13]_1440 cAMP_kinase_reg-trunc 0.00048 1045_[13]_449 Calmodulin-TRUNC 0.00072 199_[13]_1295 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 13 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 13 width=6 seqs=16 MIC11-PRO-TRUNC ( 776) CGACGC 1 RiboP-TRUNCATED ( 19) CGACGC 1 GRA7-TRUNC ( 784) CGACGC 1 ROP9-TRUNC ( 1086) CGACGC 1 MIC6-TRUNC ( 726) CGACGC 1 MIC5-TRUNC ( 947) CGACGC 1 MIC4-TRUNC ( 1311) CGACGC 1 MIC2-TRUNC ( 284) CGACGC 1 PK4-TRUNC ( 537) CGACGC 1 Camp-kin-cat-trunc ( 1068) CGACGC 1 PDIpro-trunc ( 1063) CGACGC 1 GAPDH_TRUNC ( 540) CGACGC 1 G6PD_TRUNC ( 595) CGACGC 1 GRA2pre-TRUNC ( 55) CAACGC 1 cAMP_kinase_reg-trunc ( 1046) CAACGC 1 Calmodulin-TRUNC ( 200) CGACGG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 13 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: 
alength= 4 w= 6 n= 23871 bayes= 10.542 E= 1.4e+007 -1064 190 -1064 -1064 -80 -1064 183 -1064 219 -1064 -1064 -1064 -1064 190 -1064 -1064 -1064 -1064 202 -1064 -1064 180 -198 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 13 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 6 n= 23871 E= 1.4e+007 0.000136 0.999543 0.000154 0.000167 0.125058 0.000168 0.874607 0.000167 0.999512 0.000168 0.000154 0.000167 0.000136 0.999543 0.000154 0.000167 0.000136 0.000168 0.999529 0.000167 0.000136 0.937082 0.062615 0.000167 -------------------------------------------------------------------------------- Time 517.63 secs. ******************************************************************************** ******************************************************************************** MOTIF 14 width = 11 sites = 16 llr = 155 E-value = 4.4e+003 ******************************************************************************** -------------------------------------------------------------------------------- Motif 14 Description -------------------------------------------------------------------------------- Simplified A :9:::1494:1 pos.-specific C 9:::1:6::a: probability G :16749:13:9 matrix T 114351::3:: bits 2.2 2.0 * 1.8 * ** 1.5 * * ** Information 1.3 ** * * ** content 1.1 **** *** ** (14.0 bits) 0.9 **** *** ** 0.7 ******** ** 0.4 *********** 0.2 *********** 0.0 ----------- Multilevel CAGGTGCAACG consensus TTG A G sequence T -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 14 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- 
----- --------- ----------- MIC2-TRUNC 1353 1.98e-07 TTTCGTTTGT CAGGTGCAACG TCGTCTACTC ROP9-TRUNC 1338 5.42e-07 CTCTTCAGCG CAGGTGAAACG AGCACGCTGT cAMP_kinase_reg-trunc 794 9.15e-07 GCTTCTTTTT CAGGTGCAGCG CCCGGAATGT MIC4-TRUNC 370 1.69e-06 CTTTAGACGT CAGGGGAAGCG AGCGTTCTGT MIC11-PRO-TRUNC 1127 2.30e-06 TCGATGATCT CATGGGCAACG TGGCACAAAC G6PD_TRUNC 216 2.30e-06 GGTGAGGTCA CATGTGAAACG AGGCGAACGG GRA2pre-TRUNC 239 4.05e-06 GAGGCGGCTA CAGTGGAAACG AAAAAAAAGG MIC5-TRUNC 974 1.04e-05 CGGTCGAGCG CATTTGCATCG ACGTCCGCTC PDIpro-trunc 995 1.13e-05 ATCTTCTGTG CATTGGAATCG ACATGTTGAG GAPDH_TRUNC 643 1.25e-05 TGGTTTTTCA CAGGCGCAGCG TCTCTCTTTT PK4-TRUNC 144 2.98e-05 CGGGCGACGA CAGGGGCGGCG AGCGCACGAA RiboP-TRUNCATED 644 5.38e-05 TCTGAAGCTG CAGTGACAGCG CGTTGTTTCC Camp-kin-cat-trunc 865 5.38e-05 TGCCAACGCC CATGTGCATCA GGGACAGGCC GRA7-TRUNC 57 7.48e-05 TTGCTTACTA CAGTGTCATCG AGATCCATAT Calmodulin-TRUNC 468 8.56e-05 AAAAATCATG TGGGTGAAACG AAACAGTACA MIC6-TRUNC 399 1.21e-04 TACGTGTCAT TTTGTGAAACG ACACAGCACA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 14 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC2-TRUNC 2e-07 1352_[14]_137 ROP9-TRUNC 5.4e-07 1337_[14]_152 cAMP_kinase_reg-trunc 9.1e-07 793_[14]_696 MIC4-TRUNC 1.7e-06 369_[14]_1120 MIC11-PRO-TRUNC 2.3e-06 1126_[14]_363 G6PD_TRUNC 2.3e-06 215_[14]_1274 GRA2pre-TRUNC 4.1e-06 238_[14]_1251 MIC5-TRUNC 1e-05 973_[14]_516 PDIpro-trunc 1.1e-05 994_[14]_495 GAPDH_TRUNC 1.2e-05 642_[14]_847 PK4-TRUNC 3e-05 143_[14]_1346 RiboP-TRUNCATED 5.4e-05 643_[14]_847 Camp-kin-cat-trunc 5.4e-05 864_[14]_575 GRA7-TRUNC 7.5e-05 56_[14]_1433 Calmodulin-TRUNC 8.6e-05 467_[14]_1022 MIC6-TRUNC 0.00012 398_[14]_1091 -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- Motif 14 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 14 width=11 seqs=16 MIC2-TRUNC ( 1353) CAGGTGCAACG 1 ROP9-TRUNC ( 1338) CAGGTGAAACG 1 cAMP_kinase_reg-trunc ( 794) CAGGTGCAGCG 1 MIC4-TRUNC ( 370) CAGGGGAAGCG 1 MIC11-PRO-TRUNC ( 1127) CATGGGCAACG 1 G6PD_TRUNC ( 216) CATGTGAAACG 1 GRA2pre-TRUNC ( 239) CAGTGGAAACG 1 MIC5-TRUNC ( 974) CATTTGCATCG 1 PDIpro-trunc ( 995) CATTGGAATCG 1 GAPDH_TRUNC ( 643) CAGGCGCAGCG 1 PK4-TRUNC ( 144) CAGGGGCGGCG 1 RiboP-TRUNCATED ( 644) CAGTGACAGCG 1 Camp-kin-cat-trunc ( 865) CATGTGCATCA 1 GRA7-TRUNC ( 57) CAGTGTCATCG 1 Calmodulin-TRUNC ( 468) TGGGTGAAACG 1 MIC6-TRUNC ( 399) TTTGTGAAACG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 14 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 23791 bayes= 10.5372 E= 4.4e+003 -1064 170 -1064 -109 200 -1064 -198 -209 -1064 -1064 134 49 -1064 -1064 148 23 -1064 -210 83 91 -180 -1064 183 -209 100 107 -1064 -1064 210 -1064 -198 -1064 100 -1064 34 -9 -1064 190 -1064 -1064 -180 -1064 193 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 14 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 n= 23791 E= 4.4e+003 0.000136 0.874621 0.000154 0.125088 0.874590 0.000168 0.062615 0.062628 0.000136 0.000168 0.624763 0.374932 0.000136 0.000168 0.687224 0.312471 0.000136 0.062629 0.437381 0.499854 0.062597 0.000168 0.874607 0.062628 0.437363 0.562316 0.000154 0.000167 0.937051 0.000168 0.062615 0.000167 0.437363 
0.000168 0.312459 0.250010 0.000136 0.999543 0.000154 0.000167 0.062597 0.000168 0.937068 0.000167 -------------------------------------------------------------------------------- Time 556.43 secs. ******************************************************************************** ******************************************************************************** MOTIF 15 width = 6 sites = 16 llr = 127 E-value = 1.5e+005 ******************************************************************************** -------------------------------------------------------------------------------- Motif 15 Description -------------------------------------------------------------------------------- Simplified A :a:2a: pos.-specific C a::8:a probability G ::a::: matrix T :::::: bits 2.2 * * 2.0 *** ** 1.8 *** ** 1.5 *** ** Information 1.3 ****** content 1.1 ****** (11.5 bits) 0.9 ****** 0.7 ****** 0.4 ****** 0.2 ****** 0.0 ------ Multilevel CAGCAC consensus sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 15 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------ MIC11-PRO-TRUNC 174 2.28e-04 CAAGCGCCTC CAGCAC GGTTTCAAGC RiboP-TRUNCATED 1459 2.28e-04 CCCTCCGTTC CAGCAC CTTTCTGCTC GRA2pre-TRUNC 64 2.28e-04 TCAACGCGGG CAGCAC TTTTCCTCCC ROP9-TRUNC 1264 2.28e-04 ATGGCACTGG CAGCAC GCCGGACTCT MIC6-TRUNC 413 2.28e-04 TGAAACGACA CAGCAC ATAACCACTC MIC4-TRUNC 1143 2.28e-04 CTGATGTTTG CAGCAC ATCGACCATT MIC2-TRUNC 293 2.28e-04 ACGACGCAGC CAGCAC GGTTATTGCG Calmodulin-TRUNC 526 2.28e-04 GACGGGCAGC CAGCAC CGTCGCATAC PK4-TRUNC 354 2.28e-04 AAGGAGCGAA CAGCAC ACAGCCAAGC cAMP_kinase_reg-trunc 23 2.28e-04 AGACCGTCAG CAGCAC GGGTGCTAGC PDIpro-trunc 797 2.28e-04 CAAACTTAGA CAGCAC TTCAATTACT GAPDH_TRUNC 1 2.28e-04 . 
CAGCAC GTTCTCATGG G6PD_TRUNC 946 2.28e-04 AGGTAAACAG CAGCAC TGTGGGCAGC GRA7-TRUNC 217 4.13e-04 TTCGTGCATC CAGAAC CTTCTGTCCT MIC5-TRUNC 1127 4.13e-04 AACTTAGATC CAGAAC ATCACATAGT Camp-kin-cat-trunc 634 4.13e-04 GGTTTTTGTA CAGAAC GGCGCTCCAG -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 15 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 0.00023 173_[15]_1321 RiboP-TRUNCATED 0.00023 1458_[15]_37 GRA2pre-TRUNC 0.00023 63_[15]_1431 ROP9-TRUNC 0.00023 1263_[15]_231 MIC6-TRUNC 0.00023 412_[15]_1082 MIC4-TRUNC 0.00023 1142_[15]_352 MIC2-TRUNC 0.00023 292_[15]_1202 Calmodulin-TRUNC 0.00023 525_[15]_969 PK4-TRUNC 0.00023 353_[15]_1141 cAMP_kinase_reg-trunc 0.00023 22_[15]_1472 PDIpro-trunc 0.00023 796_[15]_698 GAPDH_TRUNC 0.00023 [15]_1494 G6PD_TRUNC 0.00023 945_[15]_549 GRA7-TRUNC 0.00041 216_[15]_1278 MIC5-TRUNC 0.00041 1126_[15]_368 Camp-kin-cat-trunc 0.00041 633_[15]_811 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 15 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 15 width=6 seqs=16 MIC11-PRO-TRUNC ( 174) CAGCAC 1 RiboP-TRUNCATED ( 1459) CAGCAC 1 GRA2pre-TRUNC ( 64) CAGCAC 1 ROP9-TRUNC ( 1264) CAGCAC 1 MIC6-TRUNC ( 413) CAGCAC 1 MIC4-TRUNC ( 1143) CAGCAC 1 MIC2-TRUNC ( 293) CAGCAC 1 Calmodulin-TRUNC ( 526) CAGCAC 1 PK4-TRUNC ( 354) CAGCAC 1 cAMP_kinase_reg-trunc ( 23) CAGCAC 1 PDIpro-trunc ( 797) CAGCAC 1 GAPDH_TRUNC ( 1) CAGCAC 1 G6PD_TRUNC ( 946) CAGCAC 1 GRA7-TRUNC ( 217) CAGAAC 1 MIC5-TRUNC ( 1127) CAGAAC 1 Camp-kin-cat-trunc ( 634) CAGAAC 1 // 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 15 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 6 n= 23871 bayes= 10.542 E= 1.5e+005 -1064 190 -1064 -1064 219 -1064 -1064 -1064 -1064 -1064 202 -1064 -22 160 -1064 -1064 219 -1064 -1064 -1064 -1064 190 -1064 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 15 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 6 n= 23871 E= 1.5e+005 0.000136 0.999543 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.000136 0.000168 0.999529 0.000167 0.187519 0.812160 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.000136 0.999543 0.000154 0.000167 -------------------------------------------------------------------------------- Time 594.62 secs. 
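A note on reading the block diagrams above: each entry such as 173_[15]_1321 appears to give the number of bases before the site, the motif number in brackets, and the number of bases after the site. Under that reading, the start position in the "sites" table is the left flank plus one, and the sequence length is left flank + motif width + right flank. The sketch below checks this against the Motif 15 values. The helper name parse_block_diagram is hypothetical (it is not part of MEME or BioPerl), and it only handles diagrams containing a single motif occurrence, as in this output.

```python
import re

# Hypothetical helper for illustration; not part of MEME or BioPerl.
# Matches single-occurrence diagrams such as "173_[15]_1321" or "[15]_1494".
DIAGRAM_RE = re.compile(r"^(?:(\d+)_)?\[(\d+)\](?:_(\d+))?$")


def parse_block_diagram(diagram, width):
    """Return motif number, 1-based start and implied sequence length."""
    match = DIAGRAM_RE.match(diagram.strip())
    if match is None:
        raise ValueError("unsupported block diagram: %r" % diagram)
    left = int(match.group(1) or 0)    # bases before the site (0 if absent)
    motif = int(match.group(2))        # motif number inside the brackets
    right = int(match.group(3) or 0)   # bases after the site (0 if absent)
    return {
        "motif": motif,
        "start": left + 1,
        "length": left + width + right,
    }


if __name__ == "__main__":
    # Values copied from the Motif 15 block diagrams (width = 6).
    for name, diagram in [
        ("MIC11-PRO-TRUNC", "173_[15]_1321"),
        ("GAPDH_TRUNC", "[15]_1494"),
        ("Camp-kin-cat-trunc", "633_[15]_811"),
    ]:
        print(name, parse_block_diagram(diagram, width=6))
```

For these three entries the computed starts are 174, 1 and 634, which agree with the Start column of the Motif 15 sites table.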
******************************************************************************** ******************************************************************************** MOTIF 16 width = 15 sites = 16 llr = 165 E-value = 2.5e+006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 16 Description -------------------------------------------------------------------------------- Simplified A 3::321:::13:::: pos.-specific C ::61::::21412:7 probability G 18:5:41983::1a1 matrix T 63418591:5397:2 bits 2.2 2.0 * 1.8 * 1.5 * * * Information 1.3 * *** * * content 1.1 * * *** * * (14.9 bits) 0.9 ** * *** * * 0.7 *** ***** **** 0.4 *** ***** ***** 0.2 *************** 0.0 --------------- Multilevel TGCGTTTGGTCTTGC consensus ATTA G GA sequence T -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 16 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- --------------- GAPDH_TRUNC 858 2.79e-08 TCAGTTGTTG TGTGTTTGGTATTGC GAGTCTGGAG PDIpro-trunc 680 4.93e-07 CAAGCCCTAA TGCGTTTGGTTTTGG CACTCTGATT MIC5-TRUNC 32 1.19e-06 TCTGTGTAAA TGCGTGTGCAATTGC TTTTGTGTGC MIC2-TRUNC 199 1.34e-06 CGAGTCCTGC TGTGAGTGGACTTGC ATCCGCAGTT cAMP_kinase_reg-trunc 898 1.91e-06 AGCAGCCCGC TTTTTTTGGTCTTGC TGTCGCGCTT MIC11-PRO-TRUNC 1440 2.99e-06 AAACGTATGC ATCGTGTGGGCTTGT GGTTTGCAGA GRA2pre-TRUNC 704 2.99e-06 CACCTGCCAG TGCATATGGGTTTGC ATATTTTTGC GRA7-TRUNC 1440 2.99e-06 TGAAGTACCC TGTATTGGGGCTTGC TAACGTTTTG RiboP-TRUNCATED 782 3.31e-06 GGTAGCTGGT GGTCTGTGGTATTGC TCACGTCTTC Camp-kin-cat-trunc 541 1.10e-05 GGAAAGCGGT AGCGAGTGGCCTGGC GGTTGGTTTG MIC6-TRUNC 1397 1.19e-05 GACATCTGTG GGCGTTTGGGATCGT GATGACATCG ROP9-TRUNC 746 2.36e-05 GACCAAGCTG TGCCATTGCTCTCGC GAAGCAAGCT 
G6PD_TRUNC 812 2.70e-05 GCAGACATAC ATCGTGGGGTTTTGG CCCGCTAGGT Calmodulin-TRUNC 754 2.89e-05 CAAATTCTGG AGCATTTGGCTTGGT GAGTGGGAGA MIC4-TRUNC 179 5.99e-05 GTGCGTGAAC ATTTTGTTGTCTCGC TTTGAGTGGC PK4-TRUNC 178 5.99e-05 GCTCTGTTCT TGCATTTTCTACTGC TCTTGTGGTC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 16 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- GAPDH_TRUNC 2.8e-08 857_[16]_628 PDIpro-trunc 4.9e-07 679_[16]_806 MIC5-TRUNC 1.2e-06 31_[16]_1454 MIC2-TRUNC 1.3e-06 198_[16]_1287 cAMP_kinase_reg-trunc 1.9e-06 897_[16]_588 MIC11-PRO-TRUNC 3e-06 1439_[16]_46 GRA2pre-TRUNC 3e-06 703_[16]_782 GRA7-TRUNC 3e-06 1439_[16]_46 RiboP-TRUNCATED 3.3e-06 781_[16]_705 Camp-kin-cat-trunc 1.1e-05 540_[16]_895 MIC6-TRUNC 1.2e-05 1396_[16]_89 ROP9-TRUNC 2.4e-05 745_[16]_740 G6PD_TRUNC 2.7e-05 811_[16]_674 Calmodulin-TRUNC 2.9e-05 753_[16]_732 MIC4-TRUNC 6e-05 178_[16]_1307 PK4-TRUNC 6e-05 177_[16]_1308 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 16 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 16 width=15 seqs=16 GAPDH_TRUNC ( 858) TGTGTTTGGTATTGC 1 PDIpro-trunc ( 680) TGCGTTTGGTTTTGG 1 MIC5-TRUNC ( 32) TGCGTGTGCAATTGC 1 MIC2-TRUNC ( 199) TGTGAGTGGACTTGC 1 cAMP_kinase_reg-trunc ( 898) TTTTTTTGGTCTTGC 1 MIC11-PRO-TRUNC ( 1440) ATCGTGTGGGCTTGT 1 GRA2pre-TRUNC ( 704) TGCATATGGGTTTGC 1 GRA7-TRUNC ( 1440) TGTATTGGGGCTTGC 1 RiboP-TRUNCATED ( 782) GGTCTGTGGTATTGC 1 Camp-kin-cat-trunc ( 541) AGCGAGTGGCCTGGC 1 MIC6-TRUNC ( 1397) GGCGTTTGGGATCGT 1 ROP9-TRUNC ( 746) TGCCATTGCTCTCGC 1 G6PD_TRUNC ( 812) ATCGTGGGGTTTTGG 1 Calmodulin-TRUNC ( 754) AGCATTTGGCTTGGT 1 
MIC4-TRUNC ( 179) ATTTTGTTGTCTCGC 1 PK4-TRUNC ( 178) TGCATTTTCTACTGC 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 16 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 15 n= 23727 bayes= 10.5333 E= 2.5e+006 52 -1064 -98 108 -1064 -1064 161 -9 -1064 122 -1064 49 19 -110 102 -109 -22 -1064 -1064 161 -180 -1064 83 91 -1064 -1064 -98 171 -1064 -1064 183 -109 -1064 -52 172 -1064 -80 -110 2 91 52 70 -1064 -9 -1064 -210 -1064 181 -1064 -52 -98 137 -1064 -1064 202 -1064 -1064 136 -98 -51 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 16 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 15 n= 23727 E= 2.5e+006 0.312441 0.000168 0.125076 0.562315 0.000136 0.000168 0.749685 0.250010 0.000136 0.624777 0.000154 0.374932 0.249980 0.125090 0.499842 0.125088 0.187519 0.000168 0.000154 0.812159 0.062597 0.000168 0.437381 0.499854 0.000136 0.000168 0.125076 0.874620 0.000136 0.000168 0.874607 0.125088 0.000136 0.187551 0.812146 0.000167 0.125058 0.125090 0.249998 0.499854 0.312441 0.437395 0.000154 0.250010 0.000136 0.062629 0.000154 0.937081 0.000136 0.187551 0.125076 0.687237 0.000136 0.000168 0.999529 0.000167 0.000136 0.687238 0.125076 0.187549 -------------------------------------------------------------------------------- Time 632.49 secs. 
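As a cross-check on the Motif 16 output above, the first line of the "Multilevel consensus" can be recovered from the letter-probability matrix by taking the most probable letter in each row. The sketch below assumes the four columns are in A, C, G, T order (alength= 4), which is consistent with the sites listed for this motif. The function name consensus_from_matrix is made up for this example.

```python
# Letter-probability matrix for Motif 16, copied from the output above.
# Column order is assumed to be A, C, G, T.
MOTIF_16 = [
    [0.312441, 0.000168, 0.125076, 0.562315],
    [0.000136, 0.000168, 0.749685, 0.250010],
    [0.000136, 0.624777, 0.000154, 0.374932],
    [0.249980, 0.125090, 0.499842, 0.125088],
    [0.187519, 0.000168, 0.000154, 0.812159],
    [0.062597, 0.000168, 0.437381, 0.499854],
    [0.000136, 0.000168, 0.125076, 0.874620],
    [0.000136, 0.000168, 0.874607, 0.125088],
    [0.000136, 0.187551, 0.812146, 0.000167],
    [0.125058, 0.125090, 0.249998, 0.499854],
    [0.312441, 0.437395, 0.000154, 0.250010],
    [0.000136, 0.062629, 0.000154, 0.937081],
    [0.000136, 0.187551, 0.125076, 0.687237],
    [0.000136, 0.000168, 0.999529, 0.000167],
    [0.000136, 0.687238, 0.125076, 0.187549],
]


def consensus_from_matrix(matrix, alphabet="ACGT"):
    """Pick the most probable letter at each motif position."""
    letters = []
    for row in matrix:
        best_index = max(range(len(row)), key=lambda i: row[i])
        letters.append(alphabet[best_index])
    return "".join(letters)


if __name__ == "__main__":
    print(consensus_from_matrix(MOTIF_16))
```

This only reproduces the top consensus line; the extra letters MEME prints on the lines below it (the less frequent alternatives at a position) are not handled here.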
******************************************************************************** ******************************************************************************** MOTIF 17 width = 6 sites = 16 llr = 123 E-value = 8.9e+006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 17 Description -------------------------------------------------------------------------------- Simplified A :aaaa: pos.-specific C 7::::1 probability G 3::::9 matrix T 1::::: bits 2.2 **** 2.0 **** 1.8 **** 1.5 ***** Information 1.3 ***** content 1.1 ***** (11.0 bits) 0.9 ****** 0.7 ****** 0.4 ****** 0.2 ****** 0.0 ------ Multilevel CAAAAG consensus G sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 17 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------ MIC11-PRO-TRUNC 96 1.51e-04 GGTCCTAGGC CAAAAG AGAGCATTGA GRA2pre-TRUNC 834 1.51e-04 TGTACTTCGC CAAAAG GAAAATACAC MIC6-TRUNC 329 1.51e-04 TCAAGCATGC CAAAAG CCGACATACT MIC5-TRUNC 1433 1.51e-04 ACCGTGAAAC CAAAAG CGCAGTTTCA MIC2-TRUNC 520 1.51e-04 TTCGTTATGT CAAAAG AGCGCTCGCT cAMP_kinase_reg-trunc 658 1.51e-04 CCCCAACCGT CAAAAG TGTGAACGTG Camp-kin-cat-trunc 680 1.51e-04 TATCTTTGTT CAAAAG GTAACATAAA PDIpro-trunc 1057 1.51e-04 TAAGGCGGTC CAAAAG CGACGCCGTT GAPDH_TRUNC 1440 1.51e-04 CACGGTGTAG CAAAAG GCGCATTTCT G6PD_TRUNC 902 1.51e-04 TGATTTTGTC CAAAAG TTGCAGAGTC RiboP-TRUNCATED 442 2.89e-04 TTTCAGGGCA GAAAAG GAGGGAAACT ROP9-TRUNC 315 2.89e-04 AAAACGGTGG GAAAAG ATGAATATCG PK4-TRUNC 268 2.89e-04 AAACGCAACG GAAAAG CACGTCGATA MIC4-TRUNC 1177 4.53e-04 ATCTGTGCTG CAAAAC GGGCCTCTGT GRA7-TRUNC 891 6.02e-04 TATAATATCT TAAAAG CAGTTGGGTA Calmodulin-TRUNC 680 7.53e-04 CACTCGCCCG GAAAAC TTTGTAGATC 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 17 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 0.00015 95_[17]_1399 GRA2pre-TRUNC 0.00015 833_[17]_661 MIC6-TRUNC 0.00015 328_[17]_1166 MIC5-TRUNC 0.00015 1432_[17]_62 MIC2-TRUNC 0.00015 519_[17]_975 cAMP_kinase_reg-trunc 0.00015 657_[17]_837 Camp-kin-cat-trunc 0.00015 679_[17]_765 PDIpro-trunc 0.00015 1056_[17]_438 GAPDH_TRUNC 0.00015 1439_[17]_55 G6PD_TRUNC 0.00015 901_[17]_593 RiboP-TRUNCATED 0.00029 441_[17]_1054 ROP9-TRUNC 0.00029 314_[17]_1180 PK4-TRUNC 0.00029 267_[17]_1227 MIC4-TRUNC 0.00045 1176_[17]_318 GRA7-TRUNC 0.0006 890_[17]_604 Calmodulin-TRUNC 0.00075 679_[17]_815 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 17 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 17 width=6 seqs=16 MIC11-PRO-TRUNC ( 96) CAAAAG 1 GRA2pre-TRUNC ( 834) CAAAAG 1 MIC6-TRUNC ( 329) CAAAAG 1 MIC5-TRUNC ( 1433) CAAAAG 1 MIC2-TRUNC ( 520) CAAAAG 1 cAMP_kinase_reg-trunc ( 658) CAAAAG 1 Camp-kin-cat-trunc ( 680) CAAAAG 1 PDIpro-trunc ( 1057) CAAAAG 1 GAPDH_TRUNC ( 1440) CAAAAG 1 G6PD_TRUNC ( 902) CAAAAG 1 RiboP-TRUNCATED ( 442) GAAAAG 1 ROP9-TRUNC ( 315) GAAAAG 1 PK4-TRUNC ( 268) GAAAAG 1 MIC4-TRUNC ( 1177) CAAAAC 1 GRA7-TRUNC ( 891) TAAAAG 1 Calmodulin-TRUNC ( 680) GAAAAC 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 17 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 
4 w= 6 n= 23871 bayes= 10.542 E= 8.9e+006 -1064 136 2 -209 219 -1064 -1064 -1064 219 -1064 -1064 -1064 219 -1064 -1064 -1064 219 -1064 -1064 -1064 -1064 -110 183 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 17 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 6 n= 23871 E= 8.9e+006 0.000136 0.687238 0.249998 0.062628 0.999512 0.000168 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.000136 0.125090 0.874607 0.000167 -------------------------------------------------------------------------------- Time 670.90 secs. ******************************************************************************** ******************************************************************************** MOTIF 18 width = 6 sites = 16 llr = 122 E-value = 2.6e+007 ******************************************************************************** -------------------------------------------------------------------------------- Motif 18 Description -------------------------------------------------------------------------------- Simplified A 3:aa8a pos.-specific C ::::1: probability G :a:::: matrix T 7:::1: bits 2.2 ** * 2.0 *** * 1.8 *** * 1.5 *** * Information 1.3 ***** content 1.1 ****** (11.0 bits) 0.9 ****** 0.7 ****** 0.4 ****** 0.2 ****** 0.0 ------ Multilevel TGAAAA consensus A sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 18 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------ MIC11-PRO-TRUNC 517 1.49e-04 CCCTACGTTT TGAAAA 
GACTGCCAGT GRA2pre-TRUNC 535 1.49e-04 GTTATGTTAC TGAAAA GTATAGTCAA GRA7-TRUNC 979 1.49e-04 CAGGCGTGGC TGAAAA TCCTCAGCCA ROP9-TRUNC 303 1.49e-04 CTCTCTCCCC TGAAAA CGGTGGGAAA MIC5-TRUNC 1206 1.49e-04 GCTAACTGTG TGAAAA TTCAGTGACC MIC2-TRUNC 621 1.49e-04 ACGCAGAAAC TGAAAA TAACAAGTTT Calmodulin-TRUNC 1305 1.49e-04 GGCTCTTTTC TGAAAA ATTCACATGA PK4-TRUNC 255 1.49e-04 CCGCGTTCTC TGAAAA CGCAACGGAA RiboP-TRUNCATED 172 2.72e-04 ACGGCACCCA AGAAAA ATACCGCTCT cAMP_kinase_reg-trunc 1140 2.72e-04 TCAGCGAGCG AGAAAA GGCTGGTCGA PDIpro-trunc 1234 2.72e-04 CGTTTTCCCA AGAAAA TTCGCGGGCG GAPDH_TRUNC 416 2.72e-04 TGAATTCATT AGAAAA CTAGTACTAC G6PD_TRUNC 326 2.72e-04 AACGAACACC AGAAAA GACGGGCAAG MIC4-TRUNC 957 4.56e-04 CGATGCCGCG TGAACA TGGCGTCCCC Camp-kin-cat-trunc 1188 4.56e-04 TTCTTCTTCT TGAACA GCGGCTCGTC MIC6-TRUNC 620 7.89e-04 CCTCATCACG TGAATA CACGCTGCGT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 18 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 0.00015 516_[18]_978 GRA2pre-TRUNC 0.00015 534_[18]_960 GRA7-TRUNC 0.00015 978_[18]_516 ROP9-TRUNC 0.00015 302_[18]_1192 MIC5-TRUNC 0.00015 1205_[18]_289 MIC2-TRUNC 0.00015 620_[18]_874 Calmodulin-TRUNC 0.00015 1304_[18]_190 PK4-TRUNC 0.00015 254_[18]_1240 RiboP-TRUNCATED 0.00027 171_[18]_1324 cAMP_kinase_reg-trunc 0.00027 1139_[18]_355 PDIpro-trunc 0.00027 1233_[18]_261 GAPDH_TRUNC 0.00027 415_[18]_1079 G6PD_TRUNC 0.00027 325_[18]_1169 MIC4-TRUNC 0.00046 956_[18]_538 Camp-kin-cat-trunc 0.00046 1187_[18]_257 MIC6-TRUNC 0.00079 619_[18]_875 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 18 in BLOCKS format 
-------------------------------------------------------------------------------- BL MOTIF 18 width=6 seqs=16 MIC11-PRO-TRUNC ( 517) TGAAAA 1 GRA2pre-TRUNC ( 535) TGAAAA 1 GRA7-TRUNC ( 979) TGAAAA 1 ROP9-TRUNC ( 303) TGAAAA 1 MIC5-TRUNC ( 1206) TGAAAA 1 MIC2-TRUNC ( 621) TGAAAA 1 Calmodulin-TRUNC ( 1305) TGAAAA 1 PK4-TRUNC ( 255) TGAAAA 1 RiboP-TRUNCATED ( 172) AGAAAA 1 cAMP_kinase_reg-trunc ( 1140) AGAAAA 1 PDIpro-trunc ( 1234) AGAAAA 1 GAPDH_TRUNC ( 416) AGAAAA 1 G6PD_TRUNC ( 326) AGAAAA 1 MIC4-TRUNC ( 957) TGAACA 1 Camp-kin-cat-trunc ( 1188) TGAACA 1 MIC6-TRUNC ( 620) TGAATA 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 18 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 6 n= 23871 bayes= 10.542 E= 2.6e+007 52 -1064 -1064 137 -1064 -1064 202 -1064 219 -1064 -1064 -1064 219 -1064 -1064 -1064 189 -110 -1064 -209 219 -1064 -1064 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 18 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 6 n= 23871 E= 2.6e+007 0.312441 0.000168 0.000154 0.687237 0.000136 0.000168 0.999529 0.000167 0.999512 0.000168 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 0.812129 0.125090 0.000154 0.062628 0.999512 0.000168 0.000154 0.000167 -------------------------------------------------------------------------------- Time 708.81 secs. 
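The letter-probability matrix for Motif 18 can be approximately reproduced by counting letters column by column in the 16 BLOCKS-format sites. The sketch below does the plain counting, assuming A, C, G, T column order. The printed values differ slightly from the raw fractions (for example 0.312441 rather than 5/16 = 0.3125), presumably because MEME adds a small pseudocount; that adjustment is not modelled here. The name column_frequencies is hypothetical.

```python
from collections import Counter

# The 16 Motif 18 sites from the BLOCKS section above, grouped by sequence.
SITES = ["TGAAAA"] * 8 + ["AGAAAA"] * 5 + ["TGAACA"] * 2 + ["TGAATA"]


def column_frequencies(sites, alphabet="ACGT"):
    """Return one row per motif position with the raw letter fractions."""
    total = len(sites)
    width = len(sites[0])
    rows = []
    for position in range(width):
        counts = Counter(site[position] for site in sites)
        rows.append([counts[letter] / total for letter in alphabet])
    return rows


if __name__ == "__main__":
    for row in column_frequencies(SITES):
        print(" ".join("%.4f" % value for value in row))
```

Position 1 comes out as A = 0.3125, T = 0.6875 and position 5 as A = 0.8125, C = 0.125, T = 0.0625, close to the first and fifth rows of the matrix that MEME reports.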
******************************************************************************** ******************************************************************************** MOTIF 19 width = 8 sites = 16 llr = 133 E-value = 5.4e+006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 19 Description -------------------------------------------------------------------------------- Simplified A 823:8a:9 pos.-specific C 17:1::a: probability G 1::91::1 matrix T :17:1::: bits 2.2 * 2.0 ** 1.8 * *** 1.5 * *** Information 1.3 * ***** content 1.1 * ****** (12.0 bits) 0.9 * ****** 0.7 ******** 0.4 ******** 0.2 ******** 0.0 -------- Multilevel ACTGAACA consensus A sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 19 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- -------- cAMP_kinase_reg-trunc 394 1.08e-05 CGTGGCGTTC ACTGAACA CCGACTTGCG PK4-TRUNC 765 1.96e-05 CCCTGGGGAA ACAGAACA CCCAGGGGTC G6PD_TRUNC 1420 1.96e-05 TCCGATTTAA ACAGAACA TTTTGACGCC GRA2pre-TRUNC 1338 2.84e-05 AGAGACGCAA AATGAACA GCGGAACCTG GAPDH_TRUNC 447 2.84e-05 GAGCTGTAAA AATGAACA ACATTCTGTG GRA7-TRUNC 634 4.63e-05 ATCGATCTCT ATTGAACA ACTTCTGAGT MIC5-TRUNC 822 4.63e-05 AAAAAAATGC ATTGAACA AGAGCCGCTT Camp-kin-cat-trunc 1013 4.63e-05 AATAGACGGG AAAGAACA GGCTCACTGA MIC11-PRO-TRUNC 610 5.84e-05 TGCTCGAGGT GCTGAACA CTCGCCTTTT RiboP-TRUNCATED 1099 5.84e-05 CTATCTCCTC GCTGAACA CAGGAGTCTG MIC2-TRUNC 488 7.16e-05 TCTAAATGAG ACTGTACA AGCTGACGAC MIC6-TRUNC 561 1.02e-04 AGAAAACACT ACTGGACA ACCATTCGGT ROP9-TRUNC 662 1.50e-04 GGTATTGAAA CCTGAACA GGAAAATGAG MIC4-TRUNC 718 1.50e-04 GCGACCATTA ACTGAACG ACTAAGCAGG Calmodulin-TRUNC 481 1.50e-04 GTGAAACGAA ACAGTACA GTGTGTGTCC 
PDIpro-trunc 463 2.11e-04 ACAGAAAACC ACACAACA AGGCGTACAT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 19 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- cAMP_kinase_reg-trunc 1.1e-05 393_[19]_1099 PK4-TRUNC 2e-05 764_[19]_728 G6PD_TRUNC 2e-05 1419_[19]_73 GRA2pre-TRUNC 2.8e-05 1337_[19]_155 GAPDH_TRUNC 2.8e-05 446_[19]_1046 GRA7-TRUNC 4.6e-05 633_[19]_859 MIC5-TRUNC 4.6e-05 821_[19]_671 Camp-kin-cat-trunc 4.6e-05 1012_[19]_430 MIC11-PRO-TRUNC 5.8e-05 609_[19]_883 RiboP-TRUNCATED 5.8e-05 1098_[19]_395 MIC2-TRUNC 7.2e-05 487_[19]_1005 MIC6-TRUNC 0.0001 560_[19]_932 ROP9-TRUNC 0.00015 661_[19]_831 MIC4-TRUNC 0.00015 717_[19]_775 Calmodulin-TRUNC 0.00015 480_[19]_1012 PDIpro-trunc 0.00021 462_[19]_1030 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 19 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 19 width=8 seqs=16 cAMP_kinase_reg-trunc ( 394) ACTGAACA 1 PK4-TRUNC ( 765) ACAGAACA 1 G6PD_TRUNC ( 1420) ACAGAACA 1 GRA2pre-TRUNC ( 1338) AATGAACA 1 GAPDH_TRUNC ( 447) AATGAACA 1 GRA7-TRUNC ( 634) ATTGAACA 1 MIC5-TRUNC ( 822) ATTGAACA 1 Camp-kin-cat-trunc ( 1013) AAAGAACA 1 MIC11-PRO-TRUNC ( 610) GCTGAACA 1 RiboP-TRUNCATED ( 1099) GCTGAACA 1 MIC2-TRUNC ( 488) ACTGTACA 1 MIC6-TRUNC ( 561) ACTGGACA 1 ROP9-TRUNC ( 662) CCTGAACA 1 MIC4-TRUNC ( 718) ACTGAACG 1 Calmodulin-TRUNC ( 481) ACAGTACA 1 PDIpro-trunc ( 463) ACACAACA 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 19 position-specific scoring matrix 
-------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 8 n= 23839 bayes= 10.5401 E= 5.4e+006 189 -210 -98 -1064 -22 136 -1064 -109 52 -1064 -1064 137 -1064 -210 193 -1064 189 -1064 -198 -109 219 -1064 -1064 -1064 -1064 190 -1064 -1064 210 -1064 -198 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 19 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 8 n= 23839 E= 5.4e+006 0.812129 0.062629 0.125076 0.000167 0.187519 0.687238 0.000154 0.125088 0.312441 0.000168 0.000154 0.687237 0.000136 0.062629 0.937068 0.000167 0.812129 0.000168 0.062615 0.125088 0.999512 0.000168 0.000154 0.000167 0.000136 0.999543 0.000154 0.000167 0.937051 0.000168 0.062615 0.000167 -------------------------------------------------------------------------------- Time 745.93 secs. 
******************************************************************************** ******************************************************************************** MOTIF 20 width = 6 sites = 16 llr = 121 E-value = 8.9e+007 ******************************************************************************** -------------------------------------------------------------------------------- Motif 20 Description -------------------------------------------------------------------------------- Simplified A :::8:a pos.-specific C a:::a: probability G :a1::: matrix T ::93:: bits 2.2 * 2.0 ** ** 1.8 ** ** 1.5 *** ** Information 1.3 ****** content 1.1 ****** (10.9 bits) 0.9 ****** 0.7 ****** 0.4 ****** 0.2 ****** 0.0 ------ Multilevel CGTACA consensus T sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 20 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------ MIC11-PRO-TRUNC 958 2.26e-04 GAGCACCGCA CGTACA GCTCGACGTG RiboP-TRUNCATED 115 2.26e-04 GAGGCGACAA CGTACA GAGCGTCTCA GRA2pre-TRUNC 297 2.26e-04 TAGCTGATTT CGTACA CAGACAGATC ROP9-TRUNC 697 2.26e-04 TTGTCTCTCA CGTACA TCTGTGCGTT MIC6-TRUNC 511 2.26e-04 TTCCGGTGTA CGTACA TCGCGCGACA MIC4-TRUNC 860 2.26e-04 TTCGCCGCTT CGTACA ACGTACACTA MIC2-TRUNC 399 2.26e-04 CACTGTTACT CGTACA CTAGCTTCAT Calmodulin-TRUNC 1442 2.26e-04 TGGTGGTTAG CGTACA CATACACTTC cAMP_kinase_reg-trunc 1026 2.26e-04 CTCCGGTCTA CGTACA CCCCGTAGCA Camp-kin-cat-trunc 1100 2.26e-04 TATGCCGTTG CGTACA ATGTCGGTGA PDIpro-trunc 474 2.26e-04 CACAACAAGG CGTACA TTCCTTCCGC MIC5-TRUNC 328 5.02e-04 GCAAGTGTTC CGTTCA CTTTCCTTTA PK4-TRUNC 1167 5.02e-04 CCACTGCGCA CGTTCA TCTTTGACGT GAPDH_TRUNC 745 5.02e-04 AAGTTCTTTG CGTTCA GGCCGTCCCT G6PD_TRUNC 1397 5.02e-04 GCAGTTGCCT CGTTCA TCGTGTGTCC GRA7-TRUNC 834 7.11e-04 GGTAGAATTT CGGACA 
GTGAACCATT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 20 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- MIC11-PRO-TRUNC 0.00023 957_[20]_537 RiboP-TRUNCATED 0.00023 114_[20]_1381 GRA2pre-TRUNC 0.00023 296_[20]_1198 ROP9-TRUNC 0.00023 696_[20]_798 MIC6-TRUNC 0.00023 510_[20]_984 MIC4-TRUNC 0.00023 859_[20]_635 MIC2-TRUNC 0.00023 398_[20]_1096 Calmodulin-TRUNC 0.00023 1441_[20]_53 cAMP_kinase_reg-trunc 0.00023 1025_[20]_469 Camp-kin-cat-trunc 0.00023 1099_[20]_345 PDIpro-trunc 0.00023 473_[20]_1021 MIC5-TRUNC 0.0005 327_[20]_1167 PK4-TRUNC 0.0005 1166_[20]_328 GAPDH_TRUNC 0.0005 744_[20]_750 G6PD_TRUNC 0.0005 1396_[20]_98 GRA7-TRUNC 0.00071 833_[20]_661 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 20 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 20 width=6 seqs=16 MIC11-PRO-TRUNC ( 958) CGTACA 1 RiboP-TRUNCATED ( 115) CGTACA 1 GRA2pre-TRUNC ( 297) CGTACA 1 ROP9-TRUNC ( 697) CGTACA 1 MIC6-TRUNC ( 511) CGTACA 1 MIC4-TRUNC ( 860) CGTACA 1 MIC2-TRUNC ( 399) CGTACA 1 Calmodulin-TRUNC ( 1442) CGTACA 1 cAMP_kinase_reg-trunc ( 1026) CGTACA 1 Camp-kin-cat-trunc ( 1100) CGTACA 1 PDIpro-trunc ( 474) CGTACA 1 MIC5-TRUNC ( 328) CGTTCA 1 PK4-TRUNC ( 1167) CGTTCA 1 GAPDH_TRUNC ( 745) CGTTCA 1 G6PD_TRUNC ( 1397) CGTTCA 1 GRA7-TRUNC ( 834) CGGACA 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 20 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds 
matrix: alength= 4 w= 6 n= 23871 bayes= 10.542 E= 8.9e+007 -1064 190 -1064 -1064 -1064 -1064 202 -1064 -1064 -1064 -198 181 178 -1064 -1064 -9 -1064 190 -1064 -1064 219 -1064 -1064 -1064 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 20 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 6 n= 23871 E= 8.9e+007 0.000136 0.999543 0.000154 0.000167 0.000136 0.000168 0.999529 0.000167 0.000136 0.000168 0.062615 0.937081 0.749668 0.000168 0.000154 0.250010 0.000136 0.999543 0.000154 0.000167 0.999512 0.000168 0.000154 0.000167 -------------------------------------------------------------------------------- Time 783.41 secs. ******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- G6PD_TRUNC 4.27e-13 10_[3(2.10e-05)]_36_[2(3.85e-05)]_150_[14(2.30e-06)]_149_[5(2.69e-05)]_118_[7(3.66e-06)]_25_[6(4.33e-05)]_61_[8(7.05e-07)]_97_[10(4.94e-07)]_75_[16(2.70e-05)]_46_[11(2.40e-05)]_27_[1(1.30e-06)]_16_[9(4.44e-05)]_320_[12(2.40e-05)]_1_[4(6.14e-06)]_134_[19(1.96e-05)]_73 GAPDH_TRUNC 1.20e-13 
58_[4(3.00e-05)]_150_[6(1.78e-06)]_88_[4(3.92e-06)]_117_[19(2.84e-05)]_38_[10(5.91e-05)]_139_[14(1.25e-05)]_8_[3(3.47e-06)]_110_[2(5.32e-05)]_30_[8(4.22e-08)]_18_[16(2.79e-08)]_85_[9(6.59e-05)]_25_[1(1.31e-05)]_13_[5(8.60e-05)]_17_[7(1.56e-05)]_222_[9(1.69e-05)]_214 PDIpro-trunc 3.57e-11 74_[7(1.18e-06)]_14_[3(3.47e-06)]_57_[8(1.15e-05)]_139_[12(9.39e-05)]_23_[1(1.56e-06)]_85_[5(4.62e-05)]_219_[16(4.93e-07)]_33_[2(1.96e-05)]_138_[12(4.59e-05)]_31_[4(3.00e-05)]_18_[10(4.16e-05)]_42_[14(1.13e-05)]_122_[6(1.19e-05)]_35_[9(8.13e-05)]_39_[11(7.19e-05)]_269 Camp-kin-cat-trunc 1.24e-08 106_[10(1.92e-05)]_15_[4(1.98e-05)]_69_[8(4.50e-06)]_83_[2(9.02e-05)]_149_[6(3.97e-05)]_58_[16(1.10e-05)]_95_[7(1.56e-05)]_61_[3(3.77e-06)]_131_[14(5.38e-05)]_137_[19(4.63e-05)]_125_[9(3.21e-05)]_240_[5(1.81e-05)]_46 cAMP_kinase_reg-trunc 5.01e-13 122_[5(3.91e-05)]_135_[9(1.69e-05)]_15_[6(3.46e-05)]_91_[19(1.08e-05)]_27_[2(5.84e-06)]_167_[8(5.98e-06)]_108_[7(6.40e-05)]_52_[14(9.15e-07)]_93_[16(1.91e-06)]_92_[11(7.19e-05)]_56_[12(2.40e-05)]_50_[10(2.62e-07)]_73_[1(1.65e-05)]_88_[4(4.80e-06)]_93_[11(7.19e-05)]_29_[3(2.10e-05)]_39 PK4-TRUNC 1.82e-10 6_[10(4.80e-05)]_18_[1(4.81e-06)]_69_[7(5.16e-06)]_17_[14(2.98e-05)]_23_[16(5.99e-05)]_239_[9(5.44e-06)]_33_[2(5.84e-06)]_19_[8(5.23e-07)]_24_[6(3.46e-05)]_23_[1(4.76e-05)]_49_[3(4.79e-05)]_29_[10(1.92e-05)]_40_[12(2.40e-05)]_26_[19(1.96e-05)]_236_[12(9.39e-05)]_69_[4(1.34e-05)]_372_[5(8.60e-05)]_24 Calmodulin-TRUNC 3.11e-10 25_[10(1.74e-05)]_331_[1(2.55e-05)]_39_[3(2.27e-05)]_25_[2(5.96e-05)]_6_[14(8.56e-05)]_118_[6(2.46e-05)]_4_[12(9.39e-05)]_134_[16(2.89e-05)]_121_[7(2.18e-05)]_195_[9(1.69e-05)]_92_[9(4.17e-05)]_69_[8(1.16e-09)]_94_[4(4.20e-07)]_98 MIC2-TRUNC 1.20e-14 
63_[16(5.36e-05)]_47_[2(5.84e-06)]_23_[8(1.97e-07)]_23_[16(1.34e-06)]_104_[7(1.14e-05)]_159_[19(7.16e-05)]_117_[5(9.96e-06)]_257_[9(6.97e-06)]_32_[4(1.07e-05)]_136_[10(9.17e-05)]_45_[5(8.60e-05)]_93_[3(1.64e-05)]_46_[1(2.13e-07)]_51_[12(4.59e-05)]_1_[14(1.98e-07)]_50_[6(2.10e-05)]_76 MIC4-TRUNC 3.12e-08 178_[16(5.99e-05)]_135_[8(1.13e-06)]_22_[14(1.69e-06)]_64_[1(4.81e-06)]_15_[7(2.62e-05)]_31_[10(6.75e-05)]_109_[6(1.77e-05)]_114_[4(5.36e-05)]_224_[5(6.70e-05)]_156_[3(1.44e-05)]_294_[4(1.98e-05)]_9_[9(2.34e-05)]_8 MIC5-TRUNC 1.56e-15 31_[16(1.19e-06)]_[3(1.37e-06)]_98_[7(1.44e-05)]_2_[2(9.02e-05)]_135_[10(1.51e-06)]_250_[8(1.03e-08)]_89_[9(3.14e-06)]_121_[2(3.85e-05)]_1_[19(4.63e-05)]_26_[7(8.78e-05)]_13_[1(3.56e-06)]_9_[12(9.39e-05)]_66_[14(1.04e-05)]_57_[6(1.01e-05)]_[11(2.40e-05)]_51_[5(9.96e-06)]_31_[12(2.40e-05)]_192_[11(7.19e-05)]_51_[4(3.76e-05)]_23_[3(9.51e-06)]_46 MIC6-TRUNC 7.79e-11 16_[8(1.46e-05)]_449_[7(7.36e-07)]_45_[11(2.40e-05)]_144_[12(9.39e-05)]_182_[9(5.46e-07)]_30_[3(1.64e-05)]_154_[2(6.76e-05)]_99_[1(2.16e-05)]_16_[10(9.17e-05)]_163_[16(1.19e-05)]_31_[4(1.74e-06)]_34_[6(6.97e-07)]_2 ROP9-TRUNC 9.45e-11 37_[10(5.15e-05)]_32_[8(1.47e-06)]_325_[2(6.76e-05)]_16_[5(9.96e-06)]_181_[7(7.97e-05)]_97_[16(2.36e-05)]_20_[7(7.99e-06)]_288_[1(4.76e-05)]_1_[1(1.06e-06)]_22_[9(2.25e-06)]_144_[4(1.68e-05)]_24_[11(7.19e-05)]_15_[14(5.42e-07)]_121_[3(5.09e-05)]_[6(9.32e-05)]_9 GRA7-TRUNC 4.94e-12 56_[14(7.48e-05)]_68_[11(7.19e-05)]_203_[3(3.72e-07)]_132_[2(1.30e-05)]_2_[9(4.49e-06)]_11_[8(2.64e-06)]_70_[8(8.65e-05)]_4_[19(4.63e-05)]_307_[5(9.96e-06)]_20_[5(8.60e-05)]_47_[10(3.80e-05)]_86_[6(1.48e-05)]_63_[1(2.13e-07)]_59_[4(4.06e-05)]_109_[7(1.44e-05)]_36_[16(2.99e-06)]_46 GRA2pre-TRUNC 4.22e-13 238_[14(4.05e-06)]_1_[2(5.84e-06)]_56_[5(3.91e-05)]_381_[16(2.99e-06)]_79_[10(1.51e-06)]_6_[8(2.85e-06)]_241_[7(2.11e-06)]_31_[9(2.25e-06)]_196_[1(2.74e-05)]_3_[19(2.84e-05)]_7_[6(3.80e-05)]_40_[3(8.86e-06)]_21_[4(3.05e-06)]_54 RiboP-TRUNCATED 7.88e-11 
[5(9.96e-06)]_178_[16(6.66e-05)]_74_[11(7.19e-05)]_88_[11(2.40e-05)]_59_[5(1.81e-05)]_178_[7(9.30e-05)]_8_[14(5.38e-05)]_3_[4(3.76e-05)]_43_[8(3.46e-07)]_51_[16(3.31e-06)]_145_[10(2.92e-06)]_9_[9(7.67e-06)]_69_[1(5.17e-05)]_46_[19(5.84e-05)]_10_[2(5.96e-05)]_56_[12(2.40e-05)]_155_[3(8.86e-06)]_53_[6(1.12e-05)]_[11(7.19e-05)]_75 MIC11-PRO-TRUNC 1.11e-10 428_[7(1.04e-05)]_170_[19(5.84e-05)]_13_[12(9.39e-05)]_21_[2(4.66e-05)]_68_[10(2.06e-05)]_279_[9(1.25e-05)]_21_[11(2.40e-05)]_21_[8(4.84e-06)]_21_[14(2.30e-06)]_2_[5(5.71e-05)]_24_[4(3.39e-05)]_13_[11(7.19e-05)]_103_[6(3.33e-07)]_56_[9(2.34e-05)]_55_[16(2.99e-06)]_46 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 20 reached. ******************************************************************************** CPU: mnode012 ******************************************************************************** From shikida at gmail.com Fri Jun 24 10:59:46 2005 From: shikida at gmail.com (Leonardo Kenji Shikida) Date: Sat Jun 25 20:51:39 2005 Subject: [Bioperl-l] problems during installation Message-ID: <1b4364400506240759372ff1af@mail.gmail.com> trying to install using perl -MCPAN -e "install Bundle::BioPerl" in a suse 9.3 box how should I proceed? >>>>>>>> CPAN.pm: Going to build L/LD/LDS/GD-2.19.tar.gz NOTICE: This module requires libgd 2.0.28 or higher. it will NOT work with earlier versions. If you are getting compile or link errors, then please get and install a new version of libgd from www.boutell.com. Do NOT ask Lincoln for help until you try this. If you are using Math::Trig 1.01 or lower, it has a bug that causes a "prerequisite not found" warning to be issued. You may safely ignore this warning. Type perl Makefile.PL -h for command-line option summary Configuring for libgd version 2.0.32. 
Included Features: GD_XPM GD_JPEG GD_FONTCONFIG GD_FREETYPE GD_PNG GD_G IF GD library used from: /usr If you experience compile problems, please check the @INC, @LIBPATH and @LIBS arrays defined in Makefile.PL and manually adjust, if necessary. Checking if your kit is complete... Looks good Writing Makefile for GD cp GD/Polyline.pm blib/lib/GD/Polyline.pm cp qd.pl blib/lib/qd.pl cp GD.pm blib/lib/GD.pm AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD) cp GD/Simple.pm blib/lib/GD/Simple.pm /usr/bin/perl /usr/lib/perl5/5.8.6/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8. 6/ExtUtils/typemap -typemap typemap GD.xs > GD.xsc && mv GD.xsc GD.c cc -c -I/usr/include -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -marc h=i586 -mcpu=i686 -fmessage-length=0 -Wall -g -Wall -pipe -DVERSION=\"2.19\" - DXS_VERSION=\"2.19\" -fPIC "-I/usr/lib/perl5/5.8.6/i586-linux-thread-multi/CORE" -DHAVE_JPEG -DHAVE_FT -DHAVE_XPM -DHAVE_GIF -DHAVE_PNG -DHAVE_FONTCONFIG GD.c GD.xs:7:16: gd.h: No such file or directory GD.xs:8:21: gdfontg.h: No such file or directory GD.xs:9:21: gdfontl.h: No such file or directory GD.xs:10:22: gdfontmb.h: No such file or directory -- [] Kenji _______________________ http://kenjiria.blogspot.com http://gaitabh.blogspot.com From hlapp at gmx.net Sun Jun 26 15:14:59 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Jun 26 15:07:15 2005 Subject: [Bioperl-l] Fwd: Re: Bio::DB::BioDB In-Reply-To: <37588.83.151.196.59.1119681697.squirrel@webmail.ebi.ac.uk> References: <37588.83.151.196.59.1119681697.squirrel@webmail.ebi.ac.uk> Message-ID: Madhuri, please do NOT send personal email to Heikki or any other developer with questions or reports concerning any software module. Otherwise you risk that your emails remain unanswered. There's many people on the mailing list who can possibly help you out. As for your report, I fixed the cause for this in bioperl-db. 
You need to update from cvs, or download the tarball and re-install. The change is in a single file, namely Bio/DB/Persistent/PersistentObject.pm. The new version should have the revision line $Id: PersistentObject.pm,v 1.5 2005/06/26 17:38:22 lapp Exp $ at the top of the file. -hilmar On Jun 25, 2005, at 2:41 AM, lehvasla@ebi.ac.uk wrote: > > From: madhuri battu > Date: June 24, 2005 4:33:46 PM EDT > To: Heikki Lehv?slaiho > Subject: Re: Bio::DB::BioDB > > > Hi, > I loaded the whole Bioperl through cvs and i tried > to run load_seqdatabase.pl script on my machine and i > am getting the following errors > > Could not store NM_000014: Operation `ne': no method > found, > left argument in overloaded package > Bio::Annotation::Reference, > right argument has no overloaded magic at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm > line534, line 581. > > Can you please help me in this matter. > Thanks, > Madhuri. > --- Heikki Lehvaslaiho wrote: > >> Madhuri, >> >> It should work from CVS. I've forwarded your message >> to the bioperl mailing >> list (bioperl-l@bioperl.org). Please further queries >> and more detailed >> problem reports there. >> >> If you have downloaded bioperl-db from CVS, you need >> to add its root directory >> to your PERL5LIB environmental variable. >> >> > PERL5LIB=/home/heikki/src/bioperl/core:/home/heikki/src/bioperl/db >> >> More generic bioperl install instructions can be >> found at: >> > http://www.ebi.ac.uk/~lehvasla/bioperl/InstallingBioperl.html >> >> >> -Heikki >> >> On Thursday 23 June 2005 19:47, madhuri battu wrote: >>> Hi, >>> My name is Madhuri Battu. I am working at UT >>> Southwestern in US as an intern. I have a problem >> with >>> installing Bio::DB::BioDB. It was saying it is 5 >> years >>> old on downloads page and recently updated on View >> cvs >>> page. Can you please tell whether the module is >> being >>> used and how can i install it. 
If this module is >> not >>> being used what module can be used in place of >> that. >>> Thanks, >>> Madhuri. >>> >>> >>> >>> >>> >>> >>> >> > __________________________________________________________ >>> Free antispam, antivirus and 1GB to save all your >> messages >>> Only in Yahoo! Mail: http://in.mail.yahoo.com >> >> -- >> ______ _/ >> > _/_____________________________________________________ >> _/ _/ >> http://www.ebi.ac.uk/mutations/ >> _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi >> _ac _uk >> _/_/_/_/_/ EMBL Outstation, European >> Bioinformatics Institute >> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton >> _/ _/ _/ Cambridge, CB10 1SD, United Kingdom >> _/ Phone: +44 (0)1223 494 644 FAX: +44 >> (0)1223 494 468 >> ___ >> > _/_/_/_/_/________________________________________________________ >> > > > > > __________________________________________________________ > How much free photo storage do you get? Store your friends 'n family > snaps for FREE with Yahoo! Photos http://in.photos.yahoo.com > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From fetter_fake at hotmail.com Mon Jun 27 05:14:38 2005 From: fetter_fake at hotmail.com (Zack Napalm) Date: Mon Jun 27 05:09:54 2005 Subject: [Bioperl-l] pubmed article download and storing in object Message-ID: Hi, I am new at Bioperl and wondering whether there is a possibility to scrape articles from PubMed automatically, and put them into an Bio::Biblio::PubmedArticle object. I don't want to scrape them via LWP::Simple get and put them into an object manually, if there is a appropriate function predefined. 
regards
johnny

_________________________________________________________________
Looking for e-mails, documents or photos? The new MSN Search Toolbar with
Windows Desktop Search delivers results in seconds. New!
http://desktop.msn.de/ Download it free now!

From khh103 at york.ac.uk Mon Jun 27 06:47:57 2005
From: khh103 at york.ac.uk (Kat Hull)
Date: Mon Jun 27 06:40:38 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42BC2F9E.20607@utk.edu>
References: <42BAF1B7.10109@york.ac.uk> <42BB131A.2090403@utk.edu> <42BC298F.8000100@york.ac.uk> <42BC2F9E.20607@utk.edu>
Message-ID: <42BFD95D.7070001@york.ac.uk>

Hi Stefan,
I'm still having problems!
I get some features from the sequence object but can't get past this
part in the code:

$ac=$feat->annotation();  # This is not returning anything.

Could you look at the code and tell me where it's going wrong?

Thanks again,
#________________________________________
#!/biol/programs/perl580/bin/perl

use Bio::Annotation::Collection;
use Bio::DB::GenPept;
$gb = new Bio::DB::GenPept;

my $seqio = $gb->get_Stream_by_id(['13474692']);
my $ac;
my $c;

while( my $seq = $seqio->next_seq ) {
    print "seq is ", $seq->display_id, "\n";
    my @f=$seq->get_all_SeqFeatures; #This gives you the annotation of the retrieved sequence object

    foreach my $feat (@f) {
        ++$c;
        print "Feature ",$feat->primary_tag," starts ",$feat->start," ends ",
            $feat->end," strand ",$feat->strand,"\n";

        # features retain link to underlying sequence object
        #print "Feature sequence is ",$feat->seq->seq(),"\n";

        my $t = $feat->feature_count;
        print "Feature count is $t\n";
        print "count is $c\n";

        $ac=$feat->annotation(); # PROBLEM SEEMS TO BE HERE

        my $blah=$ac->get_all_annotation_keys();
        print "GETS TO HERE WITH NO KEYS $blah\n";

        foreach $key ( $ac->get_all_annotation_keys() ) {
            print "DOESN'T GET TO HERE\n";
            @values = $ac->get_Annotations($key);
            foreach $value ( @values ) {
                # value is an Bio::AnnotationI, and defines a "as_text" method
                print "Annotation ",$key," stringified value ",$value->as_text,"\n";

                # also defined hash_tree method, which allows data orientated
                # access into this object
                $hash = $value->hash_tree();
            }
        }

        # commented out for now

        # next unless ($ac->get_Annotations('coded_by'));
        # my @coded=$ac->get_Annotations('coded_by');
        # foreach my $location (@coded) {
        #     print $location->value, " is the location that codes this protein\n";
        # }
    }
}

Stefan Kirov wrote:

> my $seqio = $gb->get_Stream_by_id(['13474692']);
> while( my $seq = $seqio->next_seq ) {
>     print "seq is ", $seq->display_id, "\n";
>     my $ann=$seq->annotation; #This gives you the annotation of the retrieved sequence object
>     foreach my $dblink ($ann->get_Annotations('DBLink')) {
>         if ($dblink->database =~/refseq/i) {
>             print $dblink->primary_id, " is the mRNA accession number\n";
>         }
>     }
> }
> However, the gene you are looking at is not associated with any NM_
> sequence, but rather comes from NC_. Therefore the above will not work
> for you. You will have to descend through the sequence features and
> find the feature that says 'coded_by':
> use Bio::DB::GenPept;
> my $gb=new Bio::DB::GenPept;
> my $seqio = $gb->get_Stream_by_id(['13474692']);
> while( my $seq = $seqio->next_seq ) {
>     print "seq is ", $seq->display_id, "\n";
>     my @f=$seq->get_SeqFeatures; #This gives you the annotation of the retrieved sequence object
>     foreach my $feat (@f) {
>         my $ann=$feat->annotation;
>         next unless ($ann->get_Annotations('coded_by'));
>         my @coded=$ann->get_Annotations('coded_by');
>         foreach my $location (@coded) {
>             print $location->value, " is the location that codes this protein\n";
>         }
>     }
> }
> No guarantees the code is typo free :-)
> Stefan
>
> Kat Hull wrote:
>
>> Hi Stefan,
>> Thanks for your advice but I'm still struggling! I have used
>> Bio::DB::GenPept to get the protein accession number given the
>> protein gi number. However, I don't understand how
>> Bio::Annotation::DBLink works.
Does it fetch the url of a link on >> the web-site? Basically, if I could use this (or something else) to >> get the url of the CDS link for my protein of interest, I can get the >> corresponding nucleotide accession from this, as it is encoded in the >> url. >> Do you know how to use this module? Is this what you were suggesting >> I try yesterday (I didn't really understand what you were getting at). >> Many thanks, >> >> Kat >> >> ps. Here's where i'm at so far: >> >> >> use Bio::Annotation::DBLink; >> use Bio::DB::GenBank; >> use Bio::DB::GenPept; >> $gb = new Bio::DB::GenPept; >> >> >> # given the gi number, this returns the accession >> my $seqio = $gb->get_Stream_by_id(['13474692']); >> while( my $seq = $seqio->next_seq ) { >> print "seq is ", $seq->display_id, "\n"; >> } >> # not sure what i'm doing here >> >> >> $link2 = new Bio::Annotation::DBLink(); >> $link2->database('dbSNP'); >> $link2->primary_id('2367'); >> >> >> >> >> >> Stefan Kirov wrote: >> >>> Kat, >>> If you are familiar with Bioperl it is kind of easy- >>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on how >>> to get the protein record >>> Go through the dblinks and find the appropriate accession number >>> (where the database method returns GenBank). >>> Then retrieve this accession number(s) through Bio::DB::GenBank. If >>> you are not familiar with Bioperl- read the docs for >>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and >>> Bio::Annotation::DBLink). >>> Hope this helps, >>> Stefan >>> >>> Kat Hull wrote: >>> >>>> Hi there, >>>> I was wondering whether anyone has a solution to my problem. I have >>>> a list of protein assession numbers and want to retrieve the >>>> corresponding nucleotide sequences automatically. I thought it >>>> would be possible to do this by changing the NCBI url, but this >>>> doesn't seem to be the case. >>>> Is there a bio-perl module that can do this? 
>>>> >>>> Kind regards, >>>> Kat >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>> >>> Stefan >> >> >> >> >> > From lifei03 at gmail.com Mon Jun 27 07:14:40 2005 From: lifei03 at gmail.com (Frank Lee) Date: Mon Jun 27 07:06:09 2005 Subject: [Bioperl-l] problems during installation In-Reply-To: <1b4364400506240759372ff1af@mail.gmail.com> References: <1b4364400506240759372ff1af@mail.gmail.com> Message-ID: <42BFDFA0.1010609@gmail.com> Just guess, I believe the answer is given when you configing as this : If you experience compile problems, please check the @INC, @LIBPATH and @LIBS arrays defined in Makefile.PL and manually adjust, if necessary. For me, I will locate these files and find where they are and add them to the @INC or @LIBPATH, etc. Wish you luck \ Leonardo Kenji Shikida wrote: >trying to install using perl -MCPAN -e "install Bundle::BioPerl" in a >suse 9.3 box > >how should I proceed? > > > > > CPAN.pm: Going to build L/LD/LDS/GD-2.19.tar.gz > >NOTICE: This module requires libgd 2.0.28 or higher. > it will NOT work with earlier versions. If you are getting > compile or link errors, then please get and install a new > version of libgd from www.boutell.com. Do NOT ask Lincoln > for help until you try this. > > If you are using Math::Trig 1.01 or lower, it has a bug that > causes a "prerequisite not found" warning to be issued. You may > safely ignore this warning. > > Type perl Makefile.PL -h for command-line option summary > >Configuring for libgd version 2.0.32. >Included Features: GD_XPM GD_JPEG GD_FONTCONFIG GD_FREETYPE GD_PNG GD_G >IF >GD library used from: /usr > >If you experience compile problems, please check the @INC, @LIBPATH and @LIBS >arrays defined in Makefile.PL and manually adjust, if necessary. > >Checking if your kit is complete... 
>Looks good >Writing Makefile for GD >cp GD/Polyline.pm blib/lib/GD/Polyline.pm >cp qd.pl blib/lib/qd.pl >cp GD.pm blib/lib/GD.pm >AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD) >cp GD/Simple.pm blib/lib/GD/Simple.pm >/usr/bin/perl /usr/lib/perl5/5.8.6/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8. >6/ExtUtils/typemap -typemap typemap GD.xs > GD.xsc && mv GD.xsc GD.c >cc -c -I/usr/include -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -marc >h=i586 -mcpu=i686 -fmessage-length=0 -Wall -g -Wall -pipe -DVERSION=\"2.19\" - >DXS_VERSION=\"2.19\" -fPIC "-I/usr/lib/perl5/5.8.6/i586-linux-thread-multi/CORE" > -DHAVE_JPEG -DHAVE_FT -DHAVE_XPM -DHAVE_GIF -DHAVE_PNG -DHAVE_FONTCONFIG GD.c >GD.xs:7:16: gd.h: No such file or directory >GD.xs:8:21: gdfontg.h: No such file or directory >GD.xs:9:21: gdfontl.h: No such file or directory >GD.xs:10:22: gdfontmb.h: No such file or directory > > > > From Richard.Adams at ed.ac.uk Mon Jun 27 09:56:46 2005 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Mon Jun 27 09:48:13 2005 Subject: [Bioperl-l] Bio::DB::CUTG behaving weird Message-ID: <42C0059E.1050708@ed.ac.uk> Stefan, I'm looking into it although I don't have access to either of those OSs. Does the test script (t/DBCUTG.t) run OK on both systems? I imagine the error is coming from the fact that the parser is trying to parse the mitochondrial table and I will try to sort out why that isn't parsing properly. Basically all the DBCUTG module does to cope with a non-unique species is just select the first in the list based on regexp matching, this is usually the most likely choice. I'll put in some hopefully more challenging tests in the test script to help find the bug. If you can run your script with $db->verbose(1) set (using the CVS version) and send me the output I can look into it more... 
Best wishes,

Richard

--
Dr Richard Adams
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU

Tel: 44 131 651 1084
richard.adams@ed.ac.uk

From skirov at utk.edu Mon Jun 27 10:20:01 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Jun 27 10:13:02 2005
Subject: [Bioperl-l] Bio::DB::CUTG behaving weird
In-Reply-To: <42C0059E.1050708@ed.ac.uk>
References: <42C0059E.1050708@ed.ac.uk>
Message-ID: <42C00B11.3040300@utk.edu>

I am trying to debug it right now and have had some limited success. I
will let you know what may be wrong as soon as I can. By the way, do you
mind adding some simple functionality that may be useful to this module,
such as a 3-way table consisting of AA name, symbol and letter, as in:
Alanine=>Ala=>A. If you don't have the time I can add this as well (if
you consider it useful, of course).
Stefan

Richard Adams wrote:

> Stefan,
> I'm looking into it although I don't have access to either of those OSs.
> Does the test script (t/DBCUTG.t) run OK on both systems?
>
> I imagine the error is coming from the fact that the parser is trying
> to parse the mitochondrial table
> and I will try to sort out why that isn't parsing properly. Basically
> all the DBCUTG module does to cope
> with a non-unique species is just select the first in the list based
> on regexp matching, this is usually the most likely choice.
>
> I'll put in some hopefully more challenging tests in the test script
> to help find the bug. If you can run your script with
> $db->verbose(1) set (using the CVS version) and send me the output I
> can look into it more...
>
> Best wishes,
>
> Richard
>

From lifei03 at gmail.com Mon Jun 27 11:06:38 2005
From: lifei03 at gmail.com (Frank Lee)
Date: Mon Jun 27 10:57:46 2005
Subject: [Bioperl-l] convert Refseq ID to another ID can be used for Gene ontology?
Message-ID: <42C015FE.6060305@gmail.com>

Dear All,

I have run into a problem using Gene Ontology.
I have finished my protein analysis using data from the RefSeq database.
But when I tried to analyze the protein functions further with GO, I
found that RefSeq IDs are unfortunately not supported :-(. Does anybody
have experience converting RefSeq IDs to other IDs that can be used
with GO? My protein sequences cover Homo, Mus, Arabidopsis, Oryza,
Drosophila and C. elegans. It seems the IDs used for GO differ between
species. Anyway, I will try them one by one.

Thanks in advance.

Best wishes
Frank

From skirov at utk.edu Mon Jun 27 11:56:02 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Jun 27 11:47:21 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42BFD95D.7070001@york.ac.uk>
References: <42BAF1B7.10109@york.ac.uk> <42BB131A.2090403@utk.edu> <42BC298F.8000100@york.ac.uk> <42BC2F9E.20607@utk.edu> <42BFD95D.7070001@york.ac.uk>
Message-ID: <42C02192.3020400@utk.edu>

Kat,
In my experience it works OK:

60 mendel /home/sao> perl test.pl
seq is NP_106261
Feature source starts 1 ends 243 strand 1
Feature count is 0
count is 1
GETS TO HERE WITH NO KEYS 3
DOESN'T GET TO HERE
Annotation db_xref stringified value Value: taxon:266835
DOESN'T GET TO HERE
Annotation strain stringified value Value: MAFF303099
DOESN'T GET TO HERE
Annotation organism stringified value Value: Mesorhizobium loti MAFF303099
Feature Protein starts 1 ends 243 strand 1
Feature count is 0
count is 2
GETS TO HERE WITH NO KEYS 1
DOESN'T GET TO HERE
Annotation product stringified value Value: transcriptional regulator
Feature CDS starts 1 ends 243 strand 1
Feature count is 0
count is 3
GETS TO HERE WITH NO KEYS 4
DOESN'T GET TO HERE
Annotation locus_tag stringified value Value: mlr5637
DOESN'T GET TO HERE
Annotation db_xref stringified value Value: GeneID:1228922
DOESN'T GET TO HERE
Annotation coded_by stringified value Value: NC_002678.2:4529413..4530144
DOESN'T GET TO HERE
Annotation transl_table stringified value Value: 11

What you need is
Annotation coded_by stringified value Value: NC_002678.2:4529413..4530144
What bioperl version are you using?
Stefan

Kat Hull wrote:

> Hi Stefan,
> I'm still having problems!
> I get some features from the sequence object but can't get past this
> part in the code:
>
> $ac=$feat->annotation();  # This is not returning anything.
>
> Could you look at the code to tell me where its going wrong?
>
> Thanks again,
> #________________________________________
> #!/biol/programs/perl580/bin/perl
>
> use Bio::Annotation::Collection;
> use Bio::DB::GenPept;
> $gb = new Bio::DB::GenPept;
>
> my $seqio = $gb->get_Stream_by_id(['13474692']);
> my $ac;
> my $c;
>
> while( my $seq = $seqio->next_seq ) {
>     print "seq is ", $seq->display_id, "\n";
>     my @f=$seq->get_all_SeqFeatures; #This gives you the annotation of the retrieved sequence object
>
>     foreach my $feat (@f) {
>         ++$c;
>         print "Feature ",$feat->primary_tag," starts ",$feat->start," ends ",
>             $feat->end," strand ",$feat->strand,"\n";
>
>         # features retain link to underlying sequence object
>         #print "Feature sequence is ",$feat->seq->seq(),"\n";
>
>         my $t = $feat->feature_count;
>         print "Feature count is $t\n";
>         print "count is $c\n";
>
>         $ac=$feat->annotation(); # PROBLEM SEEMS TO BE HERE
>
>         my $blah=$ac->get_all_annotation_keys();
>         print "GETS TO HERE WITH NO KEYS $blah\n";
>
>         foreach $key ( $ac->get_all_annotation_keys() ) {
>             print "DOESN'T GET TO HERE\n";
>             @values = $ac->get_Annotations($key);
>             foreach $value ( @values ) {
>                 # value is an Bio::AnnotationI, and defines a "as_text" method
>                 print "Annotation ",$key," stringified value ",$value->as_text,"\n";
>
>                 # also defined hash_tree method, which allows data orientated
>                 # access into this object
>                 $hash = $value->hash_tree();
>             }
>         }
>
>         # commented out for now
>
>         # next unless ($ac->get_Annotations('coded_by'));
>         # my @coded=$ac->get_Annotations('coded_by');
>         # foreach my $location (@coded) {
>         #     print $location->value, " is the location
that codes this > protein\n"; > # } > } > } > > > > > > Stefan Kirov wrote: > >> my $seqio = $gb->get_Stream_by_id(['13474692']); >> while( my $seq = $seqio->next_seq ) { >> print "seq is ", $seq->display_id, "\n"; >> my $ann=$seq->annotation; #This gives you the annotation of the >> retrieved sequence object >> foreach my $dblink ($ann->get_Annotations('DBLink')) { >> if ($dblink->database =~/refseq/i) { >> print $database->primary_id, " is the mRNA accession >> number\n"; >> } >> } >> } >> However, the gene you are looking at is not associated with any NM_ >> sequence, but rather comes from NC_. Therefore the above will not >> work for you. You will have to descend through the sequence features >> and find teh feature that says 'coded_by': >> use Bio::DB::GenPept; >> my $gb=new Bio::DB::GenPept; >> my $seqio = $gb->get_Stream_by_id(['13474692']); >> while( my $seq = $seqio->next_seq ) { >> print "seq is ", $seq->display_id, "\n"; >> my @f=$seq->get_SeqFeatures; #This gives you the annotation of >> the retrieved sequence object >> foreach my $feat (@f) { >> my $ann=$feat->annotation; >> next unless ($ann->get_Annotations('coded_by')); >> my @coded=$ann->get_Annotations('coded_by'); >> foreach my $location (@coded) { >> print $location->value, " is the location that codes this >> protein\n"; >> } >> } >> } >> No guarantees the code is typo free :-) >> Stefan >> >> Kat Hull wrote: >> >>> Hi Stefan, >>> Thanks for your advice but i'm still struggling! I have used >>> Bio::DB::GenPept to get the protein accession number given the >>> protein gi number. However, I don't understand how >>> Bio::Annotation::DBLink works. Does it fetch the url of a link on >>> the web-site? Basically, if I could use this (or something else) >>> to get the url of the CDS link for my protein of interest, I can get >>> the corresponding nucleotide accession from this, as it is encoded >>> in the url. >>> Do you know how to use this module? 
Is this what you were >>> suggesting I try yesterday (I didn't really understand what you were >>> getting at). >>> Many thanks, >>> >>> Kat >>> >>> ps. Here's where i'm at so far: >>> >>> >>> use Bio::Annotation::DBLink; >>> use Bio::DB::GenBank; >>> use Bio::DB::GenPept; >>> $gb = new Bio::DB::GenPept; >>> >>> >>> # given the gi number, this returns the accession >>> my $seqio = $gb->get_Stream_by_id(['13474692']); >>> while( my $seq = $seqio->next_seq ) { >>> print "seq is ", $seq->display_id, "\n"; >>> } >>> # not sure what i'm doing here >>> >>> >>> $link2 = new Bio::Annotation::DBLink(); >>> $link2->database('dbSNP'); >>> $link2->primary_id('2367'); >>> >>> >>> >>> >>> >>> Stefan Kirov wrote: >>> >>>> Kat, >>>> If you are familiar with Bioperl it is kind of easy- >>>> look at Bio::DB::GenPept (I suppose you use GenPept/GenBank?) on >>>> how to get the protein record >>>> Go through the dblinks and find the appropriate accession number >>>> (where the database method returns GenBank). >>>> Then retrieve this accession number(s) through Bio::DB::GenBank. If >>>> you are not familiar with Bioperl- read the docs for >>>> Bio::DB::GenPept, Bio::DB::GenBank, Bio::Annotation and >>>> Bio::Annotation::DBLink). >>>> Hope this helps, >>>> Stefan >>>> >>>> Kat Hull wrote: >>>> >>>>> Hi there, >>>>> I was wondering whether anyone has a solution to my problem. I >>>>> have a list of protein assession numbers and want to retrieve the >>>>> corresponding nucleotide sequences automatically. I thought it >>>>> would be possible to do this by changing the NCBI url, but this >>>>> doesn't seem to be the case. >>>>> Is there a bio-perl module that can do this? >>>>> >>>>> Kind regards, >>>>> Kat >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>>> >>>> >>>> Stefan >>> >>> >>> >>> >>> >>> >> > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From khh103 at york.ac.uk Mon Jun 27 12:16:28 2005
From: khh103 at york.ac.uk (Kat Hull)
Date: Mon Jun 27 12:08:16 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42C02192.3020400@utk.edu>
References: <42BAF1B7.10109@york.ac.uk> <42BB131A.2090403@utk.edu> <42BC298F.8000100@york.ac.uk> <42BC2F9E.20607@utk.edu> <42BFD95D.7070001@york.ac.uk> <42C02192.3020400@utk.edu>
Message-ID: <42C0265C.7070408@york.ac.uk>

Hi Stefan,
I've devised a far simpler way of getting what I want using a Perl module called WWW::Mechanize. It simply gets the page source of a URL, which you can then parse for the appropriate links to get the nucleotide accession number.
Thanks again,
Kat

use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

# THIS GETS THE HTML CONTENTS OF A WEBSITE
my $url = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=13488063";
$mech->get( $url );
print $mech->content();

From hlapp at gmx.net Mon Jun 27 12:17:49 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jun 27 12:09:18 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42BFD95D.7070001@york.ac.uk>
References: <42BAF1B7.10109@york.ac.uk> <42BB131A.2090403@utk.edu> <42BC298F.8000100@york.ac.uk> <42BC2F9E.20607@utk.edu> <42BFD95D.7070001@york.ac.uk>
Message-ID:

Features aren't getting their annotation bundles populated by the genbank/genpept parsers. Instead, the feature table tags end up as tag/value pairs, i.e., use $feature->get_all_tags() followed by get_tag_values() for each tag returned.

Alternatively, you can look at the feature's tags and annotation bundle as a single bundle using Bio::SeqFeature::AnnotationAdaptor:

    use Bio::SeqFeature::AnnotationAdaptor;
    # ... code to obtain feature $feat here
    my $ac = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);
    # ... use $ac as you would any annotation collection

Hth,

    -hilmar

On Jun 27, 2005, at 6:47 AM, Kat Hull wrote:

> Hi Stefan,
> I'm still having problems!
> I get some features from the sequence object but can't get past this
> part in the code:
>
> $ac=$feat->annotation(); # This is not returning anything.
>
> Could you look at the code to tell me where it's going wrong?

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From skirov at utk.edu Mon Jun 27 13:00:28 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Jun 27 12:51:51 2005
Subject: [Bioperl-l] Bio::DB::CUTG behaving weird
In-Reply-To: <42C0059E.1050708@ed.ac.uk>
References: <42C0059E.1050708@ed.ac.uk>
Message-ID: <42C030AC.4000803@utk.edu>

Nailed it! It was as simple as compiling LWP/UserAgent with GET/HEAD/POST enabled (the last, I believe, is not necessary, but...); they are disabled by default. UserAgent does not complain loudly when these are used but not available (LWP bug?). This may need to be in the docs, what do you think?
All tests pass (I have not tried the new ones, but will).
Stefan

Richard Adams wrote:

> Stefan,
> I'm looking into it although I don't have access to either of those OSs.
> Does the test script (t/DBCUTG.t) run OK on both systems?
>
> I imagine the error is coming from the fact that the parser is trying
> to parse the mitochondrial table
> and I will try to sort out why that isn't parsing properly. Basically
> all the DBCUTG module does to cope
> with a non-unique species is just select the first in the list based
> on regexp matching, this is usually the most likely choice.
>
> I'll put in some hopefully more challenging tests in the test script
> to help find the bug.
> If you can run your script with
> $db->verbose(1) set (using the CVS version) and send me the output I
> can look into it more...
>
> Best wishes,
>
> Richard

From allenday at ucla.edu Mon Jun 27 14:26:30 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jun 27 14:17:52 2005
Subject: [Bioperl-l] pubmed article download and storing in object
In-Reply-To:
References:
Message-ID:

try Bio::DB::Biblio::eutils. it doesn't give back an object, but gives back an xml document.

to any volunteers out there looking for something to contribute: this would be an easy place to jump in and add something useful to bioperl. you'd have a chance to learn a bit of pubmed query language and xml dom api.

-allen

On Mon, 27 Jun 2005, Zack Napalm wrote:

> Hi,
> I am new at Bioperl and wondering whether there is a possibility to scrape
> articles from PubMed automatically, and put them into a
> Bio::Biblio::PubmedArticle object.
> I don't want to scrape them via LWP::Simple get and put them into an object
> manually, if there is an appropriate function predefined.
>
> regards
> johnny

From MEC at Stowers-Institute.org Mon Jun 27 18:36:15 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Mon Jun 27 18:27:34 2005
Subject: [Bioperl-l] HOWTO: fix the broken left margin in all HOWTO pdfs from http://bioperl.org/HOWTOs/
Message-ID: <200506272227.j5RMRQ77010799@portal.open-bio.org>

Cheers,

This article http://www.e-novative-forum.com/article21.htm suggests to me that patching /bioperl-live/doc/howto/sgml/stylesheet/e-novative.xsl as follows is in order

Line 367 (e-novative.xsl):
- 0
+ 0cm

as it will fix the broken left margin in all the great HOWTO pdfs (which print out lousy this way)!

Can someone verify this patch and, if it does indeed fix the problem, is it possible to release fixed pdfs (with good margins) at http://bioperl.org/HOWTOs/

Thanks?

Regards,

Malcolm Cook - mec@stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO USA

From hlapp at gmx.net Mon Jun 27 22:06:47 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jun 27 21:58:05 2005
Subject: [Bioperl-l] why string overload is bad
Message-ID: <53373dfd309aab36a69d08ff0b11cf97@gmx.net>

I've been debugging bioperl-db now for two days in a row to find all places that are affected by the string overload for the Bio::Annotation::* modules. There are many cases that break because a boolean evaluation of $obj is used as a shorthand for defined($obj) && ref($obj). One may argue that those cases deserve to be fixed anyway. BTW I'm also finding instances of this in the core bioperl library.

However, it gets uglier than this. Consider the following innocuous looking code.

    use Bio::Annotation::SimpleValue;
    my $a = Bio::Annotation::SimpleValue->new(-value => "");
    %args = (-blah,$a);
    my $ann = $args{-blah} || $args{-BLAH}; # ignore case
    print "type of annotation: ", ref($ann),"\n";

This will print 'type of annotation:' and nothing else, because the || operator triggers the string overload, so all of a sudden $ann becomes a (empty) string instead of an object. So the correct way to express this is

    my $ann = $args{-blah};
    $ann = $args{-BLAH} unless defined($ann); # ignore case

I would argue that string-overloading core library classes requires such a degree of discipline when programming against them that the library becomes pretty unforgiving.

Not really the spirit of how and for whom Bioperl should be useful. A beginner could have a pretty hard time tracking down the mistake above, let alone the frustration this can generate in the course of such an exercise.

Do people really want to go the route of string-overloading the annotation classes? To me it's really over the top and is a step backwards for ease of using the toolkit.

Comments, opinions anybody?

    -hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From brian_osborne at cognia.com Tue Jun 28 05:23:27 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Jun 28 05:15:55 2005
Subject: [Bioperl-l] HOWTO: fix the broken left margin in all HOWTO pdfs from http://bioperl.org/HOWTOs/
In-Reply-To: <200506272227.j5RMRQ77010799@portal.open-bio.org>
Message-ID:

Malcolm,

Thank you for the tip, I've changed the XSL. Unfortunately I can't verify the change at the moment, my new computer is docbook-deficient!

Brian O.

From fetter_fake at hotmail.com Tue Jun 28 09:09:12 2005
From: fetter_fake at hotmail.com (Zack Napalm)
Date: Tue Jun 28 09:00:32 2005
Subject: [Bioperl-l] pubmed article download and storing in object
In-Reply-To:
Message-ID:

how does this work exactly? the bioperl documentation seems very poor concerning structure and content.
i have tried the following easy piece of code:

my $biblio = new Bio::Biblio;
my $bibResult = $biblio->find('p53');
my $cit = $bibResult->get_next();
print "$_\n" for @{$cit};

just a lot of numbers in an array come back, but no xml file => if i try to print $cit directly, just byte[]=ARRAY(0x...) is printed

when i try to use Bio::DB::Biblio::eutils, the interpreter says that this eutils.pm isn't installed. all modules i use work properly, though. the bioperl documentation mentions that you shouldn't use this module directly, but use it via my $biblio = new Bio::Biblio (-access => 'eutils');, which doesn't work either (eutils.pm not installed again).

how can i get a proper xml file back? is there a nice bioperl documentation out there?
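The `||` pitfall raised in the "why string overload is bad" thread can be reproduced without BioPerl installed. The following is only a minimal sketch under stated assumptions: `My::OverloadedValue` is a hypothetical stand-in class written for this illustration (it is not Bio::Annotation::SimpleValue), and it assumes that the real annotation classes behave similarly when they stringify to an empty value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a string-overloaded annotation class.
# This is NOT a BioPerl class; it only imitates the stringification.
package My::OverloadedValue;
use overload '""' => sub { $_[0]->{value} }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

package main;

my $obj  = My::OverloadedValue->new(-value => "");
my %args = (-blah => $obj);

# Boolean context falls back to the overloaded "" operator, so an object
# that stringifies to "" counts as false and || moves on to the second
# operand, which does not exist in %args.
my $via_or = $args{-blah} || $args{-BLAH};

# Testing definedness never consults the overload, so the object survives.
my $via_defined = $args{-blah};
$via_defined = $args{-BLAH} unless defined $via_defined;

printf "via ||:      %s\n", defined $via_or ? ref($via_or) : 'undef';
printf "via defined: %s\n", ref($via_defined);
```

The design point is that `defined` (or `ref`) tests the reference itself, while `||`, `if ($obj)` and similar boolean tests go through the overloaded stringification whenever a class overloads `""` without also overloading `bool`.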
From skirov at utk.edu Tue Jun 28 09:44:07 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Jun 28 09:35:21 2005 Subject: [Bioperl-l] why string overload is bad In-Reply-To: <53373dfd309aab36a69d08ff0b11cf97@gmx.net> References: <53373dfd309aab36a69d08ff0b11cf97@gmx.net> Message-ID: <42C15427.6010706@utk.edu> Hilmar Lapp wrote: > I've been debugging bioperl-db now for two days in a row to find all > places that are affected by the string overload for the > Bio::Annotation::* modules. There are many cases that break because a > boolean evaluation of $obj is used as a shorthand for defined($obj) && > ref($obj). One may argue that those cases deserve to be fixed anyway. > BTW I'm also finding instances of this in the core bioperl library. > > However, it gets uglier than this. Consider the following innocuous > looking code. > > use Bio::Annotation::SimpleValue; > my $a = Bio::Annotation::SimpleValue->new(-value => ""); > %args = (-blah,$a); > my $ann = $args{-blah} || $args{-BLAH}; # ignore case > print "type of annotation: ", ref($ann),"\n"; > > This will print 'type of annotation:' and nothing else, because the || > operator triggers the string overload, so all of a sudden $ann becomes > a (empty) string instead of an object. So the correct way to express > this is > > my $ann = $args{-blah}; > $ann = $args{-BLAH} unless defined($ann); # ignore case > > I would argue that string-overloading core library classes requires a > degree of discipline when programming against it that the library > becomes pretty unforgiving. > > Not really the spirit of how and for whom Bioperl should be useful. A > beginner could have a pretty hard time tracking down the mistake > above, let alone the frustration this can generate in the course of > such an exercise. > > Do people really want to go the route of string-overloading the > annotation classes? To me it's really over the top and is a step > backwards for ease of using the toolkit. Hilmar definitely has a point here. 
Overloading also causes trouble with respect to other code- take as an example Devel::ptkdb. Try to inspect a variable, that contains or is an object of type Bio::Annotation::*. Of course, one can always dump the data, but it is slower and this can be rather annoying if you just want to take a peek into something. And I have no doubt there would be other packages that fail because of the overloading. > > Comments, opinions anybody? > > -hilmar > From birney at ebi.ac.uk Tue Jun 28 09:56:39 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Jun 28 09:48:00 2005 Subject: [Bioperl-l] why string overload is bad In-Reply-To: <42C15427.6010706@utk.edu> References: <53373dfd309aab36a69d08ff0b11cf97@gmx.net> <42C15427.6010706@utk.edu> Message-ID: <42C15717.9000900@ebi.ac.uk> >> Not really the spirit of how and for whom Bioperl should be useful. A >> beginner could have a pretty hard time tracking down the mistake >> above, let alone the frustration this can generate in the course of >> such an exercise. >> >> Do people really want to go the route of string-overloading the >> annotation classes? To me it's really over the top and is a step >> backwards for ease of using the toolkit. > > > Hilmar definitely has a point here. Overloading also causes trouble with > respect to other code- take as an example Devel::ptkdb. Try to inspect a > variable, that contains or is an object of type Bio::Annotation::*. Of > course, one can always dump the data, but it is slower and this can be > rather annoying if you just want to take a peek into something. And I > have no doubt there would be other packages that fail because of the > overloading. > I have always been against string overloading. The subtly of the bugs generated and non-obvious code paths (when Perl wants a number, does it go via hte string-overloaded case...) I also (personally) think overloading in C++ is bad. I just think overloading is bad wherever. >> >> Comments, opinions anybody? 
>>
>> -hilmar
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From fetter_fake at hotmail.com Tue Jun 28 10:38:44 2005
From: fetter_fake at hotmail.com (Zack Napalm)
Date: Tue Jun 28 10:30:09 2005
Subject: [Bioperl-l] pubmed article download and storing in object
In-Reply-To:
Message-ID:

solution: install bioperl 1.5 (instead of 1.4) and all works fine ;-)

>From: "Zack Napalm"
>To: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] pubmed article download and storing in object
>Date: Tue, 28 Jun 2005 13:09:12 +0000
>
>how does this work exactly? the bioperl documentation seems very poor
>concerning structure and content.
>i have tried the following easy piece of code:
>
>my $biblio = new Bio::Biblio;
>my $bibResult = $biblio->find('p53');
>my $cit = $bibResult->get_next();
>print "$_\n" for @{$cit};
>
>just a lot of numbers in an array come back, but no xml file => if i try to
>print $cit directly, just byte[]=ARRAY(0x...) is printed
>
>when i try to use Bio::DB::Biblio::eutils, the interpreter says, that this
>eutils.pm isn't installed. all moduls i use work properly, though. in the
>bioperl documentation is mentioned, that you shouldn't use this modul
>directly, but use it via my $biblio = new Bio::Biblio (-access =>
>'eutils');, which doesn't work either (eutils.pm not installed again).
>
>how can i get a proper xml file back? is there a nice bioperl documentation
>out there?
>
>>From: Allen Day
>>To: Zack Napalm
>>CC: bioperl-l@bioperl.org
>>Subject: Re: [Bioperl-l] pubmed article download and storing in object
>>Date: Mon, 27 Jun 2005 11:26:30 -0700 (PDT)
>>
>>try Bio::DB::Biblio::eutils. it doesn't give back an object, but gives
>>back an xml document.
>>
>>to any volunteers out there looking for something to contribute: this
>>would be an easy place to jump in and add something useful to bioperl.
>>you'd have a chance to learn a bit of pubmed query language and xml dom
>>api.
>>
>>-allen
>>
>>On Mon, 27 Jun 2005, Zack Napalm wrote:
>>
>> > Hi,
>> > I am new at Bioperl and wondering whether there is a possibility to scrape
>> > articles from PubMed automatically, and put them into an
>> > Bio::Biblio::PubmedArticle object.
>> > I don't want to scrape them via LWP::Simple get and put them into an object
>> > manually, if there is a appropriate function predefined.
>> >
>> > regards
>> > johnny
From michael.watson at bbsrc.ac.uk Tue Jun 28 11:11:55 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue Jun 28 11:03:14 2005
Subject: [Bioperl-l] Sequence trace as PNG image
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D7D3@iahce2knas1.iah.bbsrc.reserved>

Hi

Has anyone contributed code to bioperl (or would be willing to
contribute it) that creates a PNG image of a sequence trace (SCF)? I see
that bioperl has a nice SequenceTrace object that could be used, but I
don't want to reinvent the wheel if it is already out there...

Thanks
Mick

From anunberg at oriongenomics.com Tue Jun 28 11:21:26 2005
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Tue Jun 28 11:12:44 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
Message-ID:

I was wondering if there is any documentation about using GFF3 format with
Gbrowse. Since this is the "new" format, I wanted to start using it, but
observing some behaviors.

The GFF3 documentation on http://song.sourceforge.net/gff3.shtml indicates
the Name tag is the id to be displayed and the ID tag is unique and
internal, however when I use Gbrowse 1.62 it is ID that is being displayed
as the label.

I wish to use processed_transcript aggregator, the GFF3 document indicates
you only need to display the exons and CDS and the UTRs will be inferred,
however I did not see that when viewed in Gbrowse.

If there is some extra code or documentation I need please let me know

Thanks
Andy
--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From brian_osborne at cognia.com Tue Jun 28 12:24:42 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Jun 28 12:16:00 2005
Subject: [Bioperl-l] pubmed article download and storing in object
In-Reply-To:
Message-ID:

Zack,

There are *some* examples in the examples/biblio/biblio_examples.pl
script. If you come up with some more please send them our way, I can
add them.

Do you have XML::Twig installed?
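One quick way to answer that question is to ask Perl directly. The sketch below is a generic check, not BioPerl code: module_available is a hypothetical helper name, and the two module names in the loop are simply the ones mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if the named
# module can be loaded from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::Twig -> XML/Twig.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the modules discussed in this thread.
for my $module (qw(XML::Twig Bio::Biblio)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? "found" : "NOT found";
}
```

From the shell, `perl -MXML::Twig -e 1` does the same job: it exits silently when the module loads and prints a "Can't locate ..." error when it does not.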
If not that could explain the error using "eutils".

Brian O.

On 6/28/05 9:09 AM, "Zack Napalm" wrote:

> how does this work exactly? the bioperl documentation seems very poor
> concerning structure and content.
> i have tried the following easy piece of code:
>
> my $biblio = new Bio::Biblio;
> my $bibResult = $biblio->find('p53');
> my $cit = $bibResult->get_next();
> print "$_\n" for @{$cit};
>
> just a lot of numbers in an array come back, but no xml file => if i try to
> print $cit directly, just byte[]=ARRAY(0x...) is printed
>
> when i try to use Bio::DB::Biblio::eutils, the interpreter says, that this
> eutils.pm isn't installed. all moduls i use work properly, though. in the
> bioperl documentation is mentioned, that you shouldn't use this modul
> directly, but use it via my $biblio = new Bio::Biblio (-access =>
> 'eutils');, which doesn't work either (eutils.pm not installed again).
>
> how can i get a proper xml file back? is there a nice bioperl documentation
> out there?
>
>> From: Allen Day
>> To: Zack Napalm
>> CC: bioperl-l@bioperl.org
>> Subject: Re: [Bioperl-l] pubmed article download and storing in object
>> Date: Mon, 27 Jun 2005 11:26:30 -0700 (PDT)
>>
>> try Bio::DB::Biblio::eutils. it doesn't give back an object, but gives
>> back an xml document.
>>
>> to any volunteers out there looking for something to contribute: this
>> would be an easy place to jump in and add something useful to bioperl.
>> you'd have a chance to learn a bit of pubmed query language and xml dom
>> api.
>>
>> -allen
>>
>> On Mon, 27 Jun 2005, Zack Napalm wrote:
>>
>>> Hi,
>>> I am new at Bioperl and wondering whether there is a possibility to scrape
>>> articles from PubMed automatically, and put them into an
>>> Bio::Biblio::PubmedArticle object.
>>> I don't want to scrape them via LWP::Simple get and put them into an object
>>> manually, if there is a appropriate function predefined.
>>>
>>> regards
>>> johnny

From lstein at cshl.edu Tue Jun 28 13:24:42 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Jun 28 13:16:13 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
In-Reply-To:
References:
Message-ID: <200506281324.42717.lstein@cshl.edu>

The bioperl GFF database (both the in-memory and relational database
versions) needs to be brought up to date to handle the full expressive
power of GFF3. So for the time being ID trumps Name. Also you must use
the so_transcript aggregator instead of the processed_transcript
aggregator.

Lincoln

On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote:
> I was wondering if there is any documentation about using GFF3 format with
> Gbrowse. Since this is the "new" format, I wanted to start using it, but
> observing some behaviors.
>
> The GFF3 documentation on http://song.sourceforge.net/gff3.shtml indicates
> the Name tag is the id to be displayed and the ID tag is unique and
> internal, however when I use Gbrowse 1.62 it is ID that is being displayed
> as the label.
>
> I wish to use processed_transcript aggregator, the GFF3 document indicates
> you only need to display the exons and CDS and the UTRs will be inferred,
> however I did not see that when viewed in Gbrowse.
>
> If there is some extra code or documentation I need please let me know
>
> Thanks
> Andy

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From cain at cshl.edu Tue Jun 28 13:58:42 2005
From: cain at cshl.edu (Scott Cain)
Date: Tue Jun 28 13:49:57 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
In-Reply-To: <200506281324.42717.lstein@cshl.edu>
References: <200506281324.42717.lstein@cshl.edu>
Message-ID: <1119981522.3365.34.camel@localhost.localdomain>

Though you can always use a callback on the label to get the Name
instead:

label = sub {
          my $f = shift;
          my ($name) = $f->attributes('Name');
          return $name;
        }

On Tue, 2005-06-28 at 13:24 -0400, Lincoln Stein wrote:
> The bioperl GFF database (both the inmemory and relational database versions)
> need to be brought up to date to handle the full expressive powerof GFF3. So
> for the time being ID trumps Name. Also you must use the so_transcript
> aggregator instead of the processed_transcript aggregator.
>
> Lincoln
>
> On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote:
> > I was wondering if there is any documentation about using GFF3 format with
> > Gbrowse. Since this is the "new" format, I wanted to start using it, but
> > observing some behaviors.
> >
> > The GFF3 documentation on http://song.sourceforge.net/gff3.shtml indicates
> > the Name tag is the id to be displayed and the ID tag is unique and
> > internal, however when I use Gbrowse 1.62 it is ID that is being displayed
> > as the label.
> >
> > I wish to use processed_transcript aggregator, the GFF3 document indicates
> > you only need to display the exons and CDS and the UTRs will be inferred,
> > however I did not see that when viewed in Gbrowse.
> >
> > If there is some extra code or documentation I need please let me know
> >
> > Thanks
> > Andy
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory

From MEC at Stowers-Institute.org Tue Jun 28 14:01:29 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Tue Jun 28 13:52:48 2005
Subject: [Bioperl-l] HOWTO: fix the broken left margin in all HOWTO pdfs from http://bioperl.org/HOWTOs/
Message-ID: <200506281752.j5SHqhPB030988@portal.open-bio.org>

Thanks Brian,

Though I have privs to commit, I was loath to go there without being
able to test pdf generation myself.

I'd appreciate a heads up if/when this is tested and new pdfs posted.

Cheers,

Malcolm

-----Original Message-----
From: Brian Osborne [mailto:brian_osborne@cognia.com]
Sent: Tuesday, June 28, 2005 4:23 AM
To: Cook, Malcolm; bioperl-l
Subject: Re: [Bioperl-l] HOWTO: fix the broken left margin in all HOWTO pdfs from http://bioperl.org/HOWTOs/

Malcolm,

Thank you for the tip, I've changed the XSL. Unfortunately I can't
verify the change at the moment, my new computer is docbook-deficient!

Brian O.

On 6/27/05 6:36 PM, "Cook, Malcolm" wrote:

> Cheers,
>
> This article http://www.e-novative-forum.com/article21.htm suggests to
> me that patching /bioperl-live/doc/howto/sgml/stylesheet/e-novative.xsl
> as follows is in order
>
> Line 367 (e-novative.xsl):
> - 0
> + 0cm
>
> as it will fix the broken left margin in all the great HOWTO pdfs (which
> print out lousy this way)!
>
> Can someone prove this patch and if indeed this fixes problem is it
> possible to release fixed pdfs (with good margins) at
> http://bioperl.org/HOWTOs/
>
> Thanks?
>
> Regards,
>
> Malcolm Cook - mec@stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO USA

From cain at cshl.edu Tue Jun 28 15:44:10 2005
From: cain at cshl.edu (Scott Cain)
Date: Tue Jun 28 15:35:25 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
In-Reply-To: <200506281324.42717.lstein@cshl.edu>
References: <200506281324.42717.lstein@cshl.edu>
Message-ID: <1119987851.3365.46.camel@localhost.localdomain>

Lincoln,

This is the first I've heard of the so_transcript aggregator; have you
committed it anywhere?

Scott

On Tue, 2005-06-28 at 13:24 -0400, Lincoln Stein wrote:
> The bioperl GFF database (both the inmemory and relational database versions)
> need to be brought up to date to handle the full expressive powerof GFF3. So
> for the time being ID trumps Name. Also you must use the so_transcript
> aggregator instead of the processed_transcript aggregator.
>
> Lincoln
>
> On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote:
> > I was wondering if there is any documentation about using GFF3 format with
> > Gbrowse. Since this is the "new" format, I wanted to start using it, but
> > observing some behaviors.
> >
> > The GFF3 documentation on http://song.sourceforge.net/gff3.shtml indicates
> > the Name tag is the id to be displayed and the ID tag is unique and
> > internal, however when I use Gbrowse 1.62 it is ID that is being displayed
> > as the label.
> >
> > I wish to use processed_transcript aggregator, the GFF3 document indicates
> > you only need to display the exons and CDS and the UTRs will be inferred,
> > however I did not see that when viewed in Gbrowse.
> >
> > If there is some extra code or documentation I need please let me know
> >
> > Thanks
> > Andy
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory

From lstein at cshl.edu Tue Jun 28 16:54:34 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Jun 28 16:45:51 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
In-Reply-To: <1119987851.3365.46.camel@localhost.localdomain>
References: <200506281324.42717.lstein@cshl.edu> <1119987851.3365.46.camel@localhost.localdomain>
Message-ID: <200506281654.35206.lstein@cshl.edu>

It's in bioperl CVS. A copy is also in the gbrowse CVS which will be
installed if it detects an old version of bioperl.

Lincoln

On Tuesday 28 June 2005 03:44 pm, Scott Cain wrote:
> Lincoln,
>
> This is the first I've heard of the so_transcript aggregator; have you
> committed it anywhere?
>
> Scott
>
> On Tue, 2005-06-28 at 13:24 -0400, Lincoln Stein wrote:
> > The bioperl GFF database (both the inmemory and relational database
> > versions) need to be brought up to date to handle the full expressive
> > powerof GFF3. So for the time being ID trumps Name. Also you must use the
> > so_transcript aggregator instead of the processed_transcript aggregator.
> >
> > Lincoln
> >
> > On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote:
> > > I was wondering if there is any documentation about using GFF3 format
> > > with Gbrowse. Since this is the "new" format, I wanted to start using
> > > it, but observing some behaviors.
> > >
> > > The GFF3 documentation on http://song.sourceforge.net/gff3.shtml
> > > indicates the Name tag is the id to be displayed and the ID tag is
> > > unique and internal, however when I use Gbrowse 1.62 it is ID that is
> > > being displayed as the label.
> > >
> > > I wish to use processed_transcript aggregator, the GFF3 document
> > > indicates you only need to display the exons and CDS and the UTRs will
> > > be inferred, however I did not see that when viewed in Gbrowse.
> > >
> > > If there is some extra code or documentation I need please let me know
> > >
> > > Thanks
> > > Andy

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

From cain at cshl.edu Tue Jun 28 16:59:08 2005
From: cain at cshl.edu (Scott Cain)
Date: Tue Jun 28 16:50:20 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
In-Reply-To: <200506281654.35206.lstein@cshl.edu>
References: <200506281324.42717.lstein@cshl.edu> <1119987851.3365.46.camel@localhost.localdomain> <200506281654.35206.lstein@cshl.edu>
Message-ID: <1119992348.3365.54.camel@localhost.localdomain>

Lincoln,

I hate to be a pain, but it is not in
bioperl-live/Bio/DB/GFF/Aggregator (updated just now). Is it somewhere
less obvious, or did you not cvs add it?

Thanks,
Scott

On Tue, 2005-06-28 at 16:54 -0400, Lincoln Stein wrote:
> It's in bioperl CVS. A copy is also in the gbrowse CVS which will be installed
> if it detects an old version of bioperl.
>
> Lincoln
>
> On Tuesday 28 June 2005 03:44 pm, Scott Cain wrote:
> > Lincoln,
> >
> > This is the first I've heard of the so_transcript aggregator; have you
> > committed it anywhere?
> >
> > Scott
> >
> > On Tue, 2005-06-28 at 13:24 -0400, Lincoln Stein wrote:
> > > The bioperl GFF database (both the inmemory and relational database
> > > versions) need to be brought up to date to handle the full expressive
> > > powerof GFF3. So for the time being ID trumps Name. Also you must use the
> > > so_transcript aggregator instead of the processed_transcript aggregator.
> > >
> > > Lincoln
> > >
> > > On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote:
> > > > I was wondering if there is any documentation about using GFF3 format
> > > > with Gbrowse. Since this is the "new" format, I wanted to start using
> > > > it, but observing some behaviors.
> > > >
> > > > The GFF3 documentation on http://song.sourceforge.net/gff3.shtml
> > > > indicates the Name tag is the id to be displayed and the ID tag is
> > > > unique and internal, however when I use Gbrowse 1.62 it is ID that is
> > > > being displayed as the label.
> > > >
> > > > I wish to use processed_transcript aggregator, the GFF3 document
> > > > indicates you only need to display the exons and CDS and the UTRs will
> > > > be inferred, however I did not see that when viewed in Gbrowse.
> > > >
> > > > If there is some extra code or documentation I need please let me know
> > > >
> > > > Thanks
> > > > Andy
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory

From bioinfovijayaraj at yahoo.com Wed Jun 29 03:19:47 2005
From: bioinfovijayaraj at yahoo.com (vijayaraj nagarajan)
Date: Wed Jun 29 03:11:05 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines
Message-ID: <20050629071947.38624.qmail@web53608.mail.yahoo.com>

hi
i have a cluster file with contents like this:

1 2 5 7 8 11
2 5 7 8 11
3 13 17 19
4 21 45 67
5 7 8 11

Now the 1,2 and 5th lines are redundant. i need to
remove the 2nd and 5th line from the file, while
retaining only the first line, since the first line
contains all the members present in 2 and 5th line...

could anyone suggest me how to parse this file, to
remove such redundant lines using perl.
any help and suggestions in this regard would be
greatly appreciated.

thanks

vijayaraj nagarajan
research assistant
the university of southern mississippi
ms, usa

From heikki at ebi.ac.uk Wed Jun 29 05:00:38 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Jun 29 04:51:42 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines
In-Reply-To: <20050629071947.38624.qmail@web53608.mail.yahoo.com>
References: <20050629071947.38624.qmail@web53608.mail.yahoo.com>
Message-ID: <200506291000.38963.heikki@ebi.ac.uk>

Vijayaraj,

Your problem in mathematical terms is comparing sets.

In pseudocode:

parse first line, create a set, write the line
add set to an array
for each subsequent line {
    parse the line, create a set
    for each old set in the array {
        if this set is a subset of the old set {
            next line
        }
    }
    # if we are here, we have not seen the set before
    add set to an array, write the line
}

the output will contain the unique lines only.

There are a lot of modules in CPAN that can do the algebra for you. One
of them is Set::Scalar: http://search.cpan.org/~jhi/Set-Scalar-1.19/

Yours,
-Heikki

On Wednesday 29 June 2005 08:19, vijayaraj nagarajan wrote:
> hi
> i have a cluster file with contents like this:
>
> 1 2 5 7 8 11
> 2 5 7 8 11
> 3 13 17 19
> 4 21 45 67
> 5 7 8 11
>
> Now the 1,2 and 5th lines are redundant. i need to
> remove the 2nd and 5th line from the file, while
> retaining only the first line, since the first line
> contains all the members present in 2 and 5th line...
>
> could anyone suggest me how to parse this file, to
> remove such redundant lines using perl.
> any help and suggestions in this regard would be
> greatly appreciated.
>
> thanks
>
> vijayaraj nagarajan
> research assistant
> the university of southern mississippi
> ms, usa

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki at ebi.ac.uk Wed Jun 29 05:20:40 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Jun 29 05:12:49 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines
In-Reply-To: <200506291000.38963.heikki@ebi.ac.uk>
References: <20050629071947.38624.qmail@web53608.mail.yahoo.com> <200506291000.38963.heikki@ebi.ac.uk>
Message-ID: <200506291020.40561.heikki@ebi.ac.uk>

... and if your sets are not in decreasing order of size, you cannot
print them out immediately, but you have to store them in a hash and
test if the new set is a superset or a subset of each existing set, and
remove and add sets in the hash accordingly - and print the hash
elements in the end.

-Heikki

On Wednesday 29 June 2005 10:00, Heikki Lehvaslaiho wrote:
> Vijayraj,
>
> Your probelm in mathematical terms is comparing sets.
>
> In pseudocode:
>
> parse first line, create a set, write the line
> add set to an array
> for each subsequent line {
>     parse the line, create a set
>     for each old set in the array {
>         if this set is a subset of the old set {
>             next line
>         }
>     }
>     # if we are here, we have not seen the set before
>     add set to an array, write the line
> }
>
> the output will contain the unique lines only.
>
> There are a lot of modules in CPAN that can do the algebra for you.
> One of them is Set::Scalar: http://search.cpan.org/~jhi/Set-Scalar-1.19/
>
>
> Yours,
> -Heikki
>
> On Wednesday 29 June 2005 08:19, vijayaraj nagarajan wrote:
> > hi
> > i have a cluster file with contents like this:
> >
> > 1 2 5 7 8 11
> > 2 5 7 8 11
> > 3 13 17 19
> > 4 21 45 67
> > 5 7 8 11
> >
> > Now the 1,2 and 5th lines are redundant. i need to
> > remove the 2nd and 5th line from the file, while
> > retaining only the first line, since the first line
> > contains all the members present in 2 and 5th line...
> >
> > could anyone suggest me how to parse this file, to
> > remove such redundant lines using perl.
> > any help and suggestions in this regard would be
> > greatly appreciated.
> >
> > thanks
> >
> > vijayaraj nagarajan
> > research assistant
> > the university of southern mississippi
> > ms, usa

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From tcj25 at cam.ac.uk Wed Jun 29 05:32:09 2005
From: tcj25 at cam.ac.uk (Terry Jones)
Date: Wed Jun 29 05:24:09 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines
In-Reply-To: Your message at 00:19:47 on Wednesday, 29 June 2005
References: <20050629071947.38624.qmail@web53608.mail.yahoo.com>
Message-ID: <17090.27289.678948.95804@terry.jones.tc>

>>>>> "vijayaraj" == vijayaraj nagarajan writes:

vijayaraj> i have a cluster file with contents like this:
vijayaraj> 1 2 5 7 8 11
vijayaraj> 2 5 7 8 11
vijayaraj> 3 13 17 19
vijayaraj> 4 21 45 67
vijayaraj> 5 7 8 11

vijayaraj> Now the 1,2 and 5th lines are redundant. i need to
vijayaraj> remove the 2nd and 5th line from the file, while
vijayaraj> retaining only the first line, since the first line
vijayaraj> contains all the members present in 2 and 5th line...

Are there any constraints on your data that might help solve this?

For example, do the numbers always have exactly one space between
them? Do the numbers always appear in ascending order? Is there ever
any trailing whitespace on a line (there was a space at the end of
your second line).

If the answers to the above are yes, yes, and no, then the following
works. If not, you'll need to do a little more to canonicalize each
line (e.g., strip spaces, sort the numbers, etc).

#!/usr/bin/perl -w

use strict;

my @lines;

while (my $line = <>){
    next if grep /\Q$line\E/, @lines;
    push @lines, $line;
    print $line;
}

or if you feel like being obscure today, you can do it straight from
the command line:

perl -ne '$l = $_; push(@lines, $l), print($l) unless grep /\Q$l\E/, @lines' < data

Terry

From tcj25 at cam.ac.uk Wed Jun 29 06:30:28 2005
From: tcj25 at cam.ac.uk (Terry Jones)
Date: Wed Jun 29 06:24:00 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines
In-Reply-To: Your message at 11:32:09 on Wednesday, 29 June 2005
References: <20050629071947.38624.qmail@web53608.mail.yahoo.com> <17090.27289.678948.95804@terry.jones.tc>
Message-ID: <17090.30788.631518.157296@terry.jones.tc>

Oops......

I wrote the following rubbish:

| Are there any constraints on your data that might help solve this?
|
| For example, do the numbers always have exactly one space between
| them? Do the numbers always appear in ascending order? Is there ever
| any trailing whitespace on a line (there was a space at the end of
| your second line).
|
| If the answers to the above are yes, yes, and no, then the following
| works. If not, you'll need to do a little more to canonicalize each
| line (e.g., strip spaces, sort the numbers, etc).
|
| [Incriminating / embarrassing perl deleted]

My code has at least two serious errors: 1) as Heikki points out, you
can't output anything until you've read all the data, and 2) it will
give spurious matches in input cases like this

23 45 67
3 45

I hereby disown my earlier posting. You should follow Heikki's advice
and use sets.

Terry

From tcj25 at cam.ac.uk Wed Jun 29 07:16:28 2005
From: tcj25 at cam.ac.uk (Terry Jones)
Date: Wed Jun 29 07:07:52 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines
In-Reply-To: Your message at 11:32:09 on Wednesday, 29 June 2005
References: <20050629071947.38624.qmail@web53608.mail.yahoo.com>
	<17090.27289.678948.95804@terry.jones.tc>
Message-ID: <17090.33548.689475.727979@terry.jones.tc>

>>>>> "vijayaraj" == vijayaraj nagarajan writes:

vijayaraj> i have a cluster file with contents like this:
vijayaraj> 1 2 5 7 8 11
vijayaraj> 2 5 7 8 11
vijayaraj> 3 13 17 19
vijayaraj> 4 21 45 67
vijayaraj> 5 7 8 11

vijayaraj> Now the 1,2 and 5th lines are redundant. i need to
vijayaraj> remove the 2nd and 5th line from the file, while
vijayaraj> retaining only the first line, since the first line
vijayaraj> contains all the members present in 2 and 5th line...

Here's something much better. It tries to be somewhat efficient.

Terry

#!/usr/bin/perl -w

use strict;

my @lines;

# Store each input line as [ set of its numbers (hash keys), number count ].
while (<>){
    my @nums = split;
    my $nums = {};
    map { $nums->{$_} = undef } @nums;
    push @lines, [ $nums, scalar(@nums) ];
}

# Indices into @lines, ordered so that the lines with the most numbers come first.
my @sorted = sort { $lines[$b]->[1] <=> $lines[$a]->[1] } 0 .. $#lines;

# Print a line only if it is not a subset of a line earlier in the sorted order.
for (my $i = 0; $i < @lines; $i++){
    print join(' ', sort { $a <=> $b } keys %{$lines[$sorted[$i]]->[0]}), "\n" unless match($i);
}

# True if the set at sorted position $index is contained in any earlier set.
sub match {
    my $index = shift;
    my $target_set = $lines[$sorted[$index]]->[0];

    for (my $i = 0; $i < $index; $i++){
        my $is_subset = 1;
        my $bigger_set = $lines[$sorted[$i]]->[0];

        for my $element (keys %$target_set){
            unless (exists $bigger_set->{$element}){
                $is_subset = 0;
                last;
            }
        }

        return 1 if $is_subset;
    }

    return 0;
}

exit(0);

From MEC at Stowers-Institute.org Wed Jun 29 15:40:38 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Wed Jun 29 15:32:20 2005
Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects
Message-ID: <200506291931.j5TJVmPB026792@portal.open-bio.org>

Hi,

To re-raise a thread that was under discussion in Dec 2003
http://portal.open-bio.org/pipermail/bioperl-l/2003-December/014416.html
...

In service of a particular analysis, I am trying to programmatically
'edit' a Bio::Search::Result::BlastResult by filtering and sorting the
hits in a variety of ways prior to writing it back out (once for each
way).

I first tried breaking the interface (as suggested by Jason in Dec 03)
by directly setting `$myResult->{'_hits'}`.

However, as Jason further suggested, it _is_ prone to problems. I don't
know if it would have worked in Dec 2003, but it does not work now (for
blast reports at least).

As I discovered, B::S::R::BlastResult overrides
B::S::R::GenericResult->hits to accommodate tracking iterations such as
reported by psiblast. It does this for all blast outputs, including
flavors that don't iterate (`blastall -p blastx ` in my case).

Thus I found that to break the interface, I had to do the following

  $myResult->{_iterations}[0]->{_newhits_below_threshold} = \@sortedFilteredAndOtherwiseMungedHits;

This works but is ugly and probably even more prone to (future?)
problems.

My question is, what to do in general.
Options:

1) use the -filter capability of Bio::SearchIO::Writer::ResultTableWriter
   in combination with Bio::Search::Result::ResultI->sort_hits
2) write an API for resetting hits
3) continue to live dangerously by setting
   $myResult->{_iterations}[0]->{_newhits_below_threshold}

In any case, it _may_ make sense to not track hits by iteration unless
necessary by having the Result $rc created in Bio::SearchIO::blast set
_no_iterations to 1 unless the report is psiblast (or other iteration
producer).

I don't really like option 1 due to all the objects and levels of
indirection just to grep/sort a list of hits but it might work fine...

Thoughts ...suggestions ...advice ...admonitions all appreciated

Regards,

Malcolm Cook - mec@stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO USA

From julipao at terra.es Wed Jun 29 20:26:15 2005
From: julipao at terra.es (Julio Fernández Banet)
Date: Wed Jun 29 20:17:39 2005
Subject: [Bioperl-l] Help on module for the analysis of sequence alignment
Message-ID: <7505a92c43f2e45fcc83672039e03ec3@terra.es>

Hello all!

I would like to ask if there is a bioperl module that can extract
information from an alignment obtained by bl2seq.
The kind of info I need to extract is the position of a change, the
nucleotide in the two sequences and in case of a deletion or insertion
the length and the sequence that was inserted/deleted.
The reason to do this is because I'm comparing the wild type sequence
of my gene of interest against the sequences obtained from my
population of study and I would like to store this kind of information
in a database to integrate genotype variation with phenotype

To be more descriptive I need something that works like this:

Alignment:
Seq1: acgtacgtacgtacgt----acgt
      |||||||||| |        ||||
seq2: acgtacgtacct----acgtacgt

Result from parsing:

Position 11 g -> c
Position (13-16)insertion ACGT
Position (17-20)deletion ACGT

Thanks a lot.

Julio Fernández

PhD. Julio Fernández Banet
C/ Benito Corbal nº20 -6ºA
Pontevedra C.P. 36001
Spain

Phone: (0034)607747946
Web: http://www.freelancebio.com

From bioinfovijayaraj at yahoo.com Wed Jun 29 22:32:30 2005
From: bioinfovijayaraj at yahoo.com (vijayaraj nagarajan)
Date: Wed Jun 29 22:23:47 2005
Subject: [Bioperl-l] how-to-remove-redundant-lines-IT WORKED-TERRY
In-Reply-To: <17090.30788.631518.157296@terry.jones.tc>
Message-ID: <20050630023230.21601.qmail@web53608.mail.yahoo.com>

hi TERRY

your so called [Incriminating / embarrassing perl deleted] code worked
GREAT for me... thanks dear.
i think this is bcause, all that your so called RUBBISH, where exactly that I had...hahahaha... i am really happy and was jumping out of joy, seeing the code work wonderful.... thank you SO MUCH..... infact i didnt try the other code you had suggested...just bcause i got the results and verified them to be correct. THANKS TO HEIKKI... even though i studied stat..and searched cpan for modules..this set concept didnt strike me...what a stupid i was... THANKS you once again Heikki for showing me the way...to set concept and perl modules... i would be more than happy to acknowledge you guys, when i am gonna publish my paper... cheers guys vijayaraj nagarajan research assistant department of biological sciences the university of southern mississippi ms, usa --- Terry Jones wrote: > Oops...... I wrote the following rubbish: > > | Are there any constraints on your data that might > help solve this? > | > | For example, do the numbers always have exactly > one space between > | them? Do the numbers always appear in ascending > order? Is there ever > | any trailing whitespace on a line (there was a > space at the end of > | your second line). > | > | If the answers to the above are yes, yes, and no, > then the following > | works. If not, you'll need to do a little more to > canonicalize each > | line (e.g., strip spaces, sort the numbers, etc). > | > | [Incriminating / embarrassing perl deleted] > > My code has at least two serious errors: 1) as > Heikki points out, you > can't output anything until you've read all the > data, and 2) it will > give spurious matches in input cases like this > > 23 45 67 > 3 45 > > I hereby disown my earlier posting. You should > follow Heikki's advice > and use sets. > > Terry > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around
http://mail.yahoo.com

From heikki at ebi.ac.uk Thu Jun 30 06:45:48 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu Jun 30 06:36:54 2005
Subject: [Bioperl-l] Help on module for the analysis of sequence alignment
In-Reply-To: <7505a92c43f2e45fcc83672039e03ec3@terra.es>
References: <7505a92c43f2e45fcc83672039e03ec3@terra.es>
Message-ID: <200506301145.48957.heikki@ebi.ac.uk>

Julio,

We do not have anything ready that would do it, but it should not be too
difficult to write.

We have:

Bio::AlignIO::bl2seq
    which allows Bio::AlignIO to read bl2seq output
Bio::SimpleAlign
    the structure that stores the alignment; it has a lot of methods
Bio::Variation::SeqDiff
Bio::Variation::DNAMutation
    for storing the variation information
also, Bio::Coordinate::Utils->from_align($aln)
    which can be used to calculate sequence positions

Using these, it should be possible to do what you need. Part of the
problem is task specific and is best done in a script but some of it
could be put in a utility method into BioPerl:

- take an alignment as the first argument
- assume that the first seq is the reference, allow changing that with
  a second argument
- to simplify things, allow only pairwise alignments
- process the alignment and return a Bio::Variation::SeqDiff object
  holding all detected Bio::Variation::DNAMutation objects

The script would take care of reading in the alignment and processing
the output for printing or for adding into the database.

Is this something you could write and give the method back to be
included into BioPerl?

Yours,
-Heikki

P.S. You do know of the EMBOSS program diffseq, don't you?
http://emboss.sourceforge.net/apps/diffseq.html

On Thursday 30 June 2005 01:26, Julio Fernández Banet wrote:
> Hello all!.
>
> I would like to ask if there is a bioperl module that can extract
> information from an alignment obtained by bl2seq.
> The kind of info I need to extract is the position of a change, the
> nucleotide in the two sequences and in case of a deletion or insertion
> the length and the sequence that was inserted/deleted.
> The reason to do this is because I'm comparing the wild type sequence
> of my gene of interest against the sequences obtained from my
> population of study and I would like to store this kind of information
> in a database to integrate genotype variation with phenotype
>
> To be more descriptive I need something that works like this:
>
>
> Alignment:
> Seq1: acgtacgtacgtacgt----acgt
>
> seq2: acgtacgtacct----acgtacgt
>
> Result from parsing:
>
> Position 11 g -> c
> Position (13-16)insertion ACGT
> Position (17-20)deletion ACGT
>
> Thanks a lot.
>
> Julio Fernández
>
>
> PhD. Julio Fernández Banet
> C/ Benito Corbal nº20 -6ºA
> Pontevedra C.P. 36001
> Spain
>
> Phone: (0034)607747946
> Web: http://www.freelancebio.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From shikida at gmail.com Mon Jun 27 07:27:51 2005
From: shikida at gmail.com (Leonardo Kenji Shikida)
Date: Thu Jun 30 11:11:02 2005
Subject: [Bioperl-l] problems during installation
In-Reply-To: <42BFDFA0.1010609@gmail.com>
References: <1b4364400506240759372ff1af@mail.gmail.com>
	<42BFDFA0.1010609@gmail.com>
Message-ID: <1b43644005062704271bbbd346@mail.gmail.com>

thank you. I fixed the problem installing the required non-perl libraries

regards

Kenji

On 6/27/05, Frank Lee wrote:
> Just guess, I believe the answer is given when you configing as this :
>
> If you experience compile problems, please check the @INC, @LIBPATH and @LIBS
> arrays defined in Makefile.PL and manually adjust, if necessary.
>
> For me, I will locate these files and find where they are and add them to the @INC or @LIBPATH, etc.
>
> Wish you luck
>
>
>
>
> \
> Leonardo Kenji Shikida wrote:
>
> >trying to install using perl -MCPAN -e "install Bundle::BioPerl" in a
> >suse 9.3 box
> >
> >how should I proceed?
> >
> >
> >
> >
> > CPAN.pm: Going to build L/LD/LDS/GD-2.19.tar.gz
> >
> >NOTICE: This module requires libgd 2.0.28 or higher.
> > it will NOT work with earlier versions. If you are getting
> > compile or link errors, then please get and install a new
> > version of libgd from www.boutell.com.
Do NOT ask Lincoln > > for help until you try this. > > > > If you are using Math::Trig 1.01 or lower, it has a bug that > > causes a "prerequisite not found" warning to be issued. You may > > safely ignore this warning. > > > > Type perl Makefile.PL -h for command-line option summary > > > >Configuring for libgd version 2.0.32. > >Included Features: GD_XPM GD_JPEG GD_FONTCONFIG GD_FREETYPE GD_PNG GD_G > >IF > >GD library used from: /usr > > > >If you experience compile problems, please check the @INC, @LIBPATH and @LIBS > >arrays defined in Makefile.PL and manually adjust, if necessary. > > > >Checking if your kit is complete... > >Looks good > >Writing Makefile for GD > >cp GD/Polyline.pm blib/lib/GD/Polyline.pm > >cp qd.pl blib/lib/qd.pl > >cp GD.pm blib/lib/GD.pm > >AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD) > >cp GD/Simple.pm blib/lib/GD/Simple.pm > >/usr/bin/perl /usr/lib/perl5/5.8.6/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8. > >6/ExtUtils/typemap -typemap typemap GD.xs > GD.xsc && mv GD.xsc GD.c > >cc -c -I/usr/include -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > > -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -marc > >h=i586 -mcpu=i686 -fmessage-length=0 -Wall -g -Wall -pipe -DVERSION=\"2.19\" - > >DXS_VERSION=\"2.19\" -fPIC "-I/usr/lib/perl5/5.8.6/i586-linux-thread-multi/CORE" > > -DHAVE_JPEG -DHAVE_FT -DHAVE_XPM -DHAVE_GIF -DHAVE_PNG -DHAVE_FONTCONFIG GD.c > >GD.xs:7:16: gd.h: No such file or directory > >GD.xs:8:21: gdfontg.h: No such file or directory > >GD.xs:9:21: gdfontl.h: No such file or directory > >GD.xs:10:22: gdfontmb.h: No such file or directory > > > > > > > > > > -- [] Kenji _______________________ http://kenjiria.blogspot.com http://gaitabh.blogspot.com From chris at bioteam.net Mon Jun 27 09:08:15 2005 From: chris at bioteam.net (Chris Dagdigian) Date: Thu Jun 30 11:11:08 2005 Subject: [Bioperl-l] problems during installation In-Reply-To: <1b4364400506240759372ff1af@mail.gmail.com> 
References: <1b4364400506240759372ff1af@mail.gmail.com> Message-ID: <01900A2D-9724-45B8-BF89-4BF2676B6A74@bioteam.net> You may want to check SuSE's software install tool (YaST) -- the GD perl module is usually included in an RPM and the version in Suse's repository may be modern enough for what you need - it will remove the need for you to find and deal with libgd. You should also check with YaST to see if the updated libgd is available in case you need to build the perl GD module from source. -Chris On Jun 24, 2005, at 10:59 AM, Leonardo Kenji Shikida wrote: > trying to install using perl -MCPAN -e "install Bundle::BioPerl" in a > suse 9.3 box > > how should I proceed? > > >>>>>>>>> >>>>>>>>> > > CPAN.pm: Going to build L/LD/LDS/GD-2.19.tar.gz > > NOTICE: This module requires libgd 2.0.28 or higher. > it will NOT work with earlier versions. If you are getting > compile or link errors, then please get and install a new > version of libgd from www.boutell.com. Do NOT ask Lincoln > for help until you try this. > > If you are using Math::Trig 1.01 or lower, it has a bug that > causes a "prerequisite not found" warning to be issued. > You may > safely ignore this warning. > > Type perl Makefile.PL -h for command-line option summary > > Configuring for libgd version 2.0.32. > Included Features: GD_XPM GD_JPEG GD_FONTCONFIG > GD_FREETYPE GD_PNG GD_G > IF > GD library used from: /usr > > If you experience compile problems, please check the @INC, @LIBPATH > and @LIBS > arrays defined in Makefile.PL and manually adjust, if necessary. > > Checking if your kit is complete... > Looks good > Writing Makefile for GD > cp GD/Polyline.pm blib/lib/GD/Polyline.pm > cp qd.pl blib/lib/qd.pl > cp GD.pm blib/lib/GD.pm > AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD) > cp GD/Simple.pm blib/lib/GD/Simple.pm > /usr/bin/perl /usr/lib/perl5/5.8.6/ExtUtils/xsubpp -typemap /usr/ > lib/perl5/5.8. 
> 6/ExtUtils/typemap -typemap typemap GD.xs > GD.xsc && mv GD.xsc GD.c > cc -c -I/usr/include -D_REENTRANT -D_GNU_SOURCE - > DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE - > D_FILE_OFFSET_BITS=64 -O2 -marc > h=i586 -mcpu=i686 -fmessage-length=0 -Wall -g -Wall -pipe - > DVERSION=\"2.19\" - > DXS_VERSION=\"2.19\" -fPIC "-I/usr/lib/perl5/5.8.6/i586-linux- > thread-multi/CORE" > -DHAVE_JPEG -DHAVE_FT -DHAVE_XPM -DHAVE_GIF -DHAVE_PNG - > DHAVE_FONTCONFIG GD.c > GD.xs:7:16: gd.h: No such file or directory > GD.xs:8:21: gdfontg.h: No such file or directory > GD.xs:9:21: gdfontl.h: No such file or directory > GD.xs:10:22: gdfontmb.h: No such file or directory > > > -- > > [] > > Kenji > _______________________ > http://kenjiria.blogspot.com > http://gaitabh.blogspot.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From rfsouza at cecm.usp.br Tue Jun 28 19:02:00 2005 From: rfsouza at cecm.usp.br (rfsouza@cecm.usp.br) Date: Thu Jun 30 11:11:12 2005 Subject: [Bioperl-l] nhx.pm does not print bootstrap values Message-ID: <32880.143.107.53.101.1119999720.squirrel@webmail.cecm.usp.br> Hi, I been trying to reformat some phylogenies built by the Phyml program to the nhx format, in order to print them through ATV, but, for some reason, the nhx.pm module doesn't print the bootstrap values. Those values are loaded using newick.pm and get printed out through nexus.pm, but nhx.pm seems to loose them. I'm using bioperl-live from CVS (downloaded 20050523). Is this a known bug? Thanks for any help. 
Robson

#--------------
Here is my code:

#!/usr/bin/perl -w

use Bio::TreeIO;
use Getopt::Long;
use strict;

my $usage = 0;
my $outfile = undef;
my $infmt = 'newick';
my $outfmt = 'nhx';
my $bootstyle = '';
GetOptions("infmt|i=s" => \$infmt,
           "outfmt|o=s" => \$outfmt,
           "bootstyle|b=s" => \$bootstyle,
           "outfile|f=s" => \$outfile,
           "usage|help|h" => \$usage)
    || die "Error parsing command line arguments";
usage() if ($usage || scalar(@ARGV) == 0);

# Storing input file command line arguments
my %inargs = ('-file'=>$ARGV[0], '-format'=>$infmt);
# $bootstyle defaults to '', so test for a non-empty value rather than definedness
$inargs{'-bootstrap_style'} = $bootstyle if ($bootstyle ne '');

# Storing output file command line arguments
my %outargs = ('-format'=>$outfmt);
if (defined $outfile) {
    $outargs{'-file'} = ">$outfile";
} else {
    $outargs{'-fh'} = \*STDOUT;
}

# Building data streams
my $in = Bio::TreeIO->new(%inargs);
my $out = Bio::TreeIO->new(%outargs);
while (my $t = $in->next_tree) {
    $out->write_tree($t);
}

exit 0;

sub usage {
    die "Usage: $0 [options] ...
Available options:
 --infmt|-i : input file format (default: newick)
 --outfmt|-o : output file format (default: nhx)
 --bootstyle|-b : input bootstrap values format
 --outfile|-f : output file name (default: stdout)
 --usage|--help|-h : print this message
";
}

From ylin9 at gel.ym.edu.tw Thu Jun 30 10:17:12 2005
From: ylin9 at gel.ym.edu.tw (=?big5?B?WXUtSHN1YW4gTGluKKpMpnSwYSk=?=)
Date: Thu Jun 30 11:11:15 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4
Message-ID: <002501c57d7e$68c66170$7a4e818c@sandy>

Hi, all,

I have a problem installing bioperl-run-1.4. Because I want to use EMBOSS
programs within my bioperl script, I installed bioperl 1.4 and EMBOSS
(/home/tools/EMBOSS-2.10.0) on my Debian Linux system.
When I tried to type in command line "make test", it said t/EMBOSS..................ok 28/30 skipped: EMBOSS not installed locally or XML::Twig not installed I also installed XML::Twig and tried symbolic link from /usr/local to /home/tools/EMBOSS-2.10.0 but still get the same message. Can anyone kindly tell me how to solve this problem or where to find solution ? Thank you very much, Vincent, From jason.stajich at duke.edu Thu Jun 30 11:40:46 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Jun 30 11:32:00 2005 Subject: [Bioperl-l] Problem of installing bioperl-run-1.4 In-Reply-To: <002501c57d7e$68c66170$7a4e818c@sandy> References: <002501c57d7e$68c66170$7a4e818c@sandy> Message-ID: <1120146046.42c4127e11197@webmail.duke.edu> did you make sure the EMBOSS bin directory is in your PATH? -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ Quoting "Yu-Hsuan Lin(?L?t?a)" : > Hi, all, > > I have a problem to install bioperl-run-1.4. Because I want to use EMBOSS > program within > > my bioperl script, I installed bioperl 1.4 and EMBOSS ( > /home/tools/EMBOSS-2.10.0 ) in > > my debian linux system. When I tried to type in command line "make test", it > said > > t/EMBOSS..................ok > > 28/30 skipped: EMBOSS not installed locally or XML::Twig not > installed > > > I also installed XML::Twig and tried symbolic link from /usr/local to > > /home/tools/EMBOSS-2.10.0 but still get the same message. Can anyone kindly > tell me how to > > solve this problem or where to find solution ? 
> > Thank you very much, > > Vincent, > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Thu Jun 30 01:00:50 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jun 30 12:04:02 2005 Subject: [Bioperl-l] More unresolved issues with Bio::AnnotatableI In-Reply-To: References: Message-ID: <4d0a050d2d74a15ffb4f3bab69355588@gmx.net> I fixed this. I actually turned out that the dependency was more a bug than a real dependency, and a rather trivial one: SimpleGOEngine was use'd in Bio::Ontology::Ontology.pm even though the code didn't use it - the default engine implementation is SimpleOntologyEngine. Nonetheless I also changed the instantiation of SimpleOntologyEngine to be on demand only - so if in the future it acquires a nasty dependency, or if the default is changed, the dependency is soft in that it only becomes a hard one when you call one of the methods requiring the engine. -hilmar On Jun 22, 2005, at 9:46 AM, Aaron J. Mackey wrote: > > Because AnnotatableI has implementations for add_tag and get_tag that > invoke Bio::Annotation::OntologyTerm, and therefore Graph::Directed, > which relies on Scalar::Util::weaken(), therefore I cannot even use > basic Bio::Seq functionality on any perl that doesn't have weak > references (oddly, this cropped up in a 5.8.0 install via an RPM that > was evidently compiled without support for weak references, so this > isn't just an "ancient perl" problem). > > This is something of a showstopper for any 1.6; in effect, we'd need > to disable Annotation::OntologyTerm use for any Perl without weak > reference support. 
> > We've said it before, and we need to say it again: the changes made to > the feature/annotation object model are seriously impeding our ability > to move forward to a release (and frighteningly, the GBrowse > distribution now includes those parts of 1.5 that it relies on, so a > user's BioPerl install could be a hodge-podge of 1.4/1.5 code). This > seems important to all GMOD projects, so why hasn't there been any > work on it? > > Thanks, > > -Aaron > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Thu Jun 30 12:19:50 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Jun 30 12:11:29 2005 Subject: [Bioperl-l] nhx.pm does not print bootstrap values In-Reply-To: <32880.143.107.53.101.1119999720.squirrel@webmail.cecm.usp.br> References: <32880.143.107.53.101.1119999720.squirrel@webmail.cecm.usp.br> Message-ID: <1120148390.42c41ba6753a3@webmail.duke.edu> I guess this is a bug - can you submit it as a bug to bugzilla.open-bio.org so we can track it. I thought I unified the slots where newick/nhx/nexus all get/set bootstraps but perhaps not. Bootstraps should be in the 'B' tag for nhx but I guess it isn't there? -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ Quoting rfsouza@cecm.usp.br: > Hi, > > I been trying to reformat some phylogenies built by the Phyml program to > the nhx format, in order to print them through ATV, but, for some reason, > the nhx.pm module doesn't print the bootstrap values. > > Those values are loaded using newick.pm and get printed out through > nexus.pm, but nhx.pm seems to loose them. I'm using bioperl-live from > CVS (downloaded 20050523). 
Is this a known bug? > > Thanks for any help. > Robson > > > #-------------- > Here is my code: > > #!/usr/bin/perl -w > > use Bio::TreeIO; > use Getopt::Long; > use strict; > > my $usage = 0; > my $outfile = undef; > my $infmt = 'newick'; > my $outfmt = 'nhx'; > my $bootstyle = ''; > GetOptions("infmt|i=s" => \$infmt, > "outfmt|o=s" => \$outfmt, > "bootstyle|b=s" => \$bootstyle, > "outfile|f=s" => \$outfile, > "usage|help|h" => \$usage) > || die "Error parsing command line arguments"; > usage() if ($usage || scalar(@ARGV) == 0); > > # Storing input file command line arguments > my %inargs = ('-file'=>$ARGV[0], '-format'=>$infmt); > $inargs{'-bootstrap_style'} = $bootstyle if (defined $bootstyle); > > # Storing input file command line arguments > my %outargs = ('-format'=>$outfmt); > if (defined $outfile) { > $outargs{'-file'} = ">$outfile"; > } else { > $outargs{'-fh'} = \*STDOUT; > } > > # Building data streams > my $in = Bio::TreeIO->new(%inargs); > my $out = Bio::TreeIO->new(%outargs); > while (my $t = $in->next_tree) { > $out->write_tree($t); > } > > exit 0; > > sub usage { > die "Usage: $0 [options] ... 
> Available options: > --infmt|-i : input file format (default: newick) > --outfmt|-o : output file format (default: nhx) > --bootstyle|b : input bootstrap values format > --outfile|-f : output file name (default: stdout) > --usage|--help|-h : print this message > "; > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Thu Jun 30 12:20:12 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jun 30 12:11:43 2005 Subject: [Bioperl-l] broken term relationships from dagedit files fixed In-Reply-To: <1118687174.3265.35.camel@localhost.localdomain> References: <20050613180019.64772.qmail@web54103.mail.yahoo.com> <1118687174.3265.35.camel@localhost.localdomain> Message-ID: I fixed the bug in Bioperl that prevented relationships from being returned when $ontology->get_relationships() is called without arguments. (if you called $ontology->get_relationships($term), everything worked fine) The rest is a sort of a post-mortem and probably largely uninteresting to most people unless you're a developer. The bug consisted of $graph->edges_at() being called incorrectly with an undef argument instead of calling $graph->edges() iff the term ID is undef (in Bio::Ontology::SimpleGOEngine.pm). The fix consisted of changing a single line of code. A test in the Bioperl test suite (t/Ontology.t) would readily expose the problem. However, versions of Graph.pm earlier than somewhere in 0.6.x behaved as if edges() had been called when the argument to edges_at() was undef, so that I only recently noticed the test failures, which had never been reported by someone actually experiencing the problem. This single-line bug which would be detected by an existing test triggered a motivation to migrate away from using Bioperl because it was considered broken, as opposed to a motivation to debug the problem in Bioperl and fix it. 
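
As a rough illustration of the one-line fix described above (a
self-contained sketch, not the actual Bio::Ontology::SimpleGOEngine or
Graph.pm code; the toy edge list and the helper names all_edges,
edges_at and get_relationships are made up for this example), the
dispatch on an undefined term ID might look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the graph behind an ontology engine.
# Each edge is a [subject, object] pair of term IDs.
my @edges = (
    [ 'GO:2', 'GO:1' ],
    [ 'GO:3', 'GO:1' ],
    [ 'GO:4', 'GO:3' ],
);

# Every edge in the graph (plays the role of $graph->edges()).
sub all_edges {
    return @edges;
}

# Only the edges touching one vertex (plays the role of $graph->edges_at($id)).
sub edges_at {
    my ($id) = @_;
    return grep { $_->[0] eq $id || $_->[1] eq $id } @edges;
}

# Ask for all edges if and only if no term ID was given, instead of
# passing undef on to edges_at().
sub get_relationships {
    my ($id) = @_;
    return defined $id ? edges_at($id) : all_edges();
}

my @all   = get_relationships();
my @go3   = get_relationships('GO:3');
print scalar(@all), " relationships in total\n";
print scalar(@go3), " relationships involving GO:3\n";
```

The point of the sketch is only the defined-ness check in
get_relationships; a test that calls it both with and without an
argument exercises both branches.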
I am wondering whether this is just the way it goes or whether this experience could be used to draw some lessons for how GMOD developers and Bioperl developers could work together more productively, because generally I believe both groups of people could benefit significantly from each other. Also, I don't see being a GMOD developer as mutually exclusive with being a Bioperl developer and vice versa, so I'm not even sure these are two distinct groups of people ... Any comments greatly welcome. -hilmar On Jun 13, 2005, at 2:26 PM, Scott Cain wrote: > Hello Elena, > > Unfortunately, this isn't going to work for you. Bioperl is currently > broken in that it doesn't capture cvterm relationship information from > the DAG-Edit formatted files, so the cvterm_relationship table isn't > populated. This will be fixed when I migrate to using go-perl for > loading ontologies, but that is a few weeks away at the earliest. > > As a side note, that shell script is also broken, as it uses $USER and > $DBNAME where it should be using $CHADO_DB_USERNAME and $CHADO_DB_NAME, > but that is very much a moot point, as this script will be deprecated > in > favor of the postgres function for populationg the cvtermpath table. > > Thanks, > Scott > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From skirov at utk.edu Thu Jun 30 13:57:48 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 30 13:48:58 2005 Subject: [Bioperl-l] Bio::SeqIO::game question Message-ID: <42C4329C.6010800@utk.edu> I am trying to parse a game xml file from flybase.org. 
I am unsure if the format flybase provides is 1.2 or earlier, but in any
case I get:

-------------------- WARNING ---------------------
MSG: I did not find a protein sequence for
---------------------------------------------------
Can't call method "seq" on an undefined value at
/home/sao/lib/perl5/site_perl/5.8.1/Bio/SeqIO/game/featHandler.pm line
790, line 484483.

game tests look fine.
Thanks!
Stefan

From laurichj at bioinfo.ucr.edu Thu Jun 30 17:03:25 2005
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Thu Jun 30 16:54:35 2005
Subject: [Bioperl-l] BLAST scores
Message-ID: <20050630210325.GA13422@bioinfo.ucr.edu>

Not really a BioPerl question, but...

I ran a bunch of blasts using the tabular output. However, I need the
score reported and it apparently doesn't do that. The reason I'm using
the tabular format is to speed parsing, since that was taking more than
half the CPU time... Anyhow, is there any way to compute the score from
the e-value and/or bit scores? Or am I stuck rerunning all those
blasts?

Thanks

-- 

------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
|----------------------------------------------------|

From jbedell at oriongenomics.com Thu Jun 30 17:48:36 2005
From: jbedell at oriongenomics.com (Joseph Bedell)
Date: Thu Jun 30 17:39:55 2005
Subject: [Bioperl-l] BLAST scores
Message-ID: <434AF352F9D03C4C896782B8CC78BC7687F264@VADER.oriongenomics.com>

Hi Josh,

>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-
>bounces@portal.open-bio.org] On Behalf Of Josh Lauricha
>Sent: Thursday, June 30, 2005 4:03 PM
>To: bioperl-l@portal.open-bio.org
>Subject: [Bioperl-l] BLAST scores
>
>Not really a BioPerl question, but...
>
>I ran a bunch of blasts using the tabular output. However, I need the
>score reported and it apparently doesn't do that. The reason I'm using
>the tabular format is to speed parsing, since that was taking more than
>half the CPU time... Anyhow, is there any way to compute the score from
>the e-value and/or bit scores? Or am I stuck rerunning all those
>blasts?

You can calculate the score given the bit score (from the tabular
output) and Lambda (calculated from the matrix). The equation is Score =
(Bits)/(Lambda in bits).

Lambda is only dependent upon the matrix. Did you use NCBI-blast or
WU-BLAST? Which flavor of blast (blastn, blastp, etc)? In any case, you
can just run a single blast and look at the stats at the bottom of the
report to get the value of lambda. For example, a default NCBI-blastn
(+1/-3) search has a lambda of 1.37

============================
Lambda K H
1.37 0.711 1.31

Gapped
Lambda K H
1.37 0.711 1.31
===============================

But, what is difficult to discover is this lambda is in NATS. To convert
it to bits, divide it by the natural log of 2, or in perl:

perl -e 'print 1.37/log(2),"\n"'
1.97649220601788

So, now you can take all of your bit scores divided by 1.97649220601788
to get the Score.

HTH,
Joey

>Thanks
>
>--
>
>------------------------------------------------------
>| Josh Lauricha            | Ford, you're turning    |
>| laurichj@bioinfo.ucr.edu | into a penguin.
Stop |
>| Bioinformatics, UCR      | it                      |
>|----------------------------------------------------|
>| OpenPG:                                            |
>|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
>|----------------------------------------------------|
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l

From laurichj at bioinfo.ucr.edu Thu Jun 30 17:55:53 2005
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Thu Jun 30 17:47:00 2005
Subject: [Bioperl-l] BLAST scores
In-Reply-To: <434AF352F9D03C4C896782B8CC78BC7687F264@VADER.oriongenomics.com>
References: <434AF352F9D03C4C896782B8CC78BC7687F264@VADER.oriongenomics.com>
Message-ID: <20050630215553.GB13422@bioinfo.ucr.edu>

On Thu 06/30/05 16:48, Joseph Bedell wrote:
> You can calculate the score given the bit score (from the tabular
> output) and Lambda (calculated from the matrix). The equation is Score =
> (Bits)/(Lambda in bits).
>
> Lambda is only dependent upon the matrix. Did you use NCBI-blast or
> WU-BLAST? Which flavor of blast (blastn, blastp, etc)? In any case, you
> can just run a single blast and look at the stats at the bottom of the
> report to get the value of lambda. For example, a default NCBI-blastn
> (+1/-3) search has a lambda of 1.37
>
> ============================
> Lambda K H
> 1.37 0.711 1.31
>
> Gapped
> Lambda K H
> 1.37 0.711 1.31
> ===============================
>
> But, what is difficult to discover is this lambda is in NATS. To convert
> it to bits, divide it by the natural log of 2, or in perl:
>
> perl -e 'print 1.37/log(2),"\n"'
> 1.97649220601788
>
> So, now you can take all of your bit scores divided by 1.97649220601788
> to get the Score.
>
> HTH,
> Joey

Cool, thanks. That'll save me a bunch of time ;) This was NCBI blastp,
so I've already got it calculated ;)

Thanks.
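
For the archive, the conversion discussed in this thread can be sketched
in Perl as below. This is only a minimal sketch and not part of BioPerl:
the sub names are made up for illustration, and the bit score of 100 is a
hypothetical input. It assumes the usual Karlin-Altschul relation
bits = (lambda * raw - ln K) / ln 2, of which the "bits divided by lambda
in bits" rule is the simplified form that drops the ln K term. Lambda and
K have to be read from the statistics footer of a report produced with
the same program, matrix and gap penalties as the tabular runs; the
values used here are the blastn ones quoted above, not blastp ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: recover the raw score from a bit score using
#   bits = (lambda * raw - ln K) / ln 2
# rearranged to
#   raw  = (bits * ln 2 + ln K) / lambda
# $lambda is in nats, exactly as printed in the BLAST report footer.
sub raw_score_from_bits {
    my ($bits, $lambda, $k) = @_;
    return ($bits * log(2) + log($k)) / $lambda;
}

# Simplified form from the thread: ignore K and divide the bit score
# by lambda expressed in bits (lambda / ln 2).
sub raw_score_simple {
    my ($bits, $lambda) = @_;
    return $bits / ($lambda / log(2));
}

# Lambda and K as quoted in the thread for default NCBI-blastn (+1/-3).
my ($lambda, $k) = (1.37, 0.711);

# Hypothetical bit score taken from one row of tabular output.
my $bits = 100;

printf "with K:    %.2f\n", raw_score_from_bits($bits, $lambda, $k);
printf "without K: %.2f\n", raw_score_simple($bits, $lambda);
```

Because the bit scores in tabular output are already rounded, either
result is best rounded to the nearest integer, and the two forms can
differ by a point or so when K is far from 1.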

-- 

------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
|----------------------------------------------------|
| Geek Code: Version 3.12                            |
| GAT/CS$/IT$ d+ s-: a-->--- C++++$ UL++++$ P++ L++++|
| $E--- W+ N o? K? w--(---) O? M+(++) V? PS++ PE-(--)|
| Y+ PGP+++ t--- 5+++ X+ R tv DI++ D--- G++          |
| e++ h- r++ z?                                      |
|----------------------------------------------------|

From gbazykin at Princeton.EDU Thu Jun 30 12:55:56 2005
From: gbazykin at Princeton.EDU (Georgii Bazykin)
Date: Fri Jul 1 14:28:34 2005
Subject: [Bioperl-l] TreeIO::nhx doesn't write internal node labels
Message-ID: <174361574726.20050630205556@princeton.edu>

Hi,

I am new to BioPerl, and I am having trouble trying to save a tree in
NHX format.

I load a nexus tree and parse a PAUP log file ("branch linkages") to get
internal node ids (I will then need to process character changes between
internal nodes, which is why I need internal node ids). I then write the
PAUP ids (which are numbers) as the ids of the internal nodes of the
tree, and write the tree in nhx format, hoping that the internal node
labels will be preserved. But the resulting nhx file has only empty
[&&NHX] labels and no internal node labels.

Is this a feature, or am I doing something wrong? Please help!

Yegor