From hlapp at gmx.net Sun May 1 18:41:03 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun May 1 18:34:02 2005 Subject: [Bioperl-l] Re: Bio::DB::Query In-Reply-To: <15a9a89705042112572328b827@mail.gmail.com> Message-ID: <19288707-BA92-11D9-8227-000A959EB4C4@gmx.net> Carito, sorry I didn't have time to get back to you earlier. The documentation of many modules is certainly far from complete, and most if not all modules could certainly benefit a lot from examples. The fact that documentation is sparse in many cases is not due to ignorance or oversight though, but simply due to lack of time, or in other words lack of volunteers. As for modifying records after they are read by SeqIO and before they are stored in the database, the perfect mechanism for this with support built into load_seqdatabase.pl is to write a sequence processor. Have you read the documentation under the --pipeline option in load_seqdatabase.pl? It references the respective modules in bioperl that the framework is built upon which will have more documentation. I use this every day with great success for my own projects. As for the rest of your email, be sure to note one thing. Bioperl/biosql/bioperl-db is open source - you didn't pay for it, did you? - and has been written by volunteers, i.e., people who didn't stop at complaining but went on to volunteer their unpaid time and made this world a better place for themselves and everybody else too by making an effort to correct the issues they found. -hilmar On Thursday, April 21, 2005, at 12:57 PM, carito vargas wrote: > Hi > >> Bio::DB::Query::BioQuery will map classes to tables for you. There's >> no >> really good HowTo document yet; there's plenty of examples though in >> the respective test script t/query.t. > > It should be good that in that in the bioperl API appear examples, > there are many functions with nothing of documentation... 
>
>> I don't understand your goal in modifying load_seqdatabase.pl, so
>> unless you elaborate on what you're trying to achieve I can't help
>> you.
>
> My goal for modifying load_seqdatabase.pl is because I wanted to store
> different .gbk files but with few modifications, and we needed to
> store them all with different versions.... I could have done it by
> using other parsing application before...
>
> thank you ..
>
> carito
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From echuong at gmail.com Mon May 2 17:13:59 2005
From: echuong at gmail.com (Edward Chuong)
Date: Mon May 2 17:09:14 2005
Subject: [Bioperl-l] PAML parsing: standard error?
Message-ID: <244d2e0e050502141330514baa@mail.gmail.com>

Hi,

Has parsing the standard errors for pairwise comp been implemented
yet? Can't seem to figure out how to retrieve them.

(if getSE = 1, the bottom of the file would look like this)

pairwise comparison, codon frequencies: F3x4.


2 (PM_BWp0020A12f/1-633) ... 1 (mouse/1-630)
lnL =-1299.951893
0.88536 2.62235 0.89353
SEs for parameters:
0.08133 0.51260 0.19693

t= 0.8854 S= 178.0 N= 443.0 dN/dS= 0.8933 dN= 0.2854 dS= 0.3194
dN = 0.28536 +- 0.03182 dS = 0.31943 +- 0.05807 (by method 1)
dN = 0.28536 +- 0.03182 dS = 0.31943 +- 0.05807 (by method 2)

Thanks!
-Ed

From jason.stajich at duke.edu Mon May 2 22:31:27 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon May 2 22:24:27 2005
Subject: [Bioperl-l] PAML parsing: standard error?
In-Reply-To: <244d2e0e050502141330514baa@mail.gmail.com>
References: <244d2e0e050502141330514baa@mail.gmail.com>
Message-ID: <22cde51e72b8ba266fc609facb895aef@duke.edu>

Nope, it's not parsed. Feel free to add it and post a patch...

On May 2, 2005, at 5:13 PM, Edward Chuong wrote:

> Hi,
>
> Has parsing the standard errors for pairwise comp been implemented
> yet?
> Can't seem to figure out how to retrieve them.
>
> (if getSE = 1, the bottom of the file would look like this)
>
> pairwise comparison, codon frequencies: F3x4.
>
>
> 2 (PM_BWp0020A12f/1-633) ... 1 (mouse/1-630)
> lnL =-1299.951893
> 0.88536 2.62235 0.89353
> SEs for parameters:
> 0.08133 0.51260 0.19693
>
> t= 0.8854 S= 178.0 N= 443.0 dN/dS= 0.8933 dN= 0.2854 dS= 0.3194
> dN = 0.28536 +- 0.03182 dS = 0.31943 +- 0.05807 (by method 1)
> dN = 0.28536 +- 0.03182 dS = 0.31943 +- 0.05807 (by method 2)
>
> Thanks!
> -Ed
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From diriano at rz.uni-potsdam.de Tue May 3 03:55:22 2005
From: diriano at rz.uni-potsdam.de (Diego Riano)
Date: Tue May 3 03:49:42 2005
Subject: [Bioperl-l] Problem retrieving sequences from NCBI
Message-ID: <1115106922.8151.8.camel@molbio21.bio.uni-potsdam.de>

Hello,

I have a small problem. I have a script to retrieve sequences from
NCBI. If coordinates are specified, the script retrieves only the
corresponding region of the sequence. The input of the script is a file
(IN) with a list of accession numbers and an optional pair of
coordinates (start-end), and the user can specify the output format
(default is fasta).

When a specific region is specified, there is a secondary accession
number:
REGION: start..end

The problem that I have is that for some sequences I find in the
output, for the secondary accession:
REGION: ?

Any idea why this could happen?
###############################################
while (my $line = <IN>) {
    chomp $line;
    my ($id, $coords) = split(/\t/, $line);
    my $fetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=$id&retmode=text";
    if (defined($format) && $format ne "") {
        $fetch .= "&rettype=$format";
    }
    else {
        $fetch .= "&rettype=fasta";
    }
    if (defined($coords)) {
        my ($start, $end) = split(/-/, $coords);
        $fetch .= "&seq_start=$start&seq_stop=$end";
    }
    my $result = get($fetch);
}
#################################################

Thanks

diego

--
_______________________________________
Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
Potsdam University
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:+49 331 977 2809
http://www.geocities.com/dmrp.geo/

From jgrg at sanger.ac.uk Tue May 3 11:04:12 2005
From: jgrg at sanger.ac.uk (James Gilbert)
Date: Tue May 3 10:55:05 2005
Subject: [Bioperl-l] v5.005 in INSTALL
Message-ID: <4a8ee9cafbc8ee0d175fc061da63104d@sanger.ac.uk>

Hi,

A user emailed me to say that our Bio::Index modules don't work with
Perl 5.005, and that a lot of others don't either because they have
"use warnings". It says in the INSTALL file that 5.005 is OK. Should
we change the INSTALL file to a later version of Perl, or fix 5.005
compatibility? (He had to use 5.005 on a particular machine because it
is required by his SRS installation.)

Personally I'd go with 5.6.1 as a minimum. I like "use base", which
is, I think, a 5.6.1 feature.

This is what we have in the INSTALL:

o SYSTEM REQUIREMENTS

    - perl 5.005 or later*.

    - External modules: Bioperl uses functionality provided in other
      Perl modules. Some of these are included in the standard perl
      package but some need to be obtained from the CPAN site. The
      list of external modules is included at the bottom of
      this INSTALL document.

      The CPAN Bioperl Bundle (Bundle::BioPerl) makes installation
      of these external modules easy.
Simply install the bundle using your CPAN shell and all necessary modules will be installed. See THE BIOPERL BUNDLE, below. * Note that most modules will work with earlier versions of Perl. The only ones that will not are Bio::SimpleAlign.pm and the Bio::Index::* modules. If you don't need these modules and you want to install bioperl using an earlier version of Perl, edit the "require 5.005;" line in Makefile.PL as necessary. James ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ James G.R. Gilbert The Wellcome Trust Sanger Institute Fax: +44 (0)1223 494919 Wellcome Trust Genome Campus Tel: +44 (0)1223 494906 Hinxton, Cambridge, CB10 1SA From hlapp at gmx.net Tue May 3 11:38:23 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue May 3 11:31:20 2005 Subject: [Bioperl-l] v5.005 in INSTALL In-Reply-To: <4a8ee9cafbc8ee0d175fc061da63104d@sanger.ac.uk> Message-ID: <624DBB8C-BBE9-11D9-9084-000A959EB4C4@gmx.net> I ran an informal survey on this a couple days ago, you can check the archive for the results. The bottom line is that most people appear to use 5.6.1 or 5.8.x, but for instance FlyBase@Harvard still uses 5.005 although apparently they are in the process of upgrading. MacOSX 10.2 comes with 5.6.0 and Apple never updated it (in 10.2). -hilmar On Tuesday, May 3, 2005, at 08:04 AM, James Gilbert wrote: > > Hi, > > A user emailed me to say that our Bio::Index modules don't work with > Perl 5.005, and that a lot of others don't either because they have > "use warnings". It says in the INSTALL file that 5.005 is OK. Should > we change the INSTALL file to a later version of Perl, or fix 5.005 > compatability? (He had to use 5.005 on a particular machine because it > is required by his SRS installation.) > > Personally I'd go with 5.6.1 as a minimum. I like "use base", which > is, I think, a 5.6.1 feature. > > This is what we have in the INSTALL: > > o SYSTEM REQUIREMENTS > > - perl 5.005 or later*. 
> > - External modules: Bioperl uses functionality provided in other > Perl modules. Some of these are included in the standard perl > package but some need to be obtained from the CPAN site. The > list of external modules is included at the bottom of > this INSTALL document. > > The CPAN Bioperl Bundle (Bundle::BioPerl) makes installation > of these external modules easy. Simply install the bundle > using your CPAN shell and all necessary modules will be installed. > See THE BIOPERL BUNDLE, below. > > * Note that most modules will work with earlier versions of Perl. > The only ones that will not are Bio::SimpleAlign.pm and > the Bio::Index::* modules. If you don't need these modules > and you want to install bioperl using an earlier version of Perl, > edit the "require 5.005;" line in Makefile.PL as necessary. > > > James > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > James G.R. Gilbert The Wellcome Trust Sanger Institute > Fax: +44 (0)1223 494919 Wellcome Trust Genome Campus > Tel: +44 (0)1223 494906 Hinxton, Cambridge, CB10 1SA > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From tembe at bioanalysis.org Tue May 3 11:40:34 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Tue May 3 11:33:38 2005 Subject: [Bioperl-l] Megablast Ouput In-Reply-To: <4a8ee9cafbc8ee0d175fc061da63104d@sanger.ac.uk> References: <4a8ee9cafbc8ee0d175fc061da63104d@sanger.ac.uk> Message-ID: <42779B72.3030408@bioanalysis.org> Hello List, I was wondering if BioPerl parse routines from BLAST output could be used for megablast output as well (assuming that the display format for megablast output has been set to BLAST like output). 
Secondly, if the output display format (-m parameter) for blastall is changed, is it still possible to use BioPerl blast output parsers to extract hits/HSPs/Evalue etc? Thanks. Regards, -waibhav From jason.stajich at duke.edu Tue May 3 11:48:31 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue May 3 11:41:38 2005 Subject: [Bioperl-l] Megablast Ouput In-Reply-To: <42779B72.3030408@bioanalysis.org> References: <4a8ee9cafbc8ee0d175fc061da63104d@sanger.ac.uk> <42779B72.3030408@bioanalysis.org> Message-ID: <9143f5acb4f86e7c26fc309697911e87@duke.edu> Bio::SearchIO::blast will parse MEGABLAST output the -D 0 and -D 2 formats. -jason On May 3, 2005, at 11:40 AM, Waibhav Tembe wrote: > Hello List, > > I was wondering if BioPerl parse routines from BLAST output could be > used for megablast output as well (assuming that the display format > for megablast output has been set to BLAST like output). > > Secondly, if the output display format (-m parameter) for blastall is > changed, is it still possible to use BioPerl blast output parsers to > extract hits/HSPs/Evalue etc? > > Thanks. > > Regards, > > -waibhav > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From tembe at bioanalysis.org Tue May 3 12:02:44 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Tue May 3 11:55:42 2005 Subject: [Bioperl-l] Megablast Ouput In-Reply-To: <9143f5acb4f86e7c26fc309697911e87@duke.edu> References: <4a8ee9cafbc8ee0d175fc061da63104d@sanger.ac.uk> <42779B72.3030408@bioanalysis.org> <9143f5acb4f86e7c26fc309697911e87@duke.edu> Message-ID: <4277A0A4.2090206@bioanalysis.org> Thanks for the quick reply. Another question: If I don't care about the statistics of the the hits, but only need the local alignments and no. 
of gaps/mismatches, is using megablast and blast with the following
parameters equivalent?

blastall -p blastn -d DB -F f -q -1 -r 1 -G 2 -E 1 -W 7 -I T -e 10000000 -i Input -o Output

megablast -d DB -F f -q -1 -r 1 -G 2 -E 1 -W 28 -I T -m 0 -D 2 -n T -R T -i Input -o Output

1. I tried to set E value large in BLAST to get all possible hits.
2. Since megablast computes a hash for every 4 base pairs, I selected
   W=28 which (I think) should match W=7 for blastn

Any suggestions/comments/criticism welcome. I have to blast more than
100 short sequences of length approx 50bp and need to get local
alignment scores using a modest linux box. I observed that megablast
runs very fast as compared to BLAST. So I am curious to know if I will
miss any hits using megablast instead of blast. As I mentioned, I
don't care about the statistics.

Thanks.

Jason Stajich wrote:

> Bio::SearchIO::blast will parse MEGABLAST output the -D 0 and -D 2
> formats.
>
> -jason
>
> On May 3, 2005, at 11:40 AM, Waibhav Tembe wrote:
>
>> Hello List,
>>
>> I was wondering if BioPerl parse routines from BLAST output could be
>> used for megablast output as well (assuming that the display format
>> for megablast output has been set to BLAST like output).
>>
>> Secondly, if the output display format (-m parameter) for blastall is
>> changed, is it still possible to use BioPerl blast output parsers to
>> extract hits/HSPs/Evalue etc?
>>
>> Thanks.
>>
>> Regards,
>>
>> -waibhav
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>

From michael.watson at bbsrc.ac.uk Wed May 4 08:33:59 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed May 4 08:26:55 2005
Subject: [Bioperl-l] Error writing out EMBL
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D3A6@iahce2knas1.iah.bbsrc.reserved>

Hi

I'm on bioperl-1.5.

I get this when writing out an EMBL file:

FT                   /gene_id="Bio::Annotation::SimpleValue=HASH(0x858d688)"
FT                   /note="Bio::Annotation::SimpleValue=HASH(0x8595360)"

My code is:

use Bio::Perl;
use Bio::SeqIO;
use Bio::Seq;
use Bio::SeqFeature::Generic;

my $seq = Bio::Seq->new(-display_id => 'test',
                        -seq        => "ACGTACGTACGTACGT");

my $gene = "test gene";

my $gene_feat = Bio::SeqFeature::Generic->new(-start   => 2,
                                              -end     => 5,
                                              -primary => 'CDS',
                                              -tag     => {gene_id => $gene,
                                                           note    => 'test gene'});

$seq->add_SeqFeature($gene_feat);

write_sequence(">test.embl",'embl',$seq);


Why are my gene_id and note tags being written as bioperl object
references? This didn't happen in 1.4! :-)

Many thanks
Mick

From hlapp at gmx.net Wed May 4 12:08:33 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed May 4 12:02:19 2005
Subject: [Bioperl-l] Error writing out EMBL
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D3A6@iahce2knas1.iah.bbsrc.reserved>
Message-ID:

This is a known problem. Use 1.4 instead, or use the CVS downloads
from either the 1.4 branch or the main trunk.

	-hilmar

On Wednesday, May 4, 2005, at 05:33 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> I'm on bioperl-1.5.
> > I get this when writing out an EMBL file: > > FT > /gene_id="Bio::Annotation::SimpleValue=HASH(0x858d688)" > FT > /note="Bio::Annotation::SimpleValue=HASH(0x8595360)" > > My code is: > > use Bio::Perl; > use Bio::SeqIO; > use Bio::Seq; > use Bio::SeqFeature::Generic; > > my $seq = Bio::Seq->new(-display_id => 'test', > -seq => "ACGTACGTACGTACGT"); > > my $gene = "test gene"; > > my $gene_feat = Bio::SeqFeature::Generic->new(-start => 2, > -end => 5, > -primary => 'CDS', > -tag => {gene_id => $gene, > note => 'test > gene'}); > > $seq->add_SeqFeature($gene_feat); > > write_sequence(">test.embl",'embl',$seq); > > > Why are my gene_id and note tags being written as bioperl object > references? This didn't happen in 1.4! :-) > > Many thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From alee at imcb.a-star.edu.sg Wed May 4 21:32:02 2005 From: alee at imcb.a-star.edu.sg (Alison Lee) Date: Wed May 4 22:33:30 2005 Subject: [Bioperl-l] Rank index in Bio::Tools::Run::Vista Message-ID: <004301c55112$559cd590$7347d90a@imcb.astar.edu.sg> Dear Bioperl Developers I would like to ask about Bio::Tools::Run::Vista. The regular expression at Line 365 /\d+/ which is meant to detect rank indexes, does not seem to preclude seqids with digits. Should it be modified to /^\d+$/? Also, lines 367 and 368 should be swapped. Lastly, can the Vista option "FILENAME" be added? Thanks. Regards Alison. DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. 
Please do not copy or use it for any purpose, or disclose its contents
to any other person as it may be an offence under the Official Secrets
Act. Thank you.

From oliver.burren at cimr.cam.ac.uk Thu May 5 08:35:03 2005
From: oliver.burren at cimr.cam.ac.uk (Oliver Burren)
Date: Thu May 5 08:31:44 2005
Subject: [Bioperl-l] Glyph question
Message-ID: <1115296503.6133.41.camel@jakarta>

Hi Developers,

I'm trying to render some transcripts and I'm using objects in
Bio::SeqFeature::Gene as well as Bio::Graphics::Panel and Glyph.

I have added some subfeatures (of type SeqFeature::Generic) to the exon
objects that exist within these transcripts. Unfortunately, when I
render using the Bio::Graphics::Glyph::transcript glyph, it treats
these seqfeatures as exons, so I get an odd transcript out. Is there
any way, without creating a custom glyph, to get
Bio::Graphics::Glyph::transcript to use only exon subfeatures when
rendering?

I'm using bioperl 1.4

Thanks very much

Olly Burren

From lstein at cshl.edu Thu May 5 12:41:41 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu May 5 12:34:50 2005
Subject: [Bioperl-l] Glyph question
In-Reply-To: <1115296503.6133.41.camel@jakarta>
References: <1115296503.6133.41.camel@jakarta>
Message-ID: <200505051241.41952.lstein@cshl.edu>

Hi Olly,

Do you want the subfeatures to appear at all, or just suppress them?
If you just want to suppress them, then it is easy to subclass the
transcript glyph --- you can do it in the same file as your main
script if you like:

  package Bio::Graphics::Glyph::mytranscript;
  use base 'Bio::Graphics::Glyph::transcript';

  sub _subseq {  # override the method that produces subparts
    my $self    = shift;
    my $feature = shift;
    my @subseq  = $self->SUPER::_subseq($feature);
    return grep {$_->primary_tag eq 'exon'} @subseq;
  }

Then refer to the glyph named "mytranscript".

Alternatively there is a new glyph named "processed_transcript" that
-- I think -- will only display subfeatures of type "exon", "CDS" and
"UTR".
I haven't tested it with other types of subfeatures, however, so I might be wrong. Lincoln On Thursday 05 May 2005 08:35, Oliver Burren wrote: > Hi Developers, > > I'm trying to render some transcripts and I'm using objects in > Bio::SeqFeature::Gene as well as Bio::Graphics::Panel and Glyph. > > I have added some subfeatures (Of type SeqFeature::Generic) to the exon > objects that exist within these transcripts. Unfortunately when I render > using the Bio::Graphics::Glyph::transcript glyph it treats these > seqfeatures as exons so I get an odd transcript out. Is there anyway > without creating a custom glyph to get Bio::Graphics::Glyph::transcript > to use only exon subFeatures when rendering ? > > I'm using bioperl 1.4 > > Thanks very much > > Olly Burren > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From michael.watson at bbsrc.ac.uk Fri May 6 06:07:45 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri May 6 06:01:40 2005 Subject: [Bioperl-l] SearchIO and fasta output Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D3E8@iahce2knas1.iah.bbsrc.reserved> Hi Is there a particular output format I need to use when parsing FASTA output with SearchIO? I just used the defaults and got really rather disastrous results... Thanks Mick From jason.stajich at duke.edu Fri May 6 08:05:18 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri May 6 07:58:26 2005 Subject: [Bioperl-l] SearchIO and fasta output In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D3E8@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D3E8@iahce2knas1.iah.bbsrc.reserved> Message-ID: <9710d2315d33f74de0bd51f624426023@duke.edu> It works great for me. Concrete examples of a report are essential if you want help. What version of bioperl, FASTA? etc. 
There was an incompatibility with the latest version of FASTA output and older bioperl - fixed in CVS. The -m 9 -d 0 will also parse. For quick and dirty FASTA to tabular I use scripts/searchio/fastam9_to_table which turns FASTA m9 output into a blastall -m 9 like table (without SearchIO in fact so it is quite fast). There are several test files that get run in the test suite and the test t/SearchIO.t which demonstrate FASTA parsing. On May 6, 2005, at 6:07 AM, michael watson ((IAH-C)) wrote: > Hi > > Is there a particular output format I need to use when parsing FASTA > output with SearchIO? I just used the defaults and got really rather > disastrous results... > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From michael.watson at bbsrc.ac.uk Fri May 6 09:03:15 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri May 6 08:56:20 2005 Subject: [Bioperl-l] SearchIO and fasta output Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D3F0@iahce2knas1.iah.bbsrc.reserved> Hi Sorry about the crap report. Basically I was using bioperl-1.4, and that didn't parse the latest version of fasta. So I upgraded to bioperl-1.5 and that works BUT because the script I am using creates features, my features now contain (the documented 1.5 bug) "FT /note="Bio::Annotation::SimpleValue=HASH(0x8595360)". So I'm not sure what to do - if I get the latest version from CVS is the above bug fixed? Thanks Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: 06 May 2005 13:05 To: michael watson (IAH-C) Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] SearchIO and fasta output It works great for me. Concrete examples of a report are essential if you want help. What version of bioperl, FASTA? etc. 
There was an incompatibility with the latest version of FASTA output and older bioperl - fixed in CVS. The -m 9 -d 0 will also parse. For quick and dirty FASTA to tabular I use scripts/searchio/fastam9_to_table which turns FASTA m9 output into a blastall -m 9 like table (without SearchIO in fact so it is quite fast). There are several test files that get run in the test suite and the test t/SearchIO.t which demonstrate FASTA parsing. On May 6, 2005, at 6:07 AM, michael watson ((IAH-C)) wrote: > Hi > > Is there a particular output format I need to use when parsing FASTA > output with SearchIO? I just used the defaults and got really rather > disastrous results... > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Fri May 6 09:39:05 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri May 6 09:32:28 2005 Subject: [Bioperl-l] SearchIO and fasta output In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D3F0@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D3F0@iahce2knas1.iah.bbsrc.reserved> Message-ID: Yell at the people who put the feature stuff in 1.5.0 that wasn't backwards compatible... okay that won't actually be productive. Upgrade to the CVS version if you can, otherwise you might be able to just drop in the Bio/SearchIO/fasta.pm from 1.5 into 1.4. I don't remember if there were any other knockoff effects to other modules that needed to be done, I think it was just a regexp fix. We should really put a 1.5.1 out to deal with the problems introduced with feature/annotations in 1.5.0 so that the old API is respected. -jason On May 6, 2005, at 9:03 AM, michael watson ((IAH-C)) wrote: > Hi > > Sorry about the crap report. 
> > Basically I was using bioperl-1.4, and that didn't parse the latest > version of fasta. > > So I upgraded to bioperl-1.5 and that works BUT because the script I am > using creates features, my features now contain (the documented 1.5 > bug) > "FT /note="Bio::Annotation::SimpleValue=HASH(0x8595360)". > > So I'm not sure what to do - if I get the latest version from CVS is > the > above bug fixed? > > Thanks > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: 06 May 2005 13:05 > To: michael watson (IAH-C) > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] SearchIO and fasta output > > > It works great for me. Concrete examples of a report are essential if > you want help. > What version of bioperl, FASTA? etc. There was an incompatibility > with the latest version of FASTA output and older bioperl - fixed in > CVS. > > The -m 9 -d 0 will also parse. For quick and dirty FASTA to tabular I > use scripts/searchio/fastam9_to_table which turns FASTA m9 output into > a blastall -m 9 like table (without SearchIO in fact so it is quite > fast). > > There are several test files that get run in the test suite and the > test t/SearchIO.t which demonstrate FASTA parsing. > > On May 6, 2005, at 6:07 AM, michael watson ((IAH-C)) wrote: > >> Hi >> >> Is there a particular output format I need to use when parsing FASTA >> output with SearchIO? I just used the defaults and got really rather >> disastrous results... 
>> >> Thanks >> Mick >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From hlapp at gmx.net Fri May 6 12:22:59 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri May 6 12:16:14 2005 Subject: [Bioperl-l] SearchIO and fasta output In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D3F0@iahce2knas1.iah.bbsrc.reserved> Message-ID: <1C5F5CFC-BE4B-11D9-9F00-000A959EB4C4@gmx.net> Yes, I think so. -hilmar On Friday, May 6, 2005, at 06:03 AM, michael watson ((IAH-C)) wrote: > Hi > > Sorry about the crap report. > > Basically I was using bioperl-1.4, and that didn't parse the latest > version of fasta. > > So I upgraded to bioperl-1.5 and that works BUT because the script I am > using creates features, my features now contain (the documented 1.5 > bug) > "FT /note="Bio::Annotation::SimpleValue=HASH(0x8595360)". > > So I'm not sure what to do - if I get the latest version from CVS is > the > above bug fixed? > > Thanks > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: 06 May 2005 13:05 > To: michael watson (IAH-C) > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] SearchIO and fasta output > > > It works great for me. Concrete examples of a report are essential if > you want help. > What version of bioperl, FASTA? etc. There was an incompatibility > with the latest version of FASTA output and older bioperl - fixed in > CVS. > > The -m 9 -d 0 will also parse. For quick and dirty FASTA to tabular I > use scripts/searchio/fastam9_to_table which turns FASTA m9 output into > a blastall -m 9 like table (without SearchIO in fact so it is quite > fast). 
> > There are several test files that get run in the test suite and the > test t/SearchIO.t which demonstrate FASTA parsing. > > On May 6, 2005, at 6:07 AM, michael watson ((IAH-C)) wrote: > >> Hi >> >> Is there a particular output format I need to use when parsing FASTA >> output with SearchIO? I just used the defaults and got really rather >> disastrous results... >> >> Thanks >> Mick >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri May 6 12:23:47 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri May 6 12:17:20 2005 Subject: [Bioperl-l] SearchIO and fasta output In-Reply-To: Message-ID: <396D64F8-BE4B-11D9-9F00-000A959EB4C4@gmx.net> On Friday, May 6, 2005, at 06:39 AM, Jason Stajich wrote: > We should really put a 1.5.1 out to deal with the problems introduced > with feature/annotations in 1.5.0 so that the old API is respected. > Yes I know. It ain't easy to juggle time ... -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From glauber at ioc.fiocruz.br Thu May 5 15:57:24 2005 From: glauber at ioc.fiocruz.br (Glauber Wagner) Date: Sat May 7 16:11:48 2005 Subject: [Bioperl-l] Problem with parser blast!!!! Message-ID: <8D44604203DAF9438BF9123B4A08C779CEBDE1@alpha.ioc.fiocruz.br> Hi .. 
I have onde problem whem I parser one blast result!!!! The script works well and at the moment when Evalue = 0.0 the scripts stop!!! What is the problem? I screenned al the script but nothing!!! thanks glauber Evalue: 2e-06 Evalue Int: -13 Evalue Log: -13.1223633774043 Evalue Int resgatado = 1 Evalue Log resgatado = 1 1 1 Evalue: 0.0

Software error:

Can't take log of 0 at blast_clusters_local_print.pl_old line 126, <GEN1> line 81960.

For help, please send mail to this site's webmaster, giving this error message and the time and date of the error.
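
The message above comes from Perl's built-in log(), which dies when its argument is 0, and BLAST reports very small E-values as 0.0. A minimal sketch of one possible guard, assuming the script takes the natural log of the E-value (the sample output, where 2e-06 gives -13.12, suggests so). The helper name safe_log_evalue and the 1e-300 floor are illustrative assumptions and are not taken from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: returns the
# natural log of an E-value, substituting a small floor when the value
# is reported as 0.0 so that log() is never called with 0.
sub safe_log_evalue {
    my ($evalue, $floor) = @_;
    $floor = 1e-300 unless defined $floor;    # assumed floor; choose one that suits the data
    return log($evalue > 0 ? $evalue : $floor);
}

printf "%.4f\n", safe_log_evalue('2e-06');    # prints -13.1224, as in the output above
printf "%.4f\n", safe_log_evalue('0.0');      # returns a large negative number instead of dying
```

Whether a floor is the right choice depends on what the log is used for downstream; skipping or special-casing hits with an E-value of 0 may be equally reasonable.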

[Thu May 5 17:30:50 2005] blast_clusters_local_print.pl_old: Can't take log of 0 at blast_clusters_local_print.pl_old line 126, <GEN1> line 81960. From sc167 at cornell.edu Fri May 6 12:35:15 2005 From: sc167 at cornell.edu (Samuel W. Cartinhour) Date: Sat May 7 16:12:08 2005 Subject: [Bioperl-l] using graphics xyplot Message-ID: <75c9c5bcb66c4de4921e9e579add4e52@cornell.edu> Samuel W. Cartinhour USDA-ARS 325A Plant Science Building Cornell University Ithaca NY 14853 Office 607 255 8091 / FAX 607 255 4471 From ymc at paxil.stanford.edu Fri May 6 18:19:25 2005 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Sat May 7 16:12:27 2005 Subject: [Bioperl-l] Committed HMM module Message-ID: Hi all I just committed my HMM code to CVS. You can find it in bioperl-live/Bio/Tools/HMM.pm bioperl-ext/Bio/Ext/HMM/* Please give it a try and tell me what you think. Regards, Yee Man From lixiao_w at hotmail.com Sun May 8 06:20:04 2005 From: lixiao_w at hotmail.com (Wang Lixiao) Date: Sun May 8 06:15:53 2005 Subject: [Bioperl-l] How to run hmmpfam Message-ID: Dear all, I am new to bioperl. I want to use Bio::Tools::Run::Hmmpfam and CGI with the Mozilla browser. I can run my CGI file on my Apache server; however, when I run it through the browser, it reports some errors: ------------- EXCEPTION ------------- MSG: Hmmpfam call ( -E 0.0001 myhmms 7LES_DROME) crashed: 0 STACK Bio::Tools::Run::Hmmpfam::_run /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/Hmmpfam.pm:224 STACK Bio::Tools::Run::Hmmpfam::run /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/Hmmpfam.pm:200 STACK toplevel /usr/local/apache/cgi-bin/seqtest.cgi:40 -------------------------------------- How can I resolve the problem? I do need your help!
Thanks, /Lixiao From Marc.Logghe at devgen.com Sun May 8 14:33:40 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Sun May 8 14:26:30 2005 Subject: [Bioperl-l] How to run hmmpfam Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E756@ANTARESIA.be.devgen.com> Hi Lixiao, > MSG: Hmmpfam call ( -E 0.0001 myhmms 7LES_DROME) crashed: 0 > This error you typically get when bioperl is not able to find the path to hmmpfam (just in front of '-E' you should see the path of the executable, which is missing here). There are several ways to do this: 1) PassEnv (or SetEnv) for HMMPFAMDIR in your apache configuration 2) set it directly in your cgi script: $ENV{HMMPFAMDIR} = '/your/path/to/hmmfam_folder'; Important remark: in both cases HMMPFAMDIR should contain the path only, e.g. /usr/local/bin, not including the name of the executable itself (hmmpfam) ! 3) pass the full path (this time including the executable) as a parameter: my $factory = Bio::Tools::Run::Hmmpfam->new(PROGRAM => '/usr/local/bin/hmmpfam');# or wherever your exe is located HTH, Marc From Sean.Maceachern at dpi.vic.gov.au Mon May 9 03:17:21 2005 From: Sean.Maceachern at dpi.vic.gov.au (Sean.Maceachern@dpi.vic.gov.au) Date: Mon May 9 03:12:40 2005 Subject: [Bioperl-l] ESTScan Query Message-ID: Hello, I am processing a number of EST sequences with ESTScan and was hoping someone could tell me if anything exists that can take an alignment from an ESTScan translation (protein) and align the corresponding nucleic coding regions maintaining the gaps etc... from the aligned protein. Any suggestions would be appreciated. Sean From hlapp at gmx.net Mon May 9 04:36:08 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon May 9 04:32:51 2005 Subject: [Bioperl-l] use base; Message-ID: <6415B33C-C065-11D9-9D8F-000A959EB4C4@gmx.net> Perldoc says this has been present since 5.004 but unfortunately it doesn't work for multiple inheritance before 5.6.1, at least not on MacOSX. Test case attached. 
I tracked this down when investigating the FeatureIO.t warnings. I suggest that maybe we stick to the good old @ISA declaration - not terribly more lines of code and not terribly uglier, but has always worked. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- To test this, create a file Foo.pm with the following content: package Foo; use strict; use base qw(Bio::Root::Root Bio::SeqFeatureI); sub new { my $class = shift; my $self = bless {}, $class; return $self; } 1; Then run the following command: $ perl -w -e 'use Foo; $foo = Foo->new(); if ($foo->isa("Bio::SeqFeatureI")) { print "ok\n"; } else { print "not ok\n"; }' This needs to print 'ok'. On 5.6.0 it will print 'not ok', but if you remove the Bio::Root::Root from the 'use base' instruction it will print 'ok'. From hlapp at gmx.net Mon May 9 04:40:33 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon May 9 04:34:02 2005 Subject: [Bioperl-l] Bio::SeqFeature::Annotated Message-ID: <01E95799-C066-11D9-9D8F-000A959EB4C4@gmx.net> I'm adding a standard header; Allen if you could please fill in the minimal blanks. I'm also changing it to use @ISA as argued in the previous email, and I'm changing the warning upon attempt to add an object which is not SeqFeatureI to throwing an exception since I don't think it's a good idea if the object is ignored with only a warning (Generic.pm warns too but goes ahead and does add the feature). -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From james.wasmuth at ed.ac.uk Mon May 9 05:31:03 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Mon May 9 05:29:24 2005 Subject: [Bioperl-l] ESTScan Query In-Reply-To: References: Message-ID: <427F2DD7.1010101@ed.ac.uk> Hi Sean "tranalign" from the EMBOSS package will do what you ask... word of warning about ESTScan there's a very small bug in the code that can severely affect your translations. In the main ESTScan script replace $pSeq =~ s/[acgt]//g; # Remove lowercases... with $pSeq =~ s/[acgtn]//g; # Remove lowercases... james Sean.Maceachern@dpi.vic.gov.au wrote: >Hello, > >I am processing a number of EST sequences with ESTScan and was hoping >someone could tell me if anything exists that can take an alignment from an >ESTScan translation (protein) and align the corresponding nucleic coding >regions maintaining the gaps etc... from the aligned protein. > >Any suggestions would be appreciated. > >Sean > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- "Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...." 
--Thomas Edison Blaxter Nematode Genomics Group | Institute of Evolutionary Biology | Ashworth Laboratories, KB | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From michael.watson at bbsrc.ac.uk Mon May 9 06:29:32 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon May 9 06:23:48 2005 Subject: [Bioperl-l] Problems with Bio/Graphics/Feature.pm Message-ID: <8975119BCD0AC5419D61A9CF1A923E950121BB77@iahce2knas1.iah.bbsrc.reserved> Hi I'm hacking around with the render_sequence.pl example script and keep getting errors: Can't locate object method "seq_id" via package "Bio::Seq::RichSeq" at /usr/local/bioperl-1.5.0/Bio/Graphics/Feature.pm line 269, line 191. I also get a similar message about not being able to locate object method "start", which is called on the next line of Bio::Graphics::Feature.pm I vaguely recall asking about this previously - was a solution ever presented? Many thanks in advance Mick From Anthony.Underwood at hpa.org.uk Mon May 9 08:30:53 2005 From: Anthony.Underwood at hpa.org.uk (SRMD, Col - Underwood, Anthony) Date: Mon May 9 08:24:30 2005 Subject: [Bioperl-l] ContigAnalysis Message-ID: Please can anybody confirm which methods should work in the Bio::Assembly::ContigAnalysis module. It seems to me that the single_strand method does not work, whereas low_consensus_quality does. Is there a method that is implemented and will find regions that are not double-stranded? Would the not_confirmed_on_both_strands method work?
Many thanks, Anthony Dr Anthony Underwood Bioinformatics Group | Genomics, Proteomics and Bioinformatics Unit Centre for Infections Health Protection Agency 61 Colindale Avenue London NW9 5HT t: 0208 3276466 f: 0208 3276738 e:anthony.underwood@hpa.org.uk ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk ************************************************************************** From michael.watson at bbsrc.ac.uk Mon May 9 11:27:48 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon May 9 11:25:02 2005 Subject: [Bioperl-l] Passing extra arguments to method references in Bio::Graphics::Panel::add_track Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D41E@iahce2knas1.iah.bbsrc.reserved> Hi I am on bioperl-1.5. I'm using the following code to create some rather tasty images: From michael.watson at bbsrc.ac.uk Mon May 9 11:30:44 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon May 9 11:25:12 2005 Subject: [Bioperl-l] RE: Passing extra arguments to method references in Bio::Graphics::Panel::add_track Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D41F@iahce2knas1.iah.bbsrc.reserved> Hi Sorry, a bit hasty on the trigger.... I am on bioperl-1.5. 
I'm using the following code to create some rather tasty images: $panel->add_track(transcript2 => \@includeCDS, -bgcolor => 'blue', -fgcolor => 'black', -key => 'CDS', -bump => 0, -height => 10, -label => \&gene_description, -description=> \&gene_label, ); This is fairly standard, and @includeCDS is a bunch of feature objects. What I want to do is pass extra arguments to &gene_description, and then within &gene_description check to see if the feature start is greater than a certain value (the extra argument). If it is, then I want to return an empty string, if it isn't I want to return the gene description. Something like: -label => \&gene_description($start) But when I tried that it didn't work ;-(. So is it possible to pass extra arguments to those functions I am referencing above? Many thanks Mick From crabtree at tigr.org Mon May 9 11:54:19 2005 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Mon May 9 11:47:02 2005 Subject: [Bioperl-l] RE: Passing extra arguments to method references inBio::Graphics::Panel::add_track Message-ID: Michael- What you're trying to do is known as "currying" in functional programming parlance. Perl (5, at least) doesn't support currying directly, but you can implement it yourself using a simple closure.
For example, instead of this: > -label => \&gene_description($start) you could use something like the following (assuming that &gene_description expects the $start argument, followed by the usual arguments to a Bioperl track callback subroutine): -label => sub { my @args = @_; return &gene_description($start, @args); } Or if you need to repeat this trick a number of times, with different $start values, you could create yourself a subroutine-generating subroutine, like so (in this example I've made the outer subroutine anonymous, but you don't have to): my $funMaker = sub { my $start = shift; return sub { my @args = @_; return &gene_description($start, @args); } }; and then in the add_track method you'd say: -label => &$funMaker($my_start_value), If you google some combination of "currying", "perl", and "closure" you'll find a bunch of pages that discuss these and similar techniques. Hope this helps, Jonathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > michael watson (IAH-C) > Sent: Monday, May 09, 2005 11:31 AM > To: Bioperl > Subject: [Bioperl-l] RE: Passing extra arguments to method > references inBio::Graphics::Panel::add_track > > > Hi > > Sorry, a bit hasty on the trigger.... > > I am on bioperl-1.5. > > I'm using the following code to create some rather tasty images: > > $panel->add_track(transcript2 => \@includeCDS, > -bgcolor => 'blue', > -fgcolor => 'black', > -key => 'CDS', > -bump => 0, > -height => 10, > -label => \&gene_description, > -description=> \&gene_label, > ); > > This is fairly standard, and @includeCDS is a bunch of > feature objects. > > What I want to do is pass extra arguments to > &gene_description, and then within &gene_description check to > see if the feature start is greater than a certain value (the > extra argument). If it is, then I want to return an empty > string, if it isn't I want to return the gene description. 
> Something like: > > -label => \&gene_description($start) > > But when I tried that it didn't work ;-(. > > So is it possible to pass extra arguments to those functions > I am referencing above? > > Many thanks > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-> bio.org/mailman/listinfo/bioperl-l > From ferdinand.marletaz at gmail.com Mon May 9 12:47:21 2005 From: ferdinand.marletaz at gmail.com (=?ISO-8859-1?Q?Ferdinand_Marl=E9taz?=) Date: Mon May 9 12:40:15 2005 Subject: [Bioperl-l] write_sequence Message-ID: <7c7aa474050509094721bab58b@mail.gmail.com> Sorry if my question is stupid !!! I start programming using bioperl and I've a little problem with the function write_sequence. In fact I'd like to add sequence to pre-existing files in a loop and I was convinced that re-do a write_seq on a already written file would only add the interest lines... But it's not the case and it overwrites my previous files. What could be the solution to this problem ??? Thanks cheers Ferdi From laurichj at bioinfo.ucr.edu Mon May 9 12:55:18 2005 From: laurichj at bioinfo.ucr.edu (Josh Lauricha) Date: Mon May 9 12:48:07 2005 Subject: [Bioperl-l] write_sequence In-Reply-To: <7c7aa474050509094721bab58b@mail.gmail.com> References: <7c7aa474050509094721bab58b@mail.gmail.com> Message-ID: <20050509165518.GB369@bioinfo.ucr.edu> On Mon 05/09/05 18:47, Ferdinand Marl?taz wrote: > Sorry if my question is stupid !!! > > I start programming using bioperl and I've a little problem with the > function write_sequence. In fact I'd like to add sequence to > pre-existing files in a loop and I was convinced that re-do a > write_seq on a already written file would only add the interest > lines... But it's not the case and it overwrites my previous files. > What could be the solution to this problem ??? Use: my $in = new Bio::SeqIO( -filename => ">>foo" ... ) rather than. 
my $in = new Bio::SeqIO( -filename => "foo" ... ) the ">>" will tell Perl to append to the file. -- ------------------------------------------------------ | Josh Lauricha | Ford, you're turning | | laurichj@bioinfo.ucr.edu | into a penguin. Stop | | Bioinformatics, UCR | it | |----------------------------------------------------| | OpenPG: | | 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 | |----------------------------------------------------| | Geek Code: Version 3.12 | | GAT/CS$/IT$ d+ s-: a-->--- C++++$ UL++++$ P++ L++++| | $E--- W+ N o? K? w--(---) O? M+(++) V? PS++ PE-(--)| | Y+ PGP+++ t--- 5+++ X+ R tv DI++ D--- G++ | | e++ h- r++ z? | |----------------------------------------------------| From palmeida at igc.gulbenkian.pt Mon May 9 13:11:59 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Mon May 9 13:04:53 2005 Subject: [Bioperl-l] write_sequence] Message-ID: <20050509171159.GC2705@bioinf.igc.gulbenkian.pt> If you use >> instead of > , when indicating the output file, write_seq appends the output to the file. Something like this, if you are specifying the output file when creating the SeqIO object: $out = Bio::SeqIO->new(-file => ">>outputfilename"); -Paulo On Mon, May 09, 2005 at 06:47:21PM +0200, Ferdinand Marl?taz wrote: > Sorry if my question is stupid !!! > > I start programming using bioperl and I've a little problem with the > function write_sequence. In fact I'd like to add sequence to > pre-existing files in a loop and I was convinced that re-do a > write_seq on a already written file would only add the interest > lines... But it's not the case and it overwrites my previous files. > What could be the solution to this problem ??? 
> > Thanks > > cheers > > Ferdi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt From jason.stajich at duke.edu Mon May 9 14:27:40 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon May 9 14:22:51 2005 Subject: [Bioperl-l] Problems with Bio/Graphics/Feature.pm In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950121BB77@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950121BB77@iahce2knas1.iah.bbsrc.reserved> Message-ID: you need to pass in a SeqFeature::Generic or Graphics::Feature obj instead of Sequence object. I updated the code in CVS: http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ examples/biographics/render_sequence.pl?rev=1.2&cvsroot=bioperl [fyi Bio::DB Bio::Graphics developers] There was something else weird about how this was working - somehow Bio::Location objects are getting passed to the description and label functions. I don't quite understand, might be my local code playing too. -jason On May 9, 2005, at 6:29 AM, michael watson ((IAH-C)) wrote: > Hi > > I'm hacking around with the render_sequence.pl example script and keep > getting errors: > > Can't locate object method "seq_id" via package "Bio::Seq::RichSeq" at > /usr/local/bioperl-1.5.0/Bio/Graphics/Feature.pm line 269, line > 191. > > I also get a similar message about not being able to locate object > method "start", which is called on the next line of > Bio::Graphics::Feature.pm > > I vaguely recall asking about this previously - was a solution ever > presented? 
> > Many thanks in advance > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From michael.watson at bbsrc.ac.uk Mon May 9 15:11:20 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon May 9 15:04:23 2005 Subject: [Bioperl-l] Problems with Bio/Graphics/Feature.pm Message-ID: <8975119BCD0AC5419D61A9CF1A923E950121BB7B@iahce2knas1.iah.bbsrc.reserved> Hi I didn't deliberately pass a RichSeq object - a call in the render_sequence.pl script did i.e. render_sequence.pl doesn't work "out of the box". I think it's this piece of code that breaks it: $panel->add_track(arrow => $seq, -bump => 0, -double=>1, -tick => 2); Mick :-) -----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Mon 09/05/2005 7:27 PM To: michael watson (IAH-C) Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Problems with Bio/Graphics/Feature.pm you need to pass in a SeqFeature::Generic or Graphics::Feature obj instead of Sequence object. I updated the code in CVS: http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ examples/biographics/render_sequence.pl?rev=1.2&cvsroot=bioperl [fyi Bio::DB Bio::Graphics developers] There was something else weird about how this was working - somehow Bio::Location objects are getting passed to the description and label functions. I don't quite understand, might be my local code playing too. -jason On May 9, 2005, at 6:29 AM, michael watson ((IAH-C)) wrote: > Hi > > I'm hacking around with the render_sequence.pl example script and keep > getting errors: > > Can't locate object method "seq_id" via package "Bio::Seq::RichSeq" at > /usr/local/bioperl-1.5.0/Bio/Graphics/Feature.pm line 269, line > 191. 
> > I also get a similar message about not being able to locate object > method "start", which is called on the next line of > Bio::Graphics::Feature.pm > > I vaguely recall asking about this previously - was a solution ever > presented? > > Many thanks in advance > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From sam.kalat at gmail.com Mon May 9 15:40:02 2005 From: sam.kalat at gmail.com (Sam Kalat) Date: Mon May 9 15:33:01 2005 Subject: [Bioperl-l] StandAloneBlast or bl2seq quietly converts Ns to Ts Message-ID: <9d86df2b05050912405b0f67d3@mail.gmail.com> I'm not sure if this is a quirk of bl2seq or in bioperl. My task is to compare sequences that came from the same trace file, but were processed differently: with different basecallers, trimmers, screens, and the like. I take two sequences at a time that come from the same source, and BLAST them against each other using StandAloneBlast with bl2seq. I noticed in testing that I could take a sequence and BLAST it against itself, and frequently such a comparison isn't perfect - the fraction of identical bases might be somewhere in the 90's. On examination I see stuff like this (fake data shown): Query 1: ctgactgannnnnnnctgatcgatcgtacgtacg Sbjct 1: ctgactgatttttttctgatcgatcgtacgtacg The target was supposed to be the same as the subject, but anything that was an N becomes a T in the subject, but not the query, so they don't match up perfectly. I don't know why T was chosen, but it is always T. Anyone know if this is intentional behavior? Ultimately it means that all Ns in sequences treated this way are mismatches. It seems weird to me because the sequence in question didn't have a string of Ts, and now anything that does have a string of Ts will be more likely to match. Code available on request, but it doesn't try to do anything out of the ordinary, and it runs w/o errors. 
Thanks in advance Sam Kalat From jason.stajich at duke.edu Mon May 9 15:53:09 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon May 9 15:46:04 2005 Subject: [Bioperl-l] StandAloneBlast or bl2seq quietly converts Ns to Ts In-Reply-To: <9d86df2b05050912405b0f67d3@mail.gmail.com> References: <9d86df2b05050912405b0f67d3@mail.gmail.com> Message-ID: <75D4AA48-FE04-4687-AE29-E7BBADCDBAAE@duke.edu> turn low complexity filtering off. The -F F cmd-line option. -jason On May 9, 2005, at 3:40 PM, Sam Kalat wrote: > I'm not sure if this is a quirk of bl2seq or in bioperl. My task is > to compare sequences that came from the same trace file, but were > processed differently: with different basecallers, trimmers, screens, > and the like. I take two sequences at a time that come from the same > source, and BLAST them against each other using StandAloneBlast with > bl2seq. I noticed in testing that I could take a sequence and BLAST > it against itself, and frequently such a comparison isn't perfect - > the fraction of identical bases might be somewhere in the 90's. > > On examination I see stuff like this (fake data shown): > > Query 1: ctgactgannnnnnnctgatcgatcgtacgtacg > Sbjct 1: ctgactgatttttttctgatcgatcgtacgtacg > > The target was supposed to be the same as the subject, but anything > that was an N becomes a T in the subject, but not the query, so they > don't match up perfectly. I don't know why T was chosen, but it is > always T. > > Anyone know if this is intentional behavior? Ultimately it means that > all Ns in sequences treated this way are mismatches. It seems weird > to me because the sequence in question didn't have a string of Ts, and > now anything that does have a string of Ts will be more likely to > match. > > Code available on request, but it doesn't try to do anything out of > the ordinary, and it runs w/o errors. 
> > Thanks in advance > Sam Kalat > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Mon May 9 15:54:36 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon May 9 15:47:30 2005 Subject: [Bioperl-l] Problems with Bio/Graphics/Feature.pm In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950121BB7B@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950121BB7B@iahce2knas1.iah.bbsrc.reserved> Message-ID: <0DCE537D-CFDA-43B6-A8D8-687BAC352794@duke.edu> yes I know - that is why I fixed that script in CVS just now. 'you' was meant generally - fault of the script and API drifting apart so not your (Mick) fault at all. -j On May 9, 2005, at 3:11 PM, michael watson ((IAH-C)) wrote: > Hi > > I didn't deliberately pass a RichSeq object - a call in the > render_sequence.pl script did i.e. render_sequence.pl doesn't work > "out of the box". > > I think it's this piece of code that breaks it: > > $panel->add_track(arrow => $seq, > -bump => 0, > -double=>1, > -tick => 2); > > Mick > :-) > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Mon 09/05/2005 7:27 PM > To: michael watson (IAH-C) > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] Problems with Bio/Graphics/Feature.pm > > you need to pass in a SeqFeature::Generic or Graphics::Feature obj > instead of Sequence object. > > I updated the code in CVS: > > http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > examples/biographics/render_sequence.pl?rev=1.2&cvsroot=bioperl > > [fyi Bio::DB Bio::Graphics developers] > There was something else weird about how this was working - somehow > Bio::Location objects are getting passed to the description and label > functions. I don't quite understand, might be my local code playing > too. 
> > -jason > On May 9, 2005, at 6:29 AM, michael watson ((IAH-C)) wrote: > > >> Hi >> >> I'm hacking around with the render_sequence.pl example script and >> keep >> getting errors: >> >> Can't locate object method "seq_id" via package >> "Bio::Seq::RichSeq" at >> /usr/local/bioperl-1.5.0/Bio/Graphics/Feature.pm line 269, >> line >> 191. >> >> I also get a similar message about not being able to locate object >> method "start", which is called on the next line of >> Bio::Graphics::Feature.pm >> >> I vaguely recall asking about this previously - was a solution ever >> presented? >> >> Many thanks in advance >> >> Mick >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > > From roy at colibase.bham.ac.uk Wed May 11 06:21:20 2005 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Wed May 11 06:18:12 2005 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error Message-ID: <4281DCA0.3070201@colibase.bham.ac.uk> Hi all. I'm attempting to install bioperl-ext as I need to read in sequences from abi files. However the Bio::SeqIO::staden::read module fails the tests (the output from "perl Makefile.PL; make; make test" in the bioperl-ext/Bio/SeqIO/staden directory is pasted below). If I ignore the failed tests and make install anyway, I get the following error with a simple abi2fasta script: # abi2fasta a_A01_001.ab1 Can't locate object method "staden_read_trace" via package "Bio::SeqIO::abi" at /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/Bio/SeqIO/staden/read.pm line 115. One or more DATA sections were not processed by Inline. I know that this subject comes up regularly on the mailing list, but I have searched the archives and tried all the suggestions, and I think my problem is something that has not been raised before. 
I am using a standard CPAN bioperl-1.4 install (with Bio::Tools::dpAlign replaced with the cvs version, and the eval requiring Bio::SeqIO::staden::read in Bio::SeqIO commented out) and the cvs version of bioperl-ext (with read.pm modified to include -lz in LIBS). The Staden io_lib is installed in /usr/local/include with the os.h and config.h files copied into /usr/local/include/io_lib, and the os.h line changed to "config.h". The io_lib is detected correctly during the perl Makefile.PL process. Does anyone have any other suggestions? Thanks. Roy. # perl Makefile.PL Found Staden io_lib "libread" in /usr/local/lib ... Automatically using the Read.h found in /usr/local/include/io_lib ... Writing Makefile for Bio::SeqIO::staden::read # make cp read.pm blib/lib/Bio/SeqIO/staden/read.pm /usr/bin/perl -Mblib -MInline=NOISY,_INSTALL_ -MBio::SeqIO::staden::read -e1 0.01 blib/arch Starting Build Prepocess Stage Finished Build Prepocess Stage Starting Build Parse Stage Finished Build Parse Stage Starting Build Glue 1 Stage Finished Build Glue 1 Stage Starting Build Glue 2 Stage Finished Build Glue 2 Stage Starting Build Glue 3 Stage Finished Build Glue 3 Stage Starting Build Compile Stage Starting "perl Makefile.PL" Stage Writing Makefile for Bio::SeqIO::staden::read Finished "perl Makefile.PL" Stage Starting "make" Stage make[1]: Entering directory `/home/roy/bioperl-ext/Bio/SeqIO/staden/_Inline/build/Bio/SeqIO/staden/read' /usr/bin/perl /usr/lib/perl5/5.8.3/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8.3/ExtUtils/typemap read.xs > read.xsc && mv read.xsc read.c gcc -c -I/home/roy/bioperl-ext/Bio/SeqIO/staden -I/usr/local/include/io_lib -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -march=i386 -mcpu=i686 -DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" -fPIC "-I/usr/lib/perl5/5.8.3/i386-linux-thread-multi/CORE" read.c In file included from 
/usr/local/include/io_lib/os.h:4, from /usr/local/include/io_lib/Read.h:43, from read.xs:5: /usr/local/include/io_lib/config.h:45:1: warning: "VERSION" redefined :10:1: warning: this is the location of the previous definition read.xs: In function `staden_write_trace': read.xs:32: warning: assignment from incompatible pointer type Running Mkbootstrap for Bio::SeqIO::staden::read () chmod 644 read.bs rm -f blib/arch/auto/Bio/SeqIO/staden/read/read.so gcc -shared -L/usr/local/lib read.o -o blib/arch/auto/Bio/SeqIO/staden/read/read.so -L/usr/local/lib -lread chmod 755 blib/arch/auto/Bio/SeqIO/staden/read/read.so cp read.bs blib/arch/auto/Bio/SeqIO/staden/read/read.bs chmod 644 blib/arch/auto/Bio/SeqIO/staden/read/read.bs make[1]: Leaving directory `/home/roy/bioperl-ext/Bio/SeqIO/staden/_Inline/build/Bio/SeqIO/staden/read' Finished "make" Stage Starting "make install" Stage make[1]: Entering directory `/home/roy/bioperl-ext/Bio/SeqIO/staden/_Inline/build/Bio/SeqIO/staden/read' Installing /home/roy/bioperl-ext/Bio/SeqIO/staden/blib/arch/auto/Bio/SeqIO/staden/read/read.bs Installing /home/roy/bioperl-ext/Bio/SeqIO/staden/blib/arch/auto/Bio/SeqIO/staden/read/read.so Files found in blib/arch: installing files in blib/lib into architecture dependent library tree Writing /home/roy/bioperl-ext/Bio/SeqIO/staden/blib/arch/auto/Bio/SeqIO/staden/read/.packlist make[1]: Leaving directory `/home/roy/bioperl-ext/Bio/SeqIO/staden/_Inline/build/Bio/SeqIO/staden/read' Finished "make install" Stage Starting Cleaning Up Stage Finished Cleaning Up Stage Finished Build Compile Stage Manifying blib/man3/Bio::SeqIO::staden::read.3pm # make test PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" test.pl test....The extension 'Bio::SeqIO::staden::read' is not properly installed in path: '../../..' If this is a CPAN/distributed module, you may need to reinstall it on your system. 
To allow Inline to compile the module in a temporary cache, simply remove the Inline config option 'VERSION=' from the Bio::SeqIO::staden::read module. at test.pl line 0 INIT failed--call queue aborted, line 1. test....dubious Test returned status 255 (wstat 65280, 0xff00) Scalar found where operator expected at (eval 155) line 1, near "'int' $__val" (Missing operator before $__val?) DIED. FAILED tests 1-94 Failed 94/94 tests, 0.00% okay Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------------- test.pl 255 65280 94 188 200.00% 1-94 Failed 1/1 test scripts, 0.00% okay. 94/94 subtests failed, 0.00% okay. make: *** [test_dynamic] Error 2 -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, UK http://colibase.bham.ac.uk From amackey at pcbi.upenn.edu Wed May 11 08:23:29 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed May 11 08:25:06 2005 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <4281DCA0.3070201@colibase.bham.ac.uk> References: <4281DCA0.3070201@colibase.bham.ac.uk> Message-ID: <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu> Congratulations and thanks for following all of the advice already given on the list. It looks like you hit all the major bases. Just to confirm, you copied os.h and config.h into the installation directory *after* you ran configure with io-lib, yes? > read.xs: In function `staden_write_trace': > read.xs:32: warning: assignment from incompatible pointer type > Here's the source of all subsequent errors: read.xs could not be compiled. Now you get to read line 32 (which may not actually be the literal line 32, because of various #line preprocessor directives) and see what pointer type is being referenced. Usually this is indicative of something being wrong with the .h files that declare the types. Did io-lib itself build and install cleanly on your platform? 
Good luck, -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From michael.watson at bbsrc.ac.uk Wed May 11 10:54:59 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed May 11 10:47:47 2005 Subject: [Bioperl-l] RE: Passing extra arguments to method references inBio::Graphics::Panel::add_track Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D443@iahce2knas1.iah.bbsrc.reserved> OK, thank you very much for the information which was absolutely correct but wasn't useful for what I want to do. So I'm drawing images of genes. I don't want "bumped" images, and what this means is that the labels begin to overwrite one another and it looks awful. So what I want to do is ONLY draw a label and a description if the glyph is the FIRST glyph in a particular track. Maybe I'm being stupid, but I can't figure out how to do it - I can't see how I can make each new glyph figure out if a glyph has been drawn before it on the same track. On a different note, I want an overall title for each track (not each glyph in a track, a title for the entire track) - and I don't want to have a key. Is that possible? Many thanks Mick -----Original Message----- From: Crabtree, Jonathan [mailto:crabtree@tigr.org] Sent: 09 May 2005 16:54 To: michael watson (IAH-C) Cc: Bioperl Subject: RE: [Bioperl-l] RE: Passing extra arguments to method references inBio::Graphics::Panel::add_track Michael- What you're trying to do is known as "currying" in functional programming parlance. Perl (5, at least) doesn't support currying directly, but you can implement it yourself using a simple closure. 
For example, instead of this: > -label => \&gene_description($start) you could use something like the following (assuming that &gene_description expects the $start argument, followed by the usual arguments to a Bioperl track callback subroutine): -label => sub { my @args = @_; return &gene_description($start, @args); } Or if you need to repeat this trick a number of times, with different $start values, you could create yourself a subroutine-generating subroutine, like so (in this example I've made the outer subroutine anonymous, but you don't have to): my $funMaker = sub { my $start = shift; return sub { my @args = @_; return &gene_description($start, @args); } }; and then in the add_track method you'd say: -label => &$funMaker($my_start_value), If you google some combination of "currying", "perl", and "closure" you'll find a bunch of pages that discuss these and similar techniques. Hope this helps, Jonathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > michael watson (IAH-C) > Sent: Monday, May 09, 2005 11:31 AM > To: Bioperl > Subject: [Bioperl-l] RE: Passing extra arguments to method > references inBio::Graphics::Panel::add_track > > > Hi > > Sorry, a bit hasty on the trigger.... > > I am on bioperl-1.5. > > I'm using the following code to create some rather tasty images: > > $panel->add_track(transcript2 => \@includeCDS, > -bgcolor => 'blue', > -fgcolor => 'black', > -key => 'CDS', > -bump => 0, > -height => 10, > -label => \&gene_description, > -description=> \&gene_label, > ); > > This is fairly standard, and @includeCDS is a bunch of > feature objects. > > What I want to do is pass extra arguments to > &gene_description, and then within &gene_description check to > see if the feature start is greater than a certain value (the > extra argument). If it is, then I want to return an empty > string, if it isn't I want to return the gene description. 
> Something like: > > -label => \&gene_description($start) > > But when I tried that it didn't work ;-(. > > So is it possible to pass extra arguments to those functions > I am referencing above? > > Many thanks > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From pmiguel at purdue.edu Wed May 11 11:58:35 2005 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Wed May 11 11:52:54 2005 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <4281DCA0.3070201@colibase.bham.ac.uk> References: <4281DCA0.3070201@colibase.bham.ac.uk> Message-ID: <42822BAB.5090407@purdue.edu> Roy Chaudhuri wrote: >Hi all. > >I'm attempting to install bioperl-ext as I need to read in sequences >from abi files. However the Bio::SeqIO::staden::read module fails the >tests (the output from "perl Makefile.PL; make; make test" in the >bioperl-ext/Bio/SeqIO/staden directory is pasted below). > >[...] > Clark Tibbetts' paper: http://www-2.cs.cmu.edu/afs/cs/project/genome/WWW/Papers/clark.html tells how to parse an ABIF file. So you can do it all in perl. The script below, for instance, prints out the sequence embedded in an ABIF file. That is the unedited sequence. I'm not sure how common editing of an .ab1 file is anymore. But you could get the edited sequence with: quikABIFdata( $file, "PBAS", 2, "A*" ) Actually, if the ABI basecaller used is the relatively recent "KB basecaller", then quality values are stored in the PCON record of the trace file. But you would need a different unpacking method to get them: quikABIFdata( $file, "PCON", 1, "C*" ) (Also the quality values are returned as separate elements in the array, but the sequence is returned as a string containing the whole sequence in the first element of the array. So you would either need to join "PCON" or split "PBAS" to get them into equivalent formats.)
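To make the difference between the two unpack templates concrete, here is a minimal sketch. The two byte strings are made-up stand-ins for what the PBAS and PCON records might hold (an assumption for illustration, not data from a real trace file); only the built-in pack, unpack and split are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-ins for record contents (assumptions, not real trace data):
my $pbas_blob = "ACGT";                        # base calls, one ASCII character per base
my $pcon_blob = pack( "C*", 20, 30, 40, 50 );  # quality values, one byte per base

# "A*" returns the whole blob as a single string in the first element.
my ($sequence) = unpack( "A*", $pbas_blob );

# "C*" returns one unsigned integer per byte, i.e. a list of quality values.
my @quals = unpack( "C*", $pcon_blob );

# Bring both into the same shape by splitting the sequence into single bases,
# so that each base can be paired with its quality value.
my @bases = split //, $sequence;

for my $i ( 0 .. $#bases ) {
    print "$bases[$i]\t$quals[$i]\n";
}
```

Going the other way, join( " ", @quals ) would give the quality values as a single string instead.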
I would like to write this as a method for bioperl, but I'm more of a bench scientist than a programmer. So I haven't learned how to write an object oriented program yet. Basically, I can use bioperl modules, but I can't write them. Anyway, my point is that it only takes about 40 lines of perl to read and find a record in an ABI trace file. The actual chromatograms for each base are also accessible using this technique. They are present in the DATA1-4 records (raw data, stored as signed shorts --need to unpack with "s", but might be different if you are not using a "big-endian" machine). The processed trace data is in DATA9-12. That is the data that phred uses to do base calling. I unpack that with "n" but my primitive null padding method below would need to be modified to get the 2 digit tag numbered records. #!/usr/bin/perl use strict; use warnings; my $file = $ARGV[0]; my ($sequence) = quikABIFdata( $file, "PBAS", 1, "A*" ); print $sequence,"\n"; sub quikABIFdata { =pod $_[] parameter default 0 filename 1 tag TUBE 2 tagnum 0001 3 unpack_method none If unpacked, then a list is returned. If no unpacking is done the whole blob is returned as a string. The unpack methods are described in perldoc -f pack. 
=cut my $tag = $_[1] || "TUBE"; if ( defined $_[2] ) { my $tagnum = chr($_[2]); $tag .= "\000\000\000$tagnum"; } else { $tag = $tag."\000\000\000\001"; } my $in = slurpABIFfile( $_[0] ) || return undef; return undef unless ( my ( $refarraylen, $datptr ) = $in =~ /$tag.{8}(.{4})(.{4})/s ); $refarraylen = unpack ( "N", $refarraylen ); my $dat; if ( $refarraylen > 4 ) { $datptr = unpack ( "N", $datptr ); $dat = substr ( $in, $datptr, $refarraylen ); } else { $dat = $datptr; } return( $dat ) unless ( defined $_[3] ); my @dat = unpack ( $_[3], $dat ); } sub slurpABIFfile { return ( undef ) unless ( defined($_[0]) && -r $_[0] ); undef $/; open(INPUTFILE, "$_[0]") or die "Can't open $_[0], $!\n"; my $whole_trace_file = <INPUTFILE>; } From roy at colibase.bham.ac.uk Wed May 11 13:23:02 2005 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Wed May 11 13:16:59 2005 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu> References: <4281DCA0.3070201@colibase.bham.ac.uk> <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu> Message-ID: <42823F76.4010809@colibase.bham.ac.uk> > Just to > confirm, you copied os.h and config.h into the installation directory > *after* you ran configure with io-lib, yes? That's right, yes. > Here's the source of all subsequent errors: read.xs could not be > compiled. Now you get to read line 32 (which may not actually be the > literal line 32, because of various #line preprocessor directives) and > see what pointer type is being referenced. Usually this is indicative > of something being wrong with the .h files that declare the types. Did > io-lib itself build and install cleanly on your platform? Yeah, no problems with the io_lib install that I could see. Is there an easy non-Bioperl way of testing that it works? You'll have to excuse my ignorance (I know very little about inline C and makefiles) but I couldn't find a file called read.xs- is it deleted during the make process?
I know Phillip has suggested a workaround, but it would still be nice to sort out what the problem is. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, UK http://colibase.bham.ac.uk From crabtree at tigr.org Wed May 11 14:04:07 2005 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Wed May 11 13:56:51 2005 Subject: [Bioperl-l] RE: Passing extra arguments to method references inBio::Graphics::Panel::add_track Message-ID: Hi Mick- >So I'm drawing images of genes. I don't want "bumped" images, and what >this means is that the labels begin to overwrite one another and it >looks awful. So what I want to do is ONLY draw a label and a >description if the glyph is the FIRST glyph in a particular track. >Maybe I'm being stupid, but I can't figure out how to do it - I can't >see how I can make each new glyph figure out if a glyph has been drawn >before it on the same track. I can think of a few ways to do something like this, and I've included some sample code (see below) that illustrates two of them. You may also be able to achieve a similar effect by enclosing your gene glyphs in some kind of "invisible" parent feature that prints a label and description but nothing else. >On a different note, I want an overall title for each track (not each >glyph in a track, a title for the entire track) - and I don't want to >have a key. Is that possible? Have you tried setting -key_style=>'between' in the call to Panel->new()? This will place each track title next to the relevant track, instead of at the bottom of the image. 
Jonathan #!/usr/bin/perl use Bio::Graphics::Panel; use Bio::SeqFeature::Generic; my $panel = Bio::Graphics::Panel->new(-length=> 1000, -width=> 600, -key_style=> 'between'); # 3 features my $f1 = Bio::SeqFeature::Generic->new(-start=>200, -end=>300, -primary_tag=>'misc', -label=>'l1', -display_name=>'d1'); my $f2 = Bio::SeqFeature::Generic->new(-start=>400, -end=>600, -primary_tag=>'misc', -label=>'l2', -display_name=>'d2'); my $f3 = Bio::SeqFeature::Generic->new(-start=>50, -end=>150, -primary_tag=>'misc', -label=>'l3', -display_name=>'d3'); # APPROACH #1: decide in advance which of the features will be lucky # enough to get a label & description; we'll call it $specialFeat my $specialFeat = $f2; my $descrFn1 = sub { my $feat = shift; # returning ' ' instead of undef or '' to maintain vertical spacing return ' ' unless ($feat eq $specialFeat); return $feat->primary_tag(); }; my $labelFn1 = sub { my $feat = shift; # returning ' ' instead of undef or '' to maintain vertical spacing return ' ' unless ($feat eq $specialFeat); return 1; # use default label }; my $track1 = $panel->add_track([$f1,$f2,$f3], -glyph => 'generic', -label => $labelFn1, -description => $descrFn1, -fontcolor => 'red', -font2color => 'blue', -bgcolor => 'blue', -key => 'track1', ); # APPROACH #2: Write functions that will return a label/description # only for the "first" feature drawn in a given track (i.e., the # first time they are called.) # Note that this approach is more "dangerous" because it relies on the # fact that Bioperl doesn't make any superfluous calls to $descrFn2 or # $labelFn2. Note also that the labels appear on $f3, not $f1 (at # least on my machine), because Bioperl does not necessarily draw the # features in the order that they are presented to the add_track # method [$f1,$f2,$f3]. 
my $descrCallCount = 0; my $descrFn2 = sub { my $feat = shift; ++$descrCallCount; # returning ' ' instead of undef or '' to maintain vertical spacing return ' ' unless ($descrCallCount == 1); return $feat->primary_tag(); }; my $labelCallCount = 0; my $labelFn2 = sub { my $feat = shift; ++$labelCallCount; # returning ' ' instead of undef or '' to maintain vertical spacing return ' ' unless ($labelCallCount == 1); return 1; # use default label }; my $track2 = $panel->add_track([$f1,$f2,$f3], -glyph => 'generic', -label => $labelFn2, -description => $descrFn2, -fontcolor => 'red', -font2color => 'blue', -bgcolor => 'blue', -key => 'track2', ); # note that if you want to call png() again (or any other method that results # in $labelFn2 or $descrFn2 being called) then you'll first want to reset # $descrCallCount and $labelCallCount to their original values print $panel->png(); From Matthew.Betts at bccs.uib.no Wed May 11 07:53:49 2005 From: Matthew.Betts at bccs.uib.no (Matthew Betts) Date: Wed May 11 21:31:49 2005 Subject: [Bioperl-l] Bio::SearchIO::amps and Bio::SearchIO::mrbayes_nexus Message-ID: Hi, I was thinking of writing a Bio::SearchIO module for AMPS block format. This is the format used by alscript and stamp. Both of these come with format converters, but would be useful for me to do it within bioperl. OK for me to write Bio::SearchIO::amps, or is there something else already? Is that name OK? Secondly, MrBayes doesn't like some things in the Nexus format output by Bio::SearchIO::nexus (the 'symbols' parameter, and it expects 'end;' rather than 'endblock;') even though they're valid nexus... OK to copy Bio::SearchIO::nexus to Bio::SearchIO::mrbayes_nexus and make the necessary changes, or is there a better way? (Though a full nexus parser and flexible outputter looks like a nightmare...) 
Thanks, Matthew From Matthew.Betts at bccs.uib.no Wed May 11 08:31:44 2005 From: Matthew.Betts at bccs.uib.no (Matthew Betts) Date: Wed May 11 21:31:53 2005 Subject: [Bioperl-l] Re: Bio::SearchIO::amps and Bio::SearchIO::mrbayes_nexus In-Reply-To: References: Message-ID: Sorry, meant Bio::AlignIO::* (oops) On Wed, 11 May 2005, Matthew Betts wrote: > > Hi, > > I was thinking of writing a Bio::SearchIO module for AMPS block format. This > is the format used by alscript and stamp. Both of these come with format > converters, but would be useful for me to do it within bioperl. OK for me to > write Bio::SearchIO::amps, or is there something else already? Is that name > OK? > > Secondly, MrBayes doesn't like some things in the Nexus format output by > Bio::SearchIO::nexus (the 'symbols' parameter, and it expects 'end;' rather > than 'endblock;') even though they're valid nexus... OK to copy > Bio::SearchIO::nexus to Bio::SearchIO::mrbayes_nexus and make the necessary > changes, or is there a better way? (Though a full nexus parser and flexible > outputter looks like a nightmare...) > > Thanks, > > Matthew > > -- Matthew Betts, Post Doc, Computational Biology Unit, BCCS, HiB, UiB, Thormøhlensgt. 55, 5008 Bergen, Norway tlf: (+47) 55 58 40 22, fax: (+47) 55 58 42 95 mailto:matthew.betts@bccs.uib.no, www.ii.uib.no/~matthewb From Matthew.Betts at bccs.uib.no Wed May 11 08:41:54 2005 From: Matthew.Betts at bccs.uib.no (Matthew Betts) Date: Wed May 11 21:31:55 2005 Subject: [Bioperl-l] web alignment format conversion? Message-ID: Hi, Is there a web interface to any of the common bioperl tools? Particularly alignment format conversion using Bio::AlignIO. None of the existing web tools that I find to do this are as comprehensive as Bio::AlignIO.
I've been working with a few people in the molecular biology side of our building, and often it is little things like incompatible formats that stop them being able to use useful programs without having someone with command-line skills do it for them. Thinking of trying to get something like that set up here. Thanks, Matthew -- Matthew Betts, Post Doc, Computational Biology Unit, BCCS, HiB, UiB, Thormøhlensgt. 55, 5008 Bergen, Norway tlf: (+47) 55 58 40 22, fax: (+47) 55 58 42 95 mailto:matthew.betts@bccs.uib.no, www.ii.uib.no/~matthewb From aplykimo at yahoo.com.tw Wed May 11 11:23:47 2005 From: aplykimo at yahoo.com.tw (aplykimo) Date: Wed May 11 21:31:56 2005 Subject: [Bioperl-l] [help] bl2seq using blastp Message-ID: <20050511152347.47126.qmail@web16105.mail.tpe.yahoo.com> Hi, I use bl2seq to align two sequences. In file "i.fa" the first sequence is >1 TSTSCTTTATCAGGATCACCAGGCCCCATCNGGATTCYCMAGAACCCCCAGCCCCAGGGACAG . In file "j.fa" the second sequence is >2 MTGGARGACAGATGAGGACCACACCCGCAACCCCCAAGCCAGGACCAGCATCAGTGTCC. I use the command "bl2seq -i i.fa -j j.fa -p blastp -o bl2seq.out -G 10 -E 1 -W 2 -g F -M BLOSUM62". However, it outputs "[NULL_Caption] WARNING: [000.000] SetUpBlastSearch failed." I have tried many different parameters, but it doesn't work at all. Does anyone know how to solve this problem? Thanks From n.haigh at sheffield.ac.uk Thu May 12 06:10:23 2005 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu May 12 06:04:02 2005 Subject: [Bioperl-l] AlignIO::* match_char, gap_char and missing_char etc Message-ID: I've noticed some inconsistency in the way sequence alignments are read and stored and printed when match_char, gap_char and missing_char are used. Should sequences be stored exactly the way they are represented in the file?
Should there be default values for formats that support one or more of match_char, gap_char and missing_char or should these only be set if they are used in the alignment file? Should formats that don't support match_char check for and do an unmatch during a write_aln? Should formats that use specific characters for match_char, gap_char and missing_char check and do map_char if required during a write_aln? I was going to have a look through Align::* and try to make them more consistent with regards to these. What I propose to do is: 1) Have default values for match_char, gap_char and missing_char for those formats that only support a particular character 2) Have match_char, gap_char and missing_char set when the appropriate command is found for setting these characters 3) Store the sequences exactly as they are in the alignment file (except maybe for match_char) 4) During write_aln check are conducted to ensure the sequences are compliant with the features (match_char, gap_char and missing_char ) supported by that format and do map_char, unmatch/match as required. I suppose the only thing is whether Unmatch should be called during read_aln in order to store sequences with the correct residue characters instead of the match_char. The reason being that many formats don't support this and the user can always call "match" on the SimpleAlign object, thus bringing some level of consistency to the use of this feature. This will be my first foray into making bigger changes in Bioperl as a developer! Yikes! So I'd like to know what people think as well as their experiences with similar problems. I'm most familiar with nexus, clustal, phylip and fasta so it would be nice to hear about comments/problems with some of the other formats! 
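As a rough illustration of what an "unmatch" step does, here is a minimal plain-Perl sketch. The helper expand_match_char is hypothetical and is not the Bio::SimpleAlign implementation; it assumes the first sequence of the alignment is the reference, that both strings are aligned to the same length, and that '.' is the default match character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SimpleAlign: replace every match
# character in $seq with the residue found in the same column of $ref.
sub expand_match_char {
    my ( $ref, $seq, $match_char ) = @_;
    $match_char = '.' unless defined $match_char;
    die "sequences differ in length\n" unless length($ref) == length($seq);

    my @ref = split //, $ref;
    my @seq = split //, $seq;
    for my $i ( 0 .. $#seq ) {
        $seq[$i] = $ref[$i] if $seq[$i] eq $match_char;
    }
    return join '', @seq;
}

my $ref = "ACGT-ACGT";
my $seq = "..C.-..T.";
print expand_match_char( $ref, $seq ), "\n";
```

Doing this once at read time would mean every writer sees real residue characters, which is the consistency argument made above; the cost is that the original match-character representation is no longer stored.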
Cheers Nath ---------------------------------- Nathan Haigh PostDoctoral Research Associate Department of Animal and Plant Sciences University of Sheffield Western Bank Sheffield S10 2TN Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002 From amackey at pcbi.upenn.edu Thu May 12 06:57:49 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu May 12 06:55:10 2005 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <42823F76.4010809@colibase.bham.ac.uk> References: <4281DCA0.3070201@colibase.bham.ac.uk> <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu> <42823F76.4010809@colibase.bham.ac.uk> Message-ID: <428336AD.6020609@pcbi.upenn.edu> Roy Chaudhuri wrote: > You'll have to excuse my ignorance (I know very little about inline C > and makefiles) but I couldn't find a file called read.xs- is it deleted > during the make process Sorry; you should find this file in the temporary subdirectory created by Inline (do a find ./ -name read.xs in the build directory). -Aaron From roy at colibase.bham.ac.uk Thu May 12 07:12:01 2005 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Thu May 12 07:06:11 2005 Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error In-Reply-To: <428336AD.6020609@pcbi.upenn.edu> References: <4281DCA0.3070201@colibase.bham.ac.uk> <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu> <42823F76.4010809@colibase.bham.ac.uk> <428336AD.6020609@pcbi.upenn.edu> Message-ID: <42833A01.8070804@colibase.bham.ac.uk> > Sorry; you should find this file in the temporary subdirectory created > by Inline (do a find ./ -name read.xs in the build directory). Tried that, there's no sign of it. 
After I run make, the _Inline directory only contains the file _Inline/config, and an empty series of directories: _Inline/build/Bio/SeqIO/staden From gdw1 at cornell.edu Thu May 12 09:47:08 2005 From: gdw1 at cornell.edu (Gregory Drake Wilson) Date: Thu May 12 09:40:05 2005 Subject: [Bioperl-l] Primer 3: MSG: Can't open RESULTS Message-ID: <36866.128.253.41.148.1115905628.squirrel@128.253.41.148> I am resurrecting a question posted last June (archived here http://bioperl.org/pipermail/bioperl-l/2004-June/016255.html) regarding bioperl's usage of Primer3 because the issue continues to plague me. All of my implementations die reliably after a fixed number of attempts at generating primers with this error: ------------- EXCEPTION ------------- MSG: Can't open RESULTS STACK Bio::Tools::Run::Primer3::run /usr/lib/perl5/vendor_perl/5.8.2/Bio/Tools/Run/Primer3.pm:360 STACK toplevel ./primer_maker.pl:128 -------------------------------------- The offending line (360) is: open (RESULTS, "$executable < $tempfile|") || $self->throw("Can't open RESULTS"); As the old thread suggests it is probably a numer of open files issue. > lsof -c pri +Lr returns an ever increasing amount of entries like: primer_ma 28830 gdw1 425u REG 3,6 1140 0 807353 /tmp/twdZa7turz (deleted) primer_ba 28830 drakos7 426u REG 3,6 1170 0 807354 /tmp/QDJv6Mb8iV (deleted) Does anyone know a way to make sure those file descriptors are cleaned up after each primer generation? Greg From gdw1 at cornell.edu Thu May 12 11:11:35 2005 From: gdw1 at cornell.edu (Greg Wilson) Date: Thu May 12 11:04:55 2005 Subject: [Bioperl-l] Primer 3: MSG: Can't open RESULTS In-Reply-To: <36866.128.253.41.148.1115905628.squirrel@128.253.41.148> References: <36866.128.253.41.148.1115905628.squirrel@128.253.41.148> Message-ID: <42837227.8060700@cornell.edu> Ok, I think I figured this out. The following line (***) needs to be added to Bio/Tools/Run/Primer3.pm: 356 # make a temporary file and print the instructions to it. 
357 my ($temphandle, $tempfile)=$self->io->tempfile; 358 print $temphandle join "\n", @{$self->{'primer3_input'}}, "=\n"; 359 open (RESULTS, "$executable < $tempfile|") || $self->throw("Can't open RESULTS"); *** close $temphandle || $self->throw("Can't close TEMPFILE"); The temporary file never gets closed so the file descriptors just pile up until the overall script ends. This will clean them up as soon as we do not need them any more. Not sure who needs to do the actual patching of the module in CVS though. Greg Gregory Drake Wilson wrote: > I am resurrecting a question posted last June (archived here > http://bioperl.org/pipermail/bioperl-l/2004-June/016255.html) regarding > bioperl's usage of Primer3 because the issue continues to plague me. All > of my implementations die reliably after a fixed number of attempts at > generating primers with this error: > > ------------- EXCEPTION ------------- > MSG: Can't open RESULTS > STACK Bio::Tools::Run::Primer3::run > /usr/lib/perl5/vendor_perl/5.8.2/Bio/Tools/Run/Primer3.pm:360 > STACK toplevel ./primer_maker.pl:128 > > -------------------------------------- > > The offending line (360) is: > open (RESULTS, "$executable < $tempfile|") || $self->throw("Can't open > RESULTS"); > > As the old thread suggests it is probably a numer of open files issue. > > >>lsof -c pri +Lr > > returns an ever increasing amount of entries like: > primer_ma 28830 gdw1 425u REG 3,6 1140 0 807353 > /tmp/twdZa7turz (deleted) > primer_ba 28830 drakos7 426u REG 3,6 1170 0 807354 > /tmp/QDJv6Mb8iV (deleted) > > Does anyone know a way to make sure those file descriptors are cleaned up > after each primer generation? 
> > Greg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Thu May 12 11:31:20 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu May 12 11:25:02 2005 Subject: [Bioperl-l] Primer 3: MSG: Can't open RESULTS In-Reply-To: <42837227.8060700@cornell.edu> References: <36866.128.253.41.148.1115905628.squirrel@128.253.41.148> <42837227.8060700@cornell.edu> Message-ID: <403C8362-D88F-4EDE-A350-5A5A899BC429@duke.edu> What version of bioperl-run are you on - this looks to already be fixed in CVS. -jason On May 12, 2005, at 11:11 AM, Greg Wilson wrote: > Ok, I think I figured this out. The following line (***) needs to > be added to Bio/Tools/Run/Primer3.pm: > > 356 # make a temporary file and print the instructions to it. > 357 my ($temphandle, $tempfile)=$self->io->tempfile; > 358 print $temphandle join "\n", @{$self->{'primer3_input'}}, "=\n"; > 359 open (RESULTS, "$executable < $tempfile|") || $self->throw > ("Can't open RESULTS"); > *** close $temphandle || $self->throw("Can't close TEMPFILE"); > > The temporary file never gets closed so the file descriptors just > pile up until the overall script ends. This will clean them up as > soon as we do not need them any more. > > Not sure who needs to do the actual patching of the module in CVS > though. > > Greg > > > Gregory Drake Wilson wrote: > >> I am resurrecting a question posted last June (archived here >> http://bioperl.org/pipermail/bioperl-l/2004-June/016255.html) >> regarding >> bioperl's usage of Primer3 because the issue continues to plague >> me. 
All >> of my implementations die reliably after a fixed number of >> attempts at >> generating primers with this error: >> ------------- EXCEPTION ------------- >> MSG: Can't open RESULTS >> STACK Bio::Tools::Run::Primer3::run >> /usr/lib/perl5/vendor_perl/5.8.2/Bio/Tools/Run/Primer3.pm:360 >> STACK toplevel ./primer_maker.pl:128 >> -------------------------------------- >> The offending line (360) is: >> open (RESULTS, "$executable < $tempfile|") || $self->throw >> ("Can't open >> RESULTS"); >> As the old thread suggests it is probably a numer of open files >> issue. >> >>> lsof -c pri +Lr >>> >> returns an ever increasing amount of entries like: >> primer_ma 28830 gdw1 425u REG 3,6 1140 0 807353 >> /tmp/twdZa7turz (deleted) >> primer_ba 28830 drakos7 426u REG 3,6 1170 0 807354 >> /tmp/QDJv6Mb8iV (deleted) >> Does anyone know a way to make sure those file descriptors are >> cleaned up >> after each primer generation? >> Greg >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From gdw1 at cornell.edu Thu May 12 11:51:50 2005 From: gdw1 at cornell.edu (Greg Wilson) Date: Thu May 12 11:44:40 2005 Subject: [Bioperl-l] Primer 3: MSG: Can't open RESULTS In-Reply-To: <403C8362-D88F-4EDE-A350-5A5A899BC429@duke.edu> References: <36866.128.253.41.148.1115905628.squirrel@128.253.41.148> <42837227.8060700@cornell.edu> <403C8362-D88F-4EDE-A350-5A5A899BC429@duke.edu> Message-ID: <42837B96.2020705@cornell.edu> Ok, I see that CVS revision 1.4 back in Jul 2004 has the change. I have bioperl-run 1.4 installed (and bioperl 1.5). This may just be a linux distro issue that I need to sort out. Sorry for not checking CVS first. 
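For anyone hitting the same descriptor build-up in their own scripts, the general pattern can be sketched outside of BioPerl. This is a minimal example using the core File::Temp module, not the code in Bio::Tools::Run::Primer3; the helper name write_temp_input and the input lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write some input lines to a temporary file and return its name. The handle
# is closed before returning, so the descriptor is released straight away
# rather than when the whole script ends.
sub write_temp_input {
    my (@lines) = @_;
    my ( $fh, $filename ) = tempfile( UNLINK => 1 );
    print {$fh} join( "\n", @lines ), "\n";
    close($fh) or die "Can't close $filename: $!\n";
    return $filename;
}

# Calling this repeatedly should not accumulate open descriptors.
my @names = map { write_temp_input( "SEQUENCE=ACGT", "=" ) } 1 .. 50;
print scalar(@names), " temporary files written\n";
```

Note the parentheses around the argument to close: written as close $fh || die ..., the || would bind to $fh rather than to the result of close, so a failed close would go unnoticed.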
Greg Jason Stajich wrote: > What version of bioperl-run are you on - this looks to already be fixed > in CVS. > > -jason > On May 12, 2005, at 11:11 AM, Greg Wilson wrote: > >> Ok, I think I figured this out. The following line (***) needs to be >> added to Bio/Tools/Run/Primer3.pm: >> >> 356 # make a temporary file and print the instructions to it. >> 357 my ($temphandle, $tempfile)=$self->io->tempfile; >> 358 print $temphandle join "\n", @{$self->{'primer3_input'}}, "=\n"; >> 359 open (RESULTS, "$executable < $tempfile|") || $self->throw ("Can't >> open RESULTS"); >> *** close $temphandle || $self->throw("Can't close TEMPFILE"); >> >> The temporary file never gets closed so the file descriptors just >> pile up until the overall script ends. This will clean them up as >> soon as we do not need them any more. >> >> Not sure who needs to do the actual patching of the module in CVS >> though. >> >> Greg From oliver.burren at cimr.cam.ac.uk Thu May 12 07:30:37 2005 From: oliver.burren at cimr.cam.ac.uk (Oliver Burren) Date: Thu May 12 13:22:00 2005 Subject: [Bioperl-l] Odd behaviour in Bio::SeqFeature::Gene::Transcript Message-ID: <1115897437.31101.41.camel@jakarta> Dear Developers, I'm working with Bio::SeqFeature::Gene::Transcript and I am getting some odd behaviour. In a nutshell I'm getting an error depending on which order i add exons to a transcript when I try and dump introns. Best illustrated with a script which I attach (test_intron.pl). Here is the output that I get : 229 machine /home/xxx % perl test_intron.pl Building transcript on -ve strand Exon Order is utr3prime,exon3,exon2,exon1,utr5prime Strand is set to -1 SEQ intron 31 30 . - . SEQ intron 21 25 . - . SEQ intron 11 15 . - . SEQ intron 6 5 . - . 
Exon Order is utr5prime,exon1,exon2,exon3,utr3prime
Strand is set to -1

------------- EXCEPTION -------------
MSG: Intron gap begins after '10' and ends before '1'
STACK Bio::SeqFeature::Gene::Intron::location /home/xxxxx/bioperl-live/Bio/SeqFeature/Gene/Intron.pm:288
STACK Bio::SeqFeature::Generic::strand /home/xxxxx/bioperl-live/Bio/SeqFeature/Generic.pm:356
STACK Bio::Tools::GFF::_gff2_string /home/xxxxx/bioperl-live/Bio/Tools/GFF.pm:777
STACK Bio::Tools::GFF::gff_string /home/xxxxx/bioperl-live/Bio/Tools/GFF.pm:680
STACK Bio::SeqFeature::Generic::gff_string /home/xxxxx/bioperl-live/Bio/SeqFeature/Generic.pm:762
STACK toplevel test_intron.pl:56

--------------------------------------

Is this the expected behaviour or a feature, or more likely have I made a
mistake somewhere?

I can fix it by the following. However this is probably the wrong thing
to do. It can also be fixed by sorting exons before addition to
transcript.

cvs diff Transcript.pm
Index: Transcript.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Gene/Transcript.pm,v
retrieving revision 1.33
diff -r1.33 Transcript.pm
287c287
< $rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
---
> #$rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
291c291,292
< if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
---
> #if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
> if((! defined($strand)) || ($strand != -1)) {

Thanks

Olly Burren
JDRF/WT DIL
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_intron.pl
Type: application/x-perl
Size: 1506 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050512/e8785459/test_intron.bin

From jason.stajich at duke.edu Thu May 12 13:50:08 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu May 12 13:42:58 2005
Subject: [Bioperl-l] Odd behaviour in Bio::SeqFeature::Gene::Transcript
In-Reply-To: <1115897437.31101.41.camel@jakarta>
References: <1115897437.31101.41.camel@jakarta>
Message-ID: <861275B2-6065-4125-BBE8-55A1A368C707@duke.edu>

On May 12, 2005, at 7:30 AM, Oliver Burren wrote:

> Dear Developers,
>
> I'm working with Bio::SeqFeature::Gene::Transcript and I am getting some
> odd behaviour. In a nutshell I'm getting an error depending on which
> order I add exons to a transcript when I try and dump introns.
>
> Best illustrated with a script which I attach (test_intron.pl).
>
> Here is the output that I get:
>
> 229 machine /home/xxx % perl test_intron.pl
> Building transcript on -ve strand
> Exon Order is utr3prime,exon3,exon2,exon1,utr5prime
> Strand is set to -1
> SEQ intron 31 30 . - .
> SEQ intron 21 25 . - .
> SEQ intron 11 15 . - .
> SEQ intron 6 5 . - .
> Exon Order is utr5prime,exon1,exon2,exon3,utr3prime
> Strand is set to -1
>
> ------------- EXCEPTION -------------
> MSG: Intron gap begins after '10' and ends before '1'
> STACK Bio::SeqFeature::Gene::Intron::location /home/xxxxx/bioperl-live/Bio/SeqFeature/Gene/Intron.pm:288
> STACK Bio::SeqFeature::Generic::strand /home/xxxxx/bioperl-live/Bio/SeqFeature/Generic.pm:356
> STACK Bio::Tools::GFF::_gff2_string /home/xxxxx/bioperl-live/Bio/Tools/GFF.pm:777
> STACK Bio::Tools::GFF::gff_string /home/xxxxx/bioperl-live/Bio/Tools/GFF.pm:680
> STACK Bio::SeqFeature::Generic::gff_string /home/xxxxx/bioperl-live/Bio/SeqFeature/Generic.pm:762
> STACK toplevel test_intron.pl:56
>
> --------------------------------------
>
> Is this the expected behaviour or a feature, or more likely have I made a
> mistake somewhere?
>
> I can fix it by the following. However this is probably the wrong thing
> to do. It can also be fixed by sorting exons before addition to
> transcript.
>

Sorting needs to take into account strandedness - it should be handled
by the transcript object I would think. I am assuming we won't ever
have exons in the same transcript which are actually represented on
different contigs. This would then make the sorting not work - we run
into the same problem in the Bio::SeqFeatureI->spliced_seq function too.

So the sort needs to have a strand as part of the equation. I think
this takes care of it. Multiply start by the strand in the Schwartzian
transformation.
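
A minimal, self-contained sketch of the strand-aware sort described above,
for anyone who wants to try the idea outside BioPerl. It is an illustration
only, not the code in Transcript.pm: the exons are plain hashes standing in
for exon feature objects, and the coordinates and the helper name
sort_exons_by_strand are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for exon features: plain hashes rather than
# BioPerl exon objects, so the sketch runs without BioPerl installed.
my @exons = (
    { name => 'exon1', start => 21, end => 30, strand => -1 },
    { name => 'exon3', start => 1,  end => 5,  strand => -1 },
    { name => 'exon2', start => 11, end => 15, strand => -1 },
);

# Schwartzian transform: pair each exon with a sort key, sort on the key,
# then throw the key away. Multiplying start by the strand makes
# minus-strand keys negative, so the largest start sorts first there.
# A missing or zero strand is treated as +1.
sub sort_exons_by_strand {
    my @features = @_;
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { [ $_, $_->{start} * ( $_->{strand} || 1 ) ] } @features;
}

my @sorted = sort_exons_by_strand(@exons);
print join( ',', map { $_->{name} } @sorted ), "\n";
```

With these made-up coordinates the script prints exon1,exon2,exon3, that is,
5' to 3' along the minus strand. The key is only meaningful when every exon
in the list sits on the same strand.
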
$ cvs diff Bio/SeqFeature/Gene/Transcript.pm
Index: Bio/SeqFeature/Gene/Transcript.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Gene/Transcript.pm,v
retrieving revision 1.33
diff -r1.33 Transcript.pm
294c294,295
< @exons = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, $_->start()] } @exons;
---
> @exons = map { $_->[0] } sort { $a->[1] <=> $b->[1] }
> map { [ $_, $_->start * ($_->strand || 1)] } @exons;

I can check this in if it works - seems to for your example.

--jason

>
> cvs diff Transcript.pm
> Index: Transcript.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Gene/Transcript.pm,v
> retrieving revision 1.33
> diff -r1.33 Transcript.pm
> 287c287
> < $rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
> ---
>
>> #$rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
>>
> 291c291,292
> < if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
> ---
>
>> #if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
>> if((! defined($strand)) || ($strand != -1)) {
>>
>
> Thanks
>
> Olly Burren
> JDRF/WT DIL

From sjmiller at email.arizona.edu Thu May 12 17:53:54 2005
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Thu May 12 17:47:08 2005
Subject: [Bioperl-l] Changes to SearchIO/Writer/GbrowseGFF.pm
Message-ID: <4283D072.4030708@email.arizona.edu>

Hi,

I've made a few changes to SearchIO/Writer/GbrowseGFF.pm:

1. Fixed what I believe is a small bug that appears for single-HSP +/-
   hits (use of wrong variables causing undefined value in concatenation)
2. Added an option to output the CIGAR line
3. Added an option to output e-values in the score column

Below are the diffs between bioperl 1.5 GbrowseGFF.pm and my code - if
these seem reasonable perhaps they can be incorporated.

21,23c21
< -file => ">result.gff"
< -output_cigar => 1
< -output_signif => 1);
---
> -file => ">result.gff");
80,81d77
< : -output_cigar => 1 : output cigar lines
< : -output_signif => 1 : output e-value in score column
89c85
< my ($evalue, $cigar, $signif) = $self->_rearrange(["E_VALUE", "OUTPUT_CIGAR", "OUTPUT_SIGNIF"], @args);
---
> my ($evalue) = $self->_rearrange(["E_VALUE"], @args);
91,92d86
< $self->{_cigar} = $cigar;
< $self->{_signif} = $signif;
153c147
< my ($GFF, $cigar, $score);
---
> my $GFF;
161d154
<
171,175c164
< if (defined $self->{_signif}) {
< $score = $hit->significance;
< } else {
< $score = $hit->raw_score;
< }
---
> my $score = $hit->raw_score;
222,224d210
<
< #retrieve cigar line for possible output
< $cigar = $hsp->cigar_string;
231d216
<
243,245c228
< if (defined $self->{_cigar}) {
< $tags{'Gap'} = $cigar;
< }
---
>
264c247
< # The following lines should use $qmmin, $qmmax instead of $qpmax, $qpmin
---
>
267,268c250,251
< $tags{'tstart'} = $qmmin;
< $tags{'tend'} = $qmmax;
---
> $tags{'tstart'} = $qpmax;
> $tags{'tend'} = $qpmin;
271c254
< $tags{'Target'} = "EST:$seqname $qmmin $qmmax";
---
> $tags{'Target'} = "EST:$seqname $qpmax $qpmin";
274,276d256
< if (defined $self->{_cigar}) {
< $tags{'Gap'} = $cigar;
< }
300,304c280,281
< if (defined $self->{_signif}) {
< $score = $hsp->significance;
< } else {
< $score = $hsp->score;
< }
---
> my $score = $hsp->score;
>
317,319d293
< if (defined $self->{_cigar}) {
< $tags{'Gap'} = $cigar;
< }
342,346c316,317
< if (defined $self->{_signif}) {
< $score = $hsp->significance;
< } else {
< $score = $hsp->score;
< }
---
> my $score = $hsp->score;
>
359,361d329
< if (defined $self->{_cigar}) {
< $tags{'Gap'} = $cigar;
< }

--
Regards,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ 85721
(520) 626-2597

From n.haigh at sheffield.ac.uk Fri May 13 04:29:41 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri May 13 04:26:10 2005
Subject: [Bioperl-l] Bio::SearchIO::amps and Bio::SearchIO::mrbayes_nexus
In-Reply-To:
Message-ID:

I was under the impression that NEXUS format blocks should end with "end;"
(Maddison et al. 1997) and it was later that some software started using
"endblock" instead. Therefore, "end;" should be used for output, but for
backwards compatibility "endblock;" should also be handled correctly for
input - although I haven't reread Maddison et al. 1997 to confirm this - no
searchable pdf either :o(

You may also find this page useful:
http://camel5.umbi.umd.edu/camel/projects/nexus/
They have a tool "NEXPL" which contains a pretty comprehensive nexus parser
which is able to handle quite a few different nexus blocks.

Also, after I contacted Fredrik Ronquist he said that after a period of
relatively slow progress they are accelerating the development of MrBayes
this year with the addition of three new people to the development team.
I've been told that Paul van der Mark (his new postdoc) is the person most
suitable to contact with suggestions etc (let me know if you want his
e-mail address). Hopefully MrBayes will be a little more NEXUS compliant
than it currently is.

Nathan

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Matthew Betts
Sent: 11 May 2005 12:54
To: BioPerl
Subject: [Bioperl-l] Bio::SearchIO::amps and Bio::SearchIO::mrbayes_nexus

Hi,

I was thinking of writing a Bio::SearchIO module for AMPS block format.
This is the format used by alscript and stamp. Both of these come with
format converters, but it would be useful for me to do it within bioperl.
OK for me to write Bio::SearchIO::amps, or is there something else already?
Is that name OK?

Secondly, MrBayes doesn't like some things in the Nexus format output by
Bio::SearchIO::nexus (the 'symbols' parameter, and it expects 'end;' rather
than 'endblock;') even though they're valid nexus... OK to copy
Bio::SearchIO::nexus to Bio::SearchIO::mrbayes_nexus and make the necessary
changes, or is there a better way? (Though a full nexus parser and flexible
outputter looks like a nightmare...)

Thanks,

Matthew

From oliver.burren at cimr.cam.ac.uk Fri May 13 06:45:48 2005
From: oliver.burren at cimr.cam.ac.uk (Oliver Burren)
Date: Fri May 13 06:39:10 2005
Subject: [Bioperl-l] Odd behaviour in Bio::SeqFeature::Gene::Transcript
In-Reply-To: <861275B2-6065-4125-BBE8-55A1A368C707@duke.edu>
References: <1115897437.31101.41.camel@jakarta>
 <861275B2-6065-4125-BBE8-55A1A368C707@duke.edu>
Message-ID: <1115981148.11509.18.camel@jakarta>

Thanks Jason,

That seems to do the trick.

Olly

On Thu, 2005-05-12 at 13:50 -0400, Jason Stajich wrote:
> On May 12, 2005, at 7:30 AM, Oliver Burren wrote:
>
> > Dear Developers,
> >
> > I'm working with Bio::SeqFeature::Gene::Transcript and I am getting some
> > odd behaviour. In a nutshell I'm getting an error depending on which
> > order I add exons to a transcript when I try and dump introns.
> >
> > Best illustrated with a script which I attach (test_intron.pl).
> >
> > Here is the output that I get:
> >
> > 229 machine /home/xxx % perl test_intron.pl
> > Building transcript on -ve strand
> > Exon Order is utr3prime,exon3,exon2,exon1,utr5prime
> > Strand is set to -1
> > SEQ intron 31 30 . - .
> > SEQ intron 21 25 . - .
> > SEQ intron 11 15 . - .
> > SEQ intron 6 5 . - .
> > Exon Order is utr5prime,exon1,exon2,exon3,utr3prime
> > Strand is set to -1
> >
> > ------------- EXCEPTION -------------
> > MSG: Intron gap begins after '10' and ends before '1'
> > STACK Bio::SeqFeature::Gene::Intron::location /home/xxxxx/bioperl-live/Bio/SeqFeature/Gene/Intron.pm:288
> > STACK Bio::SeqFeature::Generic::strand /home/xxxxx/bioperl-live/Bio/SeqFeature/Generic.pm:356
> > STACK Bio::Tools::GFF::_gff2_string /home/xxxxx/bioperl-live/Bio/Tools/GFF.pm:777
> > STACK Bio::Tools::GFF::gff_string /home/xxxxx/bioperl-live/Bio/Tools/GFF.pm:680
> > STACK Bio::SeqFeature::Generic::gff_string /home/xxxxx/bioperl-live/Bio/SeqFeature/Generic.pm:762
> > STACK toplevel test_intron.pl:56
> >
> > --------------------------------------
> >
> > Is this the expected behaviour or a feature, or more likely have I made a
> > mistake somewhere?
> >
> > I can fix it by the following. However this is probably the wrong thing
> > to do. It can also be fixed by sorting exons before addition to
> > transcript.
> >
>
> Sorting needs to take into account strandedness - it should be
> handled by the transcript object I would think. I am assuming we
> won't ever have exons in the same transcript which are actually
> represented on different contigs. This would then make the sorting
> not work - we run into the same problem in the
> Bio::SeqFeatureI->spliced_seq function too.
>
> So the sort needs to have a strand as part of the equation. I think
> this takes care of it. Multiply start by the strand in the
> Schwartzian transformation.
>
> $ cvs diff Bio/SeqFeature/Gene/Transcript.pm
> Index: Bio/SeqFeature/Gene/Transcript.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Gene/Transcript.pm,v
> retrieving revision 1.33
> diff -r1.33 Transcript.pm
> 294c294,295
> < @exons = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, $_->start()] } @exons;
> ---
> > @exons = map { $_->[0] } sort { $a->[1] <=> $b->[1] }
> > map { [ $_, $_->start * ($_->strand || 1)] } @exons;
>
> I can check this in if it works - seems to for your example.
>
> --jason
>
> >
> > cvs diff Transcript.pm
> > Index: Transcript.pm
> > ===================================================================
> > RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqFeature/Gene/Transcript.pm,v
> > retrieving revision 1.33
> > diff -r1.33 Transcript.pm
> > 287c287
> > < $rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
> > ---
> >
> >> #$rev_order = ($exons[0]->end() < $exons[1]->start() ? 0 : 1);
> >>
> > 291c291,292
> > < if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
> > ---
> >
> >> #if((! defined($strand)) || ($strand != -1) || (! $rev_order)) {
> >> if((! defined($strand)) || ($strand != -1)) {
> >>
> >
> > Thanks
> >
> > Olly Burren
> > JDRF/WT DIL

From jason.stajich at duke.edu Fri May 13 10:51:36 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri May 13 10:44:26 2005
Subject: [Bioperl-l] Re: Bio::SearchIO::amps and Bio::SearchIO::mrbayes_nexus
In-Reply-To:
References:
Message-ID: <70B655AA-12E2-4B9B-9A86-F22DCE80AEE2@duke.edu>

I added two init parameters to Bio::AlignIO::nexus

  -show_symbols
  -show_endblock

which should be set to '0' when initializing a nexus file for MrBayes.
This should be in 1.5.0. This seemed to work okay for me. I'd rather not
see a new module if we can just provide some parameters to tweak this.

AlignIO::nexus was never meant to be a full fledged nexus parser - the
Bio::SimpleAlign object is not complex enough to store all the commands
and blocks. It would be nice if we could do a better job here - maybe by
making an object which stores the blocks differently - AlignIO/SimpleAlign
are only really applicable for the DATA block.

Definitely welcome your input/support on designing some new stuff Matthew.
If it makes more sense to you to write a new module that is fine too - but
let's see how different it has to be from the current one and whether or
not we can combine them in the end.

-jason

On May 11, 2005, at 8:31 AM, Matthew Betts wrote:

>
> Sorry, meant Bio::AlignIO::*
>
> (oops)
>
> On Wed, 11 May 2005, Matthew Betts wrote:
>
>>
>> Hi,
>>
>> I was thinking of writing a Bio::SearchIO module for AMPS block
>> format. This is the format used by alscript and stamp.
>> Both of these come with format converters, but it would be useful for
>> me to do it within bioperl. OK for me to write Bio::SearchIO::amps, or
>> is there something else already? Is that name OK?
>>
>> Secondly, MrBayes doesn't like some things in the Nexus format
>> output by Bio::SearchIO::nexus (the 'symbols' parameter, and it
>> expects 'end;' rather than 'endblock;') even though they're valid
>> nexus... OK to copy Bio::SearchIO::nexus to
>> Bio::SearchIO::mrbayes_nexus and make the necessary changes, or is
>> there a better way? (Though a full nexus parser and flexible
>> outputter looks like a nightmare...)
>>
>> Thanks,
>>
>> Matthew
>>
>
> --
> Matthew Betts, Post Doc, Computational Biology Unit,
> BCCS, HiB, UiB, Thormøhlensgt. 55, 5008 Bergen, Norway
> tlf: (+47) 55 58 40 22, fax: (+47) 55 58 42 95
> mailto:matthew.betts@bccs.uib.no, www.ii.uib.no/~matthewb

From amackey at pcbi.upenn.edu Fri May 13 13:29:56 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Fri May 13 13:22:42 2005
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
In-Reply-To: <42833A01.8070804@colibase.bham.ac.uk>
References: <4281DCA0.3070201@colibase.bham.ac.uk>
 <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu>
 <42823F76.4010809@colibase.bham.ac.uk>
 <428336AD.6020609@pcbi.upenn.edu>
 <42833A01.8070804@colibase.bham.ac.uk>
Message-ID: <1af2a57eaade51a178fe6055d4b1168c@pcbi.upenn.edu>

Inline has a NOCLEAN option to prevent these from being emptied; try
editing read.pm to include it, rebuild and see what happens ...

-Aaron

On May 12, 2005, at 7:12 AM, Roy Chaudhuri wrote:

>> Sorry; you should find this file in the temporary subdirectory created
>> by Inline (do a find ./ -name read.xs in the build directory).
>
> Tried that, there's no sign of it.
> After I run make, the _Inline directory only contains the file
> _Inline/config, and an empty series of directories:
> _Inline/build/Bio/SeqIO/staden
>

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA 19104-6017      fax:    215-746-6697

From MAG at Stowers-Institute.org Fri May 13 14:27:24 2005
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Fri May 13 14:21:06 2005
Subject: [Bioperl-l] Number of internal nodes on a tree
Message-ID: <200505131820.j4DIJvfY004165@portal.open-bio.org>

Hi All,

I am trying to calculate the number of nodes between any two given nodes
(or leaves) on a phylogenetic tree.
Can I do this using the existing functionalities in Bio::Tree ?
Please suggest.

Thanks,
-Manisha Goel

From jason.stajich at duke.edu Fri May 13 14:44:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri May 13 14:37:26 2005
Subject: [Bioperl-l] Number of internal nodes on a tree
In-Reply-To: <200505131820.j4DIJvfY004165@portal.open-bio.org>
References: <200505131820.j4DIJvfY004165@portal.open-bio.org>
Message-ID: <9E109297-060A-4AD3-A4F8-45DA8FB24B40@duke.edu>

yes. see the code in Bio::Tree::TreeFunctionsI for the distance method.
It computes the distance between two nodes by adding the branch lengths -
if you just want the count you can modify it.

Basically find the LCA of the two nodes and count the number of steps it
takes for each to get to the LCA. You are just iterating on the
->ancestor call.

Something like this should work - I haven't tried it out though so I
don't know for sure.

my $lca = $tree->get_lca($node1,$node2);
my $count = 0;
for my $n ( $node1,$node2 ) {
  my $node = $n; # do a copy otherwise we'll be updating the value of $node1 and $node2
  while( $node->ancestor ) {
    last if( $node->ancestor->internal_id == $lca->internal_id);
    $count++;
    $node = $node->ancestor;
  }
}

-jason

On May 13, 2005, at 2:27 PM, Goel, Manisha wrote:

> Hi All,
>
> I am trying to calculate the number of nodes between any two given nodes
> (or leaves) on a phylogenetic tree.
> Can I do this using the existing functionalities in Bio::Tree ?
> Please suggest.
>
> Thanks,
> -Manisha Goel
>

From mlemieux at bioinfo.ca Sat May 14 08:30:47 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Sat May 14 08:24:19 2005
Subject: [Bioperl-l] [help] bl2seq using blastp
Message-ID:

Your sequences don't look like proteins. If you use the online blast2seq
you get "sequences must be proteins" errors.

HTH,
Madeleine

From roy at colibase.bham.ac.uk Mon May 16 09:28:19 2005
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon May 16 09:22:08 2005
Subject: [Bioperl-l] Bio::SeqIO::staden::read make test error
In-Reply-To: <1af2a57eaade51a178fe6055d4b1168c@pcbi.upenn.edu>
References: <4281DCA0.3070201@colibase.bham.ac.uk>
 <402d127f904c88895c4129202752bc3b@pcbi.upenn.edu>
 <42823F76.4010809@colibase.bham.ac.uk>
 <428336AD.6020609@pcbi.upenn.edu>
 <42833A01.8070804@colibase.bham.ac.uk>
 <1af2a57eaade51a178fe6055d4b1168c@pcbi.upenn.edu>
Message-ID: <42889FF3.8070108@colibase.bham.ac.uk>

> Inline has a NOCLEAN option to prevent these from being emptied; try
> editing read.pm to include it, rebuild and see what happens ...

Okay, I built read.pm using the Inline CLEAN_AFTER_BUILD => 0 option, and
examined read.xs.
It looks like line 32 which causes the incompatible pointer type error is
this line:

qualarr = SvRV(qual);

(equivalent to line 199 of read.pm). If I comment out this line, or
replace it with:

qualarr = (AV*)SvRV(qual);

(copied without much comprehension from various bits of code on Google)
then the compile warning goes away, but I still get the same errors when I
make test.

From alanrw at cs.manchester.ac.uk Mon May 16 10:43:40 2005
From: alanrw at cs.manchester.ac.uk (Alan R Williams)
Date: Mon May 16 14:21:52 2005
Subject: [Bioperl-l] Grail Tool
Message-ID: <4288B19C.40502@cs.manchester.ac.uk>

Hello,

I am trying to use Bio::Tools::Grail to parse a Grail output file. It
reads the file OK but does not produce any predictions. Is there a
particular part of the Grail output that it understands? I've attached
one of the test Grail outputs I'm using.

Thanks in advance,

Alan
-------------- next part --------------
gc_object_start: gene_grailexp --organism human --output pretty --dbpat grailexp_v3
# Service: gene_grailexp
# Version: 3.3
# Description: GAT GrailEXP Gene Prediction Service
# Last Modified: October, 2001
# Tool: GrailEXP 3.3 from ORNL. Last updated: October, 2001.
# Database: GrailEXP Database Thu Feb 27 16:15:37 EST 2003 from NCBI/TIGR/Baylor/Riken (15960696 entries). Last updated: Thu Feb 27 16:15:37 2003.
# Sequence Name: >gene_grailexp|PID=9167
# Sequence Length: 18553
# Output_begin: pretty
--------------------------------------------------------------------------------
GrailEXP v3.31 [March, 2002]
http://compbio.ornl.gov/grailexp/
Authors: Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and Edward C. Uberbacher, 1996-2001
Reference: "Automated Gene Identification in Large-Scale Genomic Sequences", Xu, Y.
and Uberbacher, E.C., Journal of Computational Biology, Volume 4, Number 3, 1997

Sequence: >gene_grailexp|PID=9167 (18553 bp)
--------------------------------------------------------------------------------
GAWAIN Gene Predictions (1 predicted, 1 with database similarity)

Genes with Database Similarity (1 predicted, 0 with alternative splices)

Gene 1, Variant 1
Strand: +  Bounds: 1-393  Exons: 1  Start Codon: No  Stop Codon: Yes
Top-Scoring Reference: BF958402.1 (542 bp) (89% id, 1-393)
>human|BF958402.1|est_human|dbEST - gi|12375677|gb|BF958402.1|BF958402 QV4-NN1148-291100-616-h10 NN1148 Homo sapiens cDNA
Reference Path: BF958402.1 (542 bp) (89%, 1-393)

---Index---- --------Exons-------- ---------CDS--------- -Ph- -Fr- -Len- -Scr-
1.1.1                 1        393          1        259    2    1   393    90

>GrailEXP Gene 1, Var 1 mRNA|Similar to BF958402.1
tgaattcagtgttaccactgcatatccagcccatctgacaccttcagatatgaagctgctcccatcagta
aagtattcgacatctgggtcttttaagggctggtcctttatgtcttttcggcttgagaaaacctcatcca
gtatttccacacaacagtcatacccagggccacacaacatgggctttccatgctctacccattctattgg
cagcagtgtggctggatttagagtattcacaatctccaaggttatgtaggggttttcacacaaaagccct
tgatatcttaacatcctagggttagaaagccagcgatgcccactctgctccatcagggtgatgaccgtgt
ggggaacccgaattatcaacctttgtcccaatttgagcttgtt

>GrailEXP Gene 1, Var 1 protein|Derived from similarity to BF958402.1
EFSVTTAYPAHLTPSDMKLLPSVKYSTSGSFKGWSFMSFRLEKTSSSISTQQSYPGPHNMGFPCSTHSIG
SSVAGFRVFTISKVM*

Genes with No Database Evidence (0 predicted)
--------------------------------------------------------------------------------
GALAHAD Gene Alignments (4 located: 3 displayed, 1 redundant)

Index Std Begin End Accession  Database  Organism Length
1     +   1     393 BF958402.1 est_human human    542
      1 piece   Seq exons = (1..393)
      89% ident Ref exons = (148..542)
2     +   402   676 BF906163.1 est_human human    354
      1 piece   Seq exons = (402..676)
      88% ident Ref exons = (4..277)
3     +   402   676 BI020262.1 est_human human    355
      1 piece   Seq exons = (402..676)
      88% ident Ref exons = (5..278)
--------------------------------------------------------------------------------
PERCEVAL Exon Candidates (6 predicted)

Index Std Begin End   Frm Type     Len Scr Quality
1     +   337   481   2   Terminal 145 63  Marginal
2     -   1362  1482  2   Internal 121 19  Poor
3     +   6947  7042  0   Initial  96  41  Marginal
4     -   7856  7943  0   Initial  88  16  Poor
5     -   14834 15109 0   Internal 276 69  Marginal
6     -   15364 15448 0   Initial  85  59  Marginal
--------------------------------------------------------------------------------
# Output_end: pretty
gc_object_end: gene_grailexp --organism human --output pretty --dbpat grailexp_v3
gc_object_start: cpg_grailexp --output pretty
# Service: cpg_grailexp
# Version: 3.0
# Description: GAT GrailEXP CpG Island Locator
# Last Modified: December 4, 2000
# Tool: GrailEXP 3.3 from ORNL. Last updated: October, 2001.
# Sequence Name: >cpg_grailexp|PID=9476
# Sequence Length: 18553
# Output_begin: pretty
--------------------------------------------------------------------------------
GrailEXP v3.31 [March, 2002]
http://compbio.ornl.gov/grailexp/
Authors: Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and Edward C. Uberbacher, 1996-2001
Reference: "Automated Gene Identification in Large-Scale Genomic Sequences", Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4, Number 3, 1997

Sequence: >cpg_grailexp|PID=9476 (18553 bp)
--------------------------------------------------------------------------------
PERCEVAL CpG Islands (2 predicted)

Index Begin End  Ratio Pct_GC
1     5791  6610 0.90  67.32
2     7044  7412 0.85  58.88
--------------------------------------------------------------------------------
# Output_end: pretty
gc_object_end: cpg_grailexp --output pretty
gc_object_start: repeat_grailexp --output pretty
# Service: repeat_grailexp
# Version: 2.0
# Description: GAT GrailEXP Repetitive Element Locator
# Last Modified: December 4, 2000
# Tool: GrailEXP 3.3 from ORNL. Last updated: October, 2001.
# Database: Repeatmasker Repetitive Database Apr 20 from GIRI (4760 entries). Last updated: Fri Feb 28 13:12:30 2003.
# Sequence Name: >repeat_grailexp|PID=9508
# Sequence Length: 18553
# Output_begin: pretty
--------------------------------------------------------------------------------
GrailEXP v3.31 [March, 2002]
http://compbio.ornl.gov/grailexp/
Authors: Doug Hyatt, Manesh Shah, Victor Olman, Richard Mural, Ying Xu, and Edward C. Uberbacher, 1996-2001
Reference: "Automated Gene Identification in Large-Scale Genomic Sequences", Xu, Y. and Uberbacher, E.C., Journal of Computational Biology, Volume 4, Number 3, 1997

Sequence: >repeat_grailexp|PID=9508 (18553 bp)
--------------------------------------------------------------------------------
PERCEVAL Repeats (11 located: 2 simple, 9 complex)

Simple Repeats (2 located)

Index Begin End   Score 1st 10 Bases
1     12069 12175 76    ttttctgttt...
2     14222 14323 58    atggagataa...

Complex Repeats (9 located)

Index Std Begin End   E-Val  Element Names
1     +   16656 16793 2e-218 FAM#SINE/Alu/FAM#SINE/Alu_/7SL...
2     -   8079  8186  6e-154 L1PA14#LINE/L1/L1PA13#LINE/L1/...
3     -   3319  3360  0.006  HERV15#LTR/ERV1___LTR15_5%...
4     -   3081  3147  7e-18  HERV3#LTR/ERV1___15%_div,_with...
5     -   2559  2927  5e-32  HERV15#LTR/ERV1___LTR15_5%...
6     -   1791  1867  4e-07  HERV15#LTR/ERV1___LTR15_5%...
7     -   1154  1254  2e-05  HERV3#LTR/ERV1___15%_div,_with...
8     -   581   849   4e-29  HERV3#LTR/ERV1___15%_div,_with...
9     -   49    90    2e-05  HERV3#LTR/ERV1___15%_div,_with...
Masked Sequence >repeat_grailexp|PID=9508, masked tgaattcagtgttaccactgcatatccagcccatctgacaccttcagannnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnttttaagggctggtcctttatgtcttttcg gcttgagaaaacctcatccagtatttccacacaacagtcatacccagggccacacaacat gggctttccatgctctacccattctattggcagcagtgtggctggatttagagtattcac aatctccaaggttatgtaggggttttcacacaaaagcccttgatatcttaacatcctagg gttagaaagccagcgatgcccactctgctccatcagggtgatgaccgtgtggggaacccg aattatcaacctttgtcccaatttgagcttgttagcatcttctgctaacaggacagtggc tgccaatgccttgaaacacagtggccatcccatagccaccaagtctagccctttggacta atatgccatgggccaataccatgaccctagtatttgtaccaagactcctgtagctattcc ttaactttcatgaacatataggtaaaaaggctttttcatgnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnntacagaagcctaaatactggacacttttaaagcagatttgagcctttttta cctgacactttatatcctgccttccacagcaggtgcagaaggttttgggtttttcccggg tgcagtagcccttggccagtggctttcttataccatggtcctgaccattttccttattct cttgacattcatcctttcagtgtcctttccttttgcaccattcacattaatctttttcta gccttggccagctcttaaatccctgtccagactggcctctttcatgactgcacccacgcc tccttgcaaagtcnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnntgtagc cactttagtaagctgagtatcactcacgcctgtggagcttctaacttctggaatttatgc ctgatatttccctgggcctgccttacaaatgccatatttaccatgtactgtttttcagca gcctcagggtcaaatggcatatgaagccgataagcctcgcagagcctctcgtaaaattca ccgggactttcatctggcttctggtggacctctgagacctttctgatattggtggccttt ttccctccagcctttattccatttaggagtgcctctcaataccactgcaaactttgtggc ccttgagcctggttagggtcccagcctgggacagcctctatcagaagtgcttgttgtgca tactgcctgatatctcctgtgcctgcaggctcattgttctcgagccactggagagcagct tgtataactctttggtgctcttccatatcaaataatgacagaagaagttgtttgcaatca acccaggtaggattgtgcattaggaaaatgggctgcattaaatctataagnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn 
nnnnnnntgaagagccattctcccccttggacttcattctgtgcattcaaataaatttgt ccccatgtttatcggagggacatctgcattgcttgggcatgacctgacctaagatagcca acctcatccccctggaattcctcccttgggttttcaggtaagggctctgacttctgtctg cgtggtgttgcctgagggatgctttcttctgagtctgagcctccagaggcagccagggca gcctcctgtctaagccttgccaaggatggataaattggtgcatagggggtagagtttcta actcttcagatggggcctgtaggacaggttttttctgcttttcctgtgactctttttctt tttctgatgacttgcagtctctttctattgttcctggttttgtccaagccattaaagtct tatggtaggtctcaaagcaggtctgcagccacttagggcgggtctgaattacacttagct gggagtctgtatacgggaactgatctgggtgcctgggctattctccaagcccagtgacca cctgaaacacttggccaattatctccttgtgtattgtcccctcggccagccaccctacat taaaagacagccggtttatctcacaaaaggttctcagtttctgtggagttagcttaacca tttaatcaccatttaaaccttttgtaaaagtttttcagnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnattcctgttacagctagctgcagccatggtggatgctgcttagccaggagagtac cttaattcctgttacagctagctgcagccatacnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnacacacctccccc gtccctgaaactgttttcctttctgactgacgtgcgagccccacctgcatctggtgtcgg ttagggcgtgagttttgtctgaatcaatgggcctctcccattgtcccaaccccctcgggt cagatcagtcatcatgccctnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnntggttcagatggctgtgcgctgctcccgtgacc atcctgcaaccccttccactggttccatttgtgctgtcgggggaaggccccagaacgcgg gagggcagttctccttctgggctgaaactctcctggtggtgccaaggaccccagatctcc catgtcctggggctgtagnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn ggatgagcccccagaaatgttacaagacaatcaaagactggagagactgaaaaaggttca ggagagtctattaaggtgatcaccggcccagccggacatatgtccagaaagtatgagccc caaactaagggcttttcctacttttaaacattttaaggcaggaactacatgaggtgggaa gcaagttacagaagcgagaaacaaaggcagttaatcaagcattacaacatttcttacatc ttgagaaaaacatgtcttgcaacctaaacttattggtcttgtgactgcagctgtgcagga gcttgctggcctgtaataaacttgagtaatttggagttggggagtatagataaggtccac tgtccacagagacaggacaggctgttaacattctcctttaacttgagtgtaagggggcag 
ggatcacactttgcagaaactttaagaggattttaaaatttctattactactactattaa gttacgtttgatttcattaatttcttcttcaaattctaggaagtctgatcccagagttaa tggtgaatgattacaaaacttatataatatctcacatatggttgtttatattttccaata aagttatatggggtgaataattataacaagtatttctgaaaaaaataacaattgagaata ctgatatatatagtccacagaaataatgagggaattttaataccatactccacattggct gaaactttctcactgcaggctatgctgtcccattcaacattctataaactaaagatgata gggagaatcttgtttaatctgaatgggagcttgaaggatacattaaatatggaaaatatg catccttccttaaagagaagtagcaaatgttttcattattcatttattcaaccacttagc caataatgttaacaattaaaggtgggagcattaaaaaaaggaaaagtggggtggggcggt gaaatgaacctcactgatttcttgcttttatcatttcaaactccaaaagtgtatttcaac tattcatatgatgattgaataaacttgttttacattatggaacacccttactgtgtagag ctgtgagaaattcaataatcagtatttaatgtggacaaaaatagatagattagagaaaca ataaacatggacatatctacacataatatatactccttttctatccaaaaatctgacaaa aaccttacacaatggtcagtttacaccattttaaggacaaggatgtggtgaaacaatttt aaacatttttatataataaaagaaaaaagttgttttaattgctaatgatcaagtttttac caagaatttacatatgcatcatcaaacagtagcttgaaactgaagcttgacctttgagaa ataaagcttttaaagtaaatgtctaaattctcttttctttattatgttacatttcattta ctttaggttttctagtgcaatttaaataagaaataaatagagagtatggtaatcaggtac aaaggctgtgaaaatattccctgcattttgcacagggaatatgttcaaattcctaatctg caaagaagaaaagtgcattttcttttttgaaatggcatttgaagatctctccatccatga acattcttgagcatatggattttacaataacagcagtgtatattagttttttaatttatc attaactgataacatatacagtaaaatataatttccgttttcttatctaagggcagaaaa tcccaacctatttctgaatctcactttggaacaaggacagaatgacctgacatggcattt ttgatttccaacgtttaaacgcataacgtttgtgtgctatggaagcatcttagcttctgg aatttatgctcgacacgaaacataaacaataaacaattcagtccacgttatatagacata tctcatatatatatgagatatatatatataaaatatcttacgcacatgcacacttagact ttctcggttttcatgaaactcacaatctacctcaggcgctcaaaggcactcggcctctca ggtctgaggcaccacagagaggcttccttgggcacagttgcttgctggtcaagacgccaa cttggcaaggttatgctgctggcaaaggcagattcgtcagaataaagtcgccacaggctc aaccaggcaaatcatgaatggccctttcagcaggagcctgagaggagggatgttattcag cccagcaaccctatttactttttttcaggatccccaaactcgcgttttttaaggctttcc gctacaagagagccagaattgaagcctgggttggcggcagtgcaggtacctgctcaccta 
agcatccctcttttaattttcctaactcctccacccacttgcccaagtataacttttgaa tggattcagcagaagtgaggcaggaaaggcaggaagatgagaaaaggcgcgacatcaaca cgcagagctcactgagccgccttgatgataaaaggcgggcacagggactacgtgggtggt ggcagaaagggcgcgggacacgcctcgcaaagagggaagagtgggcggggccacgtgccg ttgtcagagttcgcaactcgagcgcccagagggctcgcgaaaagtcccagcctgcaagcc aacctcgctcagcggacgactggccggatcccaacgcgctgccccttgcccagcctgcga gcgcgtggtacgaaggcgcgtctgcatccatgccccagcccggggagctggaggcgctcg cagtcagaggcgagtgatgctaggctgagcgcgtggcggcccgtgtcgtgccccgctgag ccaagtgcggaagggcagcggcgcgctccgactctgctcgccgcacgcagggcggggcgc ggctggggggcggggggcctggccgcccgctgggagctgcggacgagcaggcgcgctgag gacccgagggaggacacggttaaagcattgctatcaactgtgaacccagagagccctcct tagccaacacgctaactccgaagcctcccttacgcccccgaaccaccgaaggcggcgaca cctgattcagcgcacaaacacaggtcccttctgtcccggatacaattacgcggcagacac acactcaaactcgcgcggggcagccaagagacgaggtgagcggagggaccacgccgttcc agagggcggaagggggggcggtctgggtggaggagaggggcgttgtttgctccctgaggt gagtgccgggcgaatggctgctgtccaggggcgggggggtgggaattagaaagcacagac agggttgggctgaggaagttagaagtggttagtgggggttggcggggtagggagaaaggc taggggttgagggggtgtgaaggagatctttttaggctggcccctcatagcgcccgcgat cctgtttctgccatatggcatccgttcgagggttctctcgcgtttgtgcgctcaaggcag agggagggtataacggggaggcgattcggccgaggaccttgtacttgacccctgaaggga tgggcaccgcggagcccagaggccagagccagctgccagggcagggatggaggtggggga ccaggaggagcttcgggacctcttcaactcccctttactccgtcttccccctaaggagga acaagagcgggatgtgttgggggtggggaatgtcttggggctgtacaagctcgctgagac ttttcggggcccgccgtttccgcgtgcccgcgcgactctcgacaatggacagtgtacagt ggtggcagggtttcacgggtccccgcgccgatttgggaatggtacggggtgcccttgcat tagagaagcctgtgcgggcccccgggattagagaagaggtgagcgcctgcgggcggggag agaaggacgcgctgctggcaaaaagaagggcgccttccatggactggaggagcggcgagg gagcctggtatgtgtgttttggatttgaaggaggcaaaccgcatcggagagtgcaagtag atgcagtgcgtcggggcaaaaggacaatctgaggaggtgggcggggtacccaatctctat atacaagctttctggtttaattgagggtttataaaattgagtacggtaaacctgccattg tgcagcacttccttaggaaaggaatagcctctgtacctgtaccgcgagttcccatcacct ttcagcatctaggcggggagaggagagaaggtagagaacttctgagagagttctgagaac 
taacgcgaatggctggattgcgctaactaggcgttcttagcaggaaaatgtcgccaaggg aaggagtgggtggcggtatcaaaccaggcgtgctctggagaatcacctttattatgagcc agtgttagcagtgacgatagagtcttttatttctggggatcctattggtaaaaattagaa aaaaaccacactactccaaatactccaaaaatcttcagtcccctcaggctgtcaccttgt actaccagtttttcagtatcaatgacgtgaatgttagggagtttggctagatgccaaatt gttatcatttctgataatttcatgttcaaaaacgtctatattcactcaaaattggcgtgc ttctgacatatccccagaaaaaaatcaccaccatttgaaggaaggatagaaagtgccctt ttttcttttcttttttaaaaacttttattttaggttttnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnaaccaatattttaatagtgttaagagtaaccata ctttgtccccaggagattttaaaattagagtcatgcatcttaatattatatggtaaagga gaccagaactttgtgttacttgaattaaatcttagccagcttagttttctcctatagaaa ctgtctcttggtggtttgaggttttgctttaggaatgtgtctgtgttcagcaactcttct agataatgccagagaaaaacactctagtattgtatgatttgcctgccacctacaaggtta agttgacatctagtgttgtgtgacttagtattatacaagtctttaaaaactacaataaag catagtgccccgatatttagtaaaggcttgatactctattaggaggttgggcataactga acagaaactatacttctgactttttccaggtaatactactatacaggttcaaactctttt taaatctaagatttccagaggactatgcttgttgtatttaagcttggtacgcgtcttaga ataaataaaccaatgcagtaaataaagatgcttatgacaatttttttggataaaaactac ttttcgtttaaagattttgacagtttacatttcccaataatttaaattaaaacattttat tgcaccttatgaacaactatcagcttcattaccatttaaatattgctaaaaatcccatct tcaatcctcaaatcctttttaaaatcatgcacctgtttatcaaataatatcaatttgttg ctttttaattcatactaggattttccaacatacacctgggaatttctaagatatccttgg agagtgagaactttgattatttatgcaagatatagtatctcatttatttgacctcatctg cattggtacttatttacataagcacattttctcggtagctgactcttcacaccgtttatt actgaaacttaaaaaaaaagagccattataacccacaagcaatttctaaatcttactgtc ttccttgttcccttggagaatattggatataagttattctttcttgttgggttgtcttta tgaattcataaagaattgaatagtgctattcgctgaggtgtatagagctgtttttgttta tagtaaatggtctcagcctgctgggttgaaatactagaatagagacagaagtaccattct tctttttatgttttttttttttttctgggtcctctttctgttattggctatcactttctg ctgctgtcagactgtgagtgggccctgctgctctattatctgttgtctttttttccatca aatgatgtgggctattgcccgagtgtaaatcctgtagactttgtgacagacactttcagt 
ttacgtgctttatttgtggtttggactgtgaactgaaatgtttaggtttccagaagggtg acctctatataaaaggttttaatgtaccttttatagaaaaaatttgtgtatgagatcttt aggagggtgtattaatttcagacaattcagtcttctggaatatcttggtgtttctgagaa actgaagcagcctttaaaaaaaagtttactttcttatgataggcccagatcgtctttgtt taggaatatgtatactagaatattttacatattactttctgccattgaaacttactaact ataatgtagtgtagttaatatttttctaccccggaagtccagaataagaagtatgatttg agaatcctgctaaccatacttgaggctacatgaattatttaaatattattttgcaaatct acttatggaggaggtacataatcaaatacatatgattaaaatggtttaatttgatcttga aaacagtggtaatcatatctccttaatctgaagcatttcacaaacattcattcttttatt caacaaatatttgttaaaagcatactgtgtgccaagcactgtgctctggggatgtgcata caattaaggccctgggagggctctaggtttaaaagggggagacaaggaggaaacaggcta ttttcaatatctacagtatagtgctggctgttctgtaagtgccgtgatagagatacacac acagtaccctgagcagaaatttggaagtggtcaattctgtgtggggacaggagatttgag ggtttcttagagggtgacataaaagatgaatcactggaggcaggagcaagatcatggagg gctgtttgccattctcaggagtatggactattttcagtttaaggttgttttaaacagagg agctatgtgatgagaaaacagctcttacagcaagcaatctagaaggatggatctgaggaa tgtgagatttggatggtagtttagaagatattggggtggctttggtaagagaggttgagg tccttgtaccaaagtgttggcagtgggatgaaaaaaaggggtttggttacaacgatattc aggtggtagaacctctaagattaggtgaccaatttggttggaggattgggagtgggtggt ataggaggatgggatgggaacttaaaaggagatccagatttgagagcaagagtgatagag ttcagttttagacatgtagaacttgaagtgcttatagaaaatcgaactggagatgtgtgt ccagtgaatcattgtatgtgagttctgagcacagaggtcctgggctagataggcaggtag ataaatggtaggtagataaatggatagacacacagaaacatttaggagttttcagtgtat aggtaattataaaccgtgaaagttgataggttttccagggcaagtaccaaaagttaatag ggaaacagaggcctggggagatcctaaaatttaagagttgggtagagaaataggaagcaa taaagaaaaacgagaaggtacagaacaagagaggcaagaggagagttaggagggtgtaac ttttaaaagcagctgaaaactgagaaaatttcagaaagtgagaaggtcaggtactgaaat gagggggtgttagacagtctcggttcccaccccctctcgagccccgaatagctctgtgac cttgagctcacaatttagtttcatttcctcatttagcaaatgaggggagaagcacctact ttatagggctactgtgaggatcagatgagataaaatctatcaagcacaatgcctgggagg ttactcctttgtgtttaacaaatgctacctgttttatggaaaccaggacttggatttgtg attagaaggtccgtagtcatttttgctttcacaagtactcgctctcagaattcacagaga 
gtgattgaaaatatatttttaagatgaaaatttgatcatacttttaacttttttggttag aggccagtagaatagcagccattgaaagttttgaagggagcagagggagccaactgaaca aaagtccttgaagagagagtggaagttgcctcaagagcgttgcagagatgttgggtttct caggacaatccatgcttccgccagtcaggagggtctggtggcatatgcattgtgtttaaa aagcacctggtaattaaacctccaggtttctaatgtcttgtttaaagtactcttagtatt ttaagtgtgttctagttccttagataatctataaggacattgattatacattttagtatt agctgaaaataaactcagaaaacaatacagatgatttcaaaatttaacaaaatgtaattt aatatttgatgattccaaaggcacctctatacctcatagggtcttaaaaatcaacatgtg aaaaataattattagcctgtgtaataggggaaggactgatttgtagaggtttgtttgctt tttttttttttttcccccaattaagactactatggtaatatttaaatatacctacttttt tcctaggattttctgtttggatgaagtgatgctttatattagagattaataaacatttta aattagatttctattttaatatgtgtataagttttttactctgtaattactgcataattt gagaaaatgtttacatctatgcagtggtttggaagtcaaccaacaattgcacatagtaca tattgtttactaattggttataaggcatataataaatatttgttgtcccaatacatgttt gttgaataaaacaaaattagatattccatatttttaagatgaaatgaattttatatctgt agttttaatatttaatcatattctttacctgaaaatcgctagccatacttttcgttttca gcacttctctattttgcagttgtgtttcagtgcttttttttttcccactgtgcttagtcc ttcacgatttatggccctgttatttgtaaagcatagtaatctaagcaggtgattttcagt ttaggatttcttctagttccatgttgtatattcaccaagtcagttttagactatatattg tcatgctcagcactcccttacagagataacagatgccaagttgaaattgacttatacaga atgccacacttaggtaagtgtttaggagtgcatagccaaaataaaaatggtacctttggt ttatgaatggcagggcatctaggacagtgtaggtcagttcgtattctctggatggacatt tatgtaaatattaaaacatttagattagatggaaagatgtatagtaggaacatttcagag atactgtttccttgtagcattgtactatttatctgaattgttaatttggaataatcacta tattgattttattttgaagatataatttatttccattgacttgatataaataacagaaat gcatctgcaaggaatccagttcattaaagaaatttacattcgtatctgatttggtaaata tgtgcttgatttgggaatattatataacaaaatatttgtgaaatcattttgtacttttta attgttttgaagaaatttaatactcaaaatatttcaagacaccaaatcatgactttttag ttattacattaagctataacttgctttaatgagtaaaaactgtggggctaattttcttcc aagtgaaacatcaagtttaaataatgtgcacatgaatttacaaagtagaatcaagtgtgc aaatatgctatatgttctgtaataggttcttgaaaagctgtatataaactcagcttttat aagttgagtcatatttatagaataagaaaatagggcaaacatttaacacttcttaaatat 
taagtgatgggcatacagatgtttgttatgttaatctttatatttttctgtatgttagaa atagtttacagaaaaaaattgtggttagtgacattaaagcacattatatattaaatatac tcagggactttgaaatttaaagcaatatcaacatacaatttaccatagaaaaaatcctat tctgattttattaatacctgttttgcaaattcagaatttttaatgggtttaattaagcag atacacacacaacaacaacaacaacaaaaaacttggacacacttctgtaacacatcttga atatagtctcatctgttggaattttttaaacttctacacttaagtcttaatctgattctc cctcatcctcataatattcattattcagaggagaggcatcaacaggaagaattcaatttt acttaaataaaaattttttactagcaacggaatgagtgattcagtggtggtgtcttcctg ctttgagagagctgttctaagactaggcagttagtcttaggcctaagactaggcagttag tctaggggctccacagcaggcccttcctctgggttctctttatgaagaaggcctttgtgt tttacatagattttgcatttgtgctgctgtccctattgagttgttctcaccaagccaatc tggggagtagctcttaggggaaaactagtcaaacaggtttgcctgtctattcaaaaggag ggaagaacggtgaaatgtatggataaaggccaagagtaaaactgttttaagtaatattca tgctaatcactgactgttggttttgaaaatactttgtagacatttgtgagaatgtatctt aatggagataaagtaagagagaaactggaaggggaagtagtttagaagcaggggataagg acaagatggggagaaggaagggaaaaacaatggaggcaaaataagctcaagaaggaaaga atagtgtttcagtgtcatatctgggctttatgtaaactgggtgtgttatagcaggaaacc atggtattgatctgagaatgaatttaattaaaacaaacaaaaaacccccacatgcctgtg atcctcaatgtccaaaatagggagcgtcttgccagggcagggctgaagtttactaggaat attggtgggaaaatgttatagttaaccatctctgatactaatcttatattagagataata cttggtttcccctagtaaagtgttagagtcatacagctttattgtagtttgcaaatctat tacacatatatttctactgtttagttctcattctgcccctgaagcatagccctataggta gtagcattttacagctcataacttggtagagtcccaagtaacctttggaagtcaaaatga caatccagtctggggaatttgatgttaaaactaaagcttcttgcctagaacaagttttgg aactatagctcaccacctgcctcatgttctgggaatgctgaaccttgatcccactgaaat tacctttaagtagtctttctgtagtcctcagtaatctcctcctttgctccatgcttcttc cctgctccttgagagttccttctcctttctcatctgtaatggtaactccccatctgatct ttggttgtgcctcttcttccttcctggcatttttattaacatcttttgtcccagcacctc tcctcttctctattctcactcctttggtgtagttattcatgcccatagactggattttta tttatatatagatgattcaaaaagctgtatttctaattgtgatttcttctatactctcaa gagcttagagttcccagctcaacttcctaactcagatctgacatccccatttgcctcagt cttcgaagacagattttggtgttgtggaattcacttttttaccctttatatctaatctat 
tcctaattgatcactatgtccttcatgaagaccttcttcccttgcctgagtctttccatt cacctcctgatgtcttggtgtagtcccacatcacttcttcaccagtggtactaaatgact cattttcaccattacctgggccttccatcacctcctactgcccacaaccatcagtcactt gtccttactgacccacttgccttctttgagtgtgatttgactatgcgtttattttcacat ggggatccaaaatttttagcatggctttcgaggctgtctattgtctagctgctttctgat cagacttatttattttctcatattgctcaataggattctatcctgtttacccactggggc attcatttcaggatatccccccgcttgagatgtcctgccttttttttcttctgcacttcc aaattctcacccaaggttttgacactcaagtcccacctcagactgtggcttcccagtctc tgtcacttttctgtgactttgatttgtacatatattgatgttagcaattaggtttcattc tccctctaaggagagtatgaactcttagagggtggagtctcttgtctcctcatacacctt ggacaaacactattgatcagagagaagatattcaaacatgattgaatggtatctttagtt tgtcatgaggcaataagggtaagtttcctaattttctatacgtattttgattttttcctg ttgttttttccagaactgcatcattcaaacttgaaacattcccctatttttttttaacaa ggtgttctatataggaacaacatatcttatagggattgctacattatttagtcacaatgg aaaaaaaagtttgatatatagaaagattatataatagttacatggaactctactataaaa ttataatggtttagtgggttttcttttttaaatttacctggggcttctataggtggatgt gcattcacaactaattctaggtaatgctaataaaagaaaataatcacggtgataattaac tgtatttagtttttattttgtatttacatttcaatgtaagtatgcctaatgtcatgaaac catagttggagaaaataatgattatatgcatgaggaagataacttaggaagcattctaat tgtccaactccccacttccccatagaccttttataacaattaaatcaaagcaccagccca ggaataattcttttttccattctttgtgcctatttcctcctttgataattgctgaattct tctggtcagaatatctccctttccatctccctctgagcagagcaataggaataggaatat attttaagccaggtgaagtggtgcaggtctgaggtnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnaaaaaaa gaatatattttaagaaaattcctcagctaaagtaaaaagaaaagagagataagctaggta agattattaatcagaagtaatgcttgtctgtttaatttgccagttttaaataattcttga aggcatttgagaggtcattatattcttttaatcagttagactttatctgaatatatgctt tggaatatactttttcttcaaacttctaaactatgtgcaatatacagaaacaaggtcctc tggctgtcctgctttctgcttgctttttttctctccctttcacttttctggagataattt ttctgcctcttgtagggaattcagaggagggatcctagtgattggctttacaaatttgtg tgggggagggattggatagagtggagatgggataggcatctgcccatcagtaggatttgg 
ttgtttgtccagcctctatatgggctttaaatgctttatactttctgaccattatttttg ccagcaaaaatgatgtagcttattttgatactacttaaatgcagtcctgtccttattcta cccctactactccatccaaagggagactgaaggggagacattttgaaccagacatgttga tttcctggtggcctgtacagtgagtattttgtaacatttagaattaattcagtcattcat ttaataagtaacctaatatatgctagaaatcttctgtcttagagccaaaatatattgtaa ttttttcatggtctaagtatctaaggctttacagtataaaagtttaataaagtcatgacc aaggctttacaaagactgctattttgattcatgtatcacagaattcaataaaaagtcaaa ctattatctgacatatggcagtgacaactttggagggtgtttatatggtacaaatagaag atgctggattaggacctaggaaacctgaatgctggttccgactctgccagcgaataactg aactagctgtgtaatgcaaagacaaccaccccatagagactttgcccctttctctacaaa tgaggggtttgaaatcacctatctttagtttttaaattgtgactcatttatttaatgaaa atatatggtttttaaaattttgatatcattttggtacaaatatattgtgtcaatgtttaa atatatttatcttattttcacaagacaatggcaatgatatacaaaagactttctctggac ttgcctcattgtggtgaatatcggttataatcatagatacttagaagacacaaaattata cattcttacctgttagtaaaggaacatcaaagaaattcattctgtttttctgcagaaact atcaaaatttttccacttctggggattaagataacttctgttttttatctaatactcaac ctatctaatacaaccttgtcctagtgatgatggtagaattttataaaatagggaagtttg ttatttgttcacctgtgaagaccttcaaggaattgaataggaaaatacctagagttttgt ttcttgttctgatttgcttgtttatatttggccagttttgtcctgaaaatatgtttgaga gattacatgaaaatttattttgtgactcttaatataaatgttaaaaatacatttaaacta gtgatagtaaagaaagaatatcttaattagtgggggaaataatcccatgatcttttgtga ttctacaacattactgtagccttaagaatctctactaatgttgtattgaaaagctgtttc tcctacatgtgaa -------------------------------------------------------------------------------- # Output_end: pretty gc_object_end: repeat_grailexp --output pretty From lifei03 at gmail.com Tue May 17 11:24:50 2005 From: lifei03 at gmail.com (Frank) Date: Tue May 17 11:17:52 2005 Subject: [Bioperl-l] about pI prediction Message-ID: <428A0CC2.9000608@gmail.com> Dose anyone use the bioperl module to predict pI and molecular weight of a protein? 
Thanks
Frank

From akmala at nmrc.navy.mil Tue May 17 10:01:43 2005
From: akmala at nmrc.navy.mil (Arya Akmal)
Date: Tue May 17 13:38:12 2005
Subject: [Bioperl-l] error parsing -m8 blast output
Message-ID:

Hello-

I'm running bioperl 1.5 on Mac OS X (10.3.9). I've run into a problem
parsing -m8 blast output with Bio::SearchIO::blasttable. Specifically,
the following lines:

>use Bio::SearchIO;
>$file='test.out';
>$parser = new Bio::SearchIO(-file => $file, -format => 'blasttable');
>$result = $parser->next_result;

result in the following error message:

>Can't call method "bits" on an undefined value at /Library/Perl/5.8.1/Bio/Search/Hit/GenericHit.pm line 349, line 2.

The blast report I'm parsing looks like:

>b0002 b0002 100.00 820 0 0 1 820 1 820 0.0 1599.7
>b0002 YE0600 82.07 820 146 1 1 820 1 819 0.0 1336.2
>b0002 YPTB0602 81.95 820 147 1 1 820 1 819 0.0 1335.1
>b0002 YPO0459 81.95 820 147 1 1 820 1 819 0.0 1335.1

Note that the bitscore on line 2 is 1336.2, and is not undefined.
However, it seems to me that $hit_object, which should contain the
information in line 2, has not been defined for some reason. If I try to
parse a report with only one line, then I crash on line 1; otherwise I
always crash on line 2, as indicated in the error message above.

Note also that I can access other information in the object, depending
on the method used. For example,

>print $parser->result_count;

returns 1.

Also, I have no problem parsing standard blast output with
Bio::SearchIO, and all the relevant methods seem to work fine in that
case.

I have also tried this on a different installation of bioperl 1.5 on
Mac OS X, and didn't run into this problem. I can't determine how that
installation is different from mine.

I'd appreciate whatever insight anyone could offer ... thanks.
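
For reference, the report lines quoted in the message above follow the
standard 12-column BLAST -m8 layout: query, subject, percent identity,
alignment length, mismatches, gap opens, query start and end, subject
start and end, e-value, bit score. A minimal plain-Perl sketch,
independent of BioPerl, that splits such lines and shows the bit-score
column is populated on every row might look like this. The sub name
parse_m8_line and the hash key names are illustrative assumptions and
are not part of Bio::SearchIO; the sketch only checks the input file,
it does not reproduce or explain the GenericHit.pm error.

```perl
use strict;
use warnings;

# Column order of BLAST -m8 tabular output.
# The key names are illustrative; only the order is standard.
my @M8_COLUMNS = qw(query subject pct_identity aln_length mismatches
                    gap_opens q_start q_end s_start s_end evalue bits);

# Split one -m8 line into a hash reference keyed by column name.
# Returns undef for blank lines, comment lines, or lines that do not
# have exactly 12 whitespace-separated fields.
sub parse_m8_line {
    my ($line) = @_;
    return unless defined $line;
    $line =~ s/^\s+|\s+$//g;
    return if $line eq '' || $line =~ /^#/;
    my @fields = split /\s+/, $line;
    return unless @fields == @M8_COLUMNS;
    my %row;
    @row{@M8_COLUMNS} = @fields;
    return \%row;
}

# The four report lines from the message above.
my @report = (
    'b0002 b0002 100.00 820 0 0 1 820 1 820 0.0 1599.7',
    'b0002 YE0600 82.07 820 146 1 1 820 1 819 0.0 1336.2',
    'b0002 YPTB0602 81.95 820 147 1 1 820 1 819 0.0 1335.1',
    'b0002 YPO0459 81.95 820 147 1 1 820 1 819 0.0 1335.1',
);

for my $line (@report) {
    my $row = parse_m8_line($line) or next;
    printf "query=%s hit=%s bitscore=%s\n",
        @{$row}{qw(query subject bits)};
}
```

If every row prints a bit score here, the input file itself is well
formed and the problem is more likely in the installed modules than in
the report.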

From jason.stajich at duke.edu Tue May 17 15:27:49 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue May 17 15:25:09 2005
Subject: [Bioperl-l] error parsing -m8 blast output
In-Reply-To:
References:
Message-ID: <844283A9-562F-4EF3-A3B4-E8045CD52E29@duke.edu>

This works great for me on the 1.5.0 branch and the HEAD code on Linux
and OSX. I am running perl 5.8.6 on OSX.

use Bio::SearchIO;
use strict;
use warnings;

my $file = 'test.out';
my $parser = Bio::SearchIO->new(-file => $file, -format => 'blasttable');

# Walk results, hits and HSPs; the bit score lives on the HSP.
while ( my $result = $parser->next_result ) {
    while ( my $hit = $result->next_hit ) {
        my $n = 1;
        while ( my $hsp = $hit->next_hsp ) {
            printf "query=%s hit=%s HSP=%d bitscore=%s\n",
                $result->query_name, $hit->name, $n++, $hsp->bits;
        }
    }
}

Please note that the Hit object will not have any score or significance
data because -m8/-m9 output is just HSPs. Perhaps that is the problem.
I don't see any part of your example code where you are even printing
the bitscore, so I can't really tell how you got to the specific error
message.

You'll probably want to make sure that you aren't dealing with mixed
versions on your machine.

% perl -MBio::SearchIO::blasttable -e 'print $Bio::SearchIO::blasttable::VERSION, "\n"'

and see the Revision

% perldoc -m Bio::Search::Hit::GenericHit | grep \$Id
(I have 1.2)
% perldoc -m Bio::SearchIO::blasttable | grep \$Id
(I have 1.34)

-jason

On May 17, 2005, at 10:01 AM, Arya Akmal wrote:

> use Bio::SearchIO;
> >$file='test.out';
> >$parser = new Bio::SearchIO(-file => $file, -format => 'blasttable');
> >$result = $parser->next_result;
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From Marc.Logghe at devgen.com Tue May 17 16:08:40 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue May 17 16:01:24 2005
Subject: [Bioperl-l] about pI prediction
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E780@ANTARESIA.be.devgen.com>

Hi Frank,

> Dose anyone use the bioperl module to predict pI and
> molecular weight
> of a protein? Thanks

Don't know of a pure BioPerl module to predict pI. You can use
Bio::Tools::SeqStats to get the molecular weight, though.
If you have EMBOSS installed, you can use the Bioperl wrapper to the
EMBOSS application 'pepstats' for both the pI and mol weight. Have a
look at the docs for Bio::Factory::EMBOSS.
However, you have to parse out the corresponding values. You can find
the EMBOSS docs, including an example output, at
http://emboss.sourceforge.net/apps/pepstats.html
HTH,
Marc

From EwingAD at hiram.edu Wed May 18 00:31:47 2005
From: EwingAD at hiram.edu (Ewing, Adam D.)
Date: Wed May 18 00:24:44 2005
Subject: [Bioperl-l] about pI prediction
Message-ID:

I'm not sure what the specific question is, but check out
Bio::Tools::pICalculator for calculating isoelectric points. General
syntax:

my $calc = Bio::Tools::pICalculator->new(-places => 2, -pKset => 'EMBOSS');
$calc->seq($seqobj);    # $seqobj is a Bio::Seq object
my $iep = $calc->iep;

For molecular weight, I ended up running pepstats (EMBOSS) from within
a script and parsing the output, which is fairly straightforward.
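
As a rough illustration of what an average molecular-weight calculation
involves (this is not the pepstats or Bio::Tools::SeqStats
implementation), the mass of a peptide can be approximated as the sum of
its average residue masses plus one water. A minimal plain-Perl sketch
follows; the sub name average_mol_wt is hypothetical, and the residue
masses are the commonly tabulated average values, which are worth
checking against your own reference before relying on them.

```perl
use strict;
use warnings;

# Average residue masses in daltons (amino acid minus one water).
# Commonly tabulated values; treat them as an assumption of this sketch.
my %RESIDUE_MASS = (
    G => 57.0519,  A => 71.0788,  S => 87.0782,  P => 97.1167,
    V => 99.1326,  T => 101.1051, C => 103.1388, L => 113.1594,
    I => 113.1594, N => 114.1038, D => 115.0886, Q => 128.1307,
    K => 128.1741, E => 129.1155, M => 131.1926, H => 137.1411,
    F => 147.1766, R => 156.1875, Y => 163.1760, W => 186.2132,
);
my $WATER_MASS = 18.01528;

# Sum the residue masses and add one water for the free termini.
# Dies on any character that is not one of the 20 standard residues.
sub average_mol_wt {
    my ($seq) = @_;
    my $total = $WATER_MASS;
    for my $aa ( split //, uc $seq ) {
        die "Unknown residue '$aa'\n" unless exists $RESIDUE_MASS{$aa};
        $total += $RESIDUE_MASS{$aa};
    }
    return $total;
}

# Hypothetical ten-residue peptide, used only to exercise the sub.
printf "%.2f Da\n", average_mol_wt('MKTAYIAKQR');
```

For real work the BioPerl and EMBOSS tools mentioned in this thread also
handle ambiguity codes and report a range where needed, which this
sketch does not attempt.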

Cheers,
Adam Ewing
Hiram Genomics Initiative
http://hgi.hiram.edu

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Frank
Sent: Tue 5/17/2005 11:24 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] about pI prediction

Dose anyone use the bioperl module to predict pI and molecular weight
of a protein? Thanks
Frank
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From S2100086 at student.rmit.edu.au Wed May 18 02:23:22 2005
From: S2100086 at student.rmit.edu.au (Daniel Park)
Date: Wed May 18 02:19:09 2005
Subject: [Bioperl-l] Where has BioPerl been used
Message-ID: <1116397402.5d5b711cS2100086@student.rmit.edu.au>

Hi all,

I'm doing a report/assignment for a bioinformatics subject that I'm
studying.

I was wondering if anybody could tell me of some examples where BioPerl
has been used for research or any other biotechnology or medically
related purpose. Anything would be helpful: websites, personal stories,
use cases/case studies, research papers. Even pointers on where to look
would help.

Cheers,
Daniel

From ed at compbio.berkeley.edu Wed May 18 02:47:22 2005
From: ed at compbio.berkeley.edu (Ed Green)
Date: Wed May 18 02:39:57 2005
Subject: [Bioperl-l] Where has BioPerl been used
In-Reply-To: <1116397402.5d5b711cS2100086@student.rmit.edu.au>
References: <1116397402.5d5b711cS2100086@student.rmit.edu.au>
Message-ID: <428AE4FA.20304@compbio.berkeley.edu>

Daniel,

You may want to start with the 92 papers that cite the Bioperl paper:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12368254

You can get a list of these via Web of Science if you have access:
http://isi02.isiknowledge.com/portal.cgi/

If you're looking for more personal testimonials I'd be happy to
describe how bioperl has made my life easier.

Ed Green
UC Berkeley

Daniel Park wrote:
> Hi all,
>
> I'm doing a report/assignment for a bio-informatics subject that I'm studying.
>
> I was wondering If any body could tell me of some examples where bio-perl has been used for research or any other bio-technology/Medically related purpose. Any thing would be helpful websites, personal stories, use cases/case studies, research papers well anything would help even pointers of where to look.
>
> Cheers,
> Daniel

From brian_osborne at cognia.com Wed May 18 07:32:16 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed May 18 07:30:52 2005
Subject: [Bioperl-l] about pI prediction
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E780@ANTARESIA.be.devgen.com>
Message-ID:

Frank,

I've used both Tools/pICalculator and Tools/SeqStats.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Marc Logghe
Sent: Tuesday, May 17, 2005 4:09 PM
To: Frank; bioperl-l@portal.open-bio.org
Subject: RE: [Bioperl-l] about pI prediction

Hi Frank,

> Dose anyone use the bioperl module to predict pI and
> molecular weight
> of a protein? Thanks

Don't know of a pure BioPerl module to predict pI. You can use
Bio::Tools::SeqStats to get the molecular weight, though.
If you have EMBOSS installed, you can use the Bioperl wrapper to the
EMBOSS application 'pepstats' for both the pI and mol weight. Have a
look at the docs for Bio::Factory::EMBOSS.
However, you have to parse out the corresponding values.
The EMBOSS docs, including an example output you can find at
http://emboss.sourceforge.net/apps/pepstats.html
HTH,
Marc

From lifei03 at gmail.com Wed May 18 10:12:33 2005
From: lifei03 at gmail.com (Frank)
Date: Wed May 18 10:07:22 2005
Subject: [Bioperl-l] about pI prediction
In-Reply-To:
References:
Message-ID: <428B4D51.3070109@gmail.com>

Hi,

Thanks all, I will install them and do some analysis. Before, I did it
over the web at http://us.expasy.org/tools/pi_tool.html, which is not
very convenient for analyzing large amounts of data.

Hi Brian, do you have any idea about the principle behind calculating
pI? I found some references published in the 1980s, but I am not sure
whether they are out of date.

Frank

Brian Osborne wrote:

>Frank,
>
>I've used both Tools/pICalculator and Tools/SeqStats.
>
>Brian O.
>
>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org
>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Marc Logghe
>Sent: Tuesday, May 17, 2005 4:09 PM
>To: Frank; bioperl-l@portal.open-bio.org
>Subject: RE: [Bioperl-l] about pI prediction
>
>
>Hi Frank,
>
>
>>Dose anyone use the bioperl module to predict pI and
>>molecular weight
>>of a protein? Thanks
>>
>>
>Don't know of a pure BioPerl module to predict pI. You can use
>Bio::Tools::SeqStats to get the molecular weight, though.
>If you have EMBOSS installed, you can use the Bioperl wrapper to the
>EMBOSS application 'pepstats' for both the pI and mol weight. Have a
>look at the docs for Bio::Factory::EMBOSS.
>However, you have to parse out the corresponding values.
The EMBOSS
>docs, including an example output you can find at
>http://emboss.sourceforge.net/apps/pepstats.html
>HTH,
>Marc

From samuel.thoraval at librophyt.com Wed May 18 06:42:52 2005
From: samuel.thoraval at librophyt.com (Samuel Thoraval)
Date: Wed May 18 13:09:22 2005
Subject: [Bioperl-l] Bio::Tools::ESTScan and ESTScan v2.0 beta
Message-ID: <200505181242.52770.samuel.thoraval@librophyt.com>

Hi,

I have modified Bio::Tools::ESTScan to make it compliant with the new
ESTScan version 2.0 beta.
I am not familiar with the former version, and I don't know about all
the changes, but one of them concerns the generated output. Instead of
5 numbers following the sequence id, there are only 3: the score, the
start position and the end position, in that order. The score can be
negative.

I also wanted ESTScan.pm to be able to parse the ESTScan (version 2.0
only) protein fasta file (which can be generated with option -t).

I haven't added any support for the 'all-in-one' format for ESTScan v2.

Below is the diff from ESTScan.pm version 1.11 with ESTScan.pm 1.10 :
~~~~~~~~~~~~~~~~~~~~~~~~
1c1
< # $Id: ESTScan.pm,v 1.11 2005/05/18 07:38:45 lapp Exp $
---
> # $Id: ESTScan.pm,v 1.10 2002/10/22 07:38:45 lapp Exp $
172d171
<     my $alphabet;
182c181
<     $seq->desc() =~ /^(\-?[\d.]+)\s*(.*)/ or
---
>     $seq->desc() =~ /^([\d.]+)\s*(.*)/ or
187,195d185
<     # translated may end the description
<     if($seq->desc() =~ /(.*)translated$/) {
<       my $desc = $1;
<       $desc =~ s/;\s+$//;
<       $seq->desc($desc);
<       $alphabet = "protein";
<     } else {
<       $alphabet = "dna";
<     }
230,264d219
<     } elsif ($seq->desc() =~ /^(\d+)\s+(\d+)\s*(.*)/) {
<       # default ESTSCAN v2 format
<       $seq->desc($3);
<       $predobj = Bio::Tools::Prediction::Exon->new('-source' => "ESTScan",
<                                                    '-start' => $1,
<                                                    '-end' => $2);
<       $predobj->strand($gene->strand());
<       $predobj->score($gene->score()); # FIXME or $1, or $2 ?
<       $predobj->primary_tag("InternalExon");
<       $predobj->seq_id($seq->display_id());
<       # add to gene structure object
<       $gene->add_exon($predobj);
<       if ($alphabet eq "dna") {
<               # add predicted CDS
<               $cds = $seq->seq();
<               $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
<               $cds = Bio::PrimarySeq->new('-seq' => $cds,
<                                           '-display_id' => $seq->display_id(),
<                                           '-desc' => $seq->desc(),
<                                           '-alphabet' => "dna");
<               $gene->predicted_cds($cds);
<               $predobj->predicted_cds($cds);
<               if($gene->strand() == -1) {
<               $self->warn("reverse strand ORF, but unable to reverse coordinates!");
<               }
<       } elsif ($alphabet eq "protein") {
<               # add predicted Protein
<               $cds = $seq->seq();
<               $cds = Bio::PrimarySeq->new('-seq' => $cds,
<                                           '-display_id' => $seq->display_id(),
<                                           '-desc' => $seq->desc(),
<                                           '-alphabet' => "protein");
<               $gene->predicted_protein($cds);
<               $predobj->predicted_protein($cds);
<       }
~~~~~~~~~~~~~~~~~~~~~~~~

Regards,
--
Samuel Thoraval

From bwang at tc.cornell.edu Wed May 18 10:55:29 2005
From: bwang at tc.cornell.edu (Baohua Wang)
Date: Wed May 18 13:09:28 2005
Subject: [Bioperl-l] load_seqdatabase.pl
Message-ID: <395BBF33F0ED3F42876CB9C653729F4A3C90DF@mail.tc.cornell.edu>

Dear Hilmar,

I recently installed BioSQL on Windows (not using cygwin), ran into a
similar problem, and didn't find any solution on the Web. So I went one
step further by myself. It took me some time to find out what the
problem is. The script failed on throw() when loading Bio/Root/Root.pm
on Windows. The problem lines are those of the form
"throw $class (...". After I put a comma after $class, as in
"throw $class, (...", the BioSQL tests and load scripts succeeded
(ftp://ftp.tc.cornell.edu/Outgoing/bwang/bioperl/MSSQL/package-modification/perl/Bio/Root/Root.pm).

I hope someone can work through Root.pm, test it on more platforms, and
make changes if necessary.

Best wishes,

Baohua Wang

________________________________

[Bioperl-l] load_seqdatabase.pl
Hilmar Lapp hlapp at gnf.org
Fri Mar 12 21:01:01 EST 2004

________________________________

I have seen this before I believe from someone on a Windows machine.
The problem could be some trouble with the dynamic loading of modules.

The -d option to perl invokes the debugger. Do you see the same problem
without invoking the debugger? Did the bioperl-db tests pass?

Root::debug() is invoked all over the place many many times. By
default, you won't see any of that since $obj->verbose() governs
whether the messages are printed or suppressed. You can try to pass
--debug to load_seqdatabase.pl and see when it crashes then. (All
messages that you see in addition after enabling --debug will have been
routed through Root::debug.)

-hilmar On Friday, March 12, 2004, at 11:54 AM, Barry Moore wrote: > I'm having some trouble with load_seqdatabase.pl. I using BioPerl 1.4 > (with some CVS updates including bioperl-db and bioperl-live Bio::DB > yesterday), ActiveState Perl 5.8, MySQL version 12.22 distribution > 4.0.16, Windows XP. I've successfully created a database mysql with > the biosqldb-mysql.sql schema downloaded yesterday. I have a file > RefSeq_Human_mRNA_short.gb that has RefSeq mRNA in GenBank format (I > pasted the first record from that file below). I run > load_seqdatabase.pl like this: > > >perl -d load_seqdatabase.pl --dbpass ******* > RefSeq_Human_mRNA_short.gb > > All of the script defaults should be O.K., but I've tried setting them > all on the command line as well. > > The script opens the file, and gets sequence out of it O.K., but then > somewhere in the eval statement that begins at line 500 it heads off > to Bio::Root::Root, and loses the plot with this error: > > Undefined subroutine &Bio::Root::Root::debug called at > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1526, > line 413. > Bio::DB::BioSQL::BasePersistenceAdaptor::_get_driver_class('Bio::DB::Bi > o > SQL::SeqAdaptor=HASH(0x1d22200)','Bio::DB::BioSQL::mysql::','Driver','B > io::DB::B > ioSQL::SeqAdaptor') called at > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdap > tor.pm line 1507 > Bio::DB::BioSQL::BasePersistenceAdaptor::dbd('Bio::DB::BioSQL::SeqAdapt > o > r=HASH(0x1d22200)') called at > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdap > tor.pm line 1386 > Bio::DB::BioSQL::BasePersistenceAdaptor::rollback('Bio::DB::BioSQL::Seq > A > daptor=HASH(0x1d22200)') called at load_seqdatabase.pl line 524 > > I'm happy to try to figure this out myself, but right now I haven't a > clue what's going on since the subroutine &Bio::Root::Root::debug > seems to me to be where it's supposed to be. Is this possible a OS > issue since I'm on Windows? Can anyone nudge me in the right > direction? 
> > Barry > -- > Barry Moore > Dept. of Human Genetics > University of Utah > Salt Lake City, UT > > > LOCUS NM_000014 4577 bp mRNA linear PRI 20-DEC-2003 > DEFINITION Homo sapiens alpha-2-macroglobulin (A2M), mRNA. > ACCESSION NM_000014 > VERSION NM_000014.3 GI:6226959 > KEYWORDS . > SOURCE Homo sapiens (human) > ORGANISM Homo sapiens > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. > REFERENCE 1 (bases 1 to 4577) > AUTHORS Mathew,S., Arandjelovic,S., Beyer,W.F., Gonias,S.L. and > Pizzo,S.V. > TITLE Characterization of the interaction between alpha2-macroglobulin > and fibroblast growth factor-2: the role of hydrophobic > interactions > JOURNAL Biochem. J. 374 (Pt 1), 123-129 (2003) > PUBMED 12755687 > REMARK GeneRIF: FGF-2 and this protein interact at specific binding > sites, > involving different FGF-2 sequences. > REFERENCE 2 (bases 1 to 4577) > AUTHORS Athauda,S.B., Nishigai,M., Arakawa,H., Ikai,A., Ukai,M. and > Takahashi,K. > TITLE Inhibition of human pepsin and gastricsin by alpha2-macroglobulin > JOURNAL J Enzyme Inhib Med Chem 18 (3), 219-224 (2003) > PUBMED 14506912 > REMARK GeneRIF: alpha2-macroglobulin inhibits human pepsin adn > gastricsin > REFERENCE 3 (bases 1 to 4577) > AUTHORS Arandjelovic,S., Freed,T.A. and Gonias,S.L. > TITLE Growth factor-binding sequence in human alpha2-macroglobulin > targets the receptor-binding site in transforming growth > factor-beta > JOURNAL Biochemistry 42 (20), 6121-6127 (2003) > PUBMED 12755614 > REMARK GeneRIF: alpha(2)M-derived peptides target the receptor-binding > sequence in TGF-beta > REFERENCE 4 (bases 1 to 4577) > AUTHORS Shibata,M., Sakai,H., Sakai,E., Okamoto,K., Nishishita,K., > Yasuda,Y., Kato,Y. and Yamamoto,K. > TITLE Disruption of structural and functional integrity of alpha > 2-macroglobulin by cathepsin E > JOURNAL Eur. J. Biochem. 
270 (6), 1189-1198 (2003) > PUBMED 12631277 > REMARK GeneRIF: These results suggest the possible involvement of > cathepsin E in disruption of the structural and functional > integrity of alpha 2-macroglobulin in the endolysosome system. > REFERENCE 5 (bases 1 to 4577) > AUTHORS Zappia,M., Cittadella,R., Manna,I., Nicoletti,G., Andreoli,V., > Bonavita,S., Gambardella,A. and Quattrone,A. > TITLE Genetic association of alpha2-macroglobulin polymorphisms with AD > in southern Italy > JOURNAL Neurology 59 (5), 756-758 (2002) > PUBMED 12221172 > REMARK GeneRIF: Genetic association of alpha2-macroglobulin > polymorphisms > with Alzheimer's disease > REFERENCE 6 (bases 1 to 4577) > AUTHORS Ghebremedhin,E., Schultz,C., Thal,D.R., Del Tredici,K., > Rueb,U. and > Braak,H. > TITLE Genetic association of argyrophilic grain disease with > polymorphisms in alpha-2 macroglobulin and low-density lipoprotein > receptor-related protein genes > JOURNAL Neuropathol Appl Neurobiol 28 (4), 308-313 (2002) > PUBMED 12175343 > REMARK GeneRIF: Genetic association of argyrophilic grain disease with > polymorphisms in alpha-2 macroglobulin. > REFERENCE 7 (bases 1 to 4577) > AUTHORS Kolodziej,S.J., Wagenknecht,T., Strickland,D.K. and Stoops,J.K. > TITLE The three-dimensional structure of the human alpha > 2-macroglobulin > dimer reveals its structural organization in the tetrameric native > and chymotrypsin alpha 2-macroglobulin complexes > JOURNAL J. Biol. Chem. 277 (31), 28031-28037 (2002) > PUBMED 12015318 > REMARK GeneRIF: The three-dimensional structure of the dimer reveals > its > structural organization in the tetrameric native and chymotrypsin > alpha 2-macroglobulin complexes. > REFERENCE 8 (bases 1 to 4577) > AUTHORS McElhinney,B., Ardill,J., Caldwell,C., Lloyd,F. and McClure,N. > TITLE Ovarian hyperstimulation syndrome and assisted reproductive > technologies: why some and not others? > JOURNAL Hum. Reprod. 
17 (6), 1548-1553 (2002) > PUBMED 12042276 > REMARK GeneRIF: relationship between serum VEGF levels, alpha(2)M > levels > and the development of OHSS in hyperstimulated subjects undergoing > IVF > REFERENCE 9 (bases 1 to 4577) > AUTHORS Mettenburg,J.M., Webb,D.J. and Gonias,S.L. > TITLE Distinct binding sites in the structure of alpha 2-macroglobulin > mediate the interaction with beta-amyloid peptide and growth > factors > JOURNAL J. Biol. Chem. 277 (15), 13338-13345 (2002) > PUBMED 11823454 > REMARK GeneRIF: distinct binding sites mediate interaction with > beta-amyloid peptide and growth factors > REFERENCE 10 (bases 1 to 4577) > AUTHORS Cvirn,G., Gallistl,S., Koestenberger,M., Kutschera,J., > Leschnik,B. > and Muntean,W. > TITLE Alpha 2-macroglobulin enhances prothrombin activation and > thrombin > potential by inhibiting the anticoagulant protein C/protein S > system in cord and adult plasma > JOURNAL Thromb. Res. 105 (5), 433-439 (2002) > PUBMED 12062545 > REMARK GeneRIF: Alpha 2-macroglobulin enhances prothrombin activation > and > thrombin potential by inhibiting the anticoagulant protein > C/protein S system in cord and adult plasma. > REFERENCE 11 (bases 1 to 4577) > AUTHORS Janka,Z., Juhasz,A., Rimanoczy,A., Boda,K., Marki-Zay,J., > Palotas,M., Kuk,I., Zollei,M., Jakab,K. and Kalman,J. > TITLE Alpha2-macroglobulin exon 24 (Val-1000-Ile) polymorphism is not > associated with late-onset sporadic Alzheimer's dementia in the > Hungarian population > JOURNAL Psychiatr. Genet. 12 (1), 49-54 (2002) > PUBMED 11901360 > REMARK GeneRIF: has an important role in the AD-specific > neurodegenerative > process but its exon 24 Val-1000-Ile polymorphism is not likely to > be associated with late-onset sporadic AD in the Hungarian > population > REFERENCE 12 (bases 1 to 4577) > AUTHORS Chiabrando,G.A., Vides,M.A. and Sanchez,M.C. 
> TITLE Differential binding properties of human pregnancy zone protein- > and alpha2-macroglobulin-proteinase complexes to low-density > lipoprotein receptor-related protein > JOURNAL Arch. Biochem. Biophys. 398 (1), 73-78 (2002) > PUBMED 11811950 > REMARK GeneRIF: Differential binding to ldl receptor related protein > REFERENCE 13 (bases 1 to 4577) > AUTHORS Toombs,C.F. > TITLE Alfimeprase: pharmacology of a novel fibrinolytic > metalloproteinase > for thrombolysis > JOURNAL Haemostasis 31 (3-6), 141-147 (2001) > PUBMED 11910179 > REMARK GeneRIF: REVIEW: binds and neutralizes alfimeprase, which has > direct proteolytic activity against the fibrinogen Aalpha chain > REFERENCE 14 (bases 1 to 4577) > AUTHORS Borth,W. > TITLE Alpha 2-macroglobulin, a multifunctional binding protein with > targeting characteristics > JOURNAL FASEB J. 6 (15), 3345-3353 (1992) > PUBMED 1281457 > REFERENCE 15 (bases 1 to 4577) > AUTHORS Poller,W., Faber,J.P., Klobeck,G. and Olek,K. > TITLE Cloning of the human alpha 2-macroglobulin gene and detection of > mutations in two functional domains: the bait region and the > thiolester site > JOURNAL Hum. Genet. 88 (3), 313-319 (1992) > PUBMED 1370808 > REFERENCE 16 (bases 1 to 4577) > AUTHORS Poller,W., Faber,J.P. and Olek,K. > TITLE Sequence polymorphism in the human alpha-2-macroglobulin (A2M) > gene > JOURNAL Nucleic Acids Res. 19 (1), 198 (1991) > PUBMED 1707161 > REFERENCE 17 (bases 1 to 4577) > AUTHORS Bell,G.I., Rall,L.B., Sanchez-Pescador,R., Merryweather,J.P., > Scott,J., Eddy,R.L. and Shows,T.B. > TITLE Human alpha 2-macroglobulin gene is located on chromosome 12 > JOURNAL Somat. Cell Mol. Genet. 11 (3), 285-289 (1985) > PUBMED 2408344 > REFERENCE 18 (bases 1 to 4577) > AUTHORS Kan,C.C., Solomon,E., Belt,K.T., Chain,A.C., Hiorns,L.R. and > Fey,G. > TITLE Nucleotide sequence of cDNA encoding human alpha 2-macroglobulin > and assignment of the chromosomal locus > JOURNAL Proc. Natl. Acad. Sci. U.S.A. 
82 (8), 2282-2286 (1985) > PUBMED 2581245 > COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. > The > reference sequence was derived from M11313.1. > On Nov 4, 1999 this sequence version replaced gi:6226762. > > Summary: Alpha-2-macroglobulin is a protease inhibitor and cytokine > transporter. It inhibits many proteases, including trypsin, > thrombin and collagenase. A2M is implicated in Alzheimer disease > (AD) due to its ability to mediate the clearance and degradation of > A-beta, the major component of beta-amyloid deposits. > FEATURES Location/Qualifiers > source 1..4577 > /organism="Homo sapiens" > /mol_type="mRNA" > /db_xref="taxon:9606" > /chromosome="12" > /map="12p13.3-p12.3" > gene 1..4577 > /gene="A2M" > /db_xref="GeneID:2" > /db_xref="LocusID:2" > /db_xref="MIM:103950" > CDS 44..4468 > /gene="A2M" > /note="go_function: protein carrier activity [goid > 0008320] [evidence NR]; > go_function: endopeptidase inhibitor activity [goid > 0004866] [evidence NR]; > go_function: wide-spectrum protease inhibitor activity > [goid 0017114] [evidence IEA]; > go_function: serine protease inhibitor activity [goid > 0004867] [evidence IEA]; > go_function: alpha-2 macroglobulin [goid 0016975] > [evidence NAS]; > go_process: intracellular protein transport [goid 0006886] > [evidence NR]" > /codon_start=1 > /product="alpha 2 macroglobulin precursor" > /protein_id="NP_000005.1" > /db_xref="GI:4557225" > /db_xref="GeneID:2" > /db_xref="LocusID:2" > /db_xref="MIM:103950" > /translation="MGKNKLLHPSLVLLLLVLLPTDASVSGKPQYMVLVPSLLHTETT > EKGCVLLSYLNETVTVSASLESVRGNRSLFTDLEAENDVLHCVAFAVPKSSSNEEVMF > LTVQVKGPTQEFKKRTTVMVKNEDSLVFVQTDKSIYKPGQTVKFRVVSMDENFHPLNE > LIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKKSGGRTEHP > FTVEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICRKYSDAS > DCHGEDSQAFCEKFSGQLNSHGCFYQQVKTKVFQLKRKEYEMKLHTEAQIQEEGTVVE > LTGRQSSEITRTITKLSFVKVDSHFRQGIPFFGQVRLVDGKGVPIPNKVIFIRGNEAN > YYSNATTDEHGLVQFSINTTNVMGTSLTVRVNYKDRSPCYGYQWVSEEHEEAHHTAYL > 
VFSPSKSFVHLEPMSHELPCGHTQTVQAHYILNGGTLLGLKKLSFYYLIMAKGGIVRT > GTHGLLVKQEDMKGHFSISIPVKSDIAPVARLLIYAVLPTGDVIGDSAKYDVENCLAN > KVDLSFSPSQSLPASHAHLRVTAAPQSVCALRAVDQSVLLMKPDAELSASSVYNLLPE > KDLTGFPGPLNDQDDEDCINRHNVYINGITYTPVSSTNEKDMYSFLEDMGLKAFTNSK > IRKPKMCPQLQQYEMHGPEGLRVGFYESDVMGRGHARLVHVEEPHTETVRKYFPETWI > WDLVVVNSAGVAEVGVTVPDTITEWKAGAFCLSEDAGLGISSTASLRAFQPFFVELTM > PYSVIRGEAFTLKATVLNYLPKCIRVSVQLEASPAFLAVPVEKEQAPHCICANGRQTV > SWAVTPKSLGNVNFTVSAEALESQELCGTEVPSVPEHGRKDTVIKPLLVEPEGLEKET > TFNSLLCPSGGEVSEELSLKLPPNVVEESARASVSVLGDILGSAMQNTQNLLQMPYGC > GEQNMVLFAPNIYVLDYLNETQQLTPEVKSKAIGYLNTGYQRQLNYKHYDGSYSTFGE > RYGRNQGNTWLTAFVLKTFAQARAYIFIDEAHITQALIWLSQRQKDNGCFRSSGSLLN > NAIKGGVEDEVTLSAYITIALLEIPLTVTHPVVRNALFCLESAWKTAQEGDHGSHVYT > KALLAYAFALAGNQDKRKEVLKSLNEEAVKKDNSVHWERPQKPKAPVGHFYEPQAPSA > EVEMTSYVLLAYLTAQPAPTSEDLTSATNIVKWITKQQNAQGGFSSTQDTVVALHALS > KYGAATFTRTGKAAQVTIQSSGTFSSKFQVDNNNRLLLQQVSLPELPGEYSMKVTGEG > CVYLQTSLKYNILPEKEEFPFALGVQTLPQTCDEPKAHTSFQISLSVSYTGSRSASNM > AIVDVKMVSGFIPLKPTVKMLERSNHVSRTEVSSNHVLIYLDKVSNQTLSLFFTVLQD > VPVRDLKPAIVKVYDYYETDEFAIAEYNAPCSKDLGNA" > sig_peptide 44..112 > /gene="A2M" > /note="alpha-2-macroglobulin signal peptide" > misc_feature 104..4465 > /gene="A2M" > /note="KOG1366; Region: Alpha-macroglobulin > [Posttranslational modification, protein turnover, > chaperones]" > /db_xref="CDD:19155" > misc_feature 110..1927 > /gene="A2M" > /note="A2M_N; Region: Alpha-2-macroglobulin family > N-terminal region" > /db_xref="CDD:17056" > mat_peptide 113..4465 > /gene="A2M" > /product="alpha-2-macroglobulin" > misc_feature 2216..4432 > /gene="A2M" > /note="Region: Alpha-2-macroglobulin family" > /db_xref="CDD:5952" > variation 1122 > /gene="A2M" > /replace="G" > /replace="A" > /db_xref="dbSNP:2229298" > variation 1282 > /gene="A2M" > /replace="T" > /replace="C" > /db_xref="dbSNP:1049134" > variation 1339 > /gene="A2M" > /replace="T" > /replace="C" > /db_xref="dbSNP:2228222" > variation 1354 > /gene="A2M" > /replace="G" > /replace="A" > 
/db_xref="dbSNP:226396" > variation 1958 > /gene="A2M" > /replace="G" > /replace="A" > /db_xref="dbSNP:226405" > variation 2154 > /gene="A2M" > /replace="G" > /replace="A" > /db_xref="dbSNP:1800434" > variation 2431 > /gene="A2M" > /replace="T" > /replace="C" > /db_xref="dbSNP:1049143" > variation 2487 > /gene="A2M" > /replace="T" > /replace="A" > /db_xref="dbSNP:3180392" > variation 2487 > /gene="A2M" > /replace="T" > /replace="A" > /db_xref="dbSNP:3210107" > variation 2958 > /gene="A2M" > /replace="G" > /replace="A" > /db_xref="dbSNP:1800433" > variation 3041 > /gene="A2M" > /replace="G" > /replace="A" > /db_xref="dbSNP:669" > variation 3988 > /gene="A2M" > /replace="G" > /replace="A" > /db_xref="dbSNP:1802964" > variation 4334 > /gene="A2M" > /replace="T" > /replace="C" > /db_xref="dbSNP:1802965" > variation 4474 > /gene="A2M" > /replace="C" > /replace="A" > /db_xref="dbSNP:1130840" > variation 4508 > /gene="A2M" > /replace="T" > /replace="C" > /db_xref="dbSNP:1049985" > variation 4511 > /gene="A2M" > /replace="T" > /replace="A" > /db_xref="dbSNP:1802966" > variation 4519 > /gene="A2M" > /replace="C" > /replace="A" > /db_xref="dbSNP:3190224" > ORIGIN > 1 gctacaatcc atctggtctc ctccagctcc ttctttctgc aacatgggga agaacaaact > 61 ccttcatcca agtctggttc ttctcctctt ggtcctcctg cccacagacg cctcagtctc > 121 tggaaaaccg cagtatatgg ttctggtccc ctccctgctc cacactgaga ccactgagaa > 181 gggctgtgtc cttctgagct acctgaatga gacagtgact gtaagtgctt ccttggagtc > 241 tgtcagggga aacaggagcc tcttcactga cctggaggcg gagaatgacg tactccactg > 301 tgtcgccttc gctgtcccaa agtcttcatc caatgaggag gtaatgttcc tcactgtcca > 361 agtgaaagga ccaacccaag aatttaagaa gcggaccaca gtgatggtta agaacgagga > 421 cagtctggtc tttgtccaga cagacaaatc aatctacaaa ccagggcaga cagtgaaatt > 481 tcgtgttgtc tccatggatg aaaactttca ccccctgaat gagttgattc cactagtata > 541 cattcaggat cccaaaggaa atcgcatcgc acaatggcag agtttccagt tagagggtgg > 601 cctcaagcaa ttttcttttc ccctctcatc agagcccttc cagggctcct acaaggtggt > 661 ggtacagaag aaatcaggtg gaaggacaga 
gcaccctttc accgtggagg aatttgttct > 721 tcccaagttt gaagtacaag taacagtgcc aaagataatc accatcttgg aagaagagat > 781 gaatgtatca gtgtgtggcc tatacacata tgggaagcct gtccctggac atgtgactgt > 841 gagcatttgc agaaagtata gtgacgcttc cgactgccac ggtgaagatt cacaggcttt > 901 ctgtgagaaa ttcagtggac agctaaacag ccatggctgc ttctatcagc aagtaaaaac > 961 caaggtcttc cagctgaaga ggaaggagta tgaaatgaaa cttcacactg aggcccagat > 1021 ccaagaagaa ggaacagtgg tggaattgac tggaaggcag tccagtgaaa tcacaagaac > 1081 cataaccaaa ctctcatttg tgaaagtgga ctcacacttt cgacagggaa ttcccttctt > 1141 tgggcaggtg cgcctagtag atgggaaagg cgtccctata ccaaataaag tcatattcat > 1201 cagaggaaat gaagcaaact attactccaa tgctaccacg gatgagcatg gccttgtaca > 1261 gttctctatc aacaccacca acgttatggg tacctctctt actgttaggg tcaattacaa > 1321 ggatcgtagt ccctgttacg gctaccagtg ggtgtcagaa gaacacgaag aggcacatca > 1381 cactgcttat cttgtgttct ccccaagcaa gagctttgtc caccttgagc ccatgtctca > 1441 tgaactaccc tgtggccata ctcagacagt ccaggcacat tatattctga atggaggcac > 1501 cctgctgggg ctgaagaagc tctcctttta ttatctgata atggcaaagg gaggcattgt > 1561 ccgaactggg actcatggac tgcttgtgaa gcaggaagac atgaagggcc atttttccat > 1621 ctcaatccct gtgaagtcag acattgctcc tgtcgctcgg ttgctcatct atgctgtttt > 1681 acctaccggg gacgtgattg gggattctgc aaaatatgat gttgaaaatt gtctggccaa > 1741 caaggtggat ttgagcttca gcccatcaca aagtctccca gcctcacacg cccacctgcg > 1801 agtcacagcg gctcctcagt ccgtctgcgc cctccgtgct gtggaccaaa gcgtgctgct > 1861 catgaagcct gatgctgagc tctcggcgtc ctcggtttac aacctgctac cagaaaagga > 1921 cctcactggc ttccctgggc ctttgaatga ccaggacgat gaagactgca tcaatcgtca > 1981 taatgtctat attaatggaa tcacatatac tccagtatca agtacaaatg aaaaggatat > 2041 gtacagcttc ctagaggaca tgggcttaaa ggcattcacc aactcaaaga ttcgtaaacc > 2101 caaaatgtgt ccacagcttc aacagtatga aatgcatgga cctgaaggtc tacgtgtagg > 2161 tttttatgag tcagatgtaa tgggaagagg ccatgcacgc ctggtgcatg ttgaagagcc > 2221 tcacacggag accgtacgaa agtacttccc tgagacatgg atctgggatt tggtggtggt > 2281 aaactcagca ggggtggctg aggtaggagt aacagtccct gacaccatca ccgagtggaa 
> 2341 ggcaggggcc ttctgcctgt ctgaagatgc tggacttggt atctcttcca ctgcctctct > 2401 ccgagccttc cagcccttct ttgtggagct tacaatgcct tactctgtga ttcgtggaga > 2461 ggccttcaca ctcaaggcca cggtcctaaa ctaccttccc aaatgcatcc gggtcagtgt > 2521 gcagctggaa gcctctcccg ccttccttgc tgtcccagtg gagaaggaac aagcgcctca > 2581 ctgcatctgt gcaaacgggc ggcaaactgt gtcctgggca gtaaccccaa agtcattagg > 2641 aaatgtgaat ttcactgtga gcgcagaggc actagagtct caagagctgt gtgggactga > 2701 ggtgccttca gttcctgaac acggaaggaa agacacagtc atcaagcctc tgttggttga > 2761 acctgaagga ctagagaagg aaacaacatt caactcccta ctttgtccat caggtggtga > 2821 ggtttctgaa gaattatccc tgaaactgcc accaaatgtg gtagaagaat ctgcccgagc > 2881 ttctgtctca gttttgggag acatattagg ctctgccatg caaaacacac aaaatcttct > 2941 ccagatgccc tatggctgtg gagagcagaa tatggtcctc tttgctccta acatctatgt > 3001 actggattat ctaaatgaaa cacagcagct tactccagag gtcaagtcca aggccattgg > 3061 ctatctcaac actggttacc agagacagtt gaactacaaa cactatgatg gctcctacag > 3121 cacctttggg gagcgatatg gcaggaacca gggcaacacc tggctcacag cctttgttct > 3181 gaagactttt gcccaagctc gagcctacat cttcatcgat gaagcacaca ttacccaagc > 3241 cctcatatgg ctctcccaga ggcagaagga caatggctgt ttcaggagct ctgggtcact > 3301 gctcaacaat gccataaagg gaggagtaga agatgaagtg accctctccg cctatatcac > 3361 catcgccctt ctggagattc ctctcacagt cactcaccct gttgtccgca atgccctgtt > 3421 ttgcctggag tcagcctgga agacagcaca agaaggggac catggcagcc atgtatatac > 3481 caaagcactg ctggcctatg cttttgccct ggcaggtaac caggacaaga ggaaggaagt > 3541 actcaagtca cttaatgagg aagctgtgaa gaaagacaac tctgtccatt gggagcgccc > 3601 tcagaaaccc aaggcaccag tggggcattt ttacgaaccc caggctccct ctgctgaggt > 3661 ggagatgaca tcctatgtgc tcctcgctta tctcacggcc cagccagccc caacctcgga > 3721 ggacctgacc tctgcaacca acatcgtgaa gtggatcacg aagcagcaga atgcccaggg > 3781 cggtttctcc tccacccagg acacagtggt ggctctccat gctctgtcca aatatggagc > 3841 cgccacattt accaggactg ggaaggctgc acaggtgact atccagtctt cagggacatt > 3901 ttccagcaaa ttccaagtgg acaacaacaa tcgcctgtta ctgcagcagg tctcattgcc > 3961 agagctgcct ggggaataca 
gcatgaaagt gacaggagaa ggatgtgtct acctccagac > 4021 ctccttgaaa tacaatattc tcccagaaaa ggaagagttc ccctttgctt taggagtgca > 4081 gactctgcct caaacttgtg atgaacccaa agcccacacc agcttccaaa tctccctaag > 4141 tgtcagttac acagggagcc gctctgcctc caacatggcg atcgttgatg tgaagatggt > 4201 ctctggcttc attcccctga agccaacagt gaaaatgctt gaaagatcta accatgtgag > 4261 ccggacagaa gtcagcagca accatgtctt gatttacctt gataaggtgt caaatcagac > 4321 actgagcttg ttcttcacgg ttctgcaaga tgtcccagta agagatctca aaccagccat > 4381 agtgaaagtc tatgattact acgagacgga tgagtttgca atcgctgagt acaatgctcc > 4441 ttgcagcaaa gatcttggaa atgcttgaag accacaaggc tgaaaagtgc tttgctggag > 4501 tcctgttctc tgagctccac agaagacacg tgtttttgta tctttaaaga cttgatgaat > 4561 aaacactttt tctggtc > // > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From conrad.halling at bifx.org Wed May 18 14:13:47 2005 From: conrad.halling at bifx.org (Conrad Halling) Date: Wed May 18 14:08:17 2005 Subject: [Bioperl-l] about pI prediction In-Reply-To: <428B4D51.3070109@gmail.com> References: <428B4D51.3070109@gmail.com> Message-ID: <428B85DB.9040807@bifx.org> Frank, The usual algorithm is to calculate the charge of the protein at a given pH, say 7.0, then iterate to converge to a pH where the calculated charge on the protein is zero. In BioPerl, the calculation is performed by sub _calculate_iep() in Bio::Tools::pICalculator. For the latest stable release of EMBOSS, the code for calculating the pI is in file EMBOSS-2.10/nucleus/embiep.c, function emblepPhConverge(). For a more lucid explanation of how the algorithm works, see page two of http://fields.scripps.edu/DTASelect/20010710-pI-Algorithm.pdf. 
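The iteration described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the code of Bio::Tools::pICalculator or of EMBOSS: the sub names net_charge and estimate_pi are made up for this example, the pK values are illustrative (published sets differ, which is one reason different tools report different pIs), and simple bisection stands in for whatever convergence scheme a given tool actually uses.

```perl
use strict;
use warnings;

# Illustrative pK values only (an assumption); published sets differ.
my %PK_POS = (nterm => 8.6, K => 10.8, R => 12.5, H => 6.5);
my %PK_NEG = (cterm => 3.6, D => 3.9, E => 4.1, C => 8.5, Y => 10.1);

# Hypothetical helper: net charge of a peptide at a given pH, summing the
# Henderson-Hasselbalch charge fraction of every ionisable group.
sub net_charge {
    my ($seq, $ph) = @_;
    my %count = (nterm => 1, cterm => 1);    # one free terminus of each kind
    $count{$_}++ for split //, uc $seq;

    my $charge = 0;
    for my $group (keys %PK_POS) {
        my $n = $count{$group} or next;
        $charge += $n / (1 + 10 ** ($ph - $PK_POS{$group}));
    }
    for my $group (keys %PK_NEG) {
        my $n = $count{$group} or next;
        $charge -= $n / (1 + 10 ** ($PK_NEG{$group} - $ph));
    }
    return $charge;
}

# Hypothetical helper: bisect the pH range 0..14 until the interval that
# brackets zero net charge is narrower than the tolerance. The first pH
# tried is the midpoint, 7.0.
sub estimate_pi {
    my ($seq, $tolerance) = @_;
    $tolerance = 0.001 unless defined $tolerance;
    my ($low, $high) = (0, 14);
    while ($high - $low > $tolerance) {
        my $mid = ($low + $high) / 2;
        if (net_charge($seq, $mid) > 0) {
            $low = $mid;     # still positively charged: pI lies higher
        }
        else {
            $high = $mid;    # zero or negative: pI lies lower
        }
    }
    return ($low + $high) / 2;
}

# First residues of the A2M translation quoted earlier in this digest.
printf "estimated pI: %.2f\n", estimate_pi('MGKNKLLHPSLVLLLLVLLPTDASVSG');
```

Bisection works here because the net charge only ever decreases as the pH rises, so there is exactly one crossing of zero between pH 0 and pH 14. Swapping in another pK table changes the result but not the procedure.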
-- Conrad Frank wrote: > Hi, Thanks all, I wil install them and do some analysis. I did it via > net in http://us.expasy.org/tools/pi_tool.html before. It is not so > convenient for mass data analysis. > > Hi, Brian, do you have any idea about the principle of cacluate pI? I > searched some references published in 1980s. I am not sure whether it is > out of date. > > Frank > > Brian Osborne wrote: > >> Frank, >> >> I've used both Tools/pICalculator and Tools/SeqStats. >> >> Brian O. >> >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Marc Logghe >> Sent: Tuesday, May 17, 2005 4:09 PM >> To: Frank; bioperl-l@portal.open-bio.org >> Subject: RE: [Bioperl-l] about pI prediction >> >> >> Hi Frank, >> >> >>> Dose anyone use the bioperl module to predict pI and molecular >>> weight of a protein? Thanks >>> >> >> Don't know of a pure BioPerl module to predict pI. You can use >> Bio::Tools::SeqStats to get the molecular weight, though. >> If you have EMBOSS installed, you can use the Bioperl wrapper to the >> EMBOSS application 'pepstats' for both the pI and mol weight. Have a >> look at the docs for Bio::Factory::EMBOSS. >> However, you have to parse out the corresponding values. The EMBOSS >> docs, including an example output you can find at >> http://emboss.sourceforge.net/apps/pepstats.html >> HTH, >> Marc -- Conrad Halling conrad.halling@bifx.org From michael.watson at bbsrc.ac.uk Thu May 19 10:11:57 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu May 19 10:06:52 2005 Subject: [Bioperl-l] Extracting a particular feature from a sequence Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D4F8@iahce2knas1.iah.bbsrc.reserved> Hi Sorry if this documentation exists, but if it does I haven't been able to find it. I want to find a feature with a particular tag=value pairing in a sequence file. 
The Feature and Annotation HOWTO suggests a combination of grep and get_SeqFeatures: my @cds_features = grep { $_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $gb_file)->next_seq->get_SeqFeatures; However, if I want to examine a feature for a particular gene within a sequence containing an entire bacterial genome, the above code is somewhat cumbersome, having to iterate through all the features just to find one. Just wondering if there was a method I could use to extract just a single feature, something really nice like: $seq->get_SeqFeature(-primary_tag => 'CDS', -locus_tag => 'STY2701'); Thanks Mick From gyang at plantbio.uga.edu Thu May 19 11:27:11 2005 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Thu May 19 11:19:59 2005 Subject: [Bioperl-l] weird behavior of arrary reference arguments for a subroutine In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D4F8@iahce2knas1.iah.bbsrc.reserved> Message-ID: <20050519112711.d0d54f21@dogwood.plantbio.uga.edu> Dear All, I have a subroutine that uses several array references as arguments (the arrays are produced earlier by another subroutine, and the array contents are correct when I print them out). But when I use the references as arguments for subroutine, it seems they are in wrong order. See sub below: @A=(1, 3, 5, 7); @B=(2, 4, 6, 8); @C=(11,13, 15, 17); @D=(12, 14, 16, 18); sub test { # example use: test(\@A, \@B, \@C, \@D) my $X=1; while (${$_[0]}[$X]) { print "A:${$_[0]}[$X] B:${$_[1]}[$X]\n"; my $Y=1; while (${$_[2]}[$Y]) { ..... $Y+=1; } $X+=1; } I expect the result would print: A:1 B:2 A:3 B:4 A:5 B:6 A:7 B:8 But the real print out is something like: ...(these may be correct ones). A:1 B:14 A:1 B:16 A:1 B:18 ... A:15 B:6 A:15 B:8 ... Does anybody have a clue on the weird thing? Thank you! 
Guojun Department of Plant Biology University of Georgia From Marc.Logghe at devgen.com Thu May 19 11:36:33 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu May 19 11:29:02 2005 Subject: [Bioperl-l] Extracting a particular feature from a sequence Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E78A@ANTARESIA.be.devgen.com> Hi Michael, I don't know what kind of information you need from the features but if you are after the sequence I often use EMBOSS' extractfeat to do that. The command looks like: extractfeat test.gb -type CDS -tag gene -value STY2701 So you can specify the feature type, the tag name and value. Besides the sequence, you can add more feature info (e.g. other tag values) to the description line using the -describe option. Full docs at http://emboss.sourceforge.net/apps/extractfeat.html HTH, ML > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > michael watson (IAH-C) > Sent: Thursday, May 19, 2005 4:12 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Extracting a particular feature from a sequence > > Hi > > Sorry if this documentation exists, but if it does I haven't > been able to find it. > > I want to find a feature with a particular tag=value pairing > in a sequence file. The Feature and Annotation HOWTO > suggests a combination of grep and get_SeqFeatures: > > my @cds_features = grep { $_->primary_tag eq 'CDS' } > Bio::SeqIO->new(-file => $gb_file)->next_seq->get_SeqFeatures; > > However, if I want to examine a feature for a particular gene > within a sequence containing an entire bacterial genome, the > above code is somewhat cumbersome, having to iterate through > all the features just to find one. 
> > Just wondering if there was a method I could use to extract > just a single feature, something really nice like: > > $seq->get_SeqFeature(-primary_tag => 'CDS', -locus_tag => 'STY2701'); > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From michael.watson at bbsrc.ac.uk Thu May 19 11:40:58 2005 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu May 19 11:33:29 2005 Subject: [Bioperl-l] Extracting a particular feature from a sequence Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D501@iahce2knas1.iah.bbsrc.reserved> Hi Marc :-) Yes I am using extractfeat, and maybe you can help - the information I really need for this CDS is what STRAND it is on. As you will know, extractfeat extracts the sequence in the direction that the feature runs, so if it is on the -1 strand, the feature sequence runs in the opposite direction to a feature on the +1 strand. I just need to know *when* extractfeat has done that, but no matter how hard I look, I can't find anywhere that extractfeat can put that information....? Mick -----Original Message----- From: Marc Logghe [mailto:Marc.Logghe@devgen.com] Sent: 19 May 2005 16:37 To: michael watson (IAH-C); bioperl-l@portal.open-bio.org Subject: RE: [Bioperl-l] Extracting a particular feature from a sequence Hi Michael, I don't know what kind of information you need from the features but if you are after the sequence I often use EMBOSS' extractfeat to do that. The command looks like: extractfeat test.gb -type CDS -tag gene -value STY2701 So you can specify the feature type, the tag name and value. Besides the sequence, you can add more feature info (e.g. other tag values) to the description line using the -describe option. 
Full docs at http://emboss.sourceforge.net/apps/extractfeat.html HTH, ML > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > michael watson (IAH-C) > Sent: Thursday, May 19, 2005 4:12 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Extracting a particular feature from a sequence > > Hi > > Sorry if this documentation exists, but if it does I haven't > been able to find it. > > I want to find a feature with a particular tag=value pairing > in a sequence file. The Feature and Annotation HOWTO > suggests a combination of grep and get_SeqFeatures: > > my @cds_features = grep { $_->primary_tag eq 'CDS' } > Bio::SeqIO->new(-file => $gb_file)->next_seq->get_SeqFeatures; > > However, if I want to examine a feature for a particular gene > within a sequence containing an entire bacterial genome, the > above code is somewhat cumbersome, having to iterate through > all the features just to find one. > > Just wondering if there was a method I could use to extract > just a single feature, something really nice like: > > $seq->get_SeqFeature(-primary_tag => 'CDS', -locus_tag => 'STY2701'); > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From ak at ebi.ac.uk Thu May 19 12:04:05 2005 From: ak at ebi.ac.uk (Andreas Kahari) Date: Thu May 19 11:56:36 2005 Subject: [Bioperl-l] weird behavior of arrary reference arguments for a subroutine In-Reply-To: <20050519112711.d0d54f21@dogwood.plantbio.uga.edu> References: <8975119BCD0AC5419D61A9CF1A923E950172D4F8@iahce2knas1.iah.bbsrc.reserved> <20050519112711.d0d54f21@dogwood.plantbio.uga.edu> Message-ID: <20050519160405.GE7882@ebi.ac.uk> Dear Guojun Yang, Your program produces the following output: A:3 B:4 A:5 B:6 A:7 B:8 (your array indices start at 1, not at 0, so you miss the first array entry).
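The off-by-one can be sketched as follows. This is a hedged rewrite for illustration only, not Guojun's actual program: the sub name pair_up and its return value are assumptions, and the elided inner loop over the third array is left out because its body is not shown in the original post.

```perl
use strict;
use warnings;

my @A = (1, 3, 5, 7);
my @B = (2, 4, 6, 8);

# Hypothetical helper: pairs up the elements of two arrays passed by reference.
sub pair_up {
    my ($first, $second) = @_;    # give the references names up front
    my @lines;
    # Perl arrays are indexed from 0. Looping over the index range, rather
    # than testing the truth of each element, also copes with a value of 0.
    for my $i (0 .. $#{$first}) {
        push @lines, "A:$first->[$i] B:$second->[$i]";
    }
    return @lines;
}

print "$_\n" for pair_up(\@A, \@B);
# prints:
# A:1 B:2
# A:3 B:4
# A:5 B:6
# A:7 B:8
```

Inside the sub, $first->[$i] and ${$first}[$i] name the same element; the arrow form is simply easier to read.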
I'm slightly surprised that "${$_[0]}[$X]" generates anything useful. In fact, "${$_[0]}" is an error in itself since "$_[0]" is not a scalar reference. I would have written it as "$_[0]->[$X]" (when thinking about it, I would probably have extracted the references from @_ in the start of the subroutine into something with more useful variable names). Andreas On Thu, May 19, 2005 at 11:27:11AM -0400, Guojun Yang wrote: > Dear All, > I have a subroutine that uses several array references as arguments (the arrays are produced earlier by another subroutine, and the array contents are correct when I print them out). But when I use the references as arguments for subroutine, it seems they are in wrong order. See sub below: > > @A=(1, 3, 5, 7); > @B=(2, 4, 6, 8); > @C=(11,13, 15, 17); > @D=(12, 14, 16, 18); > > sub test { > # example use: test(\@A, \@B, \@C, \@D) > > my $X=1; > while (${$_[0]}[$X]) { > print "A:${$_[0]}[$X] B:${$_[1]}[$X]\n"; > > my $Y=1; > while (${$_[2]}[$Y]) { > ..... > $Y+=1; > } > $X+=1; > } > > I expect the result would print: > A:1 B:2 > A:3 B:4 > A:5 B:6 > A:7 B:8 > > But the real print out is something like: > ...(these may be correct ones). > A:1 B:14 > A:1 B:16 > A:1 B:18 > ... > A:15 B:6 > A:15 B:8 > ... > > Does anybody have a clue on the weird thing? > Thank you! 
> Guojun > > Department of Plant Biology > University of Georgia > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Andreas Kähäri EMBL-EBI/ensembl www.ensembl.org 1024D/C2E163CB F4C4 A41A 665B 448A 3FA9 6AEA 12E3 39DA C2E1 63CB From palmeida at igc.gulbenkian.pt Thu May 19 12:12:08 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Thu May 19 12:05:30 2005 Subject: [Bioperl-l] weird behavior of arrary reference arguments for a subroutine In-Reply-To: <20050519112711.d0d54f21@dogwood.plantbio.uga.edu> References: <20050519112711.d0d54f21@dogwood.plantbio.uga.edu> Message-ID: <200505191712.08218.palmeida@igc.gulbenkian.pt> Hi, I don't fully understand your code and final goal (do you want just the output you mentioned, or to add the C and D lists eventually? And in what order?). This code, however, produces the output you wanted on my computer (I only removed the lines with $Y, but it worked just the same, and I also changed $X to 0): @A=(1, 3, 5, 7); @B=(2, 4, 6, 8); @C=(11,13, 15, 17); @D=(12, 14, 16, 18); sub test { my $X=0; while (${$_[0]}[$X]) { print "A:${$_[0]}[$X] B:${$_[1]}[$X]\n"; $X+=1; } } test(\@A, \@B, \@C, \@D); ------- Output: A:1 B:2 A:3 B:4 A:5 B:6 A:7 B:8 -Paulo On Thursday 19 May 2005 16:27, Guojun Yang wrote: > Dear All, > I have a subroutine that uses several array references as arguments (the > arrays are produced earlier by another subroutine, and the array contents > are correct when I print them out). But when I use the references as > arguments for subroutine, it seems they are in wrong order. See sub below: > > @A=(1, 3, 5, 7); > @B=(2, 4, 6, 8); > @C=(11,13, 15, 17); > @D=(12, 14, 16, 18); > > sub test { > # example use: test(\@A, \@B, \@C, \@D) > > my $X=1; > while (${$_[0]}[$X]) { > print "A:${$_[0]}[$X] B:${$_[1]}[$X]\n"; > > my $Y=1; > while (${$_[2]}[$Y]) { > .....
> $Y+=1; > } > $X+=1; > } > > I expect the result would print: > A:1 B:2 > A:3 B:4 > A:5 B:6 > A:7 B:8 > > But the real print out is something like: > ...(these may be correct ones). > A:1 B:14 > A:1 B:16 > A:1 B:18 > ... > A:15 B:6 > A:15 B:8 > ... > > Does anybody have a clue on the weird thing? > Thank you! > Guojun > > Department of Plant Biology > University of Georgia From lifei03 at gmail.com Thu May 19 12:14:57 2005 From: lifei03 at gmail.com (Frank) Date: Thu May 19 12:07:29 2005 Subject: [Bioperl-l] about pI prediction In-Reply-To: <428B85DB.9040807@bifx.org> References: <428B4D51.3070109@gmail.com> <428B85DB.9040807@bifx.org> Message-ID: <428CBB81.1000009@gmail.com> Thanks very much! Conrad Halling wrote: > Frank, > > The usual algorithm is to calculate the charge of the protein at a > given pH, say 7.0, then iterate to converge to a pH where the > calculated charge on the protein is zero. > > In BioPerl, the calculation is performed by sub _calculate_iep() in > Bio::Tools::pICalculator. > > For the latest stable release of EMBOSS, the code for calculating the > pI is in file EMBOSS-2.10/nucleus/embiep.c, function emblepPhConverge(). > > For a more lucid explanation of how the algorithm works, see page two > of http://fields.scripps.edu/DTASelect/20010710-pI-Algorithm.pdf. > > -- Conrad > > > Frank wrote: > >> Hi, Thanks all, I wil install them and do some analysis. I did it >> via net in http://us.expasy.org/tools/pi_tool.html before. It is not >> so convenient for mass data analysis. >> >> Hi, Brian, do you have any idea about the principle of cacluate pI? >> I searched some references published in 1980s. I am not sure whether >> it is out of date. >> >> Frank >> >> Brian Osborne wrote: >> >>> Frank, >>> >>> I've used both Tools/pICalculator and Tools/SeqStats. >>> >>> Brian O. 
>>> >>> -----Original Message----- >>> From: bioperl-l-bounces@portal.open-bio.org >>> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Marc Logghe >>> Sent: Tuesday, May 17, 2005 4:09 PM >>> To: Frank; bioperl-l@portal.open-bio.org >>> Subject: RE: [Bioperl-l] about pI prediction >>> >>> >>> Hi Frank, >>> >>> >>>> Dose anyone use the bioperl module to predict pI and molecular >>>> weight of a protein? Thanks >>>> >>> >>> >>> Don't know of a pure BioPerl module to predict pI. You can use >>> Bio::Tools::SeqStats to get the molecular weight, though. >>> If you have EMBOSS installed, you can use the Bioperl wrapper to the >>> EMBOSS application 'pepstats' for both the pI and mol weight. Have a >>> look at the docs for Bio::Factory::EMBOSS. >>> However, you have to parse out the corresponding values. The EMBOSS >>> docs, including an example output you can find at >>> http://emboss.sourceforge.net/apps/pepstats.html >>> HTH, >>> Marc >> > From junhu54 at hotmail.com Thu May 19 12:30:55 2005 From: junhu54 at hotmail.com (jun hu) Date: Thu May 19 12:24:47 2005 Subject: [Bioperl-l] about pI prediction In-Reply-To: <428B85DB.9040807@bifx.org> Message-ID: Hi, Everyone, I use emboss pepstat and parse the result with some simple perl regular expression ... works fine... The problem is that the result is different from the results from expasy webserver, I guess they use different PKa values... Does anyone know where I can get a set of Pka values of all the amino acids, expecially those modified amino acids, like oxidated amino acid, phosphorylated amino acids, Itraq labeled lysine , etc,..., their pkas should be different from normal amino acids and the pIs from corresponding peptide will differ too. 
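The charge-based iteration quoted above can be sketched in a few lines of plain Perl. This is only an illustration, not the code used by Bio::Tools::pICalculator, EMBOSS or ExPASy: the pKa tables below are assumed example values, the names net_charge and estimate_pi are made up for this sketch, and plain bisection stands in for whatever convergence scheme those tools actually use. Modified residues would need their own pKa entries, which are not guessed at here.

```perl
use strict;
use warnings;

# Illustrative pKa values only (an assumption of this sketch). Different
# tools ship different tables, which is one reason their pI values disagree.
my %PK_POS = (Nterm => 8.6, K => 10.8, R => 12.5, H => 6.5);
my %PK_NEG = (Cterm => 3.6, D => 3.9, E => 4.1, C => 8.5, Y => 10.1);

# Net charge of an unmodified peptide at a given pH (Henderson-Hasselbalch).
sub net_charge {
    my ($seq, $ph) = @_;
    my %count = (Nterm => 1, Cterm => 1);    # one free terminus of each kind
    $count{$_}++ for split //, uc $seq;

    my $charge = 0;
    for my $group (keys %PK_POS) {
        $charge += ($count{$group} || 0) / (1 + 10 ** ($ph - $PK_POS{$group}));
    }
    for my $group (keys %PK_NEG) {
        $charge -= ($count{$group} || 0) / (1 + 10 ** ($PK_NEG{$group} - $ph));
    }
    return $charge;
}

# Bisection: the net charge only decreases as pH rises, so keep halving
# the pH interval until it is narrower than the tolerance.
sub estimate_pi {
    my ($seq, $tol) = @_;
    $tol = 0.001 unless defined $tol;
    my ($lo, $hi) = (0, 14);
    while ($hi - $lo > $tol) {
        my $mid = ($lo + $hi) / 2;
        if (net_charge($seq, $mid) > 0) { $lo = $mid } else { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

printf "estimated pI: %.2f\n", estimate_pi('MKWVTFISLLFLFSSAYSRGVFRRDAHK');
```

Replacing %PK_POS and %PK_NEG with another published table changes the result, which is the effect described above when comparing pepstats with the ExPASy server.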
Best, Jun Hu Bioinformatics Specialist UMDNJ >From: Conrad Halling >To: Frank >CC: bioperl-l@portal.open-bio.org, Brian Osborne >, Marc Logghe >Subject: Re: [Bioperl-l] about pI prediction >Date: Wed, 18 May 2005 14:13:47 -0400 > >Frank, > >The usual algorithm is to calculate the charge of the protein at a given >pH, say 7.0, then iterate to converge to a pH where the calculated charge >on the protein is zero. > >In BioPerl, the calculation is performed by sub _calculate_iep() in >Bio::Tools::pICalculator. > >For the latest stable release of EMBOSS, the code for calculating the pI is >in file EMBOSS-2.10/nucleus/embiep.c, function emblepPhConverge(). > >For a more lucid explanation of how the algorithm works, see page two of >http://fields.scripps.edu/DTASelect/20010710-pI-Algorithm.pdf. > >-- Conrad > > >Frank wrote: >>Hi, Thanks all, I wil install them and do some analysis. I did it via >>net in http://us.expasy.org/tools/pi_tool.html before. It is not so >>convenient for mass data analysis. >> >>Hi, Brian, do you have any idea about the principle of cacluate pI? I >>searched some references published in 1980s. I am not sure whether it is >>out of date. >> >>Frank >> >>Brian Osborne wrote: >> >>>Frank, >>> >>>I've used both Tools/pICalculator and Tools/SeqStats. >>> >>>Brian O. >>> >>>-----Original Message----- >>>From: bioperl-l-bounces@portal.open-bio.org >>>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Marc Logghe >>>Sent: Tuesday, May 17, 2005 4:09 PM >>>To: Frank; bioperl-l@portal.open-bio.org >>>Subject: RE: [Bioperl-l] about pI prediction >>> >>> >>>Hi Frank, >>> >>> >>>>Dose anyone use the bioperl module to predict pI and molecular weight >>>>of a protein? Thanks >>>> >>> >>>Don't know of a pure BioPerl module to predict pI. You can use >>>Bio::Tools::SeqStats to get the molecular weight, though. >>>If you have EMBOSS installed, you can use the Bioperl wrapper to the >>>EMBOSS application 'pepstats' for both the pI and mol weight. 
Have a >>>look at the docs for Bio::Factory::EMBOSS. >>>However, you have to parse out the corresponding values. The EMBOSS >>>docs, including an example output you can find at >>>http://emboss.sourceforge.net/apps/pepstats.html >>>HTH, >>>Marc > >-- >Conrad Halling >conrad.halling@bifx.org > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l From fernan at iib.unsam.edu.ar Thu May 19 12:36:36 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Thu May 19 12:30:42 2005 Subject: [Bioperl-l] Extracting a particular feature from a sequence In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D4F8@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D4F8@iahce2knas1.iah.bbsrc.reserved> Message-ID: <20050519163636.GD1058@iib.unsam.edu.ar> +----[ michael watson (IAH-C) (19.May.2005 11:32): | | Hi | | Sorry if this documentation exists, but if it does I haven't been able | to find it. I was about to write the same question yesterday ... you got here first :) I am also having the impression that this feature doesn't exist. [snipped] | Just wondering if there was a method I could use to extract just a | single feature, something really nice like: | | $seq->get_SeqFeature(-primary_tag => 'CDS', -locus_tag => 'STY2701'); | +----] Yes, just get one feature. 
I need to go over thousands of GenBank files and check only the SOURCE feature, for me it would be just while ( $seq = $seqio->next_seq() ) { my $sourceFeat = $seq->get_SeqFeature( -primaryTag => 'SOURCE' ); # do my stuff } Fernan From Marc.Logghe at devgen.com Thu May 19 16:18:02 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu May 19 16:11:25 2005 Subject: [Bioperl-l] Extracting a particular feature from a sequence Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E78B@ANTARESIA.be.devgen.com> > :-) Yes I am using extractfeat, and maybe you can help - the > information I really need for this CDS is what STRAND it is > on. As you will know, extractfeat extracts the sequence in > the direction that the feature runs, so if it is on the -1 > strand, the feature sequence runs in the opposite direction > to a feature on the +1 strand. I just need to know > *when* extractfeat has done that, but no matter how hard I > look, I can't find anywhere that extractfeat can put that > information....? Hi Mick, Nope, don't find that neither. Another route you could follow: convert your genbank feature table into gff (bp_genbank2gff.pl or bp_genbank2gff3.pl, or even using EMBOSS: seqret your.gb -feature -offormat2 gff). Next, using the in memory adaptor of gbrowse you can perfectly query this gff "database" for the features you'd like and get hands on the strand information. I'll try to dig up an example script and come back to you. Cheers, Marc From cain at cshl.edu Thu May 19 17:21:49 2005 From: cain at cshl.edu (Scott Cain) Date: Thu May 19 17:15:36 2005 Subject: [Bioperl-l] Bio::Ontology fails test with current SOFA Message-ID: <1116537709.3703.54.camel@localhost.localdomain> Hi Hilmar, I realize Bio::Ontology is starting to show its age, and I am planning on migrating away from its use, but I was wondering if you have any ideas on what it would take to get Bio::OntologyIO to support the current SOFA. 
The first hangup is that it defines several relationship types in the file, which causes it to throw this error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: format error (file t/data/sofa.ontology) offending line: )pseudogene ; SO:0000336 % pseudogenic_region ; SO:0000462 STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/Root/Root.pm:328 STACK: Bio::OntologyIO::dagflat::_parse_flat_file /usr/lib/perl5/site_perl/5.8.5/Bio/OntologyIO/dagflat.pm:623 STACK: Bio::OntologyIO::dagflat::parse /usr/lib/perl5/site_perl/5.8.5/Bio/OntologyIO/dagflat.pm:284 STACK: Bio::OntologyIO::dagflat::next_ontology /usr/lib/perl5/site_perl/5.8.5/Bio/OntologyIO/dagflat.pm:317 STACK: t/Ontology.t:44 ----------------------------------------------------------- Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From Marc.Logghe at devgen.com Thu May 19 17:32:43 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu May 19 17:25:11 2005 Subject: [Bioperl-l] Extracting a particular feature from a sequence Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E78C@ANTARESIA.be.devgen.com> > Another route you could follow: convert your genbank feature > table into gff (bp_genbank2gff.pl or bp_genbank2gff3.pl, or > even using EMBOSS: > seqret your.gb -feature -offormat2 gff). > Next, using the in memory adaptor of gbrowse you can > perfectly query this gff "database" for the features you'd > like and get hands on the strand information. > I'll try to dig up an example script and come back to you. I'm back. I set up demo database using genbank record NC_003198 (a big chunk of DNA and features). 
1) saved record into sequence.gb 2) created directory ./genome (working dir /home/marcl/gbrowse_in-memory/) 3) bp_genbank2gff3.pl --outdir genome/ --split sequences.gb The latter creates the fasta and gff files in genome. The gff contained 20937 features. strand = -1 locus_tag = STY2701 gene = eutN db_xref = GeneID:1249018 note = synonym: cchB This route should be fine for small genomes. It has to fit into memory. HTH, Marc From akarger at CGR.Harvard.edu Fri May 20 14:40:44 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri May 20 14:32:52 2005 Subject: [Bioperl-l] The Scriptome: a minimal-learning toolbox for manipulating biolog ical data Message-ID: <339D68B133EAD311971E009027DC479702DC82F4@montecarlo.cgr.harvard.edu> Our group has started a project called the Scriptome to provide experimental biologists with tools for exploring and manipulating biological data. (I mentioned it in an exploratory email to the list a couple months ago.) We're not targeting the complicated bioinformatics tools that EMBOSS, Bioperl, etc. provide. Rather, we want to help bench biologists to "eyeball", filter, format, and analyze the many large files they get from those and other programs. While these tasks may be trivial for a programmer, not every bench biologist has the time or inclination to become a programmer - especially those who do computational analysis only occasionally. http://cgr.harvard.edu/cbg/scriptome has an alpha version. There's a (small) set of tools up, including a "Fetch" tool that uses Bioperl. You might also want to look at the Principles & FAQ pages where we talk about the thinking behind the project & the particular solution we chose. We taught a debut 3-hour session to two groups of five biologists in the last few weeks, showing them how to find and use the tools. Newbies were excited that they could filter files in five minutes, instead of five hours of hand-editing. 
People who knew a bit of Perl appreciated that the tools were not "black box"; they could tweak the tools to be more useful, while learning more Perl on the side. Of course, it remains to be seen how many people use The Scriptome after the class is over. We're currently using an extremely lightweight interface to allow quick tool development, and to free occasional users from needing to learn & remember yet another "intuitive" GUI. We may want to change that (e.g., a lite version of Catherine Letondal's Pise http://www.pasteur.fr/recherche/unites/sis/Pise/ ? SeWeR?), but any change in interface requires wrestling with some frustrating pluses & minuses. For example, the current interface requires no install whatsoever for almost all the tools, just Perl on Unix. This project struggles with big questions: teaching programming skills to non-programmers, optimizing human-computer interface, and understanding biologists' (changing) needs. But even if we can't solve all of these problems, I believe this project can help to address an unmet need for a large group of scientists. We invite submissions of new tools (optionally including code) or "protocols" (series of tools) to be added to the site. I'd also be happy to get advice and feedback about the project in general. Thanks for listening, - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University From sallyli97 at yahoo.com Fri May 20 15:22:02 2005 From: sallyli97 at yahoo.com (Sally Li) Date: Fri May 20 15:14:45 2005 Subject: [Bioperl-l] How to change a row/column name in Bio::Matrix::Generic.pm? Message-ID: <20050520192202.63006.qmail@web53604.mail.yahoo.com> Hi, I would like to change the row/column name of a matrix built by using Bio::Matrix::Generic.pm. Could some one help me in this issue? If there is no such function, could some one suggest some ideas to add this function? Thanks! Sally __________________________________ Yahoo! Mail Mobile Take Yahoo! Mail with you! 
Check email on your mobile phone. http://mobile.yahoo.com/learn/mail From wangqi at seu.edu.cn Fri May 20 17:16:59 2005 From: wangqi at seu.edu.cn (wangqi) Date: Fri May 20 17:10:29 2005 Subject: [Bioperl-l] is there local alignment module in bioperl? Message-ID: <200505202109.j4KL9dfX020751@portal.open-bio.org> Dear all: I'm a beginner with bioperl. I want to know if bioperl has an alignment module which can use the Smith-Waterman algorithm to calculate the local alignment between two sequences? From jason.stajich at duke.edu Sun May 22 22:30:22 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun May 22 22:25:28 2005 Subject: [Bioperl-l] How to change a row/column name in Bio::Matrix::Generic.pm? In-Reply-To: <20050520192202.63006.qmail@web53604.mail.yahoo.com> References: <20050520192202.63006.qmail@web53604.mail.yahoo.com> Message-ID: I assume you mean after you've created the Matrix object? I didn't put an API method in for this directly - maybe we should move the init code in the new() function out into one. You get the row names or column names from the column_header or row_header methods and/or column_names and row_names. You'll see they access the internal arrayrefs '_colnames' and '_rownames'. Look at the code in the new() function where these internals are initialized. This should update the row names for you: my @rownames = qw(ROW1 ROW2 ROW3); $matrix->{'_rownames'} = [@rownames]; You'll also need to do this so the reverse lookup also works: my $count = 0; %{$matrix->{'_rownamesmap'}} = map { $_ => $count++ } @rownames; Similarly for column names. -jason On May 20, 2005, at 3:22 PM, Sally Li wrote: > Hi, > > I would like to change the row/column name of a matrix > built by using Bio::Matrix::Generic.pm. Could some > one help me in this issue? If there is no such > function, could some one suggest some ideas to add > this function? > > Thanks! > Sally > > > > __________________________________ > Yahoo! Mail Mobile > Take Yahoo! Mail with you!
Check email on your mobile phone. > http://mobile.yahoo.com/learn/mail > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From golharam at umdnj.edu Wed May 18 14:05:39 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Sun May 22 22:35:47 2005 Subject: [Bioperl-l] Updated Bio::Tools::Spidey::Exon.pm and test modules Message-ID: <000401c55bd4$32c9d690$e6028a0a@GOLHARMOBILE1> Attached to this message is: 1. An updated Bio::Tools::Spidey::Exon.pm module Added 2 new methods to check for existence for donor and acceptor splice site from output: donor(), acceptor() 2. A test module for Spidey (Spidey.t) 3. 2 data files used by the test module (spidey.test1, spidey.noalignment) Can someone please check these into CVS? Thanks, Ryan -------------- next part -------------- A non-text attachment was scrubbed... Name: Exon.pm Type: application/octet-stream Size: 5964 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050518/c7c1ee0c/Exon-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Spidey.t Type: application/octet-stream Size: 1585 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050518/c7c1ee0c/Spidey-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spidey.test1 Type: application/octet-stream Size: 6187 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050518/c7c1ee0c/spidey-0002.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: spidey.noalignment Type: application/octet-stream Size: 154 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050518/c7c1ee0c/spidey-0003.obj From golharam at umdnj.edu Wed May 18 14:11:39 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Sun May 22 22:35:50 2005 Subject: [Bioperl-l] RE: Updated Bio::Tools::Spidey::Exon.pm and test modules Message-ID: <000801c55bd5$0998aed0$e6028a0a@GOLHARMOBILE1> I missed one module. I've attached Bio::Tools::Spidey::Results.pm to this message. This also needs to be checked in. Is there a way for me to get read/write access to this module in CVS so I don't have to ask someone to check it in, or is this how I should do it? Ryan -----Original Message----- From: Ryan Golhar [mailto:golharam@umdnj.edu] Sent: Wednesday, May 18, 2005 2:06 PM To: 'Bioperl List' Subject: Updated Bio::Tools::Spidey::Exon.pm and test modules Attached to this message is: 1. An updated Bio::Tools::Spidey::Exon.pm module Added 2 new methods to check for existence for donor and acceptor splice site from output: donor(), acceptor() 2. A test module for Spidey (Spidey.t) 3. 2 data files used by the test module (spidey.test1, spidey.noalignment) Can someone please check these into CVS? Thanks, Ryan -------------- next part -------------- A non-text attachment was scrubbed... Name: Results.pm Type: application/octet-stream Size: 14102 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050518/03682348/Results-0001.obj From ewijaya at singnet.com.sg Sun May 22 19:17:37 2005 From: ewijaya at singnet.com.sg (Edward WIJAYA) Date: Sun May 22 23:58:01 2005 Subject: [Bioperl-l] is there local alignment module in bioperl? 
In-Reply-To: <200505202109.j4KL9dfX020751@portal.open-bio.org> References: <200505202109.j4KL9dfX020751@portal.open-bio.org> Message-ID: On Sat, 21 May 2005 05:16:59 +0800, wangqi wrote: > Dear all: > I'm a beginner of bioperl, I want to know if bioperl has an alignment > module which can use Smith-Waterman algorithm to calculate the local > alignment between two sequences? > Is this what you are looking for? http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/pSW.pm -- Regards, Edward WIJAYA SINGAPORE From tembe at bioanalysis.org Mon May 23 09:50:18 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Mon May 23 09:43:02 2005 Subject: [Bioperl-l] Equivalence Between SmithWaterman and Edit distance Message-ID: <4291DF9A.50402@bioanalysis.org> Hello List, I am aware of the pSW module as well as the String::Approx module coming from Perl/Bioperl for aligning sequences. From an algorithmic point of view, could anyone let me know if the edit distance between two strings is the same as the Smith-Waterman alignment with unit penalty for mismatch, gap existence, and gap extension? Thanks. From lstein at cshl.edu Mon May 23 12:55:33 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon May 23 12:48:09 2005 Subject: [Bioperl-l] RE: Passing extra arguments to method references in Bio::Graphics::Panel::add_track In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950172D41F@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E950172D41F@iahce2knas1.iah.bbsrc.reserved> Message-ID: <200505231255.34274.lstein@cshl.edu> $start = 23; add_track(... -label => sub { gene_description($start,@_) } ... ); In other words, pass an anonymous subroutine in which the $start variable (which has *ALREADY* been assigned) is passed to gene_description in front of the other arguments. If $start is going to change dynamically, you will have to use a global or something similar, because once $start is incorporated into the anonymous subroutine, it is stuck there.
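A minimal, self-contained sketch of the anonymous-subroutine approach described above. It does not load Bio::Graphics: plain hashes stand in for feature objects, and make_label_callback, the hash keys and the cutoff values are hypothetical names chosen only for illustration. A small factory subroutine is one way to give each callback its own fixed extra argument.

```perl
use strict;
use warnings;

# Stand-in for the real label routine: the extra argument (a cutoff)
# comes first, followed by whatever the caller passes in.
sub gene_description {
    my ($cutoff, $feature) = @_;
    return $feature->{start} > $cutoff ? '' : $feature->{description};
}

# Build a callback that remembers $cutoff. Every call to this factory
# creates a fresh lexical, so each callback keeps its own value.
sub make_label_callback {
    my ($cutoff) = @_;
    return sub { gene_description($cutoff, @_) };
}

my $label_cb = make_label_callback(23);

# Plain hashes stand in for feature objects in this sketch.
my @features = (
    { start => 10, description => 'geneA' },
    { start => 50, description => 'geneB' },
);

for my $feature (@features) {
    printf "start=%d label='%s'\n", $feature->{start}, $label_cb->($feature);
}
```

In a real script the code reference returned by the factory would be passed as the -label value, for example -label => make_label_callback($start), and building a second callback with a different cutoff leaves the first one unchanged.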
Lincoln On Monday 09 May 2005 11:30 am, michael watson (IAH-C) wrote: > Hi > > Sorry, a bit hasty on the trigger.... > > I am on bioperl-1.5. > > I'm using the following code to create some rather tasty images: > > $panel->add_track(transcript2 => \@includeCDS, > -bgcolor => 'blue', > -fgcolor => 'black', > -key => 'CDS', > -bump => 0, > -height => 10, > -label => \&gene_description, > -description=> \&gene_label, > ); > > This is fairly standard, and @includeCDS is a bunch of feature objects. > > What I want to do is pass extra arguments to &gene_description, and then > within &gene_description check to see if the feature start is greater > than a certain value (the extra argument). If it is, then I want to > return an empty string, if it isn't I want to return the gene > description. Something like: > > -label => \&gene_description($start) > > But when I tried that it didn't work ;-(. > > So is it possible to pass extra arguments to those functions I am > referencing above? > > Many thanks > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Mon May 23 13:04:01 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon May 23 12:56:23 2005 Subject: [Bioperl-l] Bio::Graphics::Panel In-Reply-To: <425E725B.9020306@freemail.hu> References: <425E725B.9020306@freemail.hu> Message-ID: <200505231304.02089.lstein@cshl.edu> You can't mix connectors like that, but you can pass a callback to -connector=> that will work just as well. The other problem you're experiencing is from using Bio::Seq, rather than Bio::SeqFeature::Generic. Lincoln On Thursday 14 April 2005 09:38 am, Horvath Tamas wrote: > I gather information about certain sequences from some different files, > and I want to display the gathered information. 
Therefore I create a > Bio::Seq object, add features to it, and then I try to display it. > Here's the code, that supposed to do that: > > my @features = (); > my $id = $record->{SEQ_ID}; > #I had to type in a fake code, otherwise the script stopped > complaining #that the 'abc' couldn't be guessed > $seqobj = Bio::Seq->new( -seq => > 'ATCTGATTAGGCTAGCATAATTTGGCATGCATGCATGCATCGACTAGCATCGATCAGATCGAGCATCGATCAGC >ATCGATC', -id => $id, > -accession_number => $id, > ); > push( @features, new Bio::SeqFeature::Generic(-start => > $record->{L_TIR_START}, > -end => $record->{L_TIR_END}, > -primary => 'repeat_L', > -source => 'internal' ) ); > > foreach $exon (@{$record->{EXON_LIST}}) { > > push( @features, new Bio::SeqFeature::Generic(-start => > $exon->{START}, > -end => $exon->{END}, > -primary => 'exon', > -source => 'internal' ) ); > } > push( @features, new Bio::SeqFeature::Generic(-start => > $record->{R_TIR_START}, > -end => $record->{R_TIR_END}, > -primary => 'repeat_R', > -source => 'internal' ) ); > > foreach $feat (@features) {$seqobj->add_SeqFeature($feat);} > > my %glypher = ( > repeat_L => 'arrow', > exon => 'generic', > repeat_R => 'arrow' > ); > #would this work afterall? I know I can mix features, but can I mix > connectors as #well? > my %connector= ( > repeat_L => 'none', > exon => 'hat', > repeat_R => 'none' > ); > > #the following add_track function will cause the error > $panel->add_track($seqobj, > -glyph => \%glypher, > -bgcolor => 'green', > -connector => \%connector, > -label => 0, > ); > print "track finished\n"; > > The error is: > Can't locate object method "seq_id" via package "Bio::Seq" at > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Feature.pm line 269, > line 191. 
> > Can u suggest me what the problem can be, or some other way you would do > the same thing (displaying the info) > > Thanks, > Hota > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Mon May 23 13:08:23 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon May 23 13:01:55 2005 Subject: [Bioperl-l] Bio::Graphics::Glyph::triangle In-Reply-To: <4263CDBE.7000003@freemail.hu> References: <4263CDBE.7000003@freemail.hu> Message-ID: <200505231308.24810.lstein@cshl.edu> Hota, I have changed $p to $q*2 in CVS. I didn't check that this works as advertised, so please let me know if this is correct. Lincoln On Monday 18 April 2005 11:09 am, Horvath Tamas wrote: > I have some problems with the triangle glyph. When I don't specify any > orientation to the glyph, it stretches nicely. However, if I do specify, > it only drows isoceles triangles. (with E or W orientation). Can I > overcome this problem? > > Hota > > Actually, I found the problem: > the original codein the 'sub draw_component': > > elsif($orient eq > 'W'){$vx1=$x2;$vy1=$y1;$vx2=$x2;$vy2=$y2;$vx3=$x2-$p;$vy3=$ymid;} > elsif($orient eq > 'E'){$vx1=$x1;$vy1=$y1;$vx2=$x1;$vy2=$y2;$vx3=$x1+$p;$vy3=$ymid;} > > the $p has to be changed to ($q*2) > > Then it creates nicely stretched triangles. However, it might be more > convenient to use an other type of glyph, like: > > > > but I don't know how to create it. Do we have this kind of glyph? > > Hota > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Mon May 23 13:10:41 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon May 23 13:07:23 2005 Subject: [Bioperl-l] help needed Was:Re: [Gmod-gbrowse] negative numbers in xyplot In-Reply-To: <1113930293.8942.31.camel@localhost.localdomain> References: <1112342836.8125.19.camel@localhost.localdomain> <200504011134.05290.lstein@cshl.edu> <1113930293.8942.31.camel@localhost.localdomain> Message-ID: <200505231310.42156.lstein@cshl.edu> The horizontal scale should be drawn at position 0, or at the top or bottom of the plot area if zero is outside the area entirely. I'm not sure what you mean about the horizontal line being drawn in its own subroutine outside the glyph. Some glyphs have horizontal components, and others don't, so I must be misunderstanding you. Lincoln On Tuesday 19 April 2005 01:04 pm, Albert Vilella wrote: > (below) > > > > I submitted a feature request some minutes ago to GBrowse about > > > negative numbers in xyplot, but then, scanning a little bit through > > > the mailing list, I understood that this seems to be > > > Bio::Graphics::xyplot.pm related, not GBrowse. I resubmitted (now > > > logged in) as Bio::Graphics related. > > > > Well, the same group of people work on gbrowse and the bioperl glyphs, > > so either way your bug report is most appreciated. Would someone on > > the mailing list like to volunteer to fix this bug? It should be a > > very simple one to handle. Maybe you can implement log coordiantes > > as well? > > This "negative values in xyplot" feature seems to be stuck in the "TO > DO" list, so I took a look at the code to see if I could implement this > feature myself. > > After messing around a little bit with the code, I could more or less > localize the place where this feature should be added, but there are a > couple of places where I need some help: > > In xyplot.pm draw function: > ----- > [...] 
> # now seed all the parts with the information they need to draw their > positions > foreach (@parts) { > my $s = eval {$_->feature->score}; > next unless defined $s; > my $position = ($s-$min_score) * $scale; > $_->{_y_position} = $bottom - $position; > } > > Right now, "$bottom" will always point to the bottom the plot, where the > horizontal line of the xyplot graphic will be placed latter, even if $s > is negative. This results in negative numbers being wrongly positioned > as score=0. > > So I understand that the horizontal line should be placed according to > the presence or absence of negative numbers. > > What I couldn't find why was the horizontal lines are plotted inside > each type of graphic (_draw_histogram, _draw_boxes, _draw_line, > _draw_points), instead of outside, in its own subroutine. Is this so or > am I missing a point? > > Then for negative numbers, my question would be: Where should the > horizontal line be set? > > Finally, about the log coordinates scale Lincoln suggested, I visualize > this as substituting $scale comparisons for a call to some log_scale > subroutine that will re-place each value log-wise. > Is that correct? > > Thanks in advance, > > Bests, > > Albert. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From tuantran167 at gmail.com Mon May 23 18:26:34 2005 From: tuantran167 at gmail.com (Tuan A. Tran) Date: Mon May 23 18:19:05 2005 Subject: [Bioperl-l] wu-blast and bioperl Message-ID: Hi, I tried to run wublast with bioperl. However, I got an error -------------------- WARNING --------------------- MSG: cannot find path to wublast --------------------------------------------------- Can't call method "next_result" on an undefined value at testanno.pl line 225, line 1.
Is there any method which is equivalent to "next_result" method? I really appreciate if anyone can help. My code lousy code is quoted below. I tested it with blastall from ncbi. It works fine. I am using debian sarge and I have wu-blast install in /usr/local/WU-BLAST2.0. Furthermore, I also installed new bioperl-1.5. Thanks in advance TAT ================= #!/usr/local/lib/perl -w BEGIN {$ENV{WUBLASTDIR} = '/usr/local/WU-BLAST2.0/';}; use Bio::Seq; use Bio::SeqIO; use Bio::SearchIO; use Bio::AlignIO; use Bio::SimpleAlign; use Bio::LocatableSeq; use Bio::Tools::Run::StandAloneBlast; use Getopt::Long; use Bio::DB::GenBank; use Bio::DB::Flat::BDB; #use Bio::Index::GenBank; use Bio::Index::Fasta; use Bio::SeqFeature::Generic; use DBI; ######### # Declare database names ######### my $db_core_fly = 'fly_core'; ############ # # Define variables for tables in database drosophila_melanogaster_core_30_3d # ########### my $tattrib = 'attrib_type'; my $tseq_region = 'seq_region'; my $tseq_region_attrib = 'seq_region_attrib'; my $texon = 'exon'; my $texon_stable_id = 'exon_stable_id'; my $texon_transcript = 'exon_transcript'; my $tgene = 'gene'; my $tgene_stable_id = 'gene_stable_id'; my $ttranscript = 'transcript'; my $ttranscript_stable_id = 'transcript_stable_id'; my $txref='xref'; my $texternal_db = 'external_db'; my $texternal_synonym = 'external_synonym'; ########### # This subroutine is to tell Perl to communicate with my MySQL database # Source: got it after googling ########### sub start_InputDB { my $database = $_[0]; use DBI; my $DBD = 'mysql'; my $host ='localhost'; my $user = 'tuan'; my $passwd = ''; my $inputdb = DBI->connect("DBI:$DBD:$database:$host","$user","", {RaiseError => 1, AutoCommit => 1}); return $inputdb; } ###################### sub feature2string { my $f = shift; #my $stable_id = $f->stable_id(); my $seq_region = $f->seq_region_name(); my $start = $f->start(); my $end = $f->end(); my $strand = $f->strand(); my $seq = $f->seq(); return 
">$seq_region:$start - $end ($strand)\n$seq"; } sub getGene { my $f = shift; #my $stable_id = $f->stable_id(); my $seq_region = $f->seq_region_name(); my $start = $f->start(); my $end = $f->end(); my $strand = $f->strand(); my $seq = $f->seq(); return ">$seq_region:$start - $end ($strand)\n$seq"; } ############################################################################ # Change number of organisms according to your problem # BE SURE TO CHANGE AS EXAMPLE BELOW ############################################################################ my (@organism,$org_idx); my $max_number_organism = 3; @organism = ('fly'); my $fly_dna = 'Drosophila_melanogaster.BDGP3.2.1.may.dna_rm.'; my %fly_chromosome = ( fly => [$fly_dna.$org_chro.'.2h',$fly_dna.$org_chro.'.2L', $fly_dna.$org_chro.'.2R',$fly_dna.$org_chro.'.3h',$fly_dna.$org_chro.'.3L', $fly_dna.$org_chro.'.3R',$fly_dna.$org_chro.'.4', $fly_dna.$org_chro.'.4h', $fly_dna.$org_chro.'.U',$fly_dna.$org_chro.'.X', $fly_dna.$org_chro.'.Xh', $fly_dna.$org_chro.'.Yh']); my %fly_mysqldb = ( fly => ['fly_core']); ############################################################################ my $meta='local'; my $infile = shift; my $in = Bio::SeqIO->new ( -file => $infile, -format => 'fasta' ); my $prefix="summary_"; my $sumout = $prefix . $infile; $prefix="out_"; $allout = $prefix . $infile; $datastatistics = "stat_" . $infile . ".dat"; my $out = Bio::SeqIO->new ( -file => '>seq_out.fa', -format => 'fasta' ); my $fout = Bio::SeqIO->new(-fh => \*STDOUT , -format => 'fasta'); #1111111111 while(my $query=$in->next_seq()) { print ALLOUT ">",$query->id," ",$query->desc; print ALLOUT "\t",$query->seq,"\n"; print SUMOUT ">",$query->id," ",$query->desc; print SUMOUT "\t",$query->seq,"\n"; # print ">",$query->id," ",$query->desc; # print "\t",$query->seq,"\n"; #2222222222 foreach $organism (@organism) { my $dir = '/Crick/' . $organism . 
'/'; ######################################### # Go through each mysql database (mostly core and est) ######################################## #333333333333 foreach my $each_db (@{$fly_mysqldb{$organism}}) { my $inputdb = &start_InputDB($each_db); ####################################### # Go through each chromosome of a species ####################################### #4444444444444 foreach my $org_chro (@{$fly_chromosome{$organism}}) { # database for blast my($local_blastdb,$db,$fmt); $local_blastdb = $dir . 'dna/' . $org_chro . '.fa'; #print "is it here ", $local_blastdb, "\n"; $db = $dir . 'dna/' . $organism . '.index'; $fmt='Fasta'; my $blastout = "blast.out." . $infile; my @param = ('program' => 'wublastn','database' => $local_blastdb, 'E'=>10,'o'=>$blastout); my $factory = Bio::Tools::Run::StandAloneBlast->new(@param, _READMETHOD => "Blast"); $wublast_path = $factory->program_path(); print $wublast_path, "\n"; $factory->q(-10000.0); $factory->g(F); #my $blastout = "blast.out." . $infile . $organisms . "." . $org_chro . "." . 
$each_db; my $blast_report = $factory->wublast($query); ########## # Look at every hits in blast output # ########### #55555555555 while(my $result = $blast_report->next_result) { print "number of hits: ",$result->num_hits,"\n"; if($result->num_hits > 0) { my $match_count=0; my $hit_count=0; my @myhits =$result->hits(); foreach my $myhit (@myhits) { # Count the number of high scoring pair my $hsp_count=0; my @hsps = $myhit->hsps(); foreach my $hsp (@hsps) { print "hsp_length:", $hsp->length, " and query_length:"; print $query->length, "\n"; ######### # Counting the number of hits with the same length as query sequence ######## if($hsp->length eq $query->length) { $match_count++; $hsp_count++; } } # end of foreach my $hsp (@hsps) } # end of foreach my $myhit (@myhits) ########## # Pick the top 4 of high scoring pair ########### if($match_count>0 && $match_count <= 4) { my $temp_organism; if($temp_organism ne $organism) { print $organism,": ",$match_count," full length hsps\n"; } $temp_organism = $organism; foreach my $hit (@myhits) { my $hsp_count=0; my @hsps = $hit->hsps(); foreach my $hsp (@hsps) { # if the length of the high scoring pair is equal to # the length of input sequence, then this is a good hit if($hsp->length eq $query->length) { $hsp_count++; } } print "hsp_count: $hsp_count.\n"; if($hsp_count!=0) { print $hit->name, " : $hsp_count full length hsps\n"; } if($hsp_count>0 ) { my $id = $hit->name; my($dbobj,$seqio); foreach my $hsp (@hsps) { $hsp_start = $hsp->query->start; my $start = $hsp->hit->start(); my $end = $hsp->hit->end(); if($hsp->length eq $query->length) { print "HIT: ",$hsp->hit_string," ",$hit->name; my $flank_start=$start-100; if($flank_start<1) { $flank_start=1; } my $flank_end=$end+100; if($flank_end>$hit->length) { $flank_end=$hit->length; } my $flank_len=$flank_start-$flank_end+1; ############################################################ } #end of if($hsp->length eq $query->length) } # end of foreach my $hsp (@hsps) } # end of 
if($hsp_count>0 && $hsp_count<4)
          } # end of foreach my $hit (@myhits)
        } # end of if($match_count>0 && $match_count<4)
      } # end of if($result->num_hits>0)

      #55555555555
      } # end of while(my $result = $blast_report->next_result)
      #55555555555

    #444444444444
    } # end of for $each_chromosome (keys %insect_chromosome)
    #444444444444

   #333333333333
   } # end of foreach my $each_db (@{$insect_mysqldb{$organism}})
   #333333333333

  #2222222222
  } # end of for($org_idx=0; $org_idx < max_number_org; org_idx++)
  #2222222222

#1111111111
} # end of while(my $query=$in->next_seq())
#1111111111

From jason.stajich at duke.edu Mon May 23 20:38:55 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon May 23 20:32:13 2005
Subject: [Bioperl-l] wu-blast and bioperl
In-Reply-To:
References:
Message-ID:

It is still not finding wublast, hence your message. You don't want an
equivalent of the next_result method; you want to tell the module how
to find the executable it needs.

I think you need to set the WUBLASTDIR environment variable after the
use Bio::Tools::Run::StandAloneBlast.

I'm not really looking at the rest of the code below though, so I don't
know if you've made other mistakes.

You can always just write a simple script to test that the basics work:

use strict;
use Bio::Tools::Run::StandAloneBlast;
$ENV{WUBLASTDIR} = '/usr/local/WU-BLAST2.0';
my $factory = Bio::Tools::Run::StandAloneBlast->new();
print $factory->program_path();

-jason

On May 23, 2005, at 6:26 PM, Tuan A. Tran wrote:

> Hi,
>
> I tried to run wublast with bioperl. However, I got an error
>
> -------------------- WARNING ---------------------
> MSG: cannot find path to wublast
> ---------------------------------------------------
> Can't call method "next_result" on an undefined value at
> testanno.pl line 225,
> line 1.
>
> Is there any method which is equivalent to "next_result" method? I
> really appreciate if anyone can help.
>
> My lousy code is quoted below. I tested it with blastall from
> ncbi. It works fine.
> I am using debian sarge and I have wu-blast install in > /usr/local/WU-BLAST2.0. Furthermore, I also installed new bioperl-1.5. > > Thanks in advance > TAT > > > > > ================= > #!/usr/local/lib/perl -w > > BEGIN {$ENV{WUBLASTDIR} = '/usr/local/WU-BLAST2.0/';}; > > use Bio::Seq; > use Bio::SeqIO; > use Bio::SearchIO; > use Bio::AlignIO; > use Bio::SimpleAlign; > use Bio::LocatableSeq; > use Bio::Tools::Run::StandAloneBlast; > use Getopt::Long; > use Bio::DB::GenBank; > use Bio::DB::Flat::BDB; > #use Bio::Index::GenBank; > use Bio::Index::Fasta; > use Bio::SeqFeature::Generic; > use DBI; > > > ######### > # Declare database names > ######### > > my $db_core_fly = 'fly_core'; > > ############ > # > # Define variables for tables in database > drosophila_melanogaster_core_30_3d > # > ########### > > my $tattrib = 'attrib_type'; > my $tseq_region = 'seq_region'; > my $tseq_region_attrib = 'seq_region_attrib'; > my $texon = 'exon'; > my $texon_stable_id = 'exon_stable_id'; > my $texon_transcript = 'exon_transcript'; > my $tgene = 'gene'; > my $tgene_stable_id = 'gene_stable_id'; > my $ttranscript = 'transcript'; > my $ttranscript_stable_id = 'transcript_stable_id'; > my $txref='xref'; > my $texternal_db = 'external_db'; > my $texternal_synonym = 'external_synonym'; > > ########### > # This subroutine is to tell Perl to communicate with my MySQL > database > # Source: got it after googling > ########### > > sub start_InputDB { > my $database = $_[0]; > use DBI; > my $DBD = 'mysql'; > my $host ='localhost'; > my $user = 'tuan'; > my $passwd = ''; > my $inputdb = DBI->connect("DBI:$DBD:$database:$host","$user","", > {RaiseError => 1, AutoCommit => > 1}); > return $inputdb; > } > > ###################### > > sub feature2string { > my $f = shift; > #my $stable_id = $f->stable_id(); > my $seq_region = $f->seq_region_name(); > my $start = $f->start(); > my $end = $f->end(); > my $strand = $f->strand(); > my $seq = $f->seq(); > return ">$seq_region:$start - $end 
($strand)\n$seq"; > } > > sub getGene { > my $f = shift; > #my $stable_id = $f->stable_id(); > my $seq_region = $f->seq_region_name(); > my $start = $f->start(); > my $end = $f->end(); > my $strand = $f->strand(); > my $seq = $f->seq(); > return ">$seq_region:$start - $end ($strand)\n$seq"; > } > > ###################################################################### > ###### > # Change number of organisms according to your problem > # BE SURE TO CHANGE AS EXAMPLE BELOW > ###################################################################### > ###### > > my (@organism,$org_idx); > > my $max_number_organism = 3; > > @organism = ('fly'); > > my $fly_dna = 'Drosophila_melanogaster.BDGP3.2.1.may.dna_rm.'; > > my %fly_chromosome = ( > fly => [$fly_dna.$org_chro.'.2h',$fly_dna. > $org_chro.'.2L', > $fly_dna.$org_chro.'.2R',$fly_dna.$org_chro.'.3h',$fly_dna. > $org_chro.'.3L', > $fly_dna.$org_chro.'.3R',$fly_dna.$org_chro.'.4', $fly_dna. > $org_chro.'.4h', > $fly_dna.$org_chro.'.U',$fly_dna.$org_chro.'.X', $fly_dna. > $org_chro.'.Xh', > $fly_dna.$org_chro.'.Yh']); > > my %fly_mysqldb = ( fly => ['fly_core']); > > > > ###################################################################### > ###### > > my $meta='local'; > > my $infile = shift; > > my $in = Bio::SeqIO->new ( -file => $infile, > -format => 'fasta' ); > > my $prefix="summary_"; > my $sumout = $prefix . $infile; > > $prefix="out_"; > $allout = $prefix . $infile; > > $datastatistics = "stat_" . $infile . 
".dat"; > > > my $out = Bio::SeqIO->new ( -file => '>seq_out.fa', > -format => 'fasta' ); > my $fout = Bio::SeqIO->new(-fh => \*STDOUT , -format => 'fasta'); > > #1111111111 > while(my $query=$in->next_seq()) > { > > print ALLOUT ">",$query->id," ",$query->desc; > print ALLOUT "\t",$query->seq,"\n"; > print SUMOUT ">",$query->id," ",$query->desc; > print SUMOUT "\t",$query->seq,"\n"; > # print ">",$query->id," ",$query->desc; > # print "\t",$query->seq,"\n"; > > #2222222222 > foreach $organism (@organism) > { > my $dir = '/Crick/' . $organism . '/'; > > > ######################################### > # Go through each mysql database (mostly core and est) > ######################################## > #333333333333 > foreach my $each_db (@{$fly_mysqldb{$organism}}) > { > my $inputdb = &start_InputDB($each_db); > > ####################################### > # Go through each chromosome of a species > ####################################### > #4444444444444 > foreach my $org_chro (@{$fly_chromosome{$organism}}) > { > > # database for blast > my($local_blastdb,$db,$fmt); > $local_blastdb = $dir . 'dna/' . $org_chro . '.fa'; > > #print "is it here ", $local_blastdb, "\n"; > > $db = $dir . 'dna/' . $organism . '.index'; > $fmt='Fasta'; > my $blastout = "blast.out." . $infile; > > my @param = ('program' => 'wublastn','database' => > $local_blastdb, > 'E'=>10,'o'=>$blastout); > my $factory = Bio::Tools::Run::StandAloneBlast->new(@param, > _READMETHOD => "Blast"); > $wublast_path = $factory->program_path(); > print $wublast_path, "\n"; > $factory->q(-10000.0); > $factory->g(F); > > #my $blastout = "blast.out." . $infile . $organisms . "." . $org_chro > . "." . 
$each_db; > my $blast_report = $factory->wublast($query); > > ########## > # Look at every hits in blast output > # > ########### > #55555555555 > while(my $result = $blast_report->next_result) > { > print "number of hits: ",$result->num_hits,"\n"; > > if($result->num_hits > 0) > { > > my $match_count=0; > my $hit_count=0; > my @myhits =$result->hits(); > > foreach my $myhit (@myhits) > { > # Count the number of high scoring pair > my $hsp_count=0; > my @hsps = $myhit->hsps(); > > foreach my $hsp (@hsps) > { > print "hsp_length:", $hsp->length, " and query_length:"; > print $query->length, "\n"; > > ######### > # Counting the number of hits with the same > length as query sequence > ######## > if($hsp->length eq $query->length) > { > $match_count++; > $hsp_count++; > } > } # end of foreach my $hsp (@hsps) > > } # end of foreach my $myhit (@myhits) > > ########## > # Pick the top 4 of high scoring pair > ########### > if($match_count>0 && $match_count <= 4) > { > > my $temp_organism; > if($temp_organism ne $organism) { > print $organism,": ",$match_count," full length hsps\n"; > } > $temp_organism = $organism; > > foreach my $hit (@myhits) > { > my $hsp_count=0; > my @hsps = $hit->hsps(); > foreach my $hsp (@hsps) > { > # if the length of the high scoring pair is > equal to > # the length of input sequence, then this is a > good hit > if($hsp->length eq $query->length) > { > $hsp_count++; > } > } > > print "hsp_count: $hsp_count.\n"; > > if($hsp_count!=0) > { > > print $hit->name, " : $hsp_count full length hsps\n"; > } > > if($hsp_count>0 ) > { > my $id = $hit->name; > my($dbobj,$seqio); > > > foreach my $hsp (@hsps) > { > $hsp_start = $hsp->query->start; > my $start = $hsp->hit->start(); my $end = > $hsp->hit->end(); > if($hsp->length eq $query->length) > { > print "HIT: ",$hsp->hit_string," ",$hit->name; > > my $flank_start=$start-100; > if($flank_start<1) > { > $flank_start=1; > } > > my $flank_end=$end+100; > if($flank_end>$hit->length) > { > 
$flank_end=$hit->length;
>                 }
>                 my $flank_len=$flank_start-$flank_end+1;
>
>
>
>
> ############################################################
>
>               } #end of if($hsp->length eq $query->length)
>
>             } # end of foreach my $hsp (@hsps)
>           } # end of if($hsp_count>0 && $hsp_count<4)
>         } # end of foreach my $hit (@myhits)
>       } # end of if($match_count>0 && $match_count<4)
>     } # end of if($result->num_hits>0)
>
>     #55555555555
>     } # end of while(my $result = $blast_report->next_result)
>     #55555555555
>
>    #444444444444
>    } # end of for $each_chromosome (keys %insect_chromosome)
>    #444444444444
>
>
>   #333333333333
>   } # end of foreach my $each_db (@{$insect_mysqldb{$organism}})
>   #333333333333
>
>  #2222222222
>  } # end of for($org_idx=0; $org_idx < max_number_org; org_idx++)
>  #2222222222
>
> #1111111111
> } # end of while(my $query=$in->next_seq())
> #1111111111
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From michael.watson at bbsrc.ac.uk Tue May 24 08:16:32 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue May 24 08:21:10 2005
Subject: [Bioperl-l] Drawing sequences in the "other" direction
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D55A@iahce2knas1.iah.bbsrc.reserved>

Hi

I'm trying to draw images of bits of aligned bacterial genomes with the
genes marked on as features. Reasonably often a gene in one species is
on the +1 strand, and in another species it's on the -1 strand. I want
to draw an image of these genes "aligned", one on top of the other, both
facing in the same direction (obviously those that I have flipped I will
annotate as such).

I have been drawing images using Bio::Graphics::Panel and the add_track
method, but I can't figure out how to draw the sequence, and all its
features, running in the opposite direction.
In fact, I doubt there is one unless someone can point it out?

I did think of drawing them in the right orientation and using the Linux
"convert" command to flip the image, but then all the text is backwards!

Any help appreciated

Mick

From mpatterson at lucigen.com Tue May 24 10:16:53 2005
From: mpatterson at lucigen.com (Melodee Patterson)
Date: Tue May 24 10:12:02 2005
Subject: [Bioperl-l] SearchIO newbie question
Message-ID: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>

Hello,

I have written a number of perl and bioperl scripts to parse large phage
BLASTx files, and now my boss wants me to remove some bacterial
contaminations before I run my reports. I know how to use Bio::SearchIO
to bring in the blast reports, and I know how to find the reports that I
don't want using regular expressions, but what I can't figure out is how
to write the good BLAST reports out to a file just as they are. I don't
want to parse them at all, since my other scripts are expecting them as
full BLAST reports and I'd rather not rewrite them.

If I can do this:

use Bio::SeqIO;
$in  = Bio::SeqIO->new('-file' => "AIZX_test.fas",
                       '-format' => 'Fasta');
$out = Bio::SeqIO->new('-file' => ">AIZX_test_out.txt",
                       '-format' => 'Fasta');
while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); }


why can't I do this:

use Bio::SearchIO;
$in  = Bio::SearchIO->new('-file' => "BLASTx_test.txt",
                          '-format' => 'blast');
$out = Bio::SearchIO->new('-file' => ">BLASTx_test_out.txt",
                          '-format' => 'blast');
while ( my $blast = $in->next_result() ) {$out->write_result($blast); }

Thanks - I appreciate any help I can get!

Melodee
Lucigen Corp.
608-831-9011
mpatterson@lucigen.com

From jason.stajich at duke.edu Tue May 24 10:31:54 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue May 24 10:25:39 2005
Subject: [Bioperl-l] SearchIO newbie question
In-Reply-To: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>
References: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>
Message-ID: <00B060B9-F01F-4AF6-8A71-21B913497842@duke.edu>

Use the writer module Bio::SearchIO::Writer::TextResultWriter.

HTML writing is also covered in the HOWTO:
http://bioperl.org/HOWTOs/SearchIO/outputting.html

On May 24, 2005, at 10:16 AM, Melodee Patterson wrote:

> Hello,
>
> I have written a number of perl and bioperl scripts to parse large
> phage BLASTx files, and now my boss wants me to remove some
> bacterial contaminations before I run my reports. I know how to use
> Bio::SearchIO to bring in the blast reports, and I know how to find
> the reports that I don't want using regular expressions, but what I
> can't figure out is how to write the good BLAST reports out to a
> file just as they are. I don't want to parse them at all, since my
> other scripts are expecting them as full BLAST reports and I'd
> rather not rewrite them.
>
> If I can do this:
>
> use Bio::SeqIO;
> $in  = Bio::SeqIO->new('-file' => "AIZX_test.fas",
>                        '-format' => 'Fasta');
> $out = Bio::SeqIO->new('-file' => ">AIZX_test_out.txt",
>                        '-format' => 'Fasta');
> while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); }
>
>
> why can't I do this:
>
> use Bio::SearchIO;
> $in  = Bio::SearchIO->new('-file' => "BLASTx_test.txt",
>                           '-format' => 'blast');
> $out = Bio::SearchIO->new('-file' => ">BLASTx_test_out.txt",
>                           '-format' => 'blast');
> while ( my $blast = $in->next_result() ) {$out->write_result($blast); }
>
> Thanks - I appreciate any help I can get!
>
> Melodee
> Lucigen Corp.
> 608-831-9011
> mpatterson@lucigen.com
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From skirov at utk.edu Tue May 24 10:36:49 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue May 24 10:29:17 2005
Subject: [Bioperl-l] Re: Bio::Matrix::PSM::PsmHeader
In-Reply-To: <1116926693.4292f2e557b53@www.etumail.uhp-nancy.fr>
References: <1116926693.4292f2e557b53@www.etumail.uhp-nancy.fr>
Message-ID: <42933C01.8070201@utk.edu>

First, please send e-mails to the bioperl list rather than to my
personal address.

You need to look at the Bio::Matrix::PSM::IO, Bio::Matrix::PSM::Psm and
Bio::Matrix::PSM::InstanceSite objects. The first one parses the data in
your mast report, and its next_psm method returns a Psm object, which
contains section III as well as some other information. Then you get the
InstanceSite objects:

my $psmio=new Bio::Matrix::PSM::IO(-format=>'mast', -file=>$file);
while (my $psm=$psmio->next_psm) {
    my $instances=$psm->instances; #Array reference of InstanceSite objects
    foreach my $hit (@{$instances}) {
        print $hit->start;
        #Do something with the object- get the sequence id, score, start, end, etc.
    }
}

C'est tout.
Hope this helps, let me know if you have further questions. Also you may
want to upgrade to bioperl-live if you are using bioperl-1.4.

Stefan

simon118@etumail.uhp-nancy.fr wrote:

>Hi,
>
>I work with MAST, and I need a parser to analyse my results with a Perl
>program.
>Before using Bio::Matrix::PSM::PsmHeader, I want to be sure of the results.
>
>From mast_result.txt, I need to extract each motif and, for each, its position on
>the input sequence, its score (p-value) and its strand.
>To summarize, I want section III as a table.
>
>Is it possible with your parser?
>
>
>Thanks
>
>
>

From james.wasmuth at ed.ac.uk Tue May 24 10:51:09 2005
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Tue May 24 10:49:25 2005
Subject: [Bioperl-l] SearchIO newbie question
In-Reply-To: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>
References: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>
Message-ID: <42933F5D.9000508@ed.ac.uk>

Hi Melodee,

you need to use Bio::SearchIO::Writer::TextResultWriter:

http://doc.bioperl.org/releases/bioperl-1.4/Bio/SearchIO/Writer/TextResultWriter.html

there's some documentation in the HOWTOs:

http://bioperl.org/HOWTOs/SearchIO/outputting.html

hth

james

Melodee Patterson wrote:

>Hello,
>
>I have written a number of perl and bioperl scripts to parse large phage BLASTx files, and now my boss wants me to remove some bacterial contaminations before I run my reports. I know how to use Bio::SearchIO to bring in the blast reports, and I know how to find the reports that I don't want using regular expressions, but what I can't figure out is how to write the good BLAST reports out to a file just as they are. I don't want to parse them at all, since my other scripts are expecting them as full BLAST reports and I'd rather not rewrite them.
>
>If I can do this:
>
>use Bio::SeqIO;
>$in  = Bio::SeqIO->new('-file' => "AIZX_test.fas",
>                       '-format' => 'Fasta');
>$out = Bio::SeqIO->new('-file' => ">AIZX_test_out.txt",
>                       '-format' => 'Fasta');
>while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); }
>
>
>why can't I do this:
>
>use Bio::SearchIO;
>$in  = Bio::SearchIO->new('-file' => "BLASTx_test.txt",
>                          '-format' => 'blast');
>$out = Bio::SearchIO->new('-file' => ">BLASTx_test_out.txt",
>                          '-format' => 'blast');
>while ( my $blast = $in->next_result() ) {$out->write_result($blast); }
>
>Thanks - I appreciate any help I can get!
>
>Melodee
>Lucigen Corp.
>608-831-9011
>mpatterson@lucigen.com
>
>
>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
"Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...."
--Thomas Edison

Blaxter Nematode Genomics Group     |
Institute of Evolutionary Biology   |
Ashworth Laboratories, KB           |  tel: +44 131 650 7403
University of Edinburgh             |  web: www.nematodes.org
Edinburgh                           |
EH9 3JT                             |
UK                                  |

From walsh at cenix-bioscience.com Tue May 24 11:07:00 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Tue May 24 10:59:26 2005
Subject: [Bioperl-l] SearchIO newbie question
In-Reply-To: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>
References: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local>
Message-ID: <42934314.9080902@cenix-bioscience.com>

Hello Melodee,

I think what you want to do is to use one of the
Bio::SearchIO::Writer::* modules. This is a link to the HowTo describing
how to use these:

http://bioperl.org/HOWTOs/SearchIO/outputting.html

Basically, you want to create a Bio::SearchIO::Writer::TextResultWriter
object and use this to set the 'writer' attribute in the constructor for
Bio::SearchIO. Here is an example:

use Bio::SearchIO::Writer::TextResultWriter;

my $writer = new Bio::SearchIO::Writer::TextResultWriter();
my $out = new Bio::SearchIO(-writer => $writer,
                            -file   => ">your_file.txt");
# $in is the Bio::SearchIO input object from your script
while ( my $blast = $in->next_result() ) {
    $out->write_result($blast);
}

That should do the trick.

Andrew

Melodee Patterson wrote:
> Hello,
>
> I have written a number of perl and bioperl scripts to parse large phage BLASTx files, and now my boss wants me to remove some bacterial contaminations before I run my reports. I know how to use Bio::SearchIO to bring in the blast reports, and I know how to find the reports that I don't want using regular expressions, but what I can't figure out is how to write the good BLAST reports out to a file just as they are.
I don't want to parse them at all, since my other scripts are expecting them as full BLAST reports and I'd rather not rewrite them.
>
> If I can do this:
>
> use Bio::SeqIO;
> $in  = Bio::SeqIO->new('-file' => "AIZX_test.fas",
>                        '-format' => 'Fasta');
> $out = Bio::SeqIO->new('-file' => ">AIZX_test_out.txt",
>                        '-format' => 'Fasta');
> while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); }
>
>
> why can't I do this:
>
> use Bio::SearchIO;
> $in  = Bio::SearchIO->new('-file' => "BLASTx_test.txt",
>                           '-format' => 'blast');
> $out = Bio::SearchIO->new('-file' => ">BLASTx_test_out.txt",
>                           '-format' => 'blast');
> while ( my $blast = $in->next_result() ) {$out->write_result($blast); }
>
> Thanks - I appreciate any help I can get!
>
> Melodee
> Lucigen Corp.
> 608-831-9011
> mpatterson@lucigen.com
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax  +49-351-4173 109

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From akarger at CGR.Harvard.edu Tue May 24 11:14:53 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue May 24 11:22:39 2005
Subject: [Bioperl-l] convert fasta output to blast -m8?
Message-ID: <339D68B133EAD311971E009027DC479702DC843D@montecarlo.cgr.harvard.edu>

Hi.

I've been asked to translate Fasta output to Blast -m8 output. I could do it
by hand, but I have a feeling SearchIO & Writer can do this pretty easily.
Can someone give me a couple hints?
I tried running a ridiculously simple script on fasta -m9 output: use Bio::SearchIO; my $searchio = new Bio::SearchIO(-format => 'fasta', -file => 'short.out'); while( my $result = $searchio->next_result ) { print $result->query_name; } And I got: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.4/Bio/Search/HSP/GenericHSP.pm line 231, line 61. ------------- EXCEPTION ------------- MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm FASTP -score 186.3 -hit_frame 0 -hsp_length 300 -hit_seq PPPPPPTAETFDSDQTSSFSDINSTTASAPTTPAPALPPASPEVRKEETHPKHSLPPLPNQFAPLPDPPQHNSPPQ NNAPSQPQSNPFPFPIPEIPSTQSATNPFPFPVPQQQ--FNQAPSMGIPQQNRPLPQLPNRNNRPVPPPPPMRTTT EGSGVRL---PAPPPP---PRRGPAPPPPPHRHVTSNTL------NSAGGNSLLPQATGRRGPAPPPPPRASRPTP NVTMQQNPQQYNNSNRPFGYQTNSNMSSPPPPPVTTFNTLTPQMTAATGQPAVPLPQNTQAPSQATNVPVAP -hit_length 300 -query_length 300 -query_frame 0 -swscore 212 -rank 1 -query_seq MYQSMTVP-PFRPYGGDDIRVVSDLSRFDYQPDQKIRSRNPTPP---STINDNVSSSKLTLDTIIPLY---SSKID ERPKYSPLRQQEDRSTQYPSPPIPVKEEPTITIPKREKKKVRYSIGVQVPQDNGGISMTNNPAPPAPVPVPVPAPA PPPPPPKDIAPRSMPYPQDINNANNLPPMPQPTSQLYPQQQLPPLPYKDSSSITSPQKRLEKKLIKQVMNRPVIQF KADRFGQNYEGEYFTISANFVIYVFEVCCSVVEIVLSSILLQRDQDI -homology_seq :.: : : .. ..: . . : . . : :: : ..:. :. . .:. :.. : :. ::. .: :: :: ...: .:.:... : ... ...:. . :::: : : .::::::: . .. .. :.:.. .:: :.. : :: . . 
..: : -hit_name lcl|cerevisiae|YOR181W| -bits 44.0 -query_name lcl|albicans|CA0100| -evalue 8.3e-05 (qs='
STACK Bio::Search::HSP::GenericHSP::new /usr/local/lib/perl5/site_perl/5.8.4/Bio/Search/HSP/GenericHSP.pm:231
STACK Bio::Search::HSP::FastaHSP::new /usr/local/lib/perl5/site_perl/5.8.4/Bio/Search/HSP/FastaHSP.pm:97
STACK Bio::Factory::ObjectFactory::create_object /usr/local/lib/perl5/site_perl/5.8.4/Bio/Factory/ObjectFactory.pm:150
STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp /usr/local/lib/perl5/site_perl/5.8.4/Bio/SearchIO/SearchResultEventBuilder.pm:275
STACK Bio::SearchIO::fasta::end_element /usr/local/lib/perl5/site_perl/5.8.4/Bio/SearchIO/fasta.pm:872
STACK Bio::SearchIO::fasta::next_result /usr/local/lib/perl5/site_perl/5.8.4/Bio/SearchIO/fasta.pm:403
STACK toplevel a.pl:8

--------------------------------------
lcl|albicans|CA0099|

(The last thing is actually the query, so it's sort of doing the right thing. And line 61 of short.out (where the uninitialized value happens) is the beginning of the second hit.)

Running bp_filter_search.pl -format fasta -score 150 on the same output file produced no output at all. Is -m9 confusing it? Or is there some other problem?

Pointers to docs etc. appreciated.

-Amir Karger

From jason.stajich at duke.edu Tue May 24 12:17:57 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue May 24 12:14:29 2005
Subject: [Bioperl-l] convert fasta output to blast -m8?
In-Reply-To: <339D68B133EAD311971E009027DC479702DC843D@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC479702DC843D@montecarlo.cgr.harvard.edu>
Message-ID:

I think this all depends on your version of FASTA and Bioperl - there were some changes in the FASTA output format which caused breakage in the older bioperl SearchIO::fasta parser. I answered a similar question recently on the list:
http://bioperl.org/pipermail/bioperl-l/2005-May/018870.html

Also if you are just doing -m8 output I would run fasta with -d 0 -m 9 options.
And if you really just want to do FASTA 2 BLAST tables (which I do all the time for my stuff) and want a super-fast parser for this I wrote a simple script in scripts/searchio/fastam9_to_table.PLS -jason On May 24, 2005, at 11:14 AM, Amir Karger wrote: > Hi. > > I've been asked to translate Fasta output to Blast -m8 output. I > could do it > by hand, but I have a feeling SearchIO & Writer can do this pretty > easily. > Can someone give me a couple hints? > > I tried running a ridiculously simple script on fasta -m9 output: > > use Bio::SearchIO; > my $searchio = new Bio::SearchIO(-format => 'fasta', > -file => 'short.out'); > while( my $result = $searchio->next_result ) { > print $result->query_name; > } > > And I got: > > Use of uninitialized value in concatenation (.) or string at > /usr/local/lib/perl5/site_perl/5.8.4/Bio/Search/HSP/GenericHSP.pm > line 231, > line 61. > > ------------- EXCEPTION ------------- > MSG: Did not specify a Query End or Query Begin -verbose 0 - > algorithm FASTP > -score 186.3 -hit_frame 0 -hsp_length 300 -hit_seq > PPPPPPTAETFDSDQTSSFSDINSTTASAPTTPAPALPPASPEVRKEETHPKHSLPPLPNQFAPLPDPPQ > HNSPPQ > NNAPSQPQSNPFPFPIPEIPSTQSATNPFPFPVPQQQ-- > FNQAPSMGIPQQNRPLPQLPNRNNRPVPPPPPMRTTT > EGSGVRL---PAPPPP---PRRGPAPPPPPHRHVTSNTL------ > NSAGGNSLLPQATGRRGPAPPPPPRASRPTP > NVTMQQNPQQYNNSNRPFGYQTNSNMSSPPPPPVTTFNTLTPQMTAATGQPAVPLPQNTQAPSQATNVPV > AP > -hit_length 300 -query_length 300 -query_frame 0 -swscore 212 -rank 1 > -query_seq > MYQSMTVP-PFRPYGGDDIRVVSDLSRFDYQPDQKIRSRNPTPP--- > STINDNVSSSKLTLDTIIPLY---SSKID > ERPKYSPLRQQEDRSTQYPSPPIPVKEEPTITIPKREKKKVRYSIGVQVPQDNGGISMTNNPAPPAPVPV > PVPAPA > PPPPPPKDIAPRSMPYPQDINNANNLPPMPQPTSQLYPQQQLPPLPYKDSSSITSPQKRLEKKLIKQVMN > RPVIQF > KADRFGQNYEGEYFTISANFVIYVFEVCCSVVEIVLSSILLQRDQDI -homology_seq > :.: : : .. ..: . . : . . : :: : ..:. :. . .:. :.. > : > :. ::. .: :: :: ...: .:.:... : ... ...:. . :::: : : .:: > ::::: > . .. .. :.:.. .:: :.. : :: . . 
..: : > -hit_name lcl|cerevisiae|YOR181W| -bits 44.0 -query_name > lcl|albicans|CA0100| -evalue 8.3e-05 (qs=' > STACK Bio::Search::HSP::GenericHSP::new > /usr/local/lib/perl5/site_perl/5.8.4/Bio/Search/HSP/GenericHSP.pm:231 > STACK Bio::Search::HSP::FastaHSP::new > /usr/local/lib/perl5/site_perl/5.8.4/Bio/Search/HSP/FastaHSP.pm:97 > STACK Bio::Factory::ObjectFactory::create_object > /usr/local/lib/perl5/site_perl/5.8.4/Bio/Factory/ObjectFactory.pm:150 > STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp > /usr/local/lib/perl5/site_perl/5.8.4/Bio/SearchIO/ > SearchResultEventBuilder.p > m:275 > STACK Bio::SearchIO::fasta::end_element > /usr/local/lib/perl5/site_perl/5.8.4/Bio/SearchIO/fasta.pm:872 > STACK Bio::SearchIO::fasta::next_result > /usr/local/lib/perl5/site_perl/5.8.4/Bio/SearchIO/fasta.pm:403 > STACK toplevel a.pl:8 > > -------------------------------------- > lcl|albicans|CA0099| > > (The last thing is actually the query, so it's sort of doing the right > thing. And line 61 of short.out (where the uninitialized value > happens) is > the beginning of the second hit. > > Running bp_filter_search.pl -format fasta -score 150 on the same > output file > produced no output at all. Is -m9 confusing it? Or is there some other > problem? > > Pointers to docs etc. appreciated. > > -Amir Karger > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From mpatterson at lucigen.com Tue May 24 15:16:14 2005 From: mpatterson at lucigen.com (Melodee Patterson) Date: Tue May 24 15:11:07 2005 Subject: [Bioperl-l] SearchIO newbie question References: <001101c5606b$3c86c6c0$4701a8c0@Lucigen.local> <42934314.9080902@cenix-bioscience.com> Message-ID: <001401c56095$0df79490$4701a8c0@Lucigen.local> Thank you to those who responded (so quickly!) to my dilemma. 
Apparently Bio::SearchIO::Writer::TextResultWriter is as close as I can get to the original report. Unfortunately, it changes the report enough that my other scripts will need to be extensively reworked in order to use them. But, the module has given me some great ideas on how I can rewrite those existing reports. (If I can convince my boss that change is not always bad!)

Thanks again.
Melodee

----- Original Message -----
From: "Andrew Walsh"
To: "Melodee Patterson"
Cc:
Sent: Tuesday, May 24, 2005 10:07 AM
Subject: Re: [Bioperl-l] SearchIO newbie question

> Hello Melodee,
>
> I think what you want to do is to use one of the Bio::SearchIO::Writer::*
> modules.
>
> This is a link to the HowTo describing how to use these:
>
> http://bioperl.org/HOWTOs/SearchIO/outputting.html
>
>
> Basically, you want to create a Bio::SearchIO::Writer::TextResultWriter
> object and use this to set the 'writer' attribute in the constructor for
> Bio::SearchIO.
>
> Here is an example:
>
> my $writer = new Bio::SearchIO::Writer::TextResultWriter();
> my $out = new Bio::SearchIO(-writer => $writer,
>                             -file => ">your_file.txt");
>
> while ( my $blast = $in->next_result() ) {
>     $out->write_result($blast);
> }
>
> That should do the trick.
>
> Andrew
>
> Melodee Patterson wrote:
>> Hello,
>>
>> I have written a number of perl and bioperl scripts to parse large phage
>> BLASTx files, and now my boss wants me to remove some bacterial
>> contaminations before I run my reports. I know how to use Bio::SearchIO
>> to bring in the blast reports, and I know how to find the reports that I
>> don't want using regular expressions, but what I can't figure out is how
>> to write the good BLAST reports out to a file just as they are. I don't
>> want to parse them at all, since my other scripts are expecting them as
>> full BLAST reports and I'd rather not rewrite them.
>> >> If I can do this: >> >> use Bio::SeqIO; >> $in = Bio::SeqIO->new('-file' => "AIZX_test.fas", >> '-format' => 'Fasta'); >> $out = Bio::SeqIO->new('-file' => ">AIZX_test_out.txt", >> '-format' => 'Fasta'); >> while ( my $seq = $in->next_seq() ) {$out->write_seq($seq); } >> >> >> why can't I do this: >> >> use Bio::SearchIO; >> $in = Bio::SearchIO->new('-file' => "BLASTx_test.txt", >> '-format' => 'blast'); >> $out = Bio::SearchIO->new('-file' => ">BLASTx_test_out.txt", >> '-format' => 'blast'); >> while ( my $blast = $in->next_result() ) {$out->write_result($blast); } >> >> Thanks - I appreciate any help I can get! >> >> Melodee >> Lucigen Corp. >> 608-831-9011 >> mpatterson@lucigen.com >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > ------------------------------------------------------------------ > Andrew Walsh, M.Sc. > Bioinformatics Software Engineer > IT Unit > Cenix BioScience GmbH > Tatzberg 47 > 01307 Dresden > Germany > Tel. +49-351-4173 137 > Fax +49-351-4173 109 > > public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg > ------------------------------------------------------------------ > > From ferdinand.marletaz at gmail.com Wed May 25 05:58:32 2005 From: ferdinand.marletaz at gmail.com (=?ISO-8859-1?Q?Ferdinand_Marl=E9taz?=) Date: Wed May 25 09:17:19 2005 Subject: [Bioperl-l] Convert Fasta to Phylip Message-ID: Hi, I'd like to ask something : I have a fasta file with pieces of manually controlled alignement (with - for "Gaps") and I've tried to read it with AlignIO ( just like my $in = Bio::AlignIO->new(-file => $inputfilename , '-format' => 'fasta'); ) in order to convert it to phylip formatted sequence but... it reply me MSG: Got a sequence with no letters in it cannot guess alphabet [] . 
So, I'd like to understand how to convert a fasta alignement containing file into a phylip file for phylogeny.
Thank you very much !

Cheers

Ferdi

_____________________________
Ferdinand Marlétaz
Evolution et phylogénie des métazoaires
UMR 6540 DIMAR
Rue Batterie des Lions
13007 MARSEILLE
Tel. 04 91 04 16 54
Port. 06 30 35 58 49
Mail. Ferdinand.Marletaz@ens-lyon.fr

From jason.stajich at duke.edu Wed May 25 09:54:27 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed May 25 09:46:57 2005
Subject: [Bioperl-l] Convert Fasta to Phylip
In-Reply-To:
References:
Message-ID:

You should send an example of the file you are trying to convert. It works fine for me and you can see a test file in t/data/testaln.fasta

-jason

On May 25, 2005, at 5:58 AM, Ferdinand Marlétaz wrote:

> Hi,
>
> I'd like to ask something : I have a fasta file with pieces of
> manually controlled alignement (with - for "Gaps") and I've tried
> to read it with AlignIO ( just like my $in = Bio::AlignIO->new(-
> file => $inputfilename , '-format' => 'fasta'); ) in order to
> convert it to phylip formatted sequence but... it reply me MSG: Got
> a sequence with no letters in it cannot guess alphabet [] . So, I'd
> like to understand how to convert a fasta alignement containing
> file into a phylip file for phylogeny.
> Thank you very much !
>
> Cheers
>
> Ferdi
>
> _____________________________
> Ferdinand Marlétaz
> Evolution et phylogénie des métazoaires
> UMR 6540 DIMAR
> Rue Batterie des Lions
> 13007 MARSEILLE
> Tel. 04 91 04 16 54
> Port. 06 30 35 58 49
> Mail. Ferdinand.Marletaz@ens-lyon.f

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From golharam at umdnj.edu Wed May 25 14:13:36 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed May 25 14:30:53 2005
Subject: [Bioperl-l] Graphical Debugger for Linux
Message-ID: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1>

I'm not sure where to post this so I thought I'd post it here...I apologize if this isn't the correct area.

Can anyone recommend a good graphical debugger for Perl for Linux that is free?
Ryan From brian_osborne at cognia.com Wed May 25 15:02:23 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed May 25 14:54:38 2005 Subject: [Bioperl-l] Graphical Debugger for Linux In-Reply-To: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1> Message-ID: Ryan, PTKDB. http://world.std.com/~aep/ptkdb/ Brian O. On 5/25/05 2:13 PM, "Ryan Golhar" wrote: > I'm not sure where to post this so I thought I'd post it here...I > apologize if this isn't the correct area. > > > Can anyone recommend a good graphical debugger for Perl for Linux that > is free? > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Steve_Chervitz at affymetrix.com Wed May 25 15:20:06 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Wed May 25 15:13:30 2005 Subject: [Bioperl-l] Graphical Debugger for Linux Message-ID: There's also an Eclipse plugin that gives you an IDE for perl development and includes a debugger: http://e-p-i-c.sourceforge.net/ If you use emacs, you can use Perl's debugger with 'M-x perldb' and then 'perl myscript.pl'. (In emacs' shell you send an EOF with C-c C-d.) Steve -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne Sent: Wednesday, May 25, 2005 12:02 PM To: golharam@umdnj.edu; 'Bioperl List' Subject: Re: [Bioperl-l] Graphical Debugger for Linux Ryan, PTKDB. http://world.std.com/~aep/ptkdb/ Brian O. On 5/25/05 2:13 PM, "Ryan Golhar" wrote: > I'm not sure where to post this so I thought I'd post it here...I > apologize if this isn't the correct area. > > > Can anyone recommend a good graphical debugger for Perl for Linux that > is free? 
> > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From Jonathan_Epstein at nih.gov Wed May 25 15:26:03 2005 From: Jonathan_Epstein at nih.gov (Jonathan Epstein) Date: Wed May 25 15:18:19 2005 Subject: [Bioperl-l] Graphical Debugger for Linux In-Reply-To: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1> References: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1> Message-ID: <6.2.1.2.2.20050525152159.02cc7bb0@nihexchange4.nih.gov> I'm not sure that I'd call it 'good', but Devel::ptktb is certainly usable & is free. I use it on Windoze as well. http://search.cpan.org/dist/Devel-ptkdb/ptkdb.pm Run it as perl -d:ptkdb yourprog.pl HTH, Jonathan At 02:13 PM 5/25/2005, Ryan Golhar wrote: >I'm not sure where to post this so I thought I'd post it here...I >apologize if this isn't the correct area. > > >Can anyone recommend a good graphical debugger for Perl for Linux that >is free? > >Ryan From ak at ebi.ac.uk Wed May 25 18:52:13 2005 From: ak at ebi.ac.uk (Andreas Kahari) Date: Wed May 25 18:44:33 2005 Subject: [Bioperl-l] Graphical Debugger for Linux In-Reply-To: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1> References: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1> Message-ID: <20050525225213.GB1545@ebi.ac.uk> DDD works well on C and C++ code, and is said to be able to work with Perl code as well, but I honestly haven't used it for that: http://www.gnu.org/software/ddd/ Interestingly, the DDD page contains a link to "A list of Perl debuggers" at http://www.perl.com/cs/user/query/q/6?id_topic=40 Hope this helps. Regards, Andreas On Wed, May 25, 2005 at 02:13:36PM -0400, Ryan Golhar wrote: > I'm not sure where to post this so I thought I'd post it here...I > apologize if this isn't the correct area. 
> > > Can anyone recommend a good graphical debugger for Perl for Linux that > is free? > > Ryan -- Andreas K?h?ri EMBL-EBI/ensembl www.ensembl.org 1024D/C2E163CB F4C4 A41A 665B 448A 3FA9 6AEA 12E3 39DA C2E1 63CB From mpatterson at lucigen.com Thu May 26 10:54:55 2005 From: mpatterson at lucigen.com (Melodee Patterson) Date: Thu May 26 10:47:04 2005 Subject: [Bioperl-l] sequence alignment modules Message-ID: <002101c56202$e3cb1100$4701a8c0@Lucigen.local> Another newbie question: Is there a module or does someone have the code to recreate the hsp sequence alignments that will look just like those in a blastx report? I have checked the formats in Bio::AlignIO, but clustalw is as close as it gets. Thanks for the help! Melodee Lucigen Corp. 608-831-9011 mpatterson@lucigen.com From mbertalan at gmail.com Wed May 25 11:39:20 2005 From: mbertalan at gmail.com (Marcelo bertalan) Date: Thu May 26 17:47:20 2005 Subject: [Bioperl-l] Volunteers to contributions of Perl code Message-ID: Hi, My name is Marcelo Bertalan. Im doing a Master degree in Bioinformatics (Denmark, www.cbs.dtu.dk). I have some experience with PERL and a few in BioPERL. So i would like to known how can i help BioPERL. Best Regards, Marcelo Bertalan. From lstein at cshl.edu Wed May 25 19:01:21 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Thu May 26 17:47:24 2005 Subject: [Bioperl-l] Graphical Debugger for Linux In-Reply-To: <6.2.1.2.2.20050525152159.02cc7bb0@nihexchange4.nih.gov> References: <002a01c56155$78435440$e6028a0a@GOLHARMOBILE1> <6.2.1.2.2.20050525152159.02cc7bb0@nihexchange4.nih.gov> Message-ID: <200505251901.21681.lstein@cshl.edu> From my point of view, nothing beats perldb (part of the "grand unified debugger") under XEmacs. You get breakpoints, triggers, traceback and even a nifty class diagram if I could only figure out how it works! 
Lincoln

On Wednesday 25 May 2005 03:26 pm, Jonathan Epstein wrote:
> I'm not sure that I'd call it 'good', but Devel::ptktb is certainly
> usable & is free. I use it on Windoze as well.
> http://search.cpan.org/dist/Devel-ptkdb/ptkdb.pm
>
> Run it as
> perl -d:ptkdb yourprog.pl
>
> HTH,
>
> Jonathan
>
> At 02:13 PM 5/25/2005, Ryan Golhar wrote:
> >I'm not sure where to post this so I thought I'd post it here...I
> >apologize if this isn't the correct area.
> >
> >
> >Can anyone recommend a good graphical debugger for Perl for Linux that
> >is free?
> >
> >Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics.

From WiersmaP at AGR.GC.CA Wed May 25 19:46:54 2005
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu May 26 17:47:27 2005
Subject: [Bioperl-l] Newbie question: How should I report bugs / suggested changes i.e. in Bio::Matrix::Generic?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4B2A@onncrxms5.agr.gc.ca>

Hi Jason Stajich et al,

I've made some necessary changes to my copy of Bio::Matrix::Generic but wanted to check if there is an agreed upon protocol for using CVS or should I direct them to the builder/maintainer?

I've also added a few lines to initialize an 'empty' matrix using only row and column headings so that I can fill it in using header values. Works better than building the array of array refs from scratch each time to build up the matrix framework.

Matrix appears to be 0-based. line 38f says:
Data can be accessed by column and row names or indexes. Matrix indexes start at 1.
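
[Editorial sketch, not part of the original post.] The "empty matrix from headings" idea described above can be illustrated in plain Perl roughly as follows. This is only an assumption about the general approach: it is neither the patch being discussed nor the Bio::Matrix::Generic API, and the names empty_matrix, set_entry and get_entry are made up for the illustration. It also shows why the 0-based versus 1-based question matters once names are mapped to positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a matrix filled with a default value from
# row and column headings alone, keeping a name -> 0-based index lookup.
sub empty_matrix {
    my ($row_names, $col_names, $fill) = @_;
    $fill = 0 unless defined $fill;

    my (%row_index, %col_index);
    @row_index{@$row_names} = (0 .. $#$row_names);
    @col_index{@$col_names} = (0 .. $#$col_names);

    # One fresh array ref per row, so rows do not share storage.
    my @values = map { [ ($fill) x @$col_names ] } @$row_names;

    return {
        row_index => \%row_index,
        col_index => \%col_index,
        values    => \@values,
    };
}

# Translate heading names into 0-based positions, failing loudly on typos.
sub _indexes {
    my ($m, $row, $col) = @_;
    my $r = $m->{row_index}{$row};
    my $c = $m->{col_index}{$col};
    die "unknown row '$row'\n"    unless defined $r;
    die "unknown column '$col'\n" unless defined $c;
    return ($r, $c);
}

sub set_entry {
    my ($m, $row, $col, $value) = @_;
    my ($r, $c) = _indexes($m, $row, $col);
    $m->{values}[$r][$c] = $value;
    return $value;
}

sub get_entry {
    my ($m, $row, $col) = @_;
    my ($r, $c) = _indexes($m, $row, $col);
    return $m->{values}[$r][$c];
}

# Fill one cell by heading, then read it back by name and by position.
my $m = empty_matrix([qw(A C G T)], [qw(A C G T)]);
set_entry($m, 'A', 'G', -4);
print get_entry($m, 'A', 'G'), "\n";   # by heading names
print $m->{values}[0][2], "\n";        # same cell, 0-based indexes
```

Both print statements address the same cell; with 1-based indexes the second would have to read [1][3], which is the mismatch between code and documentation noted above.
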
Methods: remove_column and remove_row have a few intercopying errors Regards, Paul wiersmap@agr.gc.ca From gad14 at cornell.edu Thu May 26 17:03:42 2005 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Thu May 26 17:47:29 2005 Subject: [Bioperl-l] ribosome binding sites (RBS) Message-ID: <429639AE.3000608@cornell.edu> Hi, I'm looking for the right bioperl class to use to describe ribosome binding site (RBS) features in a bacterial genome. The closest thing I could find is Bio::SeqFeature::Gene::UTR, but it seems that this is too broad for my needs because, as I understand, an untranslated region (a 5' UTR) can contain more than one type of untranslated feature. Unless I'm missing something, it looks like I'll need to create an "RBS" class. I'm thinking that it will need to inherit from Bio::SeqFeature::Generic. Inheriting from Bio::SeqFeature::Gene::UTR wouldn't make sense because a UTR *has* RBS's - RBS's are not UTRs. Has anyone else described/instantiated RBS's with bioperl? Any comments or insight would be appreciated. (I'm still getting my sea legs with bioperl and with navigating all of what's in bioperl.. I'm not always sure if I'm overlooking something or if it just hasn't been developed yet.) Thanks, Genevieve From allenday at ucla.edu Thu May 26 18:41:01 2005 From: allenday at ucla.edu (Allen Day) Date: Thu May 26 18:36:56 2005 Subject: [Bioperl-l] Volunteers to contributions of Perl code In-Reply-To: References: Message-ID: Are you offering to contribute new code that you've developed externally to the bioperl project, or to help maintain/extend code already in bioperl? -Allen On Wed, 25 May 2005, Marcelo bertalan wrote: > Hi, > > My name is Marcelo Bertalan. Im doing a Master degree in > Bioinformatics (Denmark, www.cbs.dtu.dk). I have some experience with > PERL and a few in BioPERL. So i would like to known how can i help > BioPERL. > > Best Regards, > > Marcelo Bertalan. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From fernan at iib.unsam.edu.ar Fri May 27 09:09:48 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Fri May 27 09:03:05 2005 Subject: [Bioperl-l] ribosome binding sites (RBS) In-Reply-To: <429639AE.3000608@cornell.edu> References: <429639AE.3000608@cornell.edu> Message-ID: <20050527130948.GD13364@iib.unsam.edu.ar> +----[ Genevieve DeClerck (26.May.2005 19:16): | | Hi, | | I'm looking for the right bioperl class to use to describe ribosome | binding site (RBS) features in a bacterial genome. | | The closest thing I could find is Bio::SeqFeature::Gene::UTR, but it | seems that this is too broad for my needs because, as I understand, an | untranslated region (a 5' UTR) can contain more than one type of | untranslated feature. Unless I'm missing something, it looks like I'll | need to create an "RBS" class. I'm thinking that it will need to inherit | from Bio::SeqFeature::Generic. Inheriting from | Bio::SeqFeature::Gene::UTR wouldn't make sense because a UTR *has* RBS's | - RBS's are not UTRs. Genevieve, [ notice: I have not done this RBS mapping myself ] If you only need to map this feature, then Bio::SeqFeature::Generic should be OK, you'll have all your features, whatever they're called mapped onto specific locations on the genome. If you want to have a structure in bioperl that would allow you to map several features at once, and you want this structure to represent (in a hierarchy) the underlying biology, then Bio::SeqFeature::Gene seems the way to go (more experienced bioperlers, please correct me if I'm wrong). Though in this case, yes, it seems like there is no specific RBS module, but perhaps you can use any other module that is designed to map a 'signal' onto a sequence. 
Looking at the GenBank/EMBL feature table definition, it seems like Bio::SeqFeature::Gene is trying to map the features relevant for a 'precursor_RNA', and these features are not supposed to get nested (they don't overlap). http://www.ncbi.nlm.nih.gov/collab/FT/#7.3 The feature table def has RBS under 'misc_signal' together with other 'signals' (ie features that logically have to be nested within other feature). One possibility is to have RBS be a subfeature of UTR. But I don't know how deep a feature hierarchy can be in Bioperl and whether this can then be mapped back easily to genbank, nor if you would need to do that. Clearly there is no 1 to 1 mapping between bioperl modules and genbank features (no Bio::SeqFeature::MiscSignal), but perhaps there are reasons for this. Perhaps you can work around this by reusing existing objects? | Has anyone else described/instantiated RBS's with bioperl? | | Any comments or insight would be appreciated. | | Thanks, | Genevieve | +----] I'm sorry I can't be more specific. Hope this give you some ideas. Fernan From skirov at utk.edu Fri May 27 09:20:52 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri May 27 09:14:58 2005 Subject: [Bioperl-l] [Fwd: Human pathway database survey] Message-ID: <42971EB4.1020401@utk.edu> Thought people might be interested.... Stefan Forwarded message: The Reactome, HumanCyc and Panther PATHWAYS database projects are planning to pool their efforts to create an open, online, curated database of biological pathways in human. In preparation for this, we are actively soliciting comments from the biological research community in order to assess community needs. For information, Reactome is a joint project between the groups of Lincoln Stein (CSHL), Ewan Birney (EBI) and Suzi Lewis (UC Berekely). Could you take a moment to visit the following page to take a short online survey of your current pathway database usage and future needs? 
http://www.advancedsurvey.com/default.asp?SurveyID=26632 Please feel free to forward this onto other interested researchers. Thanks, Ewan From jason.stajich at duke.edu Sat May 28 23:09:07 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat May 28 23:02:53 2005 Subject: [Bioperl-l] Convert Fasta to Phylip In-Reply-To: <6.2.1.2.0.20050528230021.030bd770@ioc.fiocruz.br> References: <6.2.1.2.0.20050528230021.030bd770@ioc.fiocruz.br> Message-ID: <007D0768-774C-43DB-A23D-CE2346DC381A@duke.edu> AlignIO does both of them. you need to say -interleave => 0 for phylip format as non-interleaved (although most programs will handle both I think). AlignIO::nexus writes files fine for PAUP, I use it all the time. I have to pare down seq ids so they're unique in phylip format for PAML,PHYML,MB,PROTML. there are a couple of options you need to apply to make MB compliant nexus files (plus make sure there are no dashes in your seqids). I store everything in fasta MSA format and then convert to phylip/ nexus as necessary since it is easy to manage and will let you have long seq descriptions (I know selex/stockholm will too but fasta comes directly out of MUSCLE too). -jason On May 28, 2005, at 10:10 PM, Alberto Davila wrote: > Hi all, > > Talking about format conversion, I recently needed to deal with > Phylip, Paup and PhyML (http://atgc.lirmm.fr/phyml/) and > unfortunately it is still a pain to convert files among them. > ClustalW can generate an output in Phylip interleave format, but > not the non-interlevae one. Seqret (Emboss) claims to be able to > convert between those 2 Phylip formats, but did not work to me. > Would be there any Bioperl module to dela with that ? or plans to > expand the SeqIO module to deal with both Phylip formats and Paup > (nexus) ? > > Thanks, Alberto > > At 10:54 25/5/2005, you wrote: > >> You should send an example of the file you are trying to convert. 
It >> works fine for me and you can see a test file in t/data/testaln.fasta >> >> -jason >> >> On May 25, 2005, at 5:58 AM, Ferdinand Marl?taz wrote: >> >> >>> Hi, >>> >>> I'd like to ask something : I have a fasta file with pieces of >>> manually controlled alignement (with - for "Gaps") and I've tried >>> to read it with AlignIO ( just like my $in = Bio::AlignIO->new(- >>> file => $inputfilename , '-format' => 'fasta'); ) in order to >>> convert it to phylip formatted sequence but... it reply me MSG: Got >>> a sequence with no letters in it cannot guess alphabet [] . So, I'd >>> like to understand how to convert a fasta alignement containing >>> file into a phylip file for phylogeny. >>> Thank you very much ! >>> >>> Cheers >>> >>> Ferdi >>> >>> _____________________________ >>> Ferdinand Marl?taz >>> Evolution et phylog?nie des m?tazoaires >>> UMR 6540 DIMAR >>> Rue Batterie des Lions >>> 13007 MARSEILLE >>> Tel. 04 91 04 16 54 >>> Port. 06 30 35 58 49 >>> Mail. Ferdinand.Marletaz@ens-lyon.f >>> >> >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> > > > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From lifei03 at gmail.com Sun May 29 03:36:45 2005 From: lifei03 at gmail.com (Fei Li) Date: Sun May 29 03:29:44 2005 Subject: [Bioperl-l] please help me to check why this perl script does not work! Message-ID: <1e3d81a1050529003617cc134b@mail.gmail.com> I wish to submit some protein sequences via LWP:UserAgent to the http://www.cbs.dtu.dk/services/ChloroP/ The error message is given as follows: Webface Error

Webface Error:

Read: Field not declared; 'seqpaste'


*********************************

#! /usr/bin/perl -w
use strict;
use HTTP::Request::Common;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;
my @seq;
my $count =0;

my $infile = "my_test_file"; #input my Fasta file

open INPUT, "$infile" or die "can not open $infile:$!";
open OUTPUT, ">>$infile.ChloroP.res" or die "Cannot create the output file: $!";

@seq = <INPUT>;
my $total = @seq;

foreach my $item (@seq){
    $count ++;
    chomp $item;
    my $response = $browser->post('http://www.cbs.dtu.dk/cgi-bin/nph-webface',
        [ "SEQPASTE" => "$item",
          "submit" => "Submit"]
    );
    warn "WARN!: ", $response->status_line, "\n" unless $response->is_success;

    if($response->is_success){
        my $result = $response->content;
        open OUTPUT, ">>$infile.ChloroP.res" or die "Cannot create the output file: $!";
        print OUTPUT "$result\n";
        print OUTPUT "\n*********************************\n\n\n";
    }
    print "$count of $total finished\n";
}

close OUTPUT;
close INPUT;

From Marc.Logghe at devgen.com Sun May 29 16:50:03 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sun May 29 16:42:20 2005
Subject: [Bioperl-l] please help me to check why this perl script does notwork!
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E7DC@ANTARESIA.be.devgen.com>

Hi,
There are a few issues with the script:

1) @seq is not an array of (fasta) sequences, but in your case a list of lines. This can be fixed like so:

#@seq = <INPUT>;
local $/ = undef;
@seq = split /\n(?=>)/, <INPUT>;

2) You have to POST all necessary parameters, including the hidden ones. Also, apparently, you have to pass a hash ref instead of an array ref to post():

my $response = $browser->post('http://www.cbs.dtu.dk/cgi-bin/nph-webface',
    { "SEQPASTE" => "$item",
      full => 'full',
      "configfile" => "/usr/opt/www/pub/CBS/services/ChloroP-1.1/chlorop.cf"
    }
);

3) The resulting content is an html page with javascript in it and a url where to fetch the result when finished. So I guess you have to parse out that url and perform another request.
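
[Editorial sketch, not part of the original message.] Taken together, the three points above might look roughly like the code below. This is not the test.pl that Marc attached later in the thread. The helper names (split_fasta, chlorop_form, result_url, submit_record) are invented here for illustration, the form fields and configfile path are simply copied from this thread, and the assumption that the job page carries an href containing "jobid=" rests only on the sample output quoted further down. LWP::UserAgent is loaded lazily so that the parsing helpers can be tried without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split FASTA text into whole records (header line plus sequence lines),
# breaking only at a newline that is immediately followed by '>'.
sub split_fasta {
    my ($text) = @_;
    return grep { /\S/ } split /\n(?=>)/, $text;
}

# Form fields as used in this thread: the sequence plus the hidden fields.
sub chlorop_form {
    my ($record) = @_;
    return {
        SEQPASTE   => $record,
        full       => 'full',
        configfile => '/usr/opt/www/pub/CBS/services/ChloroP-1.1/chlorop.cf',
    };
}

# Pull the "follow this link" URL out of the job-submission page.
# Assumption: the page contains href="...jobid=...".
sub result_url {
    my ($html) = @_;
    return $html =~ /href="([^"]*jobid=[^"]+)"/ ? $1 : undef;
}

# POST one record and return the URL to poll, or nothing on failure.
sub submit_record {
    my ($record) = @_;
    require LWP::UserAgent;    # loaded only when a request is actually made
    my $browser  = LWP::UserAgent->new;
    my $response = $browser->post(
        'http://www.cbs.dtu.dk/cgi-bin/nph-webface',
        chlorop_form($record),
    );
    unless ($response->is_success) {
        warn 'WARN!: ', $response->status_line, "\n";
        return;
    }
    return result_url($response->content);
}

# Usage: perl chlorop_sketch.pl my_test_file
if (@ARGV) {
    my $infile = shift @ARGV;
    open my $in, '<', $infile or die "cannot open $infile: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;
    for my $record (split_fasta($text)) {
        my $url = submit_record($record);
        print defined $url ? "$url\n" : "submission failed\n";
    }
}
```

Fetching the returned URL (and waiting until the job has finished) is left out, since the thread does not show what the finished result page looks like.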

HTH,
Marc

Marc Logghe, PhD
deVGen NV
Technologiepark 30
B - 9052 Ghent-Zwijnaarde
Tel. +32 9 324 24 83
Fax. +32 9 324 24 25
Web: www.devgen.com

From lifei03 at gmail.com Sun May 29 22:41:50 2005
From: lifei03 at gmail.com (Fei Li)
Date: Sun May 29 22:34:10 2005
Subject: [Bioperl-l] please help me to check why this perl script does notwork!
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7DC@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7DC@ANTARESIA.be.devgen.com>
Message-ID: <1e3d81a1050529194134a82002@mail.gmail.com>

Thanks all. I tried the both "SEQPASTE" and "seqpaste". but it did not work.

Also, I modified the script according to Marc 's advice.

my $response = $browser->post('http://www.cbs.dtu.dk/cgi-bin/nph-webface',
    { "configfile" => "/usr/opt/www/pub/CBS/services/ChloroP-1.1/chlorop.cf",
      "SEQPASTE" => "$input_sequence",
      "SEQSUB" => "",
      "full" => "full",
      "submit" => "Submit"
    } );

Unfortunately, it still did not work.

It still complains: "Read: Field not declared; 'seqpaste'"

What's more interesting, when I move the hidden value to another place, it complains differently:

my $response = $browser->post('http://www.cbs.dtu.dk/cgi-bin/nph-webface',
    { "SEQPASTE" => "$input_sequence",
      "SEQSUB" => "",
      "full" => "full",
      "configfile" => "/usr/opt/www/pub/CBS/services/ChloroP-1.1/chlorop.cf",
      "submit" => "Submit"
    } );

Complaint: Read: Field not declared; 'submit'

--------------I believe the problem is due to this sentence, but I do not know what it is and where it is :-(

From lifei03 at gmail.com Sun May 29 22:46:49 2005
From: lifei03 at gmail.com (Fei Li)
Date: Sun May 29 22:38:53 2005
Subject: [Bioperl-l] ORF scan
Message-ID: <1e3d81a10505291946749e236f@mail.gmail.com>

Can anybody suggest a good module to scan all the possible ORFs (open reading frames) in a given nucleotide sequence?

Thanks.

--
Do not guess who I am. I am not Bush in BlackHouse

From lifei03 at gmail.com Mon May 30 05:27:17 2005
From: lifei03 at gmail.com (Fei Li)
Date: Mon May 30 05:20:22 2005
Subject: [Bioperl-l] please help me to check why this perl script does notwork!
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7DE@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7DE@ANTARESIA.be.devgen.com>
Message-ID: <1e3d81a105053002271f2c51e7@mail.gmail.com>

Odd enough! I copied the statements from your script, and it works now. Possibly I do not need to add "SEQSUB" and "submit". As a matter of fact, I also removed "full", and it still works fine.

--Fei

On 5/30/05, Marc Logghe wrote:
> > Thanks all. I tried the both "SEQPASTE" and "seqpaste". but
> > it did not work.
> >
> > Also, I modified the script according to Marc 's advice.
>
> That is odd. In my hands it works. I'll attach the complete script.
> The output is:
>
> Webface Jobsubmission
> If Javascript is disabled, follow <a
> href="http://www.cbs.dtu.dk/cgi-bin/nph-webface?jobid=chlorop,429AB96A0C8A9C29&opt=wait">This link</a>
>
> *********************************
>
> HTH,
> Marc

--
Do not guess who I am. I am not Bush in BlackHouse

From amackey at pcbi.upenn.edu Mon May 30 08:19:27 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon May 30 08:14:33 2005
Subject: [Bioperl-l] Fwd: bioperl for fasta searches
References:
Message-ID:

Begin forwarded message:

> From: "Caroline Reiff"
> Date: May 30, 2005 6:37:13 AM EDT
> To:
> Subject: bioperl for fasta searches
>
>
> Hi,
> I am wondering whether there is a bioperl script that can run
> multiple fasta searches at once, one that:
> -reads the sequences in fasta format from a file
> -offers an option to select the database searched
> -writes the result for each sequence to an output file.
>
> I have found a script like this for Blast but prefer to do a Fasta
> search as Fasta provides a more suitable database (lacking
> environmental clones) for our prokaryotic sequence searches.
>
> It would be great if you could help,
> Many thanks,
> Caroline

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: amackey@pcbi.upenn.edu
office: 215-898-1205
fax: 215-746-6697
postal: Penn Genomics Institute
        Goddard Labs 212
        415 S. University Avenue
        Philadelphia, PA 19104-6017

From lifei03 at gmail.com Mon May 30 21:50:47 2005
From: lifei03 at gmail.com (Fei Li)
Date: Mon May 30 21:43:16 2005
Subject: [Bioperl-l] B, Z, N, X in refseq
Message-ID: <1e3d81a1050530185010d15e6e@mail.gmail.com>

I am using the refseq from Genbank. There are some strange characters such as B, Z, N, X in the protein sequences.

Can anybody tell me what these "bad" characters mean? What should I do if my program complains about them? Remove them, or replace them with some specific amino acid?

Thanks

Frank

--
Do not guess who I am.
I am not Bush in BlackHouse

From boris.steipe at utoronto.ca Mon May 30 22:46:20 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon May 30 22:38:44 2005
Subject: [Bioperl-l] [BiO BB] B, Z, N, X in refseq
In-Reply-To: <1e3d81a1050530185010d15e6e@mail.gmail.com>
Message-ID: <2AEBE3A0-D17E-11D9-BD2C-000A9577512E@utoronto.ca>

> I am using the refseq from Genbank. There are some strange
> characteristic such as B, Z, N, X in the protein sequence.

These are standard ambiguity codes, see for example
< http://www.ncbi.nlm.nih.gov/blast/html/search.html >
(except for "N", which is simply asparagine)

> Can anybody tell me what these "bad " characteristics means?

This most likely means that the particular sequence was derived by chemical sequencing of polypeptides, not by translation of nucleic acids; thus it may be hard to distinguish between N/D or Q/E.

> What
> should I do if my program compain these bad characteristics. Remove
> them or replace them with some specific amino acid?

That depends on what you want to do. For database sequence search the BLAST server accepts these codes and they are correctly represented in standard mutation-data matrices for alignment scores, so you don't need to worry. For molecular weight calculations you could use an average or randomly choose one or the other. I can't imagine an application where this level of detail would make much of a difference. However: removing them is always a bad choice, e.g. for sequence alignments you would be introducing a gap (bad!).

Hope this helps, it's pretty standard textbook knowledge though, and maybe it would be worthwhile to read up on the Net before you post to several groups at once :-)

B.

From Marc.Logghe at devgen.com Mon May 30 03:08:53 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Jun 3 09:55:52 2005
Subject: [Bioperl-l] please help me to check why this perl script does notwork!
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E7DE@ANTARESIA.be.devgen.com>

> Thanks all. I tried the both "SEQPASTE" and "seqpaste". but
> it did not work.
>
> Also, I modified the script according to Marc 's advice.

That is odd. In my hands it works. I'll attach the complete script.
The output is:

Webface Jobsubmission
If Javascript is disabled, follow <a href="http://www.cbs.dtu.dk/cgi-bin/nph-webface?jobid=chlorop,429AB96A0C8A9C29&opt=wait">This link</a>

*********************************

HTH,
Marc

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.pl
Type: application/octet-stream
Size: 1273 bytes
Desc: test.pl
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050530/597f8858/test-0001.obj

From ferdinand.marletaz at gmail.com Mon May 30 07:36:06 2005
From: ferdinand.marletaz at gmail.com (Ferdinand Marlétaz)
Date: Fri Jun 3 09:55:58 2005
Subject: [Bioperl-l] Reroot Tree ?
Message-ID:

Hi,

Well, I'm trying to reroot a set of trees and I can't manage to do it!
In fact, the function reroot asks for $node, and I'd like to reroot on a specific taxon of my tree, so I don't know the name of the node on which I must reroot. I've tried find_node but it doesn't seem to work (I don't think it looks at leaves). So, what should I do?

Thanks

Ferdi

_____________________________
Ferdinand Marlétaz
Evolution et phylogénie des métazoaires
UMR 6540 DIMAR
Rue Batterie des Lions
13007 MARSEILLE
Tel. 33 (0)4 91 04 16 54
Port. 33 (0)6 30 35 58 49
Mail. Ferdinand.Marletaz@ens-lyon.fr
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 613 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050530/a54d341c/attachment-0001.bin

From lupey+ at pitt.edu Tue May 31 09:33:46 2005
From: lupey+ at pitt.edu (Paul G Cantalupo)
Date: Fri Jun 3 09:56:00 2005
Subject: [Bioperl-l] ORF scan
In-Reply-To: <1e3d81a10505291946749e236f@mail.gmail.com>
References: <1e3d81a10505291946749e236f@mail.gmail.com>
Message-ID:

Fei,

Check out the EMBOSS program 'getorf'
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/getorf.html

Paul

Paul Cantalupo
Research Specialist/Systems Programmer
559 Crawford Hall
Department of Biological Sciences
University of Pittsburgh
Pittsburgh, PA 15260
Work: 412-624-4687
Fax: 412-624-4759
Ask me about Toastmasters: www.toastmasters.org
Midday Club Treasurer

On Mon, 30 May 2005, Fei Li wrote:

> Can anybody suggest me a good module to scan all the possible ORF(open
> reading fram) in a given nucleotide sequence
>
> Thanks.
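
[Editorial sketch, not part of the original message.] For small jobs where calling out to getorf is inconvenient, the idea of an ORF scan can be illustrated in plain Perl. The code below is a naive sketch only: it is not getorf's algorithm and not a BioPerl module. It assumes that an ORF means the first ATG in a frame followed by in-frame codons up to the next stop codon (TAA, TAG or TGA), it ignores ATGs that fall inside an already open ORF, and the names revcomp, find_orfs and $min_codons are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string (A/C/G/T only; other letters pass through).
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return every ATG...stop ORF with at least $min_codons codons (stop codon
# not counted) found in the six reading frames. Each hit is a hash ref with
# strand, frame (1-3), start (0-based offset on the scanned strand) and seq.
sub find_orfs {
    my ($dna, $min_codons) = @_;
    $min_codons = 1 unless defined $min_codons;
    my @orfs;
    for my $strand ('+', '-') {
        my $seq = uc($strand eq '+' ? $dna : revcomp($dna));
        for my $frame (0 .. 2) {
            my $start;    # offset of the currently open ATG, if any
            for (my $i = $frame; $i + 3 <= length $seq; $i += 3) {
                my $codon = substr $seq, $i, 3;
                if (!defined $start) {
                    $start = $i if $codon eq 'ATG';
                }
                elsif ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                    my $len = $i + 3 - $start;    # length including the stop
                    push @orfs, {
                        strand => $strand,
                        frame  => $frame + 1,
                        start  => $start,
                        seq    => substr($seq, $start, $len),
                    } if ($len / 3 - 1) >= $min_codons;
                    undef $start;
                }
            }
        }
    }
    return @orfs;
}

# Example: print each ORF found in a short made-up sequence.
for my $orf (find_orfs('CCATGAAATGATTTCAT')) {
    printf "%s%d at %d: %s\n", @{$orf}{qw(strand frame start seq)};
}
```

ORFs that run off the end of the sequence without a stop codon are not reported here; a real tool such as getorf offers options for that and for alternative genetic codes.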