From sanges at biogem.it Tue Jun 1 02:11:16 2004
From: sanges at biogem.it (Remo Sanges)
Date: Tue Jun 1 02:14:34 2004
Subject: [Bioperl-l] Bio::Seq::Meta::Array method
In-Reply-To: <003801c446fc$8517d110$f3ee669e@chagall>
References: <003801c446fc$8517d110$f3ee669e@chagall>
Message-ID: <7E0EC526-B392-11D8-A27E-000A95767E46@biogem.it>

On 31-May-04, at PM 06:46, luisa pugliese wrote:

> Hi Bioperlers,
> I found the Bio::Seq::Meta::Array method that should implements
> generic methods for sequences with residue-based meta information.
> Since I would like to link somehow the residue number of a pdb file
> to the residues of its sequence, I thought I got the right method.
> I tested it with the example reported in the synopsis, but I didn't
> understand what should I do in order to see the link between each
> residue and the corresponding meta information.
>

Luisa,

I may not have understood your needs, but why not use features? I have never used structure information, but if all you need is to attach values to some residues in a sequence, you can easily do that with features. Try this:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

my $seq = Bio::Seq->new(-seq =>'TKLMILVSHIVILSRM');

my $feat1 = new Bio::SeqFeature::Generic ( -start => 2,
                                           -end => 5,
                                           -primary => 'structure',
                                           -source_tag => 'PDB',
                                           -display_name => 'loop' );

my $feat2 = new Bio::SeqFeature::Generic ( -start => 8,
                                           -end => 15,
                                           -primary => 'structure',
                                           -source_tag => 'PDB',
                                           -display_name => 'helix' );

$seq->add_SeqFeature($feat1);
$seq->add_SeqFeature($feat2);

foreach my $feat($seq->get_SeqFeatures) {
    print "Residue from: ".$feat->start." to ".$feat->end.
          " structure: ".$feat->display_name.
" with seq ".$feat->seq->seq."\n" if $feat->primary_tag eq 'structure'; } If you need to understand the implementation of feature system in bioperl please read the tutorial and the how-to: http://www.bioperl.org/Core/Latest/bptutorial.html http://bioperl.org/HOWTOs/html/Feature-Annotation.html In order to be compatible with the actual standard please consider the definitions: http://www.ebi.ac.uk/embl/Documentation/FT_definitions/ feature_table.html Hope this help Remo --------------------------------------------------------- Remo Sanges - Ph.D. student Gene Expression Core Lab - BioGeM CODE Bionformatic Project - Tigem Via Pietro Castellino 111 80131 Naples - Italy tel: +390816132 - 339 - 303 fax: +390816132 - 262 sanges@biogem.it rsanges@tigem.it --------------------------------------------------------- From tariq_shafi75 at hotmail.com Tue Jun 1 11:43:25 2004 From: tariq_shafi75 at hotmail.com (Tariq Shafi) Date: Tue Jun 1 11:46:38 2004 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::Clustalw. Alignment doesn't print out in CGI Message-ID: Sean, Many thanks for your help. I didn't set my environment variable before. At the top of the script therefore what I needed to put was: use CGI qw(:standard); use Bio::Seq; use Bio::SeqIO; use Bio::Tools::Run::Alignment::Clustalw; $ENV{CLUSTALDIR} = '/usr/local/clustalw'; # CRUCIAL LINE $query = new CGI; print $query->header; .. etc. Kind regards Tariq _________________________________________________________________ Want to block unwanted pop-ups? Download the free MSN Toolbar now! http://toolbar.msn.co.uk/ From mebradley at chem.ufl.edu Tue Jun 1 11:43:56 2004 From: mebradley at chem.ufl.edu (Michael Bradley) Date: Tue Jun 1 11:50:37 2004 Subject: [Bioperl-l] retrieving coding sequences from swissprot protein accessions Message-ID: <40BCA43C.5060403@chem.ufl.edu> Hello all, I would like to get at the coding sequence for a given protein with a swissprot accession. 
I have done this with GenBank file in the past using the following code. Does anyone know how to do this with swissprot ? my $gp = new Bio::DB::GenPept; my $gb = new Bio::DB::GenBank; my $loc_factory = new Bio::Factory::FTLocationFactory; my $prot_stream = $gp->get_Stream_by_acc($protein_gi); while ( my $prot_seq = $prot_stream->next_seq() ) { foreach my $feat ( $prot_seq->top_SeqFeatures ) { if ( $feat->primary_tag eq 'CDS' ) { # example: 'coded_by="U05729.1:1..122"' my @coded_by = $feat->each_tag_value('coded_by'); my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0]; my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc); # create Bio::Location object from a string my $loc_object = $loc_factory->from_string($loc_str); # create a Feature object by using a Location my $feat_obj = new Bio::SeqFeature::Generic(-location =>$loc_object); # associate the Feature object with the nucleotide Seq object $nuc_obj->add_SeqFeature($feat_obj); my $cds_obj = $feat_obj->spliced_seq; print "CDS sequence is ",$cds_obj->seq,"\n\n"; } else { print "No CDS for ", $prot_seq->id,"\n\n"; } } } Thanks, Michael Bradley From jason at cgt.duhs.duke.edu Tue Jun 1 12:04:38 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jun 1 12:08:55 2004 Subject: [Bioperl-l] retrieving coding sequences from swissprot protein accessions In-Reply-To: <40BCA43C.5060403@chem.ufl.edu> References: <40BCA43C.5060403@chem.ufl.edu> Message-ID: Get the dblinks which point to an EMBL accession. http://jason.open-bio.org/Bioperl_Tutorials/GenomeInformatics2003/Bioperl-2.pdf The example starts on slide 31, the code to get the xrefs is on about slide 37 or so. Get the accessions which are mRNA or DNA depending on which annotation you want to use, then parse out the CDS for these records. -jason On Tue, 1 Jun 2004, Michael Bradley wrote: > Hello all, > > I would like to get at the coding sequence for a given protein with a > swissprot accession. I have done this with GenBank file in the past > using the following code. 
Does anyone know how to do this with swissprot ? > > my $gp = new Bio::DB::GenPept; > my $gb = new Bio::DB::GenBank; > my $loc_factory = new Bio::Factory::FTLocationFactory; > > my $prot_stream = $gp->get_Stream_by_acc($protein_gi); > while ( my $prot_seq = $prot_stream->next_seq() ) { > foreach my $feat ( $prot_seq->top_SeqFeatures ) { > if ( $feat->primary_tag eq 'CDS' ) { > # example: 'coded_by="U05729.1:1..122"' > my @coded_by = $feat->each_tag_value('coded_by'); > my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0]; > my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc); > # create Bio::Location object from a string > my $loc_object = $loc_factory->from_string($loc_str); > # create a Feature object by using a Location > my $feat_obj = new Bio::SeqFeature::Generic(-location =>$loc_object); > # associate the Feature object with the nucleotide Seq object > $nuc_obj->add_SeqFeature($feat_obj); > my $cds_obj = $feat_obj->spliced_seq; > print "CDS sequence is ",$cds_obj->seq,"\n\n"; > } else { > print "No CDS for ", $prot_seq->id,"\n\n"; > } > } > } > > Thanks, > > Michael Bradley > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From arareko at campus.iztacala.unam.mx Tue Jun 1 12:06:08 2004 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Tue Jun 1 13:19:02 2004 Subject: [Bioperl-l] IMGT databases Message-ID: <1086105968.40bca9706b257@correo.iztacala.unam.mx> Hi, Is there any module for fetching IMGT (ImMunoGeneTics) database records? Something like Bio::DB::EMBL ?? Thanks in advance. -- MAURICIO HERRERA CUADRA arareko@campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. 
From qdong at genome.stanford.edu Tue Jun 1 20:11:33 2004 From: qdong at genome.stanford.edu (Stan Dong) Date: Tue Jun 1 20:14:07 2004 Subject: [Bioperl-l] SGD GFF3 file released Message-ID: <67A057AE-B429-11D8-B683-000A95D983A6@genome.stanford.edu> Hello, SGD have just released GFF3 file for Saccharomyces cerevisiae. This file is fully compatible with the current GFF3 specification (http://song.sourceforge.net/gff3-jan04.shtml). Upon our testing and help from Scott Cain at GMOD, it works fine with GBrowse and Chado loading script. This file is updated every week and available for download from SGD ftp site. ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ chromosomal_feature/saccharomyces_cerevisiae.gff Feedback and comments are welcomed. cheers, -Stan -------------------------------------------------- Stan Dong, Ph.D. Senior Scientific Programmer Saccharomyces Genome Database M339, Department of Genetics Stanford University Medical Center Stanford, CA 94305-5120 Phone: 650-725-8956, Fax: 650-725-1534 email: qdong@genome.stanford.edu http://www.yeastgenome.org/ -------------------------------------------------- From gongwuming at hotmail.com Wed Jun 2 05:22:54 2004 From: gongwuming at hotmail.com (Gong Wuming) Date: Wed Jun 2 05:26:06 2004 Subject: [Bioperl-l] problem with not transformed negative numbers in .soft file of GEO database Message-ID: Hi list, I am afraid the question is not suitable to this mailing list, but I can not find better place to post it. I have a problem with the .soft file in GEO database. Taking the GDS169.soft file for example, what does the negative entry mean? How to deal with these negative numbers before log transformation? GDS169.soft: ... 
ID_REF          IDENTIFIER  GSM2559  GSM2561  GSM2563  GSM2564  GSM2566  GSM2568  GSM2570  GSM2572  GSM2574
IL2_at          M16762      -29      -48      -74      -30      -20      -51      -52      -80      -82
IL10_at         M37897      34       28       14       34       56       34       21       51       15
GMCSF_at        X03019      -120     -53      -49      -31      -86      -29      -28      -63      -91
TNFRII_at       M60469      8        27       24       19       89       54       70       62       113
MIP1-B_at       M35590      410      421      445      426      53       142      21       12       169
IL4_at          M25892      28       23       38       7        -1       27       18       27       0
IL12_P40_at     M86671      -73      -70      -104     -91      -17      -69      9        47       14
TNFa_at         X02611      -13      -44      -42      -38      -13      -36      -22      -18      -53
TCRa_at         M77167      -143     -109     -153     -116     -149     -225     -248     -210     -238
AFFX-BioB-5_at  J04423      54       84       null     2        303      309      272      421      467
AFFX-BioB-M_at  J04423      95       86       null     69       394      267      386      567      430
...
...

Sincerely

Wuming Gong
College of Life Science, Wuhan University, China

_________________________________________________________________
Enjoy the world's largest e-mail system: MSN Hotmail. http://www.hotmail.com

From sdavis2 at mail.nih.gov Wed Jun 2 07:02:26 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jun 2 07:05:37 2004
Subject: [Bioperl-l] problem with not transformed negative numbers in .soft file of GEO database
In-Reply-To:
Message-ID:

An issue with GEO is that one cannot always tell for sure what has happened to the data before submission. Perhaps these are already log-transformed (what base?) or centered to zero? To know for sure, one would probably have to contact the author, but you could plot the distribution of values for an array and see what it looks like.

Sean

On 6/2/04 5:22, "Gong Wuming" wrote:

> Hi list,
>
> I am afraid the question is not suitable to this mailing list, but I can
> not find better place to post it.
>
> I have a problem with the .soft file in GEO database.
> Taking the GDS169.soft file for example, what does the negative entry mean?
> How to deal with these negative numbers before log transformation?
>
> GDS169.soft:
> ...
> ID_REF IDENTIFIER GSM2559 GSM2561 GSM2563 GSM2564 GSM2566 GSM2568 GSM2570 > GSM2572 GSM2574 > IL2_at M16762 -29 -48 -74 -30 -20 -51 -52 -80 -82 > IL10_at M37897 34 28 14 34 56 34 21 51 15 > GMCSF_at X03019 -120 -53 -49 -31 -86 -29 -28 -63 -91 > TNFRII_at M60469 8 27 24 19 89 54 70 62 113 > MIP1-B_at M35590 410 421 445 426 53 142 21 12 169 > IL4_at M25892 28 23 38 7 -1 27 18 27 0 > IL12_P40_at M86671 -73 -70 -104 -91 -17 -69 9 47 14 > TNFa_at X02611 -13 -44 -42 -38 -13 -36 -22 -18 -53 > TCRa_at M77167 -143 -109 -153 -116 -149 -225 -248 > -210 -238 > AFFX-BioB-5_at J04423 54 84 null 2 303 309 272 421 467 > AFFX-BioB-M_at J04423 95 86 null 69 394 267 386 567 430 > ... > ... > > Sincerely > > Wuming Gong > College of Life Science, Wuhan University, China > > _________________________________________________________________ > 享用世界上最大的电子邮件系统— MSN Hotmail。 http://www.hotmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From sanges at biogem.it Wed Jun 2 07:13:33 2004 From: sanges at biogem.it (Remo Sanges) Date: Wed Jun 2 07:16:47 2004 Subject: [Bioperl-l] problem with not transformed negative numbers in .soft file of GEO database In-Reply-To: References: Message-ID: On 02-Jun-04, at PM 05:22, Gong Wuming wrote: > Hi list, > > I am afraid the question is not suitable to this mailing list, but I > can not find better place to post it. > > I have a problem with the .soft file in GEO database. Taking the > GDS169.soft file for example, what does the negative entry mean? How > to deal with these negative numbers before log transformation? 
>

Gong,

the best places to post this kind of question are the Bioconductor list (www.bioconductor.org) or the gene-array list (http://www.gene-chips.com/gene-arrays.html).

In your case the data you are observing probably come from the old version of the Affymetrix algorithm (MAS4), which produced negative expression values when the signal of the negative-control (mismatch) probes was higher than the signal of the positive (perfect-match) probes. You can work around this by setting all negative values to zero, but keep in mind that if you need to do statistical analysis on these values this is not the best solution.

Remo

From suzi at fruitfly.org Tue Jun 1 19:43:50 2004
From: suzi at fruitfly.org (Suzanna Lewis)
Date: Wed Jun 2 08:16:08 2004
Subject: [Bioperl-l] Re: [Gmod-schema] SGD GFF3 file released
In-Reply-To: <67A057AE-B429-11D8-B683-000A95D983A6@genome.stanford.edu>
References: <67A057AE-B429-11D8-B683-000A95D983A6@genome.stanford.edu>
Message-ID: <40BD14B6.6070703@fruitfly.org>

Fantastic, just what I needed for testing :-)

Stan Dong wrote:

> Hello,
>
> SGD have just released GFF3 file for Saccharomyces cerevisiae. This
> file is fully compatible with the current GFF3 specification
> (http://song.sourceforge.net/gff3-jan04.shtml). Upon our testing and
> help from Scott Cain at GMOD, it works fine with GBrowse and Chado
> loading script. This file is updated every week and available for
> download from SGD ftp site.
>
> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/
> chromosomal_feature/saccharomyces_cerevisiae.gff
>
> Feedback and comments are welcomed.
>
> cheers,
> -Stan
>
> --------------------------------------------------
> Stan Dong, Ph.D.
> Senior Scientific Programmer > Saccharomyces Genome Database > M339, Department of Genetics > Stanford University Medical Center > Stanford, CA 94305-5120 > Phone: 650-725-8956, Fax: 650-725-1534 > email: qdong@genome.stanford.edu > http://www.yeastgenome.org/ > -------------------------------------------------- > > > > ------------------------------------------------------- > This SF.Net email is sponsored by the new InstallShield X. > From Windows to Linux, servers to mobile, InstallShield X is the one > installation-authoring solution that does it all. Learn more and > evaluate today! http://www.installshield.com/Dev2Dev/0504 > _______________________________________________ > Gmod-schema mailing list > Gmod-schema@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema From bmb9jrm at bmb.leeds.ac.uk Wed Jun 2 09:43:18 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Wed Jun 2 09:46:31 2004 Subject: [Bioperl-l] Phylip error Message-ID: <1086183798.6158.14.camel@localhost.localdomain> Hi to all, I'm trying to incorporate phylogenetic analysis into a script I'm writing. I'm attempting to feed in alignments (SimpleAlign objects) derived from an earlier subroutine, and stored in a global hash (the reason for the hash being that I wanted to reference by a blast query sequence). 
My code:

sub phylip{

    foreach my $key (keys %alignments){
        my ($alnmnt) = $alignments{$key};
        my @params = ('MODEL' => 'PAM');
        my $protdist = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(@params);
        my $matrix = $protdist->run($alnmnt);
        @params = ('type'=>'NJ','outgroup'=>2,'lowtri'=>1,
                   'upptri'=>1,'subrep'=>1);

        my $neighbor = Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@params);
        # run() is called on the $neighbor object created above
        my ($tree) = $neighbor->run($matrix);
    }

}

It seems to have a problem at the protdist stage, resulting in the following error:

------------- EXCEPTION -------------
MSG: protdist did not create matrix correctly (/tmp/q4lIZVPNvW/outfile)
STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::_run /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:407
STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::run /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:364
STACK main::phylip ./GeneInfo5.pl:1648
STACK toplevel ./GeneInfo5.pl:563
--------------------------------------

From the archives, this would seem to be a problem with writing the matrix to file, but I don't know why. The permissions on /tmp are fine.

Does anyone have any ideas?

Thanks in advance,

Jon

From jason at cgt.duhs.duke.edu Wed Jun 2 10:05:46 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Jun 2 10:09:05 2004
Subject: [Bioperl-l] Phylip error
In-Reply-To: <1086183798.6158.14.camel@localhost.localdomain>
References: <1086183798.6158.14.camel@localhost.localdomain>
Message-ID:

Does the protdist test work fine for you in bioperl-run?

Are you using phylip 3.5 or 3.6? You need to set an environment variable to use 3.6:

  bash% export PHYLIPVERSION=3.6

or in your script:

  $ENV{PHYLIPVERSION} = 3.6;
  use Bio::Tools::Run::Phylo::Phylip::ProtDist;

If you are just doing Kimura 2P you can avoid ProtDist completely with new code in Bio::Align::ProteinStatistics.
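
A minimal sketch, not taken from any message in this thread: the "did not create matrix correctly" exception can also show up when the external program cannot be run at all, so one cheap sanity check is to confirm that the executable resolves on the search path and is not a dangling symbolic link. The helper name check_program is hypothetical, and the sketch assumes the PHYLIP binary is installed under the name protdist.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: classify how a program name resolves.
# Returns (status, path) where status is 'ok', 'dangling' or 'missing'.
# If no directories are given, the directories in $ENV{PATH} are searched.
sub check_program {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $name);
        # A symlink whose target no longer exists: -l is true, -e is false.
        return ('dangling', $path) if -l $path && !-e $path;
        # A regular file (or a valid symlink to one) that we may execute.
        return ('ok', $path) if -f $path && -x $path;
    }
    return ('missing', undef);
}

# Assumption: the PHYLIP executable is called 'protdist'.
my ($status, $path) = check_program('protdist');
print "protdist: $status", (defined $path ? " ($path)" : ''), "\n";
```

If the status is anything other than ok, fixing the installation or the link is likely to be more productive than debugging the Perl wrapper.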
-jason On Wed, 2 Jun 2004, Jonathan Manning wrote: > Hi to all, > > I'm trying to incorporate phylogenetic analysis into a script I'm > writing. I'm attempting to feed in alignments (SimpleAlign objects) > derived from an earlier subroutine, and stored in a global hash (the > reason for the hash being that I wanted to reference by a blast query > sequence). My code: > > sub phylip{ > > foreach my $key (keys %alignments){ > my ($alnmnt) = $alignments{$key}; > my @params = ('MODEL' => 'PAM'); > my $protdist = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(@params); > my $matrix = $protdist->run($alnmnt); > @params = ('type'=>'NJ','outgroup'=>2,'lowtri'=>1, > 'upptri'=>1,'subrep'=>1); > > my $neighbor = > Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@params); > my ($tree) = $neighbor_factory->run($matrix); > } > > } > > > > It seems to have a problem at the protdist stage, resulting in the > following error: > > > > ------------- EXCEPTION ------------- > MSG: protdist did not create matrix correctly (/tmp/q4lIZVPNvW/outfile) > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::_run > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:407 > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::run > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:364 > STACK main::phylip ./GeneInfo5.pl:1648 > STACK toplevel ./GeneInfo5.pl:563 > > -------------------------------------- > > > >From the archives, this would seem to be a problem with writing the > matrix to file, but I don't know why. The permissions on /tmp are fine. > > Does anyone have any ideas? 
> > Thanks in advance, > > Jon > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From bmb9jrm at bmb.leeds.ac.uk Wed Jun 2 10:46:10 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Wed Jun 2 10:49:22 2004 Subject: [Bioperl-l] Phylip error In-Reply-To: References: <1086183798.6158.14.camel@localhost.localdomain> Message-ID: <1086187569.6158.23.camel@localhost.localdomain> Thanks for the reply. I'm using version 3.5. Although I may only do a simple run right now, I'd like to allow for future extension, so integration of the Phylip suite is desirable. I'm unsure of what you mean about the test, could you clarify? Thanks, Jon On Wed, 2004-06-02 at 15:05, Jason Stajich wrote: > Does the protdist test work fine for you in bioperl-run? > > Are you using phylip 3.5 or 3.6? You need to twiddle something to use 3.6 > bash% export PHYLIPVERSION = 3.6 > > or in your script: > $ENV{PHYLIPVERSION} = 3.6; > use Bio::Tools::Run::Phylo::Phylip::ProtDist; > > If you are just doing kimura 2P you can avoid ProtDist completely with new > code in Bio::Align::ProteinStatistics. > > -jason > On Wed, 2 Jun 2004, Jonathan Manning wrote: > > > Hi to all, > > > > I'm trying to incorporate phylogenetic analysis into a script I'm > > writing. I'm attempting to feed in alignments (SimpleAlign objects) > > derived from an earlier subroutine, and stored in a global hash (the > > reason for the hash being that I wanted to reference by a blast query > > sequence). 
My code: > > > > sub phylip{ > > > > foreach my $key (keys %alignments){ > > my ($alnmnt) = $alignments{$key}; > > my @params = ('MODEL' => 'PAM'); > > my $protdist = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(@params); > > my $matrix = $protdist->run($alnmnt); > > @params = ('type'=>'NJ','outgroup'=>2,'lowtri'=>1, > > 'upptri'=>1,'subrep'=>1); > > > > my $neighbor = > > Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@params); > > my ($tree) = $neighbor_factory->run($matrix); > > } > > > > } > > > > > > > > It seems to have a problem at the protdist stage, resulting in the > > following error: > > > > > > > > ------------- EXCEPTION ------------- > > MSG: protdist did not create matrix correctly (/tmp/q4lIZVPNvW/outfile) > > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::_run > > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:407 > > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::run > > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:364 > > STACK main::phylip ./GeneInfo5.pl:1648 > > STACK toplevel ./GeneInfo5.pl:563 > > > > -------------------------------------- > > > > > > >From the archives, this would seem to be a problem with writing the > > matrix to file, but I don't know why. The permissions on /tmp are fine. > > > > Does anyone have any ideas? 
> > > > Thanks in advance, > > > > Jon > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > From jason at cgt.duhs.duke.edu Wed Jun 2 11:28:48 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jun 2 11:32:01 2004 Subject: [Bioperl-l] Phylip error In-Reply-To: <1086187569.6158.23.camel@localhost.localdomain> References: <1086183798.6158.14.camel@localhost.localdomain> <1086187569.6158.23.camel@localhost.localdomain> Message-ID: what does % cd bioperl-run % perl -I. -w t/ProtDist.t say? Which version bioperl-run are you using -jason On Wed, 2 Jun 2004, Jonathan Manning wrote: > Thanks for the reply. I'm using version 3.5. Although I may only do a > simple run right now, I'd like to allow for future extension, so > integration of the Phylip suite is desirable. > > I'm unsure of what you mean about the test, could you clarify? > > Thanks, > > Jon > > On Wed, 2004-06-02 at 15:05, Jason Stajich wrote: > > Does the protdist test work fine for you in bioperl-run? > > > > Are you using phylip 3.5 or 3.6? You need to twiddle something to use 3.6 > > bash% export PHYLIPVERSION = 3.6 > > > > or in your script: > > $ENV{PHYLIPVERSION} = 3.6; > > use Bio::Tools::Run::Phylo::Phylip::ProtDist; > > > > If you are just doing kimura 2P you can avoid ProtDist completely with new > > code in Bio::Align::ProteinStatistics. > > > > -jason > > On Wed, 2 Jun 2004, Jonathan Manning wrote: > > > > > Hi to all, > > > > > > I'm trying to incorporate phylogenetic analysis into a script I'm > > > writing. I'm attempting to feed in alignments (SimpleAlign objects) > > > derived from an earlier subroutine, and stored in a global hash (the > > > reason for the hash being that I wanted to reference by a blast query > > > sequence). 
My code: > > > > > > sub phylip{ > > > > > > foreach my $key (keys %alignments){ > > > my ($alnmnt) = $alignments{$key}; > > > my @params = ('MODEL' => 'PAM'); > > > my $protdist = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(@params); > > > my $matrix = $protdist->run($alnmnt); > > > @params = ('type'=>'NJ','outgroup'=>2,'lowtri'=>1, > > > 'upptri'=>1,'subrep'=>1); > > > > > > my $neighbor = > > > Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@params); > > > my ($tree) = $neighbor_factory->run($matrix); > > > } > > > > > > } > > > > > > > > > > > > It seems to have a problem at the protdist stage, resulting in the > > > following error: > > > > > > > > > > > > ------------- EXCEPTION ------------- > > > MSG: protdist did not create matrix correctly (/tmp/q4lIZVPNvW/outfile) > > > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::_run > > > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:407 > > > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::run > > > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:364 > > > STACK main::phylip ./GeneInfo5.pl:1648 > > > STACK toplevel ./GeneInfo5.pl:563 > > > > > > -------------------------------------- > > > > > > > > > >From the archives, this would seem to be a problem with writing the > > > matrix to file, but I don't know why. The permissions on /tmp are fine. > > > > > > Does anyone have any ideas? 
> > > > > > Thanks in advance, > > > > > > Jon > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From bmb9jrm at bmb.leeds.ac.uk Wed Jun 2 12:17:08 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Wed Jun 2 12:20:24 2004 Subject: [Bioperl-l] Phylip error In-Reply-To: References: <1086183798.6158.14.camel@localhost.localdomain> <1086187569.6158.23.camel@localhost.localdomain> Message-ID: <1086193028.6158.38.camel@localhost.localdomain> Many apologies Jason, the problem was down to a faulty symbolic link in my /usr/bin . Sorry for wasting your time- I'm getting somewhere with it now. That testing thing's useful to know though..... Thanks again, Jon On Wed, 2004-06-02 at 16:28, Jason Stajich wrote: > what does > % cd bioperl-run > % perl -I. -w t/ProtDist.t > > say? > > Which version bioperl-run are you using > > -jason > > On Wed, 2 Jun 2004, Jonathan Manning wrote: > > Thanks for the reply. I'm using version 3.5. Although I may only do a > > simple run right now, I'd like to allow for future extension, so > > integration of the Phylip suite is desirable. > > > > I'm unsure of what you mean about the test, could you clarify? > > > > Thanks, > > > > Jon > > > > On Wed, 2004-06-02 at 15:05, Jason Stajich wrote: > > > Does the protdist test work fine for you in bioperl-run? > > > > > > Are you using phylip 3.5 or 3.6? You need to twiddle something to use 3.6 > > > bash% export PHYLIPVERSION = 3.6 > > > > > > or in your script: > > > $ENV{PHYLIPVERSION} = 3.6; > > > use Bio::Tools::Run::Phylo::Phylip::ProtDist; > > > > > > If you are just doing kimura 2P you can avoid ProtDist completely with new > > > code in Bio::Align::ProteinStatistics. 
> > > > > > -jason > > > On Wed, 2 Jun 2004, Jonathan Manning wrote: > > > > > > > Hi to all, > > > > > > > > I'm trying to incorporate phylogenetic analysis into a script I'm > > > > writing. I'm attempting to feed in alignments (SimpleAlign objects) > > > > derived from an earlier subroutine, and stored in a global hash (the > > > > reason for the hash being that I wanted to reference by a blast query > > > > sequence). My code: > > > > > > > > sub phylip{ > > > > > > > > foreach my $key (keys %alignments){ > > > > my ($alnmnt) = $alignments{$key}; > > > > my @params = ('MODEL' => 'PAM'); > > > > my $protdist = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(@params); > > > > my $matrix = $protdist->run($alnmnt); > > > > @params = ('type'=>'NJ','outgroup'=>2,'lowtri'=>1, > > > > 'upptri'=>1,'subrep'=>1); > > > > > > > > my $neighbor = > > > > Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@params); > > > > my ($tree) = $neighbor_factory->run($matrix); > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > It seems to have a problem at the protdist stage, resulting in the > > > > following error: > > > > > > > > > > > > > > > > ------------- EXCEPTION ------------- > > > > MSG: protdist did not create matrix correctly (/tmp/q4lIZVPNvW/outfile) > > > > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::_run > > > > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:407 > > > > STACK Bio::Tools::Run::Phylo::Phylip::ProtDist::run > > > > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm:364 > > > > STACK main::phylip ./GeneInfo5.pl:1648 > > > > STACK toplevel ./GeneInfo5.pl:563 > > > > > > > > -------------------------------------- > > > > > > > > > > > > >From the archives, this would seem to be a problem with writing the > > > > matrix to file, but I don't know why. The permissions on /tmp are fine. > > > > > > > > Does anyone have any ideas? 
> > > >
> > > > Thanks in advance,
> > > >
> > > > Jon
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l@portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>

From jiben at jhu.edu Wed Jun 2 16:50:57 2004
From: jiben at jhu.edu (JAMES IBEN)
Date: Wed Jun 2 16:54:31 2004
Subject: [Bioperl-l] Warning message terminates Stream_by_query?
Message-ID: <1f35f61ee713.1ee7131f35f6@jhmimail.jhmi.edu>

Hello list,

I have written a program (my first) which takes a GenBank query and retrieves sequences so that I can pull out an intergenic region I would like to work with. However, when running the program I always run into the following warning message at some point:

-------------------- WARNING ---------------------
MSG: Unbalanced quote in:
/locus_tag="SAV0358"
/codon_start=1
/transl_table=11
/product="putative cystathionine beta-lyase"
/protein_id="BAB56520.1"
/db_xref="GI:14246126"
/translation="MTLSKETEVIFDWRRGVEYHSANPPLYDSSTFHQTSLG
GDVKYDYARSGNPNRELLEEKLARLEQGKFAFAFASGIAAISAVLLTFK
SGDHVILPDDVYGGTFRLTEQILNRFNIEFTTVDTTKLEQIEGAIQSNTK
LIYIETPSNPCFKITDIKAVSKIAEKHELLVAVDNTFMTPLGQSPLLLGAD
IVIHSATKFLSGHSDLINo further qualifiers will be added for this feature
---------------------------------------------------

With different queries, the message refers to some other GenBank sequence (i.e. not always this particular entry). The problem is that once I have run into this message, the sequence stream terminates, ending the program.

I have checked these entries and see nothing apparently wrong with them (everything is bounded by quotes). Can anyone tell me what causes this error and what I can do to avoid it (or at least to skip any problematic sequences without interrupting the stream)?

The queries I have been submitting should only pull about 250 sequences if they are not interrupted. Is there some sort of stream size limitation that I am hitting? If there is a problem with this approach, is there a better solution for my particular task than using Stream_by_query?

Thanks for your help,
James

From jason at cgt.duhs.duke.edu Wed Jun 2 17:44:37 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Jun 2 17:47:48 2004
Subject: [Bioperl-l] Warning message terminates Stream_by_query?
In-Reply-To: <1f35f61ee713.1ee7131f35f6@jhmimail.jhmi.edu>
References: <1f35f61ee713.1ee7131f35f6@jhmimail.jhmi.edu>
Message-ID:

That is annoying, isn't it...

I am worried that, for some reason, you aren't able to download the entire sequence and the record is truncated. The message shows it stopping in the middle of the translation:

translation="MTLSKETEVIFDWRRGVEYHSANPPLYDSSTFHQTSLG
GDVKYDYARSGNPNRELLEEKLARLEQGKFAFAFASGIAAISAVLLTFK
SGDHVILPDDVYGGTFRLTEQILNRFNIEFTTVDTTKLEQIEGAIQSNTK
LIYIETPSNPCFKITDIKAVSKIAEKHELLVAVDNTFMTPLGQSPLLLGAD
IVIHSATKFLSGHSDLIN

whereas the original chromosome file has a few more lines.

I wonder if there is a problem downloading a whole chromosome record from GenBank. The web download is not the most reliable method, and you'll find life easier if you can download the .gbk files directly. Whether that works depends on what you are working on and whether you can predict the space of accessions. If you are just working on finished/published genomes you can grab records such as this S. aureus one from ftp://ftp.ncbi.nih.gov/genbank/genomes, and I bet you won't have the same problem.

-jason

On Wed, 2 Jun 2004, JAMES IBEN wrote:

> Hello list,
>
> I have written a program (my first) which takes a Genbank
> query and retrieves sequences to pull out an intergenic region
> that I would like to work with.
However, when running the
> program I always at some point run into the following warning
> message:
>
> -------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /locus_tag="SAV0358"
> /codon_start=1
> /transl_table=11
> /product="putative cystathionine beta-lyase"
> /protein_id="BAB56520.1"
> /db_xref="GI:14246126"
> /
> translation="MTLSKETEVIFDWRRGVEYHSANPPLYDSSTFHQTSLG
> GDVKYDYARSGNPNRELLEEKLARLEQGKFAFAFASGIAAISAVLLTFK
> SGDHVILPDDVYGGTFRLTEQILNRFNIEFTTVDTTKLEQIEGAIQSNTK
> LIYIETPSNPCFKITDIKAVSKIAEKHELLVAVDNTFMTPLGQSPLLLGAD
> IVIHSATKFLSGHSDLINo further qualifiers will be added for this
> feature
> ---------------------------------------------------
>
> With different queries, the message refers to some other
> Genbank sequence (i.e. not always this particular entry). The
> problem is that once I have run into this message, the
> sequence stream terminates, ending the program.
> I have checked these entries and see nothing apparently
> wrong with them (everything is bounded by quotes). Can
> anyone tell me what this error arises from and perhaps what I
> can do to avoid it (or at least to skip any problematic
> sequences without interrupting the stream)?
> The queries I have been submitting should only pull about 250
> sequences if they were not interrupted. Is there some sort of
> stream size limitation that I am hitting? If there is a problem
> with this approach is there a better solution for my particular
> task than using Stream_by_query?
> > Thanks for your help, > James > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From hazards at musc.edu Wed Jun 2 16:02:19 2004 From: hazards at musc.edu (Starr Hazard) Date: Thu Jun 3 08:08:34 2004 Subject: [Bioperl-l] Making gff files for ucsc or ncbi build Message-ID: <17888171.1086192139@22gdellstarr.library.musc.edu> I cannot see any replies to your question. Did you get it solved? Starr From brian_osborne at cognia.com Thu Jun 3 08:56:25 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jun 3 08:59:43 2004 Subject: [Bioperl-l] Biosql documentation request Message-ID: Bioperl-l, Dave Howorth has provided a detailed critique of the bioperl-db/biosql documentation which I'm working through. One thing that he noticed was that the Biosql file doc/biosql.html was out-of-date. This file was created by running a script called postgres_autodoc.pl on a Postgres instance of the biosql schema. Can anyone provide me with a current version of this file? I run biosql on Mysql myself and I haven't found a script or utility equivalent to postgres_autodoc.pl. postgres_autodoc.pl is available at http://www.rbt.ca/autodoc/. Brian O. From luisa.pugliese at safan-bioinformatics.it Thu Jun 3 09:11:22 2004 From: luisa.pugliese at safan-bioinformatics.it (luisa pugliese) Date: Thu Jun 3 09:14:16 2004 Subject: [Bioperl-l] downlaoding pdb files Message-ID: <004301c4496c$448e2b40$f3ee669e@chagall> Hi bioperlers, I looked at the DB modules and I didn't find anything related to the PDB. Does anyody knows if there is a way to automatically download pdb files from the PDB within bioperl, as it is possible to do for sequences from genebank? Thank you to all Luisa ============================= Luisa Pugliese, Ph.D. luisa.pugliese@safan-bioinformatics.it S.A.F.AN. 
BIOINFORMATICS Corso Tazzoli 215/13 -10137 Torino - ITALY tel +39 011 3026230 cell. +39 333 6130644 From s.paul at surrey.ac.uk Thu Jun 3 17:50:34 2004 From: s.paul at surrey.ac.uk (S.Paul) Date: Thu Jun 3 09:51:00 2004 Subject: [Bioperl-l] accessing genbank Message-ID: <278a01c449b4$ccf3b0c0$776fe383@LTCEP1SP> Hi Everybody: I have been trying to access the Genbank for the following accession numbers and outfut it to a file and am getting the following error message: **************************************************************************** MSG: acc does not exist STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:1 7 STACK Bio::DB::GenBank::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/GenBank.pm:216 STACK toplevel Genbank_seq_dopamine.pl:11 ****************************************************************************************************** Not sure what I am doing wrong. I can, however, retrieve individual accession number. I am enclosing the code. Thanks for the help in advance Sujoy Paul ******************************************************************************************** use Bio::DB::GenBank; use Bio::SeqIO; use Bio::DB::WebDBSeqI; my $gb = new Bio::DB::GenBank; my $seqio = $gb->get_Seq_by_acc(['AAH21195','AAH38978']); my $seqout = new Bio::SeqIO(-fh =>'dopamin_human.gbk', -format => 'genbank'); *********************************************************************************************** Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk From brian_osborne at cognia.com Thu Jun 3 09:58:22 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jun 3 10:01:59 2004 Subject: [Bioperl-l] accessing genbank In-Reply-To: <278a01c449b4$ccf3b0c0$776fe383@LTCEP1SP> Message-ID: Sujoy, You don't want "get_Seq", you want to use "get_Stream" if you're using multiple ids. Something like: $seqio = $gb->get_Stream_by_id(["J00522","AF303112","2981014"]); Brian O. 
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of S.Paul Sent: Thursday, June 03, 2004 5:51 PM To: bioperl-l Subject: [Bioperl-l] accessing genbank Hi Everybody: I have been trying to access the Genbank for the following accession numbers and outfut it to a file and am getting the following error message: **************************************************************************** MSG: acc does not exist STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:1 7 STACK Bio::DB::GenBank::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/GenBank.pm:216 STACK toplevel Genbank_seq_dopamine.pl:11 **************************************************************************** ************************** Not sure what I am doing wrong. I can, however, retrieve individual accession number. I am enclosing the code. Thanks for the help in advance Sujoy Paul **************************************************************************** **************** use Bio::DB::GenBank; use Bio::SeqIO; use Bio::DB::WebDBSeqI; my $gb = new Bio::DB::GenBank; my $seqio = $gb->get_Seq_by_acc(['AAH21195','AAH38978']); my $seqout = new Bio::SeqIO(-fh =>'dopamin_human.gbk', -format => 'genbank'); **************************************************************************** ******************* Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk From jason at cgt.duhs.duke.edu Thu Jun 3 10:06:03 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Jun 3 10:09:27 2004 Subject: [Bioperl-l] accessing genbank In-Reply-To: <278a01c449b4$ccf3b0c0$776fe383@LTCEP1SP> References: <278a01c449b4$ccf3b0c0$776fe383@LTCEP1SP> Message-ID: Those are protein accessions- you want to use Bio::DB::GenPept. Can someone add a FAQ about this? 
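
To make the two suggestions in this thread concrete (use a stream call for several ids, and use Bio::DB::GenPept for protein accessions), here is a minimal sketch. The helper db_class_for() and its accession-format heuristic are assumptions made for illustration; they are not part of BioPerl. The sketch also writes the output with -file and a leading '>' because -fh expects an open filehandle rather than a file name. It has not been run against the live NCBI service, which may behave differently today.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess which BioPerl database class suits an accession.
# ASSUMPTION: this heuristic is illustrative only and not part of BioPerl.
# Protein accessions are typically 3 letters + 5 digits (AAH21195);
# nucleotide accessions are 1 letter + 5 digits (J00522) or
# 2 letters + 6 digits (AF303112).
sub db_class_for {
    my ($acc) = @_;
    $acc =~ s/\.\d+$//;    # drop a trailing version such as ".1"
    return 'Bio::DB::GenPept' if $acc =~ /^[A-Za-z]{3}\d{5}$/;
    return 'Bio::DB::GenBank'
        if $acc =~ /^(?:[A-Za-z]\d{5}|[A-Za-z]{2}\d{6})$/;
    return;                # e.g. a bare GI number: cannot tell
}

# Only contact the remote database when accessions are given on the
# command line, e.g.:  perl fetch.pl AAH21195 AAH38978
if (@ARGV) {
    my @ids   = @ARGV;
    my $class = db_class_for($ids[0])
        or die "Cannot guess a database for '$ids[0]'\n";

    # Load the chosen class and Bio::SeqIO at run time.
    (my $module = "$class.pm") =~ s{::}{/}g;
    require $module;
    require Bio::SeqIO;

    my $db     = $class->new;
    my $stream = $db->get_Stream_by_id(\@ids);    # several ids, one stream
    my $out    = Bio::SeqIO->new(-file   => '>dopamin_human.gbk',
                                 -format => 'genbank');
    while (my $seq = $stream->next_seq) {
        $out->write_seq($seq);
    }
}
```

All ids in one run are assumed to be of the same kind; mixing protein and nucleotide accessions would need one stream per database.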
On Thu, 3 Jun 2004, S.Paul wrote: > Hi Everybody: > > I have been trying to access the Genbank for the following accession numbers and outfut it to a file and am getting the following error message: > **************************************************************************** > MSG: acc does not exist > STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:1 > 7 > STACK Bio::DB::GenBank::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/GenBank.pm:216 > STACK toplevel Genbank_seq_dopamine.pl:11 > ****************************************************************************************************** > > Not sure what I am doing wrong. I can, however, retrieve individual accession number. I am enclosing the code. > > Thanks for the help in advance > > Sujoy Paul > > ******************************************************************************************** > use Bio::DB::GenBank; > use Bio::SeqIO; > use Bio::DB::WebDBSeqI; > my $gb = new Bio::DB::GenBank; > > my $seqio = $gb->get_Seq_by_acc(['AAH21195','AAH38978']); > > my $seqout = new Bio::SeqIO(-fh =>'dopamin_human.gbk', -format => 'genbank'); > > *********************************************************************************************** > > Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk -- Jason Stajich Duke University jason at cgt.mc.duke.edu From s.paul at surrey.ac.uk Thu Jun 3 18:28:05 2004 From: s.paul at surrey.ac.uk (S.Paul) Date: Thu Jun 3 10:33:46 2004 Subject: [Bioperl-l] downlaoding pdb files References: <004301c4496c$448e2b40$f3ee669e@chagall> Message-ID: <27ac01c449ba$0ac69e30$776fe383@LTCEP1SP> Hi Luisa: I dont think there is a way by which bioperl will help you to download the PDB files since this format is not supported. 
But if you want to find info regarding the pdb file you might want to look at: perldoc Bio::Structure::IO Sujoy Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk ----- Original Message ----- From: "luisa pugliese" To: Sent: Thursday, June 03, 2004 6:11 AM Subject: [Bioperl-l] downlaoding pdb files > Hi bioperlers, > I looked at the DB modules and I didn't find anything related to the > PDB. Does anyody knows if there is a way to automatically download pdb files > from the PDB within bioperl, as it is possible to do for sequences from > genebank? > Thank you to all > Luisa > ============================= > Luisa Pugliese, Ph.D. > luisa.pugliese@safan-bioinformatics.it > S.A.F.AN. BIOINFORMATICS > Corso Tazzoli 215/13 -10137 Torino - ITALY > tel +39 011 3026230 > cell. +39 333 6130644 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jurgen.pletinckx at algonomics.com Thu Jun 3 10:47:29 2004 From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx) Date: Thu Jun 3 10:35:57 2004 Subject: [Bioperl-l] downlaoding pdb files In-Reply-To: <004301c4496c$448e2b40$f3ee669e@chagall> Message-ID: I don't think such a beast currently exists. Fortunately, it's fairly easy to get a specific file from the RCSB web site: GET "http://www.rcsb.org/pdb/cgi/export.cgi?format=PDB;pdbId=1LRA" > localfilename where 'GET' (also known as lwp-request) is part of the LWP package, which is a prerequisite of bioperl. In other words, you probably have this in working order. 
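
If you would rather stay inside a Perl script than call GET from the shell, the same download can probably be done with LWP::Simple, which ships with the LWP package mentioned above. This is a minimal sketch: pdb_export_url() and fetch_pdb() are hypothetical helper names, and the URL pattern is simply copied from this message, so it may no longer be served by the RCSB site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the export URL quoted in this message for a given PDB ID.
# ASSUMPTION: the URL pattern is taken from the post as-is; the RCSB
# site may have changed it since.
sub pdb_export_url {
    my ($id) = @_;
    return 'http://www.rcsb.org/pdb/cgi/export.cgi?format=PDB;pdbId='
         . uc($id);
}

# Download one entry to a local file; returns true on HTTP success.
sub fetch_pdb {
    my ($id, $file) = @_;
    require LWP::Simple;   # loaded lazily so the URL helper works without LWP
    my $status = LWP::Simple::getstore(pdb_export_url($id), $file);
    return LWP::Simple::is_success($status);
}

# Usage: perl getpdb.pl 1LRA
if (@ARGV) {
    my $id = shift @ARGV;
    fetch_pdb($id, "$id.pdb") or die "Download of $id failed\n";
    print "Saved $id.pdb\n";
}
```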
(I can find E:\Perl\bin\GET and /usr/local/bin/GET, respectively, on my machines) Alternatively, there's the RCSB ftp site: GET ftp.rcsb.org/pub/pdb/data/structures/divided/pdb/lr/pdb1lra.ent.Z > localfilename.Z or the bio-mirror site (which seems slightly faster): GET bio-mirror.net/biomirror/pdb/data/structures/divided/pdb/lr/pdb1lra.ent.Z > localfilename.Z Take care with the directory paths for the ftp sites - the typical pdb organisation into two-letter subdirectories is in effect. I've just taken a look at the disk usage of our copy of the database - 3.8 gigabytes for the compressed files; 15 gigabytes for the uncompressed files. If you can spare 20 gigabytes, it's worthwhile to have a local copy ... I hope this helps! -- Jurgen Pletinckx AlgoNomics NV From ian.donaldson at mshri.on.ca Thu Jun 3 12:17:46 2004 From: ian.donaldson at mshri.on.ca (Ian Donaldson) Date: Thu Jun 3 12:21:23 2004 Subject: [Bioperl-l] downlaoding pdb files Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF150690F68B@ex.mshri.on.ca> Hi Luisa: You can use a SeqHound remote-API call to retrieve structures in PDB flat-file format. The call is SHoundGetPDB3D. Documentation for this call can be found at http://www.blueprint.org/seqhound/api_help/apifunctslist.html The SeqHound Perl module (version 2.5) can be down-loaded from http://prdownloads.sourceforge.net/slritools/seqhound.perl.2.5.tar.gz?downlo ad. Lots more documentation is available at http://www.blueprint.org/seqhound/api_help/seqhound_help_guides.html if you've never used it before. Let me know if you have any problems. Best Ian ----- Original Message ----- From: "luisa pugliese" To: Sent: Thursday, June 03, 2004 6:11 AM Subject: [Bioperl-l] downlaoding pdb files > Hi bioperlers, > I looked at the DB modules and I didn't find anything related to the > PDB. Does anyody knows if there is a way to automatically download pdb files > from the PDB within bioperl, as it is possible to do for sequences from > genebank? 
> Thank you to all > Luisa > ============================= > Luisa Pugliese, Ph.D. > luisa.pugliese@safan-bioinformatics.it > S.A.F.AN. BIOINFORMATICS > Corso Tazzoli 215/13 -10137 Torino - ITALY > tel +39 011 3026230 > cell. +39 333 6130644 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From birney at ebi.ac.uk Thu Jun 3 12:41:25 2004 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Jun 3 12:44:46 2004 Subject: [Bioperl-l] Genome Informatics meeting. Message-ID: The third Genome Informatics meeting is being held in September at Hinxton from 22nd-26th September (Hinxton is just outside Cambridge in the UK, and is where the EBI and Sanger are). This is a meeting focused on large scale, genome-wide data manipulation and analysis, and I think is a great forum for people to discuss the aspects of bioinformatics which don't other wise get alot of air time at other conferences (eg, large scale pipelines, or image analysis). For more information, go to: http://meetings.cshl.org/2004/2004infouk.htm Abstracts are due by June 30th which is pretty soon, but it is pretty easy to put together an abstract for this (2 or so paragraphs). The meeting is run in "cold spring harbor" manner; ie, all the talks are drawn from open abstract submission. If people have any queries about the meeting, just drop me a line. ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . 
-----------------------------------------------------------------

From skirov at utk.edu Thu Jun 3 15:18:05 2004
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Jun 3 15:21:16 2004
Subject: [Bioperl-l] Really weird bl2seq behavior
Message-ID: <40BF796D.2060107@utk.edu>

I know this is off topic, sorry about this...
I just wonder if anyone else has run into the same or a similar problem:
I am running bl2seq from Perl, using system. Done from Perl, I get
slightly different results every other time, i.e. even runs give me the
right alignment, odd ones do not. The difference is not huge, but still
significant. It is also important that the output file is temporary: I
write it multiple times and delete it afterwards. I guess this might be
part of the problem. Also, I am using -o to create the report.
Doing exactly the same thing from the command line always gives the
proper alignment. So it has to be a Perl or bl2seq bug (perl 5.8.1) or
some kind of incompatibility...
If I use redirection instead of the -o option it is OK, so I guess it is
a bl2seq bug.
I am completely confused and I hate to let this go... Any ideas?
Thanks!
Stefan

From sdavis2 at mail.nih.gov Thu Jun 3 15:53:36 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Jun 3 15:56:48 2004
Subject: [Bioperl-l] Calculating Tm for nucleotides
Message-ID:

All,

I am interested in calculating Tm for oligos (many thousands of them).
What is the fastest way of doing this? I see at least two ways (emboss
and primer.pm, and there are probably more).

Thanks,
Sean

From redwards at utmem.edu Thu Jun 3 16:34:36 2004
From: redwards at utmem.edu (Rob Edwards)
Date: Thu Jun 3 16:37:47 2004
Subject: [Bioperl-l] Calculating Tm for nucleotides
In-Reply-To:
References:
Message-ID: <6DAAE13E-B59D-11D8-8C17-000A959E1622@utmem.edu>

If you want to use BioPerl, check out Bio::SeqFeature::Primer.

Make sure that you have a recent version of BioPerl that has the changes
suggested by Barry Moore (he should be credited in the file).
This may not be the quickest, but it will work! For example: use Bio::SeqFeature::Primer; foreach my $seq (@seqs) { my $primer=Bio::SeqFeature::Primer->new(-seq=>$seq); print "$seq\t", $primer->Tm, "\n"; } or, assuming one primer per line in a file: perl -MBio::SeqFeature::Primer -ne 'chomp; $p=Bio::SeqFeature::Primer->new(-seq=>$_); print "$_\t", $p->Tm, "\n"' filename Rob On Jun 3, 2004, at 2:53 PM, Sean Davis wrote: > All, > > I am interested in calculating Tm for oligos (many thousands of them). > What > is the fastest way of doing this? I see at least two ways (emboss and > primer.pm and there are probably more). > > Thanks, > Sean > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From gtg974p at mail.gatech.edu Thu Jun 3 14:04:40 2004 From: gtg974p at mail.gatech.edu (gtg974p@mail.gatech.edu) Date: Thu Jun 3 20:14:05 2004 Subject: [Bioperl-l] Getting all thae matrices from transfac Message-ID: <1086285880.40bf68385ad1e@webmail.mail.gatech.edu> Hi all, Can someone tell me how to get all the matrices from Local TRANSFAC. The get_MatrixSet() module works only for Jaspar. I need something like -- my $db = TFBS::DB::FlatFileDir->connect("$jaspardir"); my $matrixset = $newdb->get_MatrixSet(-matrixtype=>"PFM"); my $mx_iterator = $matrixset->Iterator(-sort_by=>'name'); printf("\n %-10s%15s%25s \n",'MatrixID','Name','Length'); while(my $pfm = $mx_iterator->next()) { printf(" %-10s%15s%25s \n", $pfm->ID, $pfm->name, $pfm->length); } for getting all the matrices from TRANSFAC. Thanks in advance. 
From skirov at utk.edu Thu Jun 3 20:58:47 2004 From: skirov at utk.edu (Stefan Kirov) Date: Thu Jun 3 21:02:01 2004 Subject: [Bioperl-l] Getting all thae matrices from transfac In-Reply-To: <1086285880.40bf68385ad1e@webmail.mail.gatech.edu> References: <1086285880.40bf68385ad1e@webmail.mail.gatech.edu> Message-ID: <40BFC947.1040403@utk.edu> Currently TFBS is not in bioperl. You can do what you want with Bio::Matrix::PSM::IO. Also TFBS reads Jaspar format, transfac format is quite different. my $io=new Bio::Matrix::PSM::IO(-file=>'matrix.dat',-format=>'transfac'); while (my $matrix=$io->next_psm) { my $id=$matrix->id; my $an=$matrix->accession_number; my $l=$matrix->width; my $cons=$matrix->IUPAC; print"$id\t$an\t$l\t$cons\n"; } matrix.dat is usually the Transfac file, containing the matrices data, unless you have renamed it. see as well Bio::Matrix::PSM::SiteMatrix. Hope this helps. Stefan gtg974p@mail.gatech.edu wrote: >Hi all, >Can someone tell me how to get all the matrices from Local TRANSFAC. The >get_MatrixSet() module works only for Jaspar. I need something like -- > > >my $db = TFBS::DB::FlatFileDir->connect("$jaspardir"); >my $matrixset = $newdb->get_MatrixSet(-matrixtype=>"PFM"); >my $mx_iterator = $matrixset->Iterator(-sort_by=>'name'); >printf("\n %-10s%15s%25s \n",'MatrixID','Name','Length'); >while(my $pfm = $mx_iterator->next()) >{ > printf(" %-10s%15s%25s \n", $pfm->ID, $pfm->name, $pfm->length); >} > > >for getting all the matrices from TRANSFAC. > >Thanks in advance. >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 1060 Commerce Park, Oak Ridge TN 37830-8026 USA tel +865 576 5120 fax +865 241 1965 e-mail: skirov@utk.edu sao@ornl.gov From hlapp at gmx.net Fri Jun 4 02:19:44 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Jun 4 02:22:49 2004 Subject: [Bioperl-l] Biosql documentation request In-Reply-To: Message-ID: <2B91ECE6-B5EF-11D8-B205-000A959EB4C4@gmx.net> Brian, if I understand the output correctly it only documents the schema elements. Do you feel that the ERD (doc/biosql-ERD.pdf) does not fulfill this purpose well enough? The ERD diagram actually doesn't show the unique key constraints, so that would be a difference indeed. -hilmar On Thursday, June 3, 2004, at 05:56 AM, Brian Osborne wrote: > Bioperl-l, > > Dave Howorth has provided a detailed critique of the bioperl-db/biosql > documentation which I'm working through. One thing that he noticed was > that > the Biosql file doc/biosql.html was out-of-date. This file was created > by > running a script called postgres_autodoc.pl on a Postgres instance of > the > biosql schema. Can anyone provide me with a current version of this > file? I > run biosql on Mysql myself and I haven't found a script or utility > equivalent to postgres_autodoc.pl. postgres_autodoc.pl is available at > http://www.rbt.ca/autodoc/. > > Brian O. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From bmb9jrm at bmb.leeds.ac.uk Fri Jun 4 06:04:12 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Fri Jun 4 06:07:21 2004 Subject: [Bioperl-l] Another Phylip question Message-ID: <1086343451.6703.14.camel@localhost.localdomain> Hi to all, Yep, another problem with the Phylip suite, which I was hoping either Jason or anyone else 'in-the-know' could help me with. The pertinent section of code is as follows: my ($tree) = $neighbor->run($matrix); my $drawfact = new Bio::Tools::Run::Phylo::Phylip::DrawTree(); $drawfact->fontfile('fontfile'); my $treeimagefile = $drawfact->run($tree); print "Tree file: ", $treeimagefile, "\n"; system ("cp $treeimagefile ./testtree"); As far as I can tell from the output the tree is being created successfully by neighbour. For a time I was wondering why my tree files were not present in the /tmp folder, but thinking maybe they were being cleaned up at the end of the script, I added that last line. The copy works, but I end up with what seems like an empty postscript file when opened in my postscript viewer, and something like the following when viewed in a text editor: %!PS-Adobe-2.0 EPSF-2.0 %%Creator: Phylip %%Title: phylip.ps %%Pages: 1 %%BoundingBox: 0 0 612 792 %%EndComments %%EndProlog %%Page: 1 1 1 setlinecap 1 setlinejoin nan setlinewidth newpath stroke nan nan moveto nan nan lineto stroke nan nan moveto nan nan lineto stroke nan nan moveto nan nan lineto stroke nan nan moveto nan nan lineto stroke nan nan moveto nan nan lineto stroke nan nan moveto nan nan lineto stroke nan nan moveto nan nan lineto .......... Any ideas? 
Thanks in advance, Jon From brian_osborne at cognia.com Fri Jun 4 07:38:30 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jun 4 07:42:02 2004 Subject: [Bioperl-l] Biosql documentation request In-Reply-To: <2B91ECE6-B5EF-11D8-B205-000A959EB4C4@gmx.net> Message-ID: Hilmar, Neither does the ERD show nullability. The ERD is good but some useful information is missing, yes. The ERD is dated 6/4/2003, is this the latest version? Pardon my ignorance. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp Sent: Friday, June 04, 2004 2:20 AM To: Brian Osborne Cc: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] Biosql documentation request Brian, if I understand the output correctly it only documents the schema elements. Do you feel that the ERD (doc/biosql-ERD.pdf) does not fulfill this purpose well enough? The ERD diagram actually doesn't show the unique key constraints, so that would be a difference indeed. -hilmar On Thursday, June 3, 2004, at 05:56 AM, Brian Osborne wrote: > Bioperl-l, > > Dave Howorth has provided a detailed critique of the bioperl-db/biosql > documentation which I'm working through. One thing that he noticed was > that > the Biosql file doc/biosql.html was out-of-date. This file was created > by > running a script called postgres_autodoc.pl on a Postgres instance of > the > biosql schema. Can anyone provide me with a current version of this > file? I > run biosql on Mysql myself and I haven't found a script or utility > equivalent to postgres_autodoc.pl. postgres_autodoc.pl is available at > http://www.rbt.ca/autodoc/. > > Brian O. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From bmb9jrm at bmb.leeds.ac.uk Fri Jun 4 07:52:09 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Fri Jun 4 07:55:16 2004 Subject: [Bioperl-l] Blast against the mouse genome Message-ID: <1086349874.6703.26.camel@localhost.localdomain> Hello all, Apologies for duplication, I found this question in the archives from February, but with no reply. How do I run remote Blast against the mouse genome? I've accomplished this with the human genome via a 'chromosome' entry for -data, but this only returns human entries. I've changed the parameter like: $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Mus musculus[Organism]'; But to no avail. I've sucessfully checked via the web that my query returns mouse genome results. The results page quotes 'contig' as the database, but this doesn't work from the script. Can anyone help? Cheers, Jon From jason at cgt.duhs.duke.edu Fri Jun 4 08:32:21 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jun 4 08:35:30 2004 Subject: [Bioperl-l] doc.bioperl.org Message-ID: FYI - behind the scenes work. The doc.bioperl.org site has moved to the main bioperl server. The address is the same, but it should be more stable and easier to keep up to date now. 
-jason -- Jason Stajich Duke University jason at cgt.mc.duke.edu From bmb9jrm at bmb.leeds.ac.uk Fri Jun 4 09:32:58 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Fri Jun 4 09:36:05 2004 Subject: [Bioperl-l] Blast against the mouse genome In-Reply-To: <22127.193.137.94.3.1086355549.squirrel@webmail.netvisao.pt> References: <1086349874.6703.26.camel@localhost.localdomain> <22127.193.137.94.3.1086355549.squirrel@webmail.netvisao.pt> Message-ID: <1086355978.8738.5.camel@localhost.localdomain> Thanks for the reply. Yep, you're right, it works fine Blasting against the non-redundant database. But how do I do a genome search? nr does not return genome matches for a query which I have successfully performed via the mouse genome page online. On Fri, 2004-06-04 at 14:25, pdavid@netvisao.pt wrote: > Your line works for me. This is the script I used: > > #/usr/bin/perl -w > use strict; > > use Bio::Tools::Run::RemoteBlast; > > my $str = Bio::SeqIO->new(-file=>'blast.fa' , '-format' => 'fasta' ); > my $prog = 'blastn'; > my $db = 'nr'; > my $e_val= '10'; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Mus > musculus[Organism]'; > > print STDERR "\n" , $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} > , "\n"; > > while (my $input = $str->next_seq()){ > my $r = $factory->submit_blast($input); > print STDERR "waiting..."; > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "."; > sleep 60; > } > > else { > my $result = $rc->next_result(); > #save the output > my $filename = $result->query_name(); > $factory->save_output($filename); > $factory->remove_rid($rid); > } > } > } > } > > > > Hello all, > > > > 
Apologies for duplication, I found this question in the archives from > > February, but with no reply. > > > > How do I run remote Blast against the mouse genome? I've accomplished > > this with the human genome via a 'chromosome' entry for -data, but this > > only returns human entries. I've changed the parameter like: > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Mus > > musculus[Organism]'; > > > > But to no avail. I've sucessfully checked via the web that my query > > returns mouse genome results. The results page quotes 'contig' as the > > database, but this doesn't work from the script. > > > > Can anyone help? > > > > Cheers, > > > > Jon > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From jason at cgt.duhs.duke.edu Fri Jun 4 10:00:48 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jun 4 10:04:05 2004 Subject: [Bioperl-l] Blast against the mouse genome In-Reply-To: <1086355978.8738.5.camel@localhost.localdomain> References: <1086349874.6703.26.camel@localhost.localdomain> <22127.193.137.94.3.1086355549.squirrel@webmail.netvisao.pt> <1086355978.8738.5.camel@localhost.localdomain> Message-ID: You can look at the source for the web cgi page http://www.ncbi.nlm.nih.gov/genome/seq/MmBlast.html and deduce the necessary parameters -- that's all we've done. Presumably you need to set a bunch of the options as listed from the exerpt of the html form below (there is more so you should look at the whole thing): DB_DIR_PREFIX="mm_genome" DB="genome" and so forth...
Database: value="mouse_contig">

[...]

my @data = <INPUT>;
close INPUT;

my $panel;
my $flag = 1;
my ($feature, $track);
foreach(@data) {
  chomp;
  next if /^\#/;  # ignore comments
  my ($name, $length, $domain, $score, $start, $end) = split /\t+/;
  if($flag == 1) {
    # draw panel
    $panel = Bio::Graphics::Panel->new(-length    => $length,
                                       -width     => 800,
                                       -pad_left  => 10,
                                       -pad_right => 10,
                                      );
    # draw reference ruler of size sequence
    my $full_length = Bio::SeqFeature::Generic->new(-start=>1, -end=>$length);
    $panel->add_track($full_length,
                      -glyph   => 'arrow',
                      -tick    => 2,
                      -fgcolor => 'black',
                      -double  => 1,
                      -label   => "$name",
                     );
    $flag = 0;
    $track = $panel -> add_track(-glyph       => 'rndrect',
                                 -label       => 1,
                                 -bgcolor     => 'blue',
                                 -min_score   => 0,
                                 -max_score   => 1000,
                                 -font2color  => 'red',
                                 -sort_order  => 'high_score',
                                 -description => sub {
                                   my $feature = shift;
                                   my $score   = $feature->score;
                                   if ($score =~ /E/) {
                                     return "e-value=$score";
                                   }
                                   else {
                                     return "score=$score";
                                   }
                                 });
    $feature = Bio::SeqFeature::Generic->new(-display_name=>$domain,
                                             -score=>$score,
                                             -start=>$start,
                                             -end=>$end);
  }
  my $subfeature = Bio::SeqFeature::Generic->new(-label =>$domain,
                                                 -display_name=>$domain,
                                                 -score=>$score,
                                                 -start=>$start,
                                                 -end=>$end);
  $feature->add_SeqFeature($subfeature, 'EXPAND');
}
$track -> add_feature($feature);
print $panel -> png;
exit;

Any help would be greatly appreciated.

Bioperl newbie

From michael.watson at bbsrc.ac.uk Fri Jun 18 09:29:56 2004
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jun 18 09:32:41 2004
Subject: [Bioperl-l] Find tiling path
Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C26E7@iahce2knas1.iah.bbsrc.reserved>

Quick question:

I have a database of sequences. One of my queries hits one sequence in
the DB, another hits a different sequence in the DB. I have reason to
believe that other sequences in the DB will form a "tiling path" between
sequence 1 and sequence 2 so I can create a contig that spans the gap.
Is there something which does that?
I'm using BLAST as my query algorithm. Mick From sanges at biogem.it Fri Jun 18 09:39:49 2004 From: sanges at biogem.it (Remo Sanges) Date: Fri Jun 18 09:42:53 2004 Subject: [Bioperl-l] Find tiling path In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E957C26E7@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E957C26E7@iahce2knas1.iah.bbsrc.reserved> Message-ID: On Jun 18, 2004, at 3:29 PM, michael watson (IAH-C) wrote: > Quick question: > > I have a database of sequences. One of my queries hits one sequence in > the DB, another hits a different sequence in the DB. I have reason to > believe that other sequences in the DB will form a "tiling path" > between > sequence 1 and sequence 2 so I can create a contig that spans the gap. > Is there something which does that? I'm using BLAST as my query > algorithm. You can try Cap3 http://genome.cs.mtu.edu/cap/cap3.html or Phrap http://phrap.org/ Remo From michael.watson at bbsrc.ac.uk Fri Jun 18 09:44:51 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri Jun 18 09:47:36 2004 Subject: [Bioperl-l] Find tiling path Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C26E8@iahce2knas1.iah.bbsrc.reserved> Hi Thank you, sequence assembly will come after the step I am talking about :-) What I have is about 30,000 sequences, perhaps only 100 of which span the region I want to create a contig for. The step I am talking about is identifying, out of the 30,000 sequences, those that match to the region I want to contig - and only being in posession of the sequences at one or both ends of that contig i.e. 
I want to start with a sequence at the 5' end, and kind of "walk" along the sequence in the 3' direction by identifying hits in the database that take me in that direction.

Thanks
Mick

-----Original Message-----
From: Remo Sanges [mailto:sanges@biogem.it]
Sent: 18 June 2004 14:40
To: michael watson (IAH-C)
Cc: BioPerl-List
Subject: Re: [Bioperl-l] Find tiling path

On Jun 18, 2004, at 3:29 PM, michael watson (IAH-C) wrote:
> Quick question:
>
> I have a database of sequences. One of my queries hits one sequence
> in the DB, another hits a different sequence in the DB. I have reason
> to believe that other sequences in the DB will form a "tiling path"
> between sequence 1 and sequence 2 so I can create a contig that spans
> the gap. Is there something which does that? I'm using BLAST as my
> query algorithm.

You can try Cap3 http://genome.cs.mtu.edu/cap/cap3.html
or Phrap http://phrap.org/

Remo

From axl163 at yahoo.com Fri Jun 18 09:51:29 2004
From: axl163 at yahoo.com (Allen Liu)
Date: Fri Jun 18 09:54:44 2004
Subject: [Bioperl-l] Bio::Graphics - Here is the program in a better format
Message-ID:

#!/usr/bin/perl -w
use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

chomp (my $file = shift(@ARGV));
open(INPUT, "<$file") or die "Cannot open \"$file\": $!\n";
my @data = <INPUT>;
close INPUT;

my $panel;
my $flag = 1;
my ($feature, $track);

foreach(@data) {
    chomp;
    next if /^\#/; # ignore comments
    my ($name, $length, $domain, $score, $start, $end) = split /\t+/;

    if($flag == 1) {
        # draw panel
        $panel = Bio::Graphics::Panel->new(-length => $length,
            -width => 800,
            -pad_left => 10,
            -pad_right => 10,
            );

        # draw reference ruler of size sequence
        my $full_length = Bio::SeqFeature::Generic->new(-start=>1, -end=>$length);
        $panel->add_track($full_length,
            -glyph => 'arrow',
            -tick => 2,
            -fgcolor => 'black',
            -double => 1,
            -label => "$name",
            );
        $flag = 0;

        $track = $panel -> add_track(-glyph => 'rndrect',
            -label => 1,
            -bgcolor => 'blue',
            -min_score => 0,
            -max_score => 1000,
            -font2color
=> 'red', -sort_order => 'high_score', -description => sub { my $feature = shift; my $score = $feature->score; if ($score =~ /E/) { return "e-value=$score"; } else { return "score=$score"; } }); $feature = Bio::SeqFeature::Generic->new(-display_name=>$domain, -score=>$score, -start=>$start, -end=>$end); } my $subfeature = Bio::SeqFeature::Generic->new(-label =>$domain, -display_name=>$domain, -score=>$score, -start=>$start, -end=>$end); $feature->add_SeqFeature($subfeature, 'EXPAND'); } $track -> add_feature($feature); print $panel -> png; exit; From sanges at biogem.it Fri Jun 18 09:59:49 2004 From: sanges at biogem.it (Remo Sanges) Date: Fri Jun 18 10:02:46 2004 Subject: [Bioperl-l] Bio::Graphics - I am having trouble drawing protein domain topology In-Reply-To: <20040618121546.68398.qmail@web41510.mail.yahoo.com> References: <20040618121546.68398.qmail@web41510.mail.yahoo.com> Message-ID: On Jun 18, 2004, at 2:15 PM, Allen Liu wrote: > Hi Bioperl-users, > > I am a bioperl newbie and I am having a hard time > trying to get Bio::Graphics to do what I need it to. > I have been attempting to write a script that would > read a file like the following: > >> AAM43765 456 Bombesin 4.47E-03 307 316 >> AAM43765 456 PROTEIN_KINASE_ATP 8.00E-05 134 167 >> AAM43765 456 PROTEIN_KINASE_ST 8.00E-05 247 259 > > > The first column is the name of my protein. The > second column is the length of the protein. The third > column is the name of the domain. The fourth column > is the e-value. The fifth and sixth columns are the > start and ends of the domain. > > I have based my initial efforts on Lincoln Stein's > examples which works fine and I was able to get all my > domains on one track, but I could not get it to label > any other domains other than the first domain. Also, > I could not get the solid line that links the domains > to start from the beginning and go all the way to the > end of the protein. 
Hi Allen,

I had a quick look at your code; please consider this:

- You should create the 'panel' only once and then add all the features to it.

- I think that even if you EXPAND one feature, you can use only one description and/or score, so if you want all the domains in the description you should append each new one to the old one. Alternatively, you can use a separate feature for each domain in order to give each one its own description.

HTH

Remo

From axl163 at mac.com Fri Jun 18 09:44:59 2004
From: axl163 at mac.com (Janet Smith)
Date: Fri Jun 18 10:19:59 2004
Subject: [Bioperl-l] Bio::Graphics - Here is the program in a better format
Message-ID:

#!/usr/bin/perl -w
use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

chomp (my $file = shift(@ARGV));
open(INPUT, "<$file") or die "Cannot open \"$file\": $!\n";
my @data = <INPUT>;
close INPUT;

my $panel;
my $flag = 1;
my ($feature, $track);

foreach(@data) {
    chomp;
    next if /^\#/; # ignore comments
    my ($name, $length, $domain, $score, $start, $end) = split /\t+/;

    if($flag == 1) {
        # draw panel
        $panel = Bio::Graphics::Panel->new(-length => $length,
            -width => 800,
            -pad_left => 10,
            -pad_right => 10,
            );

        # draw reference ruler of size sequence
        my $full_length = Bio::SeqFeature::Generic->new(-start=>1, -end=>$length);
        $panel->add_track($full_length,
            -glyph => 'arrow',
            -tick => 2,
            -fgcolor => 'black',
            -double => 1,
            -label => "$name",
            );
        $flag = 0;

        $track = $panel -> add_track(-glyph => 'rndrect',
            -label => 1,
            -bgcolor => 'blue',
            -min_score => 0,
            -max_score => 1000,
            -font2color => 'red',
            -sort_order => 'high_score',
            -description => sub {
                my $feature = shift;
                my $score = $feature->score;
                if ($score =~ /E/) {
                    return "e-value=$score";
                } else {
                    return "score=$score";
                }
            });

        $feature = Bio::SeqFeature::Generic->new(-display_name=>$domain,
            -score=>$score,
            -start=>$start,
            -end=>$end);
    }

    my $subfeature = Bio::SeqFeature::Generic->new(-label =>$domain,
        -display_name=>$domain,
        -score=>$score,
        -start=>$start,
        -end=>$end);
    $feature->add_SeqFeature($subfeature,
'EXPAND'); } $track -> add_feature($feature); print $panel -> png; exit; From crabtree at tigr.org Fri Jun 18 10:32:30 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Fri Jun 18 10:35:55 2004 Subject: [Bioperl-l] Bio::Graphics - Here is the program in a better format Message-ID: Allen- To get the connecting lines to extend all the way to the edges of the image, simply expand your top-level $feature. Change the following statement so that it uses "-start => 1, -end => $length" instead: > $feature = > Bio::SeqFeature::Generic->new(-display_name=>$domain, > -score=>$score, -start=>$start, -end=>$end); > I think you can safely ignore Remo's first comment, since you are in fact creating the Panel only once (thanks to your $flag variable.) With regards to the domain labels, however, I think that Remo is on the right track (no pun intended.) The problem here is that whenever you have a feature with subfeatures (as is the case with your $feature and $subfeature objects), Bio::Graphics will only try to display labels and descriptions for the top-level feature, not the subfeatures. This is a feature, not a bug, and it's documented in Panel.pm. Anyway, this is why the only label and description you're seeing is the one you've assigned to $feature (which happens to be the name and score of the first domain to appear in your input file). As Remo says, one option is to assign this top-level feature a name that consists of concatenating all the individual domain names together. However, then you have the problem that the names and scores will no longer appear next to the subfeatures to which they refer. You may be able to get Bio::Graphics to label the subfeatures by clever use of the -all_callbacks option in Bio::Graphics::Panel, but I'm not sure about this. Your other option is to choose a slightly different graphical layout, for example using one top-level glyph to represent the protein and then overlaying a set of top-level "rndrect" glyphs for the domains. 
You can pass a coderef to the -glyph option to mix and match different glyph types within a single track.

Jonathan

From axl163 at yahoo.com Fri Jun 18 11:22:55 2004
From: axl163 at yahoo.com (Allen Liu)
Date: Fri Jun 18 11:25:52 2004
Subject: [Bioperl-l] Bio::Graphics - Is there anyway of having multiple top-level glyphs overlaying each other
Message-ID:

Is there any way of having multiple top-level glyphs overlaying each other? If so, that would make drawing protein domains a lot easier.

From cain at cshl.org Fri Jun 18 11:39:49 2004
From: cain at cshl.org (Scott Cain)
Date: Fri Jun 18 11:46:45 2004
Subject: [Bioperl-l] EST clustering software
Message-ID: <1087573189.1498.9.camel@localhost.localdomain>

Hello all,

I am looking for free and/or open source software for doing EST clustering. I am aware of StackPack and TGI Clustering tools from TIGR. Are there others I should know about?

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain@cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From crabtree at tigr.org Fri Jun 18 11:40:34 2004
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Fri Jun 18 11:57:57 2004
Subject: [Bioperl-l] Bio::Graphics - Is there anyway of having multiple top-level glyphs overlaying each other
Message-ID:

Allen-

> Is there any way of having multiple top-level glyphs
> overlaying each other? If so, that would make drawing protein
> domains a lot easier.

Sure, just set "-bump => 0" when you create the track (see attached code and png). The only problems with this approach are:

1. the use of a whitespace label to center the 'line' glyph correctly, and
2. since bump == 0 there's nothing to stop the labels of the domain glyphs from overlapping if the domains are close together (but this is an inherent drawback of the particular layout you want to achieve.)

Jonathan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: prog.pl
Type: application/octet-stream
Size: 2811 bytes
Desc: prog.pl
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20040618/51aba71f/prog.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.png
Type: image/png
Size: 1679 bytes
Desc: test.png
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20040618/51aba71f/test.png

From mebradley at chem.ufl.edu Fri Jun 18 14:02:11 2004
From: mebradley at chem.ufl.edu (Michael Bradley)
Date: Fri Jun 18 14:57:26 2004
Subject: [Bioperl-l] clustal w gap penalties
Message-ID: <000701c4555e$68f37910$8601a8c0@bradleydell>

Bioperlers,

The placement of gaps in a multiple sequence alignment is more accurate when guided by a secondary structure. What is the correct way to pass Clustal W a sequence containing secondary structure or gap penalty masks?

I have tried to do this with Bio::AlignIO.

my $str = Bio::AlignIO->new('-file' => 'file_with_!SS_or_!GM_mask.aln');
my $aln = $str->next_aln();

The 'file_with_!SS_or_!GM_mask.aln' is the standard Clustal representation of this information and looks like this:

CLUSTAL W (1.83) multiple sequence alignment

!SS_seq aaaAAAAAAaaa.bbbBBBBBBbbb (where a/A denote helix and b/B denote strand)
seq MyAminoAcidSequenceGoesHere

or

CLUSTAL W (1.83) multiple sequence alignment

!GM_seq 222444444222111222444444222
seq MyAminoAcidSequenceGoesHere

I get the sequence but the mask appears to be lost at this step. Any suggestions?

I have found some discussion on the mailing list about using Bio::Seq::Meta::Array to create the mask, followed by modifying some aspect of Bio::Tools::Run::Alignment::Clustalw to pass the mask to Clustal W. Has anyone charted the course on this yet?
Mike Bradley From reche at research.dfci.harvard.edu Fri Jun 18 14:24:15 2004 From: reche at research.dfci.harvard.edu (Pedro Antonio Reche) Date: Fri Jun 18 18:20:51 2004 Subject: [Bioperl-l] fasta genbank record with gid In-Reply-To: <200405120950.25416.heikki@ebi.ac.uk> References: <40A15E6D.90409@ish.de> <200405120950.25416.heikki@ebi.ac.uk> Message-ID: <40D3334F.3060604@research.dfci.harvard.edu> Hi all, I am using the following code provided by Jason Stajich to download genbank fasta records: #!/usr/sbin/perl -w # # How to retrieve GenBank entries over the Web # # by Jason Stajich # use Bio::DB::GenPept; use Bio::SeqIO; my $gb = new Bio::DB::GenPept; my $seqout = new Bio::SeqIO(-fh => \*STDOUT, -format => 'fasta'); my $seqio = $gb->get_Stream_by_id([ qw( 18606304 )]); while( defined ($seq = $seqio->next_seq )) { $seqout->write_seq($seq); } When I do this I get the following: >AAH22894 Thymic stromal lymphopoietin, isoform 2 [Homo sapiens]. MKTKAALAIWCPGYSETQINATQAMKKRRKRKVTTNKCLEQVSQLQGLWRRFNRPLLKQQ which is nice but I wonder if it will be possible to include the GI number in the fasta header so that I would get something like this >18606304|AAH22894 Thymic stromal lymphopoietin, isoform 2 [Homo sapiens]. MKTKAALAIWCPGYSETQINATQAMKKRRKRKVTTNKCLEQVSQLQGLWRRFNRPLLKQQ Thanks in advance for any help. 
Regards, pdro From jason at cgt.duhs.duke.edu Sat Jun 19 10:46:28 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Jun 19 10:48:45 2004 Subject: [Bioperl-l] fasta genbank record with gid In-Reply-To: <40D3334F.3060604@research.dfci.harvard.edu> References: <40A15E6D.90409@ish.de> <200405120950.25416.heikki@ebi.ac.uk> <40D3334F.3060604@research.dfci.harvard.edu> Message-ID: (This only works if you have read in a format which has a GI field in it like genbank) $seq->display_id( sprintf("gi|%d|%s|%s", $seq->primary_id, $seq->display_id, $seq->accession_number)); --jason On Fri, 18 Jun 2004, Pedro Antonio Reche wrote: > Hi all, > I am using the following code provided by Jason Stajich to download > genbank fasta records: > #!/usr/sbin/perl -w > # > # How to retrieve GenBank entries over the Web > # > # by Jason Stajich > # > use Bio::DB::GenPept; > use Bio::SeqIO; > my $gb = new Bio::DB::GenPept; > > my $seqout = new Bio::SeqIO(-fh => \*STDOUT, -format => 'fasta'); > > my $seqio = $gb->get_Stream_by_id([ qw( 18606304 )]); > > while( defined ($seq = $seqio->next_seq )) { > $seqout->write_seq($seq); > } > > When I do this I get the following: > >AAH22894 Thymic stromal lymphopoietin, isoform 2 [Homo sapiens]. > MKTKAALAIWCPGYSETQINATQAMKKRRKRKVTTNKCLEQVSQLQGLWRRFNRPLLKQQ > > which is nice but I wonder if it will be possible to include the GI > number in the fasta header so that I would get something like this > > >18606304|AAH22894 Thymic stromal lymphopoietin, isoform 2 [Homo sapiens]. > MKTKAALAIWCPGYSETQINATQAMKKRRKRKVTTNKCLEQVSQLQGLWRRFNRPLLKQQ > > Thanks in advance for any help. 
> Regards, > > pdro > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From bmb9jrm at bmb.leeds.ac.uk Sun Jun 20 05:43:42 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Sun Jun 20 05:46:19 2004 Subject: [Bioperl-l] Conversion of contig coordinates to chromosome Message-ID: <1087724622.6354.37.camel@localhost.localdomain> Hi all, This is kind of mixed Ensembl API/Bioperl question, so I've posted to both lists. I've been using the bioperl remoteblast module to locate sequences to the genomes of humans, and other organisms. I then use the resulting contig coordinates (I didn't have much luck searching the NCBI 'chromosome' database for non-human sequences) to retrieve high-quality information via the contig coordinate system of ensembl, which I can then 'project' to get chromosomal coordinates. I thought this was working well, but have found that not all NCBI contigs are listed in ensembl. Is there a way to 'project' contig coordinates onto a chromosome without using ensembl? I could then extract from Ensembl using the 'chromosome' coordinate system. I only really need all the features for human sequences, which seem to work okay (though I'd like to have the information available in other organisms anyway), so downloading an entire NCBI contig in order to extract the subsequence is an option, but I'd rather not, since the files are big and it would take a while (permanent local storage not an option). Any suggestions appreciated. 
Jon From suzuki at cbl.umces.edu Sun Jun 20 19:02:22 2004 From: suzuki at cbl.umces.edu (Marcelino Suzuki) Date: Sun Jun 20 19:07:20 2004 Subject: [Bioperl-l] FileCache.pm error Message-ID: I am trying to run a script for getting CDS out of Genbank by Jason Stajich below that I saved as test2.pl, and get the following error message, that I believe is caused by my bioperl configuration (I just installed bioperl in MacOS X: ------------- EXCEPTION ------------- MSG: Could not open primary index file STACK Bio::DB::FileCache::_open_database /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321 STACK Bio::DB::FileCache::new /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127 STACK toplevel test2.pl:14 Does anyone have any idea why I get this error? Thanks Marcelino #!/usr/bin/perl -w use strict; use Bio::DB::GenBank; use Bio::DB::GenPept; use Bio::DB::FileCache; use Bio::Factory::FTLocationFactory; use Bio::SeqFeature::Generic; my $ntdb = new Bio::DB::GenBank; my $pepdb= new Bio::DB::GenPept; # do some caching in the event you're pulling up the same # chromosome and/or you are debugging my $cachent = new Bio::DB::FileCache(-kept => 1, -file => '/tmp/cache/nt.idx', -seqdb => $ntdb); my $cachepep = new Bio::DB::FileCache(-kept => 1, -file => '/tmp/cache/pep.idx', -seqdb => $pepdb); # obj to turn strings into Bio::Location object my $locfactory = new Bio::Factory::FTLocationFactory; # you might get these from a file (and they can be accessions too) my @protgis = (10956263); foreach my $gi ( @protgis ) { my $protseq = $cachepep->get_Seq_by_id($gi); if( ! 
$protseq ) { print STDERR "could not find a seq for gi:$gi\n"; next; } foreach my $cds ( grep { $_->primary_tag eq 'CDS' } $protseq->get_SeqFeatures() ) { next unless( $cds->has_tag('coded_by') ); # skip CDSes with no coded_by my ($codedby) = $cds->each_tag_value('coded_by'); my ($ntacc,$loc) = split(/\:/, $codedby); $ntacc =~ s/(\.\d+)//; # genbank wants an accession not a versioned one my $cdslocation = $locfactory->from_string($loc); my $cdsfeature = new Bio::SeqFeature::Generic(-location => $cdslocation); my $ntseq = $cachent->get_Seq_by_acc($ntacc); next unless $ntseq; $ntseq->add_SeqFeature($cdsfeature); # locate the feature on a seq my $cdsseq = $cdsfeature->spliced_seq(); print "cds seq is ", $cdsseq->seq(), "\n"; } } ======================================================================== ==== oOOOOo Marcelino Suzuki, Assistant Professor oOOO Chesapeake Biological Lab - Univ of Maryland Center Environm Science oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688 .oOOOOOOOOOo. suzuki@cbl.umces.edu - http://cbl.umces.edu .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 000000000000000000000000000000000000000000000000000000000000000000000000 0000 From sutripa at vbi.vt.edu Sun Jun 20 19:58:39 2004 From: sutripa at vbi.vt.edu (Sucheta Tripathy) Date: Sun Jun 20 20:01:25 2004 Subject: [Bioperl-l] genbank to gff and vice versa Message-ID: <5.1.0.14.0.20040620195557.02c75328@mail.vbi.vt.edu> Hello, I was wondering which module most effectively does the conversion of genbank to gff and vice versa. I downloaded a script which needs Gff.pm, and whan I wanted to install it with CPAN it could not locate. SO I downloaded the Gff.pm from some other source, but it does not seem to do the job. Any suggestions. 
From jason at cgt.duhs.duke.edu Mon Jun 21 01:22:14 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jun 21 01:24:46 2004 Subject: [Bioperl-l] FileCache.pm error In-Reply-To: References: Message-ID: Did you make the directory /tmp/cache on your machine? The FileCache stuff is overkill depending on what you want to. You can also leave it out by just saying: my $cachent = $ntdb; my $cachepep= $pepdb; -jason On Sun, 20 Jun 2004, Marcelino Suzuki wrote: > I am trying to run a script for getting CDS out of Genbank by Jason > Stajich below that I saved as test2.pl, and get the following error > message, that I believe is caused by my bioperl configuration (I just > installed bioperl in MacOS X: > > ------------- EXCEPTION ------------- > MSG: Could not open primary index file > STACK Bio::DB::FileCache::_open_database > /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321 > STACK Bio::DB::FileCache::new > /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127 > STACK toplevel test2.pl:14 > > Does anyone have any idea why I get this error? > > Thanks > > Marcelino > > > #!/usr/bin/perl -w > use strict; > use Bio::DB::GenBank; > use Bio::DB::GenPept; > use Bio::DB::FileCache; > use Bio::Factory::FTLocationFactory; > use Bio::SeqFeature::Generic; > > my $ntdb = new Bio::DB::GenBank; > my $pepdb= new Bio::DB::GenPept; > > # do some caching in the event you're pulling up the same > # chromosome and/or you are debugging > my $cachent = new Bio::DB::FileCache(-kept => 1, > -file => '/tmp/cache/nt.idx', > -seqdb => $ntdb); > > my $cachepep = new Bio::DB::FileCache(-kept => 1, > -file => '/tmp/cache/pep.idx', > -seqdb => $pepdb); > > # obj to turn strings into Bio::Location object > my $locfactory = new Bio::Factory::FTLocationFactory; > > # you might get these from a file (and they can be accessions too) > my @protgis = (10956263); > > foreach my $gi ( @protgis ) { > my $protseq = $cachepep->get_Seq_by_id($gi); > if( ! 
$protseq ) { print STDERR "could not find a seq for gi:$gi\n"; > next; > } > foreach my $cds ( grep { $_->primary_tag eq 'CDS' } > $protseq->get_SeqFeatures() ) > { > next unless( $cds->has_tag('coded_by') ); # skip CDSes with no > coded_by > my ($codedby) = $cds->each_tag_value('coded_by'); > my ($ntacc,$loc) = split(/\:/, $codedby); > $ntacc =~ s/(\.\d+)//; # genbank wants an accession not a > versioned one > my $cdslocation = $locfactory->from_string($loc); > my $cdsfeature = new Bio::SeqFeature::Generic(-location => > $cdslocation); > my $ntseq = $cachent->get_Seq_by_acc($ntacc); > next unless $ntseq; > $ntseq->add_SeqFeature($cdsfeature); # locate the feature on a seq > my $cdsseq = $cdsfeature->spliced_seq(); > print "cds seq is ", $cdsseq->seq(), "\n"; > } > } > > > > ======================================================================== > ==== > oOOOOo Marcelino Suzuki, Assistant Professor > oOOO Chesapeake Biological Lab - Univ of Maryland > Center Environm Science > oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688 > .oOOOOOOOOOo. suzuki@cbl.umces.edu - > http://cbl.umces.edu > .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 > 000000000000000000000000000000000000000000000000000000000000000000000000 > 0000 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Mon Jun 21 01:24:39 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jun 21 01:26:54 2004 Subject: [Bioperl-l] genbank to gff and vice versa In-Reply-To: <5.1.0.14.0.20040620195557.02c75328@mail.vbi.vt.edu> References: <5.1.0.14.0.20040620195557.02c75328@mail.vbi.vt.edu> Message-ID: We take each feature from a genbank file and write it to a GFF writer (which takes features as input). 
my $out = Bio::Tools::GFF->new(-gff_version => 2); for my $feature ( $seq->get_SeqFeatures ) { $out->write_feature($feature); } We don't have a Gff.pm module - we have a GFF.pm which is properly called Bio::Tools::GFF and is in Bio/Tools/GFF.pm. -jason On Sun, 20 Jun 2004, Sucheta Tripathy wrote: > Hello, > > I was wondering which module most effectively does the conversion of > genbank to gff and vice versa. > I downloaded a script which needs Gff.pm, and whan I wanted to install it > with CPAN it could not locate. SO I downloaded the Gff.pm from some other > source, but it does not seem to do the job. > > Any suggestions. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From michael.watson at bbsrc.ac.uk Mon Jun 21 04:56:19 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon Jun 21 04:59:01 2004 Subject: [Bioperl-l] Electronic Chromosome Walking Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C26F7@iahce2knas1.iah.bbsrc.reserved> Hi I'm looking for some software that basically does an electronic chromosomal walk i.e. start with a known DNA sequence, search against a database of sequences, select the hits that extend the sequence furthest into the surrounding DNA and then start the cycle again with those new hits. I'd be very surprised if this hadn't been done.... Anyone know of anything? Thanks Mick From mcvicker at ebi.ac.uk Sun Jun 20 09:03:22 2004 From: mcvicker at ebi.ac.uk (Graham McVicker) Date: Mon Jun 21 08:40:30 2004 Subject: [Bioperl-l] Re: Conversion of contig coordinates to chromosome In-Reply-To: <1087724622.6354.37.camel@localhost.localdomain> Message-ID: On Sun, 20 Jun 2004, Jonathan Manning wrote: > I've been using the bioperl remoteblast module to locate sequences to > the genomes of humans, and other organisms. 
I then use the resulting > contig coordinates (I didn't have much luck searching the NCBI > 'chromosome' database for non-human sequences) to retrieve high-quality > information via the contig coordinate system of ensembl, which I can > then 'project' to get chromosomal coordinates. I thought this was > working well, but have found that not all NCBI contigs are listed in > ensembl. Hi Jonathon, Can you give an example of an NT contig which is not in the ensembl human database? If the contig is part of the NCBI34 assembly then we ought to have it in the homo_sapiens_core_22_34d database. Regards, Graham ---------------------------------------- Graham McVicker EMBL - European Bioinformatics Institute Cambridge CB10 1SD, UK Tel: +44 (0)1223-492584 Fax: +44 (0)1223-494468 ---------------------------------------- From bmb9jrm at bmb.leeds.ac.uk Mon Jun 21 10:05:47 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Mon Jun 21 10:08:22 2004 Subject: [Bioperl-l] Conversion of contig coordinates to chromosome Message-ID: <1087826747.6104.24.camel@localhost.localdomain> Thanks Haibo. I'm using the remoteblast.pm module, so I guess that's the most recent build. For the benefit of the list, I've solved the problem. Apparently the it's because I'm using a non-reference contig. The solution will be to reject Blast hits from such contigs. Thanks for the help, Jon On Mon, 2004-06-21 at 14:33, hz5@njit.edu wrote: > Hi Jon, > This file is mapping contig coordinates to chromosome locations. > ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/maps/mapview/BUILD.34/seq_contig.md.gz > > haibo > //cheers > P.S. which build are you using? 34 should be pretty standard everywhere > > Quoting Jonathan Manning : > > > Hi all, > > > > This is kind of mixed Ensembl API/Bioperl question, so I've posted to > > both lists. > > > > I've been using the bioperl remoteblast module to locate sequences to > > the genomes of humans, and other organisms. 
I then use the resulting > > contig coordinates (I didn't have much luck searching the NCBI > > 'chromosome' database for non-human sequences) to retrieve > > high-quality > > information via the contig coordinate system of ensembl, which I can > > then 'project' to get chromosomal coordinates. I thought this was > > working well, but have found that not all NCBI contigs are listed in > > ensembl. > > > > Is there a way to 'project' contig coordinates onto a chromosome > > without > > using ensembl? I could then extract from Ensembl using the > > 'chromosome' > > coordinate system. I only really need all the features for human > > sequences, which seem to work okay (though I'd like to have the > > information available in other organisms anyway), so downloading an > > entire NCBI contig in order to extract the subsequence is an option, > > but > > I'd rather not, since the files are big and it would take a while > > (permanent local storage not an option). > > > > Any suggestions appreciated. > > > > Jon > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > ========================================================= > Haibo Zhang, PhD student > Computational Biology, NJIT & Rutgers University > Center for Applied Genomics, PHRI > http://afs13.njit.edu/~hz5 > From reche at research.dfci.harvard.edu Mon Jun 21 10:22:23 2004 From: reche at research.dfci.harvard.edu (Pedro Antonio Reche) Date: Mon Jun 21 10:24:59 2004 Subject: [Bioperl-l] fasta genbank record with gid In-Reply-To: References: <40A15E6D.90409@ish.de> <200405120950.25416.heikki@ebi.ac.uk> <40D3334F.3060604@research.dfci.harvard.edu> Message-ID: <69B4F628-C38E-11D8-8F1F-000393BC20D0@research.dfci.harvard.edu> Dear Jason, thanks a lot for the code. it worked great. 
Best, pdro On Jun 19, 2004, at 10:46 AM, Jason Stajich wrote: > (This only works if you have read in a format which has a GI field in > it > like genbank) > > $seq->display_id( sprintf("gi|%d|%s|%s", $seq->primary_id, > $seq->display_id, $seq->accession_number)); > > --jason > On Fri, 18 Jun 2004, Pedro Antonio Reche wrote: > >> Hi all, >> I am using the following code provided by Jason Stajich to download >> genbank fasta records: >> #!/usr/sbin/perl -w >> # >> # How to retrieve GenBank entries over the Web >> # >> # by Jason Stajich >> # >> use Bio::DB::GenPept; >> use Bio::SeqIO; >> my $gb = new Bio::DB::GenPept; >> >> my $seqout = new Bio::SeqIO(-fh => \*STDOUT, -format => 'fasta'); >> >> my $seqio = $gb->get_Stream_by_id([ qw( 18606304 )]); >> >> while( defined ($seq = $seqio->next_seq )) { >> $seqout->write_seq($seq); >> } >> >> When I do this I get the following: >>> AAH22894 Thymic stromal lymphopoietin, isoform 2 [Homo sapiens]. >> MKTKAALAIWCPGYSETQINATQAMKKRRKRKVTTNKCLEQVSQLQGLWRRFNRPLLKQQ >> >> which is nice but I wonder if it will be possible to include the GI >> number in the fasta header so that I would get something like this >> >>> 18606304|AAH22894 Thymic stromal lymphopoietin, isoform 2 [Homo >>> sapiens]. >> MKTKAALAIWCPGYSETQINATQAMKKRRKRKVTTNKCLEQVSQLQGLWRRFNRPLLKQQ >> >> Thanks in advance for any help. 
>> Regards, >> >> pdro >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================== ====== Pedro A Reche, PhD Dana-Farber Cancer Institute (D1510A) TL: 617 632 3824 Harvard Medical School FX: 617 632 3351 44 Binney Street , EM: reche@research.dfci.harvard.edu Boston, MA 02115, USA W3: www.mifoundation.org From sutripa at vbi.vt.edu Mon Jun 21 10:38:57 2004 From: sutripa at vbi.vt.edu (Sucheta Tripathy) Date: Mon Jun 21 10:41:43 2004 Subject: [Bioperl-l] genbank to gff and vice versa In-Reply-To: References: <5.1.0.14.0.20040620195557.02c75328@mail.vbi.vt.edu> <5.1.0.14.0.20040620195557.02c75328@mail.vbi.vt.edu> Message-ID: <5.1.0.14.0.20040621103824.02c7cb60@mail.vbi.vt.edu> Is the reverse flow possible? from gff to genbank ? Thanks Sucheta At 01:24 AM 6/21/2004 -0400, Jason Stajich wrote: >We take each feature from a genbank file and write it to a >GFF writer (which takes features as input). > >my $out = Bio::Tools::GFF->new(-gff_version => 2); >for my $feature ( $seq->get_SeqFeatures ) { > $out->write_feature($feature); >} > >We don't have a Gff.pm module - we have a GFF.pm which is >properly called Bio::Tools::GFF and is in Bio/Tools/GFF.pm. > >-jason >On Sun, 20 Jun 2004, Sucheta Tripathy wrote: > > > Hello, > > > > I was wondering which module most effectively does the conversion of > > genbank to gff and vice versa. > > I downloaded a script which needs Gff.pm, and whan I wanted to install it > > with CPAN it could not locate. SO I downloaded the Gff.pm from some other > > source, but it does not seem to do the job. > > > > Any suggestions. 
> > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >-- >Jason Stajich >Duke University >jason at cgt.mc.duke.edu From Marc.Logghe at devgen.com Mon Jun 21 10:46:11 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Mon Jun 21 10:49:04 2004 Subject: [Bioperl-l] genbank to gff and vice versa Message-ID: > Is the reverse flow possible? > Only if you have your (fasta) sequence sitting somewhere you can turn a GFF line into a feature and add it. my $seq = $fasta->next_seq; my @feat; while () { chomp; next if /^$/; push @feat, Bio::SeqFeature::Generic->new( -gff_string => $_ ); } $seq->add_SeqFeature(@feat); or something alike. HTH, Marc *********************************************************** Marc Logghe, Ph.D. Senior Scientist Scientific Computing Group Devgen nv Technologiepark 9 B - 9052 Ghent-Zwijnaarde Belgium Tel: +32 9 324 24 88 Fax: +32 9 324 24 25 > **** DISCLAIMER ********************************************************** > "This e-mail and any attachments thereto may contain information > which is confidential and/or protected by intellectual property > rights and are intended for the sole use of the recipient(s) named above. > Any use of the information contained herein (including, but not limited to, > total or partial reproduction, communication or distribution in any form) > by persons other than the designated recipient(s) is prohibited. > If you have received this e-mail in error, please notify the sender either > by telephone or by e-mail and delete the material from any computer. > Thank you for your cooperation."
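The GFF-to-feature idea in the message above can be fleshed out into a small script. This is only a sketch under a few assumptions: the sequence sits in a FASTA file, the annotation is GFF version 2, and Bio::Tools::GFF is used as the reader (its next_feature method hands back feature objects) instead of building each Bio::SeqFeature::Generic from a -gff_string. The sub name gff_to_genbank and the file names in the usage line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Read one sequence from a FASTA file, attach every feature found in a
# GFF file to it, and write the annotated record in GenBank format.
# (gff_to_genbank is a hypothetical helper name, not a BioPerl function.)
sub gff_to_genbank {
    my ($fasta_file, $gff_file, $out_fh) = @_;

    my $seqin = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');
    my $seq   = $seqin->next_seq
        or die "no sequence found in $fasta_file\n";

    # Assumes GFF version 2; change -gff_version if the file differs.
    my $gffin = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 2);
    while (my $feature = $gffin->next_feature) {
        $seq->add_SeqFeature($feature);
    }

    my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'genbank');
    $out->write_seq($seq);
    return $seq;
}

# Usage: perl gff2genbank.pl sequence.fasta features.gff > annotated.gb
gff_to_genbank($ARGV[0], $ARGV[1], \*STDOUT) if @ARGV == 2;
```

Every feature in the GFF file is attached to the single sequence, whatever its first column says. A GFF file that describes several sequences would need the features filtered by their seq_id before being added.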
From t.fiedler at umiami.edu Mon Jun 21 11:22:54 2004 From: t.fiedler at umiami.edu (Tristan Fiedler) Date: Mon Jun 21 11:25:32 2004 Subject: [Bioperl-l] RE: EST Clustering software Message-ID: <37648.132.204.27.91.1087831374.squirrel@132.204.27.91> Message: 2 Date: Fri, 18 Jun 2004 11:39:49 -0400 From: Scott Cain Subject: [Bioperl-l] EST clustering software To: Bioperl list Widely used open source EST software: cap3 and phrap. Both are quite good. Tristan Fiedler -------------------------------------------------------------- Scott Cain wrote: Hello all, I am looking for free and/or open source software for doing EST clustering. I am aware of StackPack and TGI Clustering tools from TIGR. Are there others I should know about? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -- Tristan J. Fiedler, Ph.D. Postdoctoral Research Fellow - Walsh Laboratory NIEHS Marine & Freshwater Biomedical Sciences Center Rosenstiel School of Marine & Atmospheric Sciences University of Miami tfiedler@rsmas.miami.edu t.fiedler@umiami.edu (alias) 305-361-4626 From suzuki at cbl.umces.edu Mon Jun 21 11:33:24 2004 From: suzuki at cbl.umces.edu (Marcelino Suzuki) Date: Mon Jun 21 11:36:06 2004 Subject: [Bioperl-l] FileCache.pm error In-Reply-To: References: Message-ID: <55B5C644-C398-11D8-93E2-0003939E064E@cbl.umces.edu> Thanks Jason. That worked. I have another question. The script works well, but I was wondering whether I can get the same CDS sequences in genbank format. I was able to create an HTML file (using sed and awk) from a blast search containing links to all 400 such sequences from proteins I am working with, i.e.: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34112904&itemID=36&view=gbwithparts and could get each sequence individually using the browser, but is there a way to batch those requests using bioperl?
Thanks Marcelino On Jun 21, 2004, at 1:22 AM, Jason Stajich wrote: > Did you make the directory > /tmp/cache > on your machine? > > The FileCache stuff is overkill depending on what you want to. > > You can also leave it out by just saying: > > my $cachent = $ntdb; > my $cachepep= $pepdb; > > -jason > On Sun, 20 Jun 2004, Marcelino Suzuki wrote: > >> I am trying to run a script for getting CDS out of Genbank by Jason >> Stajich below that I saved as test2.pl, and get the following error >> message, that I believe is caused by my bioperl configuration (I just >> installed bioperl in MacOS X: >> >> ------------- EXCEPTION ------------- >> MSG: Could not open primary index file >> STACK Bio::DB::FileCache::_open_database >> /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321 >> STACK Bio::DB::FileCache::new >> /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127 >> STACK toplevel test2.pl:14 >> >> Does anyone have any idea why I get this error? >> >> Thanks >> >> Marcelino >> >> >> #!/usr/bin/perl -w >> use strict; >> use Bio::DB::GenBank; >> use Bio::DB::GenPept; >> use Bio::DB::FileCache; >> use Bio::Factory::FTLocationFactory; >> use Bio::SeqFeature::Generic; >> >> my $ntdb = new Bio::DB::GenBank; >> my $pepdb= new Bio::DB::GenPept; >> >> # do some caching in the event you're pulling up the same >> # chromosome and/or you are debugging >> my $cachent = new Bio::DB::FileCache(-kept => 1, >> -file => '/tmp/cache/nt.idx', >> -seqdb => $ntdb); >> >> my $cachepep = new Bio::DB::FileCache(-kept => 1, >> -file => '/tmp/cache/pep.idx', >> -seqdb => $pepdb); >> >> # obj to turn strings into Bio::Location object >> my $locfactory = new Bio::Factory::FTLocationFactory; >> >> # you might get these from a file (and they can be accessions too) >> my @protgis = (10956263); >> >> foreach my $gi ( @protgis ) { >> my $protseq = $cachepep->get_Seq_by_id($gi); >> if( ! 
$protseq ) { print STDERR "could not find a seq for >> gi:$gi\n"; >> next; >> } >> foreach my $cds ( grep { $_->primary_tag eq 'CDS' } >> $protseq->get_SeqFeatures() ) >> { >> next unless( $cds->has_tag('coded_by') ); # skip CDSes with no >> coded_by >> my ($codedby) = $cds->each_tag_value('coded_by'); >> my ($ntacc,$loc) = split(/\:/, $codedby); >> $ntacc =~ s/(\.\d+)//; # genbank wants an accession not a >> versioned one >> my $cdslocation = $locfactory->from_string($loc); >> my $cdsfeature = new Bio::SeqFeature::Generic(-location => >> $cdslocation); >> my $ntseq = $cachent->get_Seq_by_acc($ntacc); >> next unless $ntseq; >> $ntseq->add_SeqFeature($cdsfeature); # locate the feature on a >> seq >> my $cdsseq = $cdsfeature->spliced_seq(); >> print "cds seq is ", $cdsseq->seq(), "\n"; >> } >> } >> >> >> >> ====================================================================== >> == >> ==== >> oOOOOo Marcelino Suzuki, Assistant >> Professor >> oOOO Chesapeake Biological Lab - Univ of >> Maryland >> Center Environm Science >> oOOOOOo. PO Box 38, One Williams St Solomons, MD >> 20688 >> .oOOOOOOOOOo. suzuki@cbl.umces.edu - >> http://cbl.umces.edu >> .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 >> 0000000000000000000000000000000000000000000000000000000000000000000000 >> 00 >> 0000 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================== ==== oOOOOo Marcelino Suzuki, Assistant Professor oOOO Chesapeake Biological Lab - Univ of Maryland Center Environm Science oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688 .oOOOOOOOOOo. 
suzuki@cbl.umces.edu - http://cbl.umces.edu .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 000000000000000000000000000000000000000000000000000000000000000000000000 0000 From jason at cgt.duhs.duke.edu Mon Jun 21 14:34:25 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jun 21 14:36:43 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] tigr to genbank In-Reply-To: <200406211404.51388.jjhaveri@vbi.vt.edu> References: <200406211404.51388.jjhaveri@vbi.vt.edu> Message-ID: Can you show what the format looks like? On Mon, 21 Jun 2004, Jinal Jhaveri wrote: > Can any one give me directions onto how I can convert the tab-formatted tigr > data to genbank formatted data? Does this type of script exists? > > > thanks > --Jinal > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Mon Jun 21 14:39:09 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jun 21 14:41:49 2004 Subject: [Bioperl-l] FileCache.pm error In-Reply-To: <55B5C644-C398-11D8-93E2-0003939E064E@cbl.umces.edu> References: <55B5C644-C398-11D8-93E2-0003939E064E@cbl.umces.edu> Message-ID: You might be wanting to try SearchIO for parsing BLAST but sed and awk will work I guess. To write sequences in genbank format: my $out = Bio::SeqIO->new(-format => 'genbank'); $out->write_seq($cdsseq); If you want to get things in Batch from genbank see Bio::DB::GenBank. -jason On Mon, 21 Jun 2004, Marcelino Suzuki wrote: > Thanks Jason. That worked. > > I have another question. The script works well, but I was wondering > whether I can get the same CDS sequences in genbank format. I was able > to create a html file (using sed and awk) from a blast search > containing links to al 400 such sequences from proteins I am working > with, ie: > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> val=34112904&itemID=36&view=gbwithparts > > and could get each sequence individually using the browser, but is > there a way to batch those requests using bioperl? > > Thanks > > Marcelino > On Jun 21, 2004, at 1:22 AM, Jason Stajich wrote: > > > Did you make the directory > > /tmp/cache > > on your machine? > > > > The FileCache stuff is overkill depending on what you want to. > > > > You can also leave it out by just saying: > > > > my $cachent = $ntdb; > > my $cachepep= $pepdb; > > > > -jason > > On Sun, 20 Jun 2004, Marcelino Suzuki wrote: > > > >> I am trying to run a script for getting CDS out of Genbank by Jason > >> Stajich below that I saved as test2.pl, and get the following error > >> message, that I believe is caused by my bioperl configuration (I just > >> installed bioperl in MacOS X: > >> > >> ------------- EXCEPTION ------------- > >> MSG: Could not open primary index file > >> STACK Bio::DB::FileCache::_open_database > >> /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321 > >> STACK Bio::DB::FileCache::new > >> /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127 > >> STACK toplevel test2.pl:14 > >> > >> Does anyone have any idea why I get this error? 
> >> > >> Thanks > >> > >> Marcelino > >> > >> > >> #!/usr/bin/perl -w > >> use strict; > >> use Bio::DB::GenBank; > >> use Bio::DB::GenPept; > >> use Bio::DB::FileCache; > >> use Bio::Factory::FTLocationFactory; > >> use Bio::SeqFeature::Generic; > >> > >> my $ntdb = new Bio::DB::GenBank; > >> my $pepdb= new Bio::DB::GenPept; > >> > >> # do some caching in the event you're pulling up the same > >> # chromosome and/or you are debugging > >> my $cachent = new Bio::DB::FileCache(-kept => 1, > >> -file => '/tmp/cache/nt.idx', > >> -seqdb => $ntdb); > >> > >> my $cachepep = new Bio::DB::FileCache(-kept => 1, > >> -file => '/tmp/cache/pep.idx', > >> -seqdb => $pepdb); > >> > >> # obj to turn strings into Bio::Location object > >> my $locfactory = new Bio::Factory::FTLocationFactory; > >> > >> # you might get these from a file (and they can be accessions too) > >> my @protgis = (10956263); > >> > >> foreach my $gi ( @protgis ) { > >> my $protseq = $cachepep->get_Seq_by_id($gi); > >> if( ! $protseq ) { print STDERR "could not find a seq for > >> gi:$gi\n"; > >> next; > >> } > >> foreach my $cds ( grep { $_->primary_tag eq 'CDS' } > >> $protseq->get_SeqFeatures() ) > >> { > >> next unless( $cds->has_tag('coded_by') ); # skip CDSes with no > >> coded_by > >> my ($codedby) = $cds->each_tag_value('coded_by'); > >> my ($ntacc,$loc) = split(/\:/, $codedby); > >> $ntacc =~ s/(\.\d+)//; # genbank wants an accession not a > >> versioned one > >> my $cdslocation = $locfactory->from_string($loc); > >> my $cdsfeature = new Bio::SeqFeature::Generic(-location => > >> $cdslocation); > >> my $ntseq = $cachent->get_Seq_by_acc($ntacc); > >> next unless $ntseq; > >> $ntseq->add_SeqFeature($cdsfeature); # locate the feature on a > >> seq > >> my $cdsseq = $cdsfeature->spliced_seq(); > >> print "cds seq is ", $cdsseq->seq(), "\n"; > >> } > >> } > >> > >> > >> > >> ====================================================================== > >> == > >> ==== > >> oOOOOo Marcelino Suzuki, Assistant > 
>> Professor > >> oOOO Chesapeake Biological Lab - Univ of > >> Maryland > >> Center Environm Science > >> oOOOOOo. PO Box 38, One Williams St Solomons, MD > >> 20688 > >> .oOOOOOOOOOo. suzuki@cbl.umces.edu - > >> http://cbl.umces.edu > >> .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 > >> 0000000000000000000000000000000000000000000000000000000000000000000000 > >> 00 > >> 0000 > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > ======================================================================== > ==== > oOOOOo Marcelino Suzuki, Assistant Professor > oOOO Chesapeake Biological Lab - Univ of Maryland > Center Environm Science > oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688 > .oOOOOOOOOOo. suzuki@cbl.umces.edu - > http://cbl.umces.edu > .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341 > 000000000000000000000000000000000000000000000000000000000000000000000000 > 0000 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Mon Jun 21 16:31:37 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jun 21 16:33:48 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] tigr to genbank In-Reply-To: <200406211545.44277.jjhaveri@vbi.vt.edu> References: <200406211404.51388.jjhaveri@vbi.vt.edu> <200406211545.44277.jjhaveri@vbi.vt.edu> Message-ID: Not much in there really fits into the concept of a genbank file. 
There is no sequence, the rest of the fields I guess you want them to be stored as DBlinks or annotations? If you can map the pieces you want to keep into a seq object you just put them in and then write out the sequence. I'll do the easy one, accession: use Bio::SeqIO; use Bio::Seq; my $seq = Bio::Seq->new(); my $out = Bio::SeqIO->new(-format => 'genbank'); $seq->accession_number($accession); $out->write_seq($seq); -jason On Mon, 21 Jun 2004, Jinal Jhaveri wrote: > TIGR Locus,TIGR Common Name,TIGR Gene Symbol,TIGR Enzyme Commission #,Primary > Locus,Primary Common Name,Primary Gene Symbol,Primary Enzyme Commission #,Pri > mary Annotation Comment,Primary 5' End,Primary Sequence Length,Primary Protein > Length,TIGR 5' End,TIGR 3' End,TIGR Sequence Length,TIGR Protein Length,Main > role,Subrole,SWISS-PROT/TrEMBL Accession,GenBank ID,TIGR MW,TIGR PI,TIGR > GC,Kingdom,Family,Organism Name,DNA molecule,GO Term > > NT02AT0001,conserved hypothetical protein,,,Atu0001,conserved hypothetical > protein,,,product=conserved hypothetical protein note=identified by sequence > sim > ilarity; putative; ORF located using > Blastx/Glimmer,203,822,273,266,1024,759,,Hypothetical > proteins,Conserved,,,27885.89,7.0437,58.62,Bacteria,Proteobacter > ia,Agrobacterium tumefaciens C58 UWash,Circular chromosome A.tumefaciens C58 > UWash, > > ...................... > > > > I am sending you the header and one of the entries. > > thanks > --Jinal > > On Monday 21 June 2004 02:34 pm, you wrote: > > Can you show what the format looks like? > > > > On Mon, 21 Jun 2004, Jinal Jhaveri wrote: > > > Can any one give me directions onto how I can convert the tab-formatted > > > tigr data to genbank formatted data? Does this type of script exists? 
> > > > > > > > > thanks > > > --Jinal > > > _______________________________________________ > > > Bioperl-guts-l mailing list > > > Bioperl-guts-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From Marc.Logghe at devgen.com Tue Jun 22 03:50:01 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Tue Jun 22 03:52:54 2004 Subject: [Bioperl-l] ontology help Message-ID: Hi all, I am struggling with the Bio::Ontology::* packages and ontologies in general ... Suppose I have 3 ontologies: ONTa, ONTb and ONTa2ONTbMap. The latter is actually only containing relations between terms of the other 2 ontologies (subject terms belong to ONTa, object terms to ONTb) and predicate terms. The 3 Bio::Ontology::Ontology objects are fetched from biosql, by loading their terms and relations. Problem is how do I perform a query using the Bio::Ontology::* API in order to find all the relations in ONTa2ONTbMap to a term from ontology ONTa ? 
I tried it like this: my ($key) = $ONTa->find_terms(-name => 'primer_bind'); my ($rel) = $ONTa2ONTbMap->find_terms(-name => 'optional_qualifier_for'); my @rels = $ONTa2ONTbMap->get_relationships($key); but then I get an exception: ------------- EXCEPTION ------------- MSG: Found [scalar] where [Bio::Ontology::TermI] expected STACK Bio::Ontology::Relationship::_check_class /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Relationship.pm:378 STACK Bio::Ontology::Relationship::subject_term /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Relationship.pm:242 STACK Bio::Ontology::Relationship::new /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Relationship.pm:162 STACK Bio::Factory::ObjectFactory::create_object /home/marcl/src/bioperl/bioperl-live/Bio/Factory/ObjectFactory.pm:150 STACK Bio::Ontology::SimpleOntologyEngine::get_relationships /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/SimpleOntologyEngine.pm:504 STACK Bio::Ontology::Ontology::get_relationships /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Ontology.pm:386 STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD /home/marcl/src/bioperl/bioperl-db/Bio/DB/Persistent/PersistentObject.pm:541 STACK toplevel ./validate_feature.pl:22 -------------------------------------- When I change the last line to $key_ont->get_relationships(), an empty list is returned. I am obviously missing something. I am pretty sure that the relations are there (I checked with verbosity turned on while fetching from the database, and with a data dump of the ontology objects). Can somebody shed some light? Regards, Marc *********************************************************** Marc Logghe, Ph.D.
Senior Scientist Scientific Computing Group Devgen nv Technologiepark 9 B - 9052 Ghent-Zwijnaarde Belgium Tel: +32 9 324 24 88 Fax: +32 9 324 24 25 From michael.watson at bbsrc.ac.uk Tue Jun 22 05:28:16 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue Jun 22 05:30:56 2004 Subject: [Bioperl-l] Bio::SeqIO bug Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C2717@iahce2knas1.iah.bbsrc.reserved> Hi I am using bioperl 1.2.3 on linux. When converting from GenBank to EMBL for the RefSeq entry NC_002945, the following conversion occurs for CDS 2450558..2451643: /product="Probable nicotinate-nucleotide-dimethylbenzimidazol phosphoribosyltransferase CobT" To FT roduct="Probablenicotinate-nucleotide-dimethylbenzimidazol FT phosphoribosyltransferase CobT" Note the missing "p" from the FT entry "product", plus the missing spaces in the product text (presumably because "\n"s have been removed, but in this case they should be replaced by spaces) Rather odd?? Mick From sdavis2 at mail.nih.gov Tue Jun 22 06:34:03 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Jun 22 06:44:14 2004 Subject: [Bioperl-l] Bio::Tools::Run::Primer3 question Message-ID: I am using Bio::Tools::Run::Primer3 to find internal oligos.
I feed in sequences and get results (code below), but the code reliably dies at the same place (after the same number (about 247) of primer3 runs) regardless of the sequence passed as input. Can anyone shed some light on this behavior? Thanks, Sean ------------- EXCEPTION ------------- MSG: Can't open RESULTS STACK Bio::Tools::Run::Primer3::run /Library/Perl/5.8.1/Bio/Tools/Run/Primer3.pm:359 STACK toplevel junk.pl:56 -------------------------------------- Bio::Root::Root::throw('Bio::Tools::Run::Primer3=HASH(0xc4e01c)','Can\'t open RESULTS') called at /Library/Perl/5.8.1/Bio/Tools/Run/Primer3.pm line 359 Bio::Tools::Run::Primer3::run('Bio::Tools::Run::Primer3=HASH(0xc4e01c)') called at junk.pl line 56 Debugged program terminated. Use q to quit or R to restart, use O inhibit_exit to avoid stopping after program termination, h q, h R or h O to get additional info. ##########TEST CODE HERE######### #!/usr/bin/perl use strict; use warnings; use Bio::SeqIO; use Bio::Seq; use Bio::DB::GFF; use Bio::Tools::Run::Primer3; my $seq1=new Bio::Seq; my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', -dsn => 'host=localhost;database=gff_hg16'); print "What chromosomes to extract?"; my $chrom=<>; chomp($chrom); $chrom="chr" . $chrom; print "What start position?"; my $start=<>; chomp($start); print "What end position ?"; my $end=<>; chomp($end); print "How large a segment?"; my $length=<>; chomp($length); print "How far to skip? "; my $skip=<>; chomp($skip); for (my $i=$start;$i<=$end+$length;$i+=($length+$skip)) { my $segment=$db->segment(-name=>$chrom, -start=>$i, -end =>$i+$length); print "> $chrom | Start=$i;End=" . ($i+$length) . "\n"; my $dna=$segment->dna; $seq1->seq($dna); $seq1->primary_id("$chrom | Start=$i;End=" . ($i+$length)); $seq1->accession_number("$chrom | Start=$i;End=" . ($i+$length)); $seq1->id("$chrom | Start=$i;End=" .
($i+$length)); my $primer3 = Bio::Tools::Run::Primer3->new(); $primer3->add_targets(PRIMER_INTERNAL_OLIGO_MIN_SIZE => 65, PRIMER_INTERNAL_OLIGO_MAX_SIZE => 75, PRIMER_INTERNAL_OLIGO_MIN_TM => 73, PRIMER_INTERNAL_OLIGO_MAX_TM => 83, PRIMER_INTERNAL_OLIGO_OPT_TM => 78, PRIMER_INTERNAL_OLIGO_OPT_SIZE => 70, SEQUENCE => $dna, PRIMER_TASK => 'pick_hyb_probe_only' ); my $results; $results=$primer3->run(); if ($results->number_of_results>0) { foreach my $key (keys %{$results->primer_results(0)}) { print "$key\t${$results->all_results}{$key}\n"; } } } From jburdick at gradient.cis.upenn.edu Tue Jun 22 10:05:29 2004 From: jburdick at gradient.cis.upenn.edu (Josh Burdick) Date: Tue Jun 22 10:10:54 2004 Subject: [Bioperl-l] Bio::Tools::Run::Primer3 question In-Reply-To: References: Message-ID: <40D83CA9.2030809@gradient.cis.upenn.edu> Sean Davis wrote: >I am using Bio::Tools::Run::Primer3 to find internal oligos. I feed in >sequences and get results (code below), but the code reliably dies at the >same place (after the same number (about 247) of primer3 runs) regardless of >the sequence passed as input. Can anyone shed some light on this behavior? > > > I ended up writing my own parser (which doesn't use all the inheritance stuff that Bioperl seems to, but doesn't run into this problem either.) If anyone wants to use it, I'm happy to contribute it. I was getting the same problem. It was fairly mystifying. My guess is that it's opening a new input file for each run, and not closing it, and so it runs out of file descriptors. I didn't follow this up, because I couldn't see how files were being opened (the actual opening of files was, I think, in Bio::Root::IO, which says it closes files automatically when an object is destroyed. Perhaps these finalizers weren't being called? This is strictly a guess...) If you have /usr/sbin/lsof on your system, you perhaps could use that to see if the Perl process is using lots of file descriptors. 
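Where lsof is not available, roughly the same check can be made from inside the script. The sketch below is an illustration only and not part of Bioperl: count_open_fds is a hypothetical helper that probes each descriptor number with POSIX::fstat and counts the ones that answer. A count that climbs by one on every pass through the loop would fit the descriptor-leak guess. A per-process limit of 256 descriptors (a common default on Mac OS X) would also be consistent with a failure after about 247 runs, though that too is only a guess.

```perl
use strict;
use warnings;
use POSIX ();

# Count the file descriptors currently open in this process by asking
# fstat() about each possible descriptor number. Closed descriptors make
# POSIX::fstat return an empty list.
# (count_open_fds is a hypothetical helper name used for illustration.)
sub count_open_fds {
    my $max = POSIX::sysconf(&POSIX::_SC_OPEN_MAX);
    $max = 1024 unless defined $max && $max > 0;
    $max = 4096 if $max > 4096;    # keep the scan cheap on generous limits

    my $open = 0;
    for my $fd (0 .. $max - 1) {
        my @stat = POSIX::fstat($fd);
        $open++ if @stat;
    }
    return $open;
}

# Inside the primer3 loop one might add:
#   warn "open descriptors: ", count_open_fds(), "\n";
```

The scan is capped at 4096 descriptor numbers, so it would undercount in a process that holds more than that, which is far beyond the situation described here.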
At any rate, I didn't understand the code well enough to fix the problem. >Thanks, >Sean > > > Josh -- Josh Burdick jburdick@gradient.cis.upenn.edu http://www.cis.upenn.edu/~jburdick From fernan at iib.unsam.edu.ar Tue Jun 22 13:21:22 2004 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Tue Jun 22 13:28:43 2004 Subject: [Bioperl-l] Error parsing TIGR xml Message-ID: <20040622172122.GE93342@iib.unsam.edu.ar> Hi! I'm seeing an error while trying to parse a .coordset file from TIGR. It is my first attempt at using this kind of files, so perhaps I'm doing something wrong. Here's my brief script: #!/usr/bin/perl -w use strict; use Bio::SeqIO; my $seqio = Bio::SeqIO->new( -file => $ARGV[0], -format => 'tigr'); Just trying to create a SeqIO object produces the following error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: [2]Required missing STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.6.1/Bio/Root/Root.pm:328 STACK: Bio::SeqIO::tigr::throw /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:1338 STACK: Bio::SeqIO::tigr::_process_assembly /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:522 STACK: Bio::SeqIO::tigr::_process /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:423 STACK: Bio::SeqIO::tigr::_initialize /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:90 STACK: Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:358 STACK: Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:378 STACK: ./tigrxml2features.pl:6 ----------------------------------------------------------- The file does contain ASMBL_IDs, or at least that is what I believe. These are the first lines of the file
1047053397923 Trypanosoma cruzi
MKQSSTDGGGKQKGKDSVSSDSMKDAVTDNPGKPTATTIPTSR SGDAQEKEGKDDGTDERPTSKKHNSSPETGNTNDALTASENTPQTAETTATTVAKKNDTTIGDSDGSTAVSDTASPLLLL FLVVVACAAAAAVVAA*
I've found a mention of a tigrxml by Jason Stajich that was supposed to be different from the SeqIO::tigr by Josh Lauricha. But I don't seem to have it in my system (bioperl-1.4) Thanks in advance, Fernan PS: I'm CCing the author of the tigr.pm module, just in case. -- F e r n a n A g u e r o http://genoma.unsam.edu.ar/~fernan From jason at cgt.duhs.duke.edu Tue Jun 22 13:41:46 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jun 22 13:44:04 2004 Subject: [Bioperl-l] Error parsing TIGR xml In-Reply-To: <20040622172122.GE93342@iib.unsam.edu.ar> References: <20040622172122.GE93342@iib.unsam.edu.ar> Message-ID: I didn't submit tigrxml before - I've just put in CVS although I don't have time to check to see what works/doesn't. Have a look at in CVS. You'll need XML::SAX on your system. http://cvs.open-bio.org/ -jason On Tue, 22 Jun 2004, Fernan Aguero wrote: > Hi! > > I'm seeing an error while trying to parse a .coordset file > from TIGR. It is my first attempt at using this kind of > files, so perhaps I'm doing something wrong. 
> > Here's my brief script: > > #!/usr/bin/perl -w > > use strict; > use Bio::SeqIO; > > my $seqio = Bio::SeqIO->new( -file => $ARGV[0], -format => 'tigr'); > > Just trying to create a SeqIO object produces the following error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: [2]Required missing > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.6.1/Bio/Root/Root.pm:328 > STACK: Bio::SeqIO::tigr::throw /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:1338 > STACK: Bio::SeqIO::tigr::_process_assembly /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:522 > STACK: Bio::SeqIO::tigr::_process /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:423 > STACK: Bio::SeqIO::tigr::_initialize /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:90 > STACK: Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:358 > STACK: Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:378 > STACK: ./tigrxml2features.pl:6 > ----------------------------------------------------------- > > > The file does contain ASMBL_IDs, or at least that is what I > believe. These are the first lines of the file > > >
> 1047053397923 > Trypanosoma cruzi > > >
> "" ALT_LOCUS = "" COM_NAME = "hypothetical protein" PUB_COMMENT = "" COORDS = "1 > 67-586"> > > MKQSSTDGGGKQKGKDSVSSDSMKDAVTDNPGKPTATTIPTSR > SGDAQEKEGKDDGTDERPTSKKHNSSPETGNTNDALTASENTPQTAETTATTVAKKNDTTIGDSDGSTAVSDTASPLLLL > FLVVVACAAAAAVVAA* > > > > > > >
> > I've found a mention of a tigrxml by Jason Stajich that > was supposed to be different from the SeqIO::tigr by Josh > Lauricha. But I don't seem to have it in my system > (bioperl-1.4) > > > Thanks in advance, > > Fernan > > PS: I'm CCing the author of the tigr.pm module, just in > case. > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From skchan at cs.usask.ca Tue Jun 22 13:54:03 2004 From: skchan at cs.usask.ca (Simon K. Chan) Date: Tue Jun 22 13:56:39 2004 Subject: [Bioperl-l] Error parsing TIGR xml In-Reply-To: <20040622172122.GE93342@iib.unsam.edu.ar> References: <20040622172122.GE93342@iib.unsam.edu.ar> Message-ID: <1087926843.40d8723bbb094@webmail.usask.ca> Hi Fernan, Which DTD are you using? It looks like you have an older version of TIGR XML. You can find the newer TIGR XML DTD here: ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/BACS The code for the tigr.pm module is built to handle the newer format (though I've encountered a few problems myself...), which explains the error message that you are getting. (ie the ASMBL_ID is no longer specified as an attribute in the tag in the newer DTD). There's a TIGR XML parser that you can get from the ftp site, but I believe it can only handle certain features. Check here: ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/tools/TIGR_XML_parser.tar.gz I'm working on something similar as a side project, so let me know if you have other concerns. Cheers, -- Warmest Regards, Simon K. Chan Bioinformatics, Crosby Lab skchan@cs.usask.ca Quoting Fernan Aguero : > Hi! > > I'm seeing an error while trying to parse a .coordset file > from TIGR. It is my first attempt at using this kind of > files, so perhaps I'm doing something wrong. 
> > Here's my brief script: > > #!/usr/bin/perl -w > > use strict; > use Bio::SeqIO; > > my $seqio = Bio::SeqIO->new( -file => $ARGV[0], -format => 'tigr'); > > Just trying to create a SeqIO object produces the following error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: [2]Required missing > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Root/Root.pm:328 > STACK: Bio::SeqIO::tigr::throw > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:1338 > STACK: Bio::SeqIO::tigr::_process_assembly > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:522 > STACK: Bio::SeqIO::tigr::_process > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:423 > STACK: Bio::SeqIO::tigr::_initialize > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:90 > STACK: Bio::SeqIO::new > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:358 > STACK: Bio::SeqIO::new > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:378 > STACK: ./tigrxml2features.pl:6 > ----------------------------------------------------------- > > > The file does contain ASMBL_IDs, or at least that is what I > believe. These are the first lines of the file > > >
> 1047053397923 > Trypanosoma cruzi > > >
> = > "" ALT_LOCUS = "" COM_NAME = "hypothetical protein" PUB_COMMENT = "" COORDS = > "1 > 67-586"> > "167-586"> > > MKQSSTDGGGKQKGKDSVSSDSMKDAVTDNPGKPTATTIPTSR > SGDAQEKEGKDDGTDERPTSKKHNSSPETGNTNDALTASENTPQTAETTATTVAKKNDTTIGDSDGSTAVSDTASPLLLL > FLVVVACAAAAAVVAA* > > "167-586"/ > > > > > >
> > I've found a mention of a tigrxml by Jason Stajich that > was supposed to be different from the SeqIO::tigr by Josh > Lauricha. But I don't seem to have it in my system > (bioperl-1.4) > > > Thanks in advance, > > Fernan > > PS: I'm CCing the author of the tigr.pm module, just in > case. > > -- > F e r n a n A g u e r o > http://genoma.unsam.edu.ar/~fernan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Joshua-Orvis at ouhsc.edu Tue Jun 22 13:08:30 2004 From: Joshua-Orvis at ouhsc.edu (Orvis, Joshua D. (HSC)) Date: Tue Jun 22 14:43:57 2004 Subject: [Bioperl-l] creating a graphic Message-ID: <4208DEF4C0A61448BF41A5AE3A3FF788093B521D@GEMINI.hsc.net.ou.edu> i'm trying to use bioperl to create a graphical representation of what I have been doing using ASCII so far. i need to create illustrations of how a forward and reverse read have assembled, like this: (part below won't look right without a fixed-width font) 1 153 481 | | | F >--------------------------------> ||||||||||||||||||||| R <----------------------------------------< | | | 524 196 1 |________________________________________________| 675 What sort of glyphs would be best to try to do this? I would like them to have arrows on the ends to indicate the direction of forward and reverse reads. I can use triangles or the "pinsertion" to label the numbered positions, and a span to label the assembled length. Using the biographics examples as a guide I have written the code below, which will do it (sort of), but it feels like i'm missing an easier way to do it. Any advice would be great. 
Joshua

--------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use Bio::Graphics::Feature;

my $ftr = 'Bio::Graphics::Feature';

# The region to draw; it extends past both reads to leave a margin.
my $segment = $ftr->new(-start=>-100,-end=>900,-name=>'assembly',-type=>'clone');

# Forward read (1..481) and reverse read (153..675, minus strand).
my $forward = $ftr->new(-segments=>[[1, 481]], -name=>'forward',
                        -subtype=>'exon',-type=>'transcript');
my $reverse = $ftr->new(-segments=>[[153, 675]], -name=>'reverse', -strand => -1,
                        -subtype=>'exon',-type=>'transcript');

my $panel = Bio::Graphics::Panel->new(
    -gridcolor   => 'lightcyan',
    -grid        => 1,
    -segment     => $segment,
    -spacing     => 15,
    -width       => 600,
    -pad_top     => 20,
    -pad_bottom  => 20,
    -pad_left    => 20,
    -pad_right   => 20,
    -key_style   => 'between',
    -image_class => 'GD'
);

my $t = $panel->add_track(
    transcript2 => [$forward, $reverse],
    -label => 1,
    -bump  => 1,
    -key   => 'Assembly'
);

my $gd = $panel->gd;

## open an output file
open(my $ofh, '>', 'arrows.png') or die "can't create output file: $!\n";
binmode $ofh;    # PNG data is binary
print $ofh $gd->png;
close $ofh;
--------------------------------------

From laurichj at bioinfo.ucr.edu Tue Jun 22 13:35:54 2004
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Tue Jun 22 14:44:03 2004
Subject: [Bioperl-l] Re: Error parsing TIGR xml
In-Reply-To: <20040622172122.GE93342@iib.unsam.edu.ar>
References: <20040622172122.GE93342@iib.unsam.edu.ar>
Message-ID: <9D1D3AD8-C472-11D8-B7C5-000A95BBDAD2@bioinfo.ucr.edu>

The TIGR parser in bioperl 1.4 doesn't parse the coordset files; it parses the full-fledged TIGR XML releases. Jason wrote an unpublished parser for the coordsets, which, while similar, are different enough to really need a separate parser. One more thing: IIRC, the coordsets do not contain any sequence data, so you'll also need to look up the actual sequence.

On Jun 22, 2004, at 10:21 AM, Fernan Aguero wrote:

> Hi!
>
> I'm seeing an error while trying to parse a .coordset file
> from TIGR. It is my first attempt at using this kind of
> files, so perhaps I'm doing something wrong.
> > Here's my brief script: > > #!/usr/bin/perl -w > > use strict; > use Bio::SeqIO; > > my $seqio = Bio::SeqIO->new( -file => $ARGV[0], -format => 'tigr'); > > Just trying to create a SeqIO object produces the following error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: [2]Required missing > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Root/Root.pm:328 > STACK: Bio::SeqIO::tigr::throw > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:1338 > STACK: Bio::SeqIO::tigr::_process_assembly > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:522 > STACK: Bio::SeqIO::tigr::_process > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:423 > STACK: Bio::SeqIO::tigr::_initialize > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:90 > STACK: Bio::SeqIO::new > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:358 > STACK: Bio::SeqIO::new > /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:378 > STACK: ./tigrxml2features.pl:6 > ----------------------------------------------------------- > > > The file does contain ASMBL_IDs, or at least that is what I > believe. These are the first lines of the file > > >
> 1047053397923 > Trypanosoma cruzi > > >
> PUB_LOCUS = > "" ALT_LOCUS = "" COM_NAME = "hypothetical protein" PUB_COMMENT = "" > COORDS = "1 > 67-586"> > "167-586"> > > MKQSSTDGGGKQKGKDSVSSDSMKDAVTDNPGKPTATTIPTSR > SGDAQEKEGKDDGTDERPTSKKHNSSPETGNTNDALTASENTPQTAETTATTVAKKNDTTIGDSDGSTAVS > DTASPLLLL > FLVVVACAAAAAVVAA* > "167-586"> > "167-586"/ >> > > > >
>
> I've found a mention of a tigrxml by Jason Stajich that
> was supposed to be different from the SeqIO::tigr by Josh
> Lauricha. But I don't seem to have it in my system
> (bioperl-1.4)
>
>
> Thanks in advance,
>
> Fernan
>
> PS: I'm CCing the author of the tigr.pm module, just in
> case.
>
> --
> F e r n a n A g u e r o
> http://genoma.unsam.edu.ar/~fernan
>

Josh Lauricha
laurichj@bioinfo.ucr.edu
OpenPGP: 5A0D 92D3 D093 79DE F724 1137 6DF1 B5EB D9CE AAA8

From bmb9jrm at bmb.leeds.ac.uk Tue Jun 22 15:43:43 2004
From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning)
Date: Tue Jun 22 15:46:13 2004
Subject: [Bioperl-l] bioperl Vista module
Message-ID: <1087933423.6981.14.camel@localhost.localdomain>

Hi Shawn,

I haven't submitted this to the bioperl bugs list, as I'm not completely sure what I'm doing, and whether I'm right! However, I believe the following line should be inserted into your Vista.pm module at around line 399, in order to enable the display of SNPs (at least it made it work for me):

print $tfh1 "SNPS_FILE"." ".$self->snps_file."\n\n" if $self->snps_file;

Thanks- and sorry if you already know....

Jon

From catchen at cs.uoregon.edu Tue Jun 22 19:23:29 2004
From: catchen at cs.uoregon.edu (Julian M Catchen)
Date: Tue Jun 22 19:26:07 2004
Subject: [Bioperl-l] Getting clustalw alignments in a form codeml can use
Message-ID: <20040622232329.GC3102@topeka.cs.uoregon.edu>

Hello,

Can someone help me with the following problem? I am aligning sequences using clustalw and then storing the text strings from the alignment individually in a database.
I want to then pull these sequences from the database and feed them to PAML::codeml, but I can't figure out what object to load them into that codeml will understand. I can see how to add them to Bio::Seq objects, but what do I do with them after that? It appears to me that codeml requires a Bio::AlignI object in order to execute.

I have previously written the clustal alignments out to a file and then used a Bio::AlignIO object to read them in and give me my Bio::AlignI object for codeml, but I would like to do this entirely within the database and I would prefer not dumping a verbatim clustal file into the database. Anyone have any ideas?

Thanks in advance for any help,

julian

--
Julian M Catchen
Computer and Information Science | catchen@cs.uoregon.edu
229 Deschutes Hall | (541) 346-1382
University of Oregon | http://www.cs.uoregon.edu/~catchen/

From michael.watson at bbsrc.ac.uk Wed Jun 23 07:03:12 2004
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Jun 23 07:05:48 2004
Subject: [Bioperl-l] SearchIO error
Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C2732@iahce2knas1.iah.bbsrc.reserved>

Hi

I am using bioperl-1.4 now. I seem to get an error when using SearchIO. Perhaps I am not using it correctly? My script is:

use strict;
use warnings;
use IO::File;
use Bio::SearchIO;

# Read the BLAST report straight from a blastall pipe.
my $fh = new IO::File;
$fh->open("/usr/bin/blastall -p blastp -i test.fasta -d 10287/set2 |")
    or die "can't run blastall: $!\n";

my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
my $result = $searchio->next_result;

# Try to write the parsed result back out as a BLAST report.
my $search_out = Bio::SearchIO->new(-format => 'blast', -file => ">test_out.blast");
$search_out->write_result($result);

$fh->close;

The error I get is:

-------------------- WARNING ---------------------
MSG: Writer not defined. Using a Bio::Search::Writer::HitTableWriter
---------------------------------------------------
Can't locate object method "new" via package "Bio::Search::Writer::HitTableWriter" (perhaps you forgot to load "Bio::Search::Writer::HitTableWriter"?)
at /usr/local/bioperl-1.4/Bio/SearchIO/blast.pm line 1493, line 397.

Can anyone be of help?

Thanks
Mick

From jason at cgt.duhs.duke.edu Wed Jun 23 09:14:30 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Jun 23 09:16:45 2004
Subject: [Bioperl-l] Getting clustalw alignments in a form codeml can use
In-Reply-To: <20040622232329.GC3102@topeka.cs.uoregon.edu>
References: <20040622232329.GC3102@topeka.cs.uoregon.edu>
Message-ID:

Build a Bio::SimpleAlign object - that is what AlignIO gives you back. You need to build one from the data you got back from the database.

use Bio::SimpleAlign;
use Bio::LocatableSeq;

# @seqstringfromdatabase: the aligned sequence strings read from the database
# @yourids: the matching sequence ids, in the same order
my $simplealn = Bio::SimpleAlign->new();
my $i=0;
for my $seq ( @seqstringfromdatabase ) {
    my $lseq = Bio::LocatableSeq->new(-seq => $seq,
                                      -display_id => $yourids[$i++]);
    $simplealn->add_seq($lseq);
}

Then pass $simplealn to PAML.

-j

On Tue, 22 Jun 2004, Julian M Catchen wrote:

> Hello,
>
> Can someone help me with the following problem? I am aligning sequences using
> clustalw and then storing the text strings from the alignment individually in
> a database. I want to then pull these sequences from the database and feed
> them to PAML::codeml, but I can't figure out what object to load them into
> that codeml will understand. I can see how to add them to Bio::Seq objects,
> but what do I do with them after that? It appears to me that codeml requires
> a Bio::AlignI object in order to execute.
>
> I have previously written the clustal alignments out to a file and then used
> an Bio::AlignIO object to read them in and give me my Bio::AlignI object for
> codeml, but I would like to do this entirely within the database and I would
> prefer not dumping a verbatim clustal file into the database. Anyone have any
> ideas?
> > Thanks in advance for any help, > > julian > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Wed Jun 23 09:16:34 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jun 23 09:18:39 2004 Subject: [Bioperl-l] SearchIO error In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E957C2732@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E957C2732@iahce2knas1.iah.bbsrc.reserved> Message-ID: More correctly - what do you want to see in your out.blast? a table? the blast report? If you read the docs for how to use a hit writer you'll see it isn't as parallel to SeqIO type ways - you need to give SearchIO a writer object when you initialize it for writing. But if you are just re-writing out a blast file why parse it with SearchIO in the first place? -jason On Wed, 23 Jun 2004, michael watson (IAH-C) wrote: > Hi > > I am using bioperl-1.4 now. I seem to get an error when using SearchIO. > Perhaps I am not using it correctly? My script is: > > my $fh = new IO::File; > $fh->open("/usr/bin/blastall -p blastp -i test.fasta -d 10287/set2 |"); > > my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh); > my $result = $searchio->next_result; > > my $search_out = Bio::SearchIO->new(-format => 'blast', -file => > ">test_out.blast"); > $search_out->write_result($result); > > $fh->close; > > The error I get is: > > -------------------- WARNING --------------------- > MSG: Writer not defined. Using a Bio::Search::Writer::HitTableWriter > --------------------------------------------------- > Can't locate object method "new" via package > "Bio::Search::Writer::HitTableWriter" (perhaps you forgot to load > "Bio::Search::Writer::HitTableWriter"?) at > /usr/local/bioperl-1.4/Bio/SearchIO/blast.pm line 1493, line 397. > > Can anyone be of help? 
>
> Thanks
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From brian_osborne at cognia.com Wed Jun 23 10:43:47 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jun 23 10:46:22 2004
Subject: [Bioperl-l] Bio::SeqIO bug
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E957C2717@iahce2knas1.iah.bbsrc.reserved>
Message-ID:

Michael,

You should enter this as a bug at http://bugzilla.bioperl.org/ so we don't lose track of it. I confirmed that this is not fixed in version 1.4.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of michael watson (IAH-C)
Sent: Tuesday, June 22, 2004 5:28 AM
To: Bioperl
Subject: [Bioperl-l] Bio::SeqIO bug

Hi

I am using bioperl 1.2.3 on linux.

When converting from GenBank to EMBL for the RefSeq entry NC_002945, the following conversion occurs for CDS 2450558..2451643:

/product="Probable
nicotinate-nucleotide-dimethylbenzimidazol
phosphoribosyltransferase CobT"

To

FT roduct="Probablenicotinate-nucleotide-dimethylbenzimidazol
FT phosphoribosyltransferase CobT"

Note the missing "p" from the FT entry "product", plus the missing spaces in the product text (presumably because "\n"s have been removed, but in this case they should be replaced by spaces)

Rather odd??

Mick

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From hlapp at gmx.net Wed Jun 23 11:23:06 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jun 23 11:25:46 2004
Subject: [Bioperl-l] Bio::SeqIO bug
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E957C2717@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <3A16ACEA-C529-11D8-912C-000A959EB4C4@gmx.net>

This may have been fixed in 1.4.x.
Did you try a more recent release? -hilmar On Tuesday, June 22, 2004, at 11:28 AM, michael watson (IAH-C) wrote: > Hi > > I am using bioperl 1.2.3 on linux. > > When converting from GenBank to EMBL for the RefSeq entry NC_002945, > the > following conversion occurs for CDS 2450558..2451643: > > /product="Probable > nicotinate-nucleotide-dimethylbenzimidazol > phosphoribosyltransferase CobT" > > To > > FT > roduct="Probablenicotinate-nucleotide-dimethylbenzimidazol > FT phosphoribosyltransferase CobT" > > Note the missing "p" from the FT entry "product", plus the missing > spaces in the product text (presumably because "\n"s have been removed, > but in this case they should be replaced by spaces) > > Rather odd?? > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From brian_osborne at cognia.com Wed Jun 23 11:56:31 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Jun 23 11:59:37 2004 Subject: [Bioperl-l] Bio::SeqIO bug In-Reply-To: <3A16ACEA-C529-11D8-912C-000A959EB4C4@gmx.net> Message-ID: Hilmar and Michael, Using the latest code, bioperl-live, the pertinent EMBL product is: FT /product="Probablenicotinate-nucleotide- FT dimethylbenzimidazol phosphoribosyltransferase CobT" You can see that the return has not been replaced by a space, as Michael said. However the "p" is no longer missing. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp Sent: Wednesday, June 23, 2004 11:23 AM To: michael watson (IAH-C) Cc: Bioperl Subject: Re: [Bioperl-l] Bio::SeqIO bug This may have been fixed in 1.4.x. 
Did you try a more recent release? -hilmar On Tuesday, June 22, 2004, at 11:28 AM, michael watson (IAH-C) wrote: > Hi > > I am using bioperl 1.2.3 on linux. > > When converting from GenBank to EMBL for the RefSeq entry NC_002945, > the > following conversion occurs for CDS 2450558..2451643: > > /product="Probable > nicotinate-nucleotide-dimethylbenzimidazol > phosphoribosyltransferase CobT" > > To > > FT > roduct="Probablenicotinate-nucleotide-dimethylbenzimidazol > FT phosphoribosyltransferase CobT" > > Note the missing "p" from the FT entry "product", plus the missing > spaces in the product text (presumably because "\n"s have been removed, > but in this case they should be replaced by spaces) > > Rather odd?? > > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Jun 23 12:09:06 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Jun 23 12:11:50 2004 Subject: [Bioperl-l] Bio::SeqIO bug In-Reply-To: Message-ID: Just to confirm, is this using the genbank parser as reader and the embl parser as writer? Brian/Michael, has either of you checked already whether the value correctly comes out of the genbank reader (i.e., writer is at fault), or whether the genbank parser reads it wrong already? My bet is it's the writer, but I can't investigate right now. (Note BTW that we can't just replace every linebreak with a space; if the line is broken after a non-word character (comma, dash, etc) then we need to just concatenate.) 
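The joining rule in the note above can be sketched in plain Perl. This is only an illustration under stated assumptions: `join_wrapped` is a hypothetical helper, not part of Bio::SeqIO, and the way the sample value is split into lines is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): rejoin the wrapped lines
# of a feature qualifier value into a single string.
sub join_wrapped {
    my @lines = @_;
    return '' unless @lines;
    my $value = shift @lines;
    for my $next (@lines) {
        # Line broken after a word character: the break stood for a space.
        # Line broken after a non-word character (dash, comma, ...):
        # just concatenate.
        $value .= ( $value =~ /\w$/ ? ' ' : '' ) . $next;
    }
    return $value;
}

# Illustrative wrapping only; the real record may be broken elsewhere.
print join_wrapped(
    'Probable',
    'nicotinate-nucleotide-',
    'dimethylbenzimidazol phosphoribosyltransferase CobT'
), "\n";
```

Whether the fix belongs in the genbank reader or the embl writer is still the open question; the sketch only shows the rule itself.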
-hilmar On Wednesday, June 23, 2004, at 05:56 PM, Brian Osborne wrote: > Hilmar and Michael, > > Using the latest code, bioperl-live, the pertinent EMBL product is: > > FT /product="Probablenicotinate-nucleotide- > FT dimethylbenzimidazol phosphoribosyltransferase CobT" > > You can see that the return has not been replaced by a space, as > Michael > said. However the "p" is no longer missing. > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp > Sent: Wednesday, June 23, 2004 11:23 AM > To: michael watson (IAH-C) > Cc: Bioperl > Subject: Re: [Bioperl-l] Bio::SeqIO bug > > This may have been fixed in 1.4.x. Did you try a more recent release? > -hilmar > > On Tuesday, June 22, 2004, at 11:28 AM, michael watson (IAH-C) wrote: > >> Hi >> >> I am using bioperl 1.2.3 on linux. >> >> When converting from GenBank to EMBL for the RefSeq entry NC_002945, >> the >> following conversion occurs for CDS 2450558..2451643: >> >> /product="Probable >> nicotinate-nucleotide-dimethylbenzimidazol >> phosphoribosyltransferase CobT" >> >> To >> >> FT >> roduct="Probablenicotinate-nucleotide-dimethylbenzimidazol >> FT phosphoribosyltransferase CobT" >> >> Note the missing "p" from the FT entry "product", plus the missing >> spaces in the product text (presumably because "\n"s have been >> removed, >> but in this case they should be replaced by spaces) >> >> Rather odd?? >> >> Mick >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Wed Jun 23 12:02:46 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jun 23 12:11:56 2004
Subject: [Bioperl-l] ontology help
In-Reply-To:
Message-ID:

Problem #1 is that SimpleOntologyEngine.pm is not used by the dag-edit flat file parsers and therefore is maintained less well than the alternative engine implementation SimpleGOEngine.pm, yet it is the default engine if you create a new Ontology instance yourself. We could change the default, but SimpleGOEngine.pm depends on Graph.pm being installed while SimpleOntologyEngine has no external dependencies. We could also check at runtime whether Graph.pm is available and only then switch the default.

I'd appreciate some comments from people, especially those who use the Bio::Ontology system, as to what people would consider the best default behavior.

As for what's causing the error, it is most likely the following call at line# 503.

$relfact->create_object(-object_term    => $parent_term,
                        -subject_term   => $self->get_term_by_identifier($child_id),
                        -predicate_term => $rel_info,
                        -ontology       => $parent_term->ontology
                        );

get_term_by_identifier() returns an array, not a scalar, and an array in scalar context evaluates to the number of elements (a scalar, not a term object). I've fixed this in cvs HEAD. Let me know if this solves the problem.
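The scalar-context behaviour described above can be reproduced without BioPerl. The following is a minimal sketch, not the actual engine code: `%store` and `get_terms` are hypothetical stand-ins for a term store and for a lookup method that returns an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical term store: identifier => list of matching term names.
my %store = ( 'ONT:1' => ['primer_bind'] );

# Hypothetical stand-in for a lookup method that returns an array.
sub get_terms {
    my ($id) = @_;
    my @found = @{ $store{$id} || [] };
    return @found;
}

my $in_scalar = get_terms('ONT:1');    # scalar context: number of elements
my ($first)   = get_terms('ONT:1');    # list context: the first element

print "scalar context: $in_scalar\n";  # prints "scalar context: 1"
print "list context: $first\n";        # prints "list context: primer_bind"
```

Assigning through a one-element list, as in the second call, is one common way to get the term itself rather than a count; the fix committed to cvs HEAD may well be written differently.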
-hilmar On Tuesday, June 22, 2004, at 09:50 AM, Marc Logghe wrote: > Hi all, > I am struggling with the Bio::Ontology::* packages and ontologies in > general ... > Suppose I have 3 ontologies: ONTa, ONTb and ONTa2ONTbMap. The latter > is actually only containing relations between terms of the other 2 > ontologies (subject terms belong to ONTa, object terms to ONTb) and > predicate terms. The 3 Bio::Ontology::Ontology objects are fetched > from biosql, by loading their terms and relations. > Problem is how do I perform a query using the Bio::Ontology::* API in > order to find all the relations in ONTa2ONTbMap to a term from > ontology ONTa ? > I tried it like this: > my ($key) = $ONTa->find_terms(-name => 'primer_bind'); > my ($rel) = $ONTa2ONTbMap->find_terms(-name => > 'optional_qualifier_for'); > my @rels = $ONTa2ONTbMap->get_relationships($key); > > but then I get an exception: > ------------- EXCEPTION ------------- > MSG: Found [scalar] where [Bio::Ontology::TermI] expected > STACK Bio::Ontology::Relationship::_check_class > /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Relationship.pm:378 > STACK Bio::Ontology::Relationship::subject_term > /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Relationship.pm:242 > STACK Bio::Ontology::Relationship::new > /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Relationship.pm:162 > STACK Bio::Factory::ObjectFactory::create_object > /home/marcl/src/bioperl/bioperl-live/Bio/Factory/ObjectFactory.pm:150 > STACK Bio::Ontology::SimpleOntologyEngine::get_relationships > /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/ > SimpleOntologyEngine.pm:504 > STACK Bio::Ontology::Ontology::get_relationships > /home/marcl/src/bioperl/bioperl-live/Bio/Ontology/Ontology.pm:386 > STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD > /home/marcl/src/bioperl/bioperl-db/Bio/DB/Persistent/ > PersistentObject.pm:541 > STACK toplevel ./validate_feature.pl:22 > > -------------------------------------- > > when I change the last line to 
$key_ont->get_relationships(), an empty > list is returned. > I am obviously missing something. I am pretty sure that the relations > are there (verbositiy while fetching from database, and data dump of > the ontology objects). > Can somebody shed some light ? > Regards, > Marc > > > > > > > > > > *********************************************************** > Marc Logghe, Ph.D. > Senior Scientist > Scientific Computing Group > Devgen nv > Technologiepark 9 > B - 9052 Ghent-Zwijnaarde > Belgium > Tel: +32 9 324 24 88 > Fax: +32 9 324 24 25 > >> **** DISCLAIMER >> ********************************************************** >> "This e-mail and any attachments thereto may contain information >> which is confidential and/or protected by intellectual property >> rights and are intended for the sole use of the recipient(s) named >> above. >> Any use of the information contained herein (including, but not >> limited to, >> total or partial reproduction, communication or distribution in any >> form) >> by persons other than the designated recipient(s) is prohibited. >> If you have received this e-mail in error, please notify the sender >> either >> by telephone or by e-mail and delete the material from any computer. >> Thank you for your cooperation." >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From bmb9jrm at bmb.leeds.ac.uk Wed Jun 23 13:47:45 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Wed Jun 23 13:52:00 2004 Subject: [Bioperl-l] Human genome BLAST limit? Message-ID: <1088012865.6755.5.camel@localhost.localdomain> Hi all, Is there a limit to the length of sequence submitted to genome Blast via the bioperl module? 
I've tried to submit AY623118 to Blast via a script, but I just get an error message. The same sequence submitted via the web works fine. Also, different sequences submitted via the script (e.g. BC013293) work fine, as does the same query to the mouse genome page (via the script). AY623118 is around 61 kbp in length, compared to only around 1200 for BC013293; so I'm thinking that could be it. Any ideas? Thanks, Jon From facemann at yahoo.com Wed Jun 23 18:49:42 2004 From: facemann at yahoo.com (Andy Hammer) Date: Wed Jun 23 18:52:14 2004 Subject: [Bioperl-l] Cannot connect to local ODBA Message-ID: <20040623224942.51329.qmail@web13424.mail.yahoo.com> I have altered the sample code from the ODBA HOWTO as follows: #!/usr/bin/perl #use strict; use Bio::Perl; use Bio::DB::Registry; $registry = Bio::DB::Registry->new; $db = $registry->get_database('ncbi'); $seq = $db->get_Seq_by_acc('NM_000367'); print $seq->seq,"\n"; my seqdatabase.ini contains: [ncbi] protocol=biosql location=localhost dbname=biosql driver=Pg port= user= pass=postgres biodbname=ncbi But I get the error: Can't call method "seq" on an undefined value at ./testdb.pl line 10, line 20. I don't think it is connecting to my local db. The sample code that points to embl works fine. My local biosql database is called ncbi and it loaded fine using load_seqdatabase.pl provided with bioperl-db. Any ideas? __________________________________ Do you Yahoo!? New and Improved Yahoo! Mail - 100MB free storage! http://promotions.yahoo.com/new_mail From Marc.Logghe at devgen.com Thu Jun 24 04:05:45 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Jun 24 04:08:34 2004 Subject: [Bioperl-l] ontology help Message-ID: Hi Hilmar ! > As for what's causing the error, it is most likely the > following call > at line# 503. 
>
> $relfact->create_object(-object_term    => $parent_term,
>                         -subject_term   => $self->get_term_by_identifier($child_id),
>                         -predicate_term => $rel_info,
>                         -ontology       => $parent_term->ontology
>                         );
>
> get_term_by_identifier() returns an array, not a scalar, and an array
> in scalar context evaluates to the number of elements (a scalar, not a
> term object). I've fixed this in cvs HEAD. Let me know if this solves
> the problem

Correct diagnosis ! At least I don't get the exception anymore. But it is still not working properly for me (though I am still not sure whether I've set it all up in the correct way).

my @rels = $ONTa2ONTbMap->get_relationships($key)

returns all the relations, but when I try to print them like this:

foreach my $rel (@rels) {
    printf "%s %s %s\n",
        eval{$rel->subject_term->name} || 'unknown subject',
        $rel->predicate_term->name,
        $rel->object_term->name;
}

The subject_term object seems to be undef in all cases. I think this is due to the fact that the terms actually do not belong to the ONTa2ONTbMap ontology. The subject terms are probably not in the store, while the object terms are. The only two instantiated terms in this ontology are 2 predicate terms.

Another observation: when $ONTa2ONTbMap->get_relationships() is run (without arguments) no relationship objects are returned.

Marc

From sadeq.zougari at silogic.fr Wed Jun 23 11:54:10 2004
From: sadeq.zougari at silogic.fr (Sadeq Zougari)
Date: Thu Jun 24 09:23:13 2004
Subject: [Bioperl-l] Help RemoteBlastError
Message-ID: <052001c4593a$53161d70$a102a8c0@EV161>

Skipped content of type multipart/alternative

From brian_osborne at cognia.com Thu Jun 24 09:21:39 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jun 24 09:24:22 2004
Subject: [Bioperl-l] Cannot connect to local ODBA
In-Reply-To: <20040623224942.51329.qmail@web13424.mail.yahoo.com>
Message-ID:

Andy,

I believe you've uncovered a sad state of affairs.
All protocols in the OBDA system need to be able to implement a new_from_registry method. The module that used to do this for the biosql protocol was Bio::DB::BioSQL::BioDatabaseAdaptor, which no longer exists. This module uses other modules which also no longer exist, I suspect that simply moving these older modules back into the Perl directory would not solve the problem. So at the moment I'd have to say that OBDA does not support the biosql protocol, someone please correct me if I'm wrong about this. If this is the case then I'll change the documentation to indicate that the biosql protocol is not supported. I could take a shot at getting it running again but someone first must confirm if my suspicion is correct, or point me in the right direction. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Andy Hammer Sent: Wednesday, June 23, 2004 6:50 PM To: bioperl Subject: [Bioperl-l] Cannot connect to local ODBA I have altered the sample code from the ODBA HOWTO as follows: #!/usr/bin/perl #use strict; use Bio::Perl; use Bio::DB::Registry; $registry = Bio::DB::Registry->new; $db = $registry->get_database('ncbi'); $seq = $db->get_Seq_by_acc('NM_000367'); print $seq->seq,"\n"; my seqdatabase.ini contains: [ncbi] protocol=biosql location=localhost dbname=biosql driver=Pg port= user= pass=postgres biodbname=ncbi But I get the error: Can't call method "seq" on an undefined value at ./testdb.pl line 10, line 20. I don't think it is connecting to my local db. The sample code that points to embl works fine. My local biosql database is called ncbi and it loaded fine using load_seqdatabase.pl provided with bioperl-db. Any ideas? __________________________________ Do you Yahoo!? New and Improved Yahoo! Mail - 100MB free storage! 
http://promotions.yahoo.com/new_mail
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From mjohnson at watson.wustl.edu Thu Jun 24 12:13:07 2004
From: mjohnson at watson.wustl.edu (Mark Johnson)
Date: Thu Jun 24 12:15:39 2004
Subject: [Bioperl-l] Best Practices for Downloading/Mirroring Genbank
In-Reply-To:
References:
Message-ID: <43624.10.0.1.216.1088093587.squirrel@watson.wustl.edu>

Rsync is your friend. Both NCBI and Biomirror are rsync friendly. You can
use rsync to maintain a local copy of whatever parts of the NCBI ftp site
you'd like. Then you can be assured that after the rsync finishes you have a
consistent local snapshot (as long as you didn't rsync in the middle of a
file update on the other end). It will even minimize your bandwidth
consumption...on subsequent invocations it will only transfer files you
don't have, or changes to files you do have.

> I'm working on setting up a local mirror of Genbank here at work and am
> unsure of what the best way to go about it is.
>
> I started off real simple with a wget -m ftp://genbank.sdsc.edu/pub (Yes, I
> wanted the BLAST formatted databases and executables as well) and the
> transfer is going just fine, albeit excruciatingly slow at times.
>
> But what happens:
> 1) between now and the next build?;
> 2) if I choose to mirror from an alternate source?;
> 3) after the next build?
>
> For the first part, I just planned on doing daily wgets for the updates, and
> the possibility occurred to me that if I miss the last couple days worth of
> updates before the new build, those updates get shuffled into the main
> build files and I have to download the whole thing again?
>
> For the second, if I choose to mirror from Biomirror or NCBI instead of San
> Diego, those timestamps seem to be different for what I am assuming to be
> the same build.
> For example,
>
> gbest1.seq.gz  19,454,020 bytes  5/22/04 5:04am  SDSC Mirror
>                19,454,020 bytes  4/25/04 2:01am  NCBI Mirror
>                19,454,020 bytes  4/25/04 2:01am  BioMirror
>
> For the third part, do the build files really change or are new entries and
> revisions just added on as extra build files? I read that the files are
> non-cumulative, so that would seem to confirm it, but the timestamps are
> updated in sync with the latest build date.
>
> How do I keep an updated mirror without losing daily builds or having to
> download the whole thing every couple of months. How do I verify that I do
> have the latest data, because checking timestamps does not seem like it will
> work? Should I even bother with creating a true mirror?
>
> I ran across this recent thesis on some of the issues in maintaining these
> types of databases accurately while minimizing file transfers
> http://if.anu.edu.au/Students/DamonSearle-2003-thesis.pdf
>
> I know that Biomirror has some scripts to facilitate efficient transfers but
> do they handle updates. I'm guessing this problem has already been
> addressed, I just can't find the solution.
>
> Thanks in advance for any input,
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Joseph Karalius
> RA, Bioinformatics
> Molecular Markers and Applied Genomics
> Seminis Vegetable Seeds, Inc
> 37437 State Highway 16
> Woodland, CA 95695-9353
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From nigam at psu.edu Thu Jun 24 12:47:59 2004
From: nigam at psu.edu (Nigam Shah)
Date: Thu Jun 24 12:50:28 2004
Subject: [Bioperl-l] Using PSM to find TF site
Message-ID: <000501c45a0b$020af870$34167680@Vivek>

Hi!,

I want to use Bio::Matrix::PSM::SiteMatrixI and a PSM to scan a promoter and
see if there is a 'hit' for a binding site.
From the documentation it seems
that the way to do it is to get the consensus sequence and then do a regular
expression match.

Is there another way of doing it? One that would also give me a
'quality-of-hit' score of some kind for the site that was found in the
promoter?

Regards,
Nigam.

Nigam Shah
Graduate Fellow,
Huck Institute for Life Sciences,
Penn State University.
Ph:(814)863-5720
Web: www.personal.psu.edu/nhs109/

From jason at cgt.duhs.duke.edu Thu Jun 24 12:57:46 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Jun 24 13:00:27 2004
Subject: [Bioperl-l] Using PSM to find TF site
In-Reply-To: <000501c45a0b$020af870$34167680@Vivek>
References: <000501c45a0b$020af870$34167680@Vivek>
Message-ID:

TFBS will let you scan.
http://forkhead.cgb.ki.se/TFBS/

On Thu, 24 Jun 2004, Nigam Shah wrote:

> Hi!,
>
> I want to use Bio::Matrix::PSM::SiteMatrixI and a PSM to scan a promoter and
> see if there is a 'hit' for a binding site. From the documentation it seems
> that the way to do it is to get the consensus sequence and then do a regular
> expression match.
>
> Is there another way of doing it? One that would also give me a
> 'quality-of-hit' score of some kind for the site that was found in the
> promoter?
>
> Regards,
> Nigam.
>
> Nigam Shah
> Graduate Fellow,
> Huck Institute for Life Sciences,
> Penn State University.
> Ph:(814)863-5720
> Web: www.personal.psu.edu/nhs109/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From bmb9jrm at bmb.leeds.ac.uk Fri Jun 25 09:36:19 2004
From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning)
Date: Fri Jun 25 09:38:50 2004
Subject: [Bioperl-l] Phlip
Message-ID: <1088170579.6122.77.camel@localhost.localdomain>

Hi all,

Was hoping Jason or someone else could nudge me in the right direction
with Phylip.
I was having trouble with the suite and tree drawing, and
think I've narrowed it down to Neighbour, since the output tree has 0
'total branch length'. I believe Protdist is creating the matrix
correctly, since get_column returns a sensible-looking list of values.
My snippet of code is:

my ($matrix) = $protdist->run($alnmnt);
print "Number of rows in matrix: ", $matrix->num_rows(), "\n";
my @n_params = (
    'type'     => 'NJ',
    'outgroup' => 2,
    'lowtri'   => 1,
    'upptri'   => 1,
    'subrep'   => 1
);
my $neighbor =
    Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@n_params);
my ($tree) = $neighbor->run($matrix);
print $tree, "\n";
print "Total branch length: ", $tree->total_branch_length(), "\n";


The documentation actually states that a matrix argument should be a
hash reference, but if I create a reference and pass it in like:

my $matrixref = \$matrix;

I just get this message:

Can't call method "isa" on unblessed reference at
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm
line 458.


Would be very grateful for any help anyone could offer!

Jon

From hlapp at gmx.net Fri Jun 25 10:40:01 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Jun 25 10:42:28 2004
Subject: [Bioperl-l] Cannot connect to local ODBA
In-Reply-To:
Message-ID: <8A4AE77C-C6B5-11D8-A52B-000A959EB4C4@gmx.net>

I thought that support was maintained at the Singapore hackathon, but
it may have been wishful thinking. So maybe I need to take a look at
some point on how to support the biosql protocol in obda. Of course,
everybody willing to take a shot at this would be greatly welcome.

(And no, you can't just resuscitate the former BiodatabaseAdaptor.)

	-hilmar

On Thursday, June 24, 2004, at 03:21 PM, Brian Osborne wrote:

> So at the moment I'd have to say that OBDA does not support the biosql
> protocol, someone please correct me if I'm wrong about this. If this is the
> case then I'll change the documentation to indicate that the biosql protocol
> is not supported.
> I could take a shot at getting it running again but
> someone first must confirm if my suspicion is correct, or point me in the
> right direction.
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Fri Jun 25 10:49:19 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Jun 25 10:51:45 2004
Subject: [Bioperl-l] ontology help
In-Reply-To:
Message-ID:

I believe the current implementation of delegation from the
Bio::Ontology::Ontology methods to their counterparts in the currently
used engine implementation expects that only terms and rel.ships from
the same ontology will be asked for.

The original rationale behind this was to make it possible to use one
and the same engine instance to back multiple OntologyI instances. If
you did this, you would need to filter out terms from other ontologies
co-using the engine instance.

In hindsight I think this is probably over-engineered and trying to
solve a non-existing problem. So, we could as well demand that one
engine instance only serve one ontology instance and you're on your own
if you do otherwise.

This would then allow us to remove the post-filtering code that filters
hits returned from the engine. You could try and see whether that will
solve your problem as well.

	-hilmar

On Thursday, June 24, 2004, at 10:05 AM, Marc Logghe wrote:

> Hi Hilmar !
>
>> As for what's causing the error, it is most likely the following call
>> at line# 503.
>>
>> $relfact->create_object(-object_term    => $parent_term,
>>                         -subject_term   => $self->get_term_by_identifier($child_id),
>>                         -predicate_term => $rel_info,
>>                         -ontology       => $parent_term->ontology
>>                        );
>>
>> get_term_by_identifier() returns an array, not a scalar, and an array
>> in scalar context evaluates to the number of elements (a scalar, not a
>> term object).
>> I've fixed this in cvs HEAD. Let me know if this solves
>> the problem
> Correct diagnosis !
> At least I don't get the exception anymore. But it is still not
> working properly for me (but I am still not sure whether I've set it
> all up in the correct way).
> my @rels = $ONTa2ONTbMap->get_relationships($key)
> return me all the relations but when I want to print them like such:
> foreach my $rel (@rels)
> {
>     printf "%s %s %s\n",
>         eval{$rel->subject_term->name} || 'unknown subject',
>         $rel->predicate_term->name,
>         $rel->object_term->name;
> }
>
> The subject_term object seems to be undef in all cases. I think this
> is due to the fact that the terms actually do not belong to the
> ONTa2ONTbMap ontology. The subject terms are probably not in the
> store, while the object terms are. The only two instatiated terms in
> this ontology are 2 predicate terms.
> Another observations: when $ONTa2ONTbMap->get_relationships() is run
> (without arguments) no relationship objects are returned.
> Marc
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Laure.Durufle at serono.com Fri Jun 25 06:05:39 2004
From: Laure.Durufle at serono.com (Laure.Durufle@serono.com)
Date: Fri Jun 25 11:58:41 2004
Subject: [Bioperl-l] pir.pm => bug
Message-ID:

Hi,

I moved the package pir.pm / we give the file and with pir.pm we can parse
this file pir*.dat :

like this format :

P R O T E I N  S E Q U E N C E  D A T A B A S E
of PIR-International

Section 1. Fully Classified Entries
Release 79.01, April 04, 2004
20685 sequences, 8103841 residues

Protein Information Resource (PIR)*
National Biomedical Research Foundation
3900 Reservoir Road, N.W., Washington, DC 20007, USA

Japan International Protein        Munich Information Center for
Information Database (JIPID)       Protein Sequences (MIPS)
Amakubo 1-16-1                     GSF-Forschungszentrum f. Umwelt und Gesundheit
Tsukuba 305-0005, Japan            am Max-Planck-Instut f. Biochemie
                                   Am Klopferspitz 18, D-82152 Martinsried, FRG

This database may be redistributed without prior consent, provided that
this notice be given to each user and that the words "Derived from" shall
precede this notice if the database has been altered by the redistributor.

Copyright 2000, PIR-International.
*PIR is a registered mark of NBRF.
\\\
ENTRY            A27187 #type complete
TITLE            ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1 precursor - Neurospora crassa
ALTERNATE_NAMES  bc1 complex cytochrome c1; complex III cytochrome c1; cytochrome c1 heme protein
ORGANISM         #formal_name Neurospora crassa
DATE             05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change 03-Jun-2002
ACCESSIONS       A27187
REFERENCE        A27187
   #authors      Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H.
   #journal      Eur. J. Biochem. (1987) 164:111-115
   #title        The primary structure of cytochrome c-1 from Neurospora crassa.
   #cross-references MUID:87161871; PMID:3030747
   #accession    A27187
      ##molecule_type mRNA
      ##residues 1-332 ##label ROE
      ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1; PID:g3006
      ##note the authors translated the codon AGT for residue 316 as Arg
CLASSIFICATION   #superfamily cytochrome c1 heme protein; cytochrome c1 heme protein homology
KEYWORDS         chromoprotein; electron transfer; heme; iron; metalloprotein; mitochondrion; oxidative phosphorylation; oxidoreductase; respiratory chain; transmembrane protein
FEATURE
   1-70          #domain transit peptide (mitochondrion) #status predicted #label TNP\
   71-332        #product cytochrome c1 #status predicted #label MAT\
   79-305        #domain cytochrome c1 heme protein homology #label C1H\
   278-296       #domain transmembrane #status predicted #label TMM\
   110,113       #binding_site heme (Cys) (covalent) #status predicted\
   114,234       #binding_site heme iron (His, Met) (axial ligands) #status predicted
SUMMARY          #length 332 #molecular-weight 36456 #checksum 1753
SEQUENCE
                5        10        15        20        25        30
      1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T
     31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W
     61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W
     91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P
    121 Y R A L V G T I L T V D E A K A L A E E N E Y D T E P N D Q
    151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G
    181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E
    211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D
    241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P
    271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K
    301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A
    331 K S
///

the package is that :

# $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $
#
# BioPerl module for Bio::SeqIO::PIR
#
# Cared for by Aaron Mackey
#
# Copyright Aaron Mackey
#
# You may distribute this module under the same terms as perl itself
#
# _history
# October 18, 1999  Largely rewritten by Lincoln Stein

# POD documentation - main docs before the code

=head1 NAME

Bio::SeqIO::pir - PIR sequence input/output stream

=head1 SYNOPSIS

Do not use this module directly.  Use it via the Bio::SeqIO class.

=head1 DESCRIPTION

This object can transform Bio::Seq objects to and from pir flat
file databases.

Note: This does not completely preserve the PIR format - quality
information about sequence is currently discarded since bioperl
does not have a mechanism for handling these encodings in sequence
data.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists.  Your participation is much appreciated.

  bioperl-l@bioperl.org                 - General discussion
  http://www.bioperl.org/MailList.shtml - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution.
Bug reports can be submitted via email or the web:

  bioperl-bugs@bio.perl.org
  http://bugzilla.bioperl.org/

=head1 AUTHORS

Aaron Mackey E<lt>amackey@virginia.eduE<gt>
Lincoln Stein E<lt>lstein@cshl.orgE<gt>
Jason Stajich E<lt>jason@bioperl.orgE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::SeqIO::pir;
use vars qw(@ISA);
use strict;

use Bio::SeqIO;
use Bio::Seq::SeqFactory;
use Bio::Species;
use Bio::Annotation::Collection;

@ISA = qw(Bio::SeqIO);

sub _initialize {
    my($self,@args) = @_;
    $self->SUPER::_initialize(@args);
    if( ! defined $self->sequence_factory ) {
        $self->sequence_factory(new Bio::Seq::SeqFactory
                                (-verbose => $self->verbose(),
                                 -type => 'Bio::Seq::RichSeq'));
    }
}

=head2 next_seq

 Title   : next_seq
 Usage   : $seq = $stream->next_seq()
 Function: returns the next sequence in the stream
 Returns : Bio::Seq object
 Args    : NONE

=cut

sub next_seq {
    my ($self) = @_;
    #local($/)= "\n";
    my $line;
    my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc);
    my ($annotation, %params, @features) = ( new Bio::Annotation::Collection);
    while(defined($line = $self->_readline())) {
        last if index($line,'ENTRY ') == 0;
    }
    return undef if( !defined $line ); # end of file
    $line =~ /^ENTRY\s+(\S+)\s+/ ||
        $self->throw("Pir stream with bad ENTRY line. Not Pir in my book.");
    $id = $1;
    $params{'-display_id'} = $id;
    until(defined ($line) && ($line =~ /^SEQUENCE/) ) {
        # Description line(s)
        if ($line=~/^TITLE\s+(.*)/) {
            $desc = $1;
        }
        # organism line(s)
        if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) {
            $org = $1;
            my @class =($org);
            my $make = Bio::Species->new();
            $make->classification(\@class,"FORCE"); # no name validation please
            $params{'-species'}= $make;
        }
        # date line
        if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) {
            $date = $1;
            $date =~ s/\;//;
            $date =~ s/\s+$//;
            push @{$params{'-dates'}}, $date;
        }
        #accession
        if($line=~/^ACCESSIONS\s+(.*)/) {
            $seq = "";
            $acc_string =$1;
            $acc_string =~ s/\;\s*/ /g;
            ($acc,@sec) = split " ",$acc_string;
        }
        $line = $self->_readline();
    }
    my ($seqc,$seqn) = ("","");
    my $nb=0;
    while( defined ($line = $self->_readline) ) {
        if ($line=~/^\/\/\//) {last};
        if ($line=~/^\s+\d+\s+\d+/) {next};
        if ($line=~/^\s+\d+(.*)/) {
            $line=$1;
        }
        $seq = uc($line);
        $seqc .= $seq;
    }
    # P - indicates complete protein
    # F - indicates protein fragment
    # not sure how to stuff these into a Bio object
    # suitable for writing out.
    $seqc =~ s/\*//g;
    $seqc =~ s/[\(\)\.\/\=\,]//g;
    $seqc =~ s/\s+//g; # get rid of whitespace

    $params{'-seq_version'} = '';
    my ($alphabet) = ('protein');
    # TODO - not processing SFS data
    my $entry = $self->sequence_factory->create
        (-verbose => $self->verbose,
         %params,
         -seq => $seqc,
         -primary_id => $id,
         -id => $id,
         -desc => $desc,
         -alphabet => $alphabet,
         -accession_number => $acc,
         -secondary_accessions => \@sec,
        );
    return $entry;
}

=head2 write_seq

 Title   : write_seq
 Usage   : $stream->write_seq(@seq)
 Function: writes the $seq object into the stream
 Returns : 1 for success and 0 for error
 Args    : Array of Bio::PrimarySeqI objects

=cut

#sub write_seq {
#    my ($self, @seq) = @_;
#    for my $seq (@seq) {
#        $self->throw("Did not provide a valid Bio::PrimarySeqI object")
#            unless defined $seq && ref($seq) && $seq->isa('Bio::PrimarySeqI');
#        my $str = $seq->seq();
#        return unless $self->_print(">".$seq->id(),
#                                    "\n", $seq->desc(), "\n",
#                                    $str, "*\n");
#    }
#    $self->flush if $self->_flush_on_write && defined $self->_fh;
#    return 1;
#}

1;

Laure Durufle

********************************************************************************************
S - This message contains confidential information and is intended only for
the individual named. If you are not the named addressee, you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and delete
this e-mail from your system. e-mail transmission cannot be guaranteed to be
secure or error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain malware. The presence of
this disclaimer is not a proof that it was originated at Serono
International S.A. or one of its affiliates. Serono International S.A and
its affiliates therefore do not accept liability for any errors or omissions
in the content of this message, which arise as a result of e-mail
transmission.
If verification is required, please request a hard-copy version.
Serono International SA, 15bis Chemin Des Mines, Geneva, Switzerland, www.serono.com.
*********************************************************************************************

From jason at cgt.duhs.duke.edu Fri Jun 25 13:16:05 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Jun 25 13:18:49 2004
Subject: [Bioperl-l] Phlip
In-Reply-To: <1088170579.6122.77.camel@localhost.localdomain>
References: <1088170579.6122.77.camel@localhost.localdomain>
Message-ID:

So you are getting a tree back though when you pass in the matrix
properly? Isn't possible total_branch_length is failing - what happens
when you do this?

for my $node ( $tree->get_leaf_nodes ) {
    print $node->id, " ", $node->branch_length, "\n";
}

-jason

On Fri, 25 Jun 2004, Jonathan Manning wrote:

> Hi all,
>
> Was hoping Jason or someone else could nudge me in the right direction
> with Phylip. I was having trouble with the suite and tree drawing, and
> think I've narrowed it down to Neighbour, since the output tree has 0
> 'total branch length'. I believe Protdist is creating the matrix
> correctly, since get_column returns a sensible-looking list of values.
> My snippet of code is:
>
> my ($matrix) = $protdist->run($alnmnt);
> print "Number of rows in matrix: ", $matrix->num_rows(), "\n";
> my @n_params = (
>     'type'     => 'NJ',
>     'outgroup' => 2,
>     'lowtri'   => 1,
>     'upptri'   => 1,
>     'subrep'   => 1
> );
> my $neighbor =
>     Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@n_params);
> my ($tree) = $neighbor->run($matrix);
> print $tree, "\n";
> print "Total branch length: ", $tree->total_branch_length(), "\n";
>
>
> The documentation actually states that a matrix argument should be a
> hash reference, but if I create a reference and pass it in like:
>
> my $matrixref = \$matrix;
>
> I just get this message:
>
> Can't call method "isa" on unblessed reference at
> /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm
> line 458.
>
>
> Would be very grateful for any help anyone could offer!
>
> Jon
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From echuong at gmail.com Fri Jun 25 18:24:57 2004
From: echuong at gmail.com (Edward Chuong)
Date: Fri Jun 25 18:27:46 2004
Subject: [Bioperl-l] Help on a basic EST-genomic alignment script
Message-ID: <244d2e0e040625152477e9d07a@mail.gmail.com>

Hi all,

I'm a very new perl/bioperl user, and I'm running into some trouble so
I hope I can find some help here.

Here's what I need to do:

(from a large number of fasta files) Extract an EST (peromyscus) from
the FASTA file, find its closest match in mus musculus, and find the
dn/ds between the two sequences.
Here's what I'm doing (I know this is probably the least efficient way; any
suggestions?):

1) Read in the pero EST from a FASTA file
2) StandAloneBlast it against a local mus cDNA database and retrieve the
   accession of the best result
3) Retrieve the complete mus sequence with features from GenBank using the
   ID from (2)
4) Make a clustalw SimpleAlign object using the mus protein sequence from (3)
   against the translated pero EST for all 3 frames, and keep the one with
   the best identity %.
--I'm done up to here--
5) Convert the aln from AA to DNA (there is a built-in aa_to_dna_aln but it
   isn't working for me)
6) Pass the aln through a DN/DS module (is PAML the only one?)

So I have 2 problems:

7 out of the 20 ESTs return a poor "best alignment" with less than 20%
identity (and in the printout they clearly are not aligning). Does this have
something to do with gaps?

In spite of that, the other 13 are aligning pretty well, with 60-100%
alignment. In order to calculate DN/DS I've looked around and it seems I have
to use PAML. But before that I think I'm required to have an aln object of
two DNA sequences, starting at the correct frame. How can I do that?

--
Edward Chuong
http://iacs5.ucsd.edu/~echuong

From MEC at Stowers-Institute.org Fri Jun 25 19:54:44 2004
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Fri Jun 25 19:57:18 2004
Subject: [Bioperl-l] Electronic Chromosome Walking
Message-ID: <200406252357.i5PNvDKr020822@portal.open-bio.org>

see Genotrace

http://bioinformatics.oupjournals.org/cgi/screenpdf/18/10/1396.pdf
http://rat.niob.knaw.nl/genotrace.html
http://genotrace.niob.knaw.nl/

??
-Malcolm

From jason at cgt.duhs.duke.edu Fri Jun 25 22:52:10 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Jun 25 22:54:50 2004
Subject: [Bioperl-l] Help on a basic EST-genomic alignment script
In-Reply-To: <244d2e0e040625152477e9d07a@mail.gmail.com>
References: <244d2e0e040625152477e9d07a@mail.gmail.com>
Message-ID:

On Fri, 25 Jun 2004, Edward Chuong wrote:

> Hi all,
>
> I'm a very new perl/bioperl user, and I'm running into some trouble so
> I hope I can find some help here.
>
> Here's what I need to do:
>
> (from a large number of fasta files) Extract an EST (peromyscus) from
> the FASTA file, find its closest match in mus musclus, and find the
> dn/ds between the two sequences.
>
> Here's what I'm doing (I know this is probably the least efficient
> way, any suggestions?):
>
> 1) Read in pero EST from a FASTA
> 2) Standaloneblast it to local mus cDNA database, retrieve accession
> from best result

You know, if you already have a FASTA file with the ESTs you don't really
have to run these individually, or even with StandAloneBlast. It will be more
efficient to run the whole search at once and have the report file ready to
parse. Just do

  blastall -i ests.fa -d mus -p blastn -e evalue ...

Although I think you might do better to run a translated search against the
mouse protein set. You will also find you can get better results with
FASTX/FASTY, as they allow frameshifts whereas a blast alignment stays within
one frame at a time.

> 3) Retrieve complete mus sequence with features from genbank using ID from (2)
> 4) Make a clustalw simple align object using the mus protein sequence
> from (3) against the translated pero EST for all 3 frames, and keep
> the one with the best identity %.
> --I'm done up to here--

Why not determine the best frame when doing the search by comparing to the
mouse proteins?

> 5) Convert the aln frrom AA to DNA (there is a builtin aa_to_dna_aln
> but it isn't working for me)

Show the code if you want specific help I guess.
You pass in a protein alignment (Bio::SimpleAlign) and a hash reference of
Bio::Seq objects, which are the cDNAs, keyed on the names of each of the
proteins in the alignment. See the script in
scripts/utilities/pairwise_kaks.PLS for an example and working code.

> 6) Pass the aln through a DN/DS module (is paml the only one?)

Depends on how important it is for you to get the best answer...
You can use Bio::Align::DNAStatistics calc_KaKs_pair for a pairwise
Nei-Gojobori count.
You can use YN00 or Codeml in PAML for .

>
> So I have 2 problems: 7 out of the 20 ESTs return a poor "best
> alignment" with less than 20% identity (and in the printout they
> clearly are not aligning). Does this have something to do with gaps?
>
> In spite of that, the other 13 are aligning pretty well, with 60-100%
> alignment. In order to calculate DN/DS I've looked around and it seems
> I have to use PAML. But before that I think I'm required to have an
> aln object of two DNA sequences, starting at the correct frame. How
> can I do that?
>

You basically need to predict the protein sequence from the EST. You can
use estwise to do this based on the best mouse protein homolog.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From bmb9jrm at bmb.leeds.ac.uk Sat Jun 26 07:10:01 2004
From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning)
Date: Sat Jun 26 07:12:31 2004
Subject: [Bioperl-l] Phlip
In-Reply-To:
References: <1088170579.6122.77.camel@localhost.localdomain>
Message-ID: <1088248201.6105.5.camel@localhost.localdomain>

Hi Jason,

Thanks for the reply. I have tried your suggestion, and got a list of zeros.
However, I've solved the problem by adjusting the parameters. I don't think
the example parameters were good for my sequences!

Thanks,

Jon

On Fri, 2004-06-25 at 18:16, Jason Stajich wrote:
> So you are getting a tree back though when you pass in the matrix
> properly? Isn't possible total_branch_length is failing - what happens
> when you do this?
>
> for my $node ( $tree->get_leaf_nodes ) {
>     print $node->id, " ", $node->branch_length, "\n";
> }
>
> -jason
>
> On Fri, 25 Jun 2004, Jonathan Manning wrote:
>
> > Hi all,
> >
> > Was hoping Jason or someone else could nudge me in the right direction
> > with Phylip. I was having trouble with the suite and tree drawing, and
> > think I've narrowed it down to Neighbour, since the output tree has 0
> > 'total branch length'. I believe Protdist is creating the matrix
> > correctly, since get_column returns a sensible-looking list of values.
> > My snippet of code is:
> >
> > my ($matrix) = $protdist->run($alnmnt);
> > print "Number of rows in matrix: ", $matrix->num_rows(), "\n";
> > my @n_params = (
> >     'type' => 'NJ',
> >     'outgroup' => 2,
> >     'lowtri' => 1,
> >     'upptri' => 1,
> >     'subrep' => 1
> > );
> > my $neighbor =
> >     Bio::Tools::Run::Phylo::Phylip::Neighbor->new(@n_params);
> > my ($tree) = $neighbor->run($matrix);
> > print $tree, "\n";
> > print "Total branch length: ", $tree->total_branch_length(), "\n";
> >
> >
> > The documentation actually states that a matrix argument should be a
> > hash reference, but if I create a reference and pass it in like:
> >
> > my $matrixref = \$matrix;
> >
> > I just get this message:
> >
> > Can't call method "isa" on unblessed reference at
> > /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm
> > line 458.
> >
> >
> > Would be very grateful for any help anyone could offer!
> >
> > Jon
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From axl163 at yahoo.com Sat Jun 26 12:57:21 2004
From: axl163 at yahoo.com (Allen Liu)
Date: Sat Jun 26 12:59:58 2004
Subject: [Bioperl-l] Bio::Graphics - how to adjust height of a panel as you add more tracks?
Message-ID: <20040626165721.30183.qmail@web41504.mail.yahoo.com>

Hi,

When I begin to add many tracks to a panel (about 300 tracks), I notice that
the panel actually starts shrinking the tracks to the point where I cannot
read the information on the tracks anymore. At first, I thought the previewer
was automatically set to fit the entire PNG on the screen at once, but when I
try to zoom in, the image gets really pixelated. The height appears to stay
fixed as I add these tracks, and I was wondering if there is any way of
extending the height as the panel grows. Any information would be greatly
appreciated.

Allen Liu

From echuong at gmail.com Sun Jun 27 04:27:02 2004
From: echuong at gmail.com (Edward Chuong)
Date: Sun Jun 27 04:29:28 2004
Subject: [Bioperl-l] Help on a basic EST-genomic alignment script
In-Reply-To:
References: <244d2e0e040625152477e9d07a@mail.gmail.com>
Message-ID: <244d2e0e04062701275dae0799@mail.gmail.com>

Hi,

Thanks for responding! As I said originally, I've just started playing with
bioperl, so I'm a little lost on some of the terms you used.

> >
> > 1) Read in pero EST from a FASTA
> > 2) Standaloneblast it to local mus cDNA database, retrieve accession
> > from best result
..
> Just do
> blastall -i ests.fa -d mus -p blastn -e evalue ...

Each individual EST is in its own file (the filename is its ID, like
PM_BWP0009A06.FAF, with no particular pattern, so I just read in an entire
directory). Unless I'm missing something? Should I read all the ESTs into one
file, then do blastall?

> Although I think you might do better to run a translated search against
> the mouse protein set. You also will find you can get better results with
> FASTX/FASTY as it allows frameshifts whereas blast will only search one
> frame at a time.

Can you elaborate on what FASTX/FASTY are? I'm only using blast to get the
accession ID of the best match in my mus cDNA list, then getting the full mus
seq info from GenBank with that ID (I think I'm doing something too
complicated..)

> > 3) Retrieve complete mus sequence with features from genbank using ID from (2)
> > 4) Make a clustalw simple align object using the mus protein sequence
> > from (3) against the translated pero EST for all 3 frames, and keep
> > the one with the best identity %.
> > --I'm done up to here--
>
> Why not determine the best frame when doing the search by comparing to the
> mouse proteins?

Not sure what you mean by this. Do you mean to manually look and check if the
proteins match up well? I'd like to avoid this if possible; I plan to use
this on several hundred EST files.

>
> > 5) Convert the aln frrom AA to DNA (there is a builtin aa_to_dna_aln
> > but it isn't working for me)

It seems like I have a lot of rewriting to do before this step, so I'll save
this for later :)

>
> You basically need to predict the protein sequence from the EST - you can
> use estwise to do this based on the best mouse protein homolog.

Can you elaborate on what estwise is? Is this part of wise or a separate
thing? How do I run this in bioperl?

> -jason

Again, thanks very much for your help!

Take care
Ed

--
Edward Chuong
http://iacs5.ucsd.edu/~echuong

From Marc.Logghe at devgen.com Sun Jun 27 10:08:57 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sun Jun 27 10:11:43 2004
Subject: [Bioperl-l] ontology help
Message-ID:

> I believe the current implementation of delegation from the
> Bio::Ontology::Ontology methods to their counterparts in the currently
> used engine implementation expects that only terms and rel.ships from
> the same ontology will be asked for.
>
> The original rationale behind this was to make it possible to use one
> and the same engine instance to back multiple OntologyI instances. If
> you did this, you would need to filter out terms from other ontologies
> co-using the engine instance.
>
> At hindsight I think this is probably over-engineered and trying to
> solve a non-existing problem. So, we could as well demand that one
> engine instance only serve one ontology instance and you're on your own
> if you do otherwise. This would then allow us to remove the
> post-filtering code that filters hits returned from the engine.
>
> You could try and see whether that will solve your problem as well.

Solved. Thanks a lot for the help, Hilmar !!!!
At first, I indeed always used a separate engine. Taking your advice, I
created a Bio::Ontology::Ontology for the first ontology and reused
that engine for the other two ontologies. And that did the trick !!!
Regards,
Marc

***********************************************************
Marc Logghe, Ph.D.
Senior Scientist
Scientific Computing Group
Devgen nv
Technologiepark 30
B - 9052 Ghent-Zwijnaarde
Belgium
Tel: +32 9 324 24 83
Fax: +32 9 324 24 25

> **** DISCLAIMER **********************************************************
> "This e-mail and any attachments thereto may contain information
> which is confidential and/or protected by intellectual property
> rights and are intended for the sole use of the recipient(s) named above.
> Any use of the information contained herein (including, but not limited to,
> total or partial reproduction, communication or distribution in any form)
> by persons other than the designated recipient(s) is prohibited.
> If you have received this e-mail in error, please notify the sender either
> by telephone or by e-mail and delete the material from any computer.
> Thank you for your cooperation."

From heikki at ebi.ac.uk Mon Jun 28 12:46:42 2004
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Jun 28 11:51:20 2004
Subject: [Bioperl-l] pir.pm => bug
In-Reply-To:
References:
Message-ID: <200406281246.42724.heikki@ebi.ac.uk>

Laure,

Thanks for the fix.

pir.pm has not been updated for a long time. Not many people work with the
format.

Before I apply your changes to the file, I'll summarise the major changes
here so that others can comment:

- uses Bio::Species and Bio::Annotation::Collection
- uses Bio::Seq::RichSeq rather than Bio::Seq
- parses TITLE, ORGANISM, DATE, ACCESSIONS lines
- comments out method write_seq()

I do not know whether write_seq() is needed in the module, or whether its
removal is intentional.

	-Heikki

On Friday 25 Jun 2004 06:05, Laure.Durufle@serono.com wrote:
> Hi,
>
>
> I moved the package pir.pm / we give the file and with pir.pm we can parse
> this file pir*.dat :
>
> like this format :
>
>
>            P R O T E I N  S E Q U E N C E  D A T A B A S E
>                         of PIR-International
>
>                  Section 1. Fully Classified Entries
>                     Release 79.01, April 04, 2004
>                   20685 sequences, 8103841 residues
>
>                  Protein Information Resource (PIR)*
>               National Biomedical Research Foundation
>                      3900 Reservoir Road, N.W.,
>                      Washington, DC 20007, USA
>
> Japan International Protein       Munich Information Center for
> Information Database (JIPID)      Protein Sequences (MIPS)
> Amakubo 1-16-1                    GSF-Forschungszentrum f. Umwelt und
> Gesundheit
> Tsukuba 305-0005, Japan           am Max-Planck-Instut f.
Biochemie > Am Klopferspitz 18, D-82152 Martinsried, > FRG > > This database may be redistributed without prior consent, provided that > this notice be given to each user and that the words "Derived from" > shall > precede this notice if the database has been altered by the > redistributor. > > Copyright 2000, PIR-International. > > *PIR is a registered mark of NBRF. > \\\ > ENTRY A27187 #type complete > TITLE ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome > c1 > precursor - Neurospora crassa > ALTERNATE_NAMES bc1 complex cytochrome c1; complex III cytochrome c1; > cytochrome c1 heme protein > ORGANISM #formal_name Neurospora crassa > DATE 05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change > 03-Jun-2002 > ACCESSIONS A27187 > REFERENCE A27187 > #authors Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H. > #journal Eur. J. Biochem. (1987) 164:111-115 > #title The primary structure of cytochrome c-1 from Neurospora > crassa. > #cross-references MUID:87161871; PMID:3030747 > #accession A27187 > ##molecule_type mRNA > ##residues 1-332 ##label ROE > ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1; PID:g3006 > ##note the authors translated the codon AGT for residue 316 as Arg > CLASSIFICATION #superfamily cytochrome c1 heme protein; cytochrome c1 heme > protein homology > KEYWORDS chromoprotein; electron transfer; heme; iron; > metalloprotein; mitochondrion; oxidative phosphorylation; > oxidoreductase; respiratory chain; transmembrane protein > FEATURE > 1-70 #domain transit peptide (mitochondrion) #status > predicted #label TNP\ > 71-332 #product cytochrome c1 #status predicted #label MAT\ > 79-305 #domain cytochrome c1 heme protein homology #label > C1H\ > 278-296 #domain transmembrane #status predicted #label TMM\ > 110,113 #binding_site heme (Cys) (covalent) #status > predicted\ > 114,234 #binding_site heme iron (His, Met) (axial ligands) > #status predicted > SUMMARY #length 332 #molecular-weight 36456 #checksum 1753 > SEQUENCE > 5 10 15 20 
25 30 > 1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T > 31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W > 61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W > 91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P > 121 Y R A L V G T I L T V D E A K A L A E E N E Y D T E P N D Q > 151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G > 181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E > 211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D > 241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P > 271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K > 301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A > 331 K S > /// > > > the package is that : > # $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $ > # > # BioPerl module for Bio::SeqIO::PIR > # > # Cared for by Aaron Mackey > # > # Copyright Aaron Mackey > # > # You may distribute this module under the same terms as perl itself > # > # _history > # October 18, 1999 Largely rewritten by Lincoln Stein > > # POD documentation - main docs before the code > > =head1 NAME > > Bio::SeqIO::pir - PIR sequence input/output stream > > =head1 SYNOPSIS > > Do not use this module directly. Use it via the Bio::SeqIO class. > > =head1 DESCRIPTION > > This object can transform Bio::Seq objects to and from pir flat > file databases. > > Note: This does not completely preserve the PIR format - quality > information about sequence is currently discarded since bioperl > does not have a mechanism for handling these encodings in sequence > data. > > =head1 FEEDBACK > > =head2 Mailing Lists > > User feedback is an integral part of the evolution of this and other > Bioperl modules. Send your comments and suggestions preferably to one > of the Bioperl mailing lists. Your participation is much appreciated. 
>
>   bioperl-l@bioperl.org                 - General discussion
>   http://www.bioperl.org/MailList.shtml - About the mailing lists
>
> =head2 Reporting Bugs
>
> Report bugs to the Bioperl bug tracking system to help us keep track
> of the bugs and their resolution.
> Bug reports can be submitted via email or the web:
>
>   bioperl-bugs@bio.perl.org
>   http://bugzilla.bioperl.org/
>
> =head1 AUTHORS
>
> Aaron Mackey E<lt>amackey@virginia.eduE<gt>
> Lincoln Stein E<lt>lstein@cshl.orgE<gt>
> Jason Stajich E<lt>jason@bioperl.orgE<gt>
>
> =head1 APPENDIX
>
> The rest of the documentation details each of the object
> methods. Internal methods are usually preceded with a _
>
> =cut
>
> # Let the code begin...
>
> package Bio::SeqIO::pir;
> use vars qw(@ISA);
> use strict;
>
> use Bio::SeqIO;
> use Bio::Seq::SeqFactory;
> use Bio::Species;
> use Bio::Annotation::Collection;
>
> @ISA = qw(Bio::SeqIO);
>
> sub _initialize {
>   my($self,@args) = @_;
>   $self->SUPER::_initialize(@args);
>   if( ! defined $self->sequence_factory ) {
>       $self->sequence_factory(new Bio::Seq::SeqFactory
>                               (-verbose => $self->verbose(),
>                                -type => 'Bio::Seq::RichSeq'));
>   }
> }
>
> =head2 next_seq
>
>  Title   : next_seq
>  Usage   : $seq = $stream->next_seq()
>  Function: returns the next sequence in the stream
>  Returns : Bio::Seq object
>  Args    : NONE
>
> =cut
>
> sub next_seq {
>     my ($self) = @_;
>     #local($/)= "\n";
>     my $line;
>     my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc);
>     my ($annotation, %params, @features) = ( new
> Bio::Annotation::Collection);
>
>     while(defined($line = $self->_readline())) {
>         last if index($line,'ENTRY ') == 0;
>     }
>     return undef if( !defined $line ); # end of file
>
>     $line =~ /^ENTRY\s+(\S+)\s+/ ||
>         $self->throw("Pir stream with bad ENTRY line.
Not Pir in my > book."); > $id = $1; > $params{'-display_id'} = $id; > > until(defined ($line) && ($line =~ /^SEQUENCE/) ) { > > # Description line(s) > if ($line=~/^TITLE\s+(.*)/) { > $desc = $1; > } > # organism line(s) > if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) { > $org = $1; > my @class =($org); > my $make = Bio::Species->new(); > $make->classification(\@class,"FORCE"); # no name validation please > $params{'-species'}= $make; > } > # date line > if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) { > $date = $1; > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > #accession > if($line=~/^ACCESSIONS\s+(.*)/) { > $seq = ""; > $acc_string =$1; > $acc_string =~ s/\;\s*/ /g; > ($acc,@sec) = split " ",$acc_string; > } > > $line = $self->_readline(); > > } > my ($seqc,$seqn) = ("",""); > my $nb=0; > while( defined ($line = $self->_readline) ) { > if ($line=~/^\/\/\//) {last}; > if ($line=~/^\s+\d+\s+\d+/) {next}; > if ($line=~/^\s+\d+(.*)/) { > $line=$1; > } > $seq = uc($line); > $seqc .= $seq; > } > > # P - indicates complete protein > # F - indicates protein fragment > # not sure how to stuff these into a Bio object > # suitable for writing out. 
>     $seqc =~ s/\*//g;
>     $seqc =~ s/[\(\)\.\/\=\,]//g;
>     $seqc =~ s/\s+//g; # get rid of whitespace
>     $params{'-seq_version'} = '';
>
>     my ($alphabet) = ('protein');
>     # TODO - not processing SFS data
>     my $entry = $self->sequence_factory->create
>         (-verbose => $self->verbose,
>          %params,
>          -seq => $seqc,
>          -primary_id => $id,
>          -id => $id,
>          -desc => $desc,
>          -alphabet => $alphabet,
>          -accession_number => $acc,
>          -secondary_accessions => \@sec,
>          );
>
>     return $entry;
> }
>
>
> =head2 write_seq
>
>  Title   : write_seq
>  Usage   : $stream->write_seq(@seq)
>  Function: writes the $seq object into the stream
>  Returns : 1 for success and 0 for error
>  Args    : Array of Bio::PrimarySeqI objects
>
>
> =cut
>
> #sub write_seq {
> #    my ($self, @seq) = @_;
> #    for my $seq (@seq) {
> #        $self->throw("Did not provide a valid Bio::PrimarySeqI object")
> #            unless defined $seq && ref($seq) &&
> $seq->isa('Bio::PrimarySeqI');
> #        my $str = $seq->seq();
> #        return unless $self->_print(">".$seq->id(),
> #                                    "\n", $seq->desc(), "\n",
> #                                    $str, "*\n");
> #    }
>
> #    $self->flush if $self->_flush_on_write && defined $self->_fh;
> #    return 1;
> #}
>
> 1;
>
>
>
> Laure Durufle
>
>
>
>
>
> ***************************************************************************
>***************** S - This message contains confidential information and is
> intended only for the individual named. If you are not the named addressee,
> you should not disseminate, distribute or copy this e-mail. Please notify
> the sender immediately by e-mail if you have received this e-mail by
> mistake and delete this e-mail from your system.
> e-mail transmission cannot be guaranteed to be secure or error-free as
> information could be intercepted, corrupted, lost, destroyed, arrive late
> or incomplete, or contain malware. The presence of this disclaimer is not a
> proof that it was originated at Serono International S.A. or one of its
> affiliates.
> Serono International S.A and its affiliates therefore do not
> accept liability for any errors or omissions in the content of this
> message, which arise as a result of e-mail transmission. If verification is
> required, please request a hard-copy version. Serono International SA,
> 15bis Chemin Des Mines, Geneva, Switzerland, www.serono.com.
> ***************************************************************************
>******************
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From bmb9jrm at bmb.leeds.ac.uk Mon Jun 28 13:15:22 2004
From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning)
Date: Mon Jun 28 13:17:43 2004
Subject: [Bioperl-l] Bio::DB::Query::GenBank errors
Message-ID: <1088442922.6108.18.camel@localhost.localdomain>

Hi all,

Apologies for the somewhat fundamental question, but I haven't been able to
find answers myself, and it should be a quickie.

How do I tell if a Bio::DB::Query::GenBank query has not retrieved any
results without calling any methods on it (since calling them causes the
script to crash out with an exception if there are no results)?

Since count() and ids() are unavailable for the purpose, I tried:

- if ($queryobject)
- if (ref($queryobject))
- if (defined($queryobject))

(I realise I was clutching at straws a bit there), but to no avail.

Help appreciated!

Jon

From redwards at utmem.edu Mon Jun 28 13:38:33 2004
From: redwards at utmem.edu (Rob Edwards)
Date: Mon Jun 28 13:40:56 2004
Subject: [Bioperl-l] Bio::DB::Query::GenBank errors
In-Reply-To: <1088442922.6108.18.camel@localhost.localdomain>
References: <1088442922.6108.18.camel@localhost.localdomain>
Message-ID:

eval should catch the error.

  # eval traps the exception thrown when the query has no results
  my $count = eval { $queryobject->count };
  if ($@) { print "No results returned\n" }

Rob

On Jun 28, 2004, at 12:15 PM, Jonathan Manning wrote:

> Hi all,
>
> Apologies for the somewhat fundamental question, but I haven't been able
> to find answers myself, and it should be a quickie.
>
> How do I tell if a Bio::DB::Query::Genbank has not retrieved any results
> without calling any methods (since this causes the script to crash out
> with an exception if there's no results).
>
> Since count() or ids() are unavailable for the purpose I tried:
>
> - if ($queryobject)
> - if (ref($queryobject))
> - if (defined($queryobject))
>
> (I realise I was clutching at straws a bit there), but to no avail.
>
> Help appreciated!
>
> Jon
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From hlapp at gmx.net Mon Jun 28 13:54:31 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jun 28 13:56:47 2004
Subject: [Bioperl-l] pir.pm => bug
In-Reply-To: <200406281246.42724.heikki@ebi.ac.uk>
Message-ID: <350DA656-C92C-11D8-8F1F-000A959EB4C4@gmx.net>

Note also that PIR is obsolete. PIR merged with Swissprot to form Uniprot.
The Uniprot format is basically the swissprot format.

	-hilmar

On Monday, June 28, 2004, at 10:46 AM, Heikki Lehvaslaiho wrote:

> Laure,
>
> Thanks for the fix.
>
> pir.pm has not been updated for a long time. Not many people work with
> the format.
> > Before I apply your changes into the file, I'll summarise here the > major > changes so that others can comment: > > - uses Bio::Species and Bio::Annotation::Collection > - uses Bio::Seq::RichSeq rather than Bio::Seq > - parses TITLE, ORGANISM, DATE, ACCESSIONS lines > - comments out method write_seq() > > I do not know if write_seq() is needed in the module neither if its > removal > is intentional? > > -Heikki > > > > On Friday 25 Jun 2004 06:05, Laure.Durufle@serono.com wrote: >> Hi, >> >> >> I moved the package pir.pm / we give the file and with pir.pm we can >> parse >> this file pir*.dat : >> >> like this format : >> >> >> P R O T E I N S E Q U E N C E D A T A B A S E >> of PIR-International >> >> Section 1. Fully Classified Entries >> Release 79.01, April 04, 2004 >> 20685 sequences, 8103841 residues >> >> Protein Information Resource (PIR)* >> National Biomedical Research Foundation >> 3900 Reservoir Road, N.W., >> Washington, DC 20007, USA >> >> Japan International Protein Munich Information Center for >> Information Database (JIPID) Protein Sequences (MIPS) >> Amakubo 1-16-1 GSF-Forschungszentrum f. Umwelt und >> Gesundheit >> Tsukuba 305-0005, Japan am Max-Planck-Instut f. >> Biochemie >> Am Klopferspitz 18, D-82152 >> Martinsried, >> FRG >> >> This database may be redistributed without prior consent, provided >> that >> this notice be given to each user and that the words "Derived from" >> shall >> precede this notice if the database has been altered by the >> redistributor. >> >> Copyright 2000, PIR-International. >> >> *PIR is a registered mark of NBRF. 
>> \\\ >> ENTRY A27187 #type complete >> TITLE ubiquinol-cytochrome-c reductase (EC 1.10.2.2) >> cytochrome >> c1 >> precursor - Neurospora crassa >> ALTERNATE_NAMES bc1 complex cytochrome c1; complex III cytochrome c1; >> cytochrome c1 heme protein >> ORGANISM #formal_name Neurospora crassa >> DATE 05-Oct-1988 #sequence_revision 15-Oct-1994 >> #text_change >> 03-Jun-2002 >> ACCESSIONS A27187 >> REFERENCE A27187 >> #authors Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H. >> #journal Eur. J. Biochem. (1987) 164:111-115 >> #title The primary structure of cytochrome c-1 from >> Neurospora >> crassa. >> #cross-references MUID:87161871; PMID:3030747 >> #accession A27187 >> ##molecule_type mRNA >> ##residues 1-332 ##label ROE >> ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1; >> PID:g3006 >> ##note the authors translated the codon AGT for residue 316 as >> Arg >> CLASSIFICATION #superfamily cytochrome c1 heme protein; cytochrome >> c1 heme >> protein homology >> KEYWORDS chromoprotein; electron transfer; heme; iron; >> metalloprotein; mitochondrion; oxidative >> phosphorylation; >> oxidoreductase; respiratory chain; transmembrane >> protein >> FEATURE >> 1-70 #domain transit peptide (mitochondrion) #status >> predicted #label TNP\ >> 71-332 #product cytochrome c1 #status predicted >> #label MAT\ >> 79-305 #domain cytochrome c1 heme protein homology >> #label >> C1H\ >> 278-296 #domain transmembrane #status predicted #label >> TMM\ >> 110,113 #binding_site heme (Cys) (covalent) #status >> predicted\ >> 114,234 #binding_site heme iron (His, Met) (axial >> ligands) >> #status predicted >> SUMMARY #length 332 #molecular-weight 36456 #checksum 1753 >> SEQUENCE >> 5 10 15 20 25 30 >> 1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T >> 31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W >> 61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W >> 91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P >> 121 Y R A L V G T I L 
T V D E A K A L A E E N E Y D T E P N D Q >> 151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G >> 181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E >> 211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D >> 241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P >> 271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K >> 301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A >> 331 K S >> /// >> >> >> the package is that : >> # $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $ >> # >> # BioPerl module for Bio::SeqIO::PIR >> # >> # Cared for by Aaron Mackey >> # >> # Copyright Aaron Mackey >> # >> # You may distribute this module under the same terms as perl itself >> # >> # _history >> # October 18, 1999 Largely rewritten by Lincoln Stein >> >> # POD documentation - main docs before the code >> >> =head1 NAME >> >> Bio::SeqIO::pir - PIR sequence input/output stream >> >> =head1 SYNOPSIS >> >> Do not use this module directly. Use it via the Bio::SeqIO class. >> >> =head1 DESCRIPTION >> >> This object can transform Bio::Seq objects to and from pir flat >> file databases. >> >> Note: This does not completely preserve the PIR format - quality >> information about sequence is currently discarded since bioperl >> does not have a mechanism for handling these encodings in sequence >> data. >> >> =head1 FEEDBACK >> >> =head2 Mailing Lists >> >> User feedback is an integral part of the evolution of this and other >> Bioperl modules. Send your comments and suggestions preferably to one >> of the Bioperl mailing lists. Your participation is much appreciated. >> >> bioperl-l@bioperl.org - General discussion >> http://www.bioperl.org/MailList.shtml - About the mailing lists >> >> =head2 Reporting Bugs >> >> Report bugs to the Bioperl bug tracking system to help us keep track >> the bugs and their resolution. 
>> Bug reports can be submitted via email or the web: >> >> bioperl-bugs@bio.perl.org >> http://bugzilla.bioperl.org/ >> >> =head1 AUTHORS >> >> Aaron Mackey Eamackey@virginia.eduE >> Lincoln Stein Elstein@cshl.orgE >> Jason Stajich Ejason@bioperl.orgE >> >> =head1 APPENDIX >> >> The rest of the documentation details each of the object >> methods. Internal methods are usually preceded with a _ >> >> =cut >> >> # Let the code begin... >> >> package Bio::SeqIO::pir; >> use vars qw(@ISA); >> use strict; >> >> use Bio::SeqIO; >> use Bio::Seq::SeqFactory; >> use Bio::Species; >> use Bio::Annotation::Collection; >> >> @ISA = qw(Bio::SeqIO); >> >> sub _initialize { >> my($self,@args) = @_; >> $self->SUPER::_initialize(@args); >> if( ! defined $self->sequence_factory ) { >> $self->sequence_factory(new Bio::Seq::SeqFactory >> (-verbose => $self->verbose(), >> -type => 'Bio::Seq::RichSeq')); >> } >> } >> >> =head2 next_seq >> >> Title : next_seq >> Usage : $seq = $stream->next_seq() >> Function: returns the next sequence in the stream >> Returns : Bio::Seq object >> Args : NONE >> >> =cut >> >> sub next_seq { >> my ($self) = @_; >> #local($/)= "\n"; >> my $line; >> my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc); >> my ($annotation, %params, @features) = ( new >> Bio::Annotation::Collection); >> >> while(defined($line = $self->_readline())) { >> last if index($line,'ENTRY ') == 0; >> } >> return undef if( !defined $line ); # end of file >> >> $line =~ /^ENTRY\s+(\S+)\s+/ || >> $self->throw("Pir stream with bad ENTRY line. 
Not Pir in my >> book."); >> $id = $1; >> $params{'-display_id'} = $id; >> >> until(defined ($line) && ($line =~ /^SEQUENCE/) ) { >> >> # Description line(s) >> if ($line=~/^TITLE\s+(.*)/) { >> $desc = $1; >> } >> # organism line(s) >> if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) { >> $org = $1; >> my @class =($org); >> my $make = Bio::Species->new(); >> $make->classification(\@class,"FORCE"); # no name validation >> please >> $params{'-species'}= $make; >> } >> # date line >> if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) { >> $date = $1; >> $date =~ s/\;//; >> $date =~ s/\s+$//; >> push @{$params{'-dates'}}, $date; >> } >> #accession >> if($line=~/^ACCESSIONS\s+(.*)/) { >> $seq = ""; >> $acc_string =$1; >> $acc_string =~ s/\;\s*/ /g; >> ($acc,@sec) = split " ",$acc_string; >> } >> >> $line = $self->_readline(); >> >> } >> my ($seqc,$seqn) = ("",""); >> my $nb=0; >> while( defined ($line = $self->_readline) ) { >> if ($line=~/^\/\/\//) {last}; >> if ($line=~/^\s+\d+\s+\d+/) {next}; >> if ($line=~/^\s+\d+(.*)/) { >> $line=$1; >> } >> $seq = uc($line); >> $seqc .= $seq; >> } >> >> # P - indicates complete protein >> # F - indicates protein fragment >> # not sure how to stuff these into a Bio object >> # suitable for writing out. 
>> $seqc =~ s/\*//g; >> $seqc =~ s/[\(\)\.\/\=\,]//g; >> $seqc =~ s/\s+//g; # get rid of whitespace >> $params{'-seq_version'} = ''; >> >> my ($alphabet) = ('protein'); >> # TODO - not processing SFS data >> my $entry = $self->sequence_factory->create >> (-verbose => $self->verbose, >> %params, >> -seq => $seqc, >> -primary_id => $id, >> -id => $id, >> -desc => $desc, >> -alphabet => $alphabet, >> -accession_number => $acc, >> -secondardy_accessions => \@sec, >> ); >> >> return $entry; >> } >> >> >> =head2 write_seq >> >> Title : write_seq >> Usage : $stream->write_seq(@seq) >> Function: writes the $seq object into the stream >> Returns : 1 for success and 0 for error >> Args : Array of Bio::PrimarySeqI objects >> >> >> =cut >> >> #sub write_seq { >> # my ($self, @seq) = @_; >> # for my $seq (@seq) { >> # $self->throw("Did not provide a valid Bio::PrimarySeqI object") >> # unless defined $seq && ref($seq) && >> $seq->isa('Bio::PrimarySeqI'); >> # my $str = $seq->seq(); >> # return unless $self->_print(">".$seq->id(), >> # "\n", $seq->desc(), "\n", >> # $str, "*\n"); >> # } >> >> # $self->flush if $self->_flush_on_write && defined $self->_fh; >> # return 1; >> #} >> >> 1; >> >> >> >> Laure Durufle >> >> >> >> >> >> ********************************************************************** >> ***** >> ***************** S - This message contains confidential information >> and is >> intended only for the individual named. If you are not the named >> addressee, >> you should not disseminate, distribute or copy this e-mail. Please >> notify >> the sender immediately by e-mail if you have received this e-mail by >> mistake and delete this e-mail from your system. >> e-mail transmission cannot be guaranteed to be secure or error-free as >> information could be intercepted, corrupted, lost, destroyed, arrive >> late >> or incomplete, or contain malware. The presence of this disclaimer is >> not a >> proof that it was originated at Serono International S.A. 
or one of >> its >> affiliates. Serono International S.A and its affiliates therefore do >> not >> accept liability for any errors or omissions in the content of this >> message, which arise as a result of e-mail transmission. If >> verification is >> required, please request a hard-copy version. Serono International SA, >> 15bis Chemin Des Mines, Geneva, Switzerland, www.serono.com. >> ********************************************************************** >> ***** >> ****************** >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambridge, CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From echuong at gmail.com Mon Jun 28 15:21:16 2004
From: echuong at gmail.com (Edward Chuong)
Date: Mon Jun 28 15:23:47 2004
Subject: [Bioperl-l] Help on a basic EST-genomic alignment script
In-Reply-To: <244d2e0e04062701275dae0799@mail.gmail.com>
References: <244d2e0e040625152477e9d07a@mail.gmail.com>
 <244d2e0e04062701275dae0799@mail.gmail.com>
Message-ID: <244d2e0e0406281221132be10e@mail.gmail.com>

> > You basically need to predict the protein sequence from the EST - you can
> > use estwise to do this based on the best mouse protein homolog.
>

Hey,

I just downloaded estwise and it works much better than clustalw for EST alignment. How do I use this from Perl? I'm having trouble installing the Perl module for it (make perl?) in wise2.2.0. I had to use the BioTeam OS X pkg to install the binaries (the binaries worked fine for testing, but obviously I need to access estwise from Perl!). I get the following errors trying to compile from source (tried with su as well) on OS X 10.3.4.

ld: Undefined symbols:
_HMMFileOpenFseek
_HMMFtell
_ajCharNewL
_ajMemCalloc0
_ajSeqChar
_ajSeqsetFill
_ajSeqsetGetFormat
_ajSeqsetGetSeq
_ajSeqsetIsDna
_ajSeqsetIsProt
_ajSeqsetIsRna
_ajSeqsetLen
_ajSeqsetName
_ajSeqsetSize
_ajSeqsetToUpper
_ajSeqsetWeight
_ajStrLen
_ajStrPrefixC
_ajStrStr
_ajMessCrashFL
_ajMessSetErr
_ajFmtPrintF
make[1]: *** [estwise] Error 1
make: *** [realall] Error 2

Then running "make perl" (which probably shouldn't work anyway because the original compile didn't) I get:

/usr/bin/perl /System/Library/Perl/5.8.1/ExtUtils/xsubpp -typemap
/System/Library/Perl/5.8.1/ExtUtils/typemap -typemap typemap Wise2.xs
> Wise2.xsc && mv Wise2.xsc Wise2.c
Error: 'Wise2_SupportingFeature *' not in typemap in Wise2.xs, line 6081
Error: 'Wise2_SupportingFeature *' not in typemap in Wise2.xs, line 6108
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 6650
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 6710
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 6730
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8322
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8337
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8352
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8368
Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8384
make[1]: *** [Wise2.c] Error 1
make: *** [perlmake] Error 2

Anyone out there with experience installing wise on OS X?

Thanks!
--
Edward Chuong
http://iacs5.ucsd.edu/~echuong

From allenday at ucla.edu Mon Jun 28 16:15:00 2004
From: allenday at ucla.edu (Allen Day)
Date: Mon Jun 28 16:18:02 2004
Subject: [Bioperl-l] dbSNP
In-Reply-To: <69BA0F938FAC6A4CBEF49461720696F20439F6C2@nihexchange16.nih.gov>
References: <69BA0F938FAC6A4CBEF49461720696F20439F6C2@nihexchange16.nih.gov>
Message-ID:

If you look in the source code of the parser, you'll see that I don't handle all the information, only the information which was relevant to me at the time I wrote the module.

If you need more of the information parsed out, please submit a patch to the bioperl mailing list (cc'ed here), or if you already have a CVS account feel free to make the commits yourself.

-allen

On Mon, 28 Jun 2004, Babenko, Vladimir (NIH/NLM/NCBI) wrote:

> Allen, hi;
> Though I work at the origin of dbSNP I'd appreciate if you could shed
> some light on dbSNP usage ;-)
> Do I take it right that only genomic position is available for the snp in
> dbsnp?
> I know that there is a protein accession in one of the fields? Should I map
> it onto the cds by my self?
> Looks that there's nothing of the kind...
> Thank you,
> Vladimir
>
> -----------------
>
> Vladimir Babenko
>
> Bldg 38A 8S816L
>
> National Center for Biotechnology Information
>
> National Library Of Medicine
>
> National Institute of Health
>
> Bethesda, 20894
>
> (v) 301-594-8079
>
> (f) 301-480-4637
>

From jason at cgt.duhs.duke.edu Mon Jun 28 17:32:56 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Jun 28 17:35:36 2004
Subject: [Bioperl-l] Help on a basic EST-genomic alignment script
In-Reply-To: <244d2e0e0406281221132be10e@mail.gmail.com>
References: <244d2e0e040625152477e9d07a@mail.gmail.com>
 <244d2e0e04062701275dae0799@mail.gmail.com>
 <244d2e0e0406281221132be10e@mail.gmail.com>
Message-ID:

Just use estwise as a standalone program, not from within Perl.
% estwise protein est

estwise is pretty slow, so I wouldn't embark on this route unless you know what you are doing. Try a BLAST or FASTA route first to get likely homologs.

I would probably start with fastx or fasty to find likely homologs and get a good guess about the reading frame. fastx and fasty are translated search algorithms (analogous to blastx).

From the docs:

fastx3/fasty3
    Compare a DNA sequence to a protein sequence database, by comparing the translated DNA sequence in three frames and allowing gaps and frameshifts. fastx3 uses a simpler, faster algorithm for alignments that allows frameshifts only between codons; fasty3 is slower but produces better alignments with poor quality sequences because frameshifts are allowed within codons.

This should have come with the BioTeam installer in the fasta pkg.

-jason

On Mon, 28 Jun 2004, Edward Chuong wrote:

> > > You basically need to predict the protein sequence from the EST - you can
> > > use estwise to do this based on the best mouse protein homolog.
> >
>
> Hey,
>
> I just downloaded estwise and it works much better than clustalw for
> est alignment. How do I use this from perl? I'm having trouble
> installing the perl module for it (make perl ?) wise2.2.0.. I had to
> use the BioTeam OS X pkg to install the binaries (binaries worked fine
> for testing but obvoiusly I need to access estwise from perl!). I get
> the following errors trying to compile from source (tried with su as
> well) on OS X 10.3.4.
> > ld: Undefined symbols: > _HMMFileOpenFseek > _HMMFtell > _ajCharNewL > _ajMemCalloc0 > _ajSeqChar > _ajSeqsetFill > _ajSeqsetGetFormat > _ajSeqsetGetSeq > _ajSeqsetIsDna > _ajSeqsetIsProt > _ajSeqsetIsRna > _ajSeqsetLen > _ajSeqsetName > _ajSeqsetSize > _ajSeqsetToUpper > _ajSeqsetWeight > _ajStrLen > _ajStrPrefixC > _ajStrStr > _ajMessCrashFL > _ajMessSetErr > _ajFmtPrintF > make[1]: *** [estwise] Error 1 > make: *** [realall] Error 2 > > Then running "make perl" (which probably shouldn't work anyway because > the original compile didn't) I get: > > /usr/bin/perl /System/Library/Perl/5.8.1/ExtUtils/xsubpp -typemap > /System/Library/Perl/5.8.1/ExtUtils/typemap -typemap typemap Wise2.xs > > Wise2.xsc && mv Wise2.xsc Wise2.c > Error: 'Wise2_SupportingFeature *' not in typemap in Wise2.xs, line 6081 > Error: 'Wise2_SupportingFeature *' not in typemap in Wise2.xs, line 6108 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 6650 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 6710 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 6730 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8322 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8337 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8352 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8368 > Error: 'Wise2_DPRunImpl *' not in typemap in Wise2.xs, line 8384 > make[1]: *** [Wise2.c] Error 1 > make: *** [perlmake] Error 2 > > Anyone out there with experience installing wise in OS X? > > Thanks! 
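
Running estwise as a standalone program, as suggested in this reply, does not rule out driving it from a Perl script. The following is only a minimal sketch, assuming the estwise binary is on the PATH. The helper name run_tool and the file names protein.fa and est.fa are made up for illustration, and the output comes back as raw text because no BioPerl parser for it is assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external program and return everything it
# prints to standard output. The list form of open() bypasses the shell,
# so file names containing spaces or metacharacters are passed untouched.
sub run_tool {
    my @cmd = @_;
    open(my $fh, '-|', @cmd)
        or die "Cannot start '$cmd[0]': $!\n";
    local $/;                      # slurp mode: read all output at once
    my $output = <$fh>;
    close($fh)
        or die "'$cmd[0]' exited with status " . ($? >> 8) . "\n";
    return defined $output ? $output : '';
}

# Example call (the file names are placeholders):
#   my $report = run_tool('estwise', 'protein.fa', 'est.fa');
#   print $report;
```

Dying on a non-zero exit status keeps a failed run from being mistaken for an empty alignment.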
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From facemann at yahoo.com Mon Jun 28 19:51:15 2004
From: facemann at yahoo.com (Andy Hammer)
Date: Mon Jun 28 19:53:38 2004
Subject: [Bioperl-l] Cannot connect to local ODBA
In-Reply-To: <8A4AE77C-C6B5-11D8-A52B-000A959EB4C4@gmx.net>
Message-ID: <20040628235115.61306.qmail@web13426.mail.yahoo.com>

--- Hilmar Lapp wrote:
> I thought that support was maintained at the
> Singapore hackathon, but
> it may have been wishful thinking.
>
> So maybe I need to take a look at some point on how
> to support the
> biosql protocol in obda. Of course, everybody
> willing to take a shot at
> this would be greatly welcome. (And no, you can't
> just resuscitate the
> former BiodatabaseAdaptor.)
>
> -hilmar
>

It's too bad that it is not maintained. It was such a neat concept. Instead I have used the code Hilmar demonstrated at
http://open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf

Thanks for the reply and the slides!

Andy

From SonjaFunke at web.de Tue Jun 29 08:29:08 2004
From: SonjaFunke at web.de (Sonja Funke)
Date: Tue Jun 29 08:31:32 2004
Subject: [Bioperl-l] Bioperl beginner has a problem!
Message-ID: <308922181@web.de>

Hallo Bioperl users,

hope you can help me, I get the following message from the NCBI:

An Error Occurred

500 Cannot write to '/tmp/h5LmopUZwi': Too many open files

and I have no idea why.
My program takes an amino acid sequence, sends it to the NCBI and performs a BLAST search there. When the sequence has an identity of 98%, it is written to a database.
Thus I have no open files, and in particular no open tmp files.

Thanks a lot!
Sonja

From ak at ebi.ac.uk Tue Jun 29 09:00:37 2004
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Tue Jun 29 09:03:21 2004
Subject: [Bioperl-l] Bioperl beginner has a problem!
In-Reply-To: <308922181@web.de>
References: <308922181@web.de>
Message-ID: <20040629130037.GC13085@ebi.ac.uk>

On Tue, Jun 29, 2004 at 02:29:08PM +0200, Sonja Funke wrote:
> Hallo Bioperl users,
>
> hope you can help me, I get the following message from the NCBI:
>

An Error Occurred

> 500 Cannot write to '/tmp/h5LmopUZwi': Too many open files
> and I have no idea why.
> My programm takes an amino acid sequence sends it to the NCBI and performs there a BLAST search. when the sequence has an identity of 98% its written to an database.
> Thus I have no open files and respectively no open tmp files.

It looks to me as if the problem is at NCBI, since it's the BLAST service at NCBI that replies with the error message. Their BLAST server is probably being bombarded by someone at the moment. Hold off for a moment and try again later, or contact NCBI.

Andreas

--
|<><>| Andreas Kähäri     EMBL, European Bioinformatics Institute
| <> |                    Wellcome Trust Genome Campus
|<><>| Ensembl Developer  Hinxton, Cambridgeshire, CB10 1SD
| <> | DAS Project Leader United Kingdom

From habulleef at kfshrc.edu.sa Tue Jun 29 04:30:08 2004
From: habulleef at kfshrc.edu.sa (habulleef@kfshrc.edu.sa)
Date: Tue Jun 29 09:07:51 2004
Subject: [Bioperl-l] Installing Bioperl
Message-ID:

Hello,
I am looking for an easy, straightforward way to install Bioperl on Windows 2000, as I have tried the steps on bioperl.org and had difficulties installing it.

Thank You
____________________________________________________

Hana Abulleef, M.Sc.
Research Programmer Analyst
King Faisal Specialist Hospital & Research Center
Biostatistics, Epidemiology and Scientific Computing (BESC)
MBC-03, P.O. Box 3354
Riyadh 11211, Kingdom of Saudi Arabia
E-Mail: habulleef@kfshrc.edu.sa

Disclaimer
* ---------------------------------------------------------------- *
This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity to
whom they are addressed. If you have received this email in error
please notify the sender. Please note that any views or opinions
presented in this email are solely those of the author and do not
necessarily represent those of King Faisal Specialist Hospital&
Research Centre (KFSH&RC).
Finally, the recipient should check
this email and any attachments for the presence of viruses.
KFSH&RC accepts no liability for any damage caused by any virus
transmitted by this email.
*------------------------------------------------------------------ *

From brian_osborne at cognia.com Tue Jun 29 09:34:53 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Jun 29 09:37:44 2004
Subject: [Bioperl-l] Installing Bioperl
In-Reply-To:
Message-ID:

Hana,

How easy an approach is may depend on your background. I'm comfortable in the Unix environment, so the approach that's easiest for me is to use Bioperl with Cygwin, which is a free Unix emulator for Windows. So step 1 would be to install Cygwin (www.cygwin.com) along with the Cygwin packages binutils, gcc, make, and perl. When Cygwin is installed, start it up and install Bioperl just as you would in Unix. These steps are described in the INSTALL document; take a look at the sections THE BIOPERL BUNDLE and INSTALLING BIOPERL THE EASY WAY USING CPAN (http://bioperl.org/Core/Latest/INSTALL). When you first start using CPAN in Perl it will need some configuration; answering all of its questions with the default answers is usually the right thing to do.

A note on Cygwin: it doesn't write to your Registry, it doesn't alter your system or your existing files in any way, and it doesn't create partitions. It simply creates a cygwin directory and writes all of its files to that directory. To uninstall Cygwin, just delete that directory.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of
habulleef@kfshrc.edu.sa
Sent: Tuesday, June 29, 2004 4:30 AM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Installing Bioperl

Hello,
I am looking for an easy straight forward way to install Bioperl for windows 2000 as have tried the steps in bioperl .org and had difficulties to install.
Thank You ____________________________________________________ Hana Abulleef, M.Sc. Research Programmer Analyst King Faisal Specialist Hospital & Research Center Biostatistics, Epidemiology and Scientific Computing (BESC) MBC-03, P.O. Box 3354 Riyadh 11211, Kingdom of Saudi Arabia E-Mail: habulleef@kfshrc.edu.sa Disclaimer * ---------------------------------------------------------------- * This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. Please note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of King Faisal Specialist Hospital& Research Centre (KFSH&RC). Finally, the recipient should check this email and any attachments for the presence of viruses. KFSH&RC accepts no liability for any damage caused by any virus transmitted by this email. *------------------------------------------------------------------ * _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From s.paul at surrey.ac.uk Tue Jun 29 17:48:33 2004 From: s.paul at surrey.ac.uk (S.Paul) Date: Tue Jun 29 09:55:35 2004 Subject: [Bioperl-l] Installing Bioperl References: Message-ID: <056201c45e22$d3284ff0$736fe383@LTCEP1SP> Hi : Are you using ActivePerl for Windows? if you are using it then the installation of bioperl package is pretty smooth. Sujoy Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk ----- Original Message ----- From: To: Sent: Tuesday, June 29, 2004 1:30 AM Subject: [Bioperl-l] Installing Bioperl > > > > > Hello, > I am looking for an easy straight forward way to install Bioperl for > windows 2000 > as have tried the steps in bioperl .org and had difficulties to install. 
> > Thank You > ____________________________________________________ > > Hana Abulleef, M.Sc. > Research Programmer Analyst > King Faisal Specialist Hospital & Research Center > Biostatistics, Epidemiology and Scientific Computing (BESC) > MBC-03, P.O. Box 3354 > Riyadh 11211, Kingdom of Saudi Arabia > E-Mail: habulleef@kfshrc.edu.sa > > > > > > Disclaimer > * ---------------------------------------------------------------- * > This email and any files transmitted with it are confidential > and intended solely for the use of the individual or entity to > whom they are addressed. If you have received this email in error > please notify the sender. Please note that any views or opinions > presented in this email are solely those of the author and do not > necessarily represent those of King Faisal Specialist Hospital& > Research Centre (KFSH&RC). Finally, the recipient should check > this email and any attachments for the presence of viruses. > KFSH&RC accepts no liability for any damage caused by any virus > transmitted by this email. 
> *------------------------------------------------------------------ * > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From davila at ioc.fiocruz.br Tue Jun 29 10:22:05 2004 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue Jun 29 10:23:59 2004 Subject: [Bioperl-l] Help with dependencies of Bioperl Message-ID: <1088518925.3193.14.camel@tryps> Dear all, I can run the script listed at the bottom without any problems in my laptop using FC1 and kernel 2.4 with the following perl RPM's installed: perl-URI-1.21-7 perl-XML-Parser-2.34-0_1.rhfc1.at perl-HTML-Tagset-3.03-28 perl-XML-Encoding-1.01-23 perl-DateManip-5.42a-0_10.rhfc1.at perl-libxml-enno-1.02-29 perl-SGMLSpm-1.03ii-12 perl-DBD-Pg-1.22-1 mod_perl-1.99_12-2 perl-Glib-1.040-1.rhfc1.dag perl-5.8.3-16 perl-HTTP-GHTTP-1.07-2.rhfc1.at perl-Filter-1.29-8 perl-Parse-Yapp-1.05-30 perl-libxml-perl-0.07-28 perl-Gtk-HandyCList-0.02-0.rhfc1.dag perl-HTML-Parser-3.34-0_1.rhfc1.at perl-libwww-perl-5.79-0_3.rhfc1.at perl-DBI-1.37-1 perl-XML-Dumper-0.4-25 perl-GD-2.12-8.rhfc1.at perl-DBD-MySQL-2.9002-1 perl-HTTP-Lite-2.1.6-8 however, it does not work when I try to run the same script in another machine (FC2 and kernel 2.6), with the same version of bioperl installed like in mine, and the following perl RPM's: perl-libxml-perl-0.07-29 perl-HTML-Parser-3.36-8 perl-5.8.3-18 mod_perl-1.99_12-2.1 perl-HTML-Tagset-3.03-29 perl-Parse-Yapp-1.05-31 perl-libxml-enno-1.02-30 perl-XML-Parser-2.34-8 perl-DBI-1.40-4 perl-DBD-MySQL-2.9003-4 perl-SGMLSpm-1.03ii-13 perl-IO-String-1.05-8 perl-GD-2.12-8.rhfc2.at perl-Filter-1.30-8 perl-URI-1.31-8 perl-XML-Encoding-1.01-25 perl-DateManip-5.42a-8 perl-XML-Dumper-0.71-8 perl-libwww-perl-5.79-1 Curiously I dont get any error msg to help to debug it... would you offer any tip that could help to solve this, please ? 
Thanks in advance, Alberto ****** #!/usr/local/bin/perl -w use lib "/usr/local/bioperl14"; use strict; use Bio::DB::Query::GenBank; use Bio::SeqIO; use Bio::DB::GenBank; my $organismname = $ARGV[0]; my $contaminantfile = "contaminant."."$organismname"; my $query_string = $organismname."[Organism] AND (ribosomal gene[title] OR mitochondrial gene[title] OR rDNA [title] OR rRNA gene[title]"; my $query = new Bio::DB::Query::GenBank(-db=>'nucleotide', -query=>$query_string, -mindate => '1985', -maxdate => '2004'); my $count = $query->count; my $seqio=new Bio::DB::GenBank->get_Stream_by_query($query); #open a seqio handle for writing the outputfile in fasta my $outfile = new Bio::SeqIO(-format=>'fasta', -file=>">$contaminantfile"); while (my $s = $seqio->next_seq) { print $query_string, "\n"; print $count, "\n"; #write the fasta $outfile->write_seq($s); } exit; From Annie.Law at nrc-cnrc.gc.ca Tue Jun 29 11:14:05 2004 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Tue Jun 29 11:16:53 2004 Subject: [Bioperl-l] RE: Bioperl DB successful queries Message-ID: <10C94843061E094A98C02EB77CFC328722FE59@nrcmrdex1d.imsb.nrc.ca> Hi Hilmar and other bioperlers, Thanks for your response. I would appreciate help with the following. 1. If my starting point is Unigene Cluster id. These are the select statements that I am using to Get to locuslink id. I think it could be cleaned up a bit but it seems to be doing the job. It seems that it was indicated that these steps were incorrect. I just wanted to be more explicit since it seems to be correct. Here are the SQL statements I use in the following order. $sthdbx = $dbh->prepare("Select dbxref_id from dbxref where accession = '$unigeneans_curr' and dbname = 'Unigene'"); $dbxref_loc_ans[0] is answer from first select statement. 
$sthbio = $dbh->prepare("Select bioentry_id from bioentry_dbxref where dbxref_id = $dbxref_loc_ans[0]");

$sthloc = $dbh->prepare("Select accession from bioentry where bioentry_id = $bioentry_loc_ans[0]");

The answer of this last select would be the LocusLink ID.

2. I'm not sure how the Subject Bioentry in Bioentry Relationship is the Unigene Cluster ID. For example, I execute the following SQL statements in the following order:

SELECT * FROM 'bioentry' where accession = 'H72976'
ANSWER: bioentry_id 795253

SELECT * FROM 'bioentry_relationship' where object_bioentry_id = '795253'
ANSWER: subject_bioentry_id 795131

I plugged my clone ID into SOURCE and got a Unigene ID, Hs.39488, which does not seem to match the subject_bioentry_id 795131. Am I missing a step?

3. The script load_seqdatabase.pl is flexible and useful. If one is not concerned about speed, I would like to know what the advantages and disadvantages are of emptying the bioperl-db tables and then reloading the information from scratch, versus loading into a full database with the mergeobjs and lookup options.

Thanks very much,
Annie.
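
The three lookups in question 1 can be collapsed into a single joined query with bound parameters. The following is only a sketch, assuming the table and column names are exactly those used in the statements in this message. The helper name accessions_for_unigene is made up for illustration, and whether the returned accessions really are LocusLink IDs depends on how the data was loaded, which is the open question here.

```perl
use strict;
use warnings;

# Hypothetical helper. $dbh is assumed to be a connected DBI handle to a
# database holding the dbxref, bioentry_dbxref and bioentry tables that
# the message above queries one at a time.
sub accessions_for_unigene {
    my ($dbh, $cluster_id) = @_;
    my $sql = <<'SQL';
SELECT b.accession
  FROM dbxref d
  JOIN bioentry_dbxref bd ON bd.dbxref_id = d.dbxref_id
  JOIN bioentry b         ON b.bioentry_id = bd.bioentry_id
 WHERE d.accession = ?
   AND d.dbname    = ?
SQL
    # Placeholders (?) leave the quoting of values to the driver;
    # selectcol_arrayref returns the first column of every matching row.
    my $rows = $dbh->selectcol_arrayref($sql, undef, $cluster_id, 'Unigene')
        or die "Query failed: " . $dbh->errstr . "\n";
    return @$rows;
}

# Example call:
#   my @accessions = accessions_for_unigene($dbh, 'Hs.39488');
```

Unlike the step-by-step version, which follows only the first row of each result, the join returns every bioentry linked to the cluster, so a cluster that maps to several entries becomes visible.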
From davila at ioc.fiocruz.br Tue Jun 29 12:56:09 2004 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue Jun 29 12:58:05 2004 Subject: [Bioperl-l] Help with dependencies of Bioperl Message-ID: <1088528169.3193.32.camel@tryps> Dear all, I can run the script listed at the bottom without any problems in my laptop using FC1 and kernel 2.4 with the following perl RPM's installed: perl-URI-1.21-7 perl-XML-Parser-2.34-0_1.rhfc1.at perl-HTML-Tagset-3.03-28 perl-XML-Encoding-1.01-23 perl-DateManip-5.42a-0_10.rhfc1.at perl-libxml-enno-1.02-29 perl-SGMLSpm-1.03ii-12 perl-DBD-Pg-1.22-1 mod_perl-1.99_12-2 perl-Glib-1.040-1.rhfc1.dag perl-5.8.3-16 perl-HTTP-GHTTP-1.07-2.rhfc1.at perl-Filter-1.29-8 perl-Parse-Yapp-1.05-30 perl-libxml-perl-0.07-28 perl-Gtk-HandyCList-0.02-0.rhfc1.dag perl-HTML-Parser-3.34-0_1.rhfc1.at perl-libwww-perl-5.79-0_3.rhfc1.at perl-DBI-1.37-1 perl-XML-Dumper-0.4-25 perl-GD-2.12-8.rhfc1.at perl-DBD-MySQL-2.9002-1 perl-HTTP-Lite-2.1.6-8 however, it does not work when I try to run the same script in another machine (FC2 and kernel 2.6), with the same version of bioperl installed like in mine, and the following perl RPM's: perl-libxml-perl-0.07-29 perl-HTML-Parser-3.36-8 perl-5.8.3-18 mod_perl-1.99_12-2.1 perl-HTML-Tagset-3.03-29 perl-Parse-Yapp-1.05-31 perl-libxml-enno-1.02-30 perl-XML-Parser-2.34-8 perl-DBI-1.40-4 perl-DBD-MySQL-2.9003-4 perl-SGMLSpm-1.03ii-13 perl-IO-String-1.05-8 perl-GD-2.12-8.rhfc2.at perl-Filter-1.30-8 perl-URI-1.31-8 perl-XML-Encoding-1.01-25 perl-DateManip-5.42a-8 perl-XML-Dumper-0.71-8 perl-libwww-perl-5.79-1 Curiously I dont get any error msg to help to debug it... would you offer any tip that could help to solve this, please ? 
Thanks in advance, Alberto ****** #!/usr/local/bin/perl -w use lib "/usr/local/bioperl14"; use strict; use Bio::DB::Query::GenBank; use Bio::SeqIO; use Bio::DB::GenBank; my $organismname = $ARGV[0]; my $contaminantfile = "contaminant."."$organismname"; my $query_string = $organismname."[Organism] AND (ribosomal gene[title] OR mitochondrial gene[title] OR rDNA [title] OR rRNA gene[title]"; my $query = new Bio::DB::Query::GenBank(-db=>'nucleotide', -query=>$query_string, -mindate => '1985', -maxdate => '2004'); my $count = $query->count; my $seqio=new Bio::DB::GenBank->get_Stream_by_query($query); #open a seqio handle for writing the outputfile in fasta my $outfile = new Bio::SeqIO(-format=>'fasta', -file=>">$contaminantfile"); while (my $s = $seqio->next_seq) { print $query_string, "\n"; print $count, "\n"; #write the fasta $outfile->write_seq($s); } exit; From s9904982 at sms.ed.ac.uk Tue Jun 29 10:34:34 2004 From: s9904982 at sms.ed.ac.uk (martin) Date: Tue Jun 29 13:11:32 2004 Subject: [Bioperl-l] parser for dnadist from the phylip package Message-ID: <1088519674.3760.4.camel@mylonchulus.cap.ed.ac.uk> Hi, I'm looking to run dnadist on a series of multiple sequence alignments (held as Bio::Align objects). For protein alignments, the process seems to be catered for (from what I can tell from the documentation) by Bio::Tools::Phylo::Phylip::ProtDist and Bio::Tools::Run::Phylo::Phylip::ProtDist. Does anyone know if there's an analogous object for dnadist, or whether the ProtDist modules could be coerced into working? Any pointers appreciated. 
--
Martin Jones
Blaxter Nematode Genomics Group
School of Biological Sciences
Ashworth Laboratories
University of Edinburgh
Edinburgh EH9 3JT
UK
tel: +44 131 650 7403
web: www.nematodes.org

From jason at cgt.duhs.duke.edu Tue Jun 29 15:12:20 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Jun 29 15:14:53 2004
Subject: [Bioperl-l] parser for dnadist from the phylip package
In-Reply-To: <1088519674.3760.4.camel@mylonchulus.cap.ed.ac.uk>
References: <1088519674.3760.4.camel@mylonchulus.cap.ed.ac.uk>
Message-ID:

I guess no one has written this wrapper yet. Still looking for volunteers to help. It has basically been Shawn Hoon and myself writing the PHYLIP wrappers, and we've only done the ones we needed.

In the latest (CVS HEAD) bioperl you can also just do

  my $stats = Bio::Align::DNAStatistics->new;
  my $matrix = $stats->distance(-align => $aln,
                                -method => "Jukes-Cantor");

These are the ones currently implemented:

  JukesCantor [jc|jukes|jukescantor|jukes-cantor]
  Uncorrected [jcuncor|uncorrected]
  Kimura      [k2|k2p|k80|kimura]
  Tamura      [t92|tamura|tamura92]
  TajimaNei   [tajimanei|tajima\-nei]

-jason

On Tue, 29 Jun 2004, martin wrote:

> Hi,
>
> I'm looking to run dnadist on a series of multiple sequence alignments
> (held as Bio::Align objects). For protein alignments, the process seems
> to be catered for (from what I can tell from the documentation) by
>
> Bio::Tools::Phylo::Phylip::ProtDist
>
> and
>
> Bio::Tools::Run::Phylo::Phylip::ProtDist.
>
> Does anyone know if there's an analogous object for dnadist, or whether
> the ProtDist modules could be coerced into working?
>
> Any pointers appreciated.
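
For readers without a CVS HEAD checkout, the Jukes-Cantor distance named in this reply can be worked out by hand for a pair of aligned sequences. The sketch below uses the textbook correction d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. It is plain Perl, not the BioPerl implementation; the function names are made up for illustration, and skipping any column that is not A, C, G or T in both sequences is an assumption of this example.

```perl
use strict;
use warnings;

# Proportion of differing sites (p) between two aligned sequences.
# Columns where either sequence has a gap or an ambiguity code are skipped.
sub p_distance {
    my ($s1, $s2) = (uc($_[0]), uc($_[1]));
    die "Aligned sequences must have equal length\n"
        unless length($s1) == length($s2);
    my ($sites, $diffs) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = substr($s1, $i, 1);
        my $y = substr($s2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        $diffs++ if $x ne $y;
    }
    die "No comparable sites\n" unless $sites;
    return $diffs / $sites;
}

# Textbook Jukes-Cantor correction: d = -3/4 * ln(1 - 4p/3).
# The logarithm is undefined once p reaches 0.75 (saturation).
sub jukes_cantor {
    my ($p) = @_;
    return 0 if $p == 0;
    die "p must be below 0.75 for the Jukes-Cantor correction\n"
        if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

my $p = p_distance('ACGTACGTAC', 'ACGTACGTAA');
printf "p = %.3f, JC distance = %.4f\n", $p, jukes_cantor($p);
```

The corrected distance is always slightly larger than p, because it accounts for repeated substitutions at the same site.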
> -- Jason Stajich Duke University jason at cgt.mc.duke.edu From echuong at gmail.com Tue Jun 29 18:13:52 2004 From: echuong at gmail.com (Edward Chuong) Date: Tue Jun 29 18:16:19 2004 Subject: [Bioperl-l] Help on a basic EST-genomic alignment script In-Reply-To: References: <244d2e0e040625152477e9d07a@mail.gmail.com> <244d2e0e04062701275dae0799@mail.gmail.com> <244d2e0e0406281221132be10e@mail.gmail.com> <244d2e0e0406281529586c7693@mail.gmail.com> Message-ID: <244d2e0e04062915137241bdc4@mail.gmail.com> Hi, I tried using FASTX, which looks like it gives similar alignment results as estwise. Can bioperl take in the fastx output? I always thought "fasta" output was just the simple >header followed by sequence, but the output by fastx has far more information. However, even fastx output is missing the original nucleotide sequence.. which wouldn't be a problem if it would included the nucleotide locations, but it doesn't as far as I can see. Ultimatley I need something that will align an EST (from a file) to a mus CDS (retrieved from genbank) and get an aln object I can use to find dn/ds (which means the alignment must be in the correct protein coding frame) in bioperl.. So estwise and fastx/y align the both translated sequences beautifully, but it seems like I can't parse them in bioperl, and they don't return as nucleotide anyway. The closest thing I can find is est2genome from EMBOSS. It aligns the EST to the mus cDNA nucleotide sequences great--but that alignment isn't in any particular coding frame, which would cause problems if I stuck it in a dn/ds module. Also, est2genome doesn't appear to output in standard EMBOSS format. Anyone with experience doing something similar with dn/ds care to share how you got a proper alignment object? Thanks! -Ed On Mon, 28 Jun 2004 19:16:11 -0400 (EDT), Jason Stajich wrote: > > Hmm - I guess estwise doesn't provide a machine parseable output as I > would have thought. What does one do ewan? 
No one has written a > wise prettyblock alignment parser yet sadly. > > -jason > > On Mon, 28 Jun 2004, Edward Chuong wrote: > > > > Just use estwise as a standalone program not from within perl. > > > % estwise protein est > > > > > > estwise is pretty slow so I wouldn't embark on this route unless you know > > > what you are doing. Try a BLAST or FASTA route first to get likely > > > homologs. > > > > > > > Hey, > > > > I'm using blast already to find the likely homologs, which is working > > fine, and I get the homolog CDS/protein sequence by querying genbank > > with the accessionID from blast. > > > > ESTwise seems fast enough for my very small ESTs. I'm not sure what > > fastx is used for. I need a library..? > > > > How can I automate running estwise? Should i just use some sort of shell script? > > > > Is there any way to get the alignment I get from estwise into one of > > the dn/ds modules in bioperl? > > > > Thanks so much for helping! > > > > -Ed > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > -- Edward Chuong http://iacs5.ucsd.edu/~echuong From jason at cgt.duhs.duke.edu Tue Jun 29 20:40:03 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jun 29 20:42:37 2004 Subject: [Bioperl-l] Help on a basic EST-genomic alignment script In-Reply-To: <244d2e0e04062915137241bdc4@mail.gmail.com> References: <244d2e0e040625152477e9d07a@mail.gmail.com> <244d2e0e04062701275dae0799@mail.gmail.com> <244d2e0e0406281221132be10e@mail.gmail.com> <244d2e0e0406281529586c7693@mail.gmail.com> <244d2e0e04062915137241bdc4@mail.gmail.com> Message-ID: On Tue, 29 Jun 2004, Edward Chuong wrote: > Hi, > > I tried using FASTX, which looks like it gives similar alignment > results as estwise. Can bioperl take in the fastx output? I always > thought "fasta" output was just the simple >header followed by > sequence, but the output by fastx has far more information. However, > even fastx output is missing the original nucleotide sequence.. 
which
> wouldn't be a problem if it would included the nucleotide locations,
> but it doesn't as far as I can see.

FASTA has more than one meaning in bioinformatics:

FASTA/Pearson sequence format
>ID DESC
SEQUENCE
[parse with Bio::SeqIO and get sequences]

FASTA multiple sequence format
>ID DESC
SEQ-WITH-GAPS
[parse with Bio::AlignIO and get an alignment]

FASTA programs (FASTA, FASTX, FASTY, TFASTA, TFASTX, ...) produce a FASTA report
[parse with Bio::SearchIO and get Search Report objects]

See the SearchIO HOWTO on the bioperl website.

>
> Ultimatley I need something that will align an EST (from a file) to a
> mus CDS (retrieved from genbank) and get an aln object I can use to
> find dn/ds (which means the alignment must be in the correct protein
> coding frame) in bioperl..
>
> So estwise and fastx/y align the both translated sequences
> beautifully, but it seems like I can't parse them in bioperl, and they
> don't return as nucleotide anyway.
>

Ah but you HAVE the EST sequence in the query file you used to run FASTX. FASTX/Y gives you nt coordinates for your query sequence, just like BLASTX does for translated searches. So $hsp->query->start and $hsp->query->end are the start and stop of the alignment.

You can get the NT sequence by reading in your EST sequence with SeqIO (or getting it from a sequence database like Bio::Index::Fasta) and then calling

my $cdsseq = $seq->trunc($hsp->query->start, $hsp->query->end);

Make a hash of all these seqs

$cdsseqs{$hsp->query->seq_id} = $cdsseq;

You'll also need to get the CDS region from the subject (Mus CDS). I'm assuming you built your protein set from just Mus CDS and not cDNA; otherwise, if these are ensembl peps, you can get just the CDS which codes for each protein accession from EnsMart. Or you can just get the CDS clipped out from the genbank file as you would have already done.
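
The clipping step above uses 1-based, inclusive coordinates. As a small plain-Perl sketch of that convention, the following mimics what the trunc() call is doing on a bare string. The helper name trunc_1based, the EST string, and the HSP coordinates are all invented for illustration; with real data you would take the coordinates from the parsed report and would probably keep using the sequence object's own trunc().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based, inclusive subsequence [$start, $end] of $seq.
# Hypothetical stand-in for calling trunc() on a sequence object.
sub trunc_1based {
    my ($seq, $start, $end) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end > length($seq) || $start > $end;
    return substr($seq, $start - 1, $end - $start + 1);
}

my $est = 'GGATGAAACCCGGGTTTTAGCC';   # made-up EST
my ($qstart, $qend) = (3, 20);        # made-up query coordinates of one HSP

# Collect the clipped pieces in a hash keyed by sequence id.
my %cdsseqs;
$cdsseqs{'est1'} = trunc_1based($est, $qstart, $qend);
print "$cdsseqs{'est1'}\n";
```

Note that this only covers hits on the forward strand; for a reverse-strand hit the clipped piece would also have to be reverse-complemented, which the sketch leaves out.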
The start/end of the alignment in subject nt coords will be

my ($hstart,$hend) = ( ($hsp->hit->start - 1) * 3 + 1,
                       $hsp->hit->end * 3 );

(residue i of the protein covers CDS bases (i-1)*3+1 through i*3, so the end coordinate is the last base of the last aligned codon).

So do the same thing as before and grab the sub-sequence from the Mus CDS and add it to the %cdsseqs hash.

Now you still have to contend with frameshifts. They are coded as '/' and '\' in the alignment strings ($hsp->query_string, $hsp->hit_string), so you're going to have to figure out where they are and either insert or delete an appropriate base (insert an N if it is a missing base, remove the extra base if it is there). I'm not really sure how best to code this up, so at first you may just want to bin all the ESTs which have likely frameshifts and work with them later by hand. Rather than trying to write this whole algorithm at one time I would get the easier things working first. Other people may have done this too and have better suggestions than me.

For good measure you might take your cdsseqs, translate them back to protein, align them with pSW or with needle/water from EMBOSS, and check that you don't have any stop codons (in case your fixing of frameshifts didn't work, or to detect whether you are accidentally clipping out the wrong piece of sequence).

Given this alignment - $proteinaln - you can use the following to align the cds sequences using the protein aln as a template.

use Bio::Align::Utils qw(aa_to_dna_aln);
my $cdsaln = aa_to_dna_aln($proteinaln, \%cdsseqs);

Then pass the $cdsaln object to the PAML runner (Bio::Tools::Run::Phylo::PAML::Codeml or Yn00).

May seem complicated - but I don't know how else you do it.... When it's done, we should add it as a script to bioperl.

> The closest thing I can find is est2genome from EMBOSS. It aligns the
> EST to the mus cDNA nucleotide sequences great--but that alignment
> isn't in any particular coding frame, which would cause problems if I
> stuck it in a dn/ds module. Also, est2genome doesn't appear to output
> in standard EMBOSS format.
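
The "protein alignment as a template" step in the reply above can be pictured with a few lines of plain Perl. This is only a sketch of the idea, not the aa_to_dna_aln implementation: the helper name thread_cds, the ids, and the toy sequences are made up, and it assumes each CDS is ungapped, in frame, free of frameshifts, and exactly three times as long as its ungapped protein (so no trailing stop codon).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Thread an ungapped CDS onto its gapped protein alignment row:
# every residue consumes one codon, every gap '-' becomes '---'.
sub thread_cds {
    my ($aa_row, $cds) = @_;
    (my $ungapped = $aa_row) =~ tr/-//d;
    die "CDS length is not 3 x protein length\n"
        unless length($cds) == 3 * length($ungapped);
    my ($out, $pos) = ('', 0);
    for my $aa (split //, $aa_row) {
        if ($aa eq '-') {
            $out .= '---';
        }
        else {
            $out .= substr($cds, $pos, 3);
            $pos += 3;
        }
    }
    return $out;
}

# Made-up protein alignment rows and their matching CDS strings.
my %protein_aln = (est1 => 'MK-PG',        mus1 => 'MKAPG');
my %cds         = (est1 => 'ATGAAACCCGGG', mus1 => 'ATGAAAGCTCCCGGG');

for my $id (sort keys %protein_aln) {
    printf "%-5s %s\n", $id, thread_cds($protein_aln{$id}, $cds{$id});
}
```

The length check is the useful part in practice: a CDS that fails it is a good candidate for the "likely frameshift" bin mentioned above, rather than something to feed into a dn/ds calculation.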
> Try sim4 if you are doing est to genome alignments - but not sure how well this will work cross-species. The FASTX/Y approach is probably going to give you a better > Anyone with experience doing something similar with dn/ds care to > share how you got a proper alignment object? > > Thanks! > > -Ed > > On Mon, 28 Jun 2004 19:16:11 -0400 (EDT), Jason Stajich > wrote: > > > > Hmm - I guess estwise doesn't provide a machine parseable output as I > > would have thought. What does one do ewan? No one has written a > > wise prettyblock alignment parser yet sadly. > > > > -jason > > > > On Mon, 28 Jun 2004, Edward Chuong wrote: > > > > > > Just use estwise as a standalone program not from within perl. > > > > % estwise protein est > > > > > > > > estwise is pretty slow so I wouldn't embark on this route unless you know > > > > what you are doing. Try a BLAST or FASTA route first to get likely > > > > homologs. > > > > > > > > > > Hey, > > > > > > I'm using blast already to find the likely homologs, which is working > > > fine, and I get the homolog CDS/protein sequence by querying genbank > > > with the accessionID from blast. > > > > > > ESTwise seems fast enough for my very small ESTs. I'm not sure what > > > fastx is used for. I need a library..? > > > > > > How can I automate running estwise? Should i just use some sort of shell script? > > > > > > Is there any way to get the alignment I get from estwise into one of > > > the dn/ds modules in bioperl? > > > > > > Thanks so much for helping! 
> > > > > > -Ed > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From michael.watson at bbsrc.ac.uk Wed Jun 30 04:34:23 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed Jun 30 04:37:08 2004 Subject: [Bioperl-l] SearchIO error Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C2757@iahce2knas1.iah.bbsrc.reserved> Ooops, I was using it incorrectly - RTFM etc :-) Having said that though, for consistency, if one has to pass a writer object to SearchIO, shouldn't one also have to pass a writer to SeqIO? Presumably when one specifies (-format=>EMBL, -file=>">out.embl") in SeqIO, then internally bioperl goes to get an EMBL writer object anyway? So maybe the internals of SeqIO and SearchIO are pretty similar, yet in one you pass a -format argument, and in another you pass a writer object. Though I could, of course, be completely wrong. Thanks for the help! Mick -----Original Message----- From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] Sent: 23 June 2004 14:17 To: michael watson (IAH-C) Cc: Bioperl-l Subject: Re: [Bioperl-l] SearchIO error More correctly - what do you want to see in your out.blast? a table? the blast report? If you read the docs for how to use a hit writer you'll see it isn't as parallel to SeqIO type ways - you need to give SearchIO a writer object when you initialize it for writing. But if you are just re-writing out a blast file why parse it with SearchIO in the first place? -jason On Wed, 23 Jun 2004, michael watson (IAH-C) wrote: > Hi > > I am using bioperl-1.4 now. I seem to get an error when using > SearchIO. Perhaps I am not using it correctly? 
My script is: > > my $fh = new IO::File; > $fh->open("/usr/bin/blastall -p blastp -i test.fasta -d 10287/set2 > |"); > > my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh); my > $result = $searchio->next_result; > > my $search_out = Bio::SearchIO->new(-format => 'blast', -file => > ">test_out.blast"); $search_out->write_result($result); > > $fh->close; > > The error I get is: > > -------------------- WARNING --------------------- > MSG: Writer not defined. Using a Bio::Search::Writer::HitTableWriter > --------------------------------------------------- > Can't locate object method "new" via package > "Bio::Search::Writer::HitTableWriter" (perhaps you forgot to load > "Bio::Search::Writer::HitTableWriter"?) at > /usr/local/bioperl-1.4/Bio/SearchIO/blast.pm line 1493, line > 397. > > Can anyone be of help? > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Wed Jun 30 08:33:37 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jun 30 08:36:04 2004 Subject: [Bioperl-l] SearchIO error In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E957C2757@iahce2knas1.iah.bbsrc.reserved> References: <8975119BCD0AC5419D61A9CF1A923E957C2757@iahce2knas1.iah.bbsrc.reserved> Message-ID: On Wed, 30 Jun 2004, michael watson (IAH-C) wrote: > Ooops, I was using it incorrectly - RTFM etc :-) > > Having said that though, for consistency, if one has to pass a writer > object to SearchIO, shouldn't one also have to pass a writer to SeqIO? > Presumably when one specifies (-format=>EMBL, -file=>">out.embl") in > SeqIO, then internally bioperl goes to get an EMBL writer object anyway? > So maybe the internals of SeqIO and SearchIO are pretty similar, yet in > one you pass a -format argument, and in another you pass a writer > object. Not really. 
We didn't decide to do it that way for SearchIO. The internals for autoloading a format module are similar but not identical. At first I never intended the writers to be plugged into SearchIO - you just call

my $string = $writer->to_string($result);

and SearchIO plugging was added later, so it seems a little hacky. I don't really want to mess with it now.

>
> Though I could, of course, be completely wrong.
>
> Thanks for the help!
>
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu]
> Sent: 23 June 2004 14:17
> To: michael watson (IAH-C)
> Cc: Bioperl-l
> Subject: Re: [Bioperl-l] SearchIO error
>
>
> More correctly - what do you want to see in your out.blast? a table?
> the blast report?
>
> If you read the docs for how to use a hit writer you'll see it isn't as
> parallel to SeqIO type ways - you need to give SearchIO a writer object
> when you initialize it for writing. But if you are just re-writing out
> a blast file why parse it with SearchIO in the first place?
>
> -jason
>
> On Wed, 23 Jun 2004, michael watson (IAH-C) wrote:
>
> > Hi
> >
> > I am using bioperl-1.4 now. I seem to get an error when using
> > SearchIO. Perhaps I am not using it correctly? My script is:
> >
> > my $fh = new IO::File;
> > $fh->open("/usr/bin/blastall -p blastp -i test.fasta -d 10287/set2
> > |");
> >
> > my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh); my
> > $result = $searchio->next_result;
> >
> > my $search_out = Bio::SearchIO->new(-format => 'blast', -file =>
> > ">test_out.blast"); $search_out->write_result($result);
> >
> > $fh->close;
> >
> > The error I get is:
> >
> > -------------------- WARNING ---------------------
> > MSG: Writer not defined. Using a Bio::Search::Writer::HitTableWriter
> > ---------------------------------------------------
> > Can't locate object method "new" via package
> > "Bio::Search::Writer::HitTableWriter" (perhaps you forgot to load
> > "Bio::Search::Writer::HitTableWriter"?)
at
> > /usr/local/bioperl-1.4/Bio/SearchIO/blast.pm line 1493, line
> > 397.
> >
> > Can anyone be of help?
> >
> > Thanks
> > Mick
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From dysdera at ebi.ac.uk Tue Jun 29 15:14:34 2004
From: dysdera at ebi.ac.uk (Pablo Marin Garcia)
Date: Wed Jun 30 08:51:43 2004
Subject: [Bioperl-l] RE installing bioperl in windows
Message-ID:

Hello Hana,

Yes, the best option is to install cygwin, which includes perl 5.8, and you then have more modules (from CPAN) than the ones provided by ppm from ActiveState.

Just in case it is useful for someone, these are the issues I have found with the integration of cygwin perl and windows:

-- mysql:
Until recently I had been using the windows perl because I was unable to install DBD::mysql in the cygwin perl. But now, following these instructions, all works ok:
http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin

-- CGI
Another thing to do is to map the cygwin perl.exe in IIS (if you want to run CGIs) [I first tried with apache but failed -> problems connecting to mysql through cgi, probably permission problems]

IIS -> map the extensions of the perl scripts to perl -> \...\...\perl.exe %s%s (the %s are essential) !!!!

I have both windows perl and cygwin perl -> I have mapped the .plx to windows perl and the .pls to cygwin perl in IIS

-- EMACS
The windows emacs can be used with cygwin perl as well. The best thing is to open windows emacs from a cygwin console. I have this alias in my bashrc:

alias wem='/cygdrive/c/emacs-21.3/bin/runemacs.exe -bg black -fg green -bd ForestGreen'

Then you have the bash environment variables in emacs (including PERL5LIB with the cygwin bioperl path).
Essential emacs module when debugging => cygwin-mount.el -> understand cygwin paths Problems that I have not solved with w-emacs+cygwin-perl -> when using perldb, the 'n' and 's' are not repeated by default when intro -> the STDOUT is buffered and only released if you print something from the perldb. Cygwin comes with emacs and you can run it from the shell with 'emacs -nw' and then perldb works fine with cygwin perl. Hope this helps > -----Original Message----- > From: Brian Osborne wrote > Hana, > > How easy an approach is may depend on your background. I'm comfortable in > the Unix environment so the approach that's easiest for me is to use Bioperl > with Cygwin, which is a free Unix emulator for Windows. So step 1 would be > to install Cygwin (www.cygwin.com) along with the Cygwin packages binutil, > gcc, make, and perl. When Cygwin is installed you want to start it up and > install Bioperl just as you would in Unix. These steps are described in the > INSTALL document, take a look at the sections THE BIOPERL BUNDLE and > INSTALLING BIOPERL THE EASY WAY USING CPAN > (http://bioperl.org/Core/Latest/INSTALL). When you first start using CPAN in > Perl it will need some configuration, answering all of its questions with > the default answers is usually the right thing to do. > > A note on Cygwin: it doesn't write to your Registry, it doesn't alter your > system or your existing files in any way, it doesn't create partitions, it > simply creates a cygwin directory and writes all of its files to that > directory. To uninstall Cygwin just delete that directory. > > Brian O. 
> > >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of >> habulleef@kfshrc.edu.sa >> Sent: Tuesday, June 29, 2004 4:30 AM >> To: bioperl-l@bioperl.org >> Subject: [Bioperl-l] Installing Bioperl >> >> >> >> >> Hello, >> I am looking for an easy straight forward way to install Bioperl for >> windows 2000 >> as have tried the steps in bioperl .org and had difficulties to install. >> >> Thank You . / __ \ / __ \ \ | ///\\\ | / \ | ///\\\ | / \ \_((####))_/ / \ \_((####))_/ / \__ ((##)) __/ \__ ((##)) __/ / |||| \ / |||| \ / | oo | \ / | oo | \ | | !! | | | | !! | | \ / \\|// \ / (o o) . .-. .-. .-. .-. .-. .-oOOo~(_)~oOOo-. .-. .-. .-. .-. |X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ / \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \||| `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' 'It is true that a mathematician who is not somewhat of a poet, will never be a perfect mathematician' karl Weierstrass (1902) ------------------------------------------------- Science is for those who learn; poetry, for those who know. Joseph Roux (1834-1886) ------------------------------------------------- Pablo Marin Garcia. E-mail address: pablo@ebi.ac.uk // pablo.marin@uv.es EMBL Outstation, European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton www.ebi.ac.uk/Information/Staff/pablo_marin.html www.ebi.ac.uk/~dysdera/ Cambs. CB10 1SD, United Kingdom Phone: +44 (0)1223 494 478 FAX: +44 (0)1223 494 468 ================================================================= From brian_osborne at cognia.com Wed Jun 30 09:10:16 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Jun 30 09:12:55 2004 Subject: [Bioperl-l] RE installing bioperl in windows In-Reply-To: Message-ID: Pablo, My first guess if someone had trouble connecting to Mysql in the Cygwin/Windows world would be that something's wrong with the connection string. 
Did you try something like this?

use DBI;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
.....
my $dsn = "DBI:$driver:database=$db:host=127.0.0.1";
# on failure connect() returns undef, so report $DBI::errstr, not $dbh->errstr
my $dbh = DBI->connect($dsn,$user,$passw)
    or croak "Error connecting: " . $DBI::errstr;

$driver is "mysql", of course, local Mysql database, Apache provided by Cygwin.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Pablo Marin Garcia
Sent: Tuesday, June 29, 2004 3:15 PM
To: bioperl-l@portal.open-bio.org
Cc: habulleef@kfshrc.edu.sa
Subject: [Bioperl-l] RE installing bioperl in windows

Hello Hana,

Yes, the best option is intall cygwin that includes perl 5.8, and you have more modules (from CPAN) that the ones provides by ppm from active-state.

Just in case it would be usefull for someone, these are the issues about the integration of cygwin perl and windows, that I have found:

-- mysql:
Until recently I have been using the windows perl because I was unable to install DBD::mysql in the cygwin perl. But now following this instruction all works ok:
http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwi
n.

-- CGI
Another thing to do is to map the cygwin perl.exe to the IIS (if you want to run cgis) [I tried first use with apache but I had failed -> problems to conect to mysql through cgi, probably permision problems]

IIS -> map the extensions of the perl scripts to perl -> \...\...\perl.exe %s%s (the %s are essential) !!!!

I have both windows perl and cygwin perl -> I have mapped the plx to windows and the .pls to cygwn perl in the IIS

-- EMACS
The windows emacs can be used with cygwin perl aswell: The best thing is open windows emacs from a cygwin console: I have this alias in my bashrc

alias wem='/cygdrive/c/emacs-21.3/bin/runemacs.exe -bg black -fg green -bd ForestGreen'

Then you have the bash enviromental variables in the emacs (including PERL5LIB with the cygwin bioperl path).
Essential emacs module when debugging => cygwin-mount.el -> understand cygwin paths Problems that I have not solved with w-emacs+cygwin-perl -> when using perldb, the 'n' and 's' are not repeated by default when intro -> the STDOUT is buffered and only released if you print something from the perldb. Cygwin comes with emacs and you can run it from the shell with 'emacs -nw' and then perldb works fine with cygwin perl. Hope this helps > -----Original Message----- > From: Brian Osborne wrote > Hana, > > How easy an approach is may depend on your background. I'm comfortable in > the Unix environment so the approach that's easiest for me is to use Bioperl > with Cygwin, which is a free Unix emulator for Windows. So step 1 would be > to install Cygwin (www.cygwin.com) along with the Cygwin packages binutil, > gcc, make, and perl. When Cygwin is installed you want to start it up and > install Bioperl just as you would in Unix. These steps are described in the > INSTALL document, take a look at the sections THE BIOPERL BUNDLE and > INSTALLING BIOPERL THE EASY WAY USING CPAN > (http://bioperl.org/Core/Latest/INSTALL). When you first start using CPAN in > Perl it will need some configuration, answering all of its questions with > the default answers is usually the right thing to do. > > A note on Cygwin: it doesn't write to your Registry, it doesn't alter your > system or your existing files in any way, it doesn't create partitions, it > simply creates a cygwin directory and writes all of its files to that > directory. To uninstall Cygwin just delete that directory. > > Brian O. 
> > >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of >> habulleef@kfshrc.edu.sa >> Sent: Tuesday, June 29, 2004 4:30 AM >> To: bioperl-l@bioperl.org >> Subject: [Bioperl-l] Installing Bioperl >> >> >> >> >> Hello, >> I am looking for an easy straight forward way to install Bioperl for >> windows 2000 >> as have tried the steps in bioperl .org and had difficulties to install. >> >> Thank You . / __ \ / __ \ \ | ///\\\ | / \ | ///\\\ | / \ \_((####))_/ / \ \_((####))_/ / \__ ((##)) __/ \__ ((##)) __/ / |||| \ / |||| \ / | oo | \ / | oo | \ | | !! | | | | !! | | \ / \\|// \ / (o o) . .-. .-. .-. .-. .-. .-oOOo~(_)~oOOo-. .-. .-. .-. .-. |X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ / \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \||| `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' `-' 'It is true that a mathematician who is not somewhat of a poet, will never be a perfect mathematician' karl Weierstrass (1902) ------------------------------------------------- Science is for those who learn; poetry, for those who know. Joseph Roux (1834-1886) ------------------------------------------------- Pablo Marin Garcia. E-mail address: pablo@ebi.ac.uk // pablo.marin@uv.es EMBL Outstation, European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton www.ebi.ac.uk/Information/Staff/pablo_marin.html www.ebi.ac.uk/~dysdera/ Cambs. 
CB10 1SD, United Kingdom Phone: +44 (0)1223 494 478 FAX: +44 (0)1223 494 468 ================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From yinyb at mail.cbi.pku.edu.cn Wed Jun 30 13:34:49 2004 From: yinyb at mail.cbi.pku.edu.cn (Yanbin Yin) Date: Wed Jun 30 13:30:54 2004 Subject: [Bioperl-l] parsing GenScan result References: <1088519674.3760.4.camel@mylonchulus.cap.ed.ac.uk> Message-ID: <003a01c45ec8$8b423a60$6c3369a2@cbi69c8d66176b> Hi, I am trying to parse GenScan prediction result. I found one example script written by Brian Osborne. It is very good but I still want to parse out each exon's sequence and location. Had anyone written this kind of script or could anyone please tell me how to use Bio::Tools::Genscan to write one? Thanks in advance! Yanbin From jason at cgt.duhs.duke.edu Wed Jun 30 13:48:58 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jun 30 13:51:27 2004 Subject: [Bioperl-l] parsing GenScan result In-Reply-To: <003a01c45ec8$8b423a60$6c3369a2@cbi69c8d66176b> References: <1088519674.3760.4.camel@mylonchulus.cap.ed.ac.uk> <003a01c45ec8$8b423a60$6c3369a2@cbi69c8d66176b> Message-ID: did you try the SYNOPSIS part of the documentation from Bio::Tools::Genscan? Also see the bptutorial.pl which has a genscan example. -jason On Thu, 1 Jul 2004, Yanbin Yin wrote: > Hi, > > I am trying to parse GenScan prediction result. I found one example script written by Brian Osborne. It is very good but I still want to parse out each exon's sequence and location. Had anyone written this kind of script or could anyone please tell me how to use Bio::Tools::Genscan to write one? > > Thanks in advance! 
> > Yanbin > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From brian_osborne at cognia.com Wed Jun 30 14:36:28 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Jun 30 14:39:41 2004 Subject: [Bioperl-l] parsing GenScan result In-Reply-To: Message-ID: Yanbin, What Jason is saying is that the Synopsis is telling you that Bio::Tools::Prediction::Gene objects are returned, for example. So you'll need to go that module's documentation and see how to get the appropriate information from that object. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich Sent: Wednesday, June 30, 2004 1:49 PM To: Yanbin Yin Cc: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] parsing GenScan result did you try the SYNOPSIS part of the documentation from Bio::Tools::Genscan? Also see the bptutorial.pl which has a genscan example. -jason On Thu, 1 Jul 2004, Yanbin Yin wrote: > Hi, > > I am trying to parse GenScan prediction result. I found one example script written by Brian Osborne. It is very good but I still want to parse out each exon's sequence and location. Had anyone written this kind of script or could anyone please tell me how to use Bio::Tools::Genscan to write one? > > Thanks in advance! 
>
> Yanbin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From yinyb at mail.cbi.pku.edu.cn Wed Jun 30 15:06:27 2004
From: yinyb at mail.cbi.pku.edu.cn (Yanbin Yin)
Date: Wed Jun 30 15:02:29 2004
Subject: [Bioperl-l] parsing GenScan result
References:
Message-ID: <001101c45ed5$5845a220$6c3369a2@cbi69c8d66176b>

Brian and Jason,

Thank you for your reply. Using the following script I can get the exons' coordinates, but I still do not know how to get the exon sequence.

use Bio::Tools::Genscan;

my $genscan = Bio::Tools::Genscan->new(-file => "$filename");
while (my $gene = $genscan->next_prediction()) {
    my @exon_arr = $gene->exons();
    foreach my $exon (@exon_arr) {
        my $start = $exon->start();
        my $end   = $exon->end();
    }
}

I checked the Bio::Tools::Prediction::Exon object, but its method

$predicted_cds_dna = $exon->predicted_cds()

doesn't work. How come?

Yanbin

----- Original Message -----
From: "Brian Osborne"
To: "Jason Stajich" ; "Yanbin Yin"
Cc:
Sent: Thursday, July 01, 2004 2:36 AM
Subject: RE: [Bioperl-l] parsing GenScan result

> Yanbin,
>
> What Jason is saying is that the Synopsis is telling you that
> Bio::Tools::Prediction::Gene objects are returned, for example. So you'll
> need to go that module's documentation and see how to get the appropriate
> information from that object.
>
> Brian O.
> > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich > Sent: Wednesday, June 30, 2004 1:49 PM > To: Yanbin Yin > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] parsing GenScan result > > did you try the SYNOPSIS part of the documentation from > Bio::Tools::Genscan? > > Also see the bptutorial.pl which has a genscan example. > > -jason > On Thu, 1 Jul 2004, Yanbin Yin wrote: > > > Hi, > > > > I am trying to parse GenScan prediction result. I found one example script > written by Brian Osborne. It is very good but I still want to parse out each > exon's sequence and location. Had anyone written this kind of script or > could anyone please tell me how to use Bio::Tools::Genscan to write one? > > > > Thanks in advance! > > > > Yanbin > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > >
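
On the exon-sequence question in this thread: once the start, end and strand of each exon are known, the sequence itself can be cut from the genomic sequence that was given to GenScan. The following is a minimal plain-Perl sketch of that step only. The helper names revcom and exon_seq, the toy genomic string, and the coordinates are invented for illustration; it assumes 1-based inclusive coordinates and a strand of 1 or -1. Within BioPerl the equivalent is probably to attach the genomic sequence object to each exon feature (attach_seq) and then ask the feature for its seq, but treat that as an assumption to check against the Bio::SeqFeature::Generic documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string (A/C/G/T/N, either case).
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

# Cut one exon out of the genomic sequence given 1-based inclusive
# coordinates; reverse-complement it when it lies on the minus strand.
sub exon_seq {
    my ($genomic, $start, $end, $strand) = @_;
    die "bad exon range $start..$end\n"
        if $start < 1 || $end > length($genomic) || $start > $end;
    my $sub = substr($genomic, $start - 1, $end - $start + 1);
    return $strand < 0 ? revcom($sub) : $sub;
}

my $genomic = 'AACCGGTTAACC';   # made-up genomic sequence

# Made-up exon predictions: [start, end, strand].
for my $exon ([2, 5, 1], [2, 5, -1]) {
    my ($start, $end, $strand) = @$exon;
    printf "%d..%d (strand %2d): %s\n",
        $start, $end, $strand, exon_seq($genomic, $start, $end, $strand);
}
```

In the loop from the script quoted earlier, $exon->start, $exon->end and $exon->strand would supply the three numbers, with the genomic string read separately from the FASTA file that GenScan was run on.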