From yinyb at mail.cbi.pku.edu.cn Thu Jul 1 07:53:09 2004
From: yinyb at mail.cbi.pku.edu.cn (Yin Yanbin)
Date: Thu Jul 1 07:53:18 2004
Subject: [Bioperl-l] parsing GenScan result
Message-ID: <000e01c45f61$fb75e310$cefa69a2@cbiyinyb>

Brian,

I added the following code to your GenScan example parser. Now it can print
out the exon sequences. Thank you.

my @exon_arr = $gene->exons;
my $i = 1;
foreach my $exon (@exon_arr) {
    print EXON ">".$id."_".$1."_EXON_".$i." ".$exon->strand." "
        .$exon->start."|".$exon->end."\n"
        .$seq->subseq($exon->start, $exon->end)."\n";
    $i++;
}

Yanbin
_________________________________________________________
Yanbin(Benjamin) Yin, Ph.D. Student
Center of Bioinformatics (CBI), College of Life Sciences,
Room 607, New Life Science Building,
Peking University, 100871 Beijing, P.R.China
Tel: 86 10 6275 6730
E-mail: yinyb@mail.cbi.pku.edu.cn

----- Original Message -----
From: "Brian Osborne"
To: "Jason Stajich"; "Yanbin Yin"
Sent: Thursday, July 01, 2004 2:36 AM
Subject: RE: [Bioperl-l] parsing GenScan result

> Yanbin,
>
> What Jason is saying is that the Synopsis is telling you that
> Bio::Tools::Prediction::Gene objects are returned, for example. So you'll
> need to go to that module's documentation and see how to get the
> appropriate information from that object.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich
> Sent: Wednesday, June 30, 2004 1:49 PM
> To: Yanbin Yin
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] parsing GenScan result
>
> did you try the SYNOPSIS part of the documentation from
> Bio::Tools::Genscan?
>
> Also see the bptutorial.pl which has a genscan example.
>
> -jason
> On Thu, 1 Jul 2004, Yanbin Yin wrote:
>
> > Hi,
> >
> > I am trying to parse GenScan prediction results. I found one example
> > script written by Brian Osborne. It is very good but I still want to parse
> > out each exon's sequence and location. Has anyone written this kind of
> > script, or could anyone please tell me how to use Bio::Tools::Genscan to
> > write one?
> >
> > Thanks in advance!
> >
> > Yanbin
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

From brian_osborne at cognia.com Thu Jul 1 08:27:12 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jul 1 08:29:40 2004
Subject: [Bioperl-l] parsing GenScan result
In-Reply-To: <000e01c45f61$fb75e310$cefa69a2@cbiyinyb>

Yanbin,

I'll add this code to the example script. Thank you.

Brian O.

From Matthew.Betts at ii.uib.no Thu Jul 1 10:50:46 2004
From: Matthew.Betts at ii.uib.no (Matthew Betts)
Date: Thu Jul 1 10:53:08 2004
Subject: [Bioperl-l] PDB sequence from ATOM records

Hi,

I need to be able to map some protein sequence alignment information onto a
protein structure. To do this I need to get the sequence from the ATOM
records, since the SEQRES sequence is often not exactly the same. So I'd like
to change Bio::Structure::Residue a little:

Amino-acid type and residue number are currently contained in one value,
residue->id. I would like to separate them into two, residue->type and
residue->num. Then, for backwards compatibility, construct residue->id from
these each time it is required (or store it as well if that is better?).

residue->type should be able to return the one-letter code as well as the
three-letter code.

And then have a method called something like Bio::Structure::Entry->atom_seq
that would return a Bio::PrimarySeq object constructed from the one-letter
codes of the residues of a particular chain.

Any comments please...

Thanks.

Matthew

P.S. Sorry if this message is a repeat - our email server went down as I was
sending it the first time.
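A minimal sketch of the ATOM-derived sequence described above, written in plain Perl against raw PDB text rather than against Bio::Structure (whose internals are not shown in this thread). The names atom_seqs and %THREE_TO_ONE are hypothetical. It assumes the fixed PDB column layout (residue name in columns 18-20, chain in 22, residue number plus insertion code in 23-27), reads only ATOM records (so HETATM residues such as selenomethionine are skipped), and maps non-standard residue names to X.

```perl
use strict;
use warnings;

# Hypothetical table (not Bio::Structure's): three-letter to one-letter
# codes for the 20 standard amino acids; anything else becomes 'X'.
my %THREE_TO_ONE = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Build one sequence string per chain from ATOM records only, using the
# fixed PDB columns (0-based substr offsets): resName 17-19, chainID 21,
# resSeq + iCode 22-26. A residue is emitted when its resSeq/iCode differs
# from the previous ATOM record of the same chain.
sub atom_seqs {
    my @lines = @_;
    my (%seq, %last);
    for my $line (@lines) {
        next unless $line =~ /^ATOM  /;
        my $type  = substr($line, 17, 3);
        my $chain = substr($line, 21, 1);
        my $key   = substr($line, 22, 5);    # residue number + insertion code
        next if defined $last{$chain} && $last{$chain} eq $key;
        $last{$chain} = $key;
        $seq{$chain} .= exists $THREE_TO_ONE{$type} ? $THREE_TO_ONE{$type} : 'X';
    }
    return \%seq;
}

# Usage: synthetic records, two atoms each for Met-1 and Lys-2 in chain A.
my @pdb = map {
    sprintf("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f", @$_, 0, 0, 0)
} ([1, 'N', 'MET', 'A', 1], [2, 'CA', 'MET', 'A', 1],
   [3, 'N', 'LYS', 'A', 2], [4, 'CA', 'LYS', 'A', 2]);

my $seqs = atom_seqs(@pdb);
print "$_: $seqs->{$_}\n" for sort keys %$seqs;    # prints "A: MK"
```

Keying on the residue number plus insertion code, rather than on the residue name, keeps two consecutive residues of the same type distinct.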
--
Matthew Betts, mailto:matthew.betts@ii.uib.no
Phone: (+47) 55 58 40 22, Fax: (+47) 55 58 42 95
CBU, BCCS, HiB, UNIFOB / Universitetet i Bergen
Thormøhlensgt. 55, N-5008 Bergen, Norway

From Matthew.Betts at ii.uib.no Thu Jul 1 10:07:50 2004
From: Matthew.Betts at ii.uib.no (Matthew Betts)
Date: Thu Jul 1 11:14:23 2004
Subject: [Bioperl-l] PDB sequence from ATOM records

Hi,

I need to be able to map some protein sequence alignment information onto a
protein structure. To do this I need to get the sequence from the ATOM
records, since the SEQRES sequence is often not exactly the same. So I'd like
to change Bio::Structure::Residue a little:

Amino-acid type and residue number are currently contained in one value,
residue->id. I would like to separate them into two, residue->type and
residue->num. Then, for backwards compatibility, construct residue->id from
these each time it is required (or store it as well if that is better?).

residue->type should be able to return the one-letter code as well as the
three-letter code.

And then have a method called something like Bio::Structure::Entry->atom_seq
that would return a Bio::PrimarySeq object constructed from the one-letter
codes of the residues of a particular chain.

Any comments please...

Thanks.

Matthew

--
Matthew Betts, mailto:matthew.betts@ii.uib.no
Phone: (+47) 55 58 40 22, Fax: (+47) 55 58 42 95
CBU, BCCS, HiB, UNIFOB / Universitetet i Bergen
Thormøhlensgt. 55, N-5008 Bergen, Norway

From wuhuizhu at mail.eecis.udel.edu Thu Jul 1 12:44:07 2004
From: wuhuizhu at mail.eecis.udel.edu (Huizhuan Wu)
Date: Thu Jul 1 15:47:16 2004
Subject: [Bioperl-l] blast/cgi question

Hi,

I am new to Bioperl. Now I need to work on a website that accepts a sequence
from users, and then a BioPerl script BLASTs it against a local db and prints
out the result. The problem is that my BioPerl CGI script works fine locally
(I set the params through the command line) but doesn't work through the web
(error msg: blastall call crashed...). Do you happen to know how to solve
this?

regards,
Sophia

From brian_osborne at cognia.com Thu Jul 1 15:57:57 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jul 1 16:01:07 2004
Subject: [Bioperl-l] blast/cgi question

Sophia,

What's the blastall error, exactly? Remember that your CGI script may be
executing as 'nobody' or 'apache', and these users may have none of the
environment variables that blastall needs; you may need to set them explicitly
in your CGI script.

Brian O.

From jason at cgt.duhs.duke.edu Thu Jul 1 15:59:27 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Jul 1 16:01:58 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] blast/cgi question

Because your website is not running under your uid, it probably isn't seeing
the .ncbirc file.

You might try making sure the BLASTMAT env variable is set to wherever your
NCBI 'data' directory is (/usr/local/BLAST/data, probably).

You can set it in your script by adding

BEGIN {
    $ENV{'BLASTMAT'} = '/usr/local/BLAST/data';
}

to your script, above all the module 'use' statements.

On Thu, 1 Jul 2004, Huizhuan Wu wrote:

> Hi,
> I am new to Bioperl. Now I need to work on a website
> that accepts a sequence from users and then a BioPerl script blast it
> against a local db and prints out the result. The problem is my BioPerl
> cgi script works fine locally (I set the params through commandline) but
> doesn't work through web(error msg: blastall call crashed...). Do you
> happen to know how to solve this?
>
>
> The error msg I got:
> MSG: blastall call crashed: 256 /usr/local/BLAST/blastall -p blastn -d
> /Users/work/data.txt -i /tmp/ITPxjyt0zx -e 1e-10 -o blast.out -F F -g F -W 17
>
> STACK Bio::Tools::Run::StandAloneBlast::_runblast /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:732
> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:680
> STACK Bio::Tools::Run::StandAloneBlast::blastall /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:536
> STACK toplevel /Users/wuhuizhuan/Sites/test/CGI-BIN/bl.cgi:50
>
>
> Here is the code:
> #!/usr/bin/perl -w
>
> # This script gets sequence from webpage and perform blastn
> #against data.txt database.
>
> use strict;
> use warnings;
> use Getopt::Long;
> use Bio::Tools::Run::StandAloneBlast;
> use Bio::SeqIO;
> use Data::Dumper;
>
> # HTML staff
> print header;
> print start_html("BLAST RESULT");
> print h2("BLAST RESULT");
>
> # Variables
> my ($query, $seq);
> my $db = "/Users/work/data.txt";
> my $maxEval = 1.0e-10;
>
> my $seq = param('SEQUENCE');
> my $seqobj = Bio::Seq->new( '-display_id' => $query,
>                             '-seq' => $seq);
>
> print Dumper($seqobj);
> my @params = ('program'=>'blastn', 'outfile'=>'blast.out',
>               '_READMETHOD'=>'Blast', 'F'=>'F','W'=>17,'g'=>'F');
>
> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> $factory->e($maxEval);
> $factory->d($db);
> my $blast_report = $factory->blastall($seqobj);
> #my $result = $blast_report->next_result;
> print "$blast_report\n";
> print end_html;
>
>
>
> regards,
> Sophia
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From neil.saunders at unsw.edu.au Fri Jul 2 01:45:41 2004
From: neil.saunders at unsw.edu.au (Neil Saunders)
Date: Fri Jul 2 01:48:18 2004
Subject: [Bioperl-l] search2gff
Message-ID: <20040702054541.GA821@psychro>

I have a couple of queries regarding the search2gff script.

1) I'm using the latest CVS (1.8), which states "search2gff does the right
thing now - now strand is done right properly". Am I right in thinking that
target start is now always less than target end?

2) Should search2gff work with any BLAST flavour? I ask because I'm running
it on tblastx files. Without the -m switch it seems to run fine, but if I add
-m, I get exceptions thrown that look like this:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Undefined sub-sequence (2827710,2827711). Valid range = 2827710 - 2827889

The start and end of the undefined sub-sequence always differ by 1.
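On question 1: GFF expects start <= end, with orientation carried in the strand column. The sketch below shows that normalization in plain Perl. gff_coords and gff_line are hypothetical helpers, not code taken from search2gff; the coordinates, score, and source/feature names in the usage line are illustrative only. It assumes 1-based inclusive coordinates where a reversed pair signals a minus-strand hit.

```perl
use strict;
use warnings;

# Hypothetical helper: GFF wants start <= end, with orientation in the
# strand column. A pair arriving as start > end is read as a minus-strand
# hit and swapped.
sub gff_coords {
    my ($start, $end) = @_;
    return $start <= $end ? ($start, $end, '+') : ($end, $start, '-');
}

# Hypothetical formatter for a GFF2-style line; 'tblastx' and 'similarity'
# are placeholder source/feature names, and the frame column is left as '.'.
sub gff_line {
    my ($seqid, $start, $end, $score) = @_;
    my ($s, $e, $strand) = gff_coords($start, $end);
    return join("\t", $seqid, 'tblastx', 'similarity', $s, $e, $score, $strand, '.');
}

# Illustrative values only: a reversed pair becomes start < end on the '-' strand.
print gff_line('contig1', 2827889, 2827710, 57.4), "\n";
```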
thanks,
Neil
--
School of Biotechnology and Biomolecular Sciences,
The University of New South Wales, Sydney 2052, Australia
http://psychro.bioinformatics.unsw.edu.au/neil/index.php

From jbedell at oriongenomics.com Fri Jul 2 10:27:06 2004
From: jbedell at oriongenomics.com (Joseph Bedell)
Date: Fri Jul 2 10:30:21 2004
Subject: [Bioperl-l] RE: blastall + perl
Message-ID: <434AF352F9D03C4C896782B8CC78BC764E5068@VADER.oriongenomics.com>

Hi Jahan,

Unfortunately, I don't think the problem you are having was ever resolved
during that Bioperl thread below. I'm CC'ing the Bioperl list to see if
anyone else can help you with this problem.

Joey

>-----Original Message-----
>From: Jahan S Ghaznavi [mailto:ghaznavi@cse.psu.edu]
>Sent: Friday, July 02, 2004 8:57 AM
>To: Joseph Bedell
>Subject: blastall + perl
>
>
>Hi
>First I would like to thank you for trying to help me.
>I wrote a client and server using SOAP::Lite. When I run the client it uses
>a module, Analysis, where blastall is called.
>When I run it I get the error below. Here is the link to where I found you
>had the same problem:
>"http://bioperl.org/pipermail/bioperl-l/2003-December/014202.html"
>
>
>Here is the error:
>ghaznavi@posnania:~/Perl-code/perl-fork/soap-fork/temp$ perl -T new_test_soap.pl
>Fault:
>SOAP-ENV:Server,
>------------- EXCEPTION -------------
>MSG: blastall call crashed: -1 /home/biosoft/NCBI_tools/blastall -p blastp -d
>"/home/biosoft/NCBI_tools/db/nr" -i /tmp/NWNIDOVgh7 -o ccc.out
>
>STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:732
>STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:680
>STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:536
>STACK Analysis::method_blastp Analysis.pm:49
>STACK Analysis::run Analysis.pm:127
>STACK (eval) /usr/lib/perl5/site_perl/5.8.3/SOAP/Lite.pm:2322
>STACK (eval) /usr/lib/perl5/site_perl/5.8.3/SOAP/Lite.pm:2310
>STACK SOAP::Server::handle /usr/lib/perl5/site_perl/5.8.3/SOAP/Lite.pm:2282
>STACK SOAP::Transport::HTTP::Server::handle /usr/lib/perl5/site_perl/5.8.3/SOAP/Transport/HTTP.pm:286
>STACK SOAP::Transport::HTTP::Daemon::ForkOnAccept::handle SOAP/Transport/HTTP/Daemon/ForkOnAccept.pm:32
>STACK toplevel serv_fork.pl:24
>
>--------------------------------------
>
>Here is the code.
>If you need more info please let me know. I am very thankful for whatever
>you could do.
>Thanks
>Jahan
>
>
># -*- perl -*-
>use strict;
>use vars qw(%ENV);
>$ENV{PATH}="/home/biosoft/bin:/home/biosoft/glimmer:/home/biosoft/NCBI_tools/:/home/biosoft/NCBI_tools/db";
>
>
>#$ENV{PATH}="/home/biosoft/NCBI_tools/";
>package Analysis;
>
>use DBconnector;
>#use blast_module;
>
>#-------------------------------------
>use Bio::Perl;
>use Bio::SeqIO;
>#use Bio::searchIO;
>use Bio::Tools::Run::StandAloneBlast;
>
>$Bio::Tools::Run::StandAloneBlast::DATADIR = '/home/biosoft/NCBI_tools/db';
>$ENV{'BLASTDB'}="/home/bios:q!oft/NCBI_tools/db";
>
>sub hello
> {
>   print "aloalaoalaoalo \n";
> }
>
>sub method_blastp
> {
>
>   my $database="nr";
>   my $program = "blastp";
>   my $outfile = "ccc.out";
>   my $_READMRTHOD = "Blast";
>   my $file_name = "test.fa";
>   my $type = "fasta";
>   # my $self = shift;
>   #my ($database,$program,$outfile,$_READMRTHOD,$file_name,$type)=@_;
>   my @program = ('database'=>$database,'program' => $program,'outfile'=>$outfile,'_READMETHOD'=>$_READMRTHOD);
>   my $factory = Bio::Tools::Run::StandAloneBlast->new(@program);
>   print " $database $program $outfile $_READMRTHOD $file_name $type \n";
>
>   # my $res = $factory->submit_blast($file_name);
>   #push @outfile "/home/ghaznavi/Perl-code/perl-fork/soap-fork/temp/out1";
>   #$factory->save_output($res,@outfile);
>   my $str = Bio::SeqIO->new(-file=>$file_name,'-format'=>$type);
>   my $input = $str->next_seq();
>   print " ++++ \n";
>   print " $input \n";
>   my $blast_report = $factory->blastall($input) or die " i can not do it ";
>   print "---**-- $blast_report \n";
>   return 1;
> }
>#-------------------------------------
>use Time::localtime;
>
>sub query {
>   die "Not Implemented\n";
>}
>
>sub run {
>   # die "Not Implemented\n";
>   print "child proccess: PID=$$ \n";
>   print "$_[1]{'program'} \n";
>   my %hash;
>   $hash{'program'} = $_[1]{'program'};
>   print "---- $hash{'program'} ---- ";
>   $hash{'protein_db'} = $_[1]{'protein_db'};
>   $hash{'filter'} = $_[1]{'filter'};
>   $hash{'Expect'} = $_[1]{'Expect'};
>   $hash{'gapped_alig'} = $_[1]{'gapped_alig'};
>   $hash{'mismatch'} = $_[1]{'mismatch'};
>   $hash{'match'} = $_[1]{'match'};
>   $hash{'matrix'} = $_[1]{'matrix'};
>   $hash{'gc_query'} = $_[1]{'gc_query'};
>   $hash{'gc_db'} = $_[1]{'gc_db'};
>   $hash{'strand'} = $_[1]{'strand'};
>   $hash{'Descriptions'} = $_[1]{'Descriptions'};
>   $hash{'Alignments'} = $_[1]{'Alignments'};
>   $hash{'view_alignments'} = $_[1]{'view_alignments'};
>   $hash{'show_gi'} = $_[1]{'show_gi'};
>   $hash{'believe'} = $_[1]{'believe'};
>   $hash{'htmloutput'} = $_[1]{'htmloutput'};
>   $hash{'external_links'} = $_[1]{'external_links'};
>   $hash{'one_HSP_per_line'} = $_[1]{'one_HSP_per_line'};
>   $hash{'image_query'} = $_[1]{'image_query'};
>   #----------------------------------------------------------------------------------
>my $connection = new DBconnector;
>
>#connection->connect;
>my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)= localtime(time);
>my $year = localtime->year();
>my $hour = localtime->hour();
>my $min = localtime->min();
>my $sec = localtime->sec();
>my $time = $hour.$min.$sec;
>my $PID=$$;
>my $id = $PID.$time;
>print "!!!!!! $PID + $time = $id \n";
>$connection->add("$id","$hash{'program'}","zz","kk");
>$connection->stats;
>$connection->purge("$id");
>$connection->down("$id");
>
>#my $blast = new blast_module;
>open (FILEHANDLE, ">test.fa") or die "no such a file ";
>print FILEHANDLE ">gi|523232|emb|AAC12345|sp|D12567 titin fragment
>MHRHHRTGYSAAYGPLKJHGYVHFIMCVVVSWWASDVVTYIPLLLNNSSAGWKRWWWIIFGGE
>GHGHHRTYSALWWPPLKJHGSKHFILCVKVSWLAKKERTYIPKKILLMMGGWWAAWWWI";
>close (FILEHANDLE);
>print " $outfile \n";
>hello();
>print " $database \n";
>my $tt = method_blastp();
>   $ENV{PATH} = '/usr/bin';
>   print $ENV{PATH} ;
>
>   open (STDOUT,">log.txt") or die "open() error: $!";
>   print STDOUT "\$exitvalue ";
>   #exec ("ls -l"); # or die "system failed : $?";
>   exec ("date");
>}
>
>sub echo {
>   return @_;
>}
>
>1;

From jason at cgt.duhs.duke.edu Fri Jul 2 10:43:20 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Jul 2 10:45:50 2004
Subject: [Bioperl-l] RE: blastall + perl
In-Reply-To: <434AF352F9D03C4C896782B8CC78BC764E5068@VADER.oriongenomics.com>
References: <434AF352F9D03C4C896782B8CC78BC764E5068@VADER.oriongenomics.com>

Presumably the environment variables that blast needs aren't set in your SOAP
environment, and/or the .ncbirc file is not set, or your client can't write to
the current directory.

Have your script print out the env variables:

while( my ($key,$value) = each %ENV ) {
    print "$key=$value\n";
}

Perhaps to make things easier you might short-circuit using StandAloneBlast
in the first place and just make sure the code can run blast on a file in
/tmp:

`blastall -i /tmp/file.fa -d database -o outfile ... `;

See what kind of warning you are getting to try and figure out what is wrong
with the environment.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From hlapp at gnf.org Fri Jul 2 12:48:16 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul 2 12:50:30 2004
Subject: [Bioperl-l] Re: GO dbxrefs in swissprot
In-Reply-To: <40E535E6.8050608@mpi-cbg.de>
Message-ID: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org>

Pretty weird what you describe if it works for one entry but not another.
Also, the DR lines don't look suspiciously different.

If there's no direct reason that prevents you from doing so, you should
definitely upgrade to the 1.4.x series, possibly even to the latest version of
the stable branch from CVS. There have been quite a few fixes since, some of
which do affect how sequences get loaded into biosql because they affect the
annotation bundle.

Let me know if the problem persists after the upgrade, and if it does, send
me the two files. I'm also cc'ing this to the bioperl list because it is
really a bioperl problem, not a biosql-related one.
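To check what the parser should have produced, independently of Bio::SeqIO::swiss, here is a minimal plain-Perl sketch that splits DR lines into database and primary-id pairs. parse_dr is a hypothetical helper, not the BioPerl parser. Run on the Q15777 DR block quoted in this thread it finds one GO cross-reference; the P53396 block contains five GO lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not Bio::SeqIO::swiss): turn Swiss-Prot DR lines into
# [database, primary_id] pairs, e.g. "DR   GO; GO:0007399; P:neurogenesis; TAS."
# becomes ['GO', 'GO:0007399']. Mail quoting ("> ") is not stripped.
sub parse_dr {
    my @xrefs;
    for my $line (@_) {
        next unless $line =~ /^DR\s+([^;]+);\s*([^;]+);/;
        push @xrefs, [$1, $2];
    }
    return @xrefs;
}

# The DR block of Q15777 as quoted in this thread.
my @dr = (
    'DR   EMBL; U57911; AAC50564.1; -.',
    'DR   EMBL; BC031582; AAH31582.1; -.',
    'DR   Genew; HGNC:1180; C11orf8.',
    'DR   MIM; 600911; -.',
    'DR   GO; GO:0007399; P:neurogenesis; TAS.',
    'DR   InterPro; IPR004843; M-ppestrase.',
    'DR   Pfam; PF00149; Metallophos; 1.',
);
my @go = grep { $_->[0] eq 'GO' } parse_dr(@dr);
print scalar(@go), " GO xref(s): ", join(', ', map { $_->[1] } @go), "\n";
# prints "1 GO xref(s): GO:0007399"
```

Comparing these pairs with the dblink annotations printed by load_seqdatabase.pl shows directly which cross-references the parser dropped.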
-hilmar

On Friday, July 2, 2004, at 03:16 AM, Andreas Henschel wrote:

> Hi Hilmar,
>
> Thanks for your reply. I was wondering if it is due to my patched
> bioperl 1.2.1?
> Hilmar Lapp wrote:
>
>> When you say the GO dbxrefs did not appear, how do you mean? Are you
>> referring to dbxrefs present in the source file but absent as
>> association rows in bioentry_dbxref?
>>
> Yes!
>
>> If you have a swissprot entry that has GO dbxrefs in the source file
>> but fails to have those associated in bioentry_dbxref, check whether
>> the Bio::Seq object that's coming from the parser has them as
>> annotation. It would sound strange if some entries get the
>> associations whereas others don't.
>>
> Ok, here is what I did: I modified load_seqdatabase.pl to print out
> the annotations. I ran it, comparing two small flatfiles, both
> containing GO annotations (according to the flatfile and the swissprot
> website).
> For the first, the parser detected no GO annotation, whereas the
> latter got it:
>
> $prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname bioseqdb --namespace swissprot --format swiss --lookup --remove --testonly P53396.dat
>
> Annotation dblink stringified value Direct database link to X64330 in database EMBL
> Annotation dblink stringified value Direct database link to U18197 in database EMBL
> Annotation dblink stringified value Direct database link to BC006195 in database EMBL
> Annotation dblink stringified value Direct database link to S21173 in database PIR
> Annotation dblink stringified value Direct database link to P07459 in database HSSP
> Annotation dblink stringified value Direct database link to HGNC:115 in database Genew
> Annotation dblink stringified value Direct database link to P53396 in database GK
> Annotation dblink stringified value Direct database link to 108728 in database MIM
> Annotation dblink stringified value Direct database link to IPR002020 in database InterPro
> Annotation dblink stringified value Direct database link to IPR003781 in database InterPro
> Annotation dblink stringified value Direct database link to IPR005811 in database InterPro
> Annotation dblink stringified value Direct database link to IPR005810 in database InterPro
> Annotation dblink stringified value Direct database link to IPR005809 in database InterPro
> Annotation dblink stringified value Direct database link to PF02629 in database Pfam
> Annotation dblink stringified value Direct database link to PF00549 in database Pfam
> Annotation dblink stringified value Direct database link to PS01216 in database PROSITE
> Annotation dblink stringified value Direct database link to PS00399 in database PROSITE
> Annotation dblink stringified value Direct database link to PS01217 in database PROSITE
>
> $prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname bioseqdb --namespace swissprot --format swiss --lookup --remove --testonly Q15777.dat
> Loading Q15777.dat ...
>
> Annotation dblink stringified value Direct database link to U57911 in database EMBL
> Annotation dblink stringified value Direct database link to BC031582 in database EMBL
> Annotation dblink stringified value Direct database link to HGNC:1180 in database Genew
> Annotation dblink stringified value Direct database link to 600911 in database MIM
> Annotation dblink stringified value Direct database link to GO:0007399 in database GO
> Annotation dblink stringified value Direct database link to IPR004843 in database InterPro
> Annotation dblink stringified value Direct database link to PF00149 in database Pfam
>
>
> The corresponding DR entries in the two flat files are:
> P53396.dat:
> DR   EMBL; X64330; CAA45614.1; -.
> DR   EMBL; U18197; AAB60340.1; -.
> DR   EMBL; BC006195; AAH06195.1; -.
> DR   PIR; S21173; S21173.
> DR   HSSP; P07459; 1JKJ.
> DR   Genew; HGNC:115; ACLY.
> DR   GK; P53396; -.
> DR   MIM; 108728; -.
> DR   GO; GO:0009346; C:citrate lyase complex; TAS.
> DR GO; GO:0003878; F:ATP citrate synthase activity; TAS. > DR GO; GO:0006200; P:ATP catabolism; TAS. > DR GO; GO:0006101; P:citrate metabolism; TAS. > DR GO; GO:0015936; P:coenzyme A metabolism; TAS. > DR InterPro; IPR002020; Citrate_synth. > DR InterPro; IPR003781; CoA_binding. > DR InterPro; IPR005811; CoA_ligase. > DR InterPro; IPR005810; CoA_lig_alpha. > DR InterPro; IPR005809; CoA_lig_beta. > DR Pfam; PF02629; CoA_binding; 1. > DR Pfam; PF00549; Ligase_CoA; 1. > DR PROSITE; PS01216; SUCCINYL_COA_LIG_1; 1. > DR PROSITE; PS00399; SUCCINYL_COA_LIG_2; 1. > DR PROSITE; PS01217; SUCCINYL_COA_LIG_3; 1. > > Q15777.dat: > DR EMBL; U57911; AAC50564.1; -. > DR EMBL; BC031582; AAH31582.1; -. > DR Genew; HGNC:1180; C11orf8. > DR MIM; 600911; -. > DR GO; GO:0007399; P:neurogenesis; TAS. > DR InterPro; IPR004843; M-ppestrase. > DR Pfam; PF00149; Metallophos; 1. > > Cheers > Andreas > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jdantzer at cs.iupui.edu Thu Jul 1 16:57:45 2004 From: jdantzer at cs.iupui.edu (Jessica Dantzer) Date: Fri Jul 2 13:13:48 2004 Subject: [Bioperl-l] Problems parsing swiss-prot files Message-ID: <41702.134.68.144.57.1088715465.squirrel@www.cs.iupui.edu> I'm working on parsing swiss-prot files for use in another database, and I've managed to work out where all the information I need is stored for the most part. The only problems I'm encountering are with the reference parsing-- Some of the files have multiple "RP" lines, and I only seem to be able to get one. The code seems to indicate that this is how the files are parsed. Is there any other way to access the second line? 
Thanks, Jessica From wuhuizhu at mail.eecis.udel.edu Thu Jul 1 16:25:42 2004 From: wuhuizhu at mail.eecis.udel.edu (Huizhuan Wu) Date: Fri Jul 2 13:14:17 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] blast/cgi question In-Reply-To: References: Message-ID: hi, Jason: Thanks for your help. I added it in my script before the use statements, after #!/usr/bin/perl -w, and the error msg is the same. When I added it at the very beginning of my script, I got a msg saying "The server encountered an internal error or misconfiguration and was unable to complete your request". Do you happen to know how to fix it? One thousand thanks! Sophia On Thu, 1 Jul 2004, Jason Stajich wrote: > Because your website is not running under your uid it probably isn't > seeing the .ncbirc file. > > You might try making sure the BLASTMAT env variable is set to wherever > your ncbi 'data' directory is (/usr/local/BLAST/data, probably). > > You can set it in your script by adding > > BEGIN { > $ENV{'BLASTMAT'} = '/usr/local/BLAST/data'; > } > To your script up above all the module 'use' statements. > > On Thu, 1 Jul 2004, Huizhuan Wu wrote: > > > Hi, > > I am new to Bioperl. Now I need to work on a website > > that accepts a sequence from users and then a BioPerl script blasts it > > against a local db and prints out the result. The problem is my BioPerl > > cgi script works fine locally (I set the params through the command line) but > > doesn't work through the web (error msg: blastall call crashed...). Do you > > happen to know how to solve this?
> > > > > > The error msg I got: > > MSG: blastall call crashed: 256 /usr/local/BLAST/blastall -p blastn -d > > /Users/work/data.txt -i > > /tmp/ITPxjyt0zx -e 1e-10 -o blast.out -F F -g F -W 17 > > > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > > /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:732 > > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > > /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:680 > > STACK Bio::Tools::Run::StandAloneBlast::blastall > > /Library/Perl/5.8.1/Bio/Tools/Run/StandAloneBlast.pm:536 > > STACK toplevel /Users/wuhuizhuan/Sites/test/CGI-BIN/bl.cgi:50 > > > > > > Here is the code: > > #!/usr/bin/perl -w > > > > # This script gets a sequence from a web page and performs blastn > > # against the data.txt database. > > > > use strict; > > use warnings; > > use Getopt::Long; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::Seq; > > use CGI qw(:standard); # provides header, start_html, h2, param, end_html > > use Data::Dumper; > > > > # HTML stuff > > print header; > > print start_html("BLAST RESULT"); > > print h2("BLAST RESULT"); > > > > # Variables > > my ($query, $seq); > > my $db = "/Users/work/data.txt"; > > my $maxEval = 1.0e-10; > > > > $seq = param('SEQUENCE'); > > my $seqobj = Bio::Seq->new( '-display_id' => $query, > > '-seq' => $seq); > > > > print Dumper($seqobj); > > my @params = ('program'=>'blastn', 'outfile'=>'blast.out', > > '_READMETHOD'=>'Blast', 'F'=>'F','W'=>17,'g'=>'F'); > > > > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > $factory->e($maxEval); > > $factory->d($db); > > my $blast_report = $factory->blastall($seqobj); > > #my $result = $blast_report->next_result; > > print "$blast_report\n"; > > print end_html; > > > > > > > > regards, > > Sophia > > > > _______________________________________________ > > Bioperl-guts-l mailing list > > Bioperl-guts-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > From jason at cgt.duhs.duke.edu Fri
Jul 2 16:55:00 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jul 2 16:57:33 2004 Subject: [Bioperl-l] Problems parsing swiss-prot files In-Reply-To: <41702.134.68.144.57.1088715465.squirrel@www.cs.iupui.edu> References: <41702.134.68.144.57.1088715465.squirrel@www.cs.iupui.edu> Message-ID: Is there more than one RP line per reference? The data structures and parsers currently assume there is only one. can you send an acc so we can add it to the tests? -jason On Thu, 1 Jul 2004, Jessica Dantzer wrote: > I'm working on parsing swiss-prot files for use in another database, and > I've managed to work out where all the information I need is stored for > the most part. The only problems I'm encountering are with the reference > parsing-- Some of the files have multiple "RP" lines, and I only seem to > be able to get one. The code seems to indicate that this is how the files > are parsed. Is there any other way to access the second line? > > Thanks, > Jessica > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jdantzer at cs.iupui.edu Fri Jul 2 17:09:19 2004 From: jdantzer at cs.iupui.edu (Jessica Dantzer) Date: Fri Jul 2 17:11:36 2004 Subject: [Bioperl-l] Problems parsing swiss-prot files In-Reply-To: References: <41702.134.68.144.57.1088715465.squirrel@www.cs.iupui.edu> Message-ID: <6.1.1.1.1.20040702160302.01b2b7f8@klingon.cs.iupui.edu> Most of the references in most of the files have only one RP line. Occasionally, there are two. I haven't seen more than two, though. One of the files that had more than one line in at least one reference was for P33897. I'm parsing information on the mutation/ variant data and their references, and so need some of the information on those second lines. 
At 03:55 PM 7/2/2004, Jason Stajich wrote: >Is there more than one RP line per reference? The data structures and >parsers currently assume there is only one. >can you send an acc so we can add it to the tests? > >-jason >On Thu, 1 Jul 2004, Jessica Dantzer wrote: > > > I'm working on parsing swiss-prot files for use in another database, and > > I've managed to work out where all the information I need is stored for > > the most part. The only problems I'm encountering are with the reference > > parsing-- Some of the files have multiple "RP" lines, and I only seem to > > be able to get one. The code seems to indicate that this is how the files > > are parsed. Is there any other way to access the second line? > > > > Thanks, > > Jessica > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >-- >Jason Stajich >Duke University >jason at cgt.mc.duke.edu From lstein at cshl.edu Fri Jul 2 18:03:32 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Jul 2 18:06:48 2004 Subject: [Bioperl-l] Bio::Graphics - how to adjust height of a panel as you add more tracks? In-Reply-To: <20040626165721.30183.qmail@web41504.mail.yahoo.com> References: <20040626165721.30183.qmail@web41504.mail.yahoo.com> Message-ID: <200407021803.32823.lstein@cshl.edu> Hi, I'm not sure whether this was answered already, but the issue has got to be in your previewer. Please try a different image viewer and let me know if you still see the same problem. Lincoln On Saturday 26 June 2004 12:57 pm, Allen Liu wrote: > Hi, > > When I begin to add many tracks to a panel (about 300 > tracks), I notice that the panel actually starts > shrinking the tracks to the point where I cannot read > the information on the tracks anymore. 
At first, I > thought the previewer was automatically set to fit the > entire PNG on the screen at once, but when I try to > zoom in, the resolution gets really pixelated. The > height appears to stay fixed when I am adding these > tracks and I was wondering if there is any way of > extending the height as the panel grows. > > Any information would be greatly appreciated. > > Allen Liu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From jason at cgt.duhs.duke.edu Fri Jul 2 20:49:02 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jul 2 20:51:29 2004 Subject: [Bioperl-l] Problems parsing swiss-prot files In-Reply-To: <6.1.1.1.1.20040702160302.01b2b7f8@klingon.cs.iupui.edu> References: <41702.134.68.144.57.1088715465.squirrel@www.cs.iupui.edu> <6.1.1.1.1.20040702160302.01b2b7f8@klingon.cs.iupui.edu> Message-ID: I've fixed it in CVS. I also fixed a bunch of other things in swissprot parsing to make the parser cleaner, I hope. This involved improving the 'new' function in Bio::Annotation::Reference, so you'd want to get that as well if you are getting code from CVS. Multi-line RP lines are now all put into the rp field of the Annotation::Reference object. The parser takes care of splitting it back into multi-line fields upon writing (although I didn't test this case specifically). PVH and our code auditors: as happy as I am about the code audit for SeqIO and the like, and about making sure that things can round-trip, I really feel like the guts of these parsers could use just a few weeks of someone's time to clean them up first. Of course, I and a few others would want to simplify the sequence/annotation/feature object model first, so who knows what is the best starting point...
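The concatenation described above can be pictured with a small standalone sketch (plain Perl, not the actual Bio::SeqIO::swiss code): every RP line inside one reference block is collected and joined with a space, giving a single rp value. The helper name `merge_rp_lines` and the sample reference block are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: join the text of all RP lines in one SWISS-PROT
# reference block into a single string (undef if there are no RP lines).
sub merge_rp_lines {
    my ($block) = @_;
    my @parts;
    for my $line (split /\n/, $block) {
        # RP lines are "RP" followed by three spaces, then the text
        push @parts, $1 if $line =~ /^RP   (.*?)\s*$/;
    }
    return @parts ? join(' ', @parts) : undef;
}

# Illustrative reference block (invented text, not copied from P33897)
my $ref = join "\n",
    'RN   [1]',
    'RP   SEQUENCE FROM N.A., AND VARIANTS',
    'RP   LYS-100 AND ARG-200.',
    'RA   Doe J.;';

my $rp = merge_rp_lines($ref);
print "$rp\n";
```

Splitting the merged value back into wrapped RP lines on output is the reverse step; it is left out here.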
-jason On Fri, 2 Jul 2004, Jessica Dantzer wrote: > Most of the references in most of the files have only one RP > line. Occasionally, there are two. I haven't seen more than two, > though. One of the files that had more than one line in at least one > reference was for P33897. I'm parsing information on the mutation/ variant > data and their references, and so need some of the information on those > second lines. > > At 03:55 PM 7/2/2004, Jason Stajich wrote: > >Is there more than one RP line per reference? The data structures and > >parsers currently assume there is only one. > >can you send an acc so we can add it to the tests? > > > >-jason > >On Thu, 1 Jul 2004, Jessica Dantzer wrote: > > > > > I'm working on parsing swiss-prot files for use in another database, and > > > I've managed to work out where all the information I need is stored for > > > the most part. The only problems I'm encountering are with the reference > > > parsing-- Some of the files have multiple "RP" lines, and I only seem to > > > be able to get one. The code seems to indicate that this is how the files > > > are parsed. Is there any other way to access the second line? > > > > > > Thanks, > > > Jessica > > > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > >-- > >Jason Stajich > >Duke University > >jason at cgt.mc.duke.edu > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From bmb9jrm at bmb.leeds.ac.uk Sat Jul 3 06:29:21 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Sat Jul 3 06:31:25 2004 Subject: [Bioperl-l] Bioperl: CVS and packages Message-ID: <1088850561.6755.10.camel@localhost.localdomain> Hi all, I installed my bioperl by downloading the packages. On reflection CVS might have been better. Is there a way to update with CVS anyway? 
Or am I going to have to do everything again when I want to update? Thanks, Jon From sdavis2 at mail.nih.gov Sat Jul 3 07:51:16 2004 From: sdavis2 at mail.nih.gov (Davis, Sean (NIH/NHGRI)) Date: Sat Jul 3 07:53:44 2004 Subject: [Bioperl-l] Bioperl: CVS and packages Message-ID: <0E3E7E8F6E23DF4C8127A063568356B50473B0F3@nihexchange12.nih.gov> Once you have followed the CVS instructions to get the directory once, updating is as simple as going to the top-level directory of what you want to update (bioperl-live, bioperl-db, etc.) and typing: cvs update. Then, you just remake and reinstall the packages, which for perl is almost trivial for the majority of packages. Sean -----Original Message----- From: Jonathan Manning To: Bioperl Sent: 7/3/2004 6:29 AM Subject: [Bioperl-l] Bioperl: CVS and packages Hi all, I installed my bioperl by downloading the packages. On reflection CVS might have been better. Is there a way to update with CVS anyway? Or am I going to have to do everything again when I want to update? Thanks, Jon _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From tex at biosysadmin.com Sat Jul 3 08:53:49 2004 From: tex at biosysadmin.com (James Thompson) Date: Sat Jul 3 08:55:08 2004 Subject: [Bioperl-l] Bioperl: CVS and packages In-Reply-To: <0E3E7E8F6E23DF4C8127A063568356B50473B0F3@nihexchange12.nih.gov> Message-ID: Jonathan, You can do both. I have a system-wide install of Bioperl installed in /usr/local/lib/perl5/ source, and I have the CVS tree in my home directory. Then I just add a "use lib" statement to use the packages from CVS. It's very easy to switch back and forth from the CVS and the release versions of BioPerl this way. 
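A sketch of that `use lib` setup, assuming the anonymous-CVS checkout lives in `$HOME/cvs/bioperl-live` (a hypothetical path; substitute your own). `use lib` puts the directory at the front of `@INC`, so a module found there shadows the system-wide copy. The hypothetical `module_source` helper only reports which file would be picked up; it does not load BioPerl. Setting PERL5LIB to the same directory has a similar effect without editing the script.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical location of an anonymous-CVS checkout of bioperl-live.
use constant CVS_DIR =>
    File::Spec->catdir($ENV{HOME} || '.', 'cvs', 'bioperl-live');

# Prepend CVS_DIR to @INC, so Bio/*.pm files there win over the release install.
use lib CVS_DIR;

# Report which @INC directory would supply a module file, without loading it.
sub module_source {
    my ($file) = @_;    # e.g. 'Bio/Seq.pm'
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $file);
        return $path if -e $path;
    }
    return;
}

my $where = module_source('Bio/Seq.pm');
print defined $where
    ? "Bio::Seq would be loaded from $where\n"
    : "Bio::Seq was not found in \@INC\n";
```

Commenting out the `use lib` line switches back to the release version.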
Tex Thompson RIT Bioinformatics > -----Original Message----- > From: Jonathan Manning > To: Bioperl > Sent: 7/3/2004 6:29 AM > Subject: [Bioperl-l] Bioperl: CVS and packages > > Hi all, > > I installed my bioperl by downloading the packages. On reflection CVS > might have been better. Is there a way to update with CVS anyway? Or am > I going to have to do everything again when I want to update? > > Thanks, > > Jon From brian_osborne at cognia.com Sat Jul 3 09:41:28 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Sat Jul 3 09:44:14 2004 Subject: [Bioperl-l] Bioperl: CVS and packages In-Reply-To: Message-ID: Jon, Yes. "use lib ..." or set the PERL5LIB variable. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of James Thompson Sent: Saturday, July 03, 2004 8:54 AM To: 'Jonathan Manning ' Cc: 'Bioperl ' Subject: RE: [Bioperl-l] Bioperl: CVS and packages Jonathan, You can do both. I have a system-wide install of Bioperl installed in /usr/local/lib/perl5/ source, and I have the CVS tree in my home directory. Then I just add a "use lib" statement to use the packages from CVS. It's very easy to switch back and forth from the CVS and the release versions of BioPerl this way. Tex Thompson RIT Bioinformatics > -----Original Message----- > From: Jonathan Manning > To: Bioperl > Sent: 7/3/2004 6:29 AM > Subject: [Bioperl-l] Bioperl: CVS and packages > > Hi all, > > I installed my bioperl by downloading the packages. On reflection CVS > might have been better. Is there a way to update with CVS anyway? Or am > I going to have to do everything again when I want to update? 
> > Thanks, > > Jon _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Sat Jul 3 10:11:35 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Sat Jul 3 10:13:58 2004 Subject: [Bioperl-l] Bioperl: CVS and packages In-Reply-To: <1088850561.6755.10.camel@localhost.localdomain> Message-ID: Jon, Yes, using CVS rather than downloading packages is a good way to go, especially if you'd like the latest code for whatever reason. For those of you who aren't familiar with this idea we're talking about using anonymous CVS to have access to the latest, not a specific version. It's also an easy way to take a look at all the other OBF-sponsored projects (http://cvs.open-bio.org/). Instructions on how to set this up can be found at http://bioperl.org/UserInfo/. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jonathan Manning Sent: Saturday, July 03, 2004 6:29 AM To: Bioperl Subject: [Bioperl-l] Bioperl: CVS and packages Hi all, I installed my bioperl by downloading the packages. On reflection CVS might have been better. Is there a way to update with CVS anyway? Or am I going to have to do everything again when I want to update? Thanks, Jon _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From bmb9jrm at bmb.leeds.ac.uk Sat Jul 3 10:49:53 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Sat Jul 3 10:51:56 2004 Subject: [Bioperl-l] Bioperl: CVS and packages In-Reply-To: References: Message-ID: <1088866193.6755.21.camel@localhost.localdomain> Thanks for the replies everyone- I'll have a go with the CVS! 
Jon On Sat, 2004-07-03 at 15:11, Brian Osborne wrote: > Jon, > > Yes, using CVS rather than downloading packages is a good way to go, > especially if you'd like the latest code for whatever reason. For those of > you who aren't familiar with this idea we're talking about using anonymous > CVS to have access to the latest, not a specific version. It's also an easy > way to take a look at all the other OBF-sponsored projects > (http://cvs.open-bio.org/). Instructions on how to set this up can be found > at http://bioperl.org/UserInfo/. > > Brian O. > > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jonathan Manning > Sent: Saturday, July 03, 2004 6:29 AM > To: Bioperl > Subject: [Bioperl-l] Bioperl: CVS and packages > > Hi all, > > I installed my bioperl by downloading the packages. On reflection CVS > might have been better. Is there a way to update with CVS anyway? Or am > I going to have to do everything again when I want to update? > > Thanks, > > Jon > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Sun Jul 4 13:30:56 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sun Jul 4 13:33:24 2004 Subject: [Bioperl-l] search2gff In-Reply-To: <20040702054541.GA821@psychro> References: <20040702054541.GA821@psychro> Message-ID: On Fri, 2 Jul 2004, Neil Saunders wrote: > I have a couple of queries regarding the search2gff script. > > 1) I'm using the latest CVS (1.8) which states "search2gff does the > right thing now - now strand is done right properly". Am I right in > thinking that now target start is always less than target end? > exactly. 
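For reference, the convention confirmed above (in GFF a feature's start is never greater than its end, and direction goes in the strand column) can be sketched as follows. `normalize_coords` is a hypothetical helper, not code taken from search2gff.

```perl
use strict;
use warnings;

# Hypothetical helper: take match coordinates that may be reversed
# (start > end for a minus-strand hit) and return ($start, $end, $strand)
# with start <= end. Equal coordinates are reported as '+'.
sub normalize_coords {
    my ($s, $e) = @_;
    return $s <= $e ? ($s, $e, '+') : ($e, $s, '-');
}

my @fwd = normalize_coords(100, 250);            # (100, 250, '+')
my @rev = normalize_coords(2827889, 2827710);    # (2827710, 2827889, '-')
print join("\t", @fwd), "\n";
print join("\t", @rev), "\n";
```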
> 2) Should search2gff work with any BLAST flavour? > > I ask because I'm running it on tblastx files. Without the -m switch > it seems to run fine, but if I add -m, I get thrown exceptions looking > like e.g. : > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Undefined sub-sequence (2827710,2827711). Valid range = 2827710 - > 2827889 > > The undefined subsequence start/ends always differ by 1. > Can you send the whole stack trace and/or better an example blast file. I guess ideally this should be submitted as a bug. I don't see any calls to matches in the code so I am not sure why this is getting triggered. I changed the script again to be a little simpler so might pull from latest CVS again. > thanks, > Neil > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From alexg at ebi.ac.uk Mon Jul 5 06:35:08 2004 From: alexg at ebi.ac.uk (Alex Gutteridge) Date: Mon Jul 5 06:37:28 2004 Subject: [Bioperl-l] PDB sequence from ATOM records In-Reply-To: <200407021715.i62HEqKu005532@portal.open-bio.org> Message-ID: > Hi, > > I need to be able to map some protein sequence alignment information > on to > a protein structure. To do this I need to get the sequence from the > ATOM > records, since the SEQRES sequence is often not exactly the same. > > So I'd like to change Bio::Structure::Residue a little: > > Amino-acid type and residue number are currently contained in one > value, > residue->id. I would like to separate them in to two, residue->type > and residue->num. Then, for backwards compatibility, construct > residue->id > from these each time it is required (or store it as well if that is > better?). residue->type should be able to return the one-letter code as > well as the three-letter code. > > And then have a method called something like > Bio::Structure::Entry->atom_seq > that would return a Bio::PrimarySeq object constructed from the > one-letter > codes of the residues of a particular chain. > > Any comments please... Thanks. 
> > Matthew > > P.S. Sorry if this message is a repeat - our email server went down as > I > was sending it the first time. > > -- > Matthew Betts, mailto:matthew.betts@ii.uib.no > Phone: (+47) 55 58 40 22, Fax: (+47) 55 58 42 95 > CBU, BCCS, HiB, UNIFOB / Universitetet i Bergen > Thormøhlensgt. 55, N-5008 Bergen, Norway > Hi, I added a similar thing to my local copy of the BioPerl structure modules a while ago (I'm not sure who, if anyone, is currently maintaining Bio::Structure::*). I can't seem to find my implementation, since I don't really use Bio::Structure::* anymore, but if I do, I'll forward it to you. It's definitely a useful method to have; like you say, SEQRES can be quite different to what you actually see in the ATOM records. It's simple enough to implement, but keep in mind the following gotchas: Gaps in the crystal structure: You'll have to pad with 'X's where the ATOM records are missing. E.g. if you have residues GLY40, GLU41, CYS44, LYS45, you need to make the sequence GEXXCK. Insertion codes: PDB files can have residue 'sequences' like: 40, 41A, 41B, 42. You need to be careful not to just look at the number, otherwise you might miss a residue. Strange starting points: There are PDB files whose residue numbering starts in the negative! I.e. -5, -4, -3, -2, -1, 0, 1. Make sure your code can deal with this. The only other comment I would make is that I think the atom_seq method should be attached to the Chain object, not the Entry object, so that it is called as $chain->atom_seq rather than $entry->atom_seq('A'). The sequence is a property of a chain, so this makes the most sense to me personally.
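A minimal sketch of the three gotchas above, in plain Perl rather than against Bio::Structure. The `atom_seq` name is borrowed from the proposal, but the input layout (an array of [residue name, residue number, insertion code] in file order) is an assumption for illustration. Residues are keyed on number plus insertion code, negative numbers need no special handling, and X-padding for numbering gaps is optional, since (as raised later in the thread) a gap in the numbering does not always mean residues are missing.

```perl
use strict;
use warnings;

# Three-letter to one-letter codes for the 20 standard amino acids;
# any other residue name becomes 'X'.
my %ONE = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Hypothetical helper, not a Bio::Structure method.
# $residues: array ref of [resname, resnum, icode] in file order.
# $pad_gaps: if true, fill jumps in residue numbering with 'X'.
sub atom_seq {
    my ($residues, $pad_gaps) = @_;
    my ($seq, $prev_num, %seen) = ('');
    for my $r (@$residues) {
        my ($name, $num, $icode) = @$r;
        $icode = '' unless defined $icode;
        # Key on number AND insertion code, so 41A and 41B stay distinct;
        # repeated entries for one residue (e.g. one per atom) are skipped.
        next if $seen{"$num:$icode"}++;
        if ($pad_gaps && defined $prev_num && $num > $prev_num + 1) {
            $seq .= 'X' x ($num - $prev_num - 1);
        }
        $seq .= exists $ONE{uc $name} ? $ONE{uc $name} : 'X';
        $prev_num = $num;
    }
    return $seq;
}

# The gap example above: GLY40, GLU41, CYS44, LYS45
my @res = (['GLY', 40], ['GLU', 41], ['CYS', 44], ['LYS', 45]);
print atom_seq(\@res), "\n";       # GECK
print atom_seq(\@res, 1), "\n";    # GEXXCK

# Negative numbering plus insertion codes 1A and 1B
my @ins = (['ALA', -1], ['GLY', 0], ['SER', 1, 'A'], ['SER', 1, 'B'], ['LEU', 2]);
print atom_seq(\@ins, 1), "\n";    # AGSSL
```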
Alex Gutteridge European Bioinformatics Institute Cambridge CB10 1SD UK Tel: 01223 492550 Email: alexg@ebi.ac.uk From alexg at ebi.ac.uk Mon Jul 5 09:42:16 2004 From: alexg at ebi.ac.uk (Alex Gutteridge) Date: Mon Jul 5 09:44:29 2004 Subject: [Bioperl-l] PDB sequence from ATOM records In-Reply-To: Message-ID: <20A147C9-CE89-11D8-8F5E-000A957E44DC@ebi.ac.uk> On Monday, July 5, 2004, at 02:40 pm, Jurgen Pletinckx wrote: > I have got an implementation ready as well, and hope to fold it > into the codebase soon. > > I would just like to point out that this > > # Gaps in the crystal structure: You'll have to pad with 'X's where the > # ATOM records are missing. E.g. if you have residues > # GLY40,GLU41,CYS44,LYS45. You need to make the sequence GEXXCK. > > is, imo, incorrect - the authors may well be using a historical > numbering scheme for their protein of interest, and 'missing' residue > numbers can correspond to a shorter loop in the new structure when > compared to the 'reference' numbering. In other words - don't insert > 'missing' residues for the user, as you can be wrong. Good point. I would say that in my experience this comes up less often than the situation I described (loop residues genuinely missing from the crystal structure), perhaps the padding could be an optional extra? > # The only other comment I would make is that I think the atom_seq > method > # should be attached to the Chain object, not the Entry object. And so > # called by $chain->atom_seq not $entry->atom_seq('A'). The sequence > is a > # property of a chain so this makes the most sense to me personally. > > Concur. Except, of course, I can't write it that way: a chain doesn't > know what residues are part of it, or what structure it is in. So I've > stuck the method on Entry level, together with all other methods. This is why I've stopped using the Bio::Structure::* modules. 
I realise that the modules were designed this way to avoid problems with Perl's reference counting, but it makes everything really ugly in my opinion. No offence intended :). There are other ways around circular references (like proxy objects) in Perl which allow different levels of the hierarchy to know about each other. Perl6 should make all this sort of thing much easier! > Patch to Entry.pm, Residue.pm and IO/pdb.pm is attached - enjoy! (And, > of course, review, comment and criticize) > > -- > Jurgen Pletinckx > AlgoNomics NV > Alex Gutteridge European Bioinformatics Institute Cambridge CB10 1SD UK Tel: 01223 492550 Email: alexg@ebi.ac.uk From LewisCT at AGR.GC.CA Mon Jul 5 18:15:00 2004 From: LewisCT at AGR.GC.CA (Lewis, Christopher) Date: Mon Jul 5 18:17:41 2004 Subject: [Bioperl-l] Writing a Bio::Assembly::Contig using Bio::AlignIO Message-ID: Hello all, I have a couple of questions that arose during a brief discussion with Jason Stajich a while back (Jason, thanks very much for your input; I only just got a chance to come back to this), as well as a diff of the changes I've made so far. What did I want to do: Some time back I wanted to display the assembly in an ace file produced by phrap. Looking at the documentation I saw that Bio::Assembly::Contig ISA Bio::Align::AlignI, so I thought I should be able to write it to a file using Bio::AlignIO. Problems I encountered: There turned out to be two problems with this: 1) many of the Bio::Align::AlignI methods were not implemented in Bio::Assembly::Contig and 2) alignments in Bio::Assembly::Contig are not flush. Possible solution: I went ahead and implemented the missing methods in Bio::Assembly::Contig, and added a pad method to make the alignment flush and an unpad method to undo the effects of pad (similar to how match/unmatch work). The diff output is at the end of this email.
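The arithmetic behind pad/unpad can be shown without BioPerl. A read occupying 1-based contig columns start..end gets start - 1 pad characters in front and length - end behind, which is what the two loops in the `pad` method of the diff produce. Unpad is the matching substr. `pad_read` and `unpad_read` are hypothetical helpers, not Bio::Assembly::Contig methods.

```perl
use strict;
use warnings;

# Hypothetical helpers. $start/$end are the 1-based contig columns the
# read occupies; $len is the full contig (alignment) length.
sub pad_read {
    my ($read, $start, $end, $len, $pad_char) = @_;
    $pad_char = '-' unless defined $pad_char;
    return ($pad_char x ($start - 1)) . $read . ($pad_char x ($len - $end));
}

# Undo pad_read by cutting out the columns the read actually covers.
sub unpad_read {
    my ($padded, $start, $end) = @_;
    return substr($padded, $start - 1, $end - $start + 1);
}

my $padded = pad_read('ACGT', 3, 6, 10);
print "$padded\n";                        # --ACGT----
print unpad_read($padded, 3, 6), "\n";    # ACGT
```

Every padded read ends up exactly `$len` characters long, which is what makes the contig flush for AlignIO.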
Usage example:

use strict;
use warnings;
use Bio::Assembly::IO;
use Bio::AlignIO;

my $in  = Bio::Assembly::IO->new(-file => "contigs.ace", -format => "ace");
my $out = Bio::AlignIO->new(-file => ">test.out", -format => "clustalw");

# Each assembly returned by next_assembly() holds one or more contigs
while (my $aln = $in->next_assembly()) {
    my @contigs = $aln->all_contigs();
    last if scalar @contigs == 0;
    foreach my $contig (@contigs) {
        $contig->pad();              # make the alignment flush for AlignIO
        $out->write_aln($contig);
        $contig->unpad();            # restore the unpadded reads
    }
}

This seems to work for all the AlignIO modules that implement write_aln.

Additional questions: Jason and I talked about whether or not pad/unpad should be contig specific, but after looking through the other methods present in AlignI I went ahead and added pad/unpad there because it seemed correct to me... this could be removed if others disagree. > I think leaving the stuff as contig.pm specific for now but perhaps > re-posting the proposal to the bioperl list will elicit more response > (hopefully). The other question that arose during my exchange with Jason was whether or not modules such as Bio::AlignIO::clustalw should test for flush. Here is Jason's suggestion; however, I haven't implemented it because I'm not familiar enough with the alignment specs to know if it is a good idea (and Jason didn't seem 100% certain)... What do others think? > As for the testing for flush in clustalw etc, I'm not sure what the > alignment specs say - perhaps we can set it up so that there is a > "strictness" flag for AlignIO which is set to on by default and will > test for flushness. I imagine it is hard to do things like fasta MSA > alignments as unflush so it would be critical to have it test there... Thank you...
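One possible shape for the "strictness" check suggested above, sketched as a standalone function. `assert_flush` is hypothetical; nothing in this thread says AlignIO has such an option. With strict mode on, a non-flush set of sequences makes it die instead of silently producing a ragged file; with strict mode off it just returns 0.

```perl
use strict;
use warnings;

# Hypothetical helper, not an existing Bio::AlignIO option.
# Returns 1 if all sequence strings have the same length (flush).
# If they differ: dies when $strict is true, otherwise returns 0.
sub assert_flush {
    my ($strict, @seqs) = @_;
    my %lengths;
    $lengths{ length $_ }++ for @seqs;
    return 1 if scalar(keys %lengths) <= 1;
    die 'alignment is not flush, sequence lengths: '
        . join(',', sort { $a <=> $b } keys %lengths) . "\n"
        if $strict;
    return 0;
}

print assert_flush(1, '--ACGT--', 'AAACGTTT'), "\n";    # 1
print assert_flush(0, 'ACGT', 'ACGTTT'), "\n";          # 0
eval { assert_flush(1, 'ACGT', 'ACGTTT'); 1 }
    or print "strict mode refused to write: $@";
```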
Awaiting your comments, Chris Diff output: Index: Bio/Align/AlignI.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Align/AlignI.pm,v retrieving revision 1.9 diff -r1.9 AlignI.pm 426a427,462 > =head2 pad > > Title : pad() > Usage : $ali->pad() > Function : > > Goes through all sequences and pads them so that > the alignment is flush. > > Returns : 1 > Argument : a pad character, optional, defaults to '-' > > =cut > > sub pad { > my ($self) = @_; > $self->throw_not_implemented(); > } > > =head2 unpad > > Title : unpad() > Usage : $ali->unpad() > Function : > > Undoes the effect of method pad. > > Returns : 1 > Argument : > > =cut > > sub unpad { > my ($self) = @_; > $self->throw_not_implemented(); > } Index: Bio/Assembly/Contig.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Assembly/Contig.pm,v retrieving revision 1.4 diff -r1.4 Contig.pm 210c210,244 < use vars qw(@ISA); --- > use vars qw(@ISA %CONSERVATION_GROUPS); > use Dumpvalue; > > BEGIN { > # This data should probably be in a more centralized module... > # it is taken from Clustalw documentation > # These are all the positively scoring groups that occur in the > # Gonnet Pam250 matrix. The strong and weak groups are > # defined as strong score >0.5 and weak score =<0.5 respectively. 
> > %CONSERVATION_GROUPS = ( 'strong' => [ qw(STA > NEQK > NHQK > NDEQ > QHRK > MILV > MILF > HY > FYW) > ], > 'weak' => [ qw(CSA > ATV > SAG > STNK > STPA > SGND > SNDEQK > NDEQHK > NEQHRK > FVLIM > HFY) ], > ); > > } > 1304c1338 < my $seqID = $self->{'_order'}->{--$pos}; --- > my $seqID = $self->{'_order'}->{$pos}; 1404,1405c1438,1451 < my ($self) = @_; < $self->throw_not_implemented(); --- > my $self = shift; > my $from = shift; > my $to = shift; > my ($seq,$temp); > > $self->throw("Need exactly two arguments") > unless defined $from and defined $to; > > foreach $seq ( $self->each_seq() ) { > $temp = $seq->seq(); > $temp =~ s/$from/$to/g; > $seq->seq($temp); > } > return 1; 1407a1454 > 1436,1437c1483,1590 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self,$matchlinechar, $strong, $weak) = @_; > > my %matchchars = ( 'match' => $matchlinechar || '*', > 'weak' => $weak || '.', > 'strong' => $strong || ':', > 'mismatch'=> ' ', > ); > > if (!$self->is_flush()) { > $self->throw ("Producing a matchline for an alignment which is not flush doesn't make sense IMO - Chris."); > return; > } > > my @seqchars; > my $seqcount = 0; > my $alphabet; > foreach my $seq ( $self->each_seq ) { > push @seqchars, [ split(//, uc ($seq->seq)) ]; > $alphabet = $seq->alphabet unless defined $alphabet; > } > my $refseq = shift @seqchars; > # let's just march down the columns > my $matchline; > POS: foreach my $pos ( 0..$self->length ) { > my $refchar = $refseq->[$pos]; > next unless $refchar; # skip '' > my %col = ($refchar => 1); > my $dash = ($refchar eq '-' || $refchar eq '.' || $refchar eq ' '); > foreach my $seq ( @seqchars ) { > $dash = 1 if( $seq->[$pos] eq '-' || $seq->[$pos] eq '.' 
|| > $seq->[$pos] eq ' ' ); > $col{$seq->[$pos]}++; > } > my @colresidues = sort keys %col; > my $char = $matchchars{'mismatch'}; > # if all the values are the same > if( $dash ) { $char = $matchchars{'mismatch'} } > elsif( @colresidues == 1 ) { $char = $matchchars{'match'} } > elsif( $alphabet eq 'protein' ) { # only try to do weak/strong > # matches for protein seqs > TYPE: foreach my $type ( qw(strong weak) ) { > # iterate through categories > my %groups; > # iterate through each of the aa in the col > # look to see which groups it is in > foreach my $c ( @colresidues ) { > foreach my $f ( grep /\Q$c/, @{$CONSERVATION_GROUPS{$type}} ) { > push @{$groups{$f}},$c; > } > } > GRP: foreach my $cols ( values %groups ) { > @$cols = sort @$cols; > # now we are just testing to see if two arrays > # are identical w/o changing either one > > # have to be same len > next if( scalar @$cols != scalar @colresidues ); > # walk down the length and check each slot > for($_=0;$_ < (scalar @$cols);$_++ ) { > next GRP if( $cols->[$_] ne $colresidues[$_] ); > } > $char = $matchchars{$type}; > last TYPE; > } > } > } > $matchline .= $char; > } > > return $matchline; > > } > > =head2 pad > > Title : pad > Usage : if( !$contig->is_flush() ) { > : $contig->pad(); > : > Function : Pad the sequences in the alignment to make it flush > : Does nothing if the alignment has already been padded > : > Returns : 1 > Argument : $pad_char, optional, default '-' > > =cut > > sub pad { > my ($self,$pad_char) = @_; > > $pad_char ||= '-'; > > if (!$self->{'pad_seq'}) { > $self->{'pad_seq'} = 1; > foreach my $seq ( $self->each_seq ) { > my $coord = $self->get_seq_coord($seq); > my $tmp_seq = $seq->seq(); > for (my $i = 1; $i < $coord->start(); $i++) { > $tmp_seq = $pad_char.$tmp_seq; > } > for (my $i = $coord->end(); $i < $self->length(); $i++) { > $tmp_seq .= $pad_char; > } > $seq->seq($tmp_seq); > } > } > > return 1; 1439a1593,1623 > =head2 unpad > > Title : unpad > Usage : $contig->unpad(); > : > 
Function : undo the effects of the pad method. > : does nothing if the alignment has not been padded > : > Returns : 1 > Argument : > > =cut > > sub unpad { > my ($self) = @_; > > if ($self->{'pad_seq'}) { > $self->{'pad_seq'} = 0; > my $dumper = new Dumpvalue(); > foreach my $seq ( $self->each_seq ) { > my $coord = $self->get_seq_coord($seq); > my @chars = split //, $seq->seq(); > @chars = splice @chars, $coord->start()-1, > $coord->end() - $coord->start() + 1; > $seq->seq(join '', @chars); > } > } > return 1; > } > > 1440a1625 > $self->throw_not_implemented(); 1461,1462c1646,1676 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self, $match) = @_; > > if (!$self->is_flush()) { > $self->throw ("Running match on an alignment which is not flush doesn't make sense IMO - Chris"); > return; > } > > $match ||= '.'; > my ($matching_char) = $match; > $matching_char = "\\$match" if $match =~ /[\^.$|()\[\]]/ ; #'; > $self->map_chars($matching_char, '-'); > > my @seqs = $self->each_seq(); > return 1 unless scalar @seqs > 1; > > > my $refseq = shift @seqs ; > my @refseq = split //, $refseq->seq; > my $gapchar = $self->gap_char; > > foreach my $seq ( @seqs ) { > my @varseq = split //, $seq->seq(); > for ( my $i=0; $i < scalar @varseq; $i++) { > $varseq[$i] = $match if defined $refseq[$i] && > ( $refseq[$i] =~ /[a-zA-Z\*]/ || > $refseq[$i] =~ /$gapchar/ ) > && $refseq[$i] eq $varseq[$i]; > } > $seq->seq(join '', @varseq); > } > $self->match_char($match); > return 1; 1525,1526c1739,1746 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self, $char) = @_; > > if (defined $char ) { > $self->throw("Single missing character, not [$char]!") if CORE::length($char) > 1; > $self->{'_missing_char'} = $char; > } > > return $self->{'_missing_char'}; 1540,1541c1760,1767 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self, $char) = @_; > > if (defined $char ) { > $self->throw("Single match character, not [$char]!") if CORE::length($char) > 1; >
$self->{'_match_char'} = $char; > } > > return $self->{'_match_char'}; 1555,1556c1781,1788 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self, $char) = @_; > > if (defined $char || ! defined $self->{'_gap_char'} ) { > $char= '-' unless defined $char; > $self->throw("Single gap character, not [$char]!") if CORE::length($char ) > 1; > $self->{'_gap_char'} = $char; > } > return $self->{'_gap_char'}; 1570,1571c1802,1816 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self,$includeextra) = @_; > > unless ($self->{'_symbols'}) { > foreach my $seq ($self->each_seq) { > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq); > } > } > my %copy = %{$self->{'_symbols'}}; > if( ! $includeextra ) { > foreach my $char ( $self->gap_char, $self->match_char, > $self->missing_char) { > delete $copy{$char} if( defined $char ); > } > } > return keys %copy; 1641,1642c1886,1908 < my ($self) = @_; < $self->throw_not_implemented(); --- > my ($self,$report) = @_; > my $seq; > my $length = (-1); > my $temp; > > foreach $seq ( $self->each_seq() ) { > if( $length == (-1) ) { > $length = CORE::length($seq->seq()); > next; > } > > $temp = CORE::length($seq->seq()); > if( $temp != $length ) { > $self->warn("expecting $length not $temp from ". > $seq->display_id) if( $report ); > $self->debug("expecting $length not $temp from ". > $seq->display_id); > $self->debug($seq->seq(). 
"\n"); > return 0; > } > } > return 1; > 1658,1659c1924 < < $self->throw_not_implemented(); --- > return $self->get_consensus_length(); 1676c1941 < sub maxname_length { --- > sub maxdisplayname_length { 1678c1943,1955 < $self->throw_not_implemented(); --- > > my $maxname = (-1); > my ($seq,$len); > > foreach $seq ( $self->each_seq() ) { > $len = CORE::length $self->displayname($seq->get_nse()); > > if( $len > $maxname ) { > $maxname = $len; > } > } > > return $maxname; 1680a1958 > 1834c2112,2128 < sub displayname { # Do nothing --- > sub displayname { > my ($self, $name, $disname) = @_; > > $name =~ s/\/\d+-\d+//; > > $self->throw("No sequence with name [$name]") > unless defined $self->{'_elem'}{$name}; > > if( $disname and $name) { > $self->{'_dis_name'}->{$name} = $disname; > return $disname; > } > elsif( defined $self->{'_dis_name'}->{$name} ) { > return $self->{'_dis_name'}->{$name}; > } else { > return $name; > } 1867c2161,2169 < sub set_displayname_flat { # Do nothing! --- > sub set_displayname_flat { > my $self = shift; > my ($nse,$seq); > > foreach $seq ( $self->each_seq() ) { > $nse = $seq->get_nse(); > $self->displayname($nse,$seq->id()); > } > return 1; ---- Christopher T. Lewis Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada LewisCT@agr.gc.ca From amackey at pcbi.upenn.edu Mon Jul 5 18:39:39 2004 From: amackey at pcbi.upenn.edu (Aaron J Mackey) Date: Mon Jul 5 18:41:49 2004 Subject: [Bioperl-l] Writing a Bio::Assembly::Contig using Bio::AlignIO In-Reply-To: References: Message-ID: Re: pad/unpad; does this add gaps or missing chars? If missing, then I can't see how this could be a bad thing, and I'd vote in favor of it (in AlignI) ... -Aaron From LewisCT at AGR.GC.CA Mon Jul 5 19:02:35 2004 From: LewisCT at AGR.GC.CA (Lewis, Christopher) Date: Mon Jul 5 19:05:34 2004 Subject: [Bioperl-l] Writing a Bio::Assembly::Contig using Bio::AlignIO Message-ID: Just missing characters at the beginning and end of a sequence. 
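Chris's answer (padding adds characters only before and after each read, derived from the read's contig coordinates) can be illustrated with a plain-Perl sketch. It mirrors the arithmetic of the patched `Bio::Assembly::Contig::pad`/`unpad` but works on bare strings; the helper names `pad_read` and `unpad_read` and the 1-based, inclusive `$start`/`$end` convention are assumptions for illustration, not BioPerl API.

```perl
use strict;
use warnings;

# Pad a read so that it spans the whole contig: (start - 1) pad characters
# in front and (contig_length - end) pad characters behind, matching the
# loops in the patched pad method. Coordinates are 1-based and inclusive.
sub pad_read {
    my ($read, $start, $end, $contig_length, $pad_char) = @_;
    $pad_char = '-' unless defined $pad_char;    # the patch also defaults to '-'
    return ($pad_char x ($start - 1)) . $read . ($pad_char x ($contig_length - $end));
}

# Undo pad_read by keeping only contig columns start..end.
sub unpad_read {
    my ($padded, $start, $end) = @_;
    return substr($padded, $start - 1, $end - $start + 1);
}

my $padded = pad_read('ACGT', 3, 6, 8);
print "$padded\n";                        # prints --ACGT--
print unpad_read($padded, 3, 6), "\n";    # prints ACGT
```

Because only the flanks change, `unpad_read` can recover the original read from its coordinates alone, which is why Chris's patch can make `pad` a no-op on an already padded contig.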
cheers, Chris > -----Original Message----- > From: Aaron J Mackey [mailto:amackey@pcbi.upenn.edu] > Sent: Monday, July 05, 2004 4:40 PM > To: Lewis, Christopher > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] Writing a Bio::Assembly::Contig using > Bio::AlignIO > > > Re: pad/unpad; does this add gaps or missing chars? If missing, then I > can't see how this could be a bad thing, and I'd vote in favor of it (in > AlignI) ... > > -Aaron From gillies82 at hotmail.com Tue Jul 6 06:53:04 2004 From: gillies82 at hotmail.com (Stuart Gillies) Date: Tue Jul 6 06:56:33 2004 Subject: [Bioperl-l] Automating a fasta sequence submission Message-ID: Hi, I'm pretty new to Perl, so I apologize in advance for being a novice. I was hoping someone could help me with a little task I have to do: I have a text file containing over 50 FASTA sequences, and I need a Perl program to: 1 - open the text file and read through it; for each FASTA sequence (the start is indicated by a > symbol) I need to do the following 2 - copy the sequence into an online submission form (similar to a BLAST submission form) 3 - hit the submit button 4 - retrieve the results from the following results page, preferably appending them to a text file, so I end up with over 500 results in one text file. Any help would be great. Stuart Gillies University of Liverpool From sdavis2 at mail.nih.gov Tue Jul 6 07:29:26 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Jul 6 07:30:45 2004 Subject: [Bioperl-l] Automating a fasta sequence submission In-Reply-To: Message-ID: Stuart, It sounds like you should probably begin by looking at the SeqIO HOWTO (http://bioperl.org/HOWTOs/html/SeqIO.html). This will show you how to read in FASTA-format files and begin to work with the resulting Seq objects.
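For step 1, the SeqIO HOWTO's approach is `Bio::SeqIO->new(-file => $file, -format => 'fasta')` followed by a `next_seq` loop. For readers without BioPerl installed, the same idea can be sketched in core Perl: every line starting with `>` opens a new record, and the lines that follow are joined into its sequence. The function name `read_fasta` is hypothetical, and the parser ignores edge cases (such as `;` comment lines) that Bio::SeqIO handles.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle. Returns a list of [id, sequence]
# pairs; the id is the first word after '>', and sequence lines are joined
# with all whitespace removed.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^>(\S*)/) {              # a header line starts a new record
            push @records, [ $1, '' ];
        } elsif (@records) {                   # ignore text before the first header
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Demonstration on an in-memory file; a handle from open() on a real file works the same.
my $text = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    printf "%s\t%s\n", @$rec;    # seq1 ACGTTTGA, then seq2 GGCC
}
```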
As for 2-4, if you are using a standard online tool, there may be access to it through bioperl, but maybe not. If not, this will involve constructing a URL that describes the "input" to the webpage and will likely involve parsing the HTML output unless there is a "batch" option. What are you planning on using, if I may ask? Sean On 7/6/04 6:53 AM, "Stuart Gillies" wrote: > Hi, im pretty new to perl, and so i apologoes in advance for being a novice. > > i was hoping someone would help me woth a little task i have to do: > > I have a text file containing over 50 fasta sequences, i need a perl program > to : > > 1- open the text file, reading through it, for each fasta sequence (the > start is indicated by a > symbol) i need to do the following > 2 - copy the sequence into an online submission form (similar to a blast > submission form) > 3 - hit the submit button > 4 - retrieve the results from then following results page, preferably adding > to a text file, so i end up with over 500 results on one text file. > > any help would be great > > stuart gillies > university if liverpool > From heikki at ebi.ac.uk Tue Jul 6 07:49:38 2004 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue Jul 6 07:51:48 2004 Subject: [Bioperl-l] Automating a fasta sequence submission In-Reply-To: References: Message-ID: <200407061249.38299.heikki@ebi.ac.uk> Stuart, Since you are asking this question on the bioperl mailing list, you must be contemplating using bioperl for your problem. What you need is a small program that loops over your sequences and calls a subroutine to do the web form processing.
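Heikki's outline (loop over the sequences and hand each one to a subroutine that does the web-form work) might look like the sketch below. It uses the `post_form` method of the core HTTP::Tiny module rather than LWP or WWW::Mechanize. The URL and the form field names `seqname` and `sequence` are hypothetical placeholders: the real values have to be read from the target form's HTML (its `action` and input `name` attributes), and a service with a batch interface or usage limits may call for a different approach. The user agent is passed in as a parameter so that the loop can be exercised without network access.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Submit one sequence to a web form and return the raw result page.
# The field names are hypothetical; take them from the real form's HTML.
sub submit_sequence {
    my ($ua, $url, $id, $seq) = @_;
    my $response = $ua->post_form($url, { seqname => $id, sequence => $seq });
    die "Submission of $id failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}

# Loop over [id, sequence] pairs and append every result page to one
# output filehandle, each preceded by a '### id' marker line.
sub submit_all {
    my ($ua, $url, $out_fh, @records) = @_;
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        my $page = submit_sequence($ua, $url, $id, $seq);
        print {$out_fh} "### $id\n$page\n";
    }
    return scalar @records;
}

# Example (not run here; the URL is a placeholder):
# my $ua = HTTP::Tiny->new(timeout => 60);
# open my $out, '>>', 'results.txt' or die "results.txt: $!";
# submit_all($ua, 'http://example.org/cgi-bin/analyse', $out, @records);
```

The result pages are stored as raw HTML here; as Sean notes, extracting the actual results will usually need an HTML-parsing step on top of this.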
You should be able to figure out from the example scripts and this HOWTO: http://bioperl.org/HOWTOs/html/SeqIO.html how to do the first part. The second part can be done with Perl CPAN modules like LWP::Simple, LWP::UserAgent or WWW::Mechanize. There is one solution to this within bioperl; see: http://bioperl.org/HOWTOs/html/SimpleWebAnalysis.html Yours, -Heikki When you do not know what to do, check the FAQ: http://bio.perl.org/Core/Latest/faq.html -H On Tuesday 06 Jul 2004 11:53, Stuart Gillies wrote: > Hi, im pretty new to perl, and so i apologoes in advance for being a > novice. > > i was hoping someone would help me woth a little task i have to do: > > I have a text file containing over 50 fasta sequences, i need a perl > program to : > > 1- open the text file, reading through it, for each fasta sequence (the > start is indicated by a > symbol) i need to do the following > 2 - copy the sequence into an online submission form (similar to a blast > submission form) > 3 - hit the submit button > 4 - retrieve the results from then following results page, preferably > adding to a text file, so i end up with over 500 results on one text file. > > any help would be great > > stuart gillies > university if liverpool -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From boris.steipe at utoronto.ca Tue Jul 6 08:28:52 2004 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue Jul 6 08:31:01 2004 Subject: [Bioperl-l] Automating a fasta sequence submission In-Reply-To: Message-ID: <0A62DC06-CF48-11D8-8A5D-000A9577512E@utoronto.ca> Stuart, The Canadian Bioinformatics Help desk has developed sample code that seems to do pretty much what you have described - albeit using LWP rather than a forms interface. The code may give you some useful ideas. You can download it (free) from their software repository and, if I remember correctly, it's in one of the two "Programming in Perl" collections. All the best, Boris > > On 7/6/04 6:53 AM, "Stuart Gillies" wrote: > >> >> I have a text file containing over 50 fasta sequences, i need a perl >> program >> to : >> >> 1- open the text file, reading through it, for each fasta sequence >> (the >> start is indicated by a > symbol) i need to do the following >> 2 - copy the sequence into an online submission form (similar to a >> blast >> submission form) >> 3 - hit the submit button >> 4 - retrieve the results from then following results page, preferably >> adding >> to a text file, so i end up with over 500 results on one text file.
>> >> any help would be great >> >> stuart gillies >> university if liverpool From chenn at cshl.edu Tue Jul 6 01:35:42 2004 From: chenn at cshl.edu (Jack Chen) Date: Tue Jul 6 09:33:06 2004 Subject: [Bioperl-l] Question about script bp_pairwise_kaks.pl Message-ID: Hi, When I run the script for the following pair of sequences, the program dumped the following message: ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output STACK Bio::Tools::Phylo::PAML::_parse_summary /usr/lib/perl5/site_perl/5.6.0/Bio/Tools/Phylo/PAML.pm:359 STACK Bio::Tools::Phylo::PAML::next_result /usr/lib/perl5/site_perl/5.6.0/Bio/Tools/Phylo/PAML.pm:224 STACK toplevel /usr/bin/bp_pairwise_kaks.pl:177 I know that these sequences don't contain internal stop codons. Could someone explain what this means? I used codeml. Thanks Jack =-======================================================-= >F02.5_2 agtggttaccttctactgggctatcatgcaaaaccaactttattccaaaccaacacaattgttccaatgaatgg >CBG29_2 agtggttatttacttctggggtaccattccaaacccacactattccaaactaacactattgtacccatgactgg From jason at cgt.duhs.duke.edu Tue Jul 6 09:35:10 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jul 6 09:37:36 2004 Subject: [Bioperl-l] Question about script bp_pairwise_kaks.pl In-Reply-To: References: Message-ID: I added an extra printing line so that you get the error string printed from the program. CVS update and run with -v. The answer is "74 nucleotides, not a multiple of 3!" jason@jason $ perl ~/bioperl/core/scripts/utilities/pairwise_kaks.PLS -i test.fa -v Program /sw/bin/clustalw clustal command = /sw/bin/clustalw align -infile=/tmp/14klOIck6A/ad454zhVxR -output=gcg -outfile=/tmp/14klOIck6A/rNoHgThMfU CLUSTAL W (1.83) Multiple Sequence Alignments Sequence format is Pearson Sequence 1: F02.5_2 25 aa Sequence 2: CBG29_2 25 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned.
Score: 92 Guide tree file created: [/tmp/14klOIck6A/ad454zhVxR.dnd] Start of Multiple Alignment There are 1 groups Aligning... Group 1: Sequences: 2 Score:530 Alignment Score 139 GCG-Alignment file created [/tmp/14klOIck6A/rNoHgThMfU] -------------------- WARNING --------------------- MSG: There was an error - see error_string for the program output STACK Bio::Tools::Run::Phylo::PAML::Codeml::run /Users/jason/Development/bioperl/run/Bio/Tools/Run/Phylo/PAML/Codeml.pm:482 STACK toplevel /Users/jason/bioperl/core/scripts/utilities/pairwise_kaks.PLS:171going to remove files /tmp/L6EOACFn53/MIFMS7A0jq gaps are removed for pairwise comparison. 74 nucleotides, not a multiple of 3! going to remove files /tmp/L6EOACFn53/MIFMS7A0jq going to remove files /tmp/L6EOACFn53/MIFMS7A0jq going to remove files /tmp/14klOIck6A/ad454zhVxR,/tmp/14klOIck6A/rNoHgThMfU On Tue, 6 Jul 2004, Jack Chen wrote: > Hi, > > When I run the script for the following pair of sequences, the program > dumped the following message: > > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output > STACK Bio::Tools::Phylo::PAML::_parse_summary > /usr/lib/perl5/site_perl/5.6.0/Bio/Tools/Phylo/PAML.pm:359 > STACK Bio::Tools::Phylo::PAML::next_result > /usr/lib/perl5/site_perl/5.6.0/Bio/Tools/Phylo/PAML.pm:224 > STACK toplevel /usr/bin/bp_pairwise_kaks.pl:177 > > I know that these sequence don't contain internal stop codons. could > someone explain what does this mean? I used codeml. 
> > Thanks > > Jack > > =-======================================================-= > > >F02.5_2 > agtggttaccttctactgggctatcatgcaaaaccaactttattccaaaccaacacaattgttccaatgaatgg > >CBG29_2 > agtggttatttacttctggggtaccattccaaacccacactattccaaactaacactattgtacccatgactgg > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From pdavid at netvisao.pt Tue Jul 6 07:48:58 2004 From: pdavid at netvisao.pt (pdavid@netvisao.pt) Date: Tue Jul 6 09:38:44 2004 Subject: [Bioperl-l] Automating a fasta sequence submission In-Reply-To: References: Message-ID: <47771.193.137.94.3.1089114538.squirrel@193.137.94.3> Hi, To read through the text file, you can use Bio::Seq http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq.html If you read the synopsis, you'll see how to get each sequence from the file into a variable. To do the rest, if it's something that Bioperl won't do (check the tutorial at http://www.bioperl.org/Core/Latest/bptutorial.html) I would use the WWW::Mechanize module to feed the sequences into the website. It's not hard to use, if you look at the examples and then at the module itself: http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod Good luck, Paulo Almeida > I have a text file containing over 50 fasta sequences, i need a perl > program > to : > > 1- open the text file, reading through it, for each fasta sequence (the > start is indicated by a > symbol) i need to do the following > 2 - copy the sequence into an online submission form (similar to a blast > submission form) > 3 - hit the submit button > 4 - retrieve the results from then following results page, preferably > adding > to a text file, so i end up with over 500 results on one text file. 
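The problem Jason diagnosed earlier in this thread ("74 nucleotides, not a multiple of 3!") can be caught before an alignment is ever handed to codeml. The sketch below is a minimal pre-flight check in core Perl, assuming the coding sequences are held in a hash keyed by ID; the function name `codon_problems` is hypothetical and is not part of bp_pairwise_kaks.pl or BioPerl.

```perl
use strict;
use warnings;

# Report problems that would break a codon-based Ka/Ks calculation:
# a length that is not a multiple of 3, or a stop codon before the
# last codon. Gap characters ('-' or '.') and whitespace are removed first.
sub codon_problems {
    my (%seqs) = @_;    # id => nucleotide sequence
    my @problems;
    for my $id (sort keys %seqs) {
        (my $s = uc $seqs{$id}) =~ s/[-.\s]//g;
        my $len = length $s;
        if ($len % 3) {
            push @problems, "$id: $len nucleotides, not a multiple of 3";
            next;
        }
        my @codons = $s =~ /(...)/g;
        pop @codons;    # a terminal stop codon is acceptable
        for my $i (0 .. $#codons) {
            push @problems, "$id: internal stop codon $codons[$i] at codon " . ($i + 1)
                if $codons[$i] =~ /^(?:TAA|TAG|TGA)$/;
        }
    }
    return @problems;
}
```

Called with the two 74-nucleotide sequences from Jack's message, `codon_problems` returns one length message per sequence, such as `F02.5_2: 74 nucleotides, not a multiple of 3`, which matches Jason's diagnosis; the sequences need to be trimmed to a complete reading frame before the Ka/Ks run.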
From jurgen.pletinckx at algonomics.com Mon Jul 5 09:40:36 2004 From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx) Date: Tue Jul 6 09:39:47 2004 Subject: [Bioperl-l] PDB sequence from ATOM records In-Reply-To: Message-ID: I have got an implementation ready as well, and hope to fold it into the codebase soon. I would just like to point out that this # Gaps in the crystal structure: You'll have to pad with 'X's where the # ATOM records are missing. E.g. if you have residues # GLY40,GLU41,CYS44,LYS45. You need to make the sequence GEXXCK. is, imo, incorrect - the authors may well be using a historical numbering scheme for their protein of interest, and 'missing' residue numbers can correspond to a shorter loop in the new structure when compared to the 'reference' numbering. In other words - don't insert 'missing' residues for the user, as you can be wrong. # The only other comment I would make is that I think the atom_seq method # should be attached to the Chain object, not the Entry object. And so # called by $chain->atom_seq not $entry->atom_seq('A'). The sequence is a # property of a chain so this makes the most sense to me personally. Concur. Except, of course, I can't write it that way: a chain doesn't know what residues are part of it, or what structure it is in. So I've stuck the method on Entry level, together with all other methods. Patch to Entry.pm, Residue.pm and IO/pdb.pm is attached - enjoy! 
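Jurgen's point about residue numbering can be made concrete. The sketch below builds a one-letter sequence from residue names taken in ATOM-record order and deliberately does not insert 'X' for gaps in the numbering, because a numbering gap need not be a physical gap. The lookup table covers only the 20 standard amino acids, and everything else (for example a modified residue) becomes 'X'; the function name is hypothetical, whereas the patch itself delegates the conversion to `Bio::SeqUtils->seq3in`.

```perl
use strict;
use warnings;

# Three-letter to one-letter codes for the 20 standard amino acids only.
my %THREE_TO_ONE = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Convert residues, given in the order they appear in the ATOM records,
# to a one-letter string. Residue numbers are ignored on purpose, so
# GLY40 GLU41 CYS44 LYS45 gives "GECK", not "GEXXCK".
sub atom_residues_to_seq {
    my @residues = @_;    # each element: [resname, resnumber]
    return join '', map { $THREE_TO_ONE{ uc $_->[0] } // 'X' } @residues;
}

print atom_residues_to_seq(['GLY', 40], ['GLU', 41], ['CYS', 44], ['LYS', 45]), "\n";
```

If a caller really does want numbering gaps marked, that is better made an explicit option than the default, for the reason Jurgen gives.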
(And, of course, review, comment and criticize) -- Jurgen Pletinckx AlgoNomics NV -------------- next part -------------- > diff /usr/local/lib/perl5/site_perl/5.6.1/Bio/Structure/IO/pdb.pm bioperl-dev/Bio/Structure/IO/ 1206a1207,1208 > $residue->type($resname); > $residue->number($resseq); > diff /usr/local/lib/perl5/site_perl/5.6.1/Bio/Structure/Residue.pm bioperl-dev/Bio/Structure/ 157a158,193 > =head2 number() > > Title : number > Usage : $residue->number(212) > Function: Gets/sets the residue number for this residue > Returns : the residue number > Args : the residue number > > =cut > > sub number { > my ($self, $value) = @_;; > if (defined $value) { > $self->{'number'} = $value; > } > return $self->{'number'}; > } > > =head2 type() > > Title : type > Usage : $residue->type("TRP") > Function: Gets/sets the residue type for this residue > Returns : the residue type > Args : the residue type > > =cut > > sub type { > my ($self, $value) = @_;; > if (defined $value) { > $self->{'type'} = $value; > } > return $self->{'type'}; > } > > diff /usr/local/lib/perl5/site_perl/5.6.1/Bio/Structure/Entry.pm bioperl-dev/Bio/Structure/ 715a716,763 > =head2 sequence() > > Title : sequence > Usage : $seqobj = $structure->sequence("A"); > Function: gets a sequence object containing the sequence as present in the coord- > inate section. 
> if a chain-ID is given , the sequence for this chain is given, if none > is provided the first chain is chosen > Returns : a Bio::PrimarySeq > Args : the chain-ID of the chain you want the sequence from > > =cut > > sub sequence { > my ($self, $chainid) = @_; > my $chain; > if ( !defined $chainid) { > my $m = ($self->get_models($self))[0]; > $chain = ($self->get_chains($m))[0]; > $chainid = $chain->id; > } > else > { > my $m = ($self->get_models($self))[0]; > for my $ch ($self->get_chains($m)) > { > next unless $ch->id eq $chainid; > $chain = $ch; > last; > } > $self->throw("'$chainid' is not a valid chain id for structure ".$self->id) unless $chain; > } > > my $seq_string; > for my $res ( $self->get_residues($chain)) > { > $seq_string .= ucfirst(lc($res->type)); > } > > $self->debug("sequence : $seq_string\n"); > # this will break for non-protein structures (about 10% for now) XXX KB > my $pseq = Bio::PrimarySeq->new(-alphabet => 'protein'); > $pseq = Bio::SeqUtils->seq3in($pseq,$seq_string); > my $id = $self->id . "_" . $chainid; > $pseq->id($id); > return $pseq; > } > From henschel at mpi-cbg.de Tue Jul 6 09:14:42 2004 From: henschel at mpi-cbg.de (Andreas Henschel) Date: Tue Jul 6 09:41:58 2004 Subject: [Bioperl-l] Re: GO dbxrefs in swissprot In-Reply-To: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org> References: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org> Message-ID: <40EAA5C2.8090801@mpi-cbg.de> Hilmar Lapp wrote: > Pretty weird what you describe if it works for one entry but not > another. Also, the DR lines don't look suspiciously different. > > If there's no direct reason that prevents you from doing so you should > definitely upgrade to the 1.4.x series, possibly even to the latest > version of the stable branch from CVS. There were quite some fixes > meanwhile, some of which do affect how sequences get loaded into > biosql because the affect the annotation bundle. Hi Hilmar, I installed bioperl from cvs and repopulated the swissprot db into BioSQL. 
The entries I checked so far are apparently correct. For the particular example, it was obviously a bug in the sequence annotation parsing of the 1.2.1 version. Sorry for having bothered you with versioning; I simply trusted the biosql installation instructions, which claimed a patched 1.2.1 would do. What still puzzles me is the size of the database: starting with a 543MB flatfile, the first run (with the faulty parser) gave me a 600MB database and 9100 GO annotations. After the rerun with load_seqdatabase (...) --lookup --remove I get a 1.1GB database but only 5100 GO annotations in the dbxref table. Is this due to the normalization? Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?, PDB? etc.) and the respective places to download them? Thanks again. Cheers, Andreas From dhoworth at mrc-lmb.cam.ac.uk Tue Jul 6 10:54:38 2004 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Tue Jul 6 10:56:52 2004 Subject: [Bioperl-l] PDB sequence from ATOM records In-Reply-To: <20A147C9-CE89-11D8-8F5E-000A957E44DC@ebi.ac.uk> References: <20A147C9-CE89-11D8-8F5E-000A957E44DC@ebi.ac.uk> Message-ID: <40EABD2E.5020602@mrc-lmb.cam.ac.uk> It seems to me that many people who parse PDB files write their own code. This is a shame, because it wastes effort, it makes things more difficult for beginners, and it leads to differences in results. This practice stems, I believe, both from the complexity of the PDB data and from the multitude of use cases. It is well-known that there are exceptions to almost every rule about the content of PDB files. It is also clear that sometimes people care about every character in the coordinates, while other times they care just about the sequence and sometimes just specific parts of the header, for example. I think it might be useful to have a session on this subject at BOSC. We can try to capture different people's requirements. We can list examples of PDB entries that demonstrate specific problems.
We can consider existing code possibilities. We can drink beer. Afterwards, perhaps there is more chance of building some software that will be widely used. What do you think? Cheers, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From jurgen.pletinckx at algonomics.com Tue Jul 6 11:46:38 2004 From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx) Date: Tue Jul 6 11:33:08 2004 Subject: [Bioperl-l] PDB sequence from ATOM records In-Reply-To: <40EABD2E.5020602@mrc-lmb.cam.ac.uk> Message-ID: # What do you think? I think you're right, specifically when you say # It is well-known that there are # exceptions to almost every rule about the content of PDB files. and # I think it might be useful to have a session on this subject at BOSC. However, I'm not likely to be at BOSC. (Then again, it's only a short hop away.) Even though I would like stewardship of the Structure modules. Bummer. (Personal use-case: I would like to add some basic geometry stuff: distances, angles, dihedral angles. Also, getting Chains, Residues, and Atoms working 'properly'. Also, icode handling is Wrong.) -- Jurgen Pletinckx AlgoNomics NV From jdantzer at cs.iupui.edu Tue Jul 6 15:00:14 2004 From: jdantzer at cs.iupui.edu (Jessica Dantzer) Date: Tue Jul 6 15:02:45 2004 Subject: [Bioperl-l] Problems parsing swiss-prot files In-Reply-To: References: <41702.134.68.144.57.1088715465.squirrel@www.cs.iupui.edu> <6.1.1.1.1.20040702160302.01b2b7f8@klingon.cs.iupui.edu> Message-ID: <41958.134.68.144.57.1089140414.squirrel@www.cs.iupui.edu> We added both of the files to our current version of Bioperl, and things seem to be working as they should. Thanks for the help! Jessica > I've fixed it in CVS. I also fixed a bunch of other things in swissprot > parsing to make the parser cleaner, I hope. This involved improving the > 'new' function in Bio::Annotation::Reference so you'd want to get that > as well if you are getting code from CVS.
> > Multi-line RP lines are now all put into the rp field of the > Annotation::Reference object. The parser takes care of splitting it > back into multi-line fields upon writing (although I didn't test this > case specifically). > > PVH and our code auditors: as happy as I am about the code audit for > SeqIO and the like, and about making sure that things can roundtrip, I really > feel like the guts of these parsers could use just a few weeks of someone's > time to clean them up first. Of course, I and a few others would want > to simplify the sequence/annotation/feature object model first, so who > knows what the best starting point is... > > -jason > > On Fri, 2 Jul 2004, Jessica Dantzer wrote: > >> Most of the references in most of the files have only one RP >> line. Occasionally, there are two. I haven't seen more than two, >> though. One of the files that had more than one line in at least one >> reference was for P33897. I'm parsing information on the mutation/ >> variant data and their references, and so need some of the information >> on those second lines. >> >> At 03:55 PM 7/2/2004, Jason Stajich wrote: >> >Is there more than one RP line per reference? The data structures >> and parsers currently assume there is only one. >> >can you send an acc so we can add it to the tests? >> > >> >-jason >> >On Thu, 1 Jul 2004, Jessica Dantzer wrote: >> > >> > > I'm working on parsing swiss-prot files for use in another >> database, and I've managed to work out where all the information I >> need is stored for the most part. The only problems I'm >> encountering are with the reference parsing-- Some of the files >> have multiple "RP" lines, and I only seem to be able to get one. >> The code seems to indicate that this is how the files are parsed. >> Is there any other way to access the second line?
>> > > >> > > Thanks, >> > > Jessica >> > > >> > > >> > > >> > > >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l@portal.open-bio.org >> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > >> > >> >-- >> >Jason Stajich >> >Duke University >> >jason at cgt.mc.duke.edu >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gnf.org Tue Jul 6 15:17:10 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jul 6 15:19:19 2004 Subject: [Bioperl-l] Re: GO dbxrefs in swissprot In-Reply-To: <40EAA5C2.8090801@mpi-cbg.de> References: <9D91A0A8-CC47-11D8-B628-000A959EB4C4@gnf.org> <40EAA5C2.8090801@mpi-cbg.de> Message-ID: <1417F5C6-CF81-11D8-87B1-000A95AE92B0@gnf.org> On Jul 6, 2004, at 6:14 AM, Andreas Henschel wrote: > Sorry for having bothered you with versioning, I simply trusted the > biosql installation instructions that claimed a patched 1.2.1 would > do. Sorry - the documentation needs to be updated. > What still puzzles me is the size of the database: starting with a > 543MB flatfile, the first run (with the faulty parser) gave me 600MB > database and 9100 GO annotations. After the rerun with > load_seqdatabase (...) --lookup --remove I get 1.1GB database but > only 5100 GO annotations in the dbxref table. Is this due to the > normalization? I'm confused. Did you start with a scratch biosql instance, or did you re-use the one loaded with swissprot before? If re-loading an existing one, the number of rows in dbxref should *not* go down, regardless of what you do to bioentries. The number of rows in the association table bioentry_dbxref will be affected though. Did you do a grep on the GO dbxrefs in the swissprot files followed by sort unique? How many did you get? You should have at least as many rows in dbxref. 
If you find a discrepancy, i.e., if you can identify a GO dbxref that's present in your swissprot file but not in the database, check out an entry that is (or should be) associated with that dbxref. > Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?, > PDB? etc) and the resp. place to download? This list is more or less identical to the list of formats readable by the Bio::SeqIO system in bioperl, because this is what load_seqdatabase.pl uses for parsing files. GenBank and EMBL are among those formats. Ensembl used to come in an EMBL-formatted flatfile dump, but I don't know whether it still does. Note that without any post-processing the bioentries resulting from a file upload will represent the entries found in the source file. E.g., if the source file contains an annotated whole chromosome entry, that's what you'll get (but not necessarily want) in biosql as well. As an example of integrated post-processing, I used to use a Bio::Factory::SequenceProcessorI implementation to split Ensembl whole chromosomes into predicted genes, transcripts, and proteins, which would then get loaded into biosql. (Check out the documentation for the --pipeline option in load_seqdatabase.pl for how to make the script invoke a given post-processor.) -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From birney at ebi.ac.uk Tue Jul 6 15:43:07 2004 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Jul 6 15:45:24 2004 Subject: [Bioperl-l] Re: GO dbxrefs in swissprot In-Reply-To: <1417F5C6-CF81-11D8-87B1-000A95AE92B0@gnf.org> Message-ID: > > Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?, > > PDB? etc) and the resp. place to download?
> Ensembl is best accessed through the Ensembl Perl API, parts of which still do comply with the Bioperl Bio::SeqI interface (ie, they can be dumped by SeqIO, and therefore in theory read into the BioSQL). Ensembl does make EMBL dumps *BUT*.... ... all we now put in the EMBL dumps are the genes. It is hard enough keeping everything tied down correctly inside the Ensembl system without also agonising about how data should be represented inside EMBL/GenBank flat files (or Bio::SeqI objects more clearly) -- and we clearly can't dump all the SNPs, Features, Genes, Exons, Affy probe mappings, etc etc on our ftp site. We'd simply run out of space by February each year. A low priority project inside Ensembl has been to set up a more functional ensembl<->bioperl bridge that would give good access to Ensembl objects through a Bio::SeqI wrapper, presumably using the AnnotationI interface to its absolute max. This is in the "would be nice to do" category, but we always have things far higher on the priority stack (eg, this month's fun was dealing with selenocysteines). For more info on the ensembl perl API check out: http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CodeTutorial.html From hlapp at gnf.org Tue Jul 6 16:49:22 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jul 6 16:51:35 2004 Subject: [Bioperl-l] Re: GO dbxrefs in swissprot In-Reply-To: References: Message-ID: Hi Ewan, how are you? :-) On Jul 6, 2004, at 12:43 PM, Ewan Birney wrote: > Ensembl is best accessed through the Ensembl Perl API Bluntly, this is - although less extreme - like NCBI saying RefSeq is best accessed through the NCBI toolkit, and here's how you install that, and by the way we don't have time to create a genbank-formatted dump. I.e., I believe immediately that if I wanted to get every detail and every context of the genome annotation that Ensembl produces right up to every special case, then I shouldn't go for anything less than the full power of the Ensembl Perl API.
Many times though the "best" access is the least troublesome, or most familiar, with some loss of content acknowledged. I'm willing to bet that most people access RefSeq not through the NCBI toolkit, and that that wouldn't change even if there were some content that would be absent in the genbank-formatted dump. Do you foremost want to do a service to the community, or a service to your development group? What would be extremely useful is if Ensembl provided a dump in a common flat-file format that contained all Ensembl-originated content that one cannot reproduce without a very significant computing and maintenance effort. As I see it, this would consist of all gene predictions, transcript predictions, protein predictions, and the results of the Ensembl annotation pipeline(s) for those predictions; localizations would be nice, but not required. It doesn't have to be EMBL format; any flat format that Bio::SeqIO supports and that doesn't require me to install yet another huge library the update cycle of which I need to keep up with would be very helpful. (Actually, would the gene-only dump you mentioned have all that as features and tags?) IMNSHO this wouldn't be a nice-to-have; it would be terrific and tremendously increase the value of Ensembl once you're outside of the Ensembl website. It would also allow people (read: me ;) to, e.g., effortlessly load ensembl along with refseq and swissprot into biosql. Affy probe and any other public sequence mappings I can do myself given the genome sequence and my own BLAT server (besides, even without one, UCSC provides all of that for download anyway). Anyway, my $0.02, which turns out to approach being worth less than a GBP penny ... Beer in Glasgow? 
Meanwhile I could even convince my credit card company not to shut down my account and that Concorde Services is not a fraudulent UK male entertainment enterprise :-) -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From j1gregor at biomail.ucsd.edu Tue Jul 6 16:59:43 2004 From: j1gregor at biomail.ucsd.edu (James Gregory) Date: Tue Jul 6 17:01:53 2004 Subject: [Bioperl-l] piping to CAP3 Message-ID: Hello, I'm trying to write a script that receives either pasted sequences or a file upload of FASTA format seqs for assembly using CAP3, but I'm having some trouble. This doesn't use bioperl, but I'm guessing that this audience will be more familiar with what I'm trying to do. use strict; use CGI qw(:standard escapeHTML); my $input = new CGI; my $tgicl = "/opt/CAP_update/tgicl_linux/bin/tgicl"; #path to CAP3 # for pasted seqs my $seq = $input->param("seqs"); # retrieve seqs open (FILE, ">seqAssembly.txt") || die; my $file = ; # printing seqs to file print FILE "$seq"; close FILE; open PIPE_TO_CAP3, "| $tgicl seqAssembly.txt" || die $!; # not sure how to get to the CAP3 output files at this step or if the # above syntax for opening the file is correct. close PIPE_TO_CAP3; Calling CAP3 from the shell would normally just be: tgicl file_name There are multiple output files and I'm not sure how to retrieve these in my script. The way it's written, nothing really happens (not surprising). Any help would be appreciated. Jamie From davila at ioc.fiocruz.br Tue Jul 6 17:35:53 2004 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue Jul 6 17:37:41 2004 Subject: [Bioperl-l] Problem with genbank.pm ? Message-ID: <1089149753.3178.34.camel@tryps> Dear All, Trying to run the script below, I got several errors and a "segmentation fault" ...
I was trying to get the Xylella fastidiosa genome from the "genome" subset of GenBank and then put it in a MySQL (MyISAM) table containing a column formatted as "mediumblob/binary"... Any tips on how to solve this? Thanks, Alberto ****** #!/usr/local/bin/perl -w use lib "/usr/local/bioperl14"; use Bio::SearchIO; use Bio::DB::Query::GenBank; use Bio::SeqIO; use Bio::DB::GenBank; my $organismname = $ARGV[0]; my $Lib_Code = $ARGV[1]; %Lib = (1,"GIG",2,"EST",3,GSS,4,STS,5,"Genome"); my $genbankfile = "GenBank."."$organismname."."$Lib{$Lib_Code}"; &download ($organismname,$Lib_Code); &submit_download ($genbankfile,$Lib_Code); sub download { #Menu: 1 - Genes in genomic; 2 - EST; 3 - GSS; 4 - STS; 5 - Genome; 6 - Local #my $query_string2; if ($Lib_Code =~ /1/) { $query_string = $organismname."[Organism] AND \"genes in genomic\"[Properties]"; } elsif ($Lib_Code =~ /2/) { $query_string = $organismname."[Organism] AND \"gbdiv est\"[Properties]"; } elsif ($Lib_Code =~ /3/) { $query_string = $organismname."[Organism] AND \"gbdiv gss\"[Properties]"; } elsif ($Lib_Code =~ /4/) { $query_string = $organismname."[Organism] AND \"gbdiv sts\"[Properties]"; } elsif ($Lib_Code =~ /5/) { $query_string2 = $organismname."[Organism]"; } if ($Lib_Code < 5) { $query = new Bio::DB::Query::GenBank(-db=>'nucleotide', -query=>$query_string, -mindate => '1985', -maxdate => '2004'); $count = $query->count; $seqio=new Bio::DB::GenBank->get_Stream_by_query($query); } elsif ($Lib_Code == 5) { $query2 = new Bio::DB::Query::GenBank(-db=>'genome', -query=>$query_string2, -mindate => '1985', -maxdate => '2004'); $count = $query2->count; $seqio=new Bio::DB::GenBank->get_Stream_by_query($query2); } ERROR LOG: Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189592.
-------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189593. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189595. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189596. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189598. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189599. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189600. 
-------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189601. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189603. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189605. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189607. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189609. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189610. 
-------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189612. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189614. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189616. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189618. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189620. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189622. 
-------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189624. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189630. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in concatenation (.) or string at /usr/local/bioperl14/Bio/SeqIO/genbank.pm line 492, line 189631. -------------------- WARNING --------------------- MSG: Unexpected error in feature table for Skipping feature, attempting to recover --------------------------------------------------- Use of uninitialized value in length at /usr/local/bioperl14/Bio/SeqIO/fasta.pm line 205, line 189631. 
8 Segmentation fault From echuong at gmail.com Tue Jul 6 21:38:43 2004 From: echuong at gmail.com (Edward Chuong) Date: Tue Jul 6 21:41:06 2004 Subject: [Bioperl-l] Help on a basic EST-genomic alignment script In-Reply-To: References: <244d2e0e040625152477e9d07a@mail.gmail.com> <244d2e0e04062701275dae0799@mail.gmail.com> <244d2e0e0406281221132be10e@mail.gmail.com> <244d2e0e0406281529586c7693@mail.gmail.com> <244d2e0e04062915137241bdc4@mail.gmail.com> Message-ID: <244d2e0e04070618383cb7d466@mail.gmail.com> Hi, > > my $cdsseq = $seq->trunc($hsp->query->start, $hsp->query->end); > > Make a hash of all these seqs > $cdsseqs{$hsp->query->seq_id} = $cdsseq; > > You'll also need to get the CDS region from the subject (Mus CDS) - I'm > assuming you built your protein set from just Mus CDS and not cDNA - > otherwise if these are ensembl peps you can get just the CDS which codes > for each protein accession from EnsMart. Or you can just get the CDS > clipped out from the genbank file as you would have already done. > > The start/end of the alignment in subject nt coords will be > my ($hstart,$hend) = ( ($hsp->hit->start -1) * 3 + 1, > $hsp->hit->end * 3); > > So do the same thing as before and grab the sub-sequence from the Mus > cDNA and add it to the %cdsseqs hash. > > Now you still have to contend with frameshifts - you're going to have to > figure out where they are coded as '/' and '\' in the query string > ($hsp->query_string, $hsp->hit_string) and either insert or delete an > appropriate base (insert an N if it is a missing base, remove the extra > base if it is there). > I mostly followed what you said, also adding "---" in the nuc seq when there's a "-" in the peptide. Can you elaborate on what the "/" and "\" are? I couldn't find any documentation on them. Do I have to figure out how many nucleotides it's shifted? For '/' I have it remove the base, for '\' I have it remove 2 bases and add '------', and it seems to work for my test cases.
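The protein-to-nucleotide coordinate conversion discussed above can be isolated in a small helper. This is a sketch assuming 1-based, inclusive coordinates on a protein hit translated from a CDS that begins at its first nucleotide in frame 1; the sub name aa_range_to_nt is hypothetical. The end coordinate has to cover the whole final codon, so it is the residue position times three rather than the first base of that codon.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive protein range (e.g. $hsp->hit->start and
# $hsp->hit->end) onto the coding sequence it was translated from.
# Assumes the CDS starts at nucleotide 1 and is read in frame 1.
sub aa_range_to_nt {
    my ($aa_start, $aa_end) = @_;
    my $nt_start = ($aa_start - 1) * 3 + 1;  # first base of the first codon
    my $nt_end   = $aa_end * 3;               # last base of the last codon
    return ($nt_start, $nt_end);
}

# Residues 2..5 of the protein correspond to CDS bases 4..15.
my ($s, $e) = aa_range_to_nt(2, 5);
print "$s-$e\n";    # prints "4-15"
```

The resulting pair could then be handed to subseq() or trunc() on the Mus CDS sequence object to clip out the aligned region.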
But it looks like I'm getting there. Printing out both sequences gives a pretty good match. > For good measure you might take your cdsseqs, translate them back to > protein, and align with pSW or needle/water with EMBOSS and check that > you don't have any stop codons (in case your fixing of frameshifts didn't > work or to detect if you are accidentally clipping out the wrong piece of > sequence). Given this alignment - $proteinaln you can use the following to > align the cds sequences using the protein aln as a template. > > use Bio::Align::Utilities qw(aa_to_dna_aln); > my $cdsaln = &aa_to_dna_aln($proteinaln,\%cdsseqs); OK, I aligned the proteins with pSW and they look great. Do I need to do anything about the stop codons at the end? I'm also having some trouble making the hash for this. It keeps saying I can't use "mus" (or anything I put in, including $est->id or $hsp->query->seq_id) when use strict refs are on, but I remember making hashes before. All syntaxes, my %cdsseqs = ('mus', $est) or $cdsseqs{'mus'} = $est, etc do this. Probably something is wrong in my overall code; I think I'll figure it out eventually. I still don't understand, however, what exactly aa_to_dna_aln does. > > Then pass the $cdsaln object to the PAML Runner > (Bio::Tools::Run::Phylo::PAML::Codeml or Yn00). The sample script to do this looks very complicated :). I'll try it out though. The one in Bio::Align::Utilities looks easier to use, but I'm having problems making a simplealn object. I'm pretty sure my est/pero are aligned, so I tried just adding them to the simplealn object, but I needed to convert them into LocatableSeq objects. Anything special I need for the start/end values, or should I just use the values for the shorter sequence? Thank you so much for all the help!
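The question of what aa_to_dna_aln does can be illustrated with a simplified sketch of the underlying idea: each residue in a gapped protein alignment row is replaced by its codon from the matching CDS, and each gap becomes three gap characters, so the DNA sequences end up aligned codon by codon. The sub thread_codons below is hypothetical and written for illustration only, not BioPerl's implementation; it assumes the CDS has no frameshifts and starts with the first codon of the aligned region. Per its documentation, the BioPerl function in Bio::Align::Utilities takes the protein alignment and a reference to a hash of CDS sequence objects keyed by display ID, and applies this idea to every row.

```perl
use strict;
use warnings;

# Thread a coding sequence through one gapped protein alignment row:
# every residue takes the next codon, every '-' becomes '---'.
# Illustrative only; assumes no frameshifts and a CDS starting at codon 1.
sub thread_codons {
    my ($gapped_aa, $cds) = @_;
    my $out = '';
    my $pos = 0;    # 0-based offset of the next unused codon in $cds
    for my $aa (split //, $gapped_aa) {
        if ($aa eq '-') {
            $out .= '---';
            next;
        }
        my $codon = substr($cds, $pos, 3);
        die "CDS too short for residue '$aa' at offset $pos\n"
            unless length($codon) == 3;
        $out .= $codon;
        $pos += 3;
    }
    return $out;    # leftover CDS bases (e.g. a stop codon) are not used
}

print thread_codons('M-KL', 'ATGAAACTGTAA'), "\n";   # prints "ATG---AAACTG"
```

Because the protein row carries no stop residue, a trailing stop codon in the CDS is simply left unused in this sketch.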
-Ed -- Edward Chuong http://iacs5.ucsd.edu/~echuong From Matthew.Betts at ii.uib.no Wed Jul 7 05:05:26 2004 From: Matthew.Betts at ii.uib.no (Matthew Betts) Date: Wed Jul 7 05:07:43 2004 Subject: [Bioperl-l] PDB sequence from ATOM records In-Reply-To: <200407062139.i66LcWKu024692@portal.open-bio.org> Message-ID: Hi, Thanks for all the replies. I definitely agree that we should sort out what's wanted from Bio::Structure before changing what's already there. If there's a session at BOSC (maybe in conjunction with 3Dsig?), could someone post the conclusions somewhere so that those of us who can't make it can comment? Thanks. Also, the MSD people must have done a lot of this type of thing already, and their input would be really useful (e.g. it's probably a good idea to use their API). Matthew On Tue, 6 Jul 2004 bioperl-l-request@portal.open-bio.org wrote: Message: 10 Date: Tue, 06 Jul 2004 15:54:38 +0100 From: Dave Howorth Subject: Re: [Bioperl-l] PDB sequence from ATOM records To: bioperl-l@portal.open-bio.org Message-ID: <40EABD2E.5020602@mrc-lmb.cam.ac.uk> Content-Type: text/plain; charset=us-ascii; format=flowed It seems to me that many people who parse PDB files write their own code. This is a shame, because it wastes effort, it makes things more difficult for beginners, and it leads to differences in results. This practice stems, I believe, both from the complexity of the PDB data and from the multitude of use cases. It is well known that there are exceptions to almost every rule about the content of PDB files. It is also clear that sometimes people care about every character in the coordinates, while other times they care just about the sequence and sometimes just specific parts of the header, for example. I think it might be useful to have a session on this subject at BOSC. We can try to capture different people's requirements. We can list examples of PDB entries that demonstrate specific problems. We can consider existing code possibilities. We can drink beer.
Afterwards, perhaps there is more chance of building some software that will be widely used. What do you think? Cheers, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From jiansun05 at yahoo.com Wed Jul 7 12:35:18 2004 From: jiansun05 at yahoo.com (Jane Sun) Date: Wed Jul 7 12:56:58 2004 Subject: [Bioperl-l] questions about the Bioperl installation. Message-ID: <20040707163518.54305.qmail@web53305.mail.yahoo.com> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: bioperlInstallatio-error.doc Type: application/msword Size: 39936 bytes Desc: bioperlInstallatio-error.doc Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20040707/9fb98062/bioperlInstallatio-error-0001.doc From wuhuizhu at mail.eecis.udel.edu Tue Jul 6 13:59:55 2004 From: wuhuizhu at mail.eecis.udel.edu (Huizhuan Wu) Date: Wed Jul 7 12:57:41 2004 Subject: [Bioperl-l] How to fix "blast_report is undefined"?? Message-ID: Here is the msg I got: Here is blast_report: Bio::SearchIO::blast=HASH(0x9ac3e0) Can't call method "next_result" on an undefined value at /Users/wuhuizhuan/Sites/CGI-BIN/bl.cgi line 73. Here is my code: #!/usr/bin/perl -w -T $ENV{BLASTMAT} = '/usr/local/BLAST/data'; $ENV{'BLASTDB'} ='/Users/huizhuan/myWork/learnBlast'; $ENV{PATH} = '/usr/local/bin:/usr/ucb:/usr/bin:/sbin:/usr/sbin'; $ENV{IFS} = "" if $ENV{IFS} ne ""; # This script gets sequence from grape MPSS page and perform blastn against # contig_fastaforNevada.txt database. # Modules to use. 
use CGI qw(:all); use CGI::Carp qw(warningsToBrowser fatalsToBrowser); use CGI::Carp qw(fatalsToBrowser carpout); use strict; use warnings; use Getopt::Long; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; print header; print start_html("BLAST RESULT"); my ($query, $seq); my $db = "/Users/huizhuan/myWork/learnBlast/data.txt"; my $maxEval = 1.0e-10; $query = "233fff"; my $seq = "GATCGGTTAATGGGCCATGGGGGG"; my $seqobj = Bio::Seq->new( '-id' => $query, '-seq' => $seq); my @params = ('program'=>'blastn', 'outfile'=>'blast.out', '_READMETHOD'=>'Blast', 'F'=>'F','W'=>17,'g'=>'F', 'database'=>$db); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); $factory->e($maxEval); my $blast_report = $factory->blastall($seqobj); print "Here is blast_report: $blast_report"; my $result = $blast_report->next_result; print end_html; Any help is highly appreciated! huizhuan From Marc.Logghe at devgen.com Tue Jul 6 18:08:44 2004 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Jul 7 12:57:50 2004 Subject: [Bioperl-l] piping to CAP3 Message-ID: Hi Jamie, > -----Original Message----- > From: James Gregory [mailto:j1gregor@biomail.ucsd.edu] > Sent: dinsdag 6 juli 2004 23:00 > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] piping to CAP3 > > > Hello, > > I'm trying to write a script that receives either pasted > sequences or a > file upload of FASTA format seqs for assembly using CAP3 but > i'm having > some trouble. This doesn't use bioperl but i'm guessing that this > audience will be more familiar with what i'm trying to do. 
> > use strict; > use CGI qw(:standard escapeHTML); > > > my $input = new CGI; > my $tgicl = "/opt/CAP_update/tgicl_linux/bin/tgicl"; #path to CAP3 > > # for pasted seqs > > my $seq = $input->param("seqs"); # retrieve seqs > open (FILE, ">seqAssembly.txt") || die; > my $file = ; # printing seqs to file > print FILE "$seq"; > close FILE; > > open PIPE_TO_CAP3, "| $tgicl seqAssembly.txt" || die $!; > The pipe symbol should be at the end, not the beginning, because you don't want to pipe anything TO cap3; you want the output from cap3: open PIPE_TO_CAP3, "$tgicl seqAssembly.txt |" or die $!; # 'or', not '||', so die runs if the open fails # slurp the result local $/ = undef; my ($result) = <PIPE_TO_CAP3>; close PIPE_TO_CAP3; In fact I have a Bio::Tools::Run::Cap3 module which does all that for you. I'll attach it so you can try it out (adjust the program dir in the package, if needed). Then you could do: use Bio::Tools::Run::Cap3; my $p = Bio::Tools::Run::Cap3->new; my $result = $p->run( \@seq_obj ); print $result; HTH, Marc *********************************************************** Marc Logghe, Ph.D. Senior Scientist Scientific Computing Group Devgen nv Technologiepark 30 B - 9052 Ghent-Zwijnaarde Belgium Tel: +32 9 324 24 83 Fax: +32 9 324 24 25 > **** DISCLAIMER ********************************************************** > "This e-mail and any attachments thereto may contain information > which is confidential and/or protected by intellectual property > rights and are intended for the sole use of the recipient(s) named above. > Any use of the information contained herein (including, but not limited to, > total or partial reproduction, communication or distribution in any form) > by persons other than the designated recipient(s) is prohibited. > If you have received this e-mail in error, please notify the sender either > by telephone or by e-mail and delete the material from any computer. > Thank you for your cooperation." -------------- next part -------------- A non-text attachment was scrubbed...
Name: Cap3.pm Type: application/octet-stream Size: 3374 bytes Desc: Cap3.pm Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20040707/724896e6/Cap3.obj From gtg974p at mail.gatech.edu Wed Jul 7 13:25:19 2004 From: gtg974p at mail.gatech.edu (gtg974p@mail.gatech.edu) Date: Wed Jul 7 13:30:35 2004 Subject: [Bioperl-l] Help with promoter analysis In-Reply-To: <200407071701.i67H15Kr008431@portal.open-bio.org> References: <200407071701.i67H15Kr008431@portal.open-bio.org> Message-ID: <1089221119.40ec31ff93330@webmail.mail.gatech.edu> Hi all, I am trying to find all the transcription factor binding sites using the TFBS Perl modules and TRANSFAC matrices. When I tried running the code on the whole chromosome, it said the sequence was too long. I tried changing #define SEQLEN 1000000 in pwm_search.h to #define SEQLEN 100000000, and the sequence length is less than that, but it still throws the same error. Can someone help me with this? Thanks, Subadhra From jason at cgt.duhs.duke.edu Wed Jul 7 17:22:03 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jul 7 17:24:46 2004 Subject: [Bioperl-l] Re: Problem with Bio::TreeIO::nexus In-Reply-To: <40EC6766.9060301@bms.com> References: <40EC6766.9060301@bms.com> Message-ID: I have put a bunch of fixes in since revision 1.3. I would suggest getting the latest from CVS. If you would rather have a patch file to upgrade from 1.3, I can make one, although it should be easy to grab the latest from CVS. If it still doesn't work, let us know.
---------------------------- revision 1.7 date: 2004/06/29 16:55:13; author: jason; state: Exp; lines: +3 -4 back out attempted MrBayes parsing patch - shouldn't be needed; #1619 ---------------------------- revision 1.6 date: 2004/06/24 15:14:07; author: jason; state: Exp; lines: +8 -2 bugfix for #1656 ---------------------------- revision 1.5 date: 2004/03/19 18:05:11; author: jason; state: Exp; lines: +3 -3 handle nexus trees a little better I hope - more testing needed ---------------------------- revision 1.4 date: 2004/03/09 20:12:05; author: jason; state: Exp; lines: +28 -28 some cleanup- wasn't matching all nexus formats ---------------------------- On Wed, 7 Jul 2004, Donald G. Jackson wrote: > Jason or another TreeIO wizard, > > I'm having trouble reading in a nexus treefile created with Paup4b10. > I've attached my code and an example treefile. Briefly, when I call > next_tree() I get no trees back. > > It looks as though the comment-counting mechanism is getting thrown off > (Bio::TreeIO::nexus lines 124ff). Once the opening comment closes, the > module stops reading any data ;(. I'm using version 1.3 of > Bio::TreeIO::nexus > > I'd appreciate any suggestions or assistance. > > Thanks, > > Don Jackson > BMS Bioinformatics > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From donald.jackson at bms.com Wed Jul 7 17:24:13 2004 From: donald.jackson at bms.com (Donald G. Jackson) Date: Wed Jul 7 17:26:28 2004 Subject: [Bioperl-l] Re: Problem with Bio::TreeIO::nexus In-Reply-To: References: <40EC6766.9060301@bms.com> Message-ID: <40EC69FD.2050507@bms.com> Thanks, that fixed it. Sorry for not checking CVS before sending off the email. My brain's not working this afternoon. Don Jason Stajich wrote: >I have put a bunch of fixes in since revision 1.3. I would suggest >getting the latest from CVS. If you want a patchfile to upgrade from >1.3 although it should be easy to grab from CVS I hope. > >If it still doesn't work let us know. 
> >---------------------------- >revision 1.7 >date: 2004/06/29 16:55:13; author: jason; state: Exp; lines: +3 -4 >back out attempted MrBayes parsing patch - shouldn't be needed; #1619 >---------------------------- >revision 1.6 >date: 2004/06/24 15:14:07; author: jason; state: Exp; lines: +8 -2 >bugfix for #1656 >---------------------------- >revision 1.5 >date: 2004/03/19 18:05:11; author: jason; state: Exp; lines: +3 -3 >handle nexus trees a little better I hope - more testing needed >---------------------------- >revision 1.4 >date: 2004/03/09 20:12:05; author: jason; state: Exp; lines: +28 -28 >some cleanup- wasn't matching all nexus formats >---------------------------- > > >On Wed, 7 Jul 2004, Donald G. Jackson wrote: > > > >>Jason or another TreeIO wizard, >> >>I'm having trouble reading in a nexus treefile created with Paup4b10. >>I've attached my code and an example treefile. Briefly, when I call >>next_tree() I get no trees back. >> >>It looks as though the comment-counting mechanism is getting thrown off >>(Bio::TreeIO::nexus lines 124ff). Once the opening comment closes, the >>module stops reading any data ;(. I'm using version 1.3 of >>Bio::TreeIO::nexus >> >>I'd appreciate any suggestions or assistance. >> >>Thanks, >> >>Don Jackson >>BMS Bioinformatics >> >> >> > >-- >Jason Stajich >Duke University >jason at cgt.mc.duke.edu > > From donald.jackson at bms.com Wed Jul 7 17:13:10 2004 From: donald.jackson at bms.com (Donald G. Jackson) Date: Wed Jul 7 21:31:47 2004 Subject: [Bioperl-l] Problem with Bio::TreeIO::nexus Message-ID: <40EC6766.9060301@bms.com> Jason or another TreeIO wizard, I'm having trouble reading in a nexus treefile created with Paup4b10. I've attached my code and an example treefile. Briefly, when I call next_tree() I get no trees back. It looks as though the comment-counting mechanism is getting thrown off (Bio::TreeIO::nexus lines 124ff). Once the opening comment closes, the module stops reading any data ;(. 
I'm using version 1.3 of Bio::TreeIO::nexus I'd appreciate any suggestions or assistance. Thanks, Don Jackson BMS Bioinformatics -------------- next part -------------- A non-text attachment was scrubbed... Name: merge_trees.pl Type: application/x-perl Size: 678 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20040707/26520083/merge_trees.bin -------------- next part -------------- #NEXUS Begin trees; [Treefile saved Wed Jul 7 16:13:25 2004] [! >Data file = /net/hox/home/jacksod/Kinase_Struc_Phylogeny/Phylogeny/v2/kinase_set_v3_run.nex >Heuristic search settings: > Optimality criterion = distance (minimum evolution) > Negative branch lengths allowed, but set to zero for tree-score calculation > Distance measure = total character difference > 138 characters are excluded > Starting tree(s) obtained via neighbor-joining > Branch-swapping algorithm: tree-bisection-reconnection (TBR) > Steepest descent option not in effect > Initial 'MaxTrees' setting = 100 > Zero-length branches not collapsed > 'MulTrees' option in effect > Topological constraints not enforced > Trees are unrooted > >NOTE: Random-addition-sequence option is ignored when starting trees are > obtained using neighbor-joining. 
> >Heuristic search completed > Total number of rearrangements tried = 13674 > Score of best tree(s) found = 2445.54947 > Number of trees retained = 1 > Time used = 1 sec (CPU time = 0.75 sec) ] Translate 1 ABL1, 2 LCK, 3 FYN, 4 SRC, 5 HCK, 6 LYN, 7 EMT, 8 IR, 9 'IGF-1R', 10 MET, 11 FLT3, 12 CKIT, 13 'PDGF-ALPHA', 14 'PDGF-BETA', 15 VEGFR2, 16 FGFR1, 17 FAK1, 18 FAK2, 19 SYK, 20 ZAP70, 21 HER1, 22 HER2, 23 LIMK2, 24 LIMK1, 25 TESK1, 26 TESK2, 27 MEK1, 28 MEKK2, 29 MINK, 30 AURORA1, 31 MK2, 32 PKA, 33 AKT1, 34 'PKC-DELTA', 35 'PCK-THETA', 36 GSK3, 37 CDK2, 38 'P38-ALPHA' ; tree PAUP_1 = [&U] (((((1:62.006250,((2:28.407143,(5:22.986111,6:22.013889):6.092857):8.340260,(3:18.930556,4:19.069444):17.795455):27.952841):1.772984,7:70.397849):9.335071,((((8:21.027778,9:21.972222):53.928571,10:76.071429):3.947181,(((11:41.361111,12:42.638889):3.257353,(13:30.430556,14:29.569444):16.742647):10.640625,(15:51.736111,16:51.263889):4.484375):19.765864):4.506663,(((17:47.861111,18:49.138889):24.632353,(19:45.791667,20:45.208333):31.617647):1.031250,(21:23.736111,22:22.263889):54.468750):5.022192):0.043724):15.646889,((23:41.902778,24:33.097222):40.794118,(25:38.347222,26:37.652778):35.205882):24.266098):0,(((27:96.442857,(28:89.527778,29:93.472222):5.057143):2.725744,(36:94.842857,(37:79.750000,38:88.250000):10.657143):11.369494):0.867521,((30:81.763889,31:97.236111):0.992187,((32:68.777778,33:66.222222):6.801471,(34:32.069444,35:30.930556):36.198529):16.632813):8.518763):10.268624); End; From wrp at virginia.edu Thu Jul 8 10:42:08 2004 From: wrp at virginia.edu (William R.Pearson) Date: Thu Jul 8 10:44:12 2004 Subject: [Bioperl-l] CSHL Computational Genomics - Application deadline July 15 Message-ID: Course announcement - Application deadline, July 15, 2004 ================================================================ Cold Spring Harbor COMPUTATIONAL GENOMICS NOVEMBER 5 - 11, 2003 INSTRUCTORS: Pearson, William, Ph.D., University of Virginia, Charlottesville, VA Smith, Randall, 
Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA Beyond BLAST and FASTA - This course presents a comprehensive overview of the theory and practice of computational methods for gene identification and characterization from DNA sequence data. The course focuses on approaches for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, and multiple sequence alignment. Additional topics include gene recognition (exon/intron prediction), identifying signals in unaligned sequences, and integration of genetic and sequence information in biological databases. The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course makes extensive use of local WWW pages to present problem sets and the computing tools to solve them. Students use Windows and Mac workstations attached to a UNIX server; participants should be comfortable using the Unix operating system and a Unix text editor. The course is designed for biologists seeking advanced training in biological sequence analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis. The primary focus of the Computational Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Students more interested in the practical aspects of software development are encouraged to apply to the Cold Spring Harbor Bioinformatics - Writing Software for Genome Research Course. 
For additional information and the lecture schedule and problem sets for the 2003 course, see: http://www.people.virginia.edu/~wrp/cshl03 ================================================================ To apply to the course, fill out the form at: http://meetings.cshl.org/course_app.htm ================================================================

From arhui20 at hotmail.com Thu Jul 8 11:07:01 2004 From: arhui20 at hotmail.com (huizhuan wu) Date: Thu Jul 8 11:09:08 2004 Subject: [Bioperl-l] wwwblast script question Message-ID:

Can someone help me with this? I have been stuck on it for a week. Any help would be highly appreciated!

I have a BioPerl CGI script to host a BLAST server locally, but I keep getting an error message saying that blastall cannot be called in "$blast_report = $factory->blastall($seqobj);". The script runs fine from the command line, so it should be a problem with the wwwblast server's environment settings. I am running on a machine where I have a user account but no root privileges.
------------------------------------------------
my BioPerl CGI script is in: /Users/myname/Sites/CGI-BIN/
my local database to BLAST against (I have formatted it) is in: /users/myname/blast_script/data.txt
the BLAST programs on the machine are in: /usr/local/NCBI/network/wwwblast
my configuration file is in: /etc/httpd/myname.conf
------------------------------------------------
Here are my environment variable settings:
(1) in the .ncbirc file
[NCBI]
data=/usr/local/BLAST/data

[BLAST]
BLASTDB=/Users/myname/blast_script
BLASTMAT=/usr/local/BLAST/data

(2) in myname.conf
Options ExecCGI
Options ExecCGI
Options ExecCGI
------------------------------------------------
Here is the error message:
Here is blast_report Content-type:text/html

Can't call method "next_result" on an undefined value at /Users/myanme/Sites/CGI-BIN/bl.cgi line 73.
------------------------------------------------
The error is located here:
----
my $blast_report = $factory->blastall($seqobj);
print "Here is blast_report: $blast_report";
my $result = $blast_report->next_result;
----
--------------------------------------------------
Here is my code:

#!/usr/bin/perl -w -T

$ENV{BLASTMAT} = '/usr/local/BLAST/data';
$ENV{'BLASTDB'} ='/Users/huizhuan/myWork/learnBlast';
$ENV{PATH} = '/usr/local/bin:/usr/ucb:/usr/bin:/sbin:/usr/sbin';
$ENV{IFS} = "" if $ENV{IFS} ne "";

# This script gets a sequence from the grape MPSS page and performs blastn against data.txt
# Modules to use.
use CGI qw(:all);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use CGI::Carp qw(fatalsToBrowser carpout);
use strict;
use warnings;
use Getopt::Long;
use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;

print header;
print start_html("BLAST RESULT");
my ($query, $seq);
my $db = "/Users/myname/blast_script/data.txt";
my $maxEval = 1.0e-10;

$query = "233fff";
$seq = "GATCGGTTAATGGGCCATGGGGGG";
my $seqobj = Bio::Seq->new( '-id' => $query,
                            '-seq' => $seq);
my @params = ('program'=>'blastn', 'outfile'=>'blast.out',
              '_READMETHOD'=>'Blast', 'F'=>'F','W'=>17,'g'=>'F',
              'database'=>$db);

my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
$factory->e($maxEval);
my $blast_report = $factory->blastall($seqobj);
print "Here is blast_report: $blast_report";
my $result = $blast_report->next_result;
print end_html;

_________________________________________________________________
MSN Life Events gives you the tips and tools to handle the turning points in your life.
http://lifeevents.msn.com

From gtg974p at mail.gatech.edu Thu Jul 8 15:14:19 2004 From: gtg974p at mail.gatech.edu (gtg974p@mail.gatech.edu) Date: Thu Jul 8 15:20:32 2004 Subject: [Bioperl-l] Few questions on whole chromosome & GFF In-Reply-To: <200407071701.i67H15Kr008431@portal.open-bio.org> References: <200407071701.i67H15Kr008431@portal.open-bio.org> Message-ID: <1089314059.40ed9d0b20438@webmail.mail.gatech.edu>

Hi,
I am trying to find the positions of a particular pattern in the whole chromosome and write the output in GFF format. I have a few questions (I am a newbie, sorry if the questions are silly):

1) How do I split the whole chromosome into smaller chunks and then join them again?

2) If I split it into, say, 20 different files, the output of each file (GFF) is

seqID source Feature Start End Score Strand

There will be 20 different GFF files, each with start & end coordinates relative to its own (small chunk) file. How do I make them correspond to the whole chromosome coordinates?

3) While writing the GFF file, how can I specify that I want to sort it by Score? By default it's sorting by start.

Please help me,
Thanks,

From jason at cgt.duhs.duke.edu Thu Jul 8 22:22:22 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Jul 8 22:25:17 2004 Subject: [Bioperl-l] wwwblast script question In-Reply-To: References: Message-ID:

Presumably the .ncbirc file in your home directory is not used by the apache process running the webserver - I don't think this is necessarily the problem, but it could be part of it. What does $factory->executable('blastall') report in your script?

Your error message seems inconsistent with what you are printing. You say it cannot call blastall, but in fact blastall is being called and returning a null value. I wonder where it is really breaking down...
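One possibility worth ruling out, offered as an assumption rather than a diagnosis: the CGI script overwrites $ENV{PATH} with '/usr/local/bin:/usr/ucb:/usr/bin:/sbin:/usr/sbin', while the BLAST programs are said to live in /usr/local/NCBI/network/wwwblast. If blastall is not in any of those directories, the web process may not find it even though a command-line run (with a different PATH) works. The core-Perl sketch below checks what that PATH would expose; find_in_path is a hypothetical helper, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the full path of the
# first file named $prog that is executable in one of the colon-separated
# directories of $path, or undef if there is none.
sub find_in_path {
    my ($prog, $path) = @_;
    for my $dir (split /:/, $path) {
        next unless length $dir;    # skip empty PATH entries
        my $candidate = File::Spec->catfile($dir, $prog);
        # -x _ reuses the stat buffer filled by the -f test
        return $candidate if -f $candidate && -x _;
    }
    return;    # undef in scalar context: not found
}

# The PATH that the CGI script assigns before loading StandAloneBlast.
my $cgi_path = '/usr/local/bin:/usr/ucb:/usr/bin:/sbin:/usr/sbin';

my $blastall = find_in_path('blastall', $cgi_path);
if (defined $blastall) {
    print "blastall would be found at $blastall\n";
}
else {
    print "blastall is not in any directory of the CGI PATH\n";
}
```

If it reports that blastall is missing, adding the directory that really contains it to the $ENV{PATH} assignment at the top of the CGI script is one way to test the idea. Under -T, Perl also refuses to run external commands when PATH contains a directory writable by others, so only trusted directories should be added.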
What if you just remove StandAloneBlast from the equation and run:

use Bio::SearchIO;

my $fh;
open($fh, "blastall -p blastn -i seqfile.fa -d yourdb -F F -g F -W 17 |")
    || die("cannot run blast!");
my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -fh     => $fh);
my $result = $searchio->next_result;

Then run StandAloneBlast with the verbose flag set to 1

$factory->verbose(1);

to see more warnings and information, and make sure the command line it is running seems sensible.

Best I got at this point...

-jason

On Thu, 8 Jul 2004, huizhuan wu wrote: > Can someone there help me out of this? I have been stuck in this for a week. > Any help would be highly appreciated! > > I have a bioperl cgi script to host a blast server locally. But I kept > getting an error msg of not being able to call blastall in "$blast_report = > $factory->blastall($seqobj);". The script is running ok in command line. So > it should be the wwwblast server environment setting problem. I am running > on a machine that I have a user accout but without the root previledge. > ------------------------------------------------ > my bioperl cgi script is in: /Users/myname/Sites/CGI-BIN/ > my local database to blast(I have formatted it) is in: > /users/myname/blast_script/data.txt > the blast programs in the machine is in: /usr/local/NCBI/network/wwwblast > my confirguration file is in: /etc/httpd/myname.conf > > ------------------------------------------------ > Here is my settings on environment variables: > (1) in .ncbirc file > [NCBI] > data=/usr/local/BLAST/data > > [BLAST] > BLASTDB=/Users/myname/blast_script > BLASTMAT=/usr/local/BLAST/data > > (2) in myname.conf > > Options ExecCGI > > > > Options ExecCGI > > > > Options ExecCGI > > > ------------------------------------------------ > Here is the error msg: > Here is blast_report Content-type:text/html > > Can't call method "next_result" on an undefined value at > /Users/myanme/Sites/CGI-BIN/bl.cgi line 73.
> ------------------------------------------------ > The error is located here: > ---- > my $blast_report = $factory->blastall($seqobj); > print "Here is blast_report: $blast_report"; > my $result = $blast_report->next_result; > ---- > -------------------------------------------------- > Here is my code: > #!/usr/bin/perl -w -T > > $ENV{BLASTMAT} = '/usr/local/BLAST/data'; > $ENV{'BLASTDB'} ='/Users/huizhuan/myWork/learnBlast'; > $ENV{PATH} = '/usr/local/bin:/usr/ucb:/usr/bin:/sbin:/usr/sbin'; > $ENV{IFS} = "" if $ENV{IFS} ne ""; > > > # This script gets sequence from grape MPSS page and perform blastn > against data.txt > # Modules to use. > use CGI qw(:all); > use CGI::Carp qw(warningsToBrowser fatalsToBrowser); > use CGI::Carp qw(fatalsToBrowser carpout); > use strict; > use warnings; > use Getopt::Long; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > > print header; > print start_html("BLAST RESULT"); > my ($query, $seq); > my $db = "/Users/myname/blast_script/data.txt"; > my $maxEval = 1.0e-10; > > $query = "233fff"; > my $seq = "GATCGGTTAATGGGCCATGGGGGG"; > my $seqobj = Bio::Seq->new( '-id' => $query, > '-seq' => $seq); > my @params = ('program'=>'blastn', 'outfile'=>'blast.out', > '_READMETHOD'=>'Blast', 'F'=>'F','W'=>17,'g'=>'F', > 'database'=>$db); > > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > $factory->e($maxEval); > my $blast_report = $factory->blastall($seqobj); > print "Here is blast_report: $blast_report"; > my $result = $blast_report->next_result; > print end_html; > > _________________________________________________________________ > MSN Life Events gives you the tips and tools to handle the turning points in > your life. 
http://lifeevents.msn.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Thu Jul 8 22:26:49 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Jul 8 22:30:13 2004 Subject: [Bioperl-l] Few questions on whole chromosome & GFF In-Reply-To: <1089314059.40ed9d0b20438@webmail.mail.gatech.edu> References: <200407071701.i67H15Kr008431@portal.open-bio.org> <1089314059.40ed9d0b20438@webmail.mail.gatech.edu> Message-ID:

On Thu, 8 Jul 2004 gtg974p@mail.gatech.edu wrote:

> Hi,
> I am trying to find the positions of a particular pattern in the whole
> chromosome and write the o/p in the GFF format. I have few questions (I am a
> newbie, sorry if the questions are silly)
>
> 1) How do I split the whole chromosome into smaller chunks and then join it
> again?
>

Read the Seq Howto and see scripts/seq/split_seq.PLS. You will probably want to do a good job naming the pieces so they can be put back together.

> 2) If I split them into say 20 different files
> o/p of file (GFF)
>
> seqID source Feature Start End Score Strand
>
> There will be 20 different GFF files with each start & end corresponding to its
>
> own (small chunks) file, how do I make it correspond to the whole chromosome
> coordinates?

Math + Perl code, I hope... Encode the offset in the name of the sequence, e.g. seq_1000-2000, then use a regexp to remap your numbers back into the original coordinate space.

>
> 3) While writing the GFF file how can I specify that I want to sort it by
> Score, by default its sorting by start?
>

By default it sorts nothing. I don't know where you are getting your GFF from, though.
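Jason's answer to questions 1 and 2 (name each piece after its offset, then remap with a regexp) can be sketched in plain Perl. This is a minimal illustration, not split_seq.PLS itself: it assumes pieces are named <id>_<start>-<end> with 1-based, inclusive coordinates on the whole sequence, and split_into_chunks, to_chromosome and remap_gff_line are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a sequence string into pieces of at most $size
# bases and name each piece "<id>_<start>-<end>" (1-based, inclusive
# coordinates on the original sequence), so every piece carries its offset.
sub split_into_chunks {
    my ($id, $seq, $size) = @_;
    my @chunks;
    for (my $offset = 0; $offset < length($seq); $offset += $size) {
        my $piece = substr($seq, $offset, $size);
        my $start = $offset + 1;
        my $end   = $offset + length($piece);
        push @chunks, { name => "${id}_$start-$end", seq => $piece };
    }
    return @chunks;
}

# Hypothetical helper: map a 1-based position on a piece named
# "<id>_<start>-<end>" back onto the whole sequence.
# Returns the original id and the remapped position.
sub to_chromosome {
    my ($chunk_name, $pos) = @_;
    my ($id, $chunk_start) = $chunk_name =~ /^(.+)_(\d+)-\d+$/
        or die "chunk name '$chunk_name' does not encode an offset\n";
    return ($id, $chunk_start + $pos - 1);
}

# Hypothetical helper: rewrite one tab-separated GFF line from piece to
# chromosome coordinates; columns 1 (seqid), 4 (start) and 5 (end) change.
sub remap_gff_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    my ($id, $start) = to_chromosome($cols[0], $cols[3]);
    my (undef, $end) = to_chromosome($cols[0], $cols[4]);
    @cols[0, 3, 4] = ($id, $start, $end);
    return join("\t", @cols) . "\n";
}

# Usage: split a toy sequence, then remap a hit found on the second piece.
my @chunks = split_into_chunks('chr1', 'ACGTACGTAC', 4);
print "$_->{name}\t$_->{seq}\n" for @chunks;
print remap_gff_line("chr1_5-8\tpattern\tmatch\t2\t3\t.\t+\t.\n");
```

One caveat not raised in the thread: a match that straddles two pieces is found in neither. If the pattern is longer than one base, making consecutive pieces overlap by at least the pattern length minus one, and dropping duplicate hits after remapping, avoids that gap.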
read in all the features and sort by score:

 my @sorted_features = sort { $a->score <=> $b->score } @features;

-jason

> Please help me, > Thanks, > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu

From barry.moore at genetics.utah.edu Thu Jul 8 15:04:39 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu Jul 8 22:42:21 2004 Subject: [Bioperl-l] questions about the Bioperl installation. In-Reply-To: <20040707163518.54305.qmail@web53305.mail.yahoo.com> References: <20040707163518.54305.qmail@web53305.mail.yahoo.com> Message-ID: <40ED9AC7.2030403@genetics.utah.edu>

Jian-

First, one suggestion: when you send error messages and code examples to the list, it's better to paste the text directly into your message rather than sending it as an attachment. Most people don't like to open attachments from people they don't know, and many people on this list use Unix/Linux/OSX and may not be set up to conveniently open a Word attachment. (With XP, you can copy and paste from the command prompt into the body of your mail.)

Now, about your question. You are trying to use the CPAN module to install bioperl on Windows XP. It may be possible to configure the CPAN shell to work properly on Windows, but you don't want to do that for a simple install of bioperl. PPM, which was installed when you installed ActivePerl, does the same thing much more easily. Do the following:

Open a Command Prompt.
Type the following commands:

ppm
rep add BioPerl http://bioperl.org/DIST/
rep add uwinnipeg http://theoryx5.uwinnipeg.ca/ppms/
install Bioperl-1.4

That should install all the basic stuff for you.

Barry

Jane Sun wrote: > Dear Sir or Madam: > I am now trying to learn the Bioperl programming for our bioinformatics projects.
I tried to down load the ActivePerl-5.8.4.810 from the Perl website and installed the Activeperl succesfully in my computer.(OS windowns XP) And I can run some pl file successfully. Then I tried to install the Bioperl too. I get some instructions on the Bioperl Intallation from your site, but it does not work. The installation always shows the error message even I use the Force Installation. Here I also attached the error message interface for your reference.I hope that I can get your professional suggestions or directions on this issue. And also could you please give me some advice on how to start the bioperl programming and where I can get some sample source code for reference? > >Thanks a lot in advance and hope tp get your reply soon. >Yours sincerely >Jian Sun > > > > >--------------------------------- >Do you Yahoo!? >New and Improved Yahoo! Mail - 100MB free storage! > > >------------------------------------------------------------------------ > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT

From miroslavac at health.nb.ca Thu Jul 8 13:14:51 2004 From: miroslavac at health.nb.ca (Miroslava Cuperlovic-Culf) Date: Thu Jul 8 22:42:50 2004 Subject: [Bioperl-l] Sim4 help Message-ID: <40ED810B.1070408@health.nb.ca>

Dear All,

I am a complete novice to BioPerl and (thus) am experiencing quite a few problems. I am trying to write a program that would use Bio::Tools::Sim4::Results to get exons for an input sequence. I tried to follow the instructions on the bioperl.org site without much luck. Would some kind soul out there mind sharing with me an example of a program that successfully uses Sim4? Any help would be truly appreciated.
Beausejour Medical Research Institute (IRMB) 37 Providence Street, Moncton, NB E1C 8X3 Canada e-mail: miroslavac@health.nb tel: 506-862-7572 (off.); 506-862-7570 (lab) fax: 506-862-4222

From Wiepert.Mathieu at mayo.edu Thu Jul 8 16:39:58 2004 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Thu Jul 8 22:43:22 2004 Subject: [Bioperl-l] subsequence with features, like biojava? Message-ID: <2F41CC6C9777D311ACBD009027B108EA08C48177@excsrv32.mayo.edu>

Hi,

I was wondering if there is a subsequence method for any of the Seq objects that works like the BioJava one: "View a sub-section of a given sequence object, including all the features intersecting that region." I can get subsequences and features, but do I have to construct my own Seq object then? Or am I missing the obvious?

-mat

From brian_osborne at cognia.com Fri Jul 9 07:25:42 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jul 9 07:30:01 2004 Subject: [Bioperl-l] Sim4 help In-Reply-To: <40ED810B.1070408@health.nb.ca> Message-ID:

Mira,

There's a section on SIM4 parsing in the bptutorial, but you could also use SearchIO; it's described in a HOWTO, http://bioperl.org/HOWTOs/html/SearchIO.html, though that HOWTO has no example code for SIM4. There is also a script, scripts/utilities/search2BSML, which refers to SIM4.

Brian O.

-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Miroslava Cuperlovic-Culf Sent: Thursday, July 08, 2004 1:15 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Sim4 help Dear All, I am a complete novice to BioPerl and (thus) experiencing quite a few problems. I am trying to write a software that would use Bio::Tools::Sim4::Results to get exons for an input sequence. I tried to follow instructions on bioperl.org site without much luck. Would some kind soul out there mind sharing with me an example of a software that successfully utilizes Sim4. Any help would be truly appreciated.
Cheers Mira -- Miroslava Cuperlovic-Culf, Ph.D. Beausejour Medical Research Institute (IRMB) 37 Providence Street, Moncton, NB E1C 8X3 Canada e-mail: miroslavac@health.nb tel: 506-862-7572 (off.); 506-862-7570 (lab) fax: 506-862-4222

From roievil at hotmail.com Fri Jul 9 16:32:35 2004 From: roievil at hotmail.com (! Roievil !) Date: Fri Jul 9 16:34:44 2004 Subject: [Bioperl-l] BIO::DB::Genbank problem handling Contigs with db->get_Stream_by_query($query); Message-ID:

my $db = Bio::DB::GenBank->new;
my $stream = $db->get_Stream_by_query($query);

My query is 'arabidopsis thaliana [org] AND mads AND biomol_genomic [prop]', where mads is the name of the family I have to study. Then I iterate through the features to get the exon borders where the product contains mads.

NCBI gives me a list of Arabidopsis genomic sequences in GenBank format, but when that list contains a contig, which comes in the default GenBank format instead of the full GenBank format, I get the following error:

-------------------- WARNING ---------------------
MSG: CONTIG found. GenBank get_Stream_by_acc about to run.
---------------------------------------------------

------------- EXCEPTION -------------
MSG: WebDBSeqI Error - check query sequences!

STACK Bio::DB::WebDBSeqI::get_seq_stream C:\Perl\site\lib/Bio/DB/WebDBSeqI.pm:464
STACK Bio::DB::NCBIHelper::get_Stream_by_acc C:\Perl\site\lib/Bio/DB/NCBIHelper.pm:415
STACK Bio::DB::NCBIHelper::postprocess_data C:\Perl\site\lib/Bio/DB/NCBIHelper.pm:309
STACK Bio::DB::WebDBSeqI::get_seq_stream C:\Perl\site\lib/Bio/DB/WebDBSeqI.pm:468
STACK Bio::DB::NCBIHelper::get_Stream_by_query C:\Perl\site\lib/Bio/DB/NCBIHelper.pm:248
STACK toplevel exonparser.pl:28

--------------------------------------

There is a note in the documentation of Bio::DB::GenBank saying:

Note that when querying for GenBank accessions starting with 'NT_' you will need to call $gb->request_format('fasta') beforehand, because in GenBank format (the default) the sequence part will be left out (the reason is that NT contigs are rather annotation with references to clones). Some work has been done to automatically detect and retrieve whole NT_clones when the data is in that format (NCBI RefSeq clones).

But it does not seem to work for me, maybe because the contig's accession number does not start with NT_; it is AE005173.

I don't know how to tell whether a sequence is a contig, or how to deal with it (changing the format from the default GenBank format to full GenBank).

Thank you. In fact, my code works for rice and wheat.

(I also provide my whole code below; its comments were written in French.)

I also have another minor problem: in the output of my code I write FASTA sequences, and I want to provide the accession number of the product, the species, and the product description. I think I wrote the right piece of code:

$newSeq = Bio::PrimarySeq->new(-seq => $subSeq,
                               -display_id => $accession,
                               -description =>$genus . "_" .$species. " " . $product) ;

If I run the script on Windows, it only writes the display id (not the description); on Linux the description is also written.
I installed BioPerl on Windows with ActiveState Perl 5.6.1, and installed BioPerl 1.4 and the BioPerl bundle through PPM.

What should I update, and is that update available?

Thank you very much,

Olivier Glorieux

The whole code:

#!/usr/bin/perl


use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# If the command line does not contain the expected arguments,
# print the following line to the screen
my $usage = "exonParser.pl genus species motif outfile (the outfile will be in fasta format)\n".
            "eg: perl exonParser.pl triticum aestivum mads taMads.fa\n";

# store the arguments in their respective variables
my $genus   = shift or die $usage;
my $species = shift or die $usage;
my $motif   = shift or die $usage;
my $outfile = shift or die $usage;

# query NCBI, which will return all the DNA sequences
# of the given species that contain the given motif
my $query = Bio::DB::Query::GenBank->new(-query => $genus." ".
               $species."[orgn] AND ". $motif. " AND biomol_genomic[prop]",
               -db =>"nucleotide");
print ("test \n") ;
my $db = Bio::DB::GenBank->new;
print ("test1 \n") ;
my $stream = $db->get_Stream_by_query($query);
print ("test2 \n") ;
my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
                              '-format' => 'fasta');
print ("test3 \n") ;
my $sequence;

# counter for the number of sequences
my $nbSequences = 0 ;

# for each sequence returned by the query
while ($sequence = $stream->next_seq){
    print ("\n\n Sequence accession: ".$sequence->display_id."\n");
    #my $species = $sequence->species;
    my @features = $sequence->get_SeqFeatures;
    my $reverseStrand = -1;
    my $accession;
    my $product;
    # for each feature of the GenBank record
    for my $f (@features) {
        # check the primary_tag (e.g. gene, CDS, mRNA); is it mRNA?
        my $tag = $f->primary_tag;
        if ($tag eq ("mRNA")) {
            # among the tags (e.g. organism, gene, product),
            # get the content of the product field (if it exists)
            my @tags = $f->get_all_tags;
            if ($f->has_tag("product")) {
                ($product) = $f->get_tag_values("product");
            }
        }
        if ($tag eq( "CDS" ) ) {
            #print " $tag\n";
            #my $accession;
            #my $product;
            # among the tags (e.g. organism, gene, product),
            # get the protein ID (if it exists)
            my @tags = $f->get_all_tags;
            if ($f->has_tag("protein_id")) {
                ($accession) = $f->get_tag_values("protein_id");
            }
            # get the content of the product field (if it exists)
            if ($f->has_tag("product")) {
                ($product) = $f->get_tag_values("product");
                #print ($product."\n") ;
            }
            # if the product contains the motif, in lowercase or uppercase
            if ($product =~ /$motif/i) {
                print (" Product: ".$product."\n") ;
                print (" Product accession number: ".$accession."\n") ;
                my $seqStart = $f->start;
                my $seqEnd = $f->end;
                my $strand = $f->strand;
                my $newSeq;
                # get the sequence and convert it to lowercase
                my $subSeq = lc($sequence->subseq($seqStart, $seqEnd));
                # get the sub-locations of the feature's location
                my $complex_location = $f->location;
                my @sublocations = $complex_location->each_Location;
                # for each sub-location, take the exon subsequence
                # and convert it to uppercase
                for my $sl (@sublocations) {
                    my $slStart = $sl->start - $seqStart;
                    my $slEnd = $sl->end - $seqStart;

                    my $ucportion = uc( substr ( $subSeq, $slStart, $slEnd - $slStart + 1 ) );
                    substr( $subSeq, $slStart, $slEnd - $slStart + 1, $ucportion );
                }
                # print ("test" .$subSeq. "\n") ;
                # create a new sequence to be written in FASTA format
                $newSeq = Bio::PrimarySeq->new(-seq => $subSeq,
                                               -display_id => $accession,
                                               -description =>$genus . "_" .$species. " " . $product) ;
                if ($strand == $reverseStrand) {
                    $newSeq = $newSeq->revcom;
                }
                # increment the sequence counter
                # print ("test \n") ;
                $nbSequences++ ;
                # write each entry to the output file
                $seq_out->write_seq($newSeq);
            }
        }
    }
}
print("Number of sequences: ".$nbSequences) ;
exit;

_________________________________________________________________
Tired of spam? Get advanced junk mail protection with MSN 8. http://join.msn.com/?page=features/junkmail

From lstein at cshl.edu Fri Jul 9 17:02:47 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Jul 9 17:05:06 2004 Subject: [Bioperl-l] BIO::DB::Genbank problem handling Contigs with db->get_Stream_by_query($query); In-Reply-To: References: Message-ID: <200407091702.47403.lstein@cshl.edu>

Can you try bioperl-live in CVS? The query module has changed recently.

Lincoln

On Friday 09 July 2004 04:32 pm, ! Roievil ! wrote: > my $db = Bio::DB::GenBank->new; > > my $stream = $db->get_Stream_by_query($query); > > my query is 'arabidopsis thaliana [org] AND mads AND biomol_genomic [prop]' > > where mads is the name of the family i have to study > then i iterate through features to get the exon borders where the product > contains mads > > NCBI gives me a list of arabidopsis genomic sequences in genbank format but > when in that list there is a contig which is in the default genbank format > instead of the full genbank format I get the following error : > > -------------------- WARNING --------------------- > MSG: CONTIG found. GenBank get_Stream_by_acc about to run. > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: WebDBSeqI Error - check query sequences! > > STACK Bio::DB::WebDBSeqI::get_seq_stream > C:\Perl\site\lib/Bio/DB/WebDBSeqI.pm:46 > 4 > STACK Bio::DB::NCBIHelper::get_Stream_by_acc > C:\Perl\site\lib/Bio/DB/NCBIHelper.
> pm:415 > STACK Bio::DB::NCBIHelper::postprocess_data > C:\Perl\site\lib/Bio/DB/NCBIHelper.p > m:309 > STACK Bio::DB::WebDBSeqI::get_seq_stream > C:\Perl\site\lib/Bio/DB/WebDBSeqI.pm:46 > 8 > STACK Bio::DB::NCBIHelper::get_Stream_by_query > C:\Perl\site\lib/Bio/DB/NCBIHelpe > r.pm:248 > STACK toplevel exonparser.pl:28 > > -------------------------------------- > > There is a note in the doc of BIO::DB::Genbank saying : > > Note that when querying for GenBank accessions starting with 'NT_' you > will need to call $gb->request_format('fasta') beforehand, because > in GenBank format (the default) the sequence part will be left out > (the reason is that NT contigs are rather annotation with referencesto > clones). > Some work has been done to automatically detect and retrieve whole > NT_clones when the data is in that format (NCBI RefSeq clones). > > but it seems not to work for me maybe because the contig's accession number > is not starting with NT_ it is : AE005173 > > I don't know how to know if a sequence is a contig and how to deal with it > (changing the format from default genbank to full genbank > > Thank you, in fact, my code works for rice and wheat > > (I also provide my whole code but the commentaries are in french) > > > > > I also have another minor problem : for the output of my code i write fasta > sequences i want to provide the accession number of the product, and the > specie and the product description. > > I think I wrote the good piece of code : > > $newSeq = Bio::PrimarySeq->new(-seq => $subSeq, > -display_id => $accession, > -description =>$genus . "_" > .$species. " " . $product) ; > > and if i run the script in windows it only writes the display id (not the > description) > if i am in linux the description is also writen. 
> > I installed BioPerl on Windows with ActivePerl 5.6.1, and installed BioPerl 1.4 and the BioPerl bundle > through PPM. > > What should I update, and is that update available? > > Thank you very much, > > Olivier glorieux > > The whole code: > > #!/usr/bin/perl > > > use strict; > use Bio::SeqIO; > use Bio::Seq; > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > # If the command line does not contain the expected arguments, > # print the following usage line to the screen > my $usage = "exonParser.pl genus species motif outfile (the outfile will be in fasta format)\n". > "eg: perl exonParser.pl triticum aestivum mads taMads.fa\n"; > > # store the arguments in their respective variables > my $genus = shift or die $usage; > my $species = shift or die $usage; > my $motif = shift or die $usage; > my $outfile = shift or die $usage; > > # query NCBI, which returns all DNA sequences > # of the given species that contain the given motif > my $query = Bio::DB::Query::GenBank->new(-query => $genus." ". > $species."[orgn] AND ". $motif. " AND biomol_genomic[prop]", > -db =>"nucleotide"); > print ("test \n") ; > my $db = Bio::DB::GenBank->new; > print ("test1 \n") ; > my $stream = $db->get_Stream_by_query($query); > print ("test2 \n") ; > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => 'fasta'); > print ("test3 \n") ; > my $sequence; > > # counter for the number of sequences > my $nbSequences = 0 ; > > # for each sequence returned by the query > while ($sequence = $stream->next_seq){ > print ("\n\n Sequence accession: ".$sequence->display_id."\n"); > #my $species = $sequence->species; > my @features = $sequence->get_SeqFeatures; > my $reverseStrand = -1; > my $accession; > my $product; > # for each feature of the GenBank record > for my $f (@features) { > # check each primary_tag (e.g. gene, CDS, mRNA) for mRNA > my $tag = $f->primary_tag; > if ($tag eq ("mRNA")) { > # for each tag (e.g. organism, gene, product) > # get the contents of the product field (if it exists) > my @tags = $f->get_all_tags; > if ($f->has_tag("product")) { > ($product) = $f->get_tag_values("product"); > } > } > if ($tag eq( "CDS" ) ) { > #print " $tag\n"; > #my $accession; > #my $product; > # for each tag (e.g. organism, gene, product) > # get the protein ID (if it exists) > my @tags = $f->get_all_tags; > if ($f->has_tag("protein_id")) { > ($accession) = $f->get_tag_values("protein_id"); > } > # get the contents of the product field (if it exists) > if ($f->has_tag("product")) { > ($product) = $f->get_tag_values("product"); > #print ($product."\n") ; > } > # if the product contains the motif (case-insensitive) > if ($product =~ /$motif/i) { > print (" Product: ".$product."\n") ; > print (" Product accession number: ".$accession."\n") ; > my $seqStart = $f->start; > my $seqEnd = $f->end; > my $strand = $f->strand; > my $newSeq; > # get the sequence and convert it to lowercase > my $subSeq = lc($sequence->subseq($seqStart, $seqEnd)); > # get the sub-locations of each Location > my $complex_location = $f->location; > my @sublocations = $complex_location->each_Location; > # for each sub-location, get the exon subsequence > # and convert it to uppercase > for my $sl (@sublocations) { > my $slStart = $sl->start - $seqStart; > my $slEnd = $sl->end - $seqStart; > > my $ucportion = uc( substr ( $subSeq, $slStart, $slEnd - $slStart + 1 ) ); > substr( $subSeq, $slStart, $slEnd - $slStart + 1, $ucportion ); > > } > # print ("test" .$subSeq. "\n") ; > # create a new sequence for FASTA output > $newSeq = Bio::PrimarySeq->new(-seq => $subSeq, > -display_id => $accession, > -description =>$genus . "_" > .$species. " " . $product) ; > if ($strand == $reverseStrand) { > $newSeq = $newSeq->revcom; > } > # increment the sequence counter > # print ("test \n") ; > $nbSequences++ ; > # write each entry to the output file > $seq_out->write_seq($newSeq); > } > } > } > } > print("number of sequences: ".$nbSequences) ; > exit; -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From jsun at biologicaltargets.com Fri Jul 9 15:36:56 2004 From: jsun at biologicaltargets.com (jsun@biologicaltargets.com) Date: Sat Jul 10 09:37:38 2004 Subject: [Bioperl-l] Questions from a Bioperl beginner? Message-ID: <4301.70.241.56.48.1089401816.squirrel@webmail.biologicaltargets.com> Dear Sir or Madam; I tried to run a small BioPerl program after successfully installing Perl and BioPerl on my computer, but I ran into some problems and need to ask for your kind help. I ran the .pl file below, which I copied from the bptutorial file:
**************************************************************
use Bio::Perl;
use strict;
use warnings;

my $seq_object = get_sequence('swissprot',"ROA1_HUMAN");

# uses the default database - nr in this case
my $blast_result = blast_sequence($seq_object);

write_blast(">roa1.blast",$blast_result);
*****************************************************************
Since I didn't change the source code, it should have run fine, but it failed on my computer. The error message is:
.....
Submitted Blast for [ROA1_HUMAN]
----------------WARNING----------------
MSG: UNKNOWN


ERROR: Results for RID 1089388321-32330-213160811820 not found
-----------------------------------
So what's the problem here? I also tried the Bio::Tools::Run::RemoteBlast module, and it shows the same error. How can I solve this problem? And are there any troubleshooting documents that I can use if I run into further problems during my testing? Your help is most appreciated. Thanks a lot, Jian Sun From s.paul at surrey.ac.uk Fri Jul 9 20:05:59 2004 From: s.paul at surrey.ac.uk (S.Paul) Date: Sat Jul 10 09:38:06 2004 Subject: [Bioperl-l] installation of bioperl-db Message-ID: <003901c46611$aed8d6a0$d46fe383@LTCEP1SP> Hi Everybody: I am trying to install bioperl-db-0.1 using Active State Perl on Windows 2000, but I am getting the following error message: Error: Failed to download URL http://www.bioperl.org/Core/Latest/index.shtml/bioperldb.ppd: 404 Not Found This is what I did:
>ppm
rep add bioperldb http://www.bioperl.org/Core/Latest/index.shtml
install bioperldb
I would appreciate it if somebody could help me with this. Thanks, Sujoy Paul Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk From barry.moore at genetics.utah.edu Sat Jul 10 10:27:29 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Sat Jul 10 10:29:33 2004 Subject: [Bioperl-l] Questions from a Bioperl beginner? In-Reply-To: <4301.70.241.56.48.1089401816.squirrel@webmail.biologicaltargets.com> References: <4301.70.241.56.48.1089401816.squirrel@webmail.biologicaltargets.com> Message-ID: <40EFFCD1.9050806@genetics.utah.edu> Jian- Welcome to the wonderful world of BioPerl, where the documentation is thin and the code is complex. Actually, it's not that bad, and you have almost cleared all the hurdles to getting your first BioPerl code up and running. After you get your first few scripts running, you'll find you reuse code from them over and over, and it becomes much smoother. I'm not sure why your example from the tutorial didn't work. That particular piece of code uses a more basic (and I think older) way of retrieving sequences from the database, and it may well be broken, as there probably aren't very many people using that method anymore. Try the following piece of code, which worked fine for me just now. It's more complicated, but it will take you further in understanding how to retrieve sequences the right way, and how to get the information stored in them back out so you can use it. Barry
-------------------------------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank;
#use Bio::DB::GenPept or Bio::DB::RefSeq if needed

#Get some sequence IDs either like below, or read in from a file. Note that
#this sample script works with the accession numbers below (at least at the time
#it was written). If you add different accession numbers, and you get errors,
#you may be calling for something that the sequence doesn't have. You'll have
#to add your own error trapping code to handle that.
my @ids = ('U59228', 'AB039327', 'BC035972');

#Create the GenBank database object to read from the database.
my $gb = Bio::DB::GenBank->new();

#Create a sequence stream to pass the sequences from the database to the program.
my $seqio = $gb->get_Stream_by_id(\@ids);

#Loop over all of the sequences that you requested.
while (my $seq = $seqio->next_seq) {

    #Here is how you get methods directly from the RichSeq object. Replace
    #'display_name' with any other method in Table 2. that can be called on
    #either the RichSeq object directly, or the PrimarySeq object which it has
    #inherited.
    print $seq->display_name,"\n";

    #Here is how to access the classification data from the species object.
    my $species = $seq->species;
    if (defined $species) {
        print $species->common_name || '', "\n";
        my @class = $species->classification;
        print "@class\n";
    }

    #Here is a general way to call things that are stored as a Bio::SeqFeature::
    #Generic object. Replace 'source' with any other of the "major" headings in
    #the feature table (e.g. gene, CDS, etc.) and replace 'mol_type' with any of
    #the tags found under that heading (organism, locus_tag, gene, etc.)
    my @source_feats = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures();
    my $source_feat = shift @source_feats;
    if (defined $source_feat && $source_feat->has_tag('mol_type')) {
        my @mol_type = $source_feat->get_tag_values('mol_type');
        print "@mol_type\n";
    }

    #Here is a general way to call things that are stored as some type of a
    #Bio::Annotation object. This includes reference information, and comments.
    #Replace 'reference' with 'comment' to get the comment, and replace
    #$ref->authors with $ref->title (or location, medline, etc.) to get other
    #reference categories
    my $ann = $seq->annotation();
    my @references = $ann->get_Annotations('reference');
    my $ref = shift @references;
    if (defined $ref) {
        print $ref->authors, "\n";
    }
    print "\n";
}
jsun@biologicaltargets.com wrote: >Dear Sir or Madam; > I tried to run a small BioPerl program after successfully installing >Perl and BioPerl on my computer, but I ran into some problems and need to >ask for your kind help. I ran the .pl file below, which I copied >from the bptutorial file: >************************************************************** >use Bio::Perl; >use strict; >use warnings; > >my $seq_object = get_sequence('swissprot',"ROA1_HUMAN"); > > # uses the default database - nr in this case >my $blast_result = blast_sequence($seq_object); > >write_blast(">roa1.blast",$blast_result); >***************************************************************** > >Since I didn't change the source code, it should have run fine, but >it failed on my computer. The error message is: > >..... >Submitted Blast for [ROA1_HUMAN] >----------------WARNING---------------- >MSG: UNKNOWN > >



ERROR: Results for RID >1089388321-32330-213160811820 not found
>----------------------------------- > >So what's the problem here? I also tried the >Bio::Tools::Run::RemoteBlast module, and it shows the same error. How can I >solve this problem? And are there any troubleshooting documents >that I can use if I run into further problems during my testing? > >Your help is most appreciated. >Thanks a lot, >Jian Sun -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From jason at cgt.duhs.duke.edu Sat Jul 10 11:42:35 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Jul 10 11:44:48 2004 Subject: [Bioperl-l] installation of bioperl-db In-Reply-To: <003901c46611$aed8d6a0$d46fe383@LTCEP1SP> References: <003901c46611$aed8d6a0$d46fe383@LTCEP1SP> Message-ID: There is no BioPerl PPD for bioperl-db. Would someone like to volunteer to put this together? I don't know what the plan is for a stable bioperl-db release, but when it happens we'll need someone to do a PPM release. Also, the BioPerl PPM repository is at a different URL, so the command you listed below wouldn't work anyway. You'd want this URL: http://www.bioperl.org/DIST/ -jason On Fri, 9 Jul 2004, S.Paul wrote: > Hi Everybody: > > I am trying to install bioperl-db-0.1 using Active State Perl on Windows 2000, but I am getting the following error message: > > Error: Failed to download URL http://www.bioperl.org/Core/Latest/index.shtml/bioperldb.ppd: 404 Not Found > > This is what I did: > > >ppm > rep add bioperldb http://www.bioperl.org/Core/Latest/index.shtml > install bioperldb > > I would appreciate it if somebody could help me with this.
> > Thanks, > > Sujoy Paul > Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Sat Jul 10 11:44:56 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Jul 10 11:47:12 2004 Subject: [Bioperl-l] Questions from a Bioperl beginner? In-Reply-To: <4301.70.241.56.48.1089401816.squirrel@webmail.biologicaltargets.com> References: <4301.70.241.56.48.1089401816.squirrel@webmail.biologicaltargets.com> Message-ID: You are probably using a version of BioPerl that does not handle the changes to NCBI's BLAST RID names. Grab the latest RemoteBlast.pm from CVS (http://cvs.open-bio.org) or from the SRC tree (http://www.bioperl.org/SRC/); RemoteBlast.pm is here: http://www.bioperl.org/SRC/branch-1-4/Bio/Tools/Run/ Just drop the latest file into your local distribution. -jason On Fri, 9 Jul 2004 jsun@biologicaltargets.com wrote: > Dear Sir or Madam; > I tried to run a small BioPerl program after successfully installing > Perl and BioPerl on my computer, but I ran into some problems and need to > ask for your kind help. I ran the .pl file below, which I copied > from the bptutorial file: > ************************************************************** > use Bio::Perl; > use strict; > use warnings; > > my $seq_object = get_sequence('swissprot',"ROA1_HUMAN"); > > # uses the default database - nr in this case > my $blast_result = blast_sequence($seq_object); > > write_blast(">roa1.blast",$blast_result); > ***************************************************************** > > Since I didn't change the source code, it should have run fine, but > it failed on my computer. The error message is: > > ..... > Submitted Blast for [ROA1_HUMAN] > ----------------WARNING---------------- > MSG: UNKNOWN > >



ERROR: Results for RID > 1089388321-32330-213160811820 not found
> ----------------------------------- > > So what's the problem here? I also tried the > Bio::Tools::Run::RemoteBlast module, and it shows the same error. How can I > solve this problem? And are there any troubleshooting documents > that I can use if I run into further problems during my testing? > > Your help is most appreciated. > Thanks a lot, > Jian Sun > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From hlapp at gmx.net Sat Jul 10 14:48:30 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Jul 10 14:50:35 2004 Subject: [Bioperl-l] installation of bioperl-db In-Reply-To: Message-ID: Also, unless you want to use a release that is unsupported and doesn't support the current version of the BioSQL schema, you don't want the 0.1 release of bioperl-db. Instead, download the latest revision from CVS. There is no compiled code, and other than your DBI driver of choice and BioPerl itself there are no dependencies, so it shouldn't be too difficult even without a package manager once you get the rest set up. -hilmar On Saturday, July 10, 2004, at 08:42 AM, Jason Stajich wrote: > There is no BioPerl PPD for bioperl-db. > Would someone like to volunteer to put this together? I don't know what > the plan is for a stable bioperl-db release, but when it > happens we'll need someone to do a PPM release. > > Also, the BioPerl PPM repository is at a different URL, so the command you listed below > wouldn't work anyway.
> You'd want this URL: > http://www.bioperl.org/DIST/ > > -jason > > On Fri, 9 Jul 2004, S.Paul wrote: > >> Hi Everybody: >> >> I am trying to install bioperl-db-0.1 using Active State Perl on >> Windows 2000, but I am getting the following error message: >> >> Error: Failed to download URL >> http://www.bioperl.org/Core/Latest/index.shtml/bioperldb.ppd: 404 Not Found >> >> This is what I did: >> >>> ppm >> rep add bioperldb http://www.bioperl.org/Core/Latest/index.shtml >> install bioperldb >> >> I would appreciate it if somebody could help me with this. >> >> Thanks, >> >> Sujoy Paul >> Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From barry.moore at genetics.utah.edu Sat Jul 10 10:04:57 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Sun Jul 11 09:58:44 2004 Subject: [Bioperl-l] installation of bioperl-db In-Reply-To: <003901c46611$aed8d6a0$d46fe383@LTCEP1SP> References: <003901c46611$aed8d6a0$d46fe383@LTCEP1SP> Message-ID: <40EFF789.1010305@genetics.utah.edu> Sujoy- You can remove the repository you added. The problem is that you had the wrong address, and there is nothing at that address. One other problem you will have is that one PPM file is missing from the BioPerl repository (the one for installing GD). If you also add Randy Kobes' repository at the University of Winnipeg, you should have everything you need for a smooth installation.
Try the following:
rep add BioPerl http://bioperl.org/DIST/
rep add uwinnipeg http://theoryx5.uwinnipeg.ca/ppms/
install Bioperl-1.4
Barry S.Paul wrote: >Hi Everybody: > >I am trying to install bioperl-db-0.1 using Active State Perl on Windows 2000, but I am getting the following error message: > >Error: Failed to download URL http://www.bioperl.org/Core/Latest/index.shtml/bioperldb.ppd: 404 Not Found > >This is what I did: > > > >>ppm >> >> >rep add bioperldb http://www.bioperl.org/Core/Latest/index.shtml >install bioperldb > >I would appreciate it if somebody could help me with this. > >Thanks, > >Sujoy Paul >Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk > > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From cjfields at uiuc.edu Mon Jul 12 00:27:47 2004 From: cjfields at uiuc.edu (Chris Fields) Date: Mon Jul 12 00:30:40 2004 Subject: [Bioperl-l] Batch Entrez with Bio::DB:GenBank or Bio::DB:GenPept Message-ID: Is there a way to use Batch Entrez via Bio::DB::GenBank or Bio::DB::GenPept, or do get_Stream_by_acc() and similar methods automatically retrieve in batch mode? I want to retrieve ~1500-2000 protein sequences from GenBank, and I couldn't find a clear-cut way to specifically request a batch retrieval. Or do you recommend using the Entrez web interface directly? Chris Fields Postdoctoral Researcher - Dept. of Biochemistry Laboratory of Dr. Robert Switzer University of Illinois at Urbana-Champaign From brian_osborne at cognia.com Mon Jul 12 09:21:27 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Jul 12 09:23:37 2004 Subject: [Bioperl-l] Batch Entrez with Bio::DB:GenBank or Bio::DB:GenPept In-Reply-To: Message-ID: Chris, Yes, get_Stream_by_acc already does this.
my $db = Bio::DB::GenBank->new;
my $seqio = $db->get_Stream_by_acc(["M12345","AB123456"]);
while (my $seq = $seqio->next_seq){
    ...
}
Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Chris Fields Sent: Monday, July 12, 2004 12:28 AM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Batch Entrez with Bio::DB:GenBank or Bio::DB:GenPept Is there a way to use Batch Entrez via Bio::DB::GenBank or Bio::DB::GenPept, or do get_Stream_by_acc() and similar methods automatically retrieve in batch mode? I want to retrieve ~1500-2000 protein sequences from GenBank, and I couldn't find a clear-cut way to specifically request a batch retrieval. Or do you recommend using the Entrez web interface directly? Chris Fields Postdoctoral Researcher - Dept. of Biochemistry Laboratory of Dr. Robert Switzer University of Illinois at Urbana-Champaign From lstein at cshl.edu Mon Jul 12 10:04:45 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Jul 12 10:06:56 2004 Subject: [Bioperl-l] Batch Entrez with Bio::DB:GenBank or Bio::DB:GenPept In-Reply-To: References: Message-ID: <200407121004.45280.lstein@cshl.edu> Bio::DB::GenBank and Bio::DB::GenPept use the approved batch-mode retrieval. There is, however, a caveat: NCBI will arbitrarily truncate the number of sequences you can retrieve during periods of high load. Recent CVS versions of BioPerl will restart the fetch if this happens, but I recommend that you limit your requests to times outside the 1-6 pm EST period of high activity. Lincoln On Monday 12 July 2004 12:27 am, Chris Fields wrote: > Is there a way to use Batch Entrez via Bio::DB::GenBank or > Bio::DB::GenPept, or do get_Stream_by_acc() and similar methods > automatically retrieve in batch mode? I want to retrieve
I want to retrieve > ~1500-2000 protein sequences from GenBank, and I couldn't find a > clear-cut way for specifically asking for a batch retrieval. Or do > you recommend using the Entrez web interface directly? > > Chris Fields > Postdoctoral Reseacher - Dept. of Biochemistry > Laboratory of Dr. Robert Switzer > University of Illinois at Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From cjfields at uiuc.edu Mon Jul 12 10:16:27 2004 From: cjfields at uiuc.edu (Chris Fields) Date: Mon Jul 12 10:18:37 2004 Subject: [Bioperl-l] Batch Entrez with Bio::DB:GenBank or Bio::DB:GenPept In-Reply-To: <200407121004.45280.lstein@cshl.edu> References: <200407121004.45280.lstein@cshl.edu> Message-ID: <6.1.1.1.2.20040712091522.0b44ce88@express.cites.uiuc.edu> Thanks Brian and Lincoln! Just wanted to clarify that. I noticed that the documentation mentioned not spamming the NCBI server, so I did want to prevent that if possible. Chris At 09:04 AM 7/12/2004, Lincoln Stein wrote: >Bio::DB::GenBank and Bio::DB::GenPept use the approved batch mode >retrieval. There is, however, a caveat, which is that NCBI will >arbitrarily truncate the number of sequences you can retrieve at >periods of high load. Recent CVS versions of bioperl will restart >the fetch if this happens, but I recommend that you limit your >requests to periods outside the 1-6 pm EST period of high activity. > >Lincoln > >On Monday 12 July 2004 12:27 am, Chris Fields wrote: > > Is there a way to use Batch Entrez via Bio::DB::GenBank or > > Bio::DB::GenPept, or does get_Stream_by_acc() or similar methods > > automatically retrieve in batch mode? I want to retrieve > > ~1500-2000 protein sequences from GenBank, and I couldn't find a > > clear-cut way for specifically asking for a batch retrieval. 
Or do > > you recommend using the Entrez web interface directly? > > > > Chris Fields > > Postdoctoral Reseacher - Dept. of Biochemistry > > Laboratory of Dr. Robert Switzer > > University of Illinois at Urbana-Champaign > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > >-- >Lincoln D. Stein >Cold Spring Harbor Laboratory >1 Bungtown Road >Cold Spring Harbor, NY 11724 __________________________________ Chris Fields - Postdoctoral Researcher Lab of Dr. Robert Switzer Address: University of Illinois at Urbana-Champaign Dept. of Biochemistry - 323 RAL 600 S. Mathews Ave. Urbana, IL 61801 Phone : (217) 333-7098 Fax : (217) 244-5858 From Laure.Durufle at serono.com Mon Jul 12 12:55:45 2004 From: Laure.Durufle at serono.com (Laure.Durufle@serono.com) Date: Mon Jul 12 16:30:49 2004 Subject: [Bioperl-l] pir.pm => bug Message-ID: Hi, I noticed something about embl.pm : when we use the method $seq->species : this returns only the last species. But in embl, one entry can have two organisms : I write a method get_species in RichSeq . I send you the corrected package with the new package embl : ******************************************************************************************** S - This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. e-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain malware. The presence of this disclaimer is not a proof that it was originated at Serono International S.A. or one of its affiliates. 
Serono International S.A and its affiliates therefore do not accept liability for any errors or omissions in the content of this message, which arise as a result of e-mail transmission. If verification is required, please request a hard-copy version. Serono International SA, 15bis Chemin Des Mines, Geneva, Switzerland, www.serono.com. ********************************************************************************************* From Laure.Durufle at serono.com Mon Jul 12 13:00:53 2004 From: Laure.Durufle at serono.com (Laure.Durufle@serono.com) Date: Mon Jul 12 16:30:50 2004 Subject: [Bioperl-l] (no subject) Message-ID: Hi, I noticed something: in the package embl.pm, the method species returns only the last organism, but in EMBL one entry can belong to two organisms. I wrote a method get_species in RichSeq.pm to obtain all organisms, and in embl.pm we add push @{$params{'-species'}},$species ; instead of $params{'-species'} = $species ;
# $Id: RichSeq.pm,v 1.9 2002/11/11 18:16:31 lapp Exp $
#
# BioPerl module for Bio::Seq::RichSeq
#
# Cared for by Ewan Birney
#
# Copyright Ewan Birney
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::Seq::RichSeq - Module implementing a sequence created from a rich
sequence database entry

=head1 SYNOPSIS

See Bio::Seq::RichSeqI and documentation of methods.

=head1 DESCRIPTION

This module implements Bio::Seq::RichSeqI, an interface for sequences
created from or created for entries from/of rich sequence databanks,
like EMBL, GenBank, and SwissProt. Methods added to the Bio::SeqI
interface therefore focus on databank-specific information. Note that
not every rich databank format may use all of the properties provided.

=head1 Implemented Interfaces

This class implements the following interfaces.

=over 4

=item Bio::Seq::RichSeqI

Note that this includes implementing Bio::PrimarySeqI and Bio::SeqI.

=item Bio::IdentifiableI

=item Bio::DescribableI

=item Bio::AnnotatableI

=back

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org              - General discussion
  http://bio.perl.org/MailList.html  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:

  bioperl-bugs@bio.perl.org
  http://bugzilla.bioperl.org/

=head1 AUTHOR - Ewan Birney

Email birney@ebi.ac.uk

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::Seq::RichSeq;
use vars qw($AUTOLOAD @ISA);
use strict;

# Object preamble - inherits from Bio::Root::Object

use Bio::Seq;
use Bio::Seq::RichSeqI;

@ISA = qw(Bio::Seq Bio::Seq::RichSeqI);

=head2 new

 Title   : new
 Usage   : $seq = Bio::Seq::RichSeq->new( -seq => 'ATGGGGGTGGTGGTACCCT',
                                          -id  => 'human_id',
                                          -accession_number => 'AL000012',
                                        );
 Function: Returns a new seq object from
           basic constructors, being a string for the sequence
           and strings for id and accession_number
 Returns : a new Bio::Seq::RichSeq object

=cut

sub new {
    # standard new call..
    my($caller,@args) = @_;
    my $self = $caller->SUPER::new(@args);

    $self->{'_dates'} = [];
    $self->{'_secondary_accession'} = [];
    $self->{'_species'} = [];

    my ($dates, $xtra, $sv, $keywords, $pid, $mol, $division, $species) =
        $self->_rearrange([qw(DATES
                              SECONDARY_ACCESSIONS
                              SEQ_VERSION
                              KEYWORDS
                              PID
                              MOLECULE
                              DIVISION
                              SPECIES
                              )],
                          @args);
    defined $division && $self->division($division);
    defined $mol      && $self->molecule($mol);
    defined $keywords && $self->keywords($keywords);
    defined $sv       && $self->seq_version($sv);
    defined $pid      && $self->pid($pid);

    if( defined $dates ) {
        if( ref($dates) =~ /array/i ) {
            foreach ( @$dates ) {
                $self->add_date($_);
            }
        } else {
            $self->add_date($dates);
        }
    }

    # -species may be a single species or an array reference of them
    if( defined $species ) {
        if( ref($species) =~ /array/i ) {
            foreach ( @$species ) {
                $self->add_species($_);
            }
        } else {
            $self->add_species($species);
        }
    }

    if( defined $xtra ) {
        if( ref($xtra) =~ /array/i ) {
            foreach ( @$xtra ) {
                $self->add_secondary_accession($_);
            }
        } else {
            $self->add_secondary_accession($xtra);
        }
    }

    return $self;
}

=head2 division

 Title   : division
 Usage   : $obj->division($newval)
 Function:
 Returns : value of division
 Args    : newvalue (optional)

=cut

sub division {
    my $obj = shift;
    if( @_ ) {
        my $value = shift;
        $obj->{'_division'} = $value;
    }
    return $obj->{'_division'};
}

=head2 molecule

 Title   : molecule
 Usage   : $obj->molecule($newval)
 Function:
 Returns : type of molecule (DNA, mRNA)
 Args    : newvalue (optional)

=cut

sub molecule {
    my $obj = shift;
    if( @_ ) {
        my $value = shift;
        $obj->{'_molecule'} = $value;
    }
    return $obj->{'_molecule'};
}

=head2 add_species

 Title   : add_species
 Usage   : $self->add_species($species)
 Function: adds one or more species
 Example :
 Returns : nothing
 Args    : one or more species

=cut

sub add_species {
    my ($self,@species) = @_;
    foreach my $dt ( @species ) {
        push(@{$self->{'_species'}},$dt);
    }
}

=head2 get_species

 Title   : get_species
 Usage   : my @species = $seq->get_species;
 Function: returns all species added with add_species
 Example :
 Returns : an array of species
 Args    : none

=cut

sub get_species{
    my ($self) = @_;
    return @{$self->{'_species'}};
}

=head2 add_date

 Title   : add_date
 Usage   : $self->add_date($datestr)
 Function: adds a date
 Example :
 Returns : a date string or an array of such strings
 Args    :

=cut

sub add_date {
    my ($self,@dates) = @_;
    foreach my $dt ( @dates ) {
        push(@{$self->{'_dates'}},$dt);
    }
}

=head2 get_dates

 Title   : get_dates
 Usage   :
 Function:
 Example :
 Returns : an array of date strings
 Args    :

=cut

sub get_dates{
    my ($self) = @_;
    return @{$self->{'_dates'}};
}

=head2 pid

 Title   : pid
 Usage   :
 Function: Get (and set, depending on the implementation) the PID property
           for the sequence.
 Example :
 Returns : a string
 Args    :

=cut

sub pid {
    my ($self,$pid) = @_;
    if(defined($pid)) {
        $self->{'_pid'} = $pid;
    }
    return $self->{'_pid'};
}

=head2 accession

 Title   : accession
 Usage   : $obj->accession($newval)
 Function: The underlying sequence object does not have an accession,
           so we need one here. In this implementation this is merely
           a synonym for accession_number().
 Example :
 Returns : value of accession
 Args    : newvalue (optional)

=cut

sub accession {
    my ($obj,@args) = @_;
    return $obj->accession_number(@args);
}

=head2 add_secondary_accession

 Title   : add_secondary_accession
 Usage   : $self->add_secondary_accession($acc)
 Function: adds a secondary_accession
 Example :
 Returns :
 Args    : a string or an array of strings

=cut

sub add_secondary_accession {
    my ($self) = shift;
    foreach my $dt ( @_ ) {
        push(@{$self->{'_secondary_accession'}},$dt);
    }
}

=head2 get_secondary_accessions

 Title   : get_secondary_accessions
 Usage   :
 Function:
 Example :
 Returns : An array of strings
 Args    :

=cut

sub get_secondary_accessions{
    my ($self,@args) = @_;
    return @{$self->{'_secondary_accession'}};
}

=head2 seq_version

 Title   : seq_version
 Usage   : $obj->seq_version($newval)
 Function:
 Example :
 Returns : value of seq_version
 Args    : newvalue (optional)

=cut

sub seq_version{
    my ($obj,$value) = @_;
    if( defined $value) {
        $obj->{'_seq_version'} = $value;
    }
    return $obj->{'_seq_version'};
}

=head2 keywords

 Title   : keywords
 Usage   : $obj->keywords($newval)
 Function:
 Returns : value of keywords (a string)
 Args    : newvalue (optional) (a string)

=cut

sub keywords {
    my $obj = shift;
    if( @_ ) {
        my $value = shift;
        $obj->{'_keywords'} = $value;
    }
    return $obj->{'_keywords'};
}

#
##
### Deprecated methods kept for ease of transition
##
#

sub each_date {
    my ($self) = @_;
    $self->warn("Deprecated method... please use get_dates");
    return $self->get_dates;
}

sub each_secondary_accession {
    my ($self) = @_;
    $self->warn("each_secondary_accession - deprecated method. use get_secondary_accessions");
    return $self->get_secondary_accessions;
}

sub sv {
    my ($obj,$value) = @_;
    $obj->warn("sv - deprecated method. use seq_version");
    $obj->seq_version($value);
}

1;

Best regards
Laure Durufle
*********************************************************************************************

From echuong at gmail.com Mon Jul 12 21:20:06 2004
From: echuong at gmail.com (Edward Chuong)
Date: Mon Jul 12 21:22:05 2004
Subject: [Bioperl-l] Help on a basic EST-genomic alignment script
In-Reply-To: 
References: <244d2e0e040625152477e9d07a@mail.gmail.com> <244d2e0e04062701275dae0799@mail.gmail.com> <244d2e0e0406281221132be10e@mail.gmail.com> <244d2e0e0406281529586c7693@mail.gmail.com> <244d2e0e04062915137241bdc4@mail.gmail.com> <244d2e0e04070618383cb7d466@mail.gmail.com> <244d2e0e04070821157bc5ff88@mail.gmail.com> <244d2e0e0407091539228062b6@mail.gmail.com> <244d2e0e0407092131398669e7@mail.gmail.com>
Message-ID: <244d2e0e04071218207c8c2410@mail.gmail.com>

Hey,

I just did this on a ~180mb file (the gbff file from
ftp://ftp.ncbi.nih.gov/refseq/M_musculus/mRNA_Prot/)

use Bio::Index::GenBank;
...
my $mus_gbff = "$BLAST_DBs/musRNA/mouse.rna.gbff";

my $inx = Bio::Index::GenBank->new('-filename'   => $mus_gbff . ".idx",
                                   '-write_flag' => 'WRITE');
$inx->make_index($mus_gbff);

my $hit_full_seq = $inx->get_Seq_by_acc($accession);

The program took an incredibly long time and slowed the computer down to
a crawl, so I went home after 30 minutes (so I can't really check if it
works) but last I checked the resulting idx was approaching 8gb. Does
this sound right..? I've googled and found little.

Thanks,
Ed

--
Edward Chuong
http://iacs5.ucsd.edu/~echuong

From amackey at pcbi.upenn.edu Tue Jul 13 08:17:20 2004
From: amackey at pcbi.upenn.edu (Aaron J.
Mackey) Date: Tue Jul 13 08:19:05 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: <200407130330.i6D3UIV3029613@pub.open-bio.org> References: <200407130330.i6D3UIV3029613@pub.open-bio.org> Message-ID: <96FCF1F6-D4C6-11D8-9AA4-000A9577009E@pcbi.upenn.edu> On Jul 12, 2004, at 11:30 PM, Chris Mungall wrote: > Added ability to parse sequence data in GFF3 - see NOTES section & > email to bioperl list for details Great! > +If you call > + > + $gffio->ignore_sequence_data_toggle(1) > + > +prior to parsing the sequence data is ignored; this is useful if you > +just want the features. It avoids the memory overhead in building and > +caching sequences Maybe just $gffio->ignore_sequence(1) would be sufficient? We tend to not add "_toggle" to every attribute; besides which "toggle" has the semantics that every time you call it, the value switches. > +Alternatively, you can call either > + > + $gffio->get_all_seqs() Again, would $gffio->get_seqs() suffice? > + $gffio->seq_id_by_h() Why have two separate APIs to get the same data? If you want to provide a hashref instead of an array of seqs, use the calling context of get_seqs() ... > +Note that these objects will not have the features attached - you have > +to do this yourself, OR call > + > + $gffio->features_attached_to_seqs_toggle(1) Again, $gffio->attach_features(1) seems sufficient ... > +Note that auto-attaching the features to seqs will incur a higher > +memory overhead as the features must be cached until the sequence data > +is found Which would be the same if you "had to do this yourself". I think it's fair that if a sequence is to have 100 features attached to it, that those 100 features will require memory. There's no *extra* memory overhead here, is there? 
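A minimal pure-Perl sketch of the two conventions Aaron describes: a plain get/set accessor (calling it repeatedly with the same argument leaves the value unchanged, unlike a toggle), and a single `get_seqs` whose return shape depends on calling context via `wantarray`. The class name `SketchGFFParser` and its method bodies are hypothetical; this is not the Bio::Tools::GFF implementation.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; NOT Bio::Tools::GFF.
package SketchGFFParser;

sub new {
    my ($class, %seqs) = @_;    # %seqs maps sequence id => residue string
    return bless { _seqs => {%seqs}, _ignore_sequence => 0 }, $class;
}

# Plain get/set accessor: an argument sets the flag, no argument just
# reads it. Calling it twice with 1 leaves it at 1 (no toggling).
sub ignore_sequence {
    my $self = shift;
    $self->{_ignore_sequence} = shift if @_;
    return $self->{_ignore_sequence};
}

# One method, two return shapes: a list of sequences in list context,
# or a hashref keyed by id in scalar context.
sub get_seqs {
    my $self = shift;
    my @ids  = sort keys %{ $self->{_seqs} };
    return map { $self->{_seqs}{$_} } @ids if wantarray;
    my %copy = %{ $self->{_seqs} };
    return \%copy;
}

package main;

my $parser = SketchGFFParser->new(ctg1 => 'ACGT', ctg2 => 'GGCC');
$parser->ignore_sequence(1);
$parser->ignore_sequence(1);        # still 1: setting is idempotent
my $flag  = $parser->ignore_sequence;
my @list  = $parser->get_seqs;      # list context: sequences sorted by id
my $by_id = $parser->get_seqs;      # scalar context: hashref keyed by id
print "flag=$flag list=@list ctg2=$by_id->{ctg2}\n";
```

Whether a real parser should hand back a list or a hashref in each context is the module authors' call; the sketch only shows that one method name can serve both kinds of caller.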
> +=head1 TODO
> +
> +Make a Bio::SeqIO class specifically for GFF3 with sequence data

This would lead to a much cleaner API, and could now easily be done via
your improvements to Bio::Tools::GFF.

As an aside, instead of reimplementing your own simple FASTA parser, is
it possible to pass along the Bio::Root::IO object to Bio::SeqIO::fasta
directly, and let it do the work?

Thanks,

-Aaron

From heikki at ebi.ac.uk Tue Jul 13 09:34:12 2004
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul 13 09:36:12 2004
Subject: [Bioperl-l] multiple species in embl
In-Reply-To: 
References: 
Message-ID: <200407131434.12196.heikki@ebi.ac.uk>

Laurie,

By two species, do you mean hybrid animals? That is the only case where
there should be more than one species in EMBL entries:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3.4.7

Even in that case the OC line is there only for the first species. I am
not quite sure what bioperl should return in that case. Returning two
species objects sounds a bit excessive when the second one is not fully
populated...

It is a long-known problem that SWISS-PROT format allows multiple
species per entry. Bioperl has been taking in only one; the first, I
think.

Could you send us some EMBL accession numbers with two species, please,
so that we could have a look.

-Heikki

P.S. These kinds of long bug reports and file attachments go best into
bioperl bugzilla: http://bugzilla.open-bio.org/. They are easier to
manage there. Thanks, -H

On Monday 12 Jul 2004 18:00, Laure.Durufle@serono.com wrote:
> Hi,
>
> I noticed something: in the package embl.pm, the method species
> returns only the last organism, but in EMBL one entry can belong to 2
> organisms.
> I write a method get_species to obtain all organisms in RichSeq.pm and in > embl.pm, we add push @{$params{'-species'}},$species ; instead > $params{'-species'} = $species ;

-- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason at cgt.duhs.duke.edu Tue Jul 13 09:47:49 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jul 13 09:49:52 2004 Subject: [Bioperl-l] new objects Message-ID: I added Bio::Tools::Run::Alignment::Muscle for running Bob Edgar's MUSCLE multiple sequence alignment application. The wrapper module is bare-bones and needs some more work to support all the features in the app. I'd love for someone else to help with that. I also added Jamie Hatfield and Cari Soderlund's code for parsing and manipulating FPC mapping data. This is in Bio::Map::(Clone|Contig|FPCMarker) and Bio::MapIO::fpc. -jason -- Jason Stajich Duke University jason at cgt.mc.duke.edu From cjm at fruitfly.org Mon Jul 12 23:18:59 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Wed Jul 14 01:02:18 2004 Subject: [Bioperl-l] Added sequence parsing code to Bio::Tools::GFF Message-ID: I have added sequence parsing code to the GFF parser; note that sequence data is only available in GFF3. It should now be possible to create a Bio::SeqIO::gff3 class, which would be a short wrapper to Bio::Tools::GFF. Most people would still want to use the Tools parser to parse on a per-feature basis, but the option of treating gff3 in a similar fashion to genbank/embl/chadoxml/etc via SeqIO would be there.
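The GFF3 layout under discussion can be sketched in plain Perl. This is a hypothetical helper (`split_gff3`), not the Bio::Tools::GFF parser. It assumes, as Chris concludes later in this thread, that all sequence data follows a `##FASTA` directive at the end of the file. It collects tab-separated feature rows, concatenates the wrapped FASTA lines for each sequence id, and then extracts one feature's subsequence using GFF3's 1-based, inclusive start/end columns.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Assumes every sequence comes
# after a "##FASTA" line at the end of the GFF3 stream.
sub split_gff3 {
    my ($fh) = @_;
    my (@features, %seqs, $id);
    my $in_fasta = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if (!$in_fasta) {
            if ($line eq '##FASTA') { $in_fasta = 1; next }
            next if $line =~ /^#/;              # other directives, comments
            push @features, [ split /\t/, $line ];
        }
        elsif ($line =~ /^>(\S+)/) {
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {
            $seqs{$id} .= $line;                # join wrapped sequence lines
        }
    }
    return (\@features, \%seqs);
}

my $gff3 = join '', map { $_ . "\n" }
    '##gff-version 3',
    join("\t", qw(ctg1 demo gene 2 7 . + .), 'ID=gene1'),
    '##FASTA',
    '>ctg1',
    'ACGTAC',
    'GTTGCA';

open my $fh, '<', \$gff3 or die "cannot open in-memory string: $!";
my ($features, $seqs) = split_gff3($fh);
close $fh;

# GFF3 start/end (columns 4 and 5) are 1-based and inclusive.
my ($seqid, $start, $end) = @{ $features->[0] }[0, 3, 4];
my $gene_seq = substr($seqs->{$seqid}, $start - 1, $end - $start + 1);
print scalar(@$features), " feature(s); $seqid = $seqs->{$seqid}; gene1 = $gene_seq\n";
```

Because nothing is emitted until the stream ends, this simple design holds every feature in memory. That is the same caching trade-off the thread discusses for attaching features to sequences.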
According to the GFF3 spec, the sequence data can come after or before the
relevant features; this means that the parser has the potential to be a
memory hog (but then the existing SeqIO classes already are with genbank
whole-chromosome entries).

I've included the new docs from the gff parser below; if people agree
with this general means of handling sequence data then I'll go ahead and
add a Bio::SeqIO::gff3 as well.

=head1 GFF3 AND SEQUENCE DATA

[added by cjm 2004/07/09]

GFF3 supports sequence data; see

http://song.sourceforge.net/gff3-jan04.shtml

There are a number of ways to deal with this.

If you call

  $gffio->ignore_sequence_data_toggle(1)

prior to parsing, the sequence data is ignored; this is useful if you
just want the features. It avoids the memory overhead in building and
caching sequences.

Alternatively, you can call either

  $gffio->get_all_seqs()

Or

  $gffio->seq_id_by_h()

At the B<end> of parsing to get either a list or hashref of Bio::Seq
objects (see the documentation for each of these methods).

Note that these objects will not have the features attached; you have
to do this yourself, OR call

  $gffio->features_attached_to_seqs_toggle(1)

PRIOR to parsing; this will ensure that the Seqs have the features
attached, i.e. you will then be able to call

  $seq->get_SeqFeatures();

and use Bio::SeqIO methods.

Note that auto-attaching the features to seqs will incur a higher
memory overhead, as the features must be cached until the sequence data
is found.

=cut

From cjm at fruitfly.org Tue Jul 13 12:42:24 2004
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed Jul 14 01:02:20 2004
Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
In-Reply-To: <96FCF1F6-D4C6-11D8-9AA4-000A9577009E@pcbi.upenn.edu>
Message-ID: 

On Tue, 13 Jul 2004, Aaron J. Mackey wrote:

>
> On Jul 12, 2004, at 11:30 PM, Chris Mungall wrote:
>
> > Added ability to parse sequence data in GFF3 - see NOTES section &
> > email to bioperl list for details
>
> Great!
> > > +If you call > > + > > + $gffio->ignore_sequence_data_toggle(1) > > + > > +prior to parsing the sequence data is ignored; this is useful if you > > +just want the features. It avoids the memory overhead in building and > > +caching sequences > > Maybe just $gffio->ignore_sequence(1) would be sufficient? We tend to > not add "_toggle" to every attribute; besides which "toggle" has the > semantics that every time you call it, the value switches. OK, will change > > +Alternatively, you can call either > > + > > + $gffio->get_all_seqs() > > Again, would $gffio->get_seqs() suffice? OK > > + $gffio->seq_id_by_h() > > Why have two separate APIs to get the same data? If you want to > provide a hashref instead of an array of seqs, use the calling context > of get_seqs() ... OK; I will keep get_id_by_h private (it's used within the module) > > +Note that these objects will not have the features attached - you have > > +to do this yourself, OR call > > + > > + $gffio->features_attached_to_seqs_toggle(1) > > > Again, $gffio->attach_features(1) seems sufficient ... OK; although one might be led to expect that the argument for a method with that name would be a list of SeqFeatures. Is the BP method name syntax enshrined anywhere, or is it more a general set of principles shared by the authors? > > +Note that auto-attaching the features to seqs will incur a higher > > +memory overhead as the features must be cached until the sequence data > > +is found > > Which would be the same if you "had to do this yourself". I think it's > fair that if a sequence is to have 100 features attached to it, that > those 100 features will require memory. There's no *extra* memory > overhead here, is there? 
Generally not, but the client app may wish to do something with the
feature or seq and then immediately discard it, in which case the
caching would not be required.

> > +=head1 TODO
> > +
> > +Make a Bio::SeqIO class specifically for GFF3 with sequence data
>
> This would lead to a much cleaner API, and could now easily be done via
> your improvements to Bio::Tools::GFF

OK, will add

> As an aside, instead of reimplementing your own simple FASTA parser, is
> it possible to pass along the Bio::Root::IO object to Bio::SeqIO::fasta
> directly, and let it do the work?

Hmm, when I wrote the parser I wrote it in such a way that the sequence
data could be interspersed with the feature data. It seems that this is
unnecessary, as the spec states that the sequence data must come at the
very end of the file.

So perhaps I should re-engineer it a bit so that it rejects anything
that doesn't follow the spec. This makes it easier to use the FASTA
parser.

> Thanks,
>
> -Aaron

From voisingreg at yahoo.fr Tue Jul 13 16:06:11 2004
From: voisingreg at yahoo.fr (gregory voisin)
Date: Wed Jul 14 01:02:25 2004
Subject: [Bioperl-l] remote blast and problem of connection on ncbi
Message-ID: <20040713200611.89415.qmail@web60406.mail.yahoo.com>

Hi,

A short introduction: my name is Greg, a student in bioinformatics at
UQAM (Montreal), and I am a newbie in Perl and BioPerl. I am discovering
the world of Perl with a lot of pleasure, but its problems too.

My problem: I use a script to BLAST a sequence with RemoteBlast.pm.
After modifying this module (line 168), I have another problem, and my
searches on Google and in the discussion lists have been unsuccessful.

When I run my Perl script, I get the message:

500 can't connect to ncbi.nlm.nih.gov:80

This is my script:

use strict;
use Bio::SearchIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::Perl;
use Bio::SeqIO;

# line fixing the bug at line 168 in RemoteBlast.pm
# (info from the open-bio bug database)
$Bio::Tools::Run::RemoteBlast::RIDLINE = 'RID\s+=\s+(\S+)';

my @inputfiles = @ARGV;
foreach my $inputfile (@inputfiles) {
    my $prog = 'blastn';
    #my $db = 'gbEST';
    my $e_val = '1e-10';
    my @params = ( '-prog' => $prog,
                   ' -data ' => 'est_mouse',
                   ' -expect' => $e_val,
                   ' -readmethod' => ' SearchIO' );
    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
    my $v = 1;
    my $chemin = "C:\\Documents and Settings\\voisingreg\\Mes documents\\serveurFTP\\perl\\progPerl\\";
    my $str = Bio::SeqIO->new('-file' => $chemin . "$inputfile",
                              '-format' => 'fasta');
    $Bio::Tools::Run::RemoteBlast::HEADER{'DATABASE'} = 'est_mouse';
    $Bio::Tools::Run::RemoteBlast::HEADER{'PROGRAM'} = 'blastn';
    while (my $input = $str->next_seq()) {
        my $r = $factory->submit_blast($input);
        print STDERR "waiting..." if ( $v > 0 );
        while ( my @rids = $factory->each_rid ) {
            foreach my $rid ( @rids ) {
                my $rc = $factory->retrieve_blast($rid);
                if ( !ref($rc) ) {
                    if ( $rc < 0 ) {
                        $factory->remove_rid($rid);
                    }
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                } else {
                    my $result = $rc->next_result();
                    #save the output
                    #my $filename = $result->query_name()."\.out";
                    print "$inputfile";
                    $factory->save_output($chemin . "<$inputfile");
                    $factory->remove_rid($rid);
                }
            }
        }
    }
}

Thanks for your help,
Greg
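Separately from the connection error, several named parameters in Greg's script contain stray spaces (`' -data '`, `' -expect'`, `' -readmethod'`, and the value `' SearchIO'`). The pure-Perl sketch below uses a hypothetical stand-in, `parse_named_args`. It is modeled on, but not identical to, the way BioPerl's `_rearrange` normalizes argument names (as I understand it, by uppercasing them and deleting dashes). It shows that keys with embedded whitespace would likely never match and would be silently dropped.

```perl
use strict;
use warnings;

# Hypothetical stand-in for BioPerl's named-argument handling: each key is
# uppercased and its dashes deleted, but whitespace is left in place.
sub parse_named_args {
    my ($order, @args) = @_;
    my %param;
    while (@args) {
        (my $key = shift @args) =~ tr/a-z-/A-Z/d;
        $param{$key} = shift @args;
    }
    return map { $param{$_} } @$order;
}

my @order     = qw(PROG DATA EXPECT READMETHOD);
my @as_posted = ('-prog' => 'blastn', ' -data ' => 'est_mouse',
                 ' -expect' => '1e-10', ' -readmethod' => ' SearchIO');
my @cleaned   = ('-prog' => 'blastn', '-data' => 'est_mouse',
                 '-expect' => '1e-10', '-readmethod' => 'SearchIO');

my @got_posted  = parse_named_args(\@order, @as_posted);
my @got_cleaned = parse_named_args(\@order, @cleaned);

for my $row (['as posted', \@got_posted], ['cleaned', \@got_cleaned]) {
    my ($label, $vals) = @$row;
    print "$label: ", join(', ', map { defined $_ ? $_ : 'undef' } @$vals), "\n";
}
```

In the script, writing the keys without spaces (for example `'-data' => 'est_mouse'`) should let those settings take effect. The "500 can't connect" message itself is the kind of error LWP reports when it cannot open a TCP connection at all, so a proxy or firewall is worth ruling out; this sketch does not address that. Also, `$chemin . "<$inputfile"` puts a literal `<` into the output file name, and Windows does not allow that character in file names.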
From s.paul at surrey.ac.uk Wed Jul 14 17:18:33 2004
From: s.paul at surrey.ac.uk (S.Paul)
Date: Wed Jul 14 09:17:46 2004
Subject: [Bioperl-l] installation of bioperl-db
References: <003901c46611$aed8d6a0$d46fe383@LTCEP1SP> <40EFF789.1010305@genetics.utah.edu>
Message-ID: <02a201c469e8$1f14c300$d46fe383@LTCEP1SP>

Thanks to all who helped me out with the installation, especially Barry,
Jason and Hilmar. I downloaded the latest bioperl-db from CVS and it
worked. Actually, I read the documentation and found that it does
describe installation from http://bioperl.org/DIST/, so it's my mistake
for not having read it well.

Barry, I couldn't access the link provided at U of Winnipeg: I was
getting a "cannot find server" error message. Maybe the server is down.

Regards
Sujoy

Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk

----- Original Message -----
From: "Barry Moore"
To: "S.Paul"
Cc: "bioperl-l"
Sent: Saturday, July 10, 2004 7:04 AM
Subject: Re: [Bioperl-l] installation of bioperl-db

> Sujoy-
>
> You can remove that repository you added. The problem is that you had
> the wrong address, and there is nothing at that address. One other
> problem that you will have is that there is one ppm file missing at the
> BioPerl repository (the one for installing GD). If you also add Randy
> Kobes's repository at the University of Winnipeg, you should have
> everything you need for a smooth installation.
Try the following: > > rep add BioPerl http://bioperl.org/DIST/ > rep add uwinnipeg http://theoryx5.uwinnipeg.ca/ppms/ > install Bioperl-1.4 > > Barry > > S.Paul wrote: > > >Hi Everybody: > > > >I am trying to install bioperl-db-0.1 using Active State Perl on Windows 2000 but am getting the following error message: > > > >Error: Failed to download URL http://www.bioperl.org/Core/Latest/index.shtml/bio > >perldb.ppd: 404 Not Found > > > >This is what I did : > > > > > > > >>ppm > >> > >> > >rep add bioperldb http://www.bioperl.org/Core/Latest/index.shtml > >install bioperldb > > > >I would appreciate if somebody can help me in this regard. > > > >Thanks > > > >Sujoy Paul > >Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk > > > > > >------------------------------------------------------------------------ > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Barry Moore > Dept. of Human Genetics > University of Utah > Salt Lake City, UT > > ---------------------------------------------------------------------------- ---- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From amackey at pcbi.upenn.edu Wed Jul 14 10:37:30 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Jul 14 10:39:19 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: References: Message-ID: <5615DFC6-D5A3-11D8-A5C1-000A9577009E@pcbi.upenn.edu> On Jul 13, 2004, at 12:42 PM, Chris Mungall wrote: >>> + $gffio->features_attached_to_seqs_toggle(1) >> >> Again, $gffio->attach_features(1) seems sufficient ... > > OK; although one might be led to expect that the argument for a method > with that name would be a list of SeqFeatures. 
Is the BP method name > syntax enshrined anywhere, or is it more a general set of principles > shared by the authors? You're right. No, there is no shrine, I think it's like the difference between art and pornography; we know it when we see it. Perhaps "$gffio->features_attached_to_seqs(1)" is the necessarily-long-but-best answer? > So perhaps I should reeingineer it a bit so that it rejects anything > that > doesn't follow the spec. This makes it easier to use the FASTA parser. I think that'd be great, and should simplify matters a bit. Thanks again, -Aaron From bioinfo_j at yahoo.co.in Wed Jul 14 07:49:43 2004 From: bioinfo_j at yahoo.co.in (=?iso-8859-1?q?j=20janani?=) Date: Wed Jul 14 14:05:07 2004 Subject: [Bioperl-l] help us Message-ID: <20040714114943.27231.qmail@web8304.mail.in.yahoo.com> sir, we r planning to use a GUI in bioperl instead of using commandline for better accessment.kindly give us some basic ideas & ur suggestions on that. Yahoo! India Careers: Over 65,000 jobsonline. From pleo at mail.nih.gov Wed Jul 14 13:52:04 2004 From: pleo at mail.nih.gov (Leo, Paul (NIH/NHGRI)) Date: Wed Jul 14 14:05:09 2004 Subject: [Bioperl-l] Using a hash reference in the SeqFeature tag system Message-ID: <0E3E7E8F6E23DF4C8127A063568356B5084A2997@nihexchange12.nih.gov> Hi, I want to add additional information of a $seq object using the tag system. The tag value I want to add is a reference to a hash of hashes (though I don't think it worked for just a hash either). I thought that so long as the tag value was a scalar (i.e. a reference) then all would be ok? Perhaps I have done something silly ....and this is a simple perl programming mistake... But have been staring at this for too long! Can someone tell me why the example below fails? If so, any ideas on how to add this hash structure to a $seq object ?? 
use Bio::DB::GenPept;
use Bio::SeqFeature::Generic;
use Bio::Seq;
use Bio::SeqIO;

######## make a hash of hashes ###########
$Data{cage1}{mouse1}{legs}=1;
$Data{cage1}{mouse1}{tails}=10;
$Data{cage1}{mouse2}{legs}=2;
$Data{cage1}{mouse2}{tails}=20;
$Data{cage2}{mouse1}{legs}=3;
$Data{cage2}{mouse1}{tails}=30;
$Data{cage2}{mouse2}{legs}=4;
$Data{cage2}{mouse2}{tails}=40;

print $Data{cage1}{mouse2}{legs}, "\n"; #just a check: gives "2"
print map "$_ ",keys(%{$Data{cage2}{mouse2}}); #just a check: gives "legs tails"
print "\n";
########################################

#### make a hash reference and put it into the tag ###
$href=\%{$Data{cage1}{mouse1}}; #make ref here to a hash since used in tests below
$seqextra=new Bio::SeqFeature::Generic ( -tag => { experiment_id => 1,
                                                   Cage => $href });

###############Sanity Checks...
print "Ref ok : \n";
print map "$_ ",keys(%{$href}); #prints "legs tails" as expected
print "\n";

print "Ref in a simple hash ok : \n";
$hashtest{zero}=$href;
print map "$_ ",keys(%{$hashtest{zero}}); #prints "legs tails" as expected

############ This fails....
print "\n";
print "Tag in not ? \n";
print map "$_ ",keys(%{$seqextra->get_tag_values('Cage')}); #NOTHING!

Any suggestions?
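A likely explanation for the empty output, assuming `get_tag_values` returns a plain list as documented for Bio::SeqFeature::Generic: a dereference such as `%{ $seqextra->get_tag_values('Cage') }` calls the method in scalar context. The list therefore collapses to its element count (1), and without `use strict` Perl quietly treats `1` as the name of an empty global hash. Unpacking the result in list context recovers the stored reference. The sketch below uses a hypothetical stand-in class, `SketchFeature`, rather than BioPerl itself, so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT Bio::SeqFeature::Generic: it stores several
# values per tag and returns them as a plain list.
package SketchFeature;

sub new { my ($class) = @_; return bless { tags => {} }, $class }

sub add_tag_value {
    my ($self, $tag, @values) = @_;
    push @{ $self->{tags}{$tag} }, @values;
    return;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{tags}{$tag} || [] };
}

package main;

my %data;
$data{cage1}{mouse1}{legs}  = 1;
$data{cage1}{mouse1}{tails} = 10;

my $feat = SketchFeature->new;
$feat->add_tag_value(Cage => $data{cage1}{mouse1});

# Scalar context: the list becomes its element count, not the hashref.
my $count = $feat->get_tag_values('Cage');

# List context: take the first value, which is the stored hashref.
my ($cage) = $feat->get_tag_values('Cage');
my @keys   = sort keys %$cage;

print "count=$count keys=@keys\n";
```

Applied to the original test, `my ($cage) = $seqextra->get_tag_values('Cage'); print map "$_ ", keys %$cage;` should print the keys, assuming the constructor stored the hashref as a single tag value.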
Thanks in advance
Paul

From s.paul at surrey.ac.uk Wed Jul 14 17:35:51 2004
From: s.paul at surrey.ac.uk (S.Paul)
Date: Wed Jul 14 14:05:13 2004
Subject: [Bioperl-l] Flat databases
Message-ID: <02bb01c469ea$89be2460$d46fe383@LTCEP1SP>

Hi Everybody:

I was trying to follow the HowTo for Flat databases and tried to run the
following command:

bp_bioflat_index.pl -c -l c:\research\perl\genbank -d genbank -i flat -f genbank data/*.gbk

I installed the bioflat_index.pls script and then tried to run the above
command, but I am getting the following error message:

**********************************************************************

C:\research\perl\genbank>bp_bioflat_index.pl -c -l c:\research\perl\genbank -d genbank -i flat -f genbank data/*.gbk

------------- EXCEPTION -------------
MSG: Can't locate Bio/DB/Flat/Flat/genbank.pm in @INC (@INC contains: C:/Perl/lib C:/Perl/site/lib .) at (eval 4) line 2.
BEGIN failed--compilation aborted at (eval 4) line 2.

STACK Bio::DB::Flat::new C:/Perl/site/lib/Bio/DB/Flat.pm:140
STACK toplevel C:\research\perl\genbank\bp_bioflat_index.pl:89

*****************************************************************************

Thanks in advance for the help

Sujoy

Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk

From lstein at cshl.edu Wed Jul 14 14:19:28 2004
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Jul 14 14:21:57 2004
Subject: [Bioperl-l] Flat databases
In-Reply-To: <02bb01c469ea$89be2460$d46fe383@LTCEP1SP>
References: <02bb01c469ea$89be2460$d46fe383@LTCEP1SP>
Message-ID: <200407141419.28614.lstein@cshl.edu>

By the way, I advise that you use the Berkeley DB interface, if
possible. The "flat" indexing scheme is quite slow.
Lincoln On Wednesday 14 July 2004 05:35 pm, S.Paul wrote: > Hi Everybody: > > I was trying to follow the HowTo for Flat databases and try to run > the following command: bp_bioflat_index.pl -c -l > c:\research\perl\genbank -d g > enbank -i flat -f genbank data/*.gbk > > I installed the bioflat_index.pls script and then tried to run the > above command but am getting the following error message: > > ******************************************************************* >*** > > C:\research\perl\genbank>bp_bioflat_index.pl -c -l > c:\research\perl\genbank -d g enbank -i flat -f genbank data/*.gbk > > ------------- EXCEPTION ------------- > MSG: Can't locate Bio/DB/Flat/Flat/genbank.pm in @INC (@INC > contains: C:/Perl/li b C:/Perl/site/lib .) at (eval 4) line 2. > BEGIN failed--compilation aborted at (eval 4) line 2. > > STACK Bio::DB::Flat::new C:/Perl/site/lib/Bio/DB/Flat.pm:140 > STACK toplevel C:\research\perl\genbank\bp_bioflat_index.pl:89 > > ******************************************************************* >********** > > Thanks in advance for the help > > Sujoy > > > > Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Wed Jul 14 14:18:40 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jul 14 14:22:05 2004 Subject: [Bioperl-l] Flat databases In-Reply-To: <02bb01c469ea$89be2460$d46fe383@LTCEP1SP> References: <02bb01c469ea$89be2460$d46fe383@LTCEP1SP> Message-ID: <200407141418.40020.lstein@cshl.edu> Seems to work OK for me. What version of bioperl are you using? 
Lincoln On Wednesday 14 July 2004 05:35 pm, S.Paul wrote: > Hi Everybody: > > I was trying to follow the HowTo for Flat databases and try to run > the following command: bp_bioflat_index.pl -c -l > c:\research\perl\genbank -d g > enbank -i flat -f genbank data/*.gbk > > I installed the bioflat_index.pls script and then tried to run the > above command but am getting the following error message: > > ******************************************************************* >*** > > C:\research\perl\genbank>bp_bioflat_index.pl -c -l > c:\research\perl\genbank -d g enbank -i flat -f genbank data/*.gbk > > ------------- EXCEPTION ------------- > MSG: Can't locate Bio/DB/Flat/Flat/genbank.pm in @INC (@INC > contains: C:/Perl/li b C:/Perl/site/lib .) at (eval 4) line 2. > BEGIN failed--compilation aborted at (eval 4) line 2. > > STACK Bio::DB::Flat::new C:/Perl/site/lib/Bio/DB/Flat.pm:140 > STACK toplevel C:\research\perl\genbank\bp_bioflat_index.pl:89 > > ******************************************************************* >********** > > Thanks in advance for the help > > Sujoy > > > > Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From jason at cgt.duhs.duke.edu Wed Jul 14 14:24:26 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jul 14 14:26:27 2004 Subject: [Bioperl-l] Using a hash reference in the SeqFeature tag system In-Reply-To: <0E3E7E8F6E23DF4C8127A063568356B5084A2997@nihexchange12.nih.gov> References: <0E3E7E8F6E23DF4C8127A063568356B5084A2997@nihexchange12.nih.gov> Message-ID: On Wed, 14 Jul 2004, Leo, Paul (NIH/NHGRI) wrote: > > Hi, > I want to add additional information of a $seq object using the tag system. > The tag value I want to add is a reference to a hash of hashes (though I > don't think it worked for just a hash either). > I thought that so long as the tag value was a scalar (i.e. a reference) then > all would be ok? 
> Perhaps I have done something silly ....and this is a simple perl > programming mistake... But have been staring at this for too long! > Can someone tell me why the example below fails? If so, any ideas on how to > add this hash structure to a $seq object ?? Do you want them to be serializable (i.e. do you want to be able to write out your hashref in GenBank format)? Because just adding a hashref won't really work correctly for that. > > > > use Bio::DB::GenPept; > use Bio::SeqFeature::Generic; > use Bio::Seq; > use Bio::SeqIO; > > > ######## make a hash of hashes ########### > $Data{cage1}{mouse1}{legs}=1; > $Data{cage1}{mouse1}{tails}=10; > > $Data{cage1}{mouse2}{legs}=2; > $Data{cage1}{mouse2}{tails}=20; > > $Data{cage2}{mouse1}{legs}=3; > $Data{cage2}{mouse1}{tails}=30; > > $Data{cage2}{mouse2}{legs}=4; > $Data{cage2}{mouse2}{tails}=40; > > print $Data{cage1}{mouse2}{legs}, "\n"; #just a check :gives "2" > print map "$_ ",keys(%{$Data{cage2}{mouse2}}); #just a check gives "legs > tails" > print "\n"; > ######################################## > > #### make a hash reference and put in into tag ### > $href=\%{$Data{cage1}{mouse1}}; #make ref here to a hash since used in tests > below > $seqextra=new Bio::SeqFeature::Generic ( -tag => { > experiment_id => 1, > Cage => $href }); > > ###############Sanity Checks... > print "Ref ok : \n"; > print map "$_ ",keys(%{$href}); #prints "legs tails" as expected I think your problem is all about context. get_tag_values returns a list, even if there is only one tag value. You need to either tell perl you want to operate on the first value from that list or, more simply, grab that first value and then do things like calling keys.
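Jason's point about context can be shown without BioPerl at all. In this minimal sketch, `TagStore` and `tag_values` are hypothetical stand-ins, not BioPerl code; the only assumption carried over from the replies is that the accessor returns a list of values, the way get_tag_values does. A list assignment keeps the first element (the hashref), while a scalar assignment gets the element count instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature's tag store (not BioPerl).
# Like get_tag_values, tag_values() returns a *list* of values.
package TagStore;
sub new {
    my ($class, %tags) = @_;
    return bless { tags => \%tags }, $class;
}
sub tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{tags}{$tag} || [] };   # list of values for this tag
}

package main;

my $store = TagStore->new(Cage => [ { legs => 1, tails => 10 } ]);

# List assignment: $cage receives the first value, i.e. the hashref.
my ($cage) = $store->tag_values('Cage');
my @keys = sort keys %$cage;
print "@keys\n";    # legs tails

# Scalar assignment: an array returned in scalar context yields its
# element count, not the element itself.
my $count = $store->tag_values('Cage');
print "$count\n";   # 1
```

The parentheses in `my ($cage) = ...` are what make the difference; that is the same fix both Jason and Lincoln suggest.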
my $feat = Bio::SeqFeature::Generic->new(-tag => { 'Cage' => { 'one' => 'tail'}});
my ($val) = $feat->get_tag_values('Cage');
print join(" ", keys %$val), "\n";
$val->{'two'} = 'nose';
my ($val2) = $feat->get_tag_values('Cage');
# val got updated because we were operating on a hashref before
print join(" ", keys %$val2), "\n";

n.b. if you call add_tag_value and there is already an existing value for the tag, there will now be two values stored. > print "\n"; > > print "Ref in a simple hash ok : \n"; > $hashtest{zero}=$href; > print map "$_ ",keys(%{$hashtest{zero}}); #prints "legs tails" as expected > ############ This fails.... > print "\n"; > print "Tag in not ? \n"; > print map "$_ ",keys(%{$seqextra->get_tag_values('Cage')}); #NOTHING! >
# try this
my ($val) = $seqextra->get_tag_values('Cage');
print join(" ", keys %$val),"\n";
> Any suggestions? > Thanks in advance > Paul > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From lstein at cshl.edu Wed Jul 14 14:27:42 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jul 14 14:30:05 2004 Subject: [Bioperl-l] Using a hash reference in the SeqFeature tag system In-Reply-To: <0E3E7E8F6E23DF4C8127A063568356B5084A2997@nihexchange12.nih.gov> References: <0E3E7E8F6E23DF4C8127A063568356B5084A2997@nihexchange12.nih.gov> Message-ID: <200407141427.42652.lstein@cshl.edu> get_tag_values returns a list! Your example will work if you do this:
my ($cage) = $seqextra->get_tag_values('Cage'); # keep first element
print keys %$cage;
Lincoln On Wednesday 14 July 2004 01:52 pm, Leo, Paul (NIH/NHGRI) wrote: > Hi, > I want to add additional information of a $seq object using the tag > system. The tag value I want to add is a reference to a hash of > hashes (though I don't think it worked for just a hash either). > I thought that so long as the tag value was a scalar (i.e. a > reference) then all would be ok? > Perhaps I have done something silly ....and this is a simple perl > programming mistake...
But have been staring at this for too long! > Can someone tell me why the example below fails? If so, any ideas > on how to add this hash structure to a $seq object ?? > > > > use Bio::DB::GenPept; > use Bio::SeqFeature::Generic; > use Bio::Seq; > use Bio::SeqIO; > > > ######## make a hash of hashes ########### > $Data{cage1}{mouse1}{legs}=1; > $Data{cage1}{mouse1}{tails}=10; > > $Data{cage1}{mouse2}{legs}=2; > $Data{cage1}{mouse2}{tails}=20; > > $Data{cage2}{mouse1}{legs}=3; > $Data{cage2}{mouse1}{tails}=30; > > $Data{cage2}{mouse2}{legs}=4; > $Data{cage2}{mouse2}{tails}=40; > > print $Data{cage1}{mouse2}{legs}, "\n"; #just a check :gives "2" > print map "$_ ",keys(%{$Data{cage2}{mouse2}}); #just a check gives > "legs tails" > print "\n"; > ######################################## > > #### make a hash reference and put in into tag ### > $href=\%{$Data{cage1}{mouse1}}; #make ref here to a hash since used > in tests below > $seqextra=new Bio::SeqFeature::Generic ( -tag => { > experiment_id => 1, > Cage => $href }); > > ###############Sanity Checks... > print "Ref ok : \n"; > print map "$_ ",keys(%{$href}); #prints "legs tails" as expected > print "\n"; > > print "Ref in a simple hash ok : \n"; > $hashtest{zero}=$href; > print map "$_ ",keys(%{$hashtest{zero}}); #prints "legs tails" as > expected ############ This fails.... > print "\n"; > print "Tag in not ? \n"; > print map "$_ ",keys(%{$seqextra->get_tag_values('Cage')}); > #NOTHING! > > Any suggestions? > Thanks in advance > Paul -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From lstein at cshl.edu Wed Jul 14 16:17:45 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Jul 14 16:21:05 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: <5615DFC6-D5A3-11D8-A5C1-000A9577009E@pcbi.upenn.edu> References: <5615DFC6-D5A3-11D8-A5C1-000A9577009E@pcbi.upenn.edu> Message-ID: <200407141617.45269.lstein@cshl.edu> I like the idea of the toggle, but we don't use the term "toggle" anywhere in bioperl right now. How about "allow" as in: $gffio->allow_features_attached_to_seqs(1) It won't be used more than once per script, so not much overhead in having a long name. Lincoln On Wednesday 14 July 2004 10:37 am, Aaron J. Mackey wrote: > On Jul 13, 2004, at 12:42 PM, Chris Mungall wrote: > >>> + $gffio->features_attached_to_seqs_toggle(1) > >> > >> Again, $gffio->attach_features(1) seems sufficient ... > > > > OK; although one might be led to expect that the argument for a > > method with that name would be a list of SeqFeatures. Is the BP > > method name syntax enshrined anywhere, or is it more a general > > set of principles shared by the authors? > > You're right. No, there is no shrine, I think it's like the > difference between art and pornography; we know it when we see it. > Perhaps "$gffio->features_attached_to_seqs(1)" is the > necessarily-long-but-best answer? > > > So perhaps I should reengineer it a bit so that it rejects > > anything that > > doesn't follow the spec. This makes it easier to use the FASTA > > parser. > > I think that'd be great, and should simplify matters a bit. > > Thanks again, > > -Aaron > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From Wiepert.Mathieu at mayo.edu Wed Jul 14 15:41:41 2004 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Wed Jul 14 18:13:44 2004 Subject: [Bioperl-l] SeqFeatureCollection issue Message-ID: <2F41CC6C9777D311ACBD009027B108EA08C481BD@excsrv32.mayo.edu> Hi, I was trying to use the seqfeature collection to pull out features in a range I was interested in. I have two problems (maybe because I am loading features from a contig?) In the first case, I ended up running out of space on /var/tmp. We have about .5 GB there I am. Code is like
my $in1 = Bio::SeqIO->new('-file' => $contig.'.gb' , '-format' => 'Genbank');
while (my $seq = $in1->next_seq) {
    my @feat_ary = $seq->get_SeqFeatures();
    my $col = new Bio::SeqFeature::Collection();
    # add these features to the object
    my $totaladded = $col->add_features(\@feat_ary);
}
I end up filling /var/tmp to 100%, as I said. So I tried to initialize the collection like
my $col = new Bio::SeqFeature::Collection(-features => \@feat_ary);
but that gave an error: "Can't call method "put" on an undefined value at /usr/local/biotools/perl/5.8.2/lib/site_perl/5.8.2/Bio/SeqFeature/Collection.pm line 225, line 95373." That looked like the _btree wasn't set, but not sure. I am told we have plenty of room in /tmp, so I should change my tmp dir, but the docs said that it was all in memory by default, is that not the case? I tried to export a new tmp dir, but that didn't fix the problem... -mat From dustin.cram at gmail.com Wed Jul 14 18:22:28 2004 From: dustin.cram at gmail.com (Dustin Cram) Date: Wed Jul 14 18:24:28 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed Message-ID: I recently started using Bio::DB::GFF, beginning by using bp_bulk_load_gff.pl to load a simple but large gff2 file. This file consisted only of transcripts and their subfeatures, so the group class of all features was "transcript".
The files loaded with no problem and I was able to write a few successful test scripts. Now I have added new features (genes) to the gff file, and I attempted to load the new file exactly as before with bp_bulk_load_gff.pl, but now it takes _much_ longer to load, and takes more time the more features are added (the first 5K features take about 30 seconds, the next 5K features take nearly 2 minutes, and so on). It took over an hour to reach 50K features, at which point I stopped it. I've played around with the gff file a bit and found that anything that doesn't have a group class of "transcript" has this problem: for example, if I 'sed s/transcript/foo/g' the original file it's slow, and if I 'sed s/gene/transcript/g' the new file it's fast. I have manually verified that the MySQL database is empty before each attempt and even wiped the tmp directory before each attempt. Any ideas why non-transcript features take so long? Thanks, Dustin Cram From amackey at pcbi.upenn.edu Wed Jul 14 19:10:39 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Jul 14 19:12:27 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: References: Message-ID: <05BCFEA1-D5EB-11D8-A5C1-000A9577009E@pcbi.upenn.edu> Aha, I'm *not* crazy! I've experienced exactly this same behavior (I ended up "solving" it by batch loading in blocks of 500, which worked fine until my database got very big such that the initial group loading got too slow). What's your mysql version, perl version (usemymalloc?), and OS? I think this is a perl hash/memory issue, but I'd love to solve it now that I know it's not just something stupid I'm doing wrong. -Aaron On Jul 14, 2004, at 6:22 PM, Dustin Cram wrote: > I recently started using Bio:DB:GFF, beginning by using > bp_bulk_load_gff.pl to load a simple but large gff2 file. This file > consisted only of transcripts and their subfeatures, so the group > class of all features was "transcript".
The files loaded with no > problem and I was able to write a few successful test scripts. > > Now I have added new features (genes) to the gff file, and I > attempted to load the new file exactly as before with > bp_bulk_load_gff.pl, but now it takes _much_ longer to load, and takes > more time the more features are added (the first 5K features take > about 30 seconds, the next 5K features take nearly 2 minutes, and so > on). It took over an hour to 50K features, at which point I stopped > it. > > I've played around with the gff file a bit and found that anything > that doesn't have a group class of "transcript" has this problem, for > example if I 'sed s/transcript/foo/g' the original file it's slow, > and if I 'sed s/gene/transcript/g' the new file it's fast. I have > manually verified that the MySQL database is empty before each attempt > and even wiped the tmp directory before each attempt. > > Any ideas why non-transcript features take so long? > > Thanks, > > Dustin Cram > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From amackey at pcbi.upenn.edu Wed Jul 14 19:11:59 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Wed Jul 14 19:13:40 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: References: Message-ID: <357C4E94-D5EB-11D8-A5C1-000A9577009E@pcbi.upenn.edu> On Jul 14, 2004, at 6:22 PM, Dustin Cram wrote: > Any ideas why non-transcript features take so long? Just to add, I find that match/HSP features go in very quickly, while everything else does not (including transcript features) ... -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From dustin.cram at gmail.com Wed Jul 14 19:51:21 2004 From: dustin.cram at gmail.com (Dustin Cram) Date: Wed Jul 14 19:53:14 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: <05BCFEA1-D5EB-11D8-A5C1-000A9577009E@pcbi.upenn.edu> References: <05BCFEA1-D5EB-11D8-A5C1-000A9577009E@pcbi.upenn.edu> Message-ID: Heh, I was sure I had to be missing something obvious too, glad to see someone else has noticed this. I'll have to wait till I go in to work tomorrow to check exact versions, but MySQL is 3.23.x, perl is 5.8.x, and OS is Redhat 9. Dustin Cram On Wed, 14 Jul 2004 19:10:39 -0400, Aaron J. Mackey wrote: > > Aha, I'm *not* crazy! I've experienced exactly this same behavior (I > ended up "solving" it by batching loading in blocks of 500, which > worked fine until my database got very big such that the initial group > loading got too slow). > > What's your mysql version, perl version (usemymalloc?), and OS? I > think this is a perl hash/memory issue, but I'd love to solve it now > that I know it's not just something stupid I'm doing wrong. > > -Aaron > > > > On Jul 14, 2004, at 6:22 PM, Dustin Cram wrote: > > > I recently started using Bio:DB:GFF, beginning by using > > bp_bulk_load_gff.pl to load a simple but large gff2 file. This file > > consisted only of transcripts and their subfeatures, so the group > > class of all features was "transcript". The files loaded with no > > problem and I was able to write a few successful test scripts. > > > > Now I have added new features (genes) to the gff file, and I > > attempted to load the new file exactly as before with > > bp_bulk_load_gff.pl, but now it takes _much_ longer to load, and takes > > more time the more features are added (the first 5K features take > > about 30 seconds, the next 5K features take nearly 2 minutes, and so > > on). It took over an hour to 50K features, at which point I stopped > > it. 
> > > > I've played around with the gff file a bit and found that anything > > that doesn't have a group class of "transcript" has this problem, for > > example if I 'sed s/transcript/foo/g' the original file it's slow, > > and if I 'sed s/gene/transcript/g' the new file it's fast. I have > > manually verified that the MySQL database is empty before each attempt > > and even wiped the tmp directory before each attempt. > > > > Any ideas why non-transcript features take so long? > > > > Thanks, > > > > Dustin Cram > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > > From jason at cgt.duhs.duke.edu Wed Jul 14 21:58:03 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Jul 14 22:00:05 2004 Subject: [Bioperl-l] SeqFeatureCollection issue In-Reply-To: <2F41CC6C9777D311ACBD009027B108EA08C481BD@excsrv32.mayo.edu> References: <2F41CC6C9777D311ACBD009027B108EA08C481BD@excsrv32.mayo.edu> Message-ID: Did you try passing in a filename with -file => '/tmp/myfile.idx'? Title : new Usage : my $obj = new Bio::SeqFeature::Collection(); Function: Builds a new Bio::SeqFeature::Collection object Returns : Bio::SeqFeature::Collection Args : -minbin minimum value to use for binning (default is 100,000,000) -maxbin maximum value to use for binning (default is 1,000) -file filename to store/read the BTREE from rather than an in-memory structure (default is false and in-memory). -keep boolean, will not remove index file on object destruction. -features Array ref of features to add initially No idea where the /var/tmp is going... 
This *should* work but I haven't done much with it/used it for quite a while so I don't know if there are things that don't work... If it is really not working you can always go the route of converting to GFF and loading into Bio::DB::GFF using the in-memory adaptor - I wanted to merge the interface so that SeqFeature::Collection used the same method names but never got around to it. If someone is using the module, that would be a nice thing to have... -jason On Wed, 14 Jul 2004, Wiepert, Mathieu wrote: > Hi, > > > > I was trying to use the seqfeature collection to pull out features in a range I was interested in. I have two problems (maybe because I am loading features form a contig?) > > > > In the first case, I ended up running out of space on /var/tmp. We have about .5 GB there I am. Code is like > > my $in1 = Bio::SeqIO->new('-file' => $contig.'.gb' , '-format' => 'Genbank'); > > while (my $seq = $in1->next_seq) { > > my @feat_ary = $seq->get_SeqFeatures(); > > my $col = new Bio::SeqFeature::Collection(); > > # add these features to the object > > my $totaladded = $col->add_features(\@feat_ary); > > } > > > > I end up filling /var/tmp to 100%, as I said. > > > > So I tried to initialize the collection like > > my $col = new Bio::SeqFeature::Collection(-features => \@feat_ary); > > > > but that gave an error: > > > > "Can't call method "put" on an undefined value at /usr/local/biotools/perl/5.8.2/lib/site_perl/5.8.2/Bio/SeqFeature/Collection.pm line 225, line 95373." > > > > That looked like the _btree wasn't set, but not sure. > > > > I am told we have plenty of room in /tmp, so I should change my tmp dir, but the docs said that it was all in memory by default, is that not the case? I tried to export a new tmp dir, but that didn't fix the problem...
> > > > > > -mat > > > > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From Richard.Adams at ed.ac.uk Thu Jul 15 05:54:30 2004 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Thu Jul 15 05:56:24 2004 Subject: [Bioperl-l] Protein interaction modules Message-ID: <40F65456.4080607@ed.ac.uk> Hello BioPerlers, I'd like to bring to people's attention some modules for parsing and analyzing protein interaction data. They are in CVS under Bio/Graph and are: Bio/Graph/IO Bio/Graph/IO/dip Bio/Graph/IO/psi_xml are modules for reading/writing graph data and function in an analogous way to the SeqIO system. Bio/Graph/SimpleGraph Bio/Graph/SimpleGraph/traversal are generic graph modules written by Nat Goodman and provide functionality for traversing and building graphs. These are independent of BioPerl but are added here as they're not in CPAN yet. Bio/Graph/ProteinGraph Bio/Graph/Edge extend the SimpleGraph modules to deal with multiple sequence identifiers, duplicate edges and more complex data about the nature of an interaction. In this implementation, nodes are Bio::Seq objects. Interactions are represented by Bio/Graph/Edge objects. These modules are very much biologically orientated, and are written with the following sort of tasks in mind: E.g., How can I annotate my sequences with interaction data? Which nodes cause the most disruption to the network if perturbed? What happens to network properties if a node is deleted? How can I merge 2 protein interaction data sets together, and find duplicate interactions? How can I calculate basic graph properties of my interaction data set, e.g., density, clustering coefficient? Code to demonstrate some of these tasks can be found in the Synopsis of Bio/Graph/ProteinGraph. There is a test suite, t/protgraph.t, which tests most of the methods. I'd be very interested in feedback, ideas for what to include in a protein/DNA or protein/RNA interaction class, bugs etc.
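Richard lists density and clustering coefficient among the basic graph properties the modules are meant to help with. As a rough illustration of what those two numbers measure, here is a pure-Perl sketch on a small, made-up interaction graph. It is not the Bio::Graph::ProteinGraph API (its method names are not given here), and it uses the standard undirected definitions: density is 2E / (N(N-1)), and a node's clustering coefficient is the fraction of its neighbour pairs that are themselves connected.

```perl
use strict;
use warnings;

# Undirected graph as an adjacency hash: node => { neighbour => 1, ... }.
# The nodes and edges are invented for illustration.
my %adj;
for my $edge ([qw(A B)], [qw(A C)], [qw(B C)], [qw(C D)]) {
    my ($u, $v) = @$edge;
    $adj{$u}{$v} = 1;
    $adj{$v}{$u} = 1;
}

# Density: fraction of all possible undirected edges that are present.
sub density {
    my ($g) = @_;
    my $n = keys %$g;
    return 0 if $n < 2;
    my $e = 0;
    $e += keys %{ $g->{$_} } for keys %$g;   # sum of degrees = 2E
    $e /= 2;
    return 2 * $e / ($n * ($n - 1));
}

# Local clustering coefficient; 0 for nodes with fewer than 2 neighbours.
sub clustering {
    my ($g, $node) = @_;
    my @nbrs = sort keys %{ $g->{$node} };
    my $k = @nbrs;
    return 0 if $k < 2;
    my $links = 0;
    for my $i (0 .. $#nbrs - 1) {
        for my $j ($i + 1 .. $#nbrs) {
            $links++ if $g->{ $nbrs[$i] }{ $nbrs[$j] };
        }
    }
    return 2 * $links / ($k * ($k - 1));
}

printf "density: %.3f\n", density(\%adj);
printf "C(%s): %.3f\n", $_, clustering(\%adj, $_) for sort keys %adj;
```

For this four-node graph the sketch prints a density of 0.667, clustering coefficients of 1.000 for A and B, 0.333 for C (only one of its three neighbour pairs, A-B, is linked) and 0.000 for D.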
To use these modules you need: XML::Twig, if you want to parse XML; Clone; and Class::AutoClass - the SimpleGraph modules depend on this. The test suite tests for these modules. Obvious improvements are: parsing the full dataset (at present the XML parser just gets the basic interaction data), and a psi_xml writer. Richard -- Dr Richard Adams Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From sdavis2 at mail.nih.gov Thu Jul 15 07:13:32 2004 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Jul 15 07:13:38 2004 Subject: [Bioperl-l] SRS querying Message-ID: <019FAA6E-D650-11D8-AB12-000A95D7BA10@mail.nih.gov> I know some aspects of bioperl web querying are built on SRS, but I was wondering if there is a general query engine for SRS available (like getz). It isn't hard to form the SRS queries using wgetz if that is what folks recommend, but just wanted to check. Thanks, Sean From Wiepert.Mathieu at mayo.edu Thu Jul 15 08:45:24 2004 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Thu Jul 15 08:47:40 2004 Subject: [Bioperl-l] SeqFeatureCollection issue Message-ID: <2F41CC6C9777D311ACBD009027B108EA08C481C9@excsrv32.mayo.edu> Hi, I did try that actually, that was the last thing I was doing, as I left last night. I thought it was going to work, but it didn't get far before I got "Out of memory!" again. It seems a contig with a file size of 38,914,775 bytes, which has 619 features of type mRNA, CDS, or gene, creates a temp file of 18,520,702,976 bytes. So that's 38 MB to 18 GB. Wow! Pulling a range out of that collection does take a bit of time too. Perhaps there is a better way to do this... I am just not sure where all the memory is getting eaten up, if you have an idea (large seq, something with that?) let me know.
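One plausible reading of these numbers, consistent with the freeze/thaw explanation Jason gives later in the thread but not confirmed in it: if each of the 619 features is serialised together with its own copy of the attached contig sequence, the temp file works out to 18,520,702,976 / 619 ≈ 29.9 million bytes per feature, which is in the range of the raw sequence a 38.9 MB GenBank file could hold. The sketch below illustrates the effect with core Perl's Storable `freeze`, assuming (as an illustration only) that the collection serialises features in a similar per-feature way. The hashes are hypothetical stand-ins for feature objects; only the `_gsf_seq` slot name is taken from Jason's reply.

```perl
use strict;
use warnings;
use Storable qw(freeze);

# Hypothetical, scaled-down stand-ins: one "contig" of 1,000,000 bases
# and ten features that each hold a reference to it in '_gsf_seq'.
my $contig   = { seq => 'ACGT' x 250_000 };
my @features = map { +{ start    => $_ * 100,
                        end      => $_ * 100 + 50,
                        _gsf_seq => $contig } } 1 .. 10;

# Freezing each feature separately serialises the whole contig every
# time, so the total grows as (number of features) x (contig size).
my $with = 0;
$with += length freeze($_) for @features;

# Drop the back-reference first, as suggested in the thread.
$_->{_gsf_seq} = undef for @features;
my $without = 0;
$without += length freeze($_) for @features;

printf "with seq: %d bytes, without: %d bytes\n", $with, $without;
```

With ten features the "with" total is a little over 10,000,000 bytes, while the "without" total is only a few hundred bytes; the same multiplication at 619 features and a roughly 30 MB contig would reach the tens of gigabytes reported here.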
I made the temp file get created in a place that I know can hold it at least, and it is working (though I have a 100mb file, I am afraid what that one will do) Thanks for the input though, -mat > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Wednesday, July 14, 2004 8:58 PM > To: Wiepert, Mathieu > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] SeqFeatureCollection issue > > Did you try passing in a filename with -file => '/tmp/myfile.idx'? > > Title : new > Usage : my $obj = new Bio::SeqFeature::Collection(); > Function: Builds a new Bio::SeqFeature::Collection object > Returns : Bio::SeqFeature::Collection > Args : > > -minbin minimum value to use for binning > (default is 100,000,000) > -maxbin maximum value to use for binning > (default is 1,000) > -file filename to store/read the > BTREE from rather than an in-memory structure > (default is false and in-memory). > -keep boolean, will not remove index file on > object destruction. > -features Array ref of features to add initially > > No idea where the /var/tmp is going... > > This *should* work but I haven't done much with it/used it for quite a > while so I don't know if there are things that don't work... > > If it is really not working you can always go the -> to GFF -> load in > Bio::DB::GFF route using the in-memory adaptor - I wanted to merge the > interface so that SeqFeature::Collection used the same method names but > never got around to it. If someone is using the module would be a nice > thing to have... > > -jason > > > On Wed, 14 Jul 2004, Wiepert, Mathieu wrote: > > > Hi, > > > > > > > > I was trying to use the seqfeature collection to pull out features in a > range I was interested in. I have two problems (maybe because I am > loading features form a contig?) > > > > > > > > In the first case, I ended up running out of space on /var/tmp. We have > about .5 GB there I am. 
Code is like > > > > my $in1 = Bio::SeqIO->new('-file' => $contig.'.gb' , '-format' => > 'Genbank'); > > > > while (my $seq = $in1->next_seq) { > > > > my @feat_ary = $seq->get_SeqFeatures(); > > > > my $col = new Bio::SeqFeature::Collection(); > > > > # add these features to the object > > > > my $totaladded = $col->add_features(\@feat_ary); > > > > } > > > > > > > > I end up filling /var/tmp to 100%, as I said. > > > > > > > > So I tried to initialize the collection like > > > > my $col = new Bio::SeqFeature::Collection(-features => \@feat_ary); > > > > > > > > but that gave an error: > > > > > > > > "Can't call method "put" on an undefined value at > /usr/local/biotools/perl/5.8.2/lib/site_perl/5.8.2/Bio/SeqFeature/Collecti > on.pm line 225, line 95373." > > > > > > > > That looked like the _btree wasn't set, but not sure. > > > > > > > > I am told we have plenty of room in /tmp, so I should change my tmp dir, > but the docs said that it was all in memory by default, is that not the > case? I tried to export a new tmp dir, but that didn't fix the problem... > > > > > > > > > > > > -mat > > > > > > > > > > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Thu Jul 15 10:19:23 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Jul 15 10:21:25 2004 Subject: [Bioperl-l] SeqFeatureCollection issue In-Reply-To: <2F41CC6C9777D311ACBD009027B108EA08C481C9@excsrv32.mayo.edu> References: <2F41CC6C9777D311ACBD009027B108EA08C481C9@excsrv32.mayo.edu> Message-ID: I suspect it has something to do with freeze/thaw and the large attached contig sequence which is also getting frozen for each feature. If you call $feature->{'_gsf_seq'} = undef; on each feature (sorry no one wrote an 'unattach_seq' method) before it gets added that might help. -jason On Thu, 15 Jul 2004, Wiepert, Mathieu wrote: > Hi, > > I did try that actually, that was the last thing I was doing, as I left > last night. 
I thought it was going to work, but it didn't get far > before I got "Out of memory!" again. It seems a contig with a file size > of 38,914,775 bytes, which hat 619 features of type mRNA, CDA, or gene, > creates a temp file of 18,520,702,976. SO that's 38 MB to 18 GB. Wow! > Pulling a range out of that collections does take a bit of time too. > Perhaps there is a better way to do this... > > I am just not sure where all the memory is getting eaten up, if you have > an idea (large seq, something with that?) let me know. I made the temp > file get created in a place that I know can hold it at least, and it is > working (though I have a 100mb file, I am afraid what that one will do) > > Thanks for the input though, > > -mat > > > -----Original Message----- > > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > > Sent: Wednesday, July 14, 2004 8:58 PM > > To: Wiepert, Mathieu > > Cc: bioperl-l@portal.open-bio.org > > Subject: Re: [Bioperl-l] SeqFeatureCollection issue > > > > Did you try passing in a filename with -file => '/tmp/myfile.idx'? > > > > Title : new > > Usage : my $obj = new Bio::SeqFeature::Collection(); > > Function: Builds a new Bio::SeqFeature::Collection object > > Returns : Bio::SeqFeature::Collection > > Args : > > > > -minbin minimum value to use for binning > > (default is 100,000,000) > > -maxbin maximum value to use for binning > > (default is 1,000) > > -file filename to store/read the > > BTREE from rather than an in-memory structure > > (default is false and in-memory). > > -keep boolean, will not remove index file on > > object destruction. > > -features Array ref of features to add initially > > > > No idea where the /var/tmp is going... > > > > This *should* work but I haven't done much with it/used it for quite a > > while so I don't know if there are things that don't work... 
> > > > If it is really not working you can always go the -> to GFF -> load in > > Bio::DB::GFF route using the in-memory adaptor - I wanted to merge the > > interface so that SeqFeature::Collection used the same method names but > > never got around to it. If someone is using the module would be a nice > > thing to have... > > > > -jason > > > > > > On Wed, 14 Jul 2004, Wiepert, Mathieu wrote: > > > > > Hi, > > > > > > > > > > > > I was trying to use the seqfeature collection to pull out features in a > > range I was interested in. I have two problems (maybe because I am > > loading features form a contig?) > > > > > > > > > > > > In the first case, I ended up running out of space on /var/tmp. We have > > about .5 GB there I am. Code is like > > > > > > my $in1 = Bio::SeqIO->new('-file' => $contig.'.gb' , '-format' => > > 'Genbank'); > > > > > > while (my $seq = $in1->next_seq) { > > > > > > my @feat_ary = $seq->get_SeqFeatures(); > > > > > > my $col = new Bio::SeqFeature::Collection(); > > > > > > # add these features to the object > > > > > > my $totaladded = $col->add_features(\@feat_ary); > > > > > > } > > > > > > > > > > > > I end up filling /var/tmp to 100%, as I said. > > > > > > > > > > > > So I tried to initialize the collection like > > > > > > my $col = new Bio::SeqFeature::Collection(-features => \@feat_ary); > > > > > > > > > > > > but that gave an error: > > > > > > > > > > > > "Can't call method "put" on an undefined value at > > /usr/local/biotools/perl/5.8.2/lib/site_perl/5.8.2/Bio/SeqFeature/Collecti > > on.pm line 225, line 95373." > > > > > > > > > > > > That looked like the _btree wasn't set, but not sure. > > > > > > > > > > > > I am told we have plenty of room in /tmp, so I should change my tmp dir, > > but the docs said that it was all in memory by default, is that not the > > case? I tried to export a new tmp dir, but that didn't fix the problem... 
> > -mat > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu -- Jason Stajich Duke University jason at cgt.mc.duke.edu From Wiepert.Mathieu at mayo.edu Thu Jul 15 13:40:21 2004 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Thu Jul 15 13:42:20 2004 Subject: [Bioperl-l] SeqFeatureCollection issue Message-ID: <2F41CC6C9777D311ACBD009027B108EA08C481D9@excsrv32.mayo.edu> Hi, Just a question on that large sequence that gets attached. How many times *does* it get attached? I am still wondering how 38 MB of data takes on the hefty weight of an 18 GB file. I will try your suggestion; I think that will work. Thanks, -mat > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Thursday, July 15, 2004 9:19 AM > To: Wiepert, Mathieu > Cc: bioperl-l@portal.open-bio.org > Subject: RE: [Bioperl-l] SeqFeatureCollection issue > > I suspect it has something to do with freeze/thaw and the large attached > contig sequence which is also getting frozen for each feature. > > If you call > $feature->{'_gsf_seq'} = undef; > on each feature (sorry no one wrote an 'unattach_seq' method) before it > gets added that might help. > > -jason From jason at cgt.duhs.duke.edu Thu Jul 15 14:17:01 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Jul 15 14:19:14 2004 Subject: [Bioperl-l] SeqFeatureCollection issue In-Reply-To: <2F41CC6C9777D311ACBD009027B108EA08C481D9@excsrv32.mayo.edu> References: <2F41CC6C9777D311ACBD009027B108EA08C481D9@excsrv32.mayo.edu> Message-ID: Well, each feature has a reference to the original contig; this gets set when a feature is added to a sequence with $seq->add_SeqFeature($f) (which in turn calls attach_seq on the feature). In memory this is fine, since we're just talking about references, but presumably Storable tries to follow the reference and store the sequence as well when freeze is called. I had some tests which, instead of using Storable, used $feature->gff_string to "serialize" the feature object, but this didn't seem to work so well and of course wouldn't allow Bio::RangeI objects to be passed in as well. Most of my tests had centered around building feature sets and reading them in/out of GFF, so I probably never saw this because I wasn't getting my features from sequence objects. Clearly it's a problem, though, if you are seeing the behavior you describe. Let me know if it works. -jason
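A minimal core-Perl sketch of the mechanism Jason describes, under stated assumptions: plain hashes stand in for Bio::SeqFeature objects (they are not the real classes), the _gsf_seq key mirrors the one mentioned in the thread, and each feature is assumed to be serialized separately with Storable's freeze, as Jason presumes the collection does. Because freeze follows the reference, every frozen feature carries its own copy of the contig. The reported figures fit this picture: 18,520,702,976 bytes / 619 features is about 29.9 million bytes per feature, plausibly one copy of the sequence from a 38.9 MB GenBank file per feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze);

# A 1,000,000-character stand-in for a large contig sequence.
my $contig = 'ACGT' x 250_000;

# Five hypothetical "features": plain hashes (not real Bio::SeqFeature
# objects), each holding a reference to the same contig string.
my @features = map {
    +{ start => $_ * 100, end => $_ * 100 + 50, _gsf_seq => \$contig }
} 1 .. 5;

# Freezing each feature separately follows the reference every time,
# so every frozen copy includes the full contig.
my $attached_bytes = 0;
$attached_bytes += length freeze($_) for @features;

# Clearing the reference first (the workaround suggested in the thread)
# leaves only the feature's own fields to serialize.
$_->{_gsf_seq} = undef for @features;
my $detached_bytes = 0;
$detached_bytes += length freeze($_) for @features;

printf "with contig attached: %d bytes\n", $attached_bytes;
printf "with contig detached: %d bytes\n", $detached_bytes;
```

The attached total should come out just over 5,000,000 bytes and the detached total well under a kilobyte; exact sizes depend on the Storable version.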
-- Jason Stajich Duke University jason at cgt.mc.duke.edu From tewang at uci.edu Thu Jul 15 17:31:42 2004 From: tewang at uci.edu (Eric Wang) Date: Thu Jul 15 16:33:46 2004 Subject: [Bioperl-l] obtaining subsequences Message-ID: <1089923502.smmsdV1.1.2@smtp.uci.edu> Hi All, I am wondering if BioPerl has a function that allows retrieval of a subsequence from a large flat sequence file. I hate to do it in Perl with the open function, since it stores everything in memory when I only need a subsequence. I am sure it's been done in the past. Any pointers are appreciated! Eric Eric T.
Wang Graduate Student University of California, Irvine Molecular biology, Genetics, and Biochemistry NIH Bioinformatics Predoctoral Trainee tewang@uci.edu 949-824-1870 From cain at cshl.edu Thu Jul 15 16:36:36 2004 From: cain at cshl.edu (Scott Cain) Date: Thu Jul 15 16:38:29 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: <200407151745.i6FHj3Ks008770@portal.open-bio.org> References: <200407151745.i6FHj3Ks008770@portal.open-bio.org> Message-ID: <1089923738.1605.9.camel@localhost.localdomain> Dustin, Besides Aaron, a few other people have complained about this, and yes, I had written them off as crazy :-) Since I can't reproduce this problem, I'll have to ask you: is the problem that the files are not being written to /usr/tmp (or wherever) as quickly as before, or is it that, after the files are done being written, they aren't loaded into MySQL as quickly? Not that I have a solution to either problem, but the first is presumably a Perl problem and the second a MySQL problem. If it were the latter (which I kind of doubt), you could get around it by using a real database, like PostgreSQL. Scott On Thu, 2004-07-15 at 13:45, bioperl-l-request@portal.open-bio.org wrote: > > I recently started using Bio::DB::GFF, beginning by using > bp_bulk_load_gff.pl to load a simple but large GFF2 file. This file > consisted only of transcripts and their subfeatures, so the group > class of all features was "transcript". The files loaded with no > problem and I was able to write a few successful test scripts. > > Now I have added new features (genes) to the GFF file, and I > attempted to load the new file exactly as before with > bp_bulk_load_gff.pl, but now it takes _much_ longer to load, and takes > more time the more features are added (the first 5K features take > about 30 seconds, the next 5K features take nearly 2 minutes, and so > on). It took over an hour to load 50K features, at which point I stopped > it.
> > I've played around with the GFF file a bit and found that anything > that doesn't have a group class of "transcript" has this problem; for > example, if I run 'sed s/transcript/foo/g' on the original file it's slow, > and if I run 'sed s/gene/transcript/g' on the new file it's fast. I have > manually verified that the MySQL database is empty before each attempt > and even wiped the tmp directory before each attempt. > > Any ideas why non-transcript features take so long? > > Thanks, > > Dustin Cram -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From amackey at pcbi.upenn.edu Thu Jul 15 17:11:02 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu Jul 15 17:13:18 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: <1089923738.1605.9.camel@localhost.localdomain> References: <200407151745.i6FHj3Ks008770@portal.open-bio.org> <1089923738.1605.9.camel@localhost.localdomain> Message-ID: <7A0BADF8-D6A3-11D8-B323-000A9577009E@pcbi.upenn.edu> I've benchmarked it a bit: the slowdown is happening in both of these lines: $FH{ FGROUP() }->print( join("\t",$gid,$group_class,$group_name),"\n" ) unless $DONE{"fgroup$;$gid"}++; $FH{ FTYPE() }->print( join("\t",$ftypeid,$method,$source),"\n" ) unless $DONE{"ftype$;$ftypeid"}++; What I need to do is break this up to see whether it's the $DONE{} lookup that's slow (a Perl problem) or returning from the print() (because the pipe is blocked by MySQL being slow on the insert). This hasn't percolated up my TODO list quite yet, so I'd be happy for someone else to chime in... My guess is that there's something diabolical about the %DONE hash (I need to take a look at its bucket structure), such that keys like fgroup1010101 and fgroup1010102 are colliding. Also, since the print() should only happen once per feature type, %DONE is the likely suspect.
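Aaron's plan to separate the cost of the %DONE lookup from the cost of the print() can be sketched as follows. This is a hypothetical stand-in, not bp_bulk_load_gff.pl itself: an in-memory filehandle replaces the pipe to MySQL, the group ids are made up, and Time::HiRes accumulates each half's time separately. It also spells out the print-once idiom: the post-increment returns the old count, so only the first occurrence of each key is written. $; is Perl's subscript separator ("\034" by default).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# In-memory stand-in for the loader's output pipe (hypothetical).
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory filehandle: $!";

my %DONE;
my ($lookup_secs, $print_secs) = (0, 0);

for my $gid (1, 2, 1, 3, 2) {
    # Time the hash lookup/increment on its own. The key is built the
    # same way as "fgroup$;$gid" in the loader snippet.
    my $t0   = time();
    my $seen = $DONE{ 'fgroup' . $; . $gid }++;
    $lookup_secs += time() - $t0;
    next if $seen;    # already written once: skip, as "unless ...++" does

    # Time the write on its own.
    $t0 = time();
    print {$fh} join("\t", $gid, 'transcript', "name$gid"), "\n";
    $print_secs += time() - $t0;
}
close $fh;

printf "lookup: %.6f s, print: %.6f s\n", $lookup_secs, $print_secs;
print $out;
```

Timings from this toy input mean nothing by themselves; the idea would be to feed the loader's real keys through both halves and watch which one grows as the file is processed.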
-Aaron On Jul 15, 2004, at 4:36 PM, Scott Cain wrote: > Dustin, > > Besides Aaron, a few other people have complained about this, and yes, I > had written them off as crazy :-) > > Since I can't reproduce this problem, I'll have to ask you: is the > problem that the files are not being written to /usr/tmp (or wherever) > as quickly as before, or is it that, after the files are done being > written, they aren't loaded into MySQL as quickly? Not that I have a > solution to either problem, but the first is presumably a Perl problem > and the second a MySQL problem. If it were the latter (which I kind of > doubt), you could get around it by using a real database, like > PostgreSQL. > > Scott -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From dustin.cram at gmail.com Thu Jul 15 17:30:23 2004 From: dustin.cram at gmail.com (Dustin Cram) Date: Thu Jul 15 17:32:17 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: <1089923738.1605.9.camel@localhost.localdomain> References: <200407151745.i6FHj3Ks008770@portal.open-bio.org> <1089923738.1605.9.camel@localhost.localdomain> Message-ID: Well, I think I've traced my problem to a bug in Bio::DB::GFF->_split_gff2_group that only existed for a while in CVS. I had assumed that release 1.4 was installed at our site, but it turns out that it was a CVS snapshot from shortly after the 1.4 release. The revision of Bio::DB::GFF.pm with the problem is 1.105 (maybe others too). It looks to me like $self->preferred_groups is being appended to with ("Sequence","Transcript") on every call of the method, so as time goes by the array gets huge, with just those elements repeated over and over. That is why only my non-transcript features had problems: the entire array was searched unsuccessfully for each feature. I've grabbed the latest CVS and it seems to work fine. Although I haven't tried the 1.4 release, I think it should work too. If this isn't the problem for other folk, then I guess they're still just crazy :).
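Dustin's diagnosis can be illustrated with a hypothetical reconstruction of the pattern; this is not the actual Bio::DB::GFF revision 1.105 code, and the exact matching rule in the real module may differ. It uses a list that is appended to on every call and then scanned linearly. A class that matches early ("Transcript" here) stays cheap, while a class that never matches ("gene") scans the whole list, so total work grows quadratically with the number of features. String comparisons are counted as a stand-in for time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the bug pattern; not the Bio::DB::GFF source.
my @preferred;          # persists between calls, like an object attribute
my $comparisons = 0;    # counts string comparisons as a proxy for work

sub lookup_buggy {
    my ($class) = @_;
    push @preferred, 'Sequence', 'Transcript';   # BUG: grows on every call
    for my $p (@preferred) {
        $comparisons++;
        return $p if $p eq $class;               # an early match returns quickly
    }
    return;                                      # a miss scans the whole list
}

# 1,000 lookups that match: each stops at the second element.
lookup_buggy('Transcript') for 1 .. 1000;
my $transcript_cost = $comparisons;              # 2 * 1000 = 2000

# 1,000 lookups that never match, starting from an empty list again:
# call k scans 2k elements, so the total is 2 * (1 + 2 + ... + 1000).
@preferred   = ();
$comparisons = 0;
lookup_buggy('gene') for 1 .. 1000;
my $gene_cost = $comparisons;                    # 1,001,000

# Fixed version: the list is set up once and never appended to.
my @preferred_fixed = ('Sequence', 'Transcript');
my $fixed_cost = 0;
for (1 .. 1000) {
    for my $p (@preferred_fixed) {
        $fixed_cost++;
        last if $p eq 'gene';
    }
}

printf "Transcript: %d, gene: %d, gene (fixed): %d comparisons\n",
    $transcript_cost, $gene_cost, $fixed_cost;
```

The same 1,000 non-matching lookups cost about 500 times more with the growing list than with the fixed one, which matches the symptom of loads slowing down as more non-transcript features are processed.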
Thanks, Dustin On Thu, 15 Jul 2004 16:36:36 -0400, Scott Cain wrote: > Dustin, > > Besides Aaron, a few other people have complained about this, and yes, I > had written them off as crazy :-) > > Since I can't reproduce this problem, I'll have to ask you: is the > problem that the files are not being written to /usr/tmp (or wherever) > as quickly as before, or is it that, after the files are done being > written, they aren't loaded into MySQL as quickly? Not that I have a > solution to either problem, but the first is presumably a Perl problem and > the second a MySQL problem. If it were the latter (which I kind of > doubt), you could get around it by using a real database, like > PostgreSQL. > > Scott From Nathan.Agrin at umassmed.edu Thu Jul 15 13:12:40 2004 From: Nathan.Agrin at umassmed.edu (Agrin, Nathan) Date: Thu Jul 15 17:53:10 2004 Subject: [Bioperl-l] Bio::Graphics Description layout question Message-ID: <89AA811FD79DC94788093B23DA79E71F0184D044@edunivmail02.ad.umassmed.edu> Is there any way to get the BioPerl graphics package to draw the descriptions for the tracks to the right of the tracks? I've seen something like this done with a browser developed by JGI. Here is the link to an example: http://genome.jgi-psf.org/cgi-bin/dispGeneModel?db=chlre2&id=157911 Thanks, Nate From Weike.Xin at channelinx.com Thu Jul 15 11:31:49 2004 From: Weike.Xin at channelinx.com (Victor Weike Xin (Houston)) Date: Thu Jul 15 17:53:16 2004 Subject: [Bioperl-l] tempdir() issues in regular perl. Message-ID: Hi Jason, I got the following error message: Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory /tmp/qpFBwIo1sH: Too many links at ./Archive.pl line 684 The script uses: use File::Temp qw/ tempdir tempfile /; The code at line 684 is: my $dir = tempdir(CLEANUP=>1); OS: Red Hat Linux 7.3. Language: regular Perl. Can I install bioperl-1.4.tar.gz, and will it conflict with regular Perl? Thank you for your help.
Weike Xin From Steven.Roels at mpi.com Thu Jul 15 18:57:16 2004 From: Steven.Roels at mpi.com (Roels, Steven) Date: Thu Jul 15 19:00:00 2004 Subject: [Bioperl-l] Bio::Graphics Description layout question Message-ID: Nathan, I believe you need -key_style=>'right' as an option to Bio::Graphics::Panel->new(). But then you also need -pad_right=>$padright, such that $padright is big enough to fit the keys. See the POD documentation for Bio::Graphics::Panel... -Steve >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Agrin, Nathan >Sent: Thursday, July 15, 2004 1:13 PM >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] Bio::Graphics Description layout question > >Is there any way to get the BioPerl graphics package to draw the >descriptions for the tracks to the right of the tracks? I've seen >something like this done with a browser developed by JGI. Here is the >link to an example: > >http://genome.jgi-psf.org/cgi-bin/dispGeneModel?db=chlre2&id=157911 > >Thanks, >Nate This e-mail, including any attachments, is a confidential business communication, and may contain information that is confidential, proprietary and/or privileged. This e-mail is intended only for the individual(s) to whom it is addressed, and may not be saved, copied, printed, disclosed or used by anyone else. If you are not the(an) intended recipient, please immediately delete this e-mail from your computer system and notify the sender. Thank you. From amackey at pcbi.upenn.edu Thu Jul 15 19:00:53 2004 From: amackey at pcbi.upenn.edu (Aaron J Mackey) Date: Thu Jul 15 19:02:44 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: References: Message-ID: On Thu, 15 Jul 2004, Chris Mungall wrote: > However, the fasta parser sets the input record separator $/=">\n", so I > actually have to read in up to but NOT including the next ^\> (or end of
Which means I actually have to switch $/ within the GFF parser! Hmm, doesn't it switch to "\n>" you mean? Regardless, why should you have to worry about it? You _pushback, you send off to the next parser; if it changes what $/ is, then Bio::Root::IO::_readline (or maybe just a fasta.pm overriden version) could/should be savvy to it (comments from the gallery?): Index: IO.pm =================================================================== RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v retrieving revision 1.51 diff -r1.51 IO.pm 420,422c420,422 < Note also that the current implementation does not handle pushed < back input correctly unless the pushed back input ends with the < value of $/. --- > Note also that the current implementation does handle > pushed back input correctly when the pushed back input > doesn't end with whatever is the local value of $/. 441a442,455 > > # If $/ has changed since the push back occurred, we may need to > # adjust the buffering ... > if (defined($line) && defined($/) && $line =~ m!$/!) { > # $/ is defined (not in file-slurp mode); does our current > # line have too much stuff already? > if (length($')) { > $line = "$`$/"; > unshift @{$self->{'_readbuffer'}}, $'; > } > } elsif (!eof($fh)) { > # need to read some more ... > $line .= <$fh>; > } > The simple solution is to force everyone to preceed the fasta section with > a ##FASTA directive - however, the spec says this is optional. Nah, the simple solution is to fix BioPerl ;) > Of course, I could just go back to my own 8-line fasta parsing code > within GFF.pm..... No, then you'd need to worry about it keeping in sync with SeqIO/fasta.pm, which is what we're trying to avoid, if possible. I repeat: thanks for the hard work! 
-Aaron From allenday at ucla.edu Thu Jul 15 21:09:11 2004 From: allenday at ucla.edu (Allen Day) Date: Thu Jul 15 21:11:35 2004 Subject: [Bioperl-l] Added sequence parsing code to Bio::Tools::GFF In-Reply-To: References: Message-ID: You're handling the '##FASTA' directive? This is using the _parse_header() method I added for '##sequence-region' lines, I take it? I added a stub in this method for handling all GFF3 '##*' directives. -allen On Mon, 12 Jul 2004, Chris Mungall wrote: > > I have added sequence parsing code to the GFF parser; note that sequence > data is only available in GFF3. > > It should now be possible to create a Bio::SeqIO::gff3 class, which would > be a short wrapper to Bio::Tools::GFF. Most people would still want to use > the Tools parser to parse on a per-feature basis, but the option of > treating GFF3 in a similar fashion to genbank/embl/chadoxml/etc via SeqIO > would be there. > > According to the GFF3 spec the sequence data can come after or before the > relevant features; this means that the parser has the potential to be a > memory hog (but then the existing SeqIO classes already are with genbank > whole-chromosome entries). > > I've included the new docs from the GFF parser below; if people agree with > this general means of handling sequence data then I'll go ahead and add a > Bio::SeqIO::gff3 as well. > > =head1 GFF3 AND SEQUENCE DATA > > [added by cjm 2004/07/09] > > GFF3 supports sequence data; see > http://song.sourceforge.net/gff3-jan04.shtml > > There are a number of ways to deal with this - > > If you call > > $gffio->ignore_sequence_data_toggle(1) > > prior to parsing, the sequence data is ignored; this is useful if you > just want the features. It avoids the memory overhead in building and > caching sequences
It avoids the memory overhead in building and > caching sequences > > Alternatively, you can call either > > $gffio->get_all_seqs() > > Or > > $gffio->seq_id_by_h() > > At the B of parsing to get either a list or hashref of Bio::Seq > objects (see the documentation for each of these methods) > > Note that these objects will not have the features attached - you have > to do this yourself, OR call > > $gffio->features_attached_to_seqs_toggle(1) > > PRIOR to parsing; this will ensure that the Seqs have the features > attached; ie you will then be able to call > > $seq->get_SeqFeatures(); > > And use Bio::SeqIO methods > > Note that auto-attaching the features to seqs will incur a higher > memory overhead as the features must be cached until the sequence data > is found > > =cut > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Steven.Roels at mpi.com Thu Jul 15 21:36:38 2004 From: Steven.Roels at mpi.com (Roels, Steven) Date: Thu Jul 15 21:39:21 2004 Subject: [Bioperl-l] RE: strand in Meta-data sequence-region Message-ID: Lincoln, The recent discussion of GFF3 parsing on the bioperl list reminded me about a request I made for strand info in the sequence-region meta-data line (to handle "flipped" slices). Not urgent - just thought I'd point it out while folks are working on the relevant modules... Or did something come up that indicated the suggested change was a bad idea? Thanks, -Steve >>Sent: Tuesday, January 20, 2004 7:14 AM >>To: Roels, Steven; gff-list@sanger.ac.uk >>Subject: Re: spaces definition >> >>Hi Steven, >> >>I missed your messages due to spam filter screw up. With respect to >>your first question about the strand, I will be happy to add an >>optional strand field (plus strand assumed if missing). Does that >>sound OK? > >Perfect - thanks. 
> >> >>Lincoln >>>-----Original Message----- >>>From: owner-gff-list@sanger.ac.uk [mailto:owner-gff-list@sanger.ac.uk] On >>>Behalf Of Roels, Steven >>>Sent: Friday, December 12, 2003 10:21 AM >>>To: gff-list@sanger.ac.uk >>>Subject: strand in Meta-data sequence-region >>> >>> >>>Hello all, >>> >>>The spec that I see (http://song.sourceforge.net/gff3.shtml) lists the >>>following for the sequence-region meta-data tag: >>> >>>##sequence-region seqid start end >>> The sequence segment referred to by this file, in the format >>> "seqid start end". This element is optional, but strongly >>> encouraged because it allows parsers to perform bounds >>> checking on features. There may be multiple ##sequence-region >>> directives, each corresponding to one of the reference >>> sequences referred to in the body of the file. >>> >>>Shouldn't "strand" be included as well? I'm often playing with >>>gene-centered slices that have been flipped as needed so that the gene >>>of interest is oriented low to high. Without strand here, I can't record >>>sequence region accurately in the GFF file. Or is there some other way >>>to do it that I'm missing? >>> >>>Thanks, >>> >>>-Steve This e-mail, including any attachments, is a confidential business communication, and may contain information that is confidential, proprietary and/or privileged. This e-mail is intended only for the individual(s) to whom it is addressed, and may not be saved, copied, printed, disclosed or used by anyone else. If you are not the(an) intended recipient, please immediately delete this e-mail from your computer system and notify the sender. Thank you. 
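Lincoln's reply quoted above proposes an optional strand field for the ##sequence-region directive, with the plus strand assumed when it is missing. The sketch below shows how a parser might accept both forms. It is only an illustration under that proposal (the spec text Steve quotes defines just "seqid start end"), and parse_sequence_region is a hypothetical helper, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a "##sequence-region seqid start end [strand]" directive.
# ASSUMPTION: the optional fourth (strand) column follows the extension
# proposed in this thread; the quoted spec defines only seqid/start/end.
# Returns a hash ref, or undef if the line is not a sequence-region directive.
sub parse_sequence_region {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^##sequence-region\s+/;
    my (undef, $seqid, $start, $end, $strand) = split /\s+/, $line;
    die "malformed sequence-region: $line\n"
        unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;
    $strand = '+' unless defined $strand;    # plus strand assumed if missing
    die "bad strand '$strand'\n" unless $strand eq '+' || $strand eq '-';
    return { seqid => $seqid, start => $start, end => $end, strand => $strand };
}

# A plain directive and a "flipped" gene-centred slice.
for my $line ("##sequence-region ctg123 1 1497228\n",
              "##sequence-region geneSlice 1 50000 -\n") {
    my $r = parse_sequence_region($line);
    print join("\t", @{$r}{qw(seqid start end strand)}), "\n";
}
```

Keeping the strand column optional means files written against the existing three-column form still parse unchanged.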
From newyorkdimka at gmail.com Fri Jul 16 00:24:42 2004 From: newyorkdimka at gmail.com (dimka) Date: Fri Jul 16 00:26:33 2004 Subject: [Bioperl-l] display Newick trees Message-ID: I'm looking for a Perl program that would generate evolutionary trees (ps, png, gif) read from a Newick (phylip dnd) file, http://evolution.genetics.washington.edu/phylip/newicktree.html I know there are programs like Treeview and NJplot and Treetool... but I'd like to be able to format the trees a certain way... Dmitry. From gowtham at icgeb.res.in Fri Jul 16 04:27:23 2004 From: gowtham at icgeb.res.in (gowthaman ramasamy) Date: Fri Jul 16 06:12:09 2004 Subject: [Bioperl-l] parallel processing with perl Message-ID: <1089966444.12741.15.camel@icgeb13> Hello list, Please ignore this question if it is not related to BioPerl. I have a lengthy Perl script running on a 4-processor machine. At one point I have to execute four MySQL queries from four different batch files (via the shell). Currently I run them one after the other. Can I somehow fire them simultaneously so that they occupy all 4 processors and do the job more quickly? A portion of the script follows: #!/usr/bin/perl ............. ........... $var1=`mysql -h localhost -u xx -pyy filter < batchfile1.sql |tail -1`; $var2=`mysql -h localhost -u xx -pyy filter < batchfile2.sql |tail -1`; $var3=`mysql -h localhost -u xx -pyy filter < batchfile3.sql |tail -1`; $var4=`mysql -h localhost -u xx -pyy filter < batchfile4.sql |tail -1`; $total=$var1+$var2+$var3+$var4; NOTE: I don't want to use Perl DBI. Many thanks in advance. -- Ra. Gowthaman, Graduate Student, Bioinformatics Lab, Malaria Research Group, ICGEB, New Delhi, INDIA Phone: 91-9811261804 91-11-26173184; 91-11-26189360 #extn 314 From amackey at pcbi.upenn.edu Fri Jul 16 07:16:54 2004 From: amackey at pcbi.upenn.edu (Aaron J.
Mackey) Date: Fri Jul 16 07:18:33 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: References: Message-ID: On second (or third) thought, perhaps it would be better to formulate it like this: # oops, just read the first line of fasta record ... my $fasta = $line; while ( # still fasta, not gff # ) { $fasta .= $line; } # convert to seq: $seq = Bio::SeqIO->new(-fh => IO::String->new($fasta), -format => "fasta")->next_seq; Otherwise, no matter how we fiddle with _readline, it's going to be ugly to "share" $self->{_readline} between two distinct objects. -Aaron On Jul 15, 2004, at 7:00 PM, Aaron J Mackey wrote: > > On Thu, 15 Jul 2004, Chris Mungall wrote: > >> However, the fasta parser sets the input record seperator $/=">\n", >> so I >> actually have to read in up to but NOT including the next ^\> (or end >> of >> file). Which means I actually have to switch $/ within the GFF parser! > > Hmm, doesn't it switch to "\n>" you mean? Regardless, why should you > have to worry about it? You _pushback, you send off to the next > parser; if it changes what $/ is, then Bio::Root::IO::_readline (or > maybe just a fasta.pm overriden version) could/should be savvy to it > (comments from the gallery?): > > Index: IO.pm > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v > retrieving revision 1.51 > diff -r1.51 IO.pm > 420,422c420,422 > < Note also that the current implementation does not handle > pushed > < back input correctly unless the pushed back input ends > with the > < value of $/. > --- >> Note also that the current implementation does handle >> pushed back input correctly when the pushed back input >> doesn't end with whatever is the local value of $/. > 441a442,455 >> >> # If $/ has changed since the push back occurred, we may need to >> # adjust the buffering ... >> if (defined($line) && defined($/) && $line =~ m!$/!) 
{ >> # $/ is defined (not in file-slurp mode); does our current >> # line have too much stuff already? >> if (length($')) { >> $line = "$`$/"; >> unshift @{$self->{'_readbuffer'}}, $'; >> } >> } elsif (!eof($fh)) { >> # need to read some more ... >> $line .= <$fh>; >> } > > >> The simple solution is to force everyone to precede the fasta section >> with >> a ##FASTA directive - however, the spec says this is optional. > > Nah, the simple solution is to fix BioPerl ;) > >> Of course, I could just go back to my own 8-line fasta parsing code >> within GFF.pm..... > > No, then you'd need to worry about it keeping in sync with > SeqIO/fasta.pm, which is what we're trying to avoid, if possible. > > I repeat: thanks for the hard work! > > -Aaron > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From cjm at fruitfly.org Thu Jul 15 17:10:08 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Jul 16 07:47:02 2004 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: <200407141617.45269.lstein@cshl.edu> Message-ID: > $gffio->allow_features_attached_to_seqs(1) OK, I like this suggestion. Right, I tried switching my code to use the fasta parser once a ^\> is reached. How do you think I should tackle this thorny problem? I use $self->_pushback($line) to feed back the original header line to the fasta parser. However, the fasta parser sets the input record separator $/=">\n", so I actually have to read in up to, but NOT including, the next ^\> (or end of file), which means I actually have to switch $/ within the GFF parser! Not only does this seem awkward and wrong, but the GFF parser will break if someone decides to switch the fasta implementation to use standard newline record separators!
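The surprise Chris describes comes from Perl's dynamic scoping: a local $/ = ... setting stays in effect for every subroutine called while the enclosing block is running, including a shared reader that belongs to a different object, and it is restored only when that block exits. A minimal sketch, using hypothetical read_record and fasta_style_read helpers rather than the real Bio::Root::IO code, together with the "\n>" separator Aaron mentions for the fasta parser:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a shared reader (NOT Bio::Root::IO::_readline):
# it reads one record using whatever $/ is in effect at call time.
sub read_record {
    my ($fh) = @_;
    return scalar <$fh>;
}

# Simulates a parser that sets a FASTA-style record separator.
sub fasta_style_read {
    my ($fh) = @_;
    local $/ = "\n>";          # dynamically scoped, not lexically
    return read_record($fh);   # so read_record also sees "\n>" here
}

my $data = ">seq1\nACGT\n>seq2\nTTGA\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $first = fasta_style_read($fh);   # ">seq1\nACGT\n>"
my $next  = read_record($fh);        # $/ is "\n" again after the sub returns: "seq2\n"

print "first record: [$first]\n";
print "next line:    [$next]\n";
```

One way to keep the two parsers from interfering, in the spirit of Aaron's suggestion in this thread, is to collect the FASTA lines using the default separator and pass the finished string to the FASTA parser, so no separator change ever reaches the GFF reader.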
The simple solution is to force everyone to precede the fasta section with a ##FASTA directive - however, the spec says this is optional. Another solution would be to ban setting $/ within parsers as dangerous (despite the "local" declaration, the effect is definitely NOT local); I'm not sure what the ramifications of this are in terms of speed and efficiency. Of course, I could just go back to my own 8-line fasta parsing code within GFF.pm..... On Wed, 14 Jul 2004, Lincoln Stein wrote: > I like the idea of the toggle, but we don't use the term "toggle" > anywhere in bioperl right now. How about "allow" as in: > > $gffio->allow_features_attached_to_seqs(1) > > It won't be used more than once per script, so not much overhead in > having a long name. > > Lincoln > > > On Wednesday 14 July 2004 10:37 am, Aaron J. Mackey wrote: > > On Jul 13, 2004, at 12:42 PM, Chris Mungall wrote: > > >>> + $gffio->features_attached_to_seqs_toggle(1) > > >> > > >> Again, $gffio->attach_features(1) seems sufficient ... > > > > > > OK; although one might be led to expect that the argument for a > > > method with that name would be a list of SeqFeatures. Is the BP > > > method name syntax enshrined anywhere, or is it more a general > > > set of principles shared by the authors? > > > > You're right. No, there is no shrine; I think it's like the > > difference between art and pornography; we know it when we see it. > > Perhaps "$gffio->features_attached_to_seqs(1)" is the > > necessarily-long-but-best answer? > > > > > So perhaps I should re-engineer it a bit so that it rejects > > > anything that > > > doesn't follow the spec. This makes it easier to use the FASTA > > > parser. > > > > I think that'd be great, and should simplify matters a bit.
> > > > Thanks again, > > > > -Aaron > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From cjm at fruitfly.org Thu Jul 15 20:09:57 2004 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Jul 16 07:47:07 2004 Subject: [Bioperl-l] Added sequence parsing code to Bio::Tools::GFF In-Reply-To: Message-ID: I'm filling in the stubs, yep - see the checked in code although it turns out the ##FASTA part is in the footer, not the header (see the spec*). I've still filled in your stub to allow for featuresless GFF3 files that contain sequence, which is perfectly valid. On Thu, 15 Jul 2004, Allen Day wrote: > you're handling the '##FASTA' directive? this is using the > _parse_header() method I added for '##sequence-region' lines, I take it? > i added a stub in this method for handling all GFF3 '##*' directives. > > -allen > > > On Mon, 12 Jul 2004, Chris Mungall wrote: > > > > > I have added sequence parsing code to the GFF parser; note that sequence > > data is only available in GFF3. > > > > It should now be possible to create a Bio::SeqIO::gff3 class, which would > > be a short wrapper to Bio::Tools::GFF. Most people would still want to use > > the Tools parser to parse on a per-feature basis, but the option of > > treating gff3 in a similar fashion to genbank/embl/chadoxml/etc via SeqIO > > would be there. > > > > According to the GFF3 spec the sequence data can come after or before the > > relevant features; this means that the parser has the potential to be a > > memory hog (but then the existing SeqIO classes already are with genbank > > whole-chromosome entries). > > > > I've included the new docs from the gff parser below; if people agree with > > this general means of handling sequence data then I'll go ahead and add a > > Bio::SeqIO::gff3 as well. 
> > > > =head1 GFF3 AND SEQUENCE DATA > > > > [added by cjm 2004/07/09] > > > > GFF3 supports sequence data; see > > http://song.sourceforge.net/gff3-jan04.shtml > > > > There are a number of ways to deal with this - > > > > If you call > > > > $gffio->ignore_sequence_data_toggle(1) > > > > prior to parsing the sequence data is ignored; this is useful if you > > just want the features. It avoids the memory overhead in building and > > caching sequences > > > > Alternatively, you can call either > > > > $gffio->get_all_seqs() > > > > Or > > > > $gffio->seq_id_by_h() > > > > At the B of parsing to get either a list or hashref of Bio::Seq > > objects (see the documentation for each of these methods) > > > > Note that these objects will not have the features attached - you have > > to do this yourself, OR call > > > > $gffio->features_attached_to_seqs_toggle(1) > > > > PRIOR to parsing; this will ensure that the Seqs have the features > > attached; ie you will then be able to call > > > > $seq->get_SeqFeatures(); > > > > And use Bio::SeqIO methods > > > > Note that auto-attaching the features to seqs will incur a higher > > memory overhead as the features must be cached until the sequence data > > is found > > > > =cut > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From jason at cgt.duhs.duke.edu Fri Jul 16 08:45:02 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jul 16 08:46:59 2004 Subject: [Bioperl-l] display Newick trees In-Reply-To: References: Message-ID: See Allen's svggraph module in TreeIO I guess is the closest thing to what you want - it makes SVG output. 
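Jason's pointer is the library route: Bio::TreeIO reads Newick (-format => 'newick'), and its svggraph writer produces SVG. For a fully custom layout, the drawing code mainly needs the tree topology and each leaf's distance from the root. The following pure-Perl sketch shows that step. parse_newick and leaf_depths are hypothetical helpers, not BioPerl's parser, and the sketch assumes unquoted labels and no [comments] in the Newick string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal Newick parser: nested (...) groups, unquoted labels, ":length"
# branch lengths. ASSUMPTION: no quoted labels, [comments] or spaces in labels.
sub parse_newick {
    my ($str) = @_;
    $str =~ s/\s+//g;
    my $pos  = 0;
    my $node = _parse_node(\$str, \$pos);
    die "expected ';' at position $pos\n" unless substr($str, $pos, 1) eq ';';
    return $node;
}

sub _parse_node {
    my ($s, $p) = @_;
    my $node = { children => [] };
    if (substr($$s, $$p, 1) eq '(') {
        do {
            $$p++;    # skip the '(' or ',' before each child
            push @{ $node->{children} }, _parse_node($s, $p);
        } while (substr($$s, $$p, 1) eq ',');
        die "expected ')' at position $$p\n" unless substr($$s, $$p, 1) eq ')';
        $$p++;
    }
    # Optional label (empty for most internal nodes) ...
    if (substr($$s, $$p) =~ /^([^(),:;]*)/) {
        $node->{name} = $1;
        $$p += length $1;
    }
    # ... followed by an optional ":branch_length".
    if (substr($$s, $$p) =~ /^:([-+.\deE]+)/) {
        $node->{length} = $1;
        $$p += 1 + length $1;
    }
    return $node;
}

# Collect [name, distance-from-root] for each leaf, in left-to-right order;
# the distance is the x-coordinate a custom drawing routine would use.
sub leaf_depths {
    my ($node, $dist, $out) = @_;
    $dist += $node->{length} // 0;
    if (@{ $node->{children} }) {
        leaf_depths($_, $dist, $out) for @{ $node->{children} };
    } else {
        push @$out, [ $node->{name}, $dist ];
    }
    return $out;
}

my $tree = parse_newick('((A:0.1,B:0.2):0.3,C:0.4);');
printf "%-3s %.2f\n", @$_ for @{ leaf_depths($tree, 0, []) };
```

The distances give x-coordinates, and the leaf order gives y-coordinates, which can then be passed to whatever image library is preferred. For real-world files, Bio::TreeIO's parser is the sturdier choice.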
On Fri, 16 Jul 2004, dimka wrote: > I'm looking for a perl program that would generate evolutionary trees > (ps, png, gif) read from a Newick (phylip dnd) file, > http://evolution.genetics.washington.edu/phylip/newicktree.html > > I know there are programs like Treeview and NJplot and Treetool... but > I'd like to be able to format the trees a certain way... > > Dmitry. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From pwilkinson at videotron.ca Fri Jul 16 09:44:34 2004 From: pwilkinson at videotron.ca (Peter Wilkinson) Date: Fri Jul 16 09:47:26 2004 Subject: [Bioperl-l] parallel processing with perl In-Reply-To: <1089966444.12741.15.camel@icgeb13> References: <1089966444.12741.15.camel@icgeb13> Message-ID: <6.0.3.0.0.20040716093752.01bb7ec0@pop.videotron.ca> You can do this by creating a process for each query and capturing the output from each child process. Start by looking at Perl's fork() function. Be warned that process management is an art, and you might end up spending some time learning how to manage multiple processes. Peter At 04:27 AM 7/16/2004, gowthaman ramasamy wrote: >Hello list, >Please ignore this question if it sounds like not related to BIOPERL. > > have a lengthy Perl script running on a 4 processor machine. At one >point of time i have to execute four mysql quries from >four different batch files (via shell). Currently i run them one after >other. Can i some how fire them simultaneously so that they occupy all 4 >processors and does the job quickly. > >portion of script follows ... >#!/usr/bin/perl >............. >........... 
>$var1=`mysql -h localhost -u xx -pyy filter < batchfile1.sql |tail -1`; >$var2=`mysql -h localhost -u xx -pyy filter < batchfile2.sql |tail -1`; >$var3=`mysql -h localhost -u xx -pyy filter < batchfile3.sql |tail -1`; >$var4=`mysql -h localhost -u xx -pyy filter < batchfile4.sql |tail -1`; > >$total=$var1+$var2+$var3+$var4; > >NOTE : I dont want to use Perl-DBI. >many thankx in advance > > > >-- >Ra. Gowthaman, >Graduate Student, >Bioinformatics Lab, >Malaria Research Group, >ICGEB , New Delhi. >INDIA > >Phone: 91-9811261804 > 91-11-26173184; 91-11-26189360 #extn 314 > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason at cgt.duhs.duke.edu Fri Jul 16 10:05:37 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jul 16 10:08:53 2004 Subject: [Bioperl-l] tempdir() issues in regular perl. In-Reply-To: References: Message-ID: Bugfixes have tried to address this but since you don't report which modules you are using it is hard to say if the fixes will help you. On Thu, 15 Jul 2004, Victor Weike Xin (Houston) wrote: > Hi Jason, > > I have got the following error message: > Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory > /tmp/qpFBwIo1sH: Too many links at ./Archive.pl line 684 > > use File::Temp qw/ tempdir tempfile /; is used. > > Code at line 684: my $dir = tempdir(CLEANUP=>1); > > O/S redhat liunx.7.3 > Langauage: regular perl > > Can I install bioperl-1.4.tar.gz and will it confilict with regular perl? > Shouldn't. It's just a bunch of modules. > Thank you for your help. 
> > Weike Xin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From pow at ebi.ac.uk Fri Jul 16 10:17:15 2004 From: pow at ebi.ac.uk (Jean-Jack Riethoven) Date: Fri Jul 16 10:30:11 2004 Subject: [Bioperl-l] parallel processing with perl In-Reply-To: <6.0.3.0.0.20040716093752.01bb7ec0@pop.videotron.ca> Message-ID: On Fri, 16 Jul 2004, Peter Wilkinson wrote: > fork() command. Be warned that process management is an art and you might > end up spend some time learning about multiple process management. > > > >portion of script follows ... > >#!/usr/bin/perl > >............. > >........... > >$var1=`mysql -h localhost -u xx -pyy filter < batchfile1.sql |tail -1`; > >$var2=`mysql -h localhost -u xx -pyy filter < batchfile2.sql |tail -1`; > >$var3=`mysql -h localhost -u xx -pyy filter < batchfile3.sql |tail -1`; > >$var4=`mysql -h localhost -u xx -pyy filter < batchfile4.sql |tail -1`; Try Proc::Simple or any other process management module (not BioPerl, have a look on cpan) - they generally work very well unless you start/end new processes at phenomenal rates. We are using Proc::Simple for a lot of our strictly monitored parallel processing. With kind regards, Drs. Jean-Jack M. 
Riethoven EMBL Outstation - Hinxton pow@ebi.ac.uk ICQ#: 3433929 European Bioinformatics Institute Phone: (+44) 1223 494635 Wellcome Trust Genome Campus Fax : (+44) 1223 494468 Hinxton, Cambridge CB10 1SD URL : http://www.ebi.ac.uk/asd/ UNITED KINGDOM From Steven.Roels at mpi.com Fri Jul 16 10:33:24 2004 From: Steven.Roels at mpi.com (Roels, Steven) Date: Fri Jul 16 10:36:13 2004 Subject: [Bioperl-l] Bio::Graphics Description layout question Message-ID: Nathan, >-----Original Message----- >From: Agrin, Nathan [mailto:Nathan.Agrin@umassmed.edu] >Sent: Friday, July 16, 2004 10:19 AM >To: Roels, Steven; bioperl-l@bioperl.org >Subject: RE: [Bioperl-l] Bio::Graphics Description layout question > >Thanks for the help, > >Using the -key_style=>'right' seems to work on some tests I ran, but as >soon as I feed -key the function " sub { $des = >$feature->each_tag_value('description'), "$description"; } " it stops >working. I can feed the -key something like sub { return "test" } and >it works fine. I've been really frustrated trying to deal with this and >any help is appreciated. > You need to use the feature that is passed in to the callback: sub { my ($des) = $_[0]->get_tag_values('description'); return $des; } or sub { my $feature = shift; my ($des) = $feature->get_tag_values('description'); return $des; } This assumes your features always have that tag; otherwise you need to test for it: sub { my $feature = shift; ( $feature->has_tag('description') ) ? ($feature->get_tag_values('description'))[0] : "no_desc"; } Warning: I didn't test the above, so pardon any typos :) Hope that helps.
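It matters that these callbacks return the value explicitly because of how Perl evaluates the last statement of a sub. A list assignment such as ($des) = ... evaluated in scalar context yields the number of right-hand-side elements, not $des. The sketch below demonstrates this using a hypothetical MockFeature class that mimics only has_tag and get_tag_values, so it runs without BioPerl. Whether Bio::Graphics calls a -key callback in scalar or list context is not confirmed here; an explicit return is simply the safer habit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::SeqFeatureI object; only the two
# methods used by the callbacks below are mimicked.
package MockFeature;
sub new            { my ($class, %tags) = @_; return bless { %tags }, $class }
sub has_tag        { my ($self, $tag) = @_; return exists $self->{$tag} }
sub get_tag_values { my ($self, $tag) = @_; return @{ $self->{$tag} || [] } }

package main;

# Pitfall: a list assignment as the last statement returns the *count* of
# right-hand-side elements when the sub is called in scalar context.
my $risky = sub { my ($des) = $_[0]->get_tag_values('description') };

# Safer: return the value explicitly, with a fallback when the tag is absent.
my $safe = sub {
    my $feature = shift;
    return 'no_desc' unless $feature->has_tag('description');
    my ($des) = $feature->get_tag_values('description');
    return $des;
};

my $f = MockFeature->new(description => ['kinase domain']);
my $g = MockFeature->new();

my $r = $risky->($f);                        # called in scalar context
print "risky: $r\n";                         # prints "risky: 1"
print "safe:  ", scalar $safe->($f), "\n";   # prints "safe:  kinase domain"
print "safe:  ", scalar $safe->($g), "\n";   # prints "safe:  no_desc"
```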
> > >-----Original Message----- >From: Roels, Steven [mailto:Steven.Roels@mpi.com] >Sent: Thursday, July 15, 2004 6:57 PM >To: Agrin, Nathan; bioperl-l@bioperl.org >Subject: RE: [Bioperl-l] Bio::Graphics Description layout question > > >Nathan, > >I believe you need: > >-key_style=>'right' > >as an option to > >Bio::Graphics::Panel->new() > >But then you need to have: > >-pad_right=>$padright > >as well, such that $padright is big enough to fit the keys > >See the poddoc for Bio::Graphics::Panel... > >-Steve > >>-----Original Message----- >>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >>bounces@portal.open-bio.org] On Behalf Of Agrin, Nathan >>Sent: Thursday, July 15, 2004 1:13 PM >>To: bioperl-l@bioperl.org >>Subject: [Bioperl-l] Bio::Graphics Description layout question >> >>Is there anyway to get the bioperl graphics package to draw the >>descriptions for the tracks to the right of the tracks? I've seen >>something like this done with a browser developed by JGI. Here is the >>link to an example: >> >>http://genome.jgi-psf.org/cgi-bin/dispGeneModel?db=chlre2 >> >>&id=157911 >> >>Thanks, >>Nate > > > >This e-mail, including any attachments, is a confidential business >communication, and may contain information that is confidential, >proprietary and/or privileged. This e-mail is intended only for the >individual(s) to whom it is addressed, and may not be saved, copied, >printed, disclosed or used by anyone else. If you are not the(an) >intended recipient, please immediately delete this e-mail from your >computer system and notify the sender. Thank you. This e-mail, including any attachments, is a confidential business communication, and may contain information that is confidential, proprietary and/or privileged. This e-mail is intended only for the individual(s) to whom it is addressed, and may not be saved, copied, printed, disclosed or used by anyone else. 
If you are not the(an) intended recipient, please immediately delete this e-mail from your computer system and notify the sender. Thank you. From Nathan.Agrin at umassmed.edu Fri Jul 16 10:19:26 2004 From: Nathan.Agrin at umassmed.edu (Agrin, Nathan) Date: Fri Jul 16 10:59:04 2004 Subject: [Bioperl-l] Bio::Graphics Description layout question Message-ID: <89AA811FD79DC94788093B23DA79E71F0184D09F@edunivmail02.ad.umassmed.edu> Thanks for the help, Using the -key_style=>'right' seems to work on some tests I ran, but as soon as I feed -key the function " sub { $des = $feature->each_tag_value('description'), "$description"; } " it stops working. I can feed the -key something like sub { return "test" } and it works fine. I've been really frustrated trying to deal with this and any help is appreciated. -----Original Message----- From: Roels, Steven [mailto:Steven.Roels@mpi.com] Sent: Thursday, July 15, 2004 6:57 PM To: Agrin, Nathan; bioperl-l@bioperl.org Subject: RE: [Bioperl-l] Bio::Graphics Description layout question Nathan, I believe you need: -key_style=>'right' as an option to Bio::Graphics::Panel->new() But then you need to have: -pad_right=>$padright as well, such that $padright is big enough to fit the keys See the poddoc for Bio::Graphics::Panel... -Steve >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >bounces@portal.open-bio.org] On Behalf Of Agrin, Nathan >Sent: Thursday, July 15, 2004 1:13 PM >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] Bio::Graphics Description layout question > >Is there anyway to get the bioperl graphics package to draw the >descriptions for the tracks to the right of the tracks? I've seen >something like this done with a browser developed by JGI. 
Here is the >link to an example: > >http://genome.jgi-psf.org/cgi-bin/dispGeneModel?db=chlre2&id=157911 > >Thanks, >Nate From laurichj at bioinfo.ucr.edu Fri Jul 16 12:07:01 2004 From: laurichj at bioinfo.ucr.edu (Josh Lauricha) Date: Fri Jul 16 12:08:52 2004 Subject: [Bioperl-l] parallel processing with perl In-Reply-To: References: <6.0.3.0.0.20040716093752.01bb7ec0@pop.videotron.ca> Message-ID: <20040716160701.GB2159@bioinfo.ucr.edu> On Fri 07/16/04 15:17, Jean-Jack Riethoven wrote: > On Fri, 16 Jul 2004, Peter Wilkinson wrote: > Try Proc::Simple or any other process management module (not BioPerl, > have a look on cpan) - they generally work very well unless you start/end Another good one is Parallel::ForkManager, but its main caveat is that you can't fork/system()/open("|")/etc. in a child; doing so confuses it. That said, Proc::Simple and Proc::Queue seem to be better... -- ------------------------------------------------------ | Josh Lauricha | Ford, you're turning | | laurichj@bioinfo.ucr.edu | into a penguin.
Stop | | Bioinformatics, UCR | it | |----------------------------------------------------| | OpenPG: | | 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 | |----------------------------------------------------| From lstein at cshl.edu Fri Jul 16 14:08:19 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Jul 16 14:11:06 2004 Subject: [Bioperl-l] Bio::Graphics Description layout question In-Reply-To: References: Message-ID: <200407161408.19328.lstein@cshl.edu> Newer versions of the graphics library will automatically increase the padding for you, so this is no longer necessary. Lincoln On Thursday 15 July 2004 06:57 pm, Roels, Steven wrote: > Nathan, > > I believe you need: > > -key_style=>'right' > > as an option to > > Bio::Graphics::Panel->new() > > But then you need to have: > > -pad_right=>$padright > > as well, such that $padright is big enough to fit the keys > > See the poddoc for Bio::Graphics::Panel... > > -Steve > > >-----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > > >bounces@portal.open-bio.org] On Behalf Of Agrin, Nathan > >Sent: Thursday, July 15, 2004 1:13 PM > >To: bioperl-l@bioperl.org > >Subject: [Bioperl-l] Bio::Graphics Description layout question > > > >Is there anyway to get the bioperl graphics package to draw the > >descriptions for the tracks to the right of the tracks? I've seen > >something like this done with a browser developed by JGI. Here is > > the link to an example: > > > >http://genome.jgi-psf.org/cgi-bin/dispGeneModel?db=chlre2 > > >11> &id=157911 > > > >Thanks, > >Nate > > This e-mail, including any attachments, is a confidential business > communication, and may contain information that is confidential, > proprietary and/or privileged. This e-mail is intended only for > the individual(s) to whom it is addressed, and may not be saved, > copied, printed, disclosed or used by anyone else. 
If you are not > the(an) intended recipient, please immediately delete this e-mail > from your computer system and notify the sender. Thank you. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 From tariq_shafi75 at hotmail.com Sat Jul 17 14:01:53 2004 From: tariq_shafi75 at hotmail.com (Tariq Shafi) Date: Sat Jul 17 14:03:42 2004 Subject: [Bioperl-l] Cookie problems Message-ID: Hi I am having a lot of problem with cookies. I was using cookies and then I implemented a script to clear the contents of a cookie (a hash). Since then, my cookies have not been working. I have tried everything and have looked at the code dozens of times. Have any of you had an experience of this? Below is my code. Please let me know if you can help, greatly appreciated. Regards Tariq ---------------------------------------- #!/usr/bin/perl use CGI qw(:standard); $query = new CGI; %hash = $query->cookie(-name=>"ns_id"); #This array (@newerelements) takes all the selected checkboxes stored in the page submitting to the dynamic script. #The checkbox group is called ' '. #The checkboxes contain ID's, which are implemented as keys in a hash (to avoid duplicates and #allow ordering). @newelements = $query->param(' '); for ($i = 0; $i < scalar @newelements; $i++) { $hash{$newelements[$i]} = ""; } #I'm updating (or trying to) update the cookie here and passing it into the dynamic script header. $ns_id = $query->cookie(-name=> "ns_id", -value=>\%hash, -domain=>(...), -path=>'/pad/cgi-bin/', -expires=>"+1y" ); print $query->header(-type=>"text/html", -cookie=>$ns_id); if (defined $ns_id) { #Printing out the %hash keys, which come out as expected. 
print "Hash values: "; foreach $key (sort {$a<=>$b} keys %hash) { print $key, br(); } #The selected checkboxes (in @newelements) are the same as the %hash keys print "Newer elements:"; for ($i = 0; $i < scalar @newelements; $i++) { print "$newelements[$i]\n"; } print br(); } ------------------------------ Then I have a script called 'Alignments.pl', which is supposed to take the ID's, ascertain information from them in a database and then do alignments using BioPerl. At this point nothing pertaining to the cookie value (the hash) is printed out. #!/usr/bin/perl use CGI qw(:standard); #require "./cgi-lib.pl"; use DBI; $query = new CGI; %hash = $query->cookie(-name=>"ns_id"); $dbh = DBI->connect(...) print $query->header(-cookie=>$ns_id); if ( defined $cookie) { print "Cookie Defined", br(); #NOTHING IS PRINTED OUT HERE foreach $key(sort {$a<=>$b}keys %hash) {print $key, " "; } } ------------------------------- Below is the script that was used to clear the cookie hash values #!/usr/bin/perl use CGI qw(:standard); $query = new CGI; %hash = $query->cookie(-name=>"ns_id"); foreach $key (sort keys %hash) {delete $hash{$key}; } $ns_id = $query->cookie(-name=> "ns_id", -value=>\%hash, -domain=>(...), -path=>'/pad/cgi-bin/', -expires=>"+1y" ); print $query->header(-cookie=>$ns_id); _________________________________________________________________ It's fast, it's easy and it's free. Get MSN Messenger today! http://www.msn.co.uk/messenger From lstein at cshl.edu Sat Jul 17 15:21:29 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Sat Jul 17 15:23:21 2004 Subject: [Bioperl-l] Cookie problems In-Reply-To: References: Message-ID: <200407171521.29907.lstein@cshl.edu> Hi, It looks like the cookie is being returned to the first script but not the second one. Possibly your browser is confused. Try clearing out the cookies in your browser and starting fresh. lincoln On Saturday 17 July 2004 02:01 pm, Tariq Shafi wrote: > Hi > > I am having a lot of problem with cookies. 
I was using cookies and then I > implemented a script to clear the contents of a cookie (a hash). Since > then, my cookies have not been working. I have tried everything and have > looked at the code dozens of times. > > Have any of you had an experience of this? > > Below is my code. Please let me know if you can help, greatly appreciated. > > Regards > > Tariq > > ---------------------------------------- > > #!/usr/bin/perl > > use CGI qw(:standard); > > $query = new CGI; > %hash = $query->cookie(-name=>"ns_id"); > > #This array (@newerelements) takes all the selected checkboxes stored in > the page submitting to the dynamic script. > #The checkbox group is called ' '. > #The checkboxes contain ID's, which are implemented as keys in a hash (to > avoid duplicates and > #allow ordering). > > @newelements = $query->param(' '); > > for ($i = 0; $i < scalar @newelements; $i++) > { > $hash{$newelements[$i]} = ""; > } > > #I'm updating (or trying to) update the cookie here and passing it into the > dynamic script header. > > $ns_id = $query->cookie(-name=> "ns_id", -value=>\%hash, -domain=>(...), > -path=>'/pad/cgi-bin/', > -expires=>"+1y" ); > > print $query->header(-type=>"text/html", -cookie=>$ns_id); > > if (defined $ns_id) > > { > #Printing out the %hash keys, which come out as expected. > > print "Hash values: "; > foreach $key (sort {$a<=>$b} keys %hash) > { print $key, br(); > } > > #The selected checkboxes (in @newelements) are the same as > the %hash keys > > print "Newer elements:"; > > for ($i = 0; $i < scalar @newelements; $i++) > { print "$newelements[$i]\n"; > } > > print br(); > > } > > ------------------------------ > > Then I have a script called 'Alignments.pl', which is supposed to take the > ID's, ascertain information from them in a database and then do alignments > using BioPerl. At this point nothing pertaining to the cookie value (the > hash) is printed out. 
> > #!/usr/bin/perl > > use CGI qw(:standard); > #require "./cgi-lib.pl"; > use DBI; > > $query = new CGI; > %hash = $query->cookie(-name=>"ns_id"); > > $dbh = DBI->connect(...) > > print $query->header(-cookie=>$ns_id); > > if ( defined $cookie) { > > print "Cookie Defined", br(); > > #NOTHING IS PRINTED OUT HERE > foreach $key(sort {$a<=>$b}keys %hash) > {print $key, " "; > } > > } > > ------------------------------- > Below is the script that was used to clear the cookie hash values > > #!/usr/bin/perl > > use CGI qw(:standard); > > $query = new CGI; > > %hash = $query->cookie(-name=>"ns_id"); > > foreach $key (sort keys %hash) > {delete $hash{$key}; > } > > $ns_id = $query->cookie(-name=> "ns_id", -value=>\%hash, -domain=>(...), > -path=>'/pad/cgi-bin/', > -expires=>"+1y" ); > > print $query->header(-cookie=>$ns_id); > > _________________________________________________________________ > It's fast, it's easy and it's free. Get MSN Messenger today! > http://www.msn.co.uk/messenger > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From lstein at cshl.edu Sat Jul 17 15:37:03 2004 From: lstein at cshl.edu (Lincoln Stein) Date: Sat Jul 17 15:38:53 2004 Subject: [Bioperl-l] bp_bulk_load_gff.pl speed In-Reply-To: References: <200407151745.i6FHj3Ks008770@portal.open-bio.org> <1089923738.1605.9.camel@localhost.localdomain> Message-ID: <200407171537.03957.lstein@cshl.edu> My apologies for that old bug. Lincoln On Thursday 15 July 2004 05:30 pm, Dustin Cram wrote: > Well, I think I've traced my problem to a bug in > Bio::DB::GFF->_split_gff2_group that only existed for a while in CVS. 
> I had assumed that release 1.4 was installed at our site, but it turns > out that it was a CVS checkout from shortly after the 1.4 release. The revision > of Bio::DB::GFF.pm with the problem is 1.105 (maybe others too). > > It looks to me like $self->preferred_groups is being appended to with > ("Sequence","Transcript") for every call of the method, so as time goes > by the array gets huge, with just those elements repeated over and > over. That is why only my non-transcript features had problems - the > entire array was searched unsuccessfully for each feature. > > I've grabbed the latest CVS and it seems to work fine. Although I > haven't tried the 1.4 release, I think it should work too. If this isn't > the problem for other folk, then I guess they're still just crazy :). > > Thanks, > > Dustin > > On Thu, 15 Jul 2004 16:36:36 -0400, Scott Cain wrote: > > Dustin, > > > > Besides Aaron, a few other people have complained about this, and yes, I > > had written them off as crazy :-) > > > > Since I can't reproduce this problem, I'll have to ask you: is the > > problem that the files are not being written to /usr/tmp (or wherever) > > as quickly as before, or is it that, after the files are done being > > written, they aren't loaded into mysql as quickly? Not that I have a > > solution to either problem, but the first is presumably a perl problem > > and the second a mysql problem. If it were the latter (which I kind of > > doubt), you could get around it by using a real database, like > > PostgreSQL. > > > > Scott > > > > On Thu, 2004-07-15 at 13:45, bioperl-l-request@portal.open-bio.org > > > > wrote: > > > I recently started using Bio::DB::GFF, beginning by using > > > bp_bulk_load_gff.pl to load a simple but large gff2 file. This file > > > consisted only of transcripts and their subfeatures, so the group > > > class of all features was "transcript". The files loaded with no > > > problem and I was able to write a few successful test scripts.
> > > > > > Now I have added new features (genes) to the gff file, and I > > > attempted to load the new file exactly as before with > > > bp_bulk_load_gff.pl, but now it takes _much_ longer to load, and takes > > > more time the more features are added (the first 5K features take > > > about 30 seconds, the next 5K features take nearly 2 minutes, and so > > > on). It took over an hour to load 50K features, at which point I stopped > > > it. > > > > > > I've played around with the gff file a bit and found that anything > > > that doesn't have a group class of "transcript" has this problem, for > > > example if I 'sed s/transcript/foo/g' the original file it's slow, > > > and if I 'sed s/gene/transcript/g' the new file it's fast. I have > > > manually verified that the MySQL database is empty before each attempt > > > and even wiped the tmp directory before each attempt. > > > > > > Any ideas why non-transcript features take so long? > > > > > > Thanks, > > > > > > Dustin Cram > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. cain@cshl.org > > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln Stein lstein@cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From tariq_shafi75 at hotmail.com Sat Jul 17 16:33:43 2004 From: tariq_shafi75 at hotmail.com (Tariq Shafi) Date: Sat Jul 17 16:35:32 2004 Subject: [Bioperl-l] Cookie problems Message-ID: Hi Lincoln, Thanks very much! Your suggestion worked, and everything is running nicely now.
Kind regards Tariq >From: Lincoln Stein >Reply-To: lstein@cshl.org >To: Tariq Shafi , bioperl-l@portal.open-bio.org >Subject: Re: [Bioperl-l] Cookie problems >Date: Sat, 17 Jul 2004 15:21:29 -0400 > >Hi, > >It looks like the cookie is being returned to the first script but not the >second one. Possibly your browser is confused. Try clearing out the >cookies >in your browser and starting fresh. > >lincoln > > >On Saturday 17 July 2004 02:01 pm, Tariq Shafi wrote: > > Hi > > > > I am having a lot of problem with cookies. I was using cookies and then >I > > implemented a script to clear the contents of a cookie (a hash). Since > > then, my cookies have not been working. I have tried everything and have > > looked at the code dozens of times. > > > > Have any of you had an experience of this? > > > > Below is my code. Please let me know if you can help, greatly >appreciated. > > > > Regards > > > > Tariq > > > > ---------------------------------------- > > > > #!/usr/bin/perl > > > > use CGI qw(:standard); > > > > $query = new CGI; > > %hash = $query->cookie(-name=>"ns_id"); > > > > #This array (@newerelements) takes all the selected checkboxes stored in > > the page submitting to the dynamic script. > > #The checkbox group is called ' '. > > #The checkboxes contain ID's, which are implemented as keys in a hash >(to > > avoid duplicates and > > #allow ordering). > > > > @newelements = $query->param(' '); > > > > for ($i = 0; $i < scalar @newelements; $i++) > > { > > $hash{$newelements[$i]} = ""; > > } > > > > #I'm updating (or trying to) update the cookie here and passing it into >the > > dynamic script header. > > > > $ns_id = $query->cookie(-name=> "ns_id", -value=>\%hash, -domain=>(...), > > -path=>'/pad/cgi-bin/', > > -expires=>"+1y" ); > > > > print $query->header(-type=>"text/html", -cookie=>$ns_id); > > > > if (defined $ns_id) > > > > { > > #Printing out the %hash keys, which come out as expected. 
> > > > print "Hash values: "; > > foreach $key (sort {$a<=>$b} keys %hash) > > { print $key, br(); > > } > > > > #The selected checkboxes (in @newelements) are the same >as > > the %hash keys > > > > print "Newer elements:"; > > > > for ($i = 0; $i < scalar @newelements; $i++) > > { print "$newelements[$i]\n"; > > } > > > > print br(); > > > > } > > > > ------------------------------ > > > > Then I have a script called 'Alignments.pl', which is supposed to take >the > > ID's, ascertain information from them in a database and then do >alignments > > using BioPerl. At this point nothing pertaining to the cookie value (the > > hash) is printed out. > > > > #!/usr/bin/perl > > > > use CGI qw(:standard); > > #require "./cgi-lib.pl"; > > use DBI; > > > > $query = new CGI; > > %hash = $query->cookie(-name=>"ns_id"); > > > > $dbh = DBI->connect(...) > > > > print $query->header(-cookie=>$ns_id); > > > > if ( defined $cookie) { > > > > print "Cookie Defined", br(); > > > > #NOTHING IS PRINTED OUT HERE > > foreach $key(sort {$a<=>$b}keys %hash) > > {print $key, " "; > > } > > > > } > > > > ------------------------------- > > Below is the script that was used to clear the cookie hash values > > > > #!/usr/bin/perl > > > > use CGI qw(:standard); > > > > $query = new CGI; > > > > %hash = $query->cookie(-name=>"ns_id"); > > > > foreach $key (sort keys %hash) > > {delete $hash{$key}; > > } > > > > $ns_id = $query->cookie(-name=> "ns_id", -value=>\%hash, -domain=>(...), > > -path=>'/pad/cgi-bin/', > > -expires=>"+1y" ); > > > > print $query->header(-cookie=>$ns_id); > > > > _________________________________________________________________ > > It's fast, it's easy and it's free. Get MSN Messenger today! 
> > http://www.msn.co.uk/messenger > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > >-- >Lincoln Stein >lstein@cshl.edu >Cold Spring Harbor Laboratory >1 Bungtown Road >Cold Spring Harbor, NY 11724 >(516) 367-8380 (voice) >(516) 367-8389 (fax) From barani_quest at rediffmail.com Sat Jul 17 04:43:04 2004 From: barani_quest at rediffmail.com (Baranidharan P) Date: Sun Jul 18 11:05:34 2004 Subject: [Bioperl-l] retreive medline records by title Message-ID: <20040717084304.1770.qmail@webmail27.rediffmail.com> Hi all, My question is how to retrieve all MEDLINE publication records for a species: I want to retrieve all the MEDLINE records that have the species name in their title, for example Antheraea mylitta. Does the following code retrieve records by title? If not, which fields does it search? Thanks to all who have helped me earlier.
barani
--------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::Biblio;
use Bio::Biblio::IO;
use Data::Dumper;

my $count = 1;
my @ids = @{ new Bio::Biblio->find("Antheraea mylitta")->get_all_ids };
foreach my $ids (@ids) {
    print "$count", " ";
    print $ids, "\n";
    $count++;
}

From s.paul at surrey.ac.uk Mon Jul 19 13:26:27 2004 From: s.paul at surrey.ac.uk (S.Paul) Date: Mon Jul 19 05:25:15 2004 Subject: [Bioperl-l] Flat databases References: <02bb01c469ea$89be2460$d46fe383@LTCEP1SP> <200407141419.28614.lstein@cshl.edu> Message-ID: <158201c46db5$866c2900$d46fe383@LTCEP1SP> Thanks Lincoln. I'm using BioPerl version 1.4; I'll try it with Berkeley DB as well and see how it works. Sujoy Paul Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk ----- Original Message ----- From: "Lincoln Stein" To: "S.Paul" ; "Bioperl" Sent: Wednesday, July 14, 2004 11:19 AM Subject: Re: [Bioperl-l] Flat databases > By the way, I advise that you use the Berkeley DB interface, if > possible. The "flat" indexing scheme is quite slow. > > Lincoln > > On Wednesday 14 July 2004 05:35 pm, S.Paul wrote: > > Hi Everybody: > > > > I was trying to follow the HowTo for Flat databases and tried to run > > the following command: bp_bioflat_index.pl -c -l > > c:\research\perl\genbank -d genbank -i flat -f genbank data/*.gbk > > > > I installed the bioflat_index.pls script and then tried to run the > > above command but am getting the following error message: > > > > ******************************************************************* > >*** > > > > C:\research\perl\genbank>bp_bioflat_index.pl -c -l > > c:\research\perl\genbank -d genbank -i flat -f genbank data/*.gbk > > > > ------------- EXCEPTION ------------- > > MSG: Can't locate Bio/DB/Flat/Flat/genbank.pm in @INC (@INC > > contains: C:/Perl/lib C:/Perl/site/lib .) at (eval 4) line 2. > > BEGIN failed--compilation aborted at (eval 4) line 2.
> > > > STACK Bio::DB::Flat::new C:/Perl/site/lib/Bio/DB/Flat.pm:140 > > STACK toplevel C:\research\perl\genbank\bp_bioflat_index.pl:89 > > > > ******************************************************************* > >********** > > > > Thanks in advance for the help > > > > Sujoy > > > > > > > > Sujoy Paul, PRISE Centre, UniS, s.paul@surrey.ac.uk > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > From james.wasmuth at ed.ac.uk Mon Jul 19 09:07:44 2004 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Mon Jul 19 09:13:54 2004 Subject: [Bioperl-l] Blast Output and frac_aligned_query Message-ID: <40FBC7A0.3040201@ed.ac.uk> First, apologies if this has been debated before; I didn't see it in the archive and I've been away for a while, so I'm unclear on the current state of affairs. I have a bl2seq output (below), and when I extract its statistics I am told that 156% of the query is aligned. This is probably because multiple HSPs were produced, as the protein appears to be highly repetitive. Would this mess up the tiling of the HSPs in its current implementation? cheers -james e = 2e-19 s = 205 b = 83.6 aln_q = 1.56 !
aln_h = 0.09 id = 0.208 cons = 0.256 len = 332 > Query= prediction > (80 letters) > > >wormpep > Length = 2592 > > Score = 83.6 bits (205), Expect = 2e-19 > Identities = 41/47 (87%), Positives = 43/47 (91%) > > Query: 1 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKKNSSSGQ 47 > SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKK+ + Q > Sbjct: 1528 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKETDQAVQ 1574 > > > > Score = 29.3 bits (64), Expect = 0.004 > Identities = 13/24 (54%), Positives = 18/24 (75%) > > Query: 50 SSSGSSSDSSSXDGSTSSDDSXDD 73 > S S SSSDS S +GS+SS++ D+ > Sbjct: 493 SGSDSSSDSDSEEGSSSSNEDSDE 516 > > > > Score = 26.2 bits (56), Expect = 0.036 > Identities = 13/33 (39%), Positives = 20/33 (60%), Gaps = 1/33 (3%) > > Query: 40 KKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXD 72 > ++N++SG DSSS S S+ S + SD+ D > Sbjct: 488 QENNASGS-DSSSDSDSEEGSSSSNEDSDEQND 519 > > > > Score = 23.5 bits (49), Expect = 0.24 > Identities = 14/69 (20%), Positives = 31/69 (44%), Gaps = 4/69 (5%) > > Query: 9 NSAADSPMSTTGRPMV----LTKAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGS 64 > + + SP S+ R + T+++++ + ++ N+S S S S SSS + > Sbjct: 454 > DQGSSSPSSSRDRQNLHDPLQTRSSVEHHTNQEDQENNASGSDSSSDSDSEEGSSSSNED 513 > > Query: 65 TSSDDSXDD 73 > + + D+ > Sbjct: 514 SDEQNDVDE 522 > > > > Score = 21.9 bits (45), Expect = 0.68 > Identities = 10/29 (34%), Positives = 16/29 (55%) > > Query: 40 KKNSSSGQHDSSSGSSSDSSSXDGSTSSD 68 > + +S + + ++ GSSS SSS D D > Sbjct: 443 RSSSPTSKSENDQGSSSPSSSRDRQNLHD 471 > > > > Score = 21.2 bits (43), Expect = 1.2 > Identities = 13/34 (38%), Positives = 16/34 (47%) > > Query: 43 SSSGQHDSSSGSSSDSSSXDGSTSSDDSXDDXVP 76 > S S ++SGS S S + STSS S P > Sbjct: 2327 SRSSTMGNNSGSPSASGTTSPSTSSSISSGPDSP 2360 > > > > Score = 21.2 bits (43), Expect = 1.2 > Identities = 12/47 (25%), Positives = 17/47 (36%) > > Query: 27 KAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXDD 73 > K KA KKK+ D S S+D D S+ + + > Sbjct: 1144 KVRKKAEKEKLKKKKHRKGDSSDESDSDSNDELDLDVRKSTKEMTQE 1190 > > > > Score = 20.0 bits (40), Expect = 2.6 > Identities = 11/35 
(31%), Positives = 18/35 (51%), Gaps = 1/35 (2%) > > Query: 42 NSSSGQHDSSSGSSS-DSSSXDGSTSSDDSXDDXV 75 > + SS DS GSSS + S + + ++ +D V > Sbjct: 495 SDSSSDSDSEEGSSSSNEDSDEQNDVDEEDDEDVV 529 > > > > Score = 18.5 bits (36), Expect = 7.6 > Identities = 7/14 (50%), Positives = 9/14 (64%) > > Query: 49 DSSSGSSSDSSSXD 62 > +SS+G SDS D > Sbjct: 1252 NSSNGEESDSEKAD 1265 > > > Lambda K H > 0.294 0.109 0.279 > > Gapped > Lambda K H > 0.267 0.0410 0.140 > > > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Hits to DB: 2307 > Number of Sequences: 0 > Number of extensions: 39 > Number of successful extensions: 11 > Number of sequences better than 10.0: 1 > Number of HSP's better than 10.0 without gapping: 1 > Number of HSP's successfully gapped in prelim test: 0 > Number of HSP's that attempted gapping in prelim test: 0 > Number of HSP's gapped (non-prelim): 10 > length of query: 80 > length of database: 115,000 > effective HSP length: 56 > effective length of query: 24 > effective length of database: 114,944 > effective search space: 2758656 > effective search space used: 2758656 > T: 11 > A: 40 > X1: 17 ( 7.2 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 35 (18.0 bits) > S2: 35 (18.1 bits) -- I like nonsense, it wakes up the brain cells. -- Dr. Seuss Blaxter Nematode Genomics Group | School of Biological Sciences | Ashworth Laboratories | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From jason at cgt.duhs.duke.edu Mon Jul 19 09:33:35 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jul 19 09:35:33 2004 Subject: [Bioperl-l] Blast Output and frac_aligned_query In-Reply-To: <40FBC7A0.3040201@ed.ac.uk> References: <40FBC7A0.3040201@ed.ac.uk> Message-ID: On Mon, 19 Jul 2004, James Wasmuth wrote: > First apologies if this has been debated before, didn;t see it in the > archive and been away for a while, so unlcear on current state of affairs. 
> > I have a bl2seq output (below) and when I extract its statistics, I am > told that 156% of the query is aligned. > > This is probably because of multiple HSP produced as the protein appears > highly repetitive. Would this mess up the tiling the hsps, in its > current implementation? I guess so. SteveC is the tiling hsp guru so would have to see what he thinks. I think a lot of people out there have HSP tiling code - it would be nice to be able to incorporate more solutions to this problem so that one could try different strategies... You might also try using WU-BLAST with -links turned on which provides consistent groups of HSPs, we haven't (yet) incorporated interpreting the link information as a way to tile HSPs but would be a good project for someone to try out. (or for someone to donate if they have already solved this) -jason > > > cheers > -james > > > e = 2e-19 > s = 205 > b = 83.6 > aln_q = 1.56 ! > aln_h = 0.09 > id = 0.208 > cons = 0.256 > len = 332 > > > > > Query= prediction > > (80 letters) > > > > >wormpep > > Length = 2592 > > > > Score = 83.6 bits (205), Expect = 2e-19 > > Identities = 41/47 (87%), Positives = 43/47 (91%) > > > > Query: 1 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKKNSSSGQ 47 > > SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKK+ + Q > > Sbjct: 1528 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKETDQAVQ 1574 > > > > > > > > Score = 29.3 bits (64), Expect = 0.004 > > Identities = 13/24 (54%), Positives = 18/24 (75%) > > > > Query: 50 SSSGSSSDSSSXDGSTSSDDSXDD 73 > > S S SSSDS S +GS+SS++ D+ > > Sbjct: 493 SGSDSSSDSDSEEGSSSSNEDSDE 516 > > > > > > > > Score = 26.2 bits (56), Expect = 0.036 > > Identities = 13/33 (39%), Positives = 20/33 (60%), Gaps = 1/33 (3%) > > > > Query: 40 KKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXD 72 > > ++N++SG DSSS S S+ S + SD+ D > > Sbjct: 488 QENNASGS-DSSSDSDSEEGSSSSNEDSDEQND 519 > > > > > > > > Score = 23.5 bits (49), Expect = 0.24 > > Identities = 14/69 (20%), Positives = 31/69 (44%), Gaps = 4/69 (5%) > > > > Query: 9 
NSAADSPMSTTGRPMV----LTKAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGS 64 > > + + SP S+ R + T+++++ + ++ N+S S S S SSS + > > Sbjct: 454 > > DQGSSSPSSSRDRQNLHDPLQTRSSVEHHTNQEDQENNASGSDSSSDSDSEEGSSSSNED 513 > > > > Query: 65 TSSDDSXDD 73 > > + + D+ > > Sbjct: 514 SDEQNDVDE 522 > > > > > > > > Score = 21.9 bits (45), Expect = 0.68 > > Identities = 10/29 (34%), Positives = 16/29 (55%) > > > > Query: 40 KKNSSSGQHDSSSGSSSDSSSXDGSTSSD 68 > > + +S + + ++ GSSS SSS D D > > Sbjct: 443 RSSSPTSKSENDQGSSSPSSSRDRQNLHD 471 > > > > > > > > Score = 21.2 bits (43), Expect = 1.2 > > Identities = 13/34 (38%), Positives = 16/34 (47%) > > > > Query: 43 SSSGQHDSSSGSSSDSSSXDGSTSSDDSXDDXVP 76 > > S S ++SGS S S + STSS S P > > Sbjct: 2327 SRSSTMGNNSGSPSASGTTSPSTSSSISSGPDSP 2360 > > > > > > > > Score = 21.2 bits (43), Expect = 1.2 > > Identities = 12/47 (25%), Positives = 17/47 (36%) > > > > Query: 27 KAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXDD 73 > > K KA KKK+ D S S+D D S+ + + > > Sbjct: 1144 KVRKKAEKEKLKKKKHRKGDSSDESDSDSNDELDLDVRKSTKEMTQE 1190 > > > > > > > > Score = 20.0 bits (40), Expect = 2.6 > > Identities = 11/35 (31%), Positives = 18/35 (51%), Gaps = 1/35 (2%) > > > > Query: 42 NSSSGQHDSSSGSSS-DSSSXDGSTSSDDSXDDXV 75 > > + SS DS GSSS + S + + ++ +D V > > Sbjct: 495 SDSSSDSDSEEGSSSSNEDSDEQNDVDEEDDEDVV 529 > > > > > > > > Score = 18.5 bits (36), Expect = 7.6 > > Identities = 7/14 (50%), Positives = 9/14 (64%) > > > > Query: 49 DSSSGSSSDSSSXD 62 > > +SS+G SDS D > > Sbjct: 1252 NSSNGEESDSEKAD 1265 > > > > > > Lambda K H > > 0.294 0.109 0.279 > > > > Gapped > > Lambda K H > > 0.267 0.0410 0.140 > > > > > > Matrix: BLOSUM62 > > Gap Penalties: Existence: 11, Extension: 1 > > Number of Hits to DB: 2307 > > Number of Sequences: 0 > > Number of extensions: 39 > > Number of successful extensions: 11 > > Number of sequences better than 10.0: 1 > > Number of HSP's better than 10.0 without gapping: 1 > > Number of HSP's successfully gapped in prelim test: 0 > > Number of HSP's that attempted gapping in 
prelim test: 0 > > Number of HSP's gapped (non-prelim): 10 > > length of query: 80 > > length of database: 115,000 > > effective HSP length: 56 > > effective length of query: 24 > > effective length of database: 114,944 > > effective search space: 2758656 > > effective search space used: 2758656 > > T: 11 > > A: 40 > > X1: 17 ( 7.2 bits) > > X2: 38 (14.6 bits) > > X3: 64 (24.7 bits) > > S1: 35 (18.0 bits) > > S2: 35 (18.1 bits) > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From senger at ebi.ac.uk Mon Jul 19 09:39:55 2004 From: senger at ebi.ac.uk (Martin Senger) Date: Mon Jul 19 09:41:40 2004 Subject: [Bioperl-l] Re: retreive medline records by title In-Reply-To: <200407191317.i6JDGlKu021263@portal.open-bio.org> Message-ID: > My doubt is how to retreive all publication medline records for a > species . i want to retrieve all the medline records having the > species name in its title >

1) You can use either the provided script (say, for testing before coding it into your own program/script), with everything on a single line:

perl ~/bioperl-live/scripts/biblio/biblio.PLS -c - -find "Antheraea mylitta" -attrs title

This will give you 15 citations. (But to be honest, you will get the same 15 even without -attrs title, which means that either all citations about Antheraea mylitta have it in the title, or there is a bug in the Biblio code ... obviously if the latter is true I would like to hear about it in order to fix it).
2) Or here is how to code it for yourself (it's practically identical to the code you sent here, except for an additional parameter in the find method with the name 'title'):

#!/usr/bin/perl -w
use strict;
use Bio::Biblio;
use Bio::Biblio::IO;
use Data::Dumper;

my $count = 1;
my @ids = @{ new Bio::Biblio->find("Antheraea mylitta", 'title')->get_all_ids };
foreach my $ids (@ids) {
    print "$count", " ";
    print $ids, "\n";
    $count++;
}

Cheers, Martin -- Martin Senger EMBL Outstation - Hinxton Senger@EBI.ac.uk European Bioinformatics Institute Phone: (+44) 1223 494636 Wellcome Trust Genome Campus (Switchboard: 494444) Hinxton Fax : (+44) 1223 494468 Cambridge CB10 1SD United Kingdom http://industry.ebi.ac.uk/~senger From amackey at pcbi.upenn.edu Mon Jul 19 10:07:35 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Jul 19 10:09:13 2004 Subject: [Bioperl-l] Blast Output and frac_aligned_query In-Reply-To: References: <40FBC7A0.3040201@ed.ac.uk> Message-ID: BLAST is a great search algorithm, but a pretty poor pairwise alignment algorithm (i.e. bl2seq). Why not use Smith-Waterman (and/or LALIGN, if you're interested in repeats) if you want to get a "believable" alignment with which to do further analysis? HSP tiling is a process for stringing together incomplete alignments; no matter how you do it, you're never guaranteed to get the "right" answer. So why worry about doing it better, when you shouldn't be doing it at all? -Aaron > On Mon, 19 Jul 2004, James Wasmuth wrote: > >> First apologies if this has been debated before, didn;t see it in the >> archive and been away for a while, so unlcear on current state of >> affairs. >> >> I have a bl2seq output (below) and when I extract its statistics, I am >> told that 156% of the query is aligned. >> >> This is probably because of multiple HSP produced as the protein >> appears >> highly repetitive. Would this mess up the tiling the hsps, in its >> current implementation? -- Aaron J. Mackey, Ph.D. Dept.
of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From Annie.Law at nrc-cnrc.gc.ca Mon Jul 19 11:38:59 2004 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Mon Jul 19 11:41:02 2004 Subject: [Bioperl-l] Validation of information loaded into bioperl-db Message-ID: <10C94843061E094A98C02EB77CFC328722FE62@nrcmrdex1d.imsb.nrc.ca> Hi,

I would like to know some things about validating data loaded with bioperl-db. I am near the point where bioperl-db can be very helpful for me (so close); however, I have some questions about validation and about updating the database once it has already been loaded.

1. When you use the load_seqdatabase.pl script, how can you check whether the load has been successful? I would like to automate this process: I plan to run a cron job that loads the database, and I would like an efficient way to see whether the load succeeded. You can create log files to read, or script the runs, but is there something else that can be done?

2. I know that there are scriptlets in the same directory as load_seqdatabase.pl that can be used in conjunction with load_seqdatabase.pl when you use the options --lookup and --mergeobjs. I would like to know if the same scriptlets can be used with the load_ontology.pl script.

3. Both load_seqdatabase.pl and load_ontology.pl have the option --remove. I want to remove all old information and refresh with new data. Do I use --remove in conjunction with --lookup and --mergeobjs with freshen-annot.pl? I don't understand the need for the --remove option if you are already using --lookup and --mergeobjs with freshen-annot.pl. It seems that this would be redundant, but perhaps there is something I am missing.

4. What is the default behavior if I don't use options such as --lookup and --mergeobjs? Will all the data just be overwritten when I use load_ontology.pl and load_seqdatabase.pl?

Thanks very much,
Annie.
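For question 1, one low-cost check in a cron job is the loader's exit status, rather than reading logs after every run. The sketch below assumes only that load_seqdatabase.pl behaves like any Perl script and exits with a non-zero status when it dies on an error, which is Perl's default for an uncaught die(). The `run_and_check` helper and `@loader_args` are hypothetical names, not part of bioperl-db, and this approach detects only fatal errors, not records the loader skips with a warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): run a command and return 1
# if it exited with status 0, otherwise 0. An uncaught die() in a Perl
# script makes it exit non-zero, so fatal loader errors are caught here;
# problems the loader only warns about are not.
sub run_and_check {
    my @cmd = @_;
    system(@cmd);    # list form with several arguments bypasses the shell
    if ($? == -1) {
        warn "could not start '$cmd[0]': $!\n";
        return 0;
    }
    if ($? & 127) {
        warn sprintf("'%s' was killed by signal %d\n", $cmd[0], $? & 127);
        return 0;
    }
    my $exit = $? >> 8;
    if ($exit != 0) {
        warn "'$cmd[0]' exited with status $exit\n";
        return 0;
    }
    return 1;
}

# Example cron usage; @loader_args is a placeholder for the options and
# sequence files you normally pass to load_seqdatabase.pl.
# run_and_check('perl', 'load_seqdatabase.pl', @loader_args)
#     or die "sequence load failed\n";
```

Comparing row counts in the BioSQL tables before and after a load would be a complementary check, but it depends on your schema and data, so it is not shown here.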
From skchan at cs.usask.ca Mon Jul 19 12:32:03 2004 From: skchan at cs.usask.ca (Simon K. Chan) Date: Mon Jul 19 12:33:55 2004 Subject: [Bioperl-l] Bioperl support for AGAVE xml Message-ID: <1090254723.40fbf7835b59c@webmail.usask.ca> Hi, Is there currently any support for the AGAVE xml format in bioperl? Section III.7.7 of the bptutorial specifies that there is some support for AGAVE, but I have been unable to locate this support. There is a Bio::SeqIO::agave.pm module located here http://www.lifecde.com/products/agave/agave.pm According to the bioperl/biojava mailing list archives, the above code was submitted to the lists for comments/suggestions in 2001, but it appears that nothing much has happened since. I have modified some of the methods in the module to suit my needs, but would like to know what else is out there in terms of bioperl support. Many thanks for any comments/suggestions. Cheers, -- Warmest Regards, Simon K. Chan Bioinformatics, Crosby Lab skchan@cs.usask.ca From brian_osborne at cognia.com Mon Jul 19 12:57:46 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Jul 19 13:00:15 2004 Subject: [Bioperl-l] Bioperl support for AGAVE xml In-Reply-To: <1090254723.40fbf7835b59c@webmail.usask.ca> Message-ID: Simon, Yes, there's this stray sentence. "Several of these have been proposed and bioperl has at least some support for three: GAME, BSML and AGAVE." I'm not sure what was meant by the word "some" at that time but my guess would be that 1.4 has no support for AGAVE. Would you like to see agave.pm put into Bioperl? If your agave.pm is functional then this is just a matter of adding it and writing a test script. What XML parser does it use? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Simon K. 
Chan Sent: Monday, July 19, 2004 12:32 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Bioperl support for AGAVE xml Hi, Is there currently any support for the AGAVE xml format in bioperl? Section III.7.7 of the bptutorial specifies that there is some support for AGAVE, but I have been unable to locate this support. There is a Bio::SeqIO::agave.pm module located here http://www.lifecde.com/products/agave/agave.pm According to the bioperl/biojava mailing list archives, the above code was submitted to the lists for comments/suggestions in 2001, but it appears that nothing much has happened since. I have modified some of the methods in the module to suit my needs, but would like to know what else is out there in terms of bioperl support. Many thanks for any comments/suggestions. Cheers, -- Warmest Regards, Simon K. Chan Bioinformatics, Crosby Lab skchan@cs.usask.ca _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From skchan at cs.usask.ca Mon Jul 19 13:28:32 2004 From: skchan at cs.usask.ca (Simon K. Chan) Date: Mon Jul 19 13:30:26 2004 Subject: [Bioperl-l] Bioperl support for AGAVE xml In-Reply-To: References: Message-ID: <1090258112.40fc04c0d05a1@webmail.usask.ca> Hi Brian, > Yes, there's this stray sentence. "Several of these have been proposed and > bioperl has at least some support for three: GAME, BSML and AGAVE." I'm not > sure what was meant by the word "some" at that time but my guess would be > that 1.4 has no support for AGAVE. Ok, thanks for clearing that up. Should we update that sentence then? > Would you like to see agave.pm put into > Bioperl? If your agave.pm is functional then this is just a matter of adding > it and writing a test script. What XML parser does it use? At the moment, I'm using regexps to parse the XML (similar to the tigr.pm module). I noticed that game.pm uses XML::Parser::PerlSAX. Should I use this one instead of the regexps? 
Yes, I'd like agave.pm added to CVS so I can commit changes/additions as they come up. Let me know if you have any other comments/suggestions... Thanks. > Brian O. > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Simon K. Chan > Sent: Monday, July 19, 2004 12:32 PM > To: bioperl-l@bioperl.org > Subject: [Bioperl-l] Bioperl support for AGAVE xml > > Hi, > > Is there currently any support for the AGAVE xml format in bioperl? > > Section III.7.7 of the bptutorial specifies that there is some support for > AGAVE, but I have been unable to locate this support. > > There is a Bio::SeqIO::agave.pm module located here > http://www.lifecde.com/products/agave/agave.pm > > According to the bioperl/biojava mailing list archives, the above code was > submitted to the lists for comments/suggestions in 2001, but it appears that > nothing much has happened since. > > I have modified some of the methods in the module to suit my needs, but > would > like to know what else is out there in terms of bioperl support. > > Many thanks for any comments/suggestions. > > Cheers, > > > -- > Warmest Regards, > Simon K. Chan > Bioinformatics, Crosby Lab > skchan@cs.usask.ca > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From jason at cgt.duhs.duke.edu Mon Jul 19 13:47:02 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jul 19 13:48:55 2004 Subject: [Bioperl-l] Bioperl support for AGAVE xml In-Reply-To: <1090258112.40fc04c0d05a1@webmail.usask.ca> References: <1090258112.40fc04c0d05a1@webmail.usask.ca> Message-ID: On Mon, 19 Jul 2004, Simon K. Chan wrote: > Hi Brian, > > > > Yes, there's this stray sentence. "Several of these have been proposed and > > bioperl has at least some support for three: GAME, BSML and AGAVE." 
I'm not > > sure what was meant by the word "some" at that time but my guess would be > > that 1.4 has no support for AGAVE. > > Ok, thanks for clearing that up. Should we update that sentence then? > > > Would you like to see agave.pm put into > > Bioperl? If your agave.pm is functional then this is just a matter of adding > > it and writing a test script. What XML parser does it use? > > > At the moment, I'm using regexps to parse the XML (similar to the tigr.pm > module). I noticed that game.pm uses XML::Parser::PerlSAX. Should I use this > one instead of the regexps? > I would argue we should be aiming to use XML::SAX in the future - it can use various different modules including XML::Parser::PerlSAX as back-ends. We should NOT be using XML::DOM anymore and it would be nice to change the BSML parser over to using a SAX model at some point. You can see an example of XML::SAX in action in my Bio::SeqIO::tigrxml which is just in CVS. -- Jason Stajich Duke University jason at cgt.mc.duke.edu From brian_osborne at cognia.com Mon Jul 19 13:54:37 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Jul 19 13:57:11 2004 Subject: [Bioperl-l] Bioperl support for AGAVE xml In-Reply-To: <1090258112.40fc04c0d05a1@webmail.usask.ca> Message-ID: Simon, I'll fix the documentation in various places once your module is in CVS and working. Regarding the parser: I'm not making any recommendations. I asked because there have been suggestions at various times to reduce the number of different XML parsers used by Bioperl, and we just didn't want to see your module using yet-another-one! It sounds like you don't have your own CVS account. Do you anticipate continuing to work on this and other modules or did you just want me to commit your code? Brian O. From skchan at cs.usask.ca Mon Jul 19 15:17:59 2004 From: skchan at cs.usask.ca (Simon K. Chan) Date: Mon Jul 19 15:19:48 2004 Subject: [Bioperl-l] Bioperl support for AGAVE xml In-Reply-To: References: Message-ID: <1090264679.40fc1e67270a6@webmail.usask.ca> Hi, Ok, Jason. XML::SAX it is! Brian, No, I don't have my own bioperl cvs account. If I could get one, that would be great because I definitely will be continuously working on agave.pm (and possibly other modules). Let me know the details. Many thanks, All. Cheers, -- Warmest Regards, Simon K. Chan Bioinformatics, Crosby Lab skchan@cs.usask.ca From smarkel at scitegic.com Mon Jul 19 16:47:26 2004 From: smarkel at scitegic.com (Scott Markel) Date: Mon Jul 19 16:51:02 2004 Subject: [Bioperl-l] problem setting mismatch penalty in StandAloneBlast.pm without setting quiet() Message-ID: <40FC335E.5050206@scitegic.com> I'm using BioPerl 1.4 on a Windows XP box. When I set the mismatch penalty for blastn, I get both the "-q" command line option, which I want, and errors getting redirected to /dev/null, which I don't want. The problem seems to be the following line in Bio::Tools::Run::StandAloneBlast's _setparams():

    if ($self->quiet()) { $param_string .= ' 2>/dev/null';}

A constructor call

    $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

followed by either $factory->q(-3) or $factory->quiet(-3) both result in "-q -3" being added to the command line *and* " 2>/dev/null" being appended to the command line. I've tried calling $factory->verbose() with verbosity values of -1, 0, 1, and 2, but this didn't get rid of the /dev/null redirection. How do I set the mismatch value without tripping $self->quiet() in StandAloneBlast.pm? Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext.
253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From jason at cgt.duhs.duke.edu Mon Jul 19 17:34:50 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jul 19 17:36:41 2004 Subject: [Bioperl-l] problem setting mismatch penalty in StandAloneBlast.pm without setting quiet() In-Reply-To: <40FC335E.5050206@scitegic.com> References: <40FC335E.5050206@scitegic.com> Message-ID: Oops, quiet and q are being stored in the same slot. Try adding this to your code (before a factory is instantiated):

    sub Bio::Tools::Run::StandAloneBlast::quiet {
        my $self = shift;
        return $self->{'_quiet'} = shift if @_;
        return $self->{'_quiet'};
    }

I've added this change in CVS and a test in StandAloneBlast.t -jason -- Jason Stajich Duke University jason at cgt.mc.duke.edu From smarkel at scitegic.com Mon Jul 19 18:17:30 2004 From: smarkel at scitegic.com (Scott Markel) Date: Mon Jul 19 18:20:04 2004 Subject: [Bioperl-l] problem setting mismatch penalty in StandAloneBlast.pm without setting quiet() In-Reply-To: References: <40FC335E.5050206@scitegic.com> Message-ID: <40FC487A.4040309@scitegic.com> Jason, Thanks for the quick reply. Your method replacement nicely took care of my problem. Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From ghrose at unm.edu Mon Jul 19 16:46:37 2004 From: ghrose at unm.edu (ghrose@unm.edu) Date: Mon Jul 19 22:34:03 2004 Subject: [Bioperl-l] html stripped from blast report Message-ID: <1090269997.40fc332d2a985@webmail1.unm.edu> Dear Bioperl, I'm trying to use the code in http://bio.perl.org/Core/Latest/faq.html#Q3.7 to strip HTML out of a BLAST report that is in HTML format. ******The code

    #!/usr/bin/perl
    use strict;
    use DBI;
    use Bio::Perl;
    use Bio::SearchIO;
    use Bio::SearchIO::blast;
    use HTML::Strip;

    my $hs = new HTML::Strip;

    # replace the blast parser's _readline method with one that
    # auto-strips HTML:
    sub Bio::SearchIO::blast::_readline {
        my ($self, @args) = @_;
        return $hs->parse($self->SUPER::_readline(@args));
    }

    my $in = new Bio::SearchIO(-format => 'blast', -file => $ARGV[0]);

*******gives me the following error. georges-Computer:~/Desktop/ben-p020 george$ perl insert_7_4hstrip.pl p018xnr.html > test3 Can't locate object method "_readline" via package "main" at insert_7_4hstrip.pl line 17. I believe I have HTML::Strip installed correctly. I'm running this script on Mac OS X 10.3. Can you give me some advice on how to solve this problem? Thank you, George From james.wasmuth at ed.ac.uk Tue Jul 20 04:50:55 2004 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Tue Jul 20 04:57:29 2004 Subject: [Bioperl-l] Blast Output and frac_aligned_query In-Reply-To: References: <40FBC7A0.3040201@ed.ac.uk> Message-ID: <40FCDCEF.8090003@ed.ac.uk> Thanks Aaron, time is a slight issue as I'm carrying out several million comparisons but I'll concede accuracy is the more important feature... For anyone who cares, a quick way to cope with the issue highlighted:

    my ($qbeg, $qend) = $hit->range('query');
    my $hit_len_q  = $qend - $qbeg + 1;
    my $aln_q_true = $hit_len_q / ($res->query_length);

-james Aaron J.
Mackey wrote: > > BLAST is a great search algorithm, but a pretty poor pairwise > alignment algorithm (i.e. bl2seq). Why not use Smith-Waterman (and/or > LALIGN, if you're interested in repeats) if you want to get a > "believable" alignment with which to do further analysis? > > HSP tiling is a process for stringing together incomplete alignments; > no matter how you do it, you're never guaranteed to get the "right" > answer. So why worry about doing it better, when you shouldn't be > doing it at all? > > -Aaron > >> On Mon, 19 Jul 2004, James Wasmuth wrote: >> >>> First, apologies if this has been debated before; I didn't see it in the >>> archive and have been away for a while, so I'm unclear on the current state of >>> affairs. >>> >>> I have a bl2seq output (below) and when I extract its statistics, I am >>> told that 156% of the query is aligned. >>> >>> This is probably because of the multiple HSPs produced, as the protein >>> appears >>> highly repetitive. Would this mess up the tiling of the HSPs in its >>> current implementation? -- I like nonsense, it wakes up the brain cells. -- Dr. Seuss Blaxter Nematode Genomics Group | School of Biological Sciences | Ashworth Laboratories | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From amackey at pcbi.upenn.edu Tue Jul 20 06:44:03 2004 From: amackey at pcbi.upenn.edu (Aaron J.
Mackey) Date: Tue Jul 20 06:45:34 2004 Subject: [Bioperl-l] Blast Output and frac_aligned_query In-Reply-To: <40FCDCEF.8090003@ed.ac.uk> References: <40FBC7A0.3040201@ed.ac.uk> <40FCDCEF.8090003@ed.ac.uk> Message-ID: On Jul 20, 2004, at 4:50 AM, James Wasmuth wrote: > Thanks Aaron, time is a slight issue as I'm carrying out several > million comparisons but I'll concede accuracy is the more important > feature... Right, another common fallacy with bl2seq: several million pairwise comparisons always sounds like a lot, until one realizes that a single search of the "nr" database is of the same magnitude. Sure, BLAST will finish this amount of work in less than 10 minutes, but do we really mind waiting an hour or two to get better alignments? You're going to spend far more time on the analysis; why not make it easier on yourself in the long run (and not have to worry about niggling questions like "Hmm, I wonder if BLAST actually aligned all of the homologous regions, or only those disjoint, slowly-evolving fragments it could easily find"; this is particularly relevant when using BLAST to align DNA to either DNA or protein). As an aside, this is exactly the kind of batch processing targeted by various task distribution clients (e.g. "disperse"). With a modicum of processing power (say 4-8 modern CPUs), we routinely batch process millions of pairwise alignments with SSEARCH, PRSS, and/or LALIGN. Additionally, for the common "all-vs-all" matrix of pairwise alignment case, SSEARCH has the "-I" option, which evaluates only the lower triangle of the matrix (thus, providing the A vs. B, but not B vs. A alignment; these are guaranteed to have identical alignments and scores, but probably different E() values and bit scores; but you were already using PRSS or PRFX to confirm pairwise significances, right?).
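The symmetry argument behind the lower-triangle shortcut can be checked with a small example. The sketch below is a plain-Perl Smith-Waterman scorer (score only, no traceback) under assumed toy parameters: +2 for a match, -1 for a mismatch and a linear -2 gap penalty, rather than the substitution matrices and affine gap penalties SSEARCH actually uses. The `sw_score` helper is hypothetical and is not part of BioPerl or the FASTA package. Because the recurrence treats both sequences alike, swapping query and subject leaves the raw score unchanged.

```perl
use strict;
use warnings;

# Minimal Smith-Waterman local alignment score (score only, no traceback).
# Assumed toy scoring: +2 match, -1 mismatch, -2 per gap position.
# SSEARCH itself uses substitution matrices and affine gap penalties.
sub sw_score {
    my ($s1, $s2, %opt) = @_;
    my $match    = exists $opt{match}    ? $opt{match}    : 2;
    my $mismatch = exists $opt{mismatch} ? $opt{mismatch} : -1;
    my $gap      = exists $opt{gap}      ? $opt{gap}      : -2;

    my @s    = split //, uc $s1;
    my @t    = split //, uc $s2;
    my @prev = (0) x (@t + 1);    # previous DP row, column 0 included
    my $best = 0;

    for my $i (1 .. @s) {
        my @cur = (0);            # column 0 is always 0 in a local alignment
        for my $j (1 .. @t) {
            my $diag = $prev[$j - 1]
                     + ($s[$i - 1] eq $t[$j - 1] ? $match : $mismatch);
            my $up   = $prev[$j] + $gap;       # residue of $s1 against a gap
            my $left = $cur[$j - 1] + $gap;    # residue of $s2 against a gap
            my $h    = 0;                      # local alignment: never below 0
            for my $cand ($diag, $up, $left) {
                $h = $cand if $cand > $h;
            }
            $cur[$j] = $h;
            $best = $h if $h > $best;
        }
        @prev = @cur;
    }
    return $best;
}

# The score is the same whichever sequence is treated as the query.
my ($x, $y) = ('HEAGAWGHEE', 'PAWHEAE');
printf "%s vs %s: %d\n", $x, $y, sw_score($x, $y);
printf "%s vs %s: %d\n", $y, $x, sw_score($y, $x);
```

Both printf lines report the same score. For real protein work the match/mismatch constants would have to be replaced by a substitution matrix lookup, and for millions of pairs a compiled tool such as SSEARCH remains the practical choice.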
And to add just a bit more icing to the cake, SSEARCH runs efficiently under both PVM and MPI parallel environments; so the 10-100 fold "slow-down" associated with SW can be nicely ameliorated with 8 to 32 cluster nodes (unless your database is very big, more than 32 nodes will typically not be any more efficient). For those with multi-CPU machines, you can also build threaded SSEARCH for single workstation use. This public service message brought to you by the fine makers of: FASTA, the original search algorithm Add grains of salt to taste. And thanks, James, for being my scapegoat of the day. -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From mark.schreiber at group.novartis.com Mon Jul 19 23:34:48 2004 From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com) Date: Tue Jul 20 08:38:28 2004 Subject: [Bioperl-l] Chou-Fasman Message-ID: Hello - Does anyone have an example of applying chou fasman parameters to predicting the most probable secondary structure? Mark Schreiber Principal Scientist (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From amackey at pcbi.upenn.edu Tue Jul 20 09:16:57 2004 From: amackey at pcbi.upenn.edu (Aaron J. 
Mackey) Date: Tue Jul 20 09:18:31 2004 Subject: [Bioperl-l] Chou-Fasman In-Reply-To: References: Message-ID: <1393849C-DA4F-11D8-91D3-000A9577009E@pcbi.upenn.edu> You could try: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm I believe this particular "chofas" algorithm is included in the older FASTA2 package: ftp://ftp.virginia.edu/pub/fasta/fasta2.shar.Z -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From pstogios at uhnres.utoronto.ca Tue Jul 20 10:07:04 2004 From: pstogios at uhnres.utoronto.ca (Peter J Stogios) Date: Tue Jul 20 10:03:33 2004 Subject: [Bioperl-l] All-vs-all BLAST Message-ID: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> Hello, I'm looking for a script to do an all-vs-all BLAST comparison of a set of sequences. This has been done many times by others and I don't want to "reinvent the wheel"; can anyone help me find such a script or utility? Thanks, ~ Peter J Stogios Grad student, Privé Lab Dept.
of Medical Biophysics, University of Toronto e: pstogios@uhnres.utoronto.ca w: http://xtal.uhnres.utoronto.ca/prive From paulo.david at netvisao.pt Tue Jul 20 10:42:45 2004 From: paulo.david at netvisao.pt (Paulo Almeida) Date: Tue Jul 20 10:43:05 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> Message-ID: <40FD2F65.50005@netvisao.pt> Hi, You want to compare all the possible pairs of sequences in the set? If that is it, you can try putting all the sequences in an array, blast the first against all the others, shift it out of the array and then repeat the process until the array is empty. If it's a multiple alignment you want, you could probably use Bio::Tools::Run::Alignment::Clustalw (I was going to link to the webpage, but that link is broken in the Bioperl webpage). -Paulo Peter J Stogios wrote: > Hello, > > I'm looking for a script to do an all-vs-all BLAST comparison of a set > of sequences. From jason at cgt.duhs.duke.edu Tue Jul 20 10:52:18 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jul 20 10:54:12 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <40FD2F65.50005@netvisao.pt> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> Message-ID: If you want to run blast I don't think you really need Bioperl. 
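Paulo's array-and-shift suggestion above (compare the first sequence against all the others, shift it off, repeat until the array is empty) can be sketched in plain Perl. The sketch only builds the list of query/target pairs, each unordered pair exactly once, which is n(n-1)/2 pairs for n sequences. The `all_pairs` name and the sequence IDs are hypothetical, and what you run on each pair (bl2seq, SSEARCH, or a search against the remaining sequences) is left open.

```perl
use strict;
use warnings;

# Return every unordered pair of IDs exactly once, following the
# "compare the first against all the others, shift it off, repeat" idea.
sub all_pairs {
    my @ids = @_;              # work on a copy; the caller's list is untouched
    my @pairs;
    while (@ids) {
        my $query = shift @ids;                # take the first sequence off
        push @pairs, [ $query, $_ ] for @ids;  # pair it with all remaining ones
    }
    return @pairs;
}

# Hypothetical IDs; in practice they would come from your sequence file.
my @ids   = qw(seqA seqB seqC seqD);
my @pairs = all_pairs(@ids);
printf "%d sequences give %d pairs\n", scalar @ids, scalar @pairs;
print "$_->[0] vs $_->[1]\n" for @pairs;
```

With the four example IDs this reports 6 pairs (4 x 3 / 2) and then lists them, starting with seqA vs seqB.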
For a protein seq set called 'sequences':

    % formatdb -i sequences -p T -o T

Tabular output (-m 9), using E < 1e-3:

    % blastall -p blastp -i sequences -d sequences -o seq-vs-seq.BLASTP -m 9 -e 1e-3

Or in wu-blast:

    % wu-formatdb -i sequences -p T

(also consider the W and T parameters to get some speed, depending on what you want to find)

    % blastp -i sequences -d sequences -o seq-vs-seq.BLASTP -postsw -links E=1e-3 -wordmask seg

Or FASTA. Mask low-complexity regions first:

    % pseg sequences -q -z 1 > sequences.pseg

Then run (remove the _t if you don't want the threaded version; also consider the MPI or PVM version if you have a cluster):

    % fasta34_t -Q -S -m 9 -d 0 -E 1e-3 sequences.pseg sequences.pseg > seq-vs-seq.FASTA

Bio::SearchIO can parse all these outputs. -jason On Tue, 20 Jul 2004, Paulo Almeida wrote: > (I was going to link to the webpage, but that link is broken in the > Bioperl webpage). That would be: http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html I didn't bother making a release dir for bioperl-run at this point. -- Jason Stajich Duke University jason at cgt.mc.duke.edu From james.wasmuth at ed.ac.uk Tue Jul 20 10:57:48 2004 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Tue Jul 20 11:04:19 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <40FD2F65.50005@netvisao.pt> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> Message-ID: <40FD32EC.40501@ed.ac.uk> The documentation for TCoffee and Clustalw is not available through the 1.4 docs, though I'll stand corrected. Try: http://doc.bioperl.org/releases/bioperl-1.0.2/Bio/Tools/Run/Alignment/Clustalw.html -james -- I like nonsense, it wakes up the brain cells. -- Dr. Seuss Blaxter Nematode Genomics Group | School of Biological Sciences | Ashworth Laboratories | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From jason at cgt.duhs.duke.edu Tue Jul 20 11:13:50 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Jul 20 11:15:50 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <40FD32EC.40501@ed.ac.uk> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> <40FD32EC.40501@ed.ac.uk> Message-ID: They have been moved to a different directory, http://doc.bioperl.org/bioperl-run/; see http://www.bioperl.org/Core/Latest/faq.html#Q6.2 -- Jason Stajich Duke University jason at cgt.mc.duke.edu From hlapp at gmx.net Tue Jul 20 11:29:19 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Jul 20 11:31:06 2004 Subject: [Bioperl-l] Validation of information loaded into bioperl-db In-Reply-To: <10C94843061E094A98C02EB77CFC328722FE62@nrcmrdex1d.imsb.nrc.ca> Message-ID: <9141C642-DA61-11D8-821C-000A959EB4C4@gmx.net> On Monday, July 19, 2004, at 08:38 AM, Law, Annie wrote: > > 1. When you use the load_seqdatabase.pl script how can you check if > the load > has been successful? I would like to automate this process. I plan to > run a > cron job that would load the database but would like to know an > efficient > method to see if the load has been successful. I have every update job write a log file and look at them manually, believe it or not. Almost all jobs do succeed, and you can normally tell from the size of the log file whether or not something went wrong. I've found all real problems so far to happen fast or to not come alone, unlike the occasional entry with a species parsing problem or so. So, if the log is 10x shorter or 10x longer than usual I would look at it and investigate. Sorry, no script that does this. Also, you could set up a unit test that tests a certain entry. This is difficult though due to the volatile nature of the data sources. > You can create log files to > read or script the runs but is there something else that can be done? > > 2. I know that there are scriptlets in the same directory as > load_seqdatabase.pl that can be used in conjunction > with load_seqdatabase.pl when you use the options lookup and mergeobjs. > I > would like to know if the same script can be used for the > load_ontology.pl > script. No they can't, because they work on Bio::SeqI objects, not Bio::Ontology::TermI objects.
I myself don't merge old and new terms; I just update them (i.e., --lookup). Terms don't really have a lot of annotation in associated tables, and bioperl-db fully deals with the synonyms. > > 3. In both load_seqdatabase.pl and load_ontology.pl there is the option > --remove. I want to remove all old information and refresh with new > data. > Do I use --remove in conjunction with > --lookup and --mergeobjs with freshen-annot.pl? I don't understand > the need > for the --remove option if you are > already using --lookup and --mergeobjs with freshen-annot.pl. You either remove or merge old objects, not both at the same time. Also, I wouldn't abuse --mergeobjs for a script that removes the old object (although you could do that) because it will be slower. > It seems that > this would be redundant but perhaps there is something > I am missing. > > 4. What is the default behavior if I don't use options such as > lookup > and mergeobjs? Will all the data just be overwritten when I use > load_ontology.pl and load_seqdatabase.pl? If you don't use --lookup (and without it --mergeobjs will have no effect because then there can't be a found object either), all entries will be inserted. Those that already exist, as determined by their alternative key (accession, version, namespace for bioentries), will fail to insert and hence will remain in the database as they were before. Hth, -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From tony at mrc-lmb.cam.ac.uk Tue Jul 20 11:09:29 2004 From: tony at mrc-lmb.cam.ac.uk (Tony Andreeva) Date: Tue Jul 20 11:45:45 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> Message-ID: <40FD35A9.5070400@mrc-lmb.cam.ac.uk> To be more precise: Use bl2seq if you want to compare each two pairs. using bl2seq you will avoid formatting % bl2seq -p blastp -i sequence1 -j sequence2 -o seq1-vs-seq2.out ( and suitable parameters ) Use blastall if you want to search with each sequence against the others % formatdb -i seq.db -o T % blastall -p blastp -i seq.db -d seq.db -o seq.out ( and suitable parameters ) By simply typing: %blastall --help or %bl2seq --help you can obtain quite detailed help for the program settings. Hope that helps Tony -- Dr.Antonina Andreeva MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252959 Peter J Stogios wrote: > Hello, > > I'm looking for a script to do an all-vs-all BLAST comparison of a set > of sequences. This has been done many times by others and I don't > want to "reinvent the wheel", can anyone help me find such a script or > utility? > > Thanks, > > ~ > Peter J Stogios > Grad student, Priv? Lab > Dept. 
of Medical Biophysics, University of Toronto > e: pstogios@uhnres.utoronto.ca > w: http://xtal.uhnres.utoronto.ca/prive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From james.wasmuth at ed.ac.uk Tue Jul 20 11:53:56 2004 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Tue Jul 20 12:00:49 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <40FD35A9.5070400@mrc-lmb.cam.ac.uk> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD35A9.5070400@mrc-lmb.cam.ac.uk> Message-ID: <40FD4014.8080309@ed.ac.uk> That said, I support Aaron's suggestion from earlier today to use SSEARCH for pairwise comparisons; the gain in accuracy is worth the slight increase in run time... Tony Andreeva wrote: > To be more precise: > > Use bl2seq if you want to compare each two pairs. using bl2seq you > will avoid formatting > > % bl2seq -p blastp -i sequence1 -j sequence2 -o seq1-vs-seq2.out ( and > suitable parameters ) > > > Use blastall if you want to search with each sequence against the others > > % formatdb -i seq.db -o T > % blastall -p blastp -i seq.db -d seq.db -o seq.out ( and suitable > parameters ) > > By simply typing: > > %blastall --help > or > %bl2seq --help > > you can obtain quite detailed help for the program settings. > > Hope that helps > Tony > -- I like nonsense, it wakes up the brain cells. -- Dr.
Seuss Blaxter Nematode Genomics Group | School of Biological Sciences | Ashworth Laboratories | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From muratem at eng.uah.edu Tue Jul 20 16:07:30 2004 From: muratem at eng.uah.edu (Mike Muratet) Date: Tue Jul 20 16:09:43 2004 Subject: [Bioperl-l] Re: bioperl-db In-Reply-To: <9141C642-DA61-11D8-821C-000A959EB4C4@gmx.net> Message-ID: Hilmar While you're on the subject of load_seqdatabase, I am experiencing a weird problem I need some help solving. I am loading subsets of GenBank into the database I created with the script create_mysql_db, using the script load_seqdatabase.pl; I downloaded both from the links at bioperl.org. (Having these records in MySQL saves me a _ton_ of Perl writing, and thank you to the folks who did the development.) Sometimes there is overlap in the sets. For example: /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/load_seqdatabase.pl --dbname bioseqdb --format GenBank --namespace clones --lookup --noupdate accessions.gb Loading accessions.gb ...
-------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("BC029727","20987556","BC029727","Mus musculus zeta-chain (TCR) associated protein kinase, mRNA (cDNA clone MGC:36162 IMAGE:4925739), complete cds.","1","ROD") FKs (5,10090) Duplicate entry '20987556' for key 3 --------------------------------------------------- Could not store BC029727: ------------- EXCEPTION ------------- MSG: create: object (Bio::Seq::RichSeq) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/load_seqdatabase.pl:517 STACK toplevel /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/load_seqdatabase.pl:500 The offending key is the GI number, which (apparently) gets stored in the identifier column of bioentry. I would expect the namespace to confer uniqueness. Further, I might expect an entry under a new namespace to supersede the previous one. However, I am baffled by the exception when I have set --lookup and --noupdate. I have looked at the code, and I don't see an obvious cause. Could it be that the lookup includes the namespace in the key but the store does not? Am I using it improperly?
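The mismatch asked about here can be made concrete with a toy simulation. This is plain Perl, not bioperl-db code: hashes stand in for MySQL unique indexes, and new_table, lookup_entry, insert_entry, store_entry and the 'genbank' namespace are hypothetical names chosen for illustration. It assumes what the error output suggests and Hilmar's reply in this thread confirms: the lookup is keyed on (identifier, namespace), while the schema enforces UNIQUE (identifier) on its own. Under that assumption --lookup finds nothing in the new namespace, so the loader falls through to an insert that collides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model of the bioentry uniqueness problem (hypothetical, not bioperl-db code).
# Each table keeps two hash "indexes"; which one is enforced on insert
# depends on the identifier_only flag.
sub new_table {
    my (%opts) = @_;
    return {
        identifier_only => $opts{identifier_only} ? 1 : 0,
        by_id           => {},    # identifier => entry
        by_id_ns        => {},    # "identifier\tnamespace" => entry
    };
}

# Lookup keyed on (identifier, namespace), like --lookup with a namespace given.
sub lookup_entry {
    my ($table, $identifier, $namespace) = @_;
    return $table->{by_id_ns}{"$identifier\t$namespace"};
}

# Insert checked against the enforced unique key:
# UNIQUE (identifier) if identifier_only, else UNIQUE (identifier, namespace).
sub insert_entry {
    my ($table, $identifier, $namespace) = @_;
    my $key = "$identifier\t$namespace";
    if ($table->{identifier_only}) {
        return "Duplicate entry '$identifier'"
            if exists $table->{by_id}{$identifier};
    }
    elsif (exists $table->{by_id_ns}{$key}) {
        return "Duplicate entry '$identifier-$namespace'";
    }
    my $entry = { identifier => $identifier, namespace => $namespace };
    $table->{by_id}{$identifier} = $entry;
    $table->{by_id_ns}{$key}     = $entry;
    return 'inserted';
}

# Store: look up first and insert only when nothing was found.
sub store_entry {
    my ($table, $identifier, $namespace) = @_;
    return 'found' if lookup_entry($table, $identifier, $namespace);
    return insert_entry($table, $identifier, $namespace);
}

# The same GI number loaded under two namespaces.
my $current = new_table(identifier_only => 1);
print store_entry($current, '20987556', 'genbank'), "\n";   # inserted
print store_entry($current, '20987556', 'clones'),  "\n";   # Duplicate entry '20987556'

my $proposed = new_table(identifier_only => 0);
print store_entry($proposed, '20987556', 'genbank'), "\n";  # inserted
print store_entry($proposed, '20987556', 'clones'),  "\n";  # inserted
print store_entry($proposed, '20987556', 'clones'),  "\n";  # found
```

Making the composite key the enforced one lets the second store succeed, which corresponds to the schema change of including biodatabase_id in the identifier key that is discussed in this thread.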
Thanks Mike From rsucgang at bcm.tmc.edu Tue Jul 20 16:16:31 2004 From: rsucgang at bcm.tmc.edu (richard sucgang phd) Date: Tue Jul 20 16:18:19 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> Message-ID: At 10:52 AM -0400 7/20/04, Jason Stajich wrote: >If you want to run blast I don't think you really need Bioperl. I had to wait until someone more knowledgeable like Jason said this. The NCBI BLAST package includes a binary called blastclust that automatically runs the all-vs-all comparison of a set of sequences and divides them into clusters (which is the usual reason for running such a comparison). -r -- Richard Sucgang, PhD (713) 798 7657 http://www.dictygenome.org/ From hlapp at gnf.org Tue Jul 20 19:56:50 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jul 20 19:58:34 2004 Subject: [Bioperl-l] Re: bioperl-db In-Reply-To: References: Message-ID: <779643AE-DAA8-11D8-BA8E-000A95AE92B0@gnf.org> If you're using a version of bioperl-db updated some time around May, I believe (and it seems you are; shame on me for not tagging...), then the namespace, if provided, will always be included in the lookup. This is to support the inclusion of namespace in the two alternative key definitions on bioentry, i.e., (accession,version,namespace) and (identifier,namespace). The schema DDL definition in CVS defines identifier as unique by itself, but given earlier feedback this is likely to change. At any rate, you may choose either definition of the unique key. So, most likely what you want is to change the unique key definition of identifier to include biodatabase_id. If you don't know how to do that, I can send you the SQL code. -hilmar On Jul 20, 2004, at 1:07 PM, Mike Muratet wrote: > Hilmar > > While you're on the subject of load_seqdatabase, I am experiencing a > wierd > problem I need some help solving.
> > I am loading subsets of Genbank into the database I created with the > script create_mysql_db using the script load_seqdatabase.pl all of > which I > downloaded from the links at bioperl.org. (Having these records in > mysql > saves me a _ton_ of perl writing and thank you to the folks who did the > development.) Sometimes there is overlap in the sets. For example..... > > /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/ > load_seqdatabase.pl --dbname bioseqdb --format GenBank > --namespace clones --lookup --noupdate accessions.gb > Loading accessions.gb ... > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were > ("BC029727","20987556","BC029727","Mus musculus zeta-chain (TCR) > associated protein kinase, mRNA (cDNA clone MGC:36162 IMAGE:4925739), > complete cds.","1","ROD") FKs (5,10090) > Duplicate entry '20987556' for key 3 > --------------------------------------------------- > Could not store BC029727: > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Seq::RichSeq) failed to insert or to be > found by > unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) > /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/ > load_seqdatabase.pl:517 > STACK toplevel > /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/ > load_seqdatabase.pl:500 > > The offending key is the GI number, which gets stored (apparently) in > the > identifier column of bioentry. I would expect that the namespace would > confer uniqueness. 
Further, I might expect that an entry under a new > namespace might supercede the previous reference. However, I am > baffled by > the exception when I have set --lookup and --noupdate. I have looked at > the code, and I don't see anything simple. Could it be that the lookup > includes the namespace in the key but the store does not? Am I using it > improperly? > > Thanks > > Mike > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From michael.watson at bbsrc.ac.uk Wed Jul 21 04:54:13 2004 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed Jul 21 04:55:58 2004 Subject: [Bioperl-l] Bio::Graphics::Panel Message-ID: <8975119BCD0AC5419D61A9CF1A923E957C2838@iahce2knas1.iah.bbsrc.reserved> Hi When using Bio::Graphics::Panel and Bio::SeqFeature::Generic to render my BLAST hits as a PNG image, I want ALL HSPs to be the same color, no matter what their score or e-value is. How do I do that? I have tried setting $hsp->evalue("1E-40") (arbitrary high value) but it has no effect. Which part of the HSP object does Panel use to decide which color to draw it in? Some of my HSPs are so pale they are almost invisible. Mick From SonjaFunke at web.de Wed Jul 21 06:48:13 2004 From: SonjaFunke at web.de (Sonja Funke) Date: Wed Jul 21 06:49:58 2004 Subject: [Bioperl-l] tmp-files Message-ID: <329005075@web.de> Hello Bioperl users! Hope you can help me solve my problem. After around 200 sequences I get the following message from the NCBI: -------------------- WARNING --------------------- MSG: An Error Occurred

An Error Occurred

500 Cannot write to '/tmp/bcikv7mwMp': Too many open files Do you know of a function that stops BioPerl from writing files to the tmp directory? Thanks a lot! Sonja From ak at ebi.ac.uk Wed Jul 21 07:38:00 2004 From: ak at ebi.ac.uk (Andreas Kahari) Date: Wed Jul 21 07:39:42 2004 Subject: [Bioperl-l] tmp-files In-Reply-To: <329005075@web.de> References: <329005075@web.de> Message-ID: <20040721113759.GA6202@ebi.ac.uk> This appears to be an error message generated by the NCBI web server. They are apparently getting too many requests that involve creating files, and/or they fail to close those files once they're done with them. In any case, I'd say it's a web service problem for the NCBI admins, not a bioperl problem for you (unless it's actually you who is bombarding their service with too many or too rapid requests). Cheers, Andreas On Wed, Jul 21, 2004 at 12:48:13PM +0200, Sonja Funke wrote: > Hello Bioperl users! > > Hope you can help me solve my problem. After around 200 sequences I get the following message from the NCBI: > > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >

An Error Occurred

> 500 Cannot write to '/tmp/bcikv7mwMp': Too many open files > > > > Do you know a function thats stops bioperl from writing files to the tmp-file? > > Thanks a lot! > Sonja -- Andreas Kähäri EMBL-EBI/ensembl From crabtree at tigr.org Wed Jul 21 09:45:37 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Wed Jul 21 09:47:30 2004 Subject: [Bioperl-l] Bio::Graphics::Panel Message-ID: > When using Bio::Grahpics::Panel and Bio::SeqFeature::Generic > to render my BLAST hits as a PNG image, I want ALL HSP's to > be the same color, no matter what their score or e-value is. > > How do I do that? I have tried setting $hsp->evalue("1E-40") > (arbitrary high value) but it has no effect. Which part of > the HSP object does Panel use to figure out what color it > should draw it? Some of my HSPs are so pale they are almost > invisible. What glyph are you using to display the HSPs (i.e., what value are you passing as the 'glyph' when you call add_track on the objects that represent the HSPs)? If you're currently using the 'graded_segments' glyph, try changing it to a plain old 'segments' glyph and see what happens; the former sets the intensity of the color based on the features' scores, while the latter does not. Jonathan From laurichj at bioinfo.ucr.edu Wed Jul 21 11:40:16 2004 From: laurichj at bioinfo.ucr.edu (Josh Lauricha) Date: Wed Jul 21 11:41:56 2004 Subject: [Bioperl-l] Bio::Graphics funkiness Message-ID: <20040721154016.GA2007@bioinfo.ucr.edu> For lack of a good word to describe the problem, I'm using Bio::Graphics to display multiple alignments. But, as these get a little long (some consist of >1000 sequences), I run into a problem like this: --- --- - - - - - -- - - - - Where that is one track (one sequence) of type graded_segments. I would like them to show up on a single "line". I am guessing that the little black border around them is wider than the gap it is trying to show. How would I force them to be on the same line?
The goal is to give an idea of how the multiple alignment looks. Thanks -- ------------------------------------------------------ | Josh Lauricha | Ford, you're turning | | laurichj@bioinfo.ucr.edu | into a penguin. Stop | | Bioinformatics, UCR | it | |----------------------------------------------------| | OpenPG: | | 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 | |----------------------------------------------------| From Boris.Lenhard at cgb.ki.se Wed Jul 21 12:12:12 2004 From: Boris.Lenhard at cgb.ki.se (Boris Lenhard) Date: Wed Jul 21 12:14:06 2004 Subject: [Bioperl-l] SwissProt/UniProt GN line format changed Message-ID: <1090426332.4365.337.camel@shire.cgb.ki.se> I do not know if it has been discussed yet, but the format of the GN (gene name) line in recent versions of SwissProt files has changed, e.g. old: GN ZNF36 OR KOX18 OR ZNF139. new: GN Name=RCHY1; Synonyms=ZNF363, CHIMP, ARNIP; This leaves Bio::SeqIO::swiss unable to parse the GN line; as a consequence, the resulting annotation object lacks the 'gene_name' key. Boris -- ========================================== Boris Lenhard, Ph.D. Group Leader, Applied Genome Informatics Center for Genomics and Bioinformatics Karolinska Institutet Berzelius väg 35, B326b 171 77 Stockholm, SWEDEN Phone: +46 (0)8 5248 6391 FAX: +46 (0)8 32 48 26 E-mail: Boris.Lenhard@cgb.ki.se ========================================== From pow at ebi.ac.uk Wed Jul 21 12:13:10 2004 From: pow at ebi.ac.uk (Jean-Jack Riethoven) Date: Wed Jul 21 12:15:28 2004 Subject: [Bioperl-l] Bio::Graphics funkiness In-Reply-To: <20040721154016.GA2007@bioinfo.ucr.edu> Message-ID: On Wed, 21 Jul 2004, Josh Lauricha wrote: > Where that is one track (one sequence) of type graded_segments. I would > like them to show up on a single "line". I am guessing that the little > black border around them is wider than the gap it is trying to show. How > would I force them to be on the same line? The goal is to give an Idea > of how the multiple alignment looks.
Set your bump level to 0 (-bump => 0). However, you might then need to devise a way to distinguish between the separate features (alignments), since they will run into each other. You can use a code reference to give individual features different colours (if you are not using the glyph that colour-grades them by default), or change the glyph itself if, for example, the individual alignment is tiny compared to the region you want to display, e.g. in your add_track: -glyph => sub { my $feature = shift; return ($feature->length < XX) ? "diamond" : "line"; } where XX is a variable that determines when something is 'tiny' compared to your region. With kind regards, Drs. Jean-Jack M. Riethoven EMBL Outstation - Hinxton pow@ebi.ac.uk ICQ#: 3433929 European Bioinformatics Institute Phone: (+44) 1223 494635 Wellcome Trust Genome Campus Fax : (+44) 1223 494468 Hinxton, Cambridge CB10 1SD URL : http://www.ebi.ac.uk/asd/ UNITED KINGDOM From Annie.Law at nrc-cnrc.gc.ca Wed Jul 21 14:50:13 2004 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Wed Jul 21 14:52:01 2004 Subject: [Bioperl-l] Load_ontology.pl warnings and exceptions Message-ID: <10C94843061E094A98C02EB77CFC328722FE63@nrcmrdex1d.imsb.nrc.ca> Hi, Previously, I used load_ontology.pl and got about 5 exceptions or warnings. Recently, using the same bioperl and bioperl-db setups, I ran the same version of load_ontology.pl again with the latest GO ontology files. Now I get about 90 warnings/exceptions: about 50 listed after '... terms' and the others after '... relationships'. I'm pretty sure that only the input to the scripts has changed. Is this a normal outcome? Are these warnings only a reflection of the source files, i.e., annotation that is a work in progress, or is there something that I
I wanted to eliminate the interference of previously existing data so I went and created a new database, loaded the Bioperl schema and used the load_ncbi_taxonomy.pl script then I used the load_ontology.pl script. Here is the output of the run. It seems to me that they are all complaints of duplicate entries. All of the bioperl is 1.4 and was installed around this past March. I would appreciate some insight. Thanks, Annie. Script started on Wed 21 Jul 2004 01:27:45 PM EDT > perl /root/bioperl-db/scripts/biosql/load_ontology.pl --dbuser= =user1 --dbpass=pass1 --dbname mydatabase --safe --computetc --noobsolete --names space "Gene Ontology" --format goflat --fmtargs "-defsfile,/root/bioperl-db/data/GO.defs" /ro oot/bioperl-db/data/function.ontology /root/bioperl-db/data/process.ontology /root/bioperl-db b/data/component.ontology Parsing input ... Loading ontology Gene Ontology: ... terms -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0001529","elastin","","") FKs (1) Duplicate entry 'elastin-1' for key 2 --------------------------------------------------- Could not store GO:0001529 (elastin): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were 
("GO:0005581","collagen","","") FKs (1) Duplicate entry 'collagen-1' for key 2 --------------------------------------------------- Could not store GO:0005581 (collagen): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0005676","condensin complex","","") FKs (1) Duplicate entry 'condensin complex-1' for key 2 --------------------------------------------------- Could not store GO:0005676 (condensin complex): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were 
("GO:0005699","kinetochore","","") FKs (1) Duplicate entry 'kinetochore-1' for key 2 --------------------------------------------------- Could not store GO:0005699 (kinetochore): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0005716","synaptonemal complex","","") FKs (1) Duplicate entry 'synaptonemal complex-1' for key 2 --------------------------------------------------- Could not store GO:0005716 (synaptonemal complex): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were 
("GO:0005717","chromatin","","") FKs (1) Duplicate entry 'chromatin-1' for key 2 --------------------------------------------------- Could not store GO:0005717 (chromatin): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0005718","nucleosome","","") FKs (1) Duplicate entry 'nucleosome-1' for key 2 --------------------------------------------------- Could not store GO:0005718 (nucleosome): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0005733","small nucleolar 
RNA","","") FKs (1) Duplicate entry 'small nucleolar RNA-1' for key 2 --------------------------------------------------- Could not store GO:0005733 (small nucleolar RNA): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0006326","bent DNA binding","","") FKs (1) Duplicate entry 'bent DNA binding-1' for key 2 --------------------------------------------------- Could not store GO:0006326 (bent DNA binding): ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253 STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270 STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 -------------------------------------- From muratem at eng.uah.edu Wed Jul 21 15:29:17 2004 From: muratem at eng.uah.edu (Mike Muratet) Date: Wed Jul 21 15:31:24 2004 Subject: 
[Bioperl-l] Re: bioperl-db In-Reply-To: <779643AE-DAA8-11D8-BA8E-000A95AE92B0@gnf.org> Message-ID: On Tue, 20 Jul 2004, Hilmar Lapp wrote: > If (and it seems you are) you're using a version of bioperl-db updated > some time in I believe May or so (shame on me for not tagging...) then > the namespace if provided will always be included in the lookup. > > This is to support the inclusion of namespace in the two alternative > key definitions on bioentry, i.e., (accession,version,namespace) and > (identifier,namespace). The schema DDL definition as in CVS defines > identifier as unique by itself, but given earlier feedback this is > likely to be changed. At any rate, you may choose either definition of > the unique key. > > So, most likely what you want is to change the unique key definition of > identifier to include biodatabase_id. If you don't know how to do that > I can send you the SQL code. > > -hilmar > Hilmar Thanks. I'll take a look at table definitions. Redefining the keys isn't any more than creating a new table with the new keys and selecting from the old table is it? I can't remember if mysql allows you to alter keys in tables. Mike From hlapp at gnf.org Wed Jul 21 15:39:23 2004 From: hlapp at gnf.org (Hilmar Lapp) Date: Wed Jul 21 15:41:04 2004 Subject: [Bioperl-l] Re: bioperl-db In-Reply-To: Message-ID: On Wednesday, July 21, 2004, at 12:29 PM, Mike Muratet wrote: > I can't remember if mysql allows you to alter keys in > tables. It does. You drop the old key and then create the new one. No need to re-create the table. (I like to rant about mysql, but it's not that poor ;) -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From Annie.Law at nrc-cnrc.gc.ca Wed Jul 21 17:13:22 2004 From: Annie.Law at nrc-cnrc.gc.ca (Law, Annie) Date: Wed Jul 21 17:15:07 2004 Subject: [Bioperl-l] FW: Load_ontology.pl warnings and exceptions Message-ID: <10C94843061E094A98C02EB77CFC328722FE64@nrcmrdex1d.imsb.nrc.ca> Hi, I made a silly mistake. I forgot to put an underscore in the line --fmtargs "-defsfile,/root/bioperl-db/data/GO.defs"; it should be -defs_file, NOT -defsfile. I now get no warnings or exceptions. Thanks > -----Original Message----- > From: Law, Annie > Sent: Wednesday, July 21, 2004 2:50 PM > To: 'bioperl-l@bioperl.org' > Subject: Load_ontology.pl warnings and exceptions > > > Hi, > > Previously, I used load_ontology.pl and got about 5 > exceptions or warnings. Recently, while using the same > bioperl and bioperl-db Setups. I tried to use the same > version of load_ontology.pl again with the latest information > from GO ontology. Now it seems that I get about 90 warnings > exceptions about 50 listed after ..terms and others listed > after ...relationships during the run. I'm pretty sure that > only the input to the scripts has changed. Is this a normal outcome? > > Are these warnings only a reflection of the source file and > an annotation of work in progress or is there something that I > Am missing? I wanted to eliminate the interference of > previously existing data so I went and created a new > database, loaded the Bioperl schema and used the > load_ncbi_taxonomy.pl script then I used the load_ontology.pl script. > > Here is the output of the run. It seems to me that they are > all complaints of duplicate entries. All of the bioperl is > 1.4 and was installed around this past March. > > I would appreciate some insight. > Thanks, > Annie.
> > Script started on Wed 21 Jul 2004 01:27:45 PM EDT > > perl /root/bioperl-db/scripts/biosql/load_ontology.pl --dbuser= > =user1 --dbpass=pass1 --dbname mydatabase --safe --computetc > --noobsolete --names space "Gene Ontology" --format goflat > --fmtargs "-defsfile,/root/bioperl-db/data/GO.defs" /ro > oot/bioperl-db/data/function.ontology > /root/bioperl-db/data/process.ontology /root/bioperl-db > b/data/component.ontology Parsing input ... Loading ontology > Gene Ontology: > ... terms > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0001529","elastin","","") FKs (1) Duplicate > entry 'elastin-1' for key 2 > --------------------------------------------------- > Could not store GO:0001529 (elastin): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0005581","collagen","","") FKs (1) Duplicate > entry 'collagen-1' for key 2 > --------------------------------------------------- > Could not store GO:0005581 (collagen): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > 
Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0005676","condensin complex","","") FKs (1) > Duplicate entry 'condensin complex-1' for key 2 > --------------------------------------------------- > Could not store GO:0005676 (condensin complex): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0005699","kinetochore","","") FKs (1) > Duplicate entry 'kinetochore-1' for key 2 > --------------------------------------------------- > Could not store GO:0005699 (kinetochore): > > ------------- EXCEPTION ------------- > MSG: 
create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0005716","synaptonemal complex","","") FKs > (1) Duplicate entry 'synaptonemal complex-1' for key 2 > --------------------------------------------------- > Could not store GO:0005716 (synaptonemal complex): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0005717","chromatin","","") FKs (1) > Duplicate entry 'chromatin-1' for key 2 > 
--------------------------------------------------- > Could not store GO:0005717 (chromatin): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0005718","nucleosome","","") FKs (1) > Duplicate entry 'nucleosome-1' for key 2 > --------------------------------------------------- > Could not store GO:0005718 (nucleosome): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were 
("GO:0005733","small nucleolar RNA","","") FKs > (1) Duplicate entry 'small nucleolar RNA-1' for key 2 > --------------------------------------------------- > Could not store GO:0005733 (small nucleolar RNA): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, > values were ("GO:0006326","bent DNA binding","","") FKs (1) > Duplicate entry 'bent DNA binding-1' for key 2 > --------------------------------------------------- > Could not store GO:0006326 (bent DNA binding): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert > or to be found by unique key STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAd > aptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObj > ect.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > > > > From 
mkur at poczta.gazeta.pl Wed Jul 21 23:08:52 2004 From: mkur at poczta.gazeta.pl (Michal Kurowski) Date: Wed Jul 21 23:36:52 2004 Subject: [Bioperl-l] compiling ext-1.4 on solaris 2.9 Message-ID: <20040722030852.GA6895@calvados> Hi, It seems Bioperl-ext-1.4 is not actually easy to compile on Solaris. I must say it is a 64-bit, iThread Perl but still ;-) Other XS modules were built with no problems. I'm interested in Align module only. I followed the instructions and compilation went OK. "make test" is not so easy ... PERL_DL_NONLAZY=1 /usr/local/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl 1..2 Can't load 'blib/arch/auto/Bio/Ext/Align/Align.so' for module Bio::Ext::Align: ld.so.1: /usr/local/bin/perl: fatal: relocation error: file blib/arch/auto/Bio/Ext/Align/Align.so: symbol bp_sw_error_on: referenced symbol not found at /usr/local/lib/perl5/5.8.1/sun4-solaris-thread-multi-64/DynaLoader.pm line 229. at test.pl line 10 My "perl -V" output is given below. I hope someone has seen this before ... 
-- Michal Kurowski -------------- next part -------------- Summary of my perl5 (revision 5.0 version 8 subversion 1) configuration: Platform: osname=solaris, osvers=2.9, archname=sun4-solaris-thread-multi-64 uname='sunos tequila 5.9 generic_112233-08 sun4u sparc sunw,sun-fire-v210 ' config_args='-Dcc=gcc -mcpu=v9 -m64' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc -mcpu=v9 -m64', ccflags ='-D_REENTRANT -fno-strict-aliasing -I/usr/local/include -mcpu=v9 -m64 -Wa,-xarch=v9 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O3', cppflags='-D_REENTRANT -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='3.3.1', gccosandvers='solaris2.9' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=87654321 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='gcc -mcpu=v9 -m64', ldflags =' -L/usr/local/lib ' libpth=/usr/local/lib /usr/lib /usr/ccs/lib libs=-lsocket -lnsl -ldl -lm -lpthread -lc perllibs=-lsocket -lnsl -ldl -lm -lpthread -lc libc=/usr/lib/sparcv9/libc.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -z ignore -z lazyload -z combreloc' cccdlflags='-fPIC', lddlflags=' -G -z ignore -z lazyload -z combreloc -L/usr/local/lib' Characteristics of this binary (from libperl): Compile-time options: MULTIPLICITY USE_ITHREADS USE_64_BIT_INT USE_64_BIT_ALL USE_LARGE_FILES PERL_IMPLICIT_CONTEXT Built under solaris Compiled at Nov 4 2003 15:46:19 @INC: /usr/local/lib/perl5/5.8.1/sun4-solaris-thread-multi-64 /usr/local/lib/perl5/5.8.1 
/usr/local/lib/perl5/site_perl/5.8.1/sun4-solaris-thread-multi-64 /usr/local/lib/perl5/site_perl/5.8.1 /usr/local/lib/perl5/site_perl . From pvh at egenetics.com Thu Jul 22 06:09:45 2004 From: pvh at egenetics.com (Peter van Heusden) Date: Thu Jul 22 06:11:43 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> Message-ID: <40FF9269.6090507@egenetics.com> richard sucgang phd wrote: > At 10:52 AM -0400 7/20/04, Jason Stajich wrote: > >> If you want to run blast I don't think you really need Bioperl. > > > I had to wait until someone more knowledgeable like Jason said this. > The NCBI BLAST package includes a binary called blastclust that > automatically runs the all vs all comparison of a set of sequences and > divides them into clusters (which is the usual reason for running such > a comparison). > If you're looking to do clustering, however, why not use actual clustering software? For instance, Electric Genetics' (yeah, my employer!) provides stackPACK, a clustering toolkit (focused on ESTs) which is free for academics: http://www.egenetics.com/stackpack.html Peter From ak at ebi.ac.uk Thu Jul 22 06:32:07 2004 From: ak at ebi.ac.uk (Andreas Kahari) Date: Thu Jul 22 06:33:48 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <40FF9269.6090507@egenetics.com> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> <40FF9269.6090507@egenetics.com> Message-ID: <20040722103207.GB25986@ebi.ac.uk> On Thu, Jul 22, 2004 at 11:09:45AM +0100, Peter van Heusden wrote: > richard sucgang phd wrote: > > >At 10:52 AM -0400 7/20/04, Jason Stajich wrote: > > > >>If you want to run blast I don't think you really need Bioperl. > > > > > >I had to wait until someone more knowledgeable like Jason said this.
> >The NCBI BLAST package includes a binary called blastclust that > >automatically runs the all vs all comparison of a set of sequences and > >divides them into clusters (which is the usual reason for running such > >a comparison). > > > If you're looking to do clustering, however, why not use actual > clustering software? For instance, Electric Genetics' (yeah, my > employer!) provides stackPACK, a clustering toolkit (focused on ESTs) > which is free for academics: http://www.egenetics.com/stackpack.html Or MCL, which I think Jason has provided some blast-related input to: http://micans.org/mcl/ -- Andreas Kähäri EMBL-EBI/ensembl From amackey at pcbi.upenn.edu Thu Jul 22 08:08:05 2004 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Thu Jul 22 08:09:27 2004 Subject: [Bioperl-l] compiling ext-1.4 on solaris 2.9 In-Reply-To: <20040722030852.GA6895@calvados> References: <20040722030852.GA6895@calvados> Message-ID: 1. get a fresh, clean copy 2. cd into Bio/Ext (bypasses building SeqIO/staden) 3. do "perl Makefile.PL; make; make test" and send me the (voluminous) output ... Thanks, -Aaron On Jul 21, 2004, at 11:08 PM, Michal Kurowski wrote: > Hi, > > It seems Bioperl-ext-1.4 is not actually easy to compile on Solaris. > I must say it is a 64-bit, iThread Perl but still ;-) > Other XS modules were built with no problems. > > I'm interested in Align module only. I followed the instructions and > compilation went OK. "make test" is not so easy ... > > > PERL_DL_NONLAZY=1 /usr/local/bin/perl "-Iblib/lib" "-Iblib/arch" > test.pl > 1..2 > Can't load 'blib/arch/auto/Bio/Ext/Align/Align.so' for module > Bio::Ext::Align: > ld.so.1: /usr/local/bin/perl: fatal: relocation error: > file blib/arch/auto/Bio/Ext/Align/Align.so: > symbol bp_sw_error_on: referenced symbol not found > at > /usr/local/lib/perl5/5.8.1/sun4-solaris-thread-multi-64/DynaLoader.pm > line 229. at test.pl line 10 > > My "perl -V" output is given below. > > I hope someone has seen this before ...
> > > -- > Michal Kurowski > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From mkur at poczta.gazeta.pl Thu Jul 22 10:32:21 2004 From: mkur at poczta.gazeta.pl (Michal Kurowski) Date: Thu Jul 22 10:34:05 2004 Subject: [Bioperl-l] Re: compiling ext-1.4 on solaris 2.9 In-Reply-To: References: <20040722030852.GA6895@calvados> Message-ID: <20040722143221.GA9690@calvados> Aaron J. Mackey [amackey@pcbi.upenn.edu] wrote: > 1. get a fresh, clean copy > 2. cd into Bio/Ext (bypasses building SeqIO/staden) > 3. do "perl Makefile.PL; make; make test" and send me the (voluminous) > output ... OK, I got it. The problem is the MakeMaker-generated Makefile in "Bio/Ext/Align". I corrected it manually. Here's the diff (the parameters passed must be quoted):
--- Makefile czw lip 22 16:23:26 2004
+++ /usr/local/src/mine_bioperl-ext-1.4/Bio/Ext/Align/Makefile czw lip 22 16:21:56 2004
@@ -506,7 +506,7 @@
 $(RM_RF) $(DISTVNAME)
 $(RM_F) $(INST_DYNAMIC) $(INST_BOOT)
 $(RM_F) $(INST_STATIC)
- $(RM_F) $(INST_LIB)/Bio/Ext/Align.pm $(MAKEFILE_OLD)$(FIRST_MAKEFILE)
+ $(RM_F) $(MAKEFILE_OLD) $(FIRST_MAKEFILE)$(INST_LIB)/Bio/Ext/Align.pm
 # --- MakeMaker metafile section:
@@ -862,7 +862,7 @@
 $(MYEXTLIB):
 DEFINE='$(DEFINE)'; CC='$(PERLMAINCC)'; export DEFINE INC CC; \
- cd libs && $(MAKE) CC=$(CC) libsw$(LIB_EXT) -e
+ cd libs && $(MAKE) CC='$(CC)' libsw$(LIB_EXT) -e
Sun's "make" actually did complain about the wrong ELF class, but "gmake" simply bailed out.
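Why the quoting matters: make hands each recipe line to /bin/sh after substituting $(CC), and the shell splits the unquoted value on whitespace. The sub-make in libs/ therefore sees only CC=gcc, while -mcpu=v9 and -m64 turn into separate arguments for make itself, which fits the 32-bit library objects mentioned next. The sketch below is only an illustration: it uses the core module Text::ParseWords, whose shellwords() approximates (but does not fully reproduce) /bin/sh word splitting, and it assumes $(LIB_EXT) expands to ".a".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Value that $(CC) expands to on this 64-bit Solaris Perl (taken from "perl -V").
my $cc = 'gcc -mcpu=v9 -m64';

# The libs/ recipe line before and after the patch ($(LIB_EXT) assumed to be ".a").
my $unquoted = "make CC=$cc libsw.a -e";
my $quoted   = "make CC='$cc' libsw.a -e";

# shellwords() approximates how /bin/sh splits a command line into arguments.
my @broken = shellwords($unquoted);
my @fixed  = shellwords($quoted);

print "unquoted: ", join(' | ', @broken), "\n";
print "quoted:   ", join(' | ', @fixed),  "\n";
```

Without the quotes the argument list comes out as make | CC=gcc | -mcpu=v9 | -m64 | libsw.a | -e; with them, CC=gcc -mcpu=v9 -m64 stays one argument, so the 64-bit flags reach the compiler.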
Library files were being compiled for the 32-bit architecture ;-) Cheers, -- Michal Kurowski perl -e '$_=q#: 13_2: 12/o{>: 8_4) (_4: 6/2^-2; 3;-2^\2: 5/7\_/\7: 12m m::#; y#:#\n#;s#(\D)(\d+)#$1x$2#ge;print' From mkur at gazeta.pl Wed Jul 21 23:43:03 2004 From: mkur at gazeta.pl (mkur@gazeta.pl) Date: Thu Jul 22 11:01:25 2004 Subject: [Bioperl-l] compiling ext-1.4 on solaris 2.9 Message-ID: <1090467783726.ew3.mkur@gazeta.pl> Hi, It seems Bioperl-ext-1.4 is not actually easy to compile on Solaris. I must say it is a 64-bit, iThread Perl but still ;-) Other XS modules were built with no problems. I'm interested in Align module only. I followed the instructions and compilation went OK. "make test" is not so easy ... PERL_DL_NONLAZY=1 /usr/local/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl 1..2 Can't load 'blib/arch/auto/Bio/Ext/Align/Align.so' for module Bio::Ext::Align: ld.so.1: /usr/local/bin/perl: fatal: relocation error: file blib/arch/auto/Bio/Ext/Align/Align.so: symbol bp_sw_error_on: referenced symbol not found at /usr/local/lib/perl5/5.8.1/sun4-solaris-thread-multi-64/DynaLoader.pm line 229. at test.pl line 10 My "perl -V" output is given below. (sorry for wrapped lines - bioperl mail hub seems to refuse to accept attachments). I hope someone has seen this before ...
Summary of my perl5 (revision 5.0 version 8 subversion 1) configuration: Platform: osname=solaris, osvers=2.9,archname=sun4-solaris-thread-multi-64 uname='sunos tequila 5.9 generic_112233-08 sun4usparc sunw,sun-fire-v210 ' config_args='-Dcc=gcc -mcpu=v9 -m64' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc -mcpu=v9 -m64', ccflags ='-D_REENTRANT -fno-strict-aliasing -I/usr/local/include -mcpu=v9 -m64 -Wa,-xarch=v9 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O3', cppflags='-D_REENTRANT -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='3.3.1', gccosandvers='solaris2.9' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=87654321 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='gcc -mcpu=v9 -m64', ldflags =' -L/usr/local/lib ' libpth=/usr/local/lib /usr/lib /usr/ccs/lib libs=-lsocket -lnsl -ldl -lm -lpthread -lc perllibs=-lsocket -lnsl -ldl -lm -lpthread -lc libc=/usr/lib/sparcv9/libc.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -z ignore -z lazyload -z combreloc' cccdlflags='-fPIC', lddlflags=' -G -z ignore -z lazyload -z combreloc -L/usr/local/lib' Characteristics of this binary (from libperl): Compile-time options: MULTIPLICITY USE_ITHREADS USE_64_BIT_INT USE_64_BIT_ALL USE_LARGE_FILES PERL_IMPLICIT_CONTEXT Built under solaris Compiled at Nov 4 2003 15:46:19 @INC: /usr/local/lib/perl5/5.8.1/sun4-solaris-thread-multi-64 /usr/local/lib/perl5/5.8.1 /usr/local/lib/perl5/site_perl/5.8.1/sun4-solaris-thread-multi-64 
/usr/local/lib/perl5/site_perl/5.8.1 /usr/local/lib/perl5/site_perl . From xiang.deng at duke.edu Thu Jul 22 10:00:55 2004 From: xiang.deng at duke.edu (Xiang Deng) Date: Thu Jul 22 11:01:34 2004 Subject: [Bioperl-l] bioperl installation Message-ID: To whom it may concern, I am trying to install bioperl locally under my personal folder. The core modules were successfully installed. I ran through a simple perl script to use bio::perl or bio::seq without complaint, but I had a problem installing bioperl-ext-1.4. When I installed core modules I used this >perl Makefile.PL LIB=/my_local_directory/ My first question is whether or not I have to use the same way to deal with ext modules too and I can use the same directory or a separate one. Actually I tried both and both failed with error message as follows: mendel 207% perl Makefile.PL Checking if your kit is complete... Looks good Writing Makefile for Bio::Ext::Align ERROR from evaluation of /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: Can't locate Inline/MakeMaker.pm in @INC (@INC contains: /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line 1. BEGIN failed--compilation aborted at ./Makefile.PL line 1. could you please let me know what I did wrong or miss in there? thanks a lot, Xiang Department of Pharmacology and Cancer Biology Duke University Medical Center Durham, NC 27710 Phone: 919-4792339 From brian_osborne at cognia.com Thu Jul 22 11:22:22 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jul 22 11:24:05 2004 Subject: [Bioperl-l] bioperl installation In-Reply-To: Message-ID: Xiang, I don't think Inline is installed by default along with Perl; you'll need to install it.
Mind you, you don't need to install all of Inline, just the parts that your installation attempt complains about. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Xiang Deng Sent: Thursday, July 22, 2004 10:01 AM To: bioperl-l@bioperl.org Subject: [Bioperl-l] bioperl installation To whom it may concern, I am trying to install bioperl locally under my personal folder. The core modules were successfully installed. I ran through a simple perl script to use bio::perl or bio::seq without complaint, but I had a problem installing bioperl-ext-1.4. When I installed core modules I used this >perl Makefile.PL LIB=/my_local_directory/ My first question is whether or not I have to use the same way to deal with ext modules too and I can use the same directory or a separate one. Actually I tried both and both failed with error message as follows: mendel 207% perl Makefile.PL Checking if your kit is complete... Looks good Writing Makefile for Bio::Ext::Align ERROR from evaluation of /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: Can't locate Inline/MakeMaker.pm in @INC (@INC contains: /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line 1. BEGIN failed--compilation aborted at ./Makefile.PL line 1. could you please let me know what I did wrong or miss in there? thanks a lot, Xiang Department of Pharmacology and Cancer Biology Duke University Medical Center Durham, NC 27710 Phone: 919-4792339 From jason at cgt.duhs.duke.edu Thu Jul 22 11:25:41 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Jul 22 11:27:20 2004 Subject: [Bioperl-l] bioperl installation In-Reply-To: References: Message-ID: Did you install Inline::C?
From the README in bioperl-ext directory o Installing Depending on your choice of extensions, you might need Inline::MakeMaker and Inline::C to create the makefile. Use for example the cpan program to install Inline::MakeMaker and answer yes when prompted to install Inline::C. This line gives you a hint: > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: Did you also install the staden library? -jason On Thu, 22 Jul 2004, Xiang Deng wrote: > To whom it may concern, > > I am trying to install bioperl locally under my personal folder. The core > modules were successfully installed. I ran through a simple perl script to > use bio::perl or bio::seq without complaint, but I had a problem > installing bioperl-ext-1.4. > > When I installed core modules I used this > >perl Makefile.PL LIB=/my_local_directory/ > > My first question is whether or not I have to use the same way to deal > with ext modules too and I can use the same directory or a separate one. > > Actually I tried both and both failed with error message as follows: > > mendel 207% perl Makefile.PL > Checking if your kit is complete... > Looks good > Writing Makefile for Bio::Ext::Align > ERROR from evaluation of > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: > /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 > /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix > /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line 1. > BEGIN failed--compilation aborted at ./Makefile.PL line 1. > > could you please let me know what I did wrong or miss in there?
thanks a > lot, > > Xiang > > Department of Pharmacology and Cancer Biology > Duke University Medical Center > Durham, NC 27710 > Phone: 919-4792339 > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From xiang.deng at duke.edu Thu Jul 22 11:41:40 2004 From: xiang.deng at duke.edu (Xiang Deng) Date: Thu Jul 22 11:44:04 2004 Subject: [Bioperl-l] bioperl installation In-Reply-To: Message-ID: Thanks for your message, Jason. The problem is I don't have permission to install bioperl into the standard site_perl/system area. From the online instructions at bioperl.org, the only way I can install it is with "make", not CPAN. So could you please let me know where I can download the Inline::MakeMaker and Inline::C modules? Everything I downloaded is from bioperl.org. I did not see where I can get these two. And once I have them, which directory should I put them in? thanks, Xiang Department of Pharmacology and Cancer Biology Duke University Medical Center Durham, NC 27710 Phone: 919-4792339 Jason Stajich Sent by: bioperl-l-bounces@portal.open-bio.org 07/22/2004 11:25 AM To Xiang Deng cc bioperl-l@bioperl.org Subject Re: [Bioperl-l] bioperl installation Did you install Inline::C? From the README in bioperl-ext directory o Installing Depending on your choice of extensions, you might need Inline::MakeMaker and Inline::C to create the makefile. Use for example the cpan program to install Inline::MakeMaker and answer yes when prompted to install Inline::C. This line gives you a hint: > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: Did you also install the staden library? -jason On Thu, 22 Jul 2004, Xiang Deng wrote: > To whom it may concern, > > I am trying to install bioperl locally under my personal folder. The core > modules were successfully installed. I ran through a simple perl script to > use bio::perl or bio::seq without complaint, but I had a problem > installing bioperl-ext-1.4.
> > When I installed core modules I used this > >perl Makefile.PL LIB=/my_local_directory/ > > My first question is whether or not I have to use the same way to deal > with ext modules too and I can use the same directory or a separate one. > > Actually I tried both and both failed with error message as follows: > > mendel 207% perl Makefile.PL > Checking if your kit is complete... > Looks good > Writing Makefile for Bio::Ext::Align > ERROR from evaluation of > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: > /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 > /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix > /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line 1. > BEGIN failed--compilation aborted at ./Makefile.PL line 1. > > could you please let me know what I did wrong or miss in there? thanks a > lot, > > Xiang > > Department of Pharmacology and Cancer Biology > Duke University Medical Center > Durham, NC 27710 > Phone: 919-4792339 > -- Jason Stajich Duke University jason at cgt.mc.duke.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From pvh at egenetics.com Thu Jul 22 11:45:12 2004 From: pvh at egenetics.com (Peter van Heusden) Date: Thu Jul 22 11:47:03 2004 Subject: [Bioperl-l] Bioperl exceptions Message-ID: <40FFE108.1090701@egenetics.com> Hi Bioperl developers I'm a bit confused by the current state of play of the Bioperl exception class (Bio::Root::Exception). This inherits from Error, but Error.pm is not a requirement of Bioperl. It seems to me then that there are two exception mechanisms in Bioperl: 1) Exceptions based on exception objects, derived from Error.pm and implemented by Bio::Root::Exception and its subclasses.
2) Simple "stack dump" exceptions, which are implemented in Bio::Root's throw() method and used if Error.pm is not available. Is this right? If so, is the necessary result that Error-based exceptions can't be used as part of any core Bioperl class? (Since that would force a dependency on Error.pm.) I see that in the biodesign.pod documentation, the only semantics mentioned are the basic string-based ones. Does this mean that using exception objects is deprecated (I see the Exception.pm class was last updated in Bioperl 1.3.01 days)? Peter From brian_osborne at cognia.com Thu Jul 22 11:53:09 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jul 22 11:54:49 2004 Subject: [Bioperl-l] Molecular weight calculation Message-ID: Bioperl-l, I've added one decimal place to the get_mol_wt result in Bio::Tools::SeqStats; you'll now see 23445.3 instead of 23445, for example. I hope no one minds. Brian O. From brian_osborne at cognia.com Thu Jul 22 12:01:18 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Jul 22 12:03:05 2004 Subject: [Bioperl-l] bioperl installation In-Reply-To: Message-ID: Xiang, The section of the online instructions (the INSTALL page) about installing in a personal area using CPAN is called "INSTALLING BIOPERL IN A PERSONAL MODULE AREA": You can also use CPAN to install accessory modules in your local directory. First enter the CPAN shell, then set the arguments for the command "perl Makefile.PL", like this: >perl -e shell -MCPAN cpan>o conf makepl_arg LIB=/home/users/dag/My_Local_Perl_Modules Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Xiang Deng Sent: Thursday, July 22, 2004 11:42 AM To: Jason Stajich Cc: bioperl-l-bounces@portal.open-bio.org; bioperl-l@bioperl.org Subject: Re: [Bioperl-l] bioperl installation Thanks for your message, Jason.
The problem is I don't have permission to install bioperl into the standard site_perl/system area. From the online instructions at bioperl.org, the only way I can install it is with "make", not CPAN. So could you please let me know where I can download the Inline::MakeMaker and Inline::C modules? Everything I downloaded is from bioperl.org. I did not see where I can get these two. And once I have them, which directory should I put them in? thanks, Xiang Department of Pharmacology and Cancer Biology Duke University Medical Center Durham, NC 27710 Phone: 919-4792339 Jason Stajich Sent by: bioperl-l-bounces@portal.open-bio.org 07/22/2004 11:25 AM To Xiang Deng cc bioperl-l@bioperl.org Subject Re: [Bioperl-l] bioperl installation Did you install Inline::C? From the README in bioperl-ext directory o Installing Depending on your choice of extensions, you might need Inline::MakeMaker and Inline::C to create the makefile. Use for example the cpan program to install Inline::MakeMaker and answer yes when prompted to install Inline::C. This line gives you a hint: > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: Did you also install the staden library? -jason On Thu, 22 Jul 2004, Xiang Deng wrote: > To whom it may concern, > > I am trying to install bioperl locally under my personal folder. The core > modules were successfully installed. I ran through a simple perl script to > use bio::perl or bio::seq without complaint, but I had a problem > installing bioperl-ext-1.4. > > When I installed core modules I used this > >perl Makefile.PL LIB=/my_local_directory/ > > My first question is whether or not I have to use the same way to deal > with ext modules too and I can use the same directory or a separate one. > > Actually I tried both and both failed with error message as follows: > > mendel 207% perl Makefile.PL > Checking if your kit is complete...
> Looks good > Writing Makefile for Bio::Ext::Align > ERROR from evaluation of > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: > /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 > /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix > /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line 1. > BEGIN failed--compilation aborted at ./Makefile.PL line 1. > > could you please let me know what I did wrong or miss in there? thanks a > lot, > > Xiang > > Department of Pharmacology and Cancer Biology > Duke University Medical Center > Durham, NC 27710 > Phone: 919-4792339 > -- Jason Stajich Duke University jason at cgt.mc.duke.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From laurichj at bioinfo.ucr.edu Thu Jul 22 12:13:19 2004 From: laurichj at bioinfo.ucr.edu (Josh Lauricha) Date: Thu Jul 22 12:15:02 2004 Subject: [Bioperl-l] All-vs-all BLAST In-Reply-To: <20040722103207.GB25986@ebi.ac.uk> References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> <40FF9269.6090507@egenetics.com> <20040722103207.GB25986@ebi.ac.uk> Message-ID: <20040722161319.GA2989@batch107a> On Thu 07/22/04 11:32, Andreas Kahari wrote: > On Thu, Jul 22, 2004 at 11:09:45AM +0100, Peter van Heusden wrote: > Or MCL, which I think Jason has provided some blast-related > input to: > > http://micans.org/mcl/ In my experiance MCL gave fairly horrible results when used with proteins that may have "fused" domains. But then, all the ones we tried were flawed. MCL just claimed to deal with multiple domains. 
Anyhow, this was last summer, so a) it might have changed, or b) I'm
remembering another program, or even c) I'm just plain wrong ;)

Even so, I think MCL still requires an all-vs-all BLAST.

Another option (which is installed with blast) is blastclust. This
does single linkage, so it's not that great if you have multiple
domains in your proteins, but none are. The problem is, with
multiple domains you end up clustering like:

----------- --------------------- --------------
------
------
-------
--------
 ------------
 --------------
 -----------------
 ---------------
 ---------
 --------
 ---------

where that should be three clusters, it gets grouped into one.

Can stackPACK(?) deal with that gracefully?

-- 
------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
| 4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184  |
|----------------------------------------------------|

From jason at cgt.duhs.duke.edu Thu Jul 22 12:13:20 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Jul 22 12:15:06 2004
Subject: [Bioperl-l] Bioperl exceptions
In-Reply-To: <40FFE108.1090701@egenetics.com>
References: <40FFE108.1090701@egenetics.com>
Message-ID:

On Thu, 22 Jul 2004, Peter van Heusden wrote:

> Hi Bioperl developers
>
> I'm a bit confused by the current state of play of the Bioperl exception
> class (Bio::Root::Exception). This inherits from Error, but Error.pm is
> not a requirement of Bioperl. It seems to me then that there are two
> exception mechanisms in Bioperl:
>
> 1) Exceptions based on exception objects, derived from Error.pm and
> implemented by Bio::Root::Exception and its subclasses.
> 2) Simple "stack dump" exceptions, which are implemented in Bio::Root's
> throw() method and used if Error.pm is not available.
>
> Is this right?
> If so, is the necessary result that Error-based
> exceptions can't be used as part of any core Bioperl class? (Since that
> would force a dependence on Error.pm.) I see that in the biodesign.pod
> documentation, the only semantics mentioned are the basic string-based
> ones. Does this mean that using exception objects is deprecated (I see
> the Exception.pm class was last updated in the Bioperl 1.3.01 days)?

The newer exception handling and throwing has been the brainchild of
Steve Chervitz. I don't know what the long-term plan is for this.

The standard way is just to use Bio::Root::RootI throw/warn. Aaron and I
have experimented with using the newer Exception code in
Bio::Tools::Phylo::PAML.

There is a void here in that no one has really stepped up to say "this is
the way we want to do it". As for Steve's code, I guess your observation
about Error.pm-based exceptions is correct; to be honest, I really don't
know what the right thing is here. Not much code in bioperl has try/catch
blocks, so I don't know that the use of either is particularly useful.

-jason
>
> Peter
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From james.wasmuth at ed.ac.uk Thu Jul 22 12:42:03 2004
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Thu Jul 22 12:49:12 2004
Subject: [Bioperl-l] All-vs-all BLAST
In-Reply-To: <20040722161319.GA2989@batch107a>
References: <140EA3CE-DA56-11D8-9BE5-000A95BDE2C6@uhnres.utoronto.ca> <40FD2F65.50005@netvisao.pt> <40FF9269.6090507@egenetics.com> <20040722103207.GB25986@ebi.ac.uk> <20040722161319.GA2989@batch107a>
Message-ID: <40FFEE5B.6090205@ed.ac.uk>

If the idea is to look for "domains", two recent studies, both with
excellent reviews in their introductions, are:

Bioinformatics - 20:1335 (2004)
NAR - 32:3522 (2004)
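Josh's description of blastclust above (single linkage merges anything connected by a chain of hits) can be illustrated with a small Perl sketch. This is not blastclust's own code: it assumes the all-vs-all BLAST results have already been reduced to pairs of sequence IDs that hit each other, and the IDs below are hypothetical. A two-domain protein AB that hits both A1 and B1 pulls them into one cluster, even though A1 and B1 share no domain.

```perl
use strict;
use warnings;

# Single-linkage clustering of pairwise hits using union-find.
# Each argument is an array ref [id1, id2] meaning "id1 hit id2".
sub single_linkage {
    my @pairs = @_;
    my %parent;

    # Find the representative (root) of $x, registering $x on first sight.
    my $find = sub {
        my ($x) = @_;
        $parent{$x} = $x unless exists $parent{$x};
        while ( $parent{$x} ne $x ) {
            $parent{$x} = $parent{ $parent{$x} };    # path halving
            $x = $parent{$x};
        }
        return $x;
    };

    # A single shared hit is enough to merge two clusters (single linkage).
    for my $pair (@pairs) {
        my $root1 = $find->( $pair->[0] );
        my $root2 = $find->( $pair->[1] );
        $parent{$root1} = $root2 if $root1 ne $root2;
    }

    # Group every sequence under its root, then sort for stable output.
    my %clusters;
    push @{ $clusters{ $find->($_) } }, $_ for keys %parent;
    my @sorted = map { my @members = sort @$_; \@members } values %clusters;
    return sort { $a->[0] cmp $b->[0] } @sorted;
}

# Hypothetical hits: AB has domains A and B; A1 and B1 have one each.
my @clusters = single_linkage(
    [ 'A1', 'AB' ],    # A1 shares domain A with AB
    [ 'AB', 'B1' ],    # B1 shares domain B with AB
    [ 'C1', 'C2' ],    # an unrelated pair
);
print join( ' ', @$_ ), "\n" for @clusters;    # prints "A1 AB B1" then "C1 C2"
```

Because one hit is enough to join two groups, a single fused protein can chain otherwise separate families together, which is the effect Josh's diagram shows.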
I can't remember how this thread started, but I'm sure the phrase "no
point in re-inventing the wheel" was used, and there really isn't one with
regard to "domain" prediction unless you are hell-bent on it.

Anyway, that's my £0.02 ($0.036) worth.

-james

-- 
I like nonsense, it wakes up the brain cells. -- Dr. Seuss

Blaxter Nematode Genomics Group |
School of Biological Sciences   |
Ashworth Laboratories           | tel: +44 131 650 7403
University of Edinburgh         | web: www.nematodes.org
Edinburgh                       |
EH9 3JT                         |
UK                              |

From jason at cgt.duhs.duke.edu Thu Jul 22 14:38:26 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Jul 22 14:40:05 2004
Subject: RE: [Bioperl-l] bioperl installation
In-Reply-To:
References:
Message-ID:

Someone has already set up CPAN on the machine, so you need to coordinate
with the sysadmin or else download the modules by hand and do the install
manually.

For each module (Inline::C, etc.):
% perl Makefile.PL LIB=~/lib/perl
% make
% make install

I have logged into mendel and had the same problems, but I don't have time
to figure out the best workaround for you other than downloading the
modules by hand for now or else getting the admin to install these for you.

-jason

On Thu, 22 Jul 2004, Xiang Deng wrote:

> Brian,
>
> I tried that, but I could not enter the CPAN shell:
>
> mendel 215% perl -e shell -MCPAN
>
>
> Your configuration suggests that CPAN.pm should use a working
> directory of
> /.cpan
> Unfortunately we could not create the lock file
> /.cpan/.lock
> due to permission problems.
>
> Please make sure that the configuration variable
> $CPAN::Config->{cpan_home}
> points to a directory where you can write a .lock file. You can set
> this variable in either
> /usr/local/lib/perl5/5.6.1/CPAN/Config.pm
> or
> /v0/users/deng0007/.cpan/CPAN/MyConfig.pm
>
> Could not open >/.cpan/.lock: Permission denied
>
> Do I need to create all of those ".cpan/CPAN/MyConfig.pm"? I do not even
> have the .cpan directory.
>
> thanks
>
> Xiang
>
> Department of Pharmacology and Cancer Biology
> Duke University Medical Center
> Durham, NC 27710
> Phone: 919-4792339

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jwon at ucalgary.ca Thu Jul 22 16:07:35 2004
From: jwon at ucalgary.ca (jwon@ucalgary.ca)
Date: Thu Jul 22 16:09:21 2004
Subject: [Bioperl-l] StandAloneBlast.pm and RemoteBlast.pm
Message-ID: <200407222007.i6MK7ak13285@smtp2.ucalgary.ca>

Hi,

I am trying to use StandAloneBlast.pm and RemoteBlast.pm to automate BLASTing
of several sequences that I am interested in. I need to use the 'tblastn'
program against the 'est' database.
However, when I set the parameters for StandAloneBlast.pm and RemoteBlast.pm,
my code is:

    my @params = ('program' => 'tblastn', 'database' => 'est', _READMETHOD => "Blast");
    my $f = Bio::Tools::Run::StandAloneBlast->new(@params);
    $f->outfile('blast.out');
    my $blast_report = $f->blastall($sequence);

but after execution, it tells me:

    [blastall] WARNING: gi|11499797|ref|NP_071040.1|: Could not find index files
    for database /a1000/formatted_dbs/genbanknr/est
    ------------- EXCEPTION --------------
    MSG: blastall call crashed:

When I run the same program with the database parameter set to "swissprot"
and the program set to "blastp", the program works fine.

Do these objects not work with the tblastn program against the est database?

Thank you in advance for your help.

Jackson

From jason at cgt.duhs.duke.edu Thu Jul 22 16:28:58 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Jul 22 16:30:36 2004
Subject: Re: [Bioperl-l] StandAloneBlast.pm and RemoteBlast.pm
In-Reply-To: <200407222007.i6MK7ak13285@smtp2.ucalgary.ca>
References: <200407222007.i6MK7ak13285@smtp2.ucalgary.ca>
Message-ID:

They work fine with all the NCBI blastall T?BLAST[XPN] programs.

Are you sure your databases were formatted with
  % formatdb -i est -p F
Are you sure the index files for the database are in that directory?
Does a command-line tblastn search against this est db work without
invoking bioperl at all?

There is nothing magic in StandAloneBlast; it is just setting command-line
arguments to blastall for you. Add the parameter -verbose => 1 and you will
see the command line that is being executed by perl. Check that it is
consistent with what you expect it to be.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Thu Jul 22 16:32:23 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Jul 22 16:34:03 2004
Subject: RE: [Bioperl-l] bioperl installation
In-Reply-To:
References:
Message-ID:

So you didn't install staden's io_lib...

Before you go any further: do you really want to do this? i.e. do you
intend to process ABI trace files with bioperl? If not, don't worry about
this.

If you want the alignment stuff, go into the Bio/Align directory and do the
makefile stuff there. READ THE README; a lot of questions/problems are
discussed in there.

If you don't want the in-perl alignment stuff, stop now and forget about
bioperl-ext.

-jason

On Thu, 22 Jul 2004, Xiang Deng wrote:

> Jason,
>
> Thanks for trying that for me.
>
> I already downloaded and installed Inline-0.44 under directory
> "my_bioperl/bioperl-ext-1.4" and staden library io_lib-1.8.12 under
> directory "my_bioperl/".
> but I still get the following errors when installing bioperl-ext-1.4:
>
>
> mendel 297% perl Makefile.PL
> LIB=/v0/users/deng0007/my_bioperl/bioperl-ext-1.4
> Writing Makefile for Bio::Ext::Align
> Warning: prerequisite Bio::SeqIO::abi 0 not found.
> Please tell us where your Staden io_lib "read" library is installed:
> [/usr/local/lib] [/v0/users/deng0007/my_bioperl]
> Please tell us where your Staden io_lib "Read.h" header is installed:
> [[/v0/users/deng0007/my_bioperl]/io_lib] [/v0/users/deng0007/my_bioperl]
> Writing Makefile for Bio::SeqIO::staden::read
> Writing Makefile for Bio
> mendel 298% make
>
>
> DEFINE='-DPOSIX -DNOERROR'; CC='cc -n32'; export DEFINE INC CC; \
> cd libs && make CC=cc -n32 libsw.a -e
> Usage: make [-f makefile] [-p] [-i] [-k] [-s] [-r] [-n] [-u]
> [-d] [-D] [-S] [-g] [-w] [-P] [-B] [-b] [-O] [-e] [-t] [-q] [-M]
> [-N] [names]
> *** Error code 1 (bu21)
> *** Error code 1 (bu21)
>
> Do you know what the problem is here? I just followed the install
> instructions; did I miss any parameters?
>
> thanks,
>
> Xiang
>
> Department of Pharmacology and Cancer Biology
> Duke University Medical Center
> Durham, NC 27710
> Phone: 919-4792339

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From malabika at ibioinformatics.org Sat Jul 24 13:22:22 2004
From: malabika at ibioinformatics.org (Malabika S)
Date: Fri Jul 23 11:20:30 2004
Subject: [Bioperl-l] installation!
Message-ID: <1090689954.1983.2.camel@medusa>

Hello,

I have a Linux OS. I went through the documentation for installing, but I
did not understand it... Could you please tell me, step by step, how to
install bioperl? I hope it is done through the command-line interface in a
new terminal?

Thanks a lot,
M. Sarker

--
Malabika Sarker, Ph.D
Research Scientist
Institute Of Bioinformatics
Unit 1, Level 7, Discoverer Block
International Tech Park
Whitefield Road
Bangalore 560066
Karnataka, India

From Nathan.Agrin at umassmed.edu Fri Jul 23 10:24:39 2004
From: Nathan.Agrin at umassmed.edu (Agrin, Nathan)
Date: Fri Jul 23 11:20:38 2004
Subject: [Bioperl-l] Trypic digest predictor
Message-ID: <89AA811FD79DC94788093B23DA79E71F0184D234@edunivmail02.ad.umassmed.edu>

Is there a bioperl module that will spit out all the predicted peptides
from a tryptic digest? What would be great is if you could specify the
number of missed cleavages.
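Without assuming any particular BioPerl module, the digest Nathan describes can be sketched in plain Perl. The sketch assumes the simplified trypsin rule (cleave after K or R unless the next residue is P) and ignores other known exceptions to it; the function names and the example sequence are hypothetical.

```perl
use strict;
use warnings;

# Split a protein into fully cleaved tryptic fragments using the simplified
# rule: cut after K or R unless the next residue is P.
sub tryptic_fragments {
    my ($seq) = @_;
    # Zero-width split: lookbehind for K/R, negative lookahead for P.
    # split drops the trailing empty field produced by a final K/R.
    return split /(?<=[KR])(?!P)/, uc $seq;
}

# All peptides with up to $max_missed missed cleavages, i.e. runs of
# 1 .. $max_missed + 1 adjacent fragments joined back together.
sub tryptic_digest {
    my ( $seq, $max_missed ) = @_;
    $max_missed = 0 unless defined $max_missed;
    my @frags = tryptic_fragments($seq);
    my @peptides;
    for my $start ( 0 .. $#frags ) {
        for my $missed ( 0 .. $max_missed ) {
            my $end = $start + $missed;
            last if $end > $#frags;
            push @peptides, join '', @frags[ $start .. $end ];
        }
    }
    return @peptides;
}

# Hypothetical sequence, allowing up to one missed cleavage.
print "$_\n" for tryptic_digest( 'MKPLRAGKR', 1 );
```

The fragments are computed once, and each missed cleavage simply re-joins a neighbouring fragment, so allowing N missed cleavages yields runs of 1 to N+1 adjacent fragments.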
Thanks,
Nate

From xiang.deng at duke.edu Thu Jul 22 14:32:05 2004
From: xiang.deng at duke.edu (Xiang Deng)
Date: Fri Jul 23 11:20:45 2004
Subject: [Bioperl-l] bioperl installation
In-Reply-To:
Message-ID:

Brian,

I tried that, but I could not enter the CPAN shell:

mendel 215% perl -e shell -MCPAN


Your configuration suggests that CPAN.pm should use a working
directory of
/.cpan
Unfortunately we could not create the lock file
/.cpan/.lock
due to permission problems.

Please make sure that the configuration variable
$CPAN::Config->{cpan_home}
points to a directory where you can write a .lock file. You can set
this variable in either
/usr/local/lib/perl5/5.6.1/CPAN/Config.pm
or
/v0/users/deng0007/.cpan/CPAN/MyConfig.pm

Could not open >/.cpan/.lock: Permission denied

Do I need to create all of those ".cpan/CPAN/MyConfig.pm"? I do not even
have the .cpan directory.

thanks

Xiang

Department of Pharmacology and Cancer Biology
Duke University Medical Center
Durham, NC 27710
Phone: 919-4792339
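The "Can't locate Inline/MakeMaker.pm in @INC" error in this thread means perl is not searching the directory the modules were installed into. A minimal Perl sketch for checking this, assuming the modules were built with "perl Makefile.PL LIB=~/lib/perl" as in Jason's example; that path is an assumption, so substitute your own LIB= directory. The helper name module_available is hypothetical.

```perl
use strict;
use warnings;

# Assumed personal module area (Jason's LIB=~/lib/perl example);
# change this to the LIB= directory the modules were actually built with.
use lib "$ENV{HOME}/lib/perl";

# True if the module can be located and compiled from the current @INC.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Inline::C -> Inline/C.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Modules named in this thread; adjust the list as needed.
for my $module (qw(Inline::MakeMaker Inline::C Bio::Seq)) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'ok' : 'not loadable from @INC';
}
```

"use lib" only affects the script it appears in. "perl Makefile.PL" for bioperl-ext also has to find Inline::MakeMaker, and setting the PERL5LIB environment variable to the same directory adds it to @INC for every perl run, including that one.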
From xiang.deng at duke.edu Thu Jul 22 16:09:44 2004
From: xiang.deng at duke.edu (Xiang Deng)
Date: Fri Jul 23 11:20:47 2004
Subject: [Bioperl-l] bioperl installation
In-Reply-To:
Message-ID:

Jason,

Thanks for trying that for me.

I already downloaded and installed Inline-0.44 under directory
"my_bioperl/bioperl-ext-1.4" and staden library io_lib-1.8.12 under
directory "my_bioperl/".
but I still get the following errors when installing bioperl-ext-1.4:

mendel 297% perl Makefile.PL
LIB=/v0/users/deng0007/my_bioperl/bioperl-ext-1.4
Writing Makefile for Bio::Ext::Align
Warning: prerequisite Bio::SeqIO::abi 0 not found.
Please tell us where your Staden io_lib "read" library is installed:
[/usr/local/lib] [/v0/users/deng0007/my_bioperl]
Please tell us where your Staden io_lib "Read.h" header is installed:
[[/v0/users/deng0007/my_bioperl]/io_lib] [/v0/users/deng0007/my_bioperl]
Writing Makefile for Bio::SeqIO::staden::read
Writing Makefile for Bio
mendel 298% make


DEFINE='-DPOSIX -DNOERROR'; CC='cc -n32'; export DEFINE INC CC; \
cd libs && make CC=cc -n32 libsw.a -e
Usage: make [-f makefile] [-p] [-i] [-k] [-s] [-r] [-n] [-u]
[-d] [-D] [-S] [-g] [-w] [-P] [-B] [-b] [-O] [-e] [-t] [-q] [-M]
[-N] [names]
*** Error code 1 (bu21)
*** Error code 1 (bu21)

Do you know what the problem is here? I just followed the install
instructions; did I miss any parameters?

thanks,

Xiang

Department of Pharmacology and Cancer Biology
Duke University Medical Center
Durham, NC 27710
Phone: 919-4792339
> > > > When I installed core modules I used this > > >perl Makefile.PL LIB=/my_local_directory/ > > > > My first question is whether or not I have to use the same way to deal > > with ext modules too and I can use the same directory or a separate one. > > > > Actually I tried both and both failed with error message as follows: > > > > mendel 207% perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Writing Makefile for Bio::Ext::Align > > ERROR from evaluation of > > > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: > > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: > > /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 > > /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix > > /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl > > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line > 1. > > BEGIN failed--compilation aborted at ./Makefile.PL line 1. > > > > could you please let me know what I did wrong or miss in there? thanks a > > lot, > > > > Xiang > > > > Department of Pharmacology and Cancer Biology > > Duke University Medical Center > > Durham, NC 27710 > > Phone: 919-4792339 > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From xiang.deng at duke.edu Fri Jul 23 10:39:42 2004 From: xiang.deng at duke.edu (Xiang Deng) Date: Fri Jul 23 11:20:49 2004 Subject: [Bioperl-l] bioperl installation In-Reply-To: Message-ID: You are right. And anyway I should make something work first. 
I followed your suggestion and went into the Bio/Align directory and did the makefile stuff there (I do need the alignment stuff), but I still get the same error:

mendel 308% pwd
/v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/Ext/Align
mendel 309% perl Makefile.PL LIB=/v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/Ext/Align
Writing Makefile for Bio::Ext::Align
mendel 310% make
DEFINE='-DPOSIX -DNOERROR'; CC='cc -n32'; export DEFINE INC CC; \
cd libs && make CC=cc -n32 libsw.a -e
Usage: make [-f makefile] [-p] [-i] [-k] [-s] [-r] [-n] [-u] [-d] [-D] [-S] [-g] [-w] [-P] [-B] [-b] [-O] [-e] [-t] [-q] [-M] [-N] [names]
*** Error code 1 (bu21)

What is the error message referring to? Any idea what I could try next?

thanks

Xiang

Department of Pharmacology and Cancer Biology
Duke University Medical Center
Durham, NC 27710
Phone: 919-4792339

Jason Stajich
07/22/2004 04:32 PM
To Xiang Deng
cc Bioperl
Subject RE: [Bioperl-l] bioperl installation

So you didn't install staden's io_lib...

Before you go any further: do you really want to do this? i.e. do you intend to process ABI trace files with bioperl? If not, don't worry about this.

If you want the alignment stuff, go into the Bio/Align directory and do the makefile stuff there. READ THE README; a lot of questions/problems are discussed in there.

If you don't want the in-perl alignment stuff, stop now and forget about bioperl-ext.

-jason

On Thu, 22 Jul 2004, Xiang Deng wrote:
> Jason,
>
> Thanks for trying that for me.
>
> I already downloaded and installed Inline-0.44 under directory
> "my_bioperl/bioperl-ext-1.4" and staden library io_lib-1.8.12 under
> directory "my_bioperl/".
> but I still have following errors when installing bioperl-ext-1.4
>
>
> mendel 297% perl Makefile.PL
> LIB=/v0/users/deng0007/my_bioperl/bioperl-ext-1.4
> Writing Makefile for Bio::Ext::Align
> Warning: prerequisite Bio::SeqIO::abi 0 not found.
> Please tell us where your Staden io_lib "read" library is installed: > [/usr/local/lib] [/v0/users/deng0007/my_bioperl] > Please tell us where your Staden io_lib "Read.h" header is installed: > [[/v0/users/deng0007/my_bioperl]/io_lib] [/v0/users/deng0007/my_bioperl] > Writing Makefile for Bio::SeqIO::staden::read > Writing Makefile for Bio > mendel 298% make > > > DEFINE='-DPOSIX -DNOERROR'; CC='cc -n32'; export DEFINE INC CC; \ > cd libs && make CC=cc -n32 libsw.a -e > Usage: make [-f makefile] [-p] [-i] [-k] [-s] [-r] [-n] [-u] > [-d] [-D] [-S] [-g] [-w] [-P] [-B] [-b] [-O] [-e] [-t] [-q] [-M] > [-N] [names] > *** Error code 1 (bu21) > *** Error code 1 (bu21) > > Do you know what is the problem here? I just followed the install > instruction, do I miss any parameters there? > > thanks, > > Xiang > > Department of Pharmacology and Cancer Biology > Duke University Medical Center > Durham, NC 27710 > Phone: 919-4792339 > > > > > Jason Stajich > 07/22/2004 02:38 PM > > To > Xiang Deng > cc > Bioperl > Subject > RE: [Bioperl-l] bioperl installation > > > > > > > Someone has already setup CPAN on the machine - you need to coordinate > with the sysadmin or else download the modules by hand and do the install > manualy. > > for each module (Inline::C, et) > % perl Makefile.PL LIB=~/lib/perl > % make > % make install > > I have logged into mendel and had the same problems, but I don't have time > to figure out the best workaround for you other than downloading the > modules by hand for now or else getting the admin to install these for > you. > > -jason > > On Thu, 22 Jul 2004, Xiang Deng wrote: > > > Brian, > > > > I tried that, but I could not enter CPAN shell, > > > > mendel 215% perl -e shell -MCPAN > > > > > > Your configuration suggests that CPAN.pm should use a working > > directory of > > /.cpan > > Unfortunately we could not create the lock file > > /.cpan/.lock > > due to permission problems. 
> > > > Please make sure that the configuration variable > > $CPAN::Config->{cpan_home} > > points to a directory where you can write a .lock file. You can set > > this variable in either > > /usr/local/lib/perl5/5.6.1/CPAN/Config.pm > > or > > /v0/users/deng0007/.cpan/CPAN/MyConfig.pm > > > > Could not open >/.cpan/.lock: Permission denied > > > > Do I need to create all of those ".cpan/CPAN/MyConfig.pm"? I do not even > > have the .cpan directory. > > > > thanks > > > > Xiang > > > > Department of Pharmacology and Cancer Biology > > Duke University Medical Center > > Durham, NC 27710 > > Phone: 919-4792339 > > > > > > > > > > "Brian Osborne" > > Sent by: bioperl-l-bounces@portal.open-bio.org > > 07/22/2004 12:01 PM > > > > To > > "Xiang Deng" , "Jason Stajich" > > > > cc > > bioperl-l-bounces@portal.open-bio.org, bioperl-l@bioperl.org > > Subject > > RE: [Bioperl-l] bioperl installation > > > > > > > > > > > > > > Xiang, > > > > The section in the on-line instruction, the INSTALL page, talking about > > installing in a personal area using CPAN is called "INSTALLING BIOPERL > IN > > A > > PERSONAL MODULE AREA": > > > > You can also use CPAN to install accessory modules in your > > local directory. First enter the CPAN shell, then set the > > arguments for the command "perl Makefile.PL", like this: > > > > >perl -e shell -MCPAN > > cpan>o conf makepl_arg LIB=/home/users/dag/My_Local_Perl_Modules > > > > > > Brian O. > > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Xiang Deng > > Sent: Thursday, July 22, 2004 11:42 AM > > To: Jason Stajich > > Cc: bioperl-l-bounces@portal.open-bio.org; bioperl-l@bioperl.org > > Subject: Re: [Bioperl-l] bioperl installation > > > > Thanks for your message, Jason. > > > > The problem is I don't have permission to install bioperl into the > > standard site_perl/system area. 
From the on-line instruction at > > bioperl.org the only way I can get it installed is to use "make", not > > CPAN. > > > > So could you please let me know where I can download Inline::MakeMaker > and > > Inline::C module. Everything I downloaded is from bioperl.org. I did not > > see where I can get these two. And if I get it which directory I should > > put them in. > > > > thanks, > > > > Xiang > > > > Department of Pharmacology and Cancer Biology > > Duke University Medical Center > > Durham, NC 27710 > > Phone: 919-4792339 > > > > > > > > > > Jason Stajich > > Sent by: bioperl-l-bounces@portal.open-bio.org > > 07/22/2004 11:25 AM > > > > To > > Xiang Deng > > cc > > bioperl-l@bioperl.org > > Subject > > Re: [Bioperl-l] bioperl installation > > > > > > > > > > > > > > > > Did you install Inline::C? > > >From the README in bioperl-ext directory > > o Installing > > > > Depending on your choise of extensions, you might need > > Inline::MakeMaker and Inline::C to create the makefile. Use for > > example the cpan program to install Inline::MakeMaker and answer yes > > when prompted to install Inline::C. > > > > This line gives you a hint: > > > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: > > > > Did you also install the staden library? > > -jason > > On Thu, 22 Jul 2004, Xiang Deng wrote: > > > > > To whom it may concern, > > > > > > I am trying to install bioperl locally under my peronal folder. The > core > > > modules were successfully installed. I ran through a simple perl > script > > to > > > use bio::perl or bio::seq without complain. but I got problem with > > > installing bioperl-ext-1.4. > > > > > > When I installed core modules I used this > > > >perl Makefile.PL LIB=/my_local_directory/ > > > > > > My first question is whether or not I have to use the same way to deal > > > with ext modules too and I can use the same directory or a separate > one. 
> > > > > > Actually I tried both and both failed with error message as follows: > > > > > > mendel 207% perl Makefile.PL > > > Checking if your kit is complete... > > > Looks good > > > Writing Makefile for Bio::Ext::Align > > > ERROR from evaluation of > > > > > > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4/Bio/SeqIO/staden/Makefile.PL: > > > Can't locate Inline/MakeMaker.pm in @INC (@INC contains: > > > /usr/local/lib/perl5/5.6.1/IP27-irix /usr/local/lib/perl5/5.6.1 > > > /usr/local/lib/perl5/site_perl/5.6.1/IP27-irix > > > /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl > > > /v0/users/deng0007/my_bioperl/bioperl-ext-1.4 .) at ./Makefile.PL line > > 1. > > > BEGIN failed--compilation aborted at ./Makefile.PL line 1. > > > > > > could you please let me know what I did wrong or miss in there? thanks > a > > > lot, > > > > > > Xiang > > > > > > Department of Pharmacology and Cancer Biology > > > Duke University Medical Center > > > Durham, NC 27710 > > > Phone: 919-4792339 > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From barry.moore at genetics.utah.edu Fri Jul 23 11:41:41 2004 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri Jul 23 11:43:18 2004 Subject: [Fwd: Re: [Bioperl-l] installation!] Message-ID: <410131B5.5040601@genetics.utah.edu> Malabika- Could you be more specific on what you are having a problem with. 
The INSTALL document (http://bioperl.org/Core/Latest/INSTALL) does give you step-by-step instructions for installing Bioperl on Linux. Have you tried the CPAN method? Or have you downloaded the tarball and tried the make instructions? If not, you need to at least try these first, and then come back to the list for help if you have problems. If you are not at all familiar with your Linux operating system, then you need to read some of the excellent Linux tutorials available at The Linux Documentation Project (http://www.tldp.org/).

Barry

Malabika S wrote:

>hello
>
>i have a linux os, i went thru the documentation for installing, but i
>did not get it...
>could u please tell me step by step how to install bioperl? i hope it is
>done thru command line interface on new terminal?
>
>thanks a lot
>m.sarker.
>
>

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From anunberg at oriongenomics.com Fri Jul 23 12:58:09 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Fri Jul 23 12:55:33 2004
Subject: [Bioperl-l] Question about Fgenesh.pm
Message-ID:

Hi,

I am trying to parse an fgenesh analysis file using Bio::Tools::Fgenesh from bioperl-live.

I had to make a change in the module at line 320, from:

    $predobj->primary_tag($ExonTags{$flds[3]} . 'Exon');

to:

    $predobj->primary_tag($ExonTags{$flds[4]} . 'Exon');

for the parser to work.

I then wish to retrieve the name of the DNA sequence that was fed to the parser using the analysis_subject/query method:

    $fgenesh = Bio::Tools::Fgenesh->new(-file=>"results file");
    $gene = $fgenesh->next_prediction;

    $fgenesh->analysis_subject;

This method returns undef, although it appears in the code that the info is stored there.
Lines 366-373 from Fgenesh.pm:

    if(/^(FGENESH)\s+([\d\.]+)/) {
        $self->analysis_method($1);
        $self->analysis_method_version($2);
        if (/\s(\S+)\sgenomic DNA/) {
            $self->analysis_subject($1);
        }
        next;
    }

Any input would be appreciated.

--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From anunberg at oriongenomics.com Fri Jul 23 13:20:22 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Fri Jul 23 13:17:44 2004
Subject: FW: [Bioperl-l] Question about Fgenesh.pm
In-Reply-To:
Message-ID:

On 7/23/04 11:58 AM, "Andrew Nunberg" wrote:

Sorry, I figured it out. The parsing of the seqid was incorrect. Line 375 has

    if(/^Seq name:\s+(\S+)/) {
        $seqname = $1;
        next;
    }

The regex should be changed to /\s+Seq name:\s+(\S+)/

That fixes my problem, since I can get the info from seq_id rather than from analysis_subject.

>> Hi,
>> I am trying to parse an fgenesh analysis file using Bio::Tools::Fgenesh from
>> bioperl-live
>>
>> I had to make a change in the module at line 320
>> from:
>> $predobj->primary_tag($ExonTags{$flds[3]} . 'Exon');
>>
>> to:
>> $predobj->primary_tag($ExonTags{$flds[4]} . 'Exon');
>>
>> For the parser to work.
>>
>> I then wish to retrieve the name of the DNA sequence that was fed to the
>> parser using the analysis_subject/query method
>>
>> $fgenesh = Bio::Tools::Fgenesh->new(-file=>"results file");
>> $gene = $fgenesh->next_prediction;
>>
>> $fgenesh->analysis_subject;
>>
>> This method returns undef, although it appears in the code that the info is
>> stored there.
>> Lines 366-373 from Fgenesh.pm >> if(/^(FGENESH)\s+([\d\.]+)/) { >> $self->analysis_method($1); >> $self->analysis_method_version($2); >> if (/\s(\S+)\sgenomic DNA/) { >> $self->analysis_subject($1); >> } >> next; >> } >> Any input would be appreciated > > -- > Andrew Nunberg > Bioinformagician > Orion Genomics > (314)-615-6989 > www.oriongenomics.com > ------ End of Forwarded Message From jason at cgt.duhs.duke.edu Fri Jul 23 13:25:00 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jul 23 13:26:45 2004 Subject: FW: [Bioperl-l] Question about Fgenesh.pm In-Reply-To: References: Message-ID: Done - would be good to have some parsing for FGENESH as well. If someone would volunteer that would be great. -jason On Fri, 23 Jul 2004, Andrew Nunberg wrote: > > On 7/23/04 11:58 AM, "Andrew Nunberg" wrote: > Sorry I figured it out > The parsing of the seqid was incorrect > line 375 has > if(/^Seq name:\s+(\S+)/) { > $seqname = $1; > next; > } > The regex should be changed to /\s+Seq name:\s+(\S+)/ > That fixes my problem since I can get the info from seq_id rather than from > analysis_subject > > > >> Hi, > >> I am trying to parse an fgenesh analysis file using Bio::Tools::Fgenesh from > >> bioperl-live > >> > >> I had to make a change in the module at line 320 > >> from: > >> $predobj->primary_tag($ExonTags{$flds[3]} . 'Exon'); > >> > >> to: > >> $predobj->primary_tag($ExonTags{$flds[4]} . 'Exon'); > >> > >> For the parser to work. > >> > >> I then wish to retreive the name of the dna sequence that was fed to the > >> parser using the analysis_subject/query method > >> > >> $fgenesh = Bio::Tools::Fgenesh->new(-file=>"results file"); > >> $gene = $fgenesh->next_prediction; > >> > >> $fgenesh->analysis_subject; > >> > >> This method returns undef, although it appears in the code that the info is > >> stored there. 
> >> Lines 366-373 from Fgenesh.pm > >> if(/^(FGENESH)\s+([\d\.]+)/) { > >> $self->analysis_method($1); > >> $self->analysis_method_version($2); > >> if (/\s(\S+)\sgenomic DNA/) { > >> $self->analysis_subject($1); > >> } > >> next; > >> } > >> Any input would be appreciated > > > > -- > > Andrew Nunberg > > Bioinformagician > > Orion Genomics > > (314)-615-6989 > > www.oriongenomics.com > > > > ------ End of Forwarded Message > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Fri Jul 23 13:31:41 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Jul 23 13:33:17 2004 Subject: [Bioperl-l] Question about Fgenesh.pm In-Reply-To: References: Message-ID: $lastcomment =~ 's/some parsing/some parsing testing, i.e. the test suite/'; On Fri, 23 Jul 2004, Andrew Nunberg wrote: > On 7/23/04 12:25 PM, "Jason Stajich" wrote: > What do you mean, this is parsing for FGENESH.. ??? > > > Done - would be good to have some parsing for FGENESH as well. If someone > > would volunteer that would be great. > > > > -jason > > > > On Fri, 23 Jul 2004, Andrew Nunberg wrote: > > > >> > >> On 7/23/04 11:58 AM, "Andrew Nunberg" wrote: > >> Sorry I figured it out > >> The parsing of the seqid was incorrect > >> line 375 has > >> if(/^Seq name:\s+(\S+)/) { > >> $seqname = $1; > >> next; > >> } > >> The regex should be changed to /\s+Seq name:\s+(\S+)/ > >> That fixes my problem since I can get the info from seq_id rather than from > >> analysis_subject > >> > >> > >>>> Hi, > >>>> I am trying to parse an fgenesh analysis file using Bio::Tools::Fgenesh > >>>> from > >>>> bioperl-live > >>>> > >>>> I had to make a change in the module at line 320 > >>>> from: > >>>> $predobj->primary_tag($ExonTags{$flds[3]} . 
'Exon'); > >>>> > >>>> to: > >>>> $predobj->primary_tag($ExonTags{$flds[4]} . 'Exon'); > >>>> > >>>> For the parser to work. > >>>> > >>>> I then wish to retreive the name of the dna sequence that was fed to the > >>>> parser using the analysis_subject/query method > >>>> > >>>> $fgenesh = Bio::Tools::Fgenesh->new(-file=>"results file"); > >>>> $gene = $fgenesh->next_prediction; > >>>> > >>>> $fgenesh->analysis_subject; > >>>> > >>>> This method returns undef, although it appears in the code that the info is > >>>> stored there. > >>>> Lines 366-373 from Fgenesh.pm > >>>> if(/^(FGENESH)\s+([\d\.]+)/) { > >>>> $self->analysis_method($1); > >>>> $self->analysis_method_version($2); > >>>> if (/\s(\S+)\sgenomic DNA/) { > >>>> $self->analysis_subject($1); > >>>> } > >>>> next; > >>>> } > >>>> Any input would be appreciated > >>> > >>> -- > >>> Andrew Nunberg > >>> Bioinformagician > >>> Orion Genomics > >>> (314)-615-6989 > >>> www.oriongenomics.com > >>> > >> > >> ------ End of Forwarded Message > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From nymete at yahoo.com Fri Jul 23 17:19:26 2004 From: nymete at yahoo.com (Nurcan Mete) Date: Fri Jul 23 17:21:04 2004 Subject: [Bioperl-l] Retrieve results for an RID Message-ID: <20040723211926.38163.qmail@web21526.mail.yahoo.com> I want to add some functionality to my website, that is similar to "Retrieve results for an RID" of NCBI blast. In other words i am trying to retrieve blast results using a particular rid. 
Following is the code:

----------------------------------------------------
my @params = ( '-prog'       => 'blastn',
               '-expect'     => 1,
               '-readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

my $rc = $factory->retrieve_blast($rid);
----------------------------------------------------

$rid is a legitimate request ID that is passed as a parameter to the function. The retrieve_blast function returns Bio::SearchIO::blast=HASH(0x2c9bf8), which indicates success, but $rc->next_result() fails and I get the following error:

------------- EXCEPTION -------------
MSG: no data for midline genome
STACK Bio::SearchIO::blast::next_result /usr/perl5/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
STACK main::test_blast /var/apache/cgi-bin/user/blast.pl:578
STACK toplevel /var/apache/cgi-bin/user/blast.cgi:65
--------------------------------------

How can I do this?

Thanks.

From brian_osborne at cognia.com Fri Jul 23 20:27:02 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jul 23 20:28:44 2004
Subject: [Bioperl-l] html stripped from blast report
In-Reply-To: <1090269997.40fc332d2a985@webmail1.unm.edu>
Message-ID:

George,

No, that definitely won't work. I'll remove it from the FAQ and take a look...

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of ghrose@unm.edu
Sent: Monday, July 19, 2004 4:47 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] html stripped from blast report

Dear Bioperl,

I'm trying to use the code in http://bio.perl.org/Core/Latest/faq.html#Q3.7 to strip HTML out of a BLAST report that is in HTML format.
******The code

#!/usr/bin/perl

use strict;
use DBI;
use Bio::Perl;
use Bio::SearchIO;
use Bio::SearchIO::blast;
use HTML::Strip;

my $hs = new HTML::Strip;

# replace the blast parser's _readline method with one that
# auto-strips HTML:

sub Bio::SearchIO::blast::_readline {
    my ($self, @args) = @_;
    return $hs->parse($self->SUPER::_readline(@args));
}

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => $ARGV[0]);

*******gives me the following error.

georges-Computer:~/Desktop/ben-p020 george$ perl insert_7_4hstrip.pl p018xnr.html > test3
Can't locate object method "_readline" via package "main" at insert_7_4hstrip.pl line 17.

I believe I have HTML::Strip installed correctly. I'm running this script on Mac OS X 10.3.

Can you give me some advice on how to solve this problem?

Thank you,

George

From cdwan at mail.ahc.umn.edu Fri Jul 23 13:48:21 2004
From: cdwan at mail.ahc.umn.edu (Chris Dwan)
Date: Sat Jul 24 00:34:12 2004
Subject: [Bioperl-l] Question about Fgenesh.pm
In-Reply-To:
References:
Message-ID: <7CB943EB-DCD0-11D8-8A67-000A95CE2714@mail.ahc.umn.edu>

Sorry about the lack of quality checking. I would love to help, but I no longer have access to fgenesh to do such testing (licensed program, new employers don't own it).

-Chris Dwan
 The Bioteam

On Jul 23, 2004, at 1:31 PM, Jason Stajich wrote:

> $lastcomment =~ 's/some parsing/some parsing testing, i.e. the test suite/';
>
> On Fri, 23 Jul 2004, Andrew Nunberg wrote:
>
>> On 7/23/04 12:25 PM, "Jason Stajich" wrote:
>> What do you mean, this is parsing for FGENESH.. ???
>>
>>> Done - would be good to have some parsing for FGENESH as well. If
>>> someone
>>> would volunteer that would be great.
>>> >>> -jason >>> >>> On Fri, 23 Jul 2004, Andrew Nunberg wrote: >>> >>>> >>>> On 7/23/04 11:58 AM, "Andrew Nunberg" >>>> wrote: >>>> Sorry I figured it out >>>> The parsing of the seqid was incorrect >>>> line 375 has >>>> if(/^Seq name:\s+(\S+)/) { >>>> $seqname = $1; >>>> next; >>>> } >>>> The regex should be changed to /\s+Seq name:\s+(\S+)/ >>>> That fixes my problem since I can get the info from seq_id rather >>>> than from >>>> analysis_subject >>>> >>>> >>>>>> Hi, >>>>>> I am trying to parse an fgenesh analysis file using >>>>>> Bio::Tools::Fgenesh >>>>>> from >>>>>> bioperl-live >>>>>> >>>>>> I had to make a change in the module at line 320 >>>>>> from: >>>>>> $predobj->primary_tag($ExonTags{$flds[3]} . 'Exon'); >>>>>> >>>>>> to: >>>>>> $predobj->primary_tag($ExonTags{$flds[4]} . 'Exon'); >>>>>> >>>>>> For the parser to work. >>>>>> >>>>>> I then wish to retreive the name of the dna sequence that was fed >>>>>> to the >>>>>> parser using the analysis_subject/query method >>>>>> >>>>>> $fgenesh = Bio::Tools::Fgenesh->new(-file=>"results file"); >>>>>> $gene = $fgenesh->next_prediction; >>>>>> >>>>>> $fgenesh->analysis_subject; >>>>>> >>>>>> This method returns undef, although it appears in the code that >>>>>> the info is >>>>>> stored there. 
>>>>>> Lines 366-373 from Fgenesh.pm >>>>>> if(/^(FGENESH)\s+([\d\.]+)/) { >>>>>> $self->analysis_method($1); >>>>>> $self->analysis_method_version($2); >>>>>> if (/\s(\S+)\sgenomic DNA/) { >>>>>> $self->analysis_subject($1); >>>>>> } >>>>>> next; >>>>>> } >>>>>> Any input would be appreciated >>>>> >>>>> -- >>>>> Andrew Nunberg >>>>> Bioinformagician >>>>> Orion Genomics >>>>> (314)-615-6989 >>>>> www.oriongenomics.com >>>>> >>>> >>>> ------ End of Forwarded Message >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> -- >>> Jason Stajich >>> Duke University >>> jason at cgt.mc.duke.edu >>> >> >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Sat Jul 24 09:10:25 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Jul 24 09:12:01 2004 Subject: [Bioperl-l] Retrieve results for an RID In-Reply-To: <20040723211926.38163.qmail@web21526.mail.yahoo.com> References: <20040723211926.38163.qmail@web21526.mail.yahoo.com> Message-ID: Hmm - As I said in the perlmonks post you need to give more information, i.e. a copy of the report which is failing. Is your webscript adding things to the report which are not part of the standard BLAST output, etc. -j On Fri, 23 Jul 2004, Nurcan Mete wrote: > > I want to add some functionality to my website, that > is similar to "Retrieve results for an RID" of NCBI > blast. > In other words i am trying to retrieve blast results > using a particular rid. 
> > > following is the code: > > ---------------------------------------------------- > > my @params = ( '-prog' => 'blastn', > '-expect' => 1, > '-readmethod' => 'SearchIO' ); > > > my $factory = > Bio::Tools::Run::RemoteBlast->new(@params); > > my $rc = $factory->retrieve_blast($rid); > > ---------------------------------------------------- > > $rid is a legitimate request ID that is passed as > parameter to the function. > retrieve_blast function returns > Bio::SearchIO::blast=HASH(0x2c9bf8), that indicates > the success. > > but $rc->next_result() function fails and I get the > following error. > > > ------------- EXCEPTION ------------- > MSG: no data for midline genome > STACK Bio::SearchIO::blast::next_result > /usr/perl5/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151 > STACK main::test_blast > /var/apache/cgi-bin/user/blast.pl:578 > STACK toplevel /var/apache/cgi-bin/user/blast.cgi:65 > > -------------------------------------- > > How can I do this. > > Thanks. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Sat Jul 24 09:30:20 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Jul 24 09:32:00 2004 Subject: [Bioperl-l] html stripped from blast report In-Reply-To: References: Message-ID: Brian, George - this works fine for me (adding the package statement).

#!/usr/bin/perl -w
use Bio::SearchIO;
use HTML::Strip;
my $hs = HTML::Strip->new();

# replace the blast parser's _readline method with one that
# auto-strips HTML:
package Bio::SearchIO::blast; # added this line.
sub Bio::SearchIO::blast::_readline {
    my ($self, @args) = @_;
    my $line = $self->SUPER::_readline(@args);
    return unless defined $line;
    return $hs->parse($line);
}

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => $ARGV[0]);
while( my $r = $in->next_result ) {
    print $r->query_name, "\n";
}

On Fri, 23 Jul 2004, Brian Osborne wrote: > George, > > No, that definitely won't work. I'll remove it from the FAQ and take a > look... > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of ghrose@unm.edu > Sent: Monday, July 19, 2004 4:47 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] html stripped from blast report > > Dear Bioperl, > > I'm trying to use the code in > http://bio.perl.org/Core/Latest/faq.html#Q3.7 > to strip html out of a blast report that is html format. > > ******The code > > #!/usr/bin/perl > > use strict; > use DBI; > use Bio::Perl; > use Bio::SearchIO; > use Bio::SearchIO::blast; > use HTML::Strip; > > my $hs = new HTML::Strip; > > # replace the blast parser's _readline method with one that > # auto-strips HTML: > > sub Bio::SearchIO::blast::_readline { > my ($self, @args) = @_; > return $hs->parse($self->SUPER::_readline(@args)); > } > > my $in = new Bio::SearchIO(-format => 'blast', > -file => $ARGV[0]); > > > *******gives me the following error. > > > georges-Computer:~/Desktop/ben-p020 george$ perl insert_7_4hstrip.pl > p018xnr.html > test3 > Can't locate object method "_readline" via package "main" at > insert_7_4hstrip.pl line 17. > > > I believe I have the HTML::Strip installed correctly. > I'm running this script on macosx10.3. > > Can you give me some advice on how to solve this problem?
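Why the single added `package Bio::SearchIO::blast;` line makes the difference: Perl resolves `SUPER::` against the package in which the calling sub was compiled, not against the class of the object. In the FAQ version the override is compiled while the current package is `main`, so `SUPER::_readline` searches `main`'s (empty) `@ISA` and fails, which is consistent with the "via package main" error above. The self-contained sketch below reproduces the pattern without BioPerl; `Demo::IO` and `Demo::Parser` are hypothetical stand-ins for the BioPerl I/O base class and `Bio::SearchIO::blast`, and a crude regex replaces HTML::Strip.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the BioPerl I/O base class that supplies _readline().
package Demo::IO;

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines] }, $class;
}

sub _readline {
    my ($self) = @_;
    return shift @{ $self->{lines} };    # undef once the input is exhausted
}

# Hypothetical stand-in for Bio::SearchIO::blast, which inherits _readline().
package Demo::Parser;
our @ISA = ('Demo::IO');

# This override is compiled while the current package is Demo::Parser, so
# SUPER:: searches @Demo::Parser::ISA and finds Demo::IO::_readline.
# Compiled under "package main;" instead, SUPER:: would search main's @ISA
# and the method lookup would fail, as in the error reported above.
sub Demo::Parser::_readline {
    my ($self, @args) = @_;
    my $line = $self->SUPER::_readline(@args);
    return unless defined $line;
    $line =~ s/<[^>]*>//g;               # crude tag stripping in place of HTML::Strip
    return $line;
}

package main;

my $parser = Demo::Parser->new("<b>Query=</b> seq1\n", "plain line\n");
while (defined(my $line = $parser->_readline)) {
    print $line;                         # prints "Query= seq1" then "plain line"
}
```

The fully qualified sub name only controls where the sub is installed; it is the `package` statement in effect at compile time that decides where `SUPER::` looks.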
> > Thank you, > > George > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From nymete at yahoo.com Mon Jul 26 13:49:50 2004 From: nymete at yahoo.com (Nurcan Mete) Date: Mon Jul 26 13:51:23 2004 Subject: [Bioperl-l] Retrieve results for an RID In-Reply-To: Message-ID: <20040726174950.81546.qmail@web21503.mail.yahoo.com> No, I don't think I am adding anything to the standard BLAST output. I use the standard Bio::Perl 'blast_sequence' function to blast my sequences. In the loop section, I have made one small modification: I insert into the database the request ID of any BLAST search that has not completed in time.

LOOP:
while( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # INSERT INTO TEMP_TABLE
                # (request_id) VALUES('$rid')
                $factory->remove_rid($rid);
            }
            if( $verbose ) {
                print STDERR ".";
            }
            sleep 10;
        } else {
            $result = $rc->next_result();
            $factory->remove_rid($rid);
            last LOOP;
        }
    }
}

Then, later (i.e. at midnight), I want to retrieve the BLAST results using this stored RID.
Very simply, I use the same code to do so:

----------------------------------------------------------
my @params = ( '-prog'       => $prog,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

my $rc = $factory->retrieve_blast($rid);
----------------------------------------------------------

$rid: request ID stored in the database

I omit the following parts of the original code:

my $r = $factory->submit_blast($seq);

while( my @rids = $factory->each_rid) {
    foreach my $rid ( @rids ) {

So could the problem be this omission? I hope these details explain things better. Thank you. --- Jason Stajich wrote: > Hmm - As I said in the perlmonks post you need to > give more information, > i.e. a copy of the report which is failing. Is your > webscript adding > things to the report which are not part of the > standard BLAST output, etc. > > -j > On Fri, 23 Jul 2004, Nurcan Mete wrote: > > > > > I want to add some functionality to my website, > that > > is similar to "Retrieve results for an RID" of > NCBI > > blast. > > In other words i am trying to retrieve blast > results > > using a particular rid. > > > > > > following is the code: > > > > > ---------------------------------------------------- > > > > my @params = ( '-prog' => 'blastn', > > '-expect' => 1, > > '-readmethod' => 'SearchIO' ); > > > > > > my $factory = > > Bio::Tools::Run::RemoteBlast->new(@params); > > > > my $rc = $factory->retrieve_blast($rid); > > > > > ---------------------------------------------------- > > > > $rid is a legitimate request ID that is passed as > > parameter to the function. > > retrieve_blast function returns > > Bio::SearchIO::blast=HASH(0x2c9bf8), that > indicates > > the success. > > > > but $rc->next_result() function fails and I get > the > > following error.
> > > > > > ------------- EXCEPTION ------------- > > MSG: no data for midline genome > > STACK Bio::SearchIO::blast::next_result > > > /usr/perl5/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151 > > STACK main::test_blast > > /var/apache/cgi-bin/user/blast.pl:578 > > STACK toplevel > /var/apache/cgi-bin/user/blast.cgi:65 > > > > -------------------------------------- > > > > How can I do this. > > > > Thanks. > > > > > > > > > > __________________________________ > > Do you Yahoo!? > > Yahoo! Mail - You care about security. So do we. > > http://promotions.yahoo.com/new_mail > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________ Do you Yahoo!? Express yourself with Y! Messenger! Free. Download now. http://messenger.yahoo.com From jason at cgt.duhs.duke.edu Mon Jul 26 14:57:53 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jul 26 14:59:31 2004 Subject: [Bioperl-l] Retrieve results for an RID In-Reply-To: <20040726174950.81546.qmail@web21503.mail.yahoo.com> References: <20040726174950.81546.qmail@web21503.mail.yahoo.com> Message-ID: ohh - so you're not inventing your own blast + RID system. That is what I took from your 1st mail. What happens when you paste that RID into the box @ NCBI? see below also On Mon, 26 Jul 2004, Nurcan Mete wrote: > No, I don't think I am trying to add something to > standard Blast output. > > I use standard Bio::Perl 'blast_sequence' function to > blast my sequences. > In the loop section, I have one small modification: I > insert into database the request ID of the blast that > has failed to be completed by the time. 
> > LOOP : > while( my @rids = $factory->each_rid) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > > # INSERT INTO TEMP_TABLE > # (request_id) VALUES('$rid') These RIDs are removed because they failed. If $rc < 0 there was an error. You should not be inserting these into your database. If what you want is to retrieve them later anyway, you could just insert all the RIDs into your db initially rather than running this loop in the first place. > > $factory->remove_rid($rid); > } > if( $verbose ) { > print STDERR "."; > } > sleep 10; > } else { > $result = $rc->next_result(); > $factory->remove_rid($rid); > last LOOP; > } > } > } > > > Then, later (ie. at midnight), I want to retrive > blast results giving this stored RID. > very simply, I use the same function code to achieve > so: > > > ---------------------------------------------------------- > my @params = ( '-prog' => $prog, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > > my $factory = > Bio::Tools::Run::RemoteBlast->new(@params); > > my $rc = $factory->retrieve_blast($rid); > > ---------------------------------------------------------- > > $rid: request ID stored in the database > > I omit the following parts of original code: > > my $r = $factory->submit_blast($seq); > > while( my @rids = $factory->each_rid) { > foreach my $rid ( @rids ) { > > > > > so may the problem be with this omission? No, that is all fine, I guess. Just leave the Perl part out and see what you get when you enter the RID on the NCBI site. > > > I hope these details are more explanative. > > Thank you. > > > > > > --- Jason Stajich wrote: > > Hmm - As I said in the perlmonks post you need to > > give more information, > > i.e. a copy of the report which is failing. Is your > > webscript adding > > things to the report which are not part of the > > standard BLAST output, etc.
> > > > -j > > On Fri, 23 Jul 2004, Nurcan Mete wrote: > > > > > > > > I want to add some functionality to my website, > > that > > > is similar to "Retrieve results for an RID" of > > NCBI > > > blast. > > > In other words i am trying to retrieve blast > > results > > > using a particular rid. > > > > > > > > > following is the code: > > > > > > > > ---------------------------------------------------- > > > > > > my @params = ( '-prog' => 'blastn', > > > '-expect' => 1, > > > '-readmethod' => 'SearchIO' ); > > > > > > > > > my $factory = > > > Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > ---------------------------------------------------- > > > > > > $rid is a legitimate request ID that is passed as > > > parameter to the function. > > > retrieve_blast function returns > > > Bio::SearchIO::blast=HASH(0x2c9bf8), that > > indicates > > > the success. > > > > > > but $rc->next_result() function fails and I get > > the > > > following error. > > > > > > > > > ------------- EXCEPTION ------------- > > > MSG: no data for midline genome > > > STACK Bio::SearchIO::blast::next_result > > > > > > /usr/perl5/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151 > > > STACK main::test_blast > > > /var/apache/cgi-bin/user/blast.pl:578 > > > STACK toplevel > > /var/apache/cgi-bin/user/blast.cgi:65 > > > > > > -------------------------------------- > > > > > > How can I do this. > > > > > > Thanks. > > > > > > > > > > > > > > > __________________________________ > > > Do you Yahoo!? > > > Yahoo! Mail - You care about security. So do we. 
> > > http://promotions.yahoo.com/new_mail > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From M.Q.Lewis at exeter.ac.uk Mon Jul 26 15:01:14 2004 From: M.Q.Lewis at exeter.ac.uk (mql201) Date: Mon Jul 26 15:07:57 2004 Subject: [Bioperl-l] negative start/end numbers to Bio::SeqFeature::Generic Message-ID: <4105E9A1@minerva2.ex.ac.uk> Hello. Quick question: I want to supply Bio::SeqFeature::Generic with NEGATIVE start and end values, or get round it somehow, because I'm looking at gene promoters, not genes. Less quick: I'm writing a CGI script to analyse promoter sequences and output images of the locations of upstream regulatory elements (e.g. transcription binding sites). After the analysis (all good), I create the graphics panel and go to add a scale track:

## create and add scale as an anchored arrow
#
$scale = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seqLen);
#$scale = Bio::SeqFeature::Generic->new(-start=>$negSeqLen,-end=>-1);
$panel->add_track($scale,
                  -glyph     => 'anchored_arrow',
                  -tick      => 2,
                  -fontcolor => '#3d5315',
                  -fgcolor   => '#3d5315',
                  -bgcolor   => '#e3ffb7');

As you can see from the commented-out $scale instantiation, I can supply positive integers to start and end as long as end > start, but if I give negative numbers, I get an empty image. I've looked at the Bio::SeqFeature::Gene::Promoter module, but there is no help information about how to extract and use the return values.
Anyone know of a way of using bioperl using negative numbers? Many thanks IA Mark From crabtree at tigr.org Mon Jul 26 15:25:27 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Mon Jul 26 15:28:04 2004 Subject: [Bioperl-l] negative start/end numbers to Bio::SeqFeature::Generic Message-ID: Mark- Try setting -offset => -$negSeqLen when you call $panel->new(). Jonathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of mql201 > Sent: Monday, July 26, 2004 3:01 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] negative start/end numbers to > Bio::SeqFeature::Generic > > > Hello > > quick q: i want to supply Bio::SeqFeature::Generic with > NEGATIVE start and end > values, or get > round it somehow, cos i'm looking at gene promoters, and not genes. > > less quick: > I'm writing CGI to analyse promoter sequences and output > images of locations > of upstream > regulatory elements (eg transcription binding sites). > > after the analysis (all good), i create the graphics panel, > and go to add a > scale track: > > ## create and add scale as an anchored arrow > # > $scale = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seqLen); > #$scale = > Bio::SeqFeature::Generic->new(-start=>$negSeqLen,-end=>-1); > $panel->add_track($scale, > -glyph => 'anchored_arrow', > -tick => 2, > -fontcolor => '#3d5315', > -fgcolor => '#3d5315', > -bgcolor => '#e3ffb7'); > > as you can see from the commented $scale instantiation, i can > supply positive > integers to start and > end as long as end > start, but if i give negative numbers, i > get an empty > image. > > > i've looked at the Bio::SeqFeature::Gene::Promoter module, > but there is no > help information about > how to extract and use the return values. > > > Anyone know of a way of using bioperl using negative numbers? 
> > Many thanks IA > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-> bio.org/mailman/listinfo/bioperl-l > From crabtree at tigr.org Mon Jul 26 15:31:16 2004 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Mon Jul 26 15:33:09 2004 Subject: [Bioperl-l] negative start/end numbers to Bio::SeqFeature::Generic Message-ID: Mark- Sorry, that should have read "$negSeqLen", not "-$negSeqLen". You get the idea, though... Jonathan > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > Crabtree, Jonathan > Sent: Monday, July 26, 2004 3:25 PM > To: mql201 > Cc: bioperl-l@portal.open-bio.org > Subject: RE: [Bioperl-l] negative start/end numbers to > Bio::SeqFeature::Generic > > > > Mark- > > Try setting -offset => -$negSeqLen when you call $panel->new(). > > Jonathan > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of mql201 > > Sent: Monday, July 26, 2004 3:01 PM > > To: bioperl-l@portal.open-bio.org > > Subject: [Bioperl-l] negative start/end numbers to > > Bio::SeqFeature::Generic > > > > > > Hello > > > > quick q: i want to supply Bio::SeqFeature::Generic with > > NEGATIVE start and end > > values, or get > > round it somehow, cos i'm looking at gene promoters, and not genes. > > > > less quick: > > I'm writing CGI to analyse promoter sequences and output > > images of locations > > of upstream > > regulatory elements (eg transcription binding sites). 
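Jonathan's suggestion shifts the panel's origin instead of forcing the features to be positive, which leaves the bookkeeping of turning string positions into promoter coordinates. A minimal sketch, assuming the promoter string ends at the base immediately upstream of the transcription start site and that the base just before the TSS is numbered -1 (no position 0): a 1-based position `$pos` in a string of length `$len` maps to `$pos - $len - 1`. The helper names `to_tss_coord` and `hits_to_tss` and the example numbers are hypothetical. The resulting values are what would go into `-start`/`-end` of `Bio::SeqFeature::Generic`, with the panel created using `-offset => $negSeqLen` as corrected in the thread; whether the rendered ruler needs a one-base adjustment is worth checking against the actual image.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a 1-based position in a promoter string of length $len onto coordinates
# relative to the transcription start site (TSS), assuming the string ends at
# the base immediately upstream of the TSS: position $len -> -1, 1 -> -$len.
sub to_tss_coord {
    my ($pos, $len) = @_;
    return $pos - $len - 1;
}

# Convert a list of [start, end] hits (1-based, start <= end) in one go.
# The mapping is monotonic, so start stays below end after conversion.
sub hits_to_tss {
    my ($len, @hits) = @_;
    return map {
        my ($s, $e) = @$_;
        [ to_tss_coord($s, $len), to_tss_coord($e, $len) ]
    } @hits;
}

my $seq_len     = 1000;                        # hypothetical promoter length
my $neg_seq_len = to_tss_coord(1, $seq_len);   # -1000, left end of the scale

# Hypothetical binding-site hits found by position in the promoter string.
my @sites = hits_to_tss($seq_len, [851, 860], [990, 1000]);

print "scale: $neg_seq_len .. -1\n";           # scale: -1000 .. -1
printf "site: %d .. %d\n", @$_ for @sites;     # site: -150 .. -141, site: -11 .. -1
```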
> > > > after the analysis (all good), i create the graphics panel, > > and go to add a > > scale track: > > > > ## create and add scale as an anchored arrow > > # > > $scale = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seqLen); > > #$scale = > > Bio::SeqFeature::Generic->new(-start=>$negSeqLen,-end=>-1); > > $panel->add_track($scale, > > -glyph => 'anchored_arrow', > > -tick => 2, > > -fontcolor => '#3d5315', > > -fgcolor => '#3d5315', > > -bgcolor => '#e3ffb7'); > > > > as you can see from the commented $scale instantiation, i can > > supply positive > > integers to start and > > end as long as end > start, but if i give negative numbers, i > > get an empty > > image. > > > > > > i've looked at the Bio::SeqFeature::Gene::Promoter module, > > but there is no > > help information about > > how to extract and use the return values. > > > > > > Anyone know of a way of using bioperl using negative numbers? > > > > Many thanks IA > > Mark > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-> bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From nymete at yahoo.com Mon Jul 26 16:15:49 2004 From: nymete at yahoo.com (Nurcan Mete) Date: Mon Jul 26 16:17:20 2004 Subject: [Bioperl-l] Retrieve results for an RID In-Reply-To: Message-ID: <20040726201549.56747.qmail@web21529.mail.yahoo.com> I get the results properly from NCBI BLAST using the same RID. I may be wrong, but I suspect $rc < 0 does not always indicate an error. What I have noticed is this: sometimes it takes so long to get the results that I get $rc = -1. Then I save the RID and try later; at that point both BLAST on the NCBI website and my program return no result. I try again later, after sufficient time (15-20 minutes) has passed, and then I get the results properly from NCBI BLAST,
and my $factory->retrieve_blast($rid); function returns Bio::SearchIO, indicating success. Nurcan Mete. --- Jason Stajich wrote: > ohh - so you're not inventing your own blast + RID > system. That is what I > took from your 1st mail. > > What happens when you paste that RID into the box @ > NCBI? > > see below also > On Mon, 26 Jul 2004, Nurcan Mete wrote: > > > No, I don't think I am trying to add something to > > standard Blast output. > > > > I use standard Bio::Perl 'blast_sequence' function > to > > blast my sequences. > > In the loop section, I have one small > modification: I > > insert into database the request ID of the blast > that > > has failed to be completed by the time. > > > > LOOP : > > while( my @rids = $factory->each_rid) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > > > # INSERT INTO TEMP_TABLE > > # (request_id) VALUES('$rid') > > These RIDs are removed because they failed. If $rc > < 0 there was an > error. You should not be inserting these into your > database. > > If you'd rather just insert all the RIDs into your > db initially rather > than doing this loop in the first place if what you > want to do is then > retreive them later anyways. > > > > > $factory->remove_rid($rid); > > } > > if( $verbose ) { > > print STDERR "."; > > } > > sleep 10; > > } else { > > $result = $rc->next_result(); > > $factory->remove_rid($rid); > > last LOOP; > > } > > } > > } > > > > > > Then, later (ie. at midnight), I want to retrive > > blast results giving this stored RID. 
> > very simply, I use the same function code to > achieve > > so: > > > > > > > ---------------------------------------------------------- > > my @params = ( '-prog' => $prog, > > '-expect' => $e_val, > > '-readmethod' => 'SearchIO' ); > > > > > > my $factory = > > Bio::Tools::Run::RemoteBlast->new(@params); > > > > my $rc = $factory->retrieve_blast($rid); > > > > > ---------------------------------------------------------- > > > > $rid: request ID stored in the database > > > > I omit the following parts of original code: > > > > my $r = $factory->submit_blast($seq); > > > > while( my @rids = $factory->each_rid) { > > foreach my $rid ( @rids ) { > > > > > > > > > > so may the problem be with this omission? > > No that is all fine I guess. Just get rid of the > perl part and see what > the RID value gives you when you post it on the NCBI > site. > > > > > > I hope these details are more explanative. > > > > Thank you. > > > > > > > > > > > > --- Jason Stajich wrote: > > > Hmm - As I said in the perlmonks post you need > to > > > give more information, > > > i.e. a copy of the report which is failing. Is > your > > > webscript adding > > > things to the report which are not part of the > > > standard BLAST output, etc. > > > > > > -j > > > On Fri, 23 Jul 2004, Nurcan Mete wrote: > > > > > > > > > > > I want to add some functionality to my > website, > > > that > > > > is similar to "Retrieve results for an RID" of > > > NCBI > > > > blast. > > > > In other words i am trying to retrieve blast > > > results > > > > using a particular rid. 
> > > > > > > > > > > > following is the code: > > > > > > > > > > > > ---------------------------------------------------- > > > > > > > > my @params = ( '-prog' => 'blastn', > > > > '-expect' => 1, > > > > '-readmethod' => 'SearchIO' ); > > > > > > > > > > > > my $factory = > > > > Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > > > > > ---------------------------------------------------- > > > > > > > > $rid is a legitimate request ID that is passed > as > > > > parameter to the function. > > > > retrieve_blast function returns > > > > Bio::SearchIO::blast=HASH(0x2c9bf8), that > > > indicates > > > > the success. > > > > > > > > but $rc->next_result() function fails and I > get > > > the > > > > following error. > > > > > > > > > > > > ------------- EXCEPTION ------------- > > > > MSG: no data for midline genome > > > > STACK Bio::SearchIO::blast::next_result > > > > > > > > > > /usr/perl5/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151 > > > > STACK main::test_blast > > > > /var/apache/cgi-bin/user/blast.pl:578 > > > > STACK toplevel > > > /var/apache/cgi-bin/user/blast.cgi:65 > > > > > > > > -------------------------------------- > > > > > > > > How can I do this. > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > __________________________________ > > > > Do you Yahoo!? > > > > Yahoo! Mail - You care about security. So do > we. > > > > http://promotions.yahoo.com/new_mail > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > === message truncated === __________________________________ Do you Yahoo!? New and Improved Yahoo! Mail - Send 10MB messages! 
http://promotions.yahoo.com/new_mail From jason at cgt.duhs.duke.edu Mon Jul 26 16:29:24 2004 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Jul 26 16:30:57 2004 Subject: [Bioperl-l] Retrieve results for an RID In-Reply-To: <20040726201549.56747.qmail@web21529.mail.yahoo.com> References: <20040726201549.56747.qmail@web21529.mail.yahoo.com> Message-ID: On Mon, 26 Jul 2004, Nurcan Mete wrote: > I get the results properly from NCBI blast using the > same rid. > > > Though I may be wrong, I guess $rc<0 is not always in > case of error. Perhaps. You'll have to look at the logic of the module I don't remember what are all the cases which trigger -1. It is up to you if you want to remove the RID when rc < 0 I guess. > > What I had realized was: > Sometimes it takes so long to get the results, and get > $rc = -1. > Then I save the RID and try later. both Blast at NCBI > website and my program > returns no result at that time. > So a timeout I guess. I dunno. > > I try again later, after a sufficent time (15-20 > minutes) has passed, then I get the results > properly from NCBI blast. > and my > $factory->retrieve_blast($rid); > function returns Bio::SearchIO, indicating success. > > > Nurcan Mete. > > > > > --- Jason Stajich wrote: > > ohh - so you're not inventing your own blast + RID > > system. That is what I > > took from your 1st mail. > > > > What happens when you paste that RID into the box @ > > NCBI? > > > > see below also > > On Mon, 26 Jul 2004, Nurcan Mete wrote: > > > > > No, I don't think I am trying to add something to > > > standard Blast output. > > > > > > I use standard Bio::Perl 'blast_sequence' function > > to > > > blast my sequences. > > > In the loop section, I have one small > > modification: I > > > insert into database the request ID of the blast > > that > > > has failed to be completed by the time. 
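The disagreement about `$rc < 0` can be made explicit in code. A sketch, assuming the return conventions used in the loop quoted in this thread: a reference (a SearchIO object) means the report is ready, a negative number was reported as an error (which, per Nurcan's observation, may only be a timeout), and any other non-reference value means the search is still running. Following Jason's suggestion, it records a status for every RID rather than storing only the failed ones. `classify_rc` and `poll_rids` are hypothetical helpers; `$fetch` is a code ref so the loop can be exercised with a stub, and in real use it would wrap the call from the thread, e.g. `sub { $factory->retrieve_blast($_[0]) }`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Interpret a retrieve_blast() return value the way the loop in this thread
# does: a reference is a finished report, a negative number was reported as an
# error (possibly only a timeout), anything else means "still running".
sub classify_rc {
    my ($rc) = @_;
    return 'done'   if ref $rc;
    return 'failed' if defined $rc && $rc < 0;
    return 'pending';
}

# Poll every RID through $fetch (a code ref standing in for
# $factory->retrieve_blast) for at most $max_rounds rounds, sleeping $pause
# seconds between rounds. Returns a hash ref of RID => status so that every
# RID, finished or not, can be stored and retried later.
sub poll_rids {
    my ($fetch, $rids, $max_rounds, $pause) = @_;
    my %status = map { $_ => 'pending' } @$rids;
    for my $round (1 .. $max_rounds) {
        my @waiting = grep { $status{$_} eq 'pending' } sort keys %status;
        last unless @waiting;
        $status{$_} = classify_rc($fetch->($_)) for @waiting;
        my $still_waiting = grep { $status{$_} eq 'pending' } keys %status;
        sleep $pause if $pause && $still_waiting && $round < $max_rounds;
    }
    return \%status;
}

# Stub fetcher for illustration only: RID_OK finishes on the second call,
# RID_BAD errors immediately and RID_SLOW never finishes.
my %calls;
my $fake_fetch = sub {
    my ($rid) = @_;
    $calls{$rid}++;
    return bless({}, 'FakeSearchIO') if $rid eq 'RID_OK' && $calls{$rid} >= 2;
    return -1 if $rid eq 'RID_BAD';
    return 0;
};

my $status = poll_rids($fake_fetch, [qw(RID_OK RID_BAD RID_SLOW)], 3, 0);
print "$_: $status->{$_}\n" for sort keys %$status;
```

Keeping `failed` separate from `pending` leaves the decision Jason describes (drop the RID or retry it later) to the caller instead of hiding it inside the loop.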
> > > > > > LOOP : > > > while( my @rids = $factory->each_rid) { > > > foreach my $rid ( @rids ) { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) { > > > if( $rc < 0 ) { > > > > > > # INSERT INTO TEMP_TABLE > > > # (request_id) VALUES('$rid') > > > > These RIDs are removed because they failed. If $rc > > < 0 there was an > > error. You should not be inserting these into your > > database. > > > > If you'd rather just insert all the RIDs into your > > db initially rather > > than doing this loop in the first place if what you > > want to do is then > > retreive them later anyways. > > > > > > > > $factory->remove_rid($rid); > > > } > > > if( $verbose ) { > > > print STDERR "."; > > > } > > > sleep 10; > > > } else { > > > $result = $rc->next_result(); > > > $factory->remove_rid($rid); > > > last LOOP; > > > } > > > } > > > } > > > > > > > > > Then, later (ie. at midnight), I want to retrive > > > blast results giving this stored RID. > > > very simply, I use the same function code to > > achieve > > > so: > > > > > > > > > > > > ---------------------------------------------------------- > > > my @params = ( '-prog' => $prog, > > > '-expect' => $e_val, > > > '-readmethod' => 'SearchIO' ); > > > > > > > > > my $factory = > > > Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > > ---------------------------------------------------------- > > > > > > $rid: request ID stored in the database > > > > > > I omit the following parts of original code: > > > > > > my $r = $factory->submit_blast($seq); > > > > > > while( my @rids = $factory->each_rid) { > > > foreach my $rid ( @rids ) { > > > > > > > > > > > > > > > so may the problem be with this omission? > > > > No that is all fine I guess. Just get rid of the > > perl part and see what > > the RID value gives you when you post it on the NCBI > > site. > > > > > > > > > I hope these details are more explanative. > > > > > > Thank you. 
> > > > > > > > > > > > > > > > > > --- Jason Stajich wrote: > > > > Hmm - As I said in the perlmonks post you need > > to > > > > give more information, > > > > i.e. a copy of the report which is failing. Is > > your > > > > webscript adding > > > > things to the report which are not part of the > > > > standard BLAST output, etc. > > > > > > > > -j > > > > On Fri, 23 Jul 2004, Nurcan Mete wrote: > > > > > > > > > > > > > > I want to add some functionality to my > > website, > > > > that > > > > > is similar to "Retrieve results for an RID" of > > > > NCBI > > > > > blast. > > > > > In other words i am trying to retrieve blast > > > > results > > > > > using a particular rid. > > > > > > > > > > > > > > > following is the code: > > > > > > > > > > > > > > > > ---------------------------------------------------- > > > > > > > > > > my @params = ( '-prog' => 'blastn', > > > > > '-expect' => 1, > > > > > '-readmethod' => 'SearchIO' ); > > > > > > > > > > > > > > > my $factory = > > > > > Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > > > > > > > > > ---------------------------------------------------- > > > > > > > > > > $rid is a legitimate request ID that is passed > > as > > > > > parameter to the function. > > > > > retrieve_blast function returns > > > > > Bio::SearchIO::blast=HASH(0x2c9bf8), that > > > > indicates > > > > > the success. > > > > > > > > > > but $rc->next_result() function fails and I > > get > > > > the > > > > > following error. 
> > > > > > > > > > > > > > > ------------- EXCEPTION ------------- > > > > > MSG: no data for midline genome > > > > > STACK Bio::SearchIO::blast::next_result > > > > > > > > > > > > > > > /usr/perl5/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151 > > > > > STACK main::test_blast > > > > > /var/apache/cgi-bin/user/blast.pl:578 > > > > > STACK toplevel > > > > /var/apache/cgi-bin/user/blast.cgi:65 > > > > > > > > > > -------------------------------------- > > > > > > > > > > How can I do this. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > __________________________________ > > > > > Do you Yahoo!? > > > > > Yahoo! Mail - You care about security. So do > > we. > > > > > http://promotions.yahoo.com/new_mail > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@portal.open-bio.org > > > > > > > > > > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > === message truncated === > > > > > __________________________________ > Do you Yahoo!? > New and Improved Yahoo! Mail - Send 10MB messages! 
> http://promotions.yahoo.com/new_mail > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jsun at utdallas.edu Mon Jul 26 17:27:56 2004 From: jsun at utdallas.edu (Sun, Jian) Date: Tue Jul 27 08:18:15 2004 Subject: [Bioperl-l] Help for using Clustalw.pm Message-ID: Dear all: I tried to align multiple sequences through Clustalw.pm, using the source code below:

*****************************************************************************************************
#!C:\Perl\bin\perl.exe
use lib "C:\Perl\lib";

use Bio::Perl;
use Bio::Tools::Run::Alignment::Clustalw;
use strict;
use warnings;

$ENV{CLUSTALDIR} = 'C:/Program Files/Apache Group/Apache2/bin/';
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $ktuple = 3;
$factory->ktuple($ktuple);  # change the parameter before executing
my $str = Bio::SeqIO->new(-file => 'Clustseq.fa', '-format' => 'Fasta');
my @seq_array = ();
while ( my $seq = $str->next_seq() ) { push(@seq_array, $seq); }
my $seq_array_ref = \@seq_array;
# where @seq_array is an array of Bio::Seq objects
my $aln = $factory->align($seq_array_ref);
**************************************************************************

When I run the .pl file, I get this error message:

///////////////////////////////////////////////////////////////////////////////////////////
Clustalw program not found as clustalw or not executable.
......
'clustalw' is not recognized as an internal or external command,
operable program or batch file.
---------------------EXCEPTION-----------------------------------------
MSG: Clustalw call crashed:256

STACK Bio::Tools::Run::Alignment::Clustalw::_run c:/Perl/Site/Lib/Bio/Toold/Run/Alignment/Clustalw.pm:581
STACK Bio::Tools::Run::Alignment::Clustalw::_run c:/Perl/Site/Lib/Bio/Toold/Run/Alignment/Clustalw.pm:507
STACK toplevel test726pl.pl:33
-----------------------------------------------------------------------------------------
///////////////////////////////////////////////////////////////////////////////////////

Since I already set the CLUSTALDIR variable, I don't know why Clustalw is still not executable. Is my setting right? Has anyone had this kind of experience? Thanks in advance. Jane From james.wasmuth at ed.ac.uk Tue Jul 27 08:28:57 2004 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Tue Jul 27 08:37:31 2004 Subject: [Bioperl-l] Help for using Clustalw.pm In-Reply-To: References: Message-ID: <41064A89.7080004@ed.ac.uk> Jane, Try: $clustalfound = *Bio::Tools::Run::Alignment::Clustalw*->exists_clustal() print "$clustalfound"; if '0' then it can't find an executable of clustalw. Does it work from the command line?
-james

--
I like nonsense, it wakes up the brain cells.
   -- Dr. Seuss

Blaxter Nematode Genomics Group |
School of Biological Sciences   |
Ashworth Laboratories           | tel: +44 131 650 7403
University of Edinburgh         | web: www.nematodes.org
Edinburgh                       |
EH9 3JT                         |
UK                              |

From james.wasmuth at ed.ac.uk  Tue Jul 27 08:42:29 2004
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Tue Jul 27 08:51:03 2004
Subject: [Bioperl-l] Help for using Clustalw.pm
In-Reply-To: <41064A89.7080004@ed.ac.uk>
References: <41064A89.7080004@ed.ac.uk>
Message-ID: <41064DB5.8060206@ed.ac.uk>

Ooops,

I seem to have errant asterisks:

$clustalfound = Bio::Tools::Run::Alignment::Clustalw->exists_clustal();

James Wasmuth wrote:
> Jane,
>
> Try:
>
> $clustalfound = *Bio::Tools::Run::Alignment::Clustalw*->exists_clustal()
>
> print "$clustalfound";
>
> if '0' then it can't find an executable of clustalw. Does it work from
> the commandline ?
>
> -james

From davila at ioc.fiocruz.br  Tue Jul 27 08:52:10 2004
From: davila at ioc.fiocruz.br (davila)
Date: Tue Jul 27 08:56:53 2004
Subject: [Bioperl-l] Bio::SearchIO
Message-ID: <8D44604203DAF9438BF9123B4A08C779575F91@alpha.ioc.fiocruz.br>

Last year I was using a BLAST parser with the following lines:

$query_start=$hsp->start('query');
$query_end=$hsp->end('query');
$hit_start=$hsp->start('hit');
$hit_end=$hsp->end('hit');

However, they no longer work, and I had to comment them out to get the parser working. If something has changed, how can I now get the "query_start", "query_end", "hit_start" and "hit_end" values from the BLAST results?
Thanks in advance,
Alberto

From jason at cgt.duhs.duke.edu  Tue Jul 27 10:03:43 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Jul 27 10:05:16 2004
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575F91@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575F91@alpha.ioc.fiocruz.br>
Message-ID:

It should still work. You may try:

$hsp->query->start
$hsp->query->end
$hsp->hit->start
$hsp->hit->end

These should be equivalent to $hsp->start('query'), $hsp->end('query'), $hsp->start('hit') and $hsp->end('hit').

But it is possible that something has changed in the object layer that broke this. I'm not really sure, since it should be covered by the tests. If you can post code plus an example BLAST file that demonstrates the problem as a bug report at http://bugzilla.bioperl.org, we'll have a look.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From james.wasmuth at ed.ac.uk  Tue Jul 27 12:41:58 2004
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Tue Jul 27 12:50:36 2004
Subject: [Bioperl-l] Help for using Clustalw.pm
In-Reply-To: <4106858B.8080807@ed.ac.uk>
References: <4106858B.8080807@ed.ac.uk>
Message-ID: <410685D6.1010707@ed.ac.uk>

I'm not sure whether $ENV{CLUSTALDIR} fails because you're working on Windows. Anyone have an idea?
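A likely explanation does not depend on Windows. Jason Stajich's reply in this thread notes that some of the Clustalw wrapper's variables are set when the module is loaded. In Perl, `use Module;` is performed at compile time, before any ordinary statement runs, even a statement written above the `use` line. The sketch below imitates such a module with BEGIN blocks that read a made-up variable, `DEMO_TOOLDIR`. It shows that only an assignment that is itself inside BEGIN is visible at load time. This is a minimal demonstration under that assumption, not BioPerl's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# What each simulated "module load" saw in the environment.
our %seen;

# Start from a clean slate; DEMO_TOOLDIR is a made-up variable name.
BEGIN { delete $ENV{DEMO_TOOLDIR} }

# Case 1: an ordinary assignment written *above* the load. It only runs at
# run time, i.e. after everything below has already been compiled.
$ENV{DEMO_TOOLDIR} = '/set/at/runtime';
BEGIN { $seen{plain} = $ENV{DEMO_TOOLDIR} // 'unset' }   # stand-in for "use Module;"

# Case 2: the assignment is itself inside BEGIN, so it runs during
# compilation, before the load that follows it.
BEGIN { $ENV{DEMO_TOOLDIR} = '/set/in/BEGIN' }
BEGIN { $seen{begin} = $ENV{DEMO_TOOLDIR} // 'unset' }   # stand-in for "use Module;"

print "plain assignment, value seen at load: $seen{plain}\n";   # unset
print "BEGIN assignment, value seen at load: $seen{begin}\n";   # /set/in/BEGIN

# The same pattern for the script in this thread (needs BioPerl, not run here):
#   BEGIN { $ENV{CLUSTALDIR} = 'C:/Program Files/Apache Group/Apache2/bin/' }
#   use Bio::Tools::Run::Alignment::Clustalw;
```

Wrapping the assignment in BEGIN and placing it above the `use` line is what makes "set it before loading the module" actually happen in Perl. Jason's other suggestion, `$factory->executable(...)`, avoids the load-time lookup altogether.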
As for keeping the output, there are two ways:

1. In the params, use 'outfile' => 'something.aln'

I think it's 'outfile'. You'll need to check the clustalw documentation for the command-line option that specifies the name of the output file. Sorry, I can't remember.

or

2.

$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
$aln = $factory->align($inputfilename);

#$aln is a SimpleAlign object which you can write to file.

$out = Bio::AlignIO->new(-file => ">outputfilename",
                         -format => 'msf');

while ( my $single = $aln->next_aln() ) { $out->write_aln($single); }

I think this should work. Let me know if it doesn't. It's been a while since I've used these.

-james

Sun, Jian wrote:
> Hi, James;
>   Thanks a lot. It seems that the command below didn't modify the CLUSTALDIR variable:
> $ENV{CLUSTALDIR} = 'C:/Program Files/Apache Group/Apache2/bin/';
>
> So I copied Clustalw.exe to the current directory, and then the program works. I have another question:
> How can I set up the directory where the result file is kept? Where do I set this?
>
> Thanks again, very much appreciated
> Jane

From jason at cgt.duhs.duke.edu  Tue Jul 27 12:58:38 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Jul 27 13:00:16 2004
Subject: [Bioperl-l] Help for using Clustalw.pm
In-Reply-To: <410685D6.1010707@ed.ac.uk>
References: <4106858B.8080807@ed.ac.uk> <410685D6.1010707@ed.ac.uk>
Message-ID:

It depends on where you put that command: you want to make sure it comes BEFORE

use Bio::Tools::Run::Alignment::Clustalw;

as some variables are set at load time.

You can also just override the discovery and do

$factory->executable('/usr/local/bin/clustalw');

On Tue, 27 Jul 2004, James Wasmuth wrote:
> I'm not sure whether $ENV{CLUSTALDIR} does not work because you're
> working in Windows. Anyone have an idea?

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From pleo at mail.nih.gov  Tue Jul 27 13:58:25 2004
From: pleo at mail.nih.gov (Leo, Paul (NIH/NHGRI))
Date: Tue Jul 27 17:14:25 2004
Subject: [Bioperl-l] Getting chromosome from GenPept?
Message-ID: <0E3E7E8F6E23DF4C8127A063568356B5084A29B1@nihexchange12.nih.gov>

Hi

I have a set of proteins, identified by GI number, from a mass spectrometry experiment that I want to organize. I get the protein "details" from Bio::DB::GenPept using the GI number (I want to find gene names, other properties, etc.).

I also want the chromosome, if it is available. I usually get it from the feature with primary_tag 'source' and tag 'chromosome' in the sequence object, but this is not always present. For example, GI 23272966 has no chromosome info (see http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=23272966).

But if I follow the link through the LocusID (from the CDS feature) to LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=11947), I find that it is on chromosome 10.

What is the "BioPerl" way to get this info? Otherwise I was just going to get the $LocusID from the $seq object, fetch the page http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=$LocusID, strip the HTML and then extract the chromosome.

Is there a more sexy way to do this?
Thanks
Paul

From stoltzfu at umbi.umd.edu  Tue Jul 27 11:23:07 2004
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue Jul 27 17:14:49 2004
Subject: [Bioperl-l] display Newick trees
In-Reply-To:
Message-ID:

On Friday, July 16, 2004, at 12:24 AM, dimka wrote:

> I'm looking for a perl program that would generate evolutionary trees
> (ps, png, gif) read from a Newick (phylip dnd) file,
> http://evolution.genetics.washington.edu/phylip/newicktree.html

We have some Perl software called nexplot that we developed for our own purposes that may be of use. It is not a part of BioPerl, yet. You can test out the functionality at www.molevol.org/nexplorer, and from there you can download the Perl library and applications to make PostScript plots of trees. Nexplot uses NEXUS files as input, and outputs a tree with the tips aligned with a data matrix (e.g., a sequence alignment). However, it has a tree-only mode, and the NEXUS input file incorporates the Newick standard tree format in its TREES block, so it would be easy (in principle) for you to just create NEXUS files with the trees you want to display, and then use nexplot to display them (you can even put multiple named trees in a single file, and nexplot can plot a tree chosen by name).

Arlin

------------------
Arlin Stoltzfus (arlin.stoltzfus@nist.gov)
Research Biologist, NIST; Adj. Asst. Prof., UMBI
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 301 738 6208, fax 301 738 6255, web home www.molevol.org/camel

From jason at cgt.duhs.duke.edu  Tue Jul 27 17:21:36 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Jul 27 17:23:04 2004
Subject: [Bioperl-l] display Newick trees
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jul 2004, Arlin Stoltzfus wrote:

> However, it has a tree-only mode, and the NEXUS input file incorporates
> the Newick standard tree format in its TREES block, so it would be easy
> (in principle) for you to just create NEXUS files with the trees you
> want to display, and then use nexplot to display them (you can even put
> multiple named trees in a single file, and nexplot can plot a tree
> chosen by name).
>

And last week I added write_tree capability to Bio::TreeIO::nexus, so it is now even easier to generate input files for nexplot. It may need a little more testing, but it worked great for my needs.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From brian_osborne at cognia.com  Tue Jul 27 20:13:13 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Jul 27 20:14:47 2004
Subject: [Bioperl-l] Protein interaction modules
In-Reply-To: <40F65456.4080607@ed.ac.uk>
Message-ID:

Chris,

It looks like these 2 modules should be added to Bioperl::Bundle:

Class::AutoClass
Clone

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Richard Adams
Sent: Thursday, July 15, 2004 5:55 AM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Protein interaction modules

Hello BioPerlers,

I'd like to bring to people's attention some modules for parsing and analyzing protein interaction data. They are in CVS under Bio/Graph:

Bio/Graph/IO
Bio/Graph/IO/dip
Bio/Graph/IO/psi_xml
are modules for reading/writing graph data and work in a way analogous to the SeqIO system.

Bio/Graph/SimpleGraph
Bio/Graph/SimpleGraph/traversal
are generic graph modules written by Nat Goodman that provide functionality for traversing and building graphs. These are independent of BioPerl but are added here as they're not in CPAN yet.

Bio/Graph/ProteinGraph
Bio/Graph/Edge
extend the SimpleGraph modules to deal with multiple sequence identifiers, duplicate edges and more complex data about the nature of an interaction. In this implementation, nodes are Bio::Seq objects. Interactions are represented by Bio/Graph/Edge objects.

These modules are very much biologically oriented, and are written with the following sorts of tasks in mind, e.g.:

How can I annotate my sequences with interaction data?
Which nodes cause the most disruption to the network if perturbed?
What happens to network properties if a node is deleted?
How can I merge 2 protein interaction data sets together, and find duplicate interactions?
How can I calculate basic graph properties of my interaction data set, e.g., density or clustering coefficient?

Code demonstrating some of these tasks can be found in the Synopsis of Bio/Graph/ProteinGraph. There is a test suite, t/protgraph.t, which tests most of the methods.

I'd be very interested in feedback, ideas for what to include in a protein/DNA or protein/RNA interaction class, bugs, etc.

To use these modules you need:
XML::Twig, if you want to parse XML
Clone
Class::AutoClass (the SimpleGraph modules depend on this)

The test suite checks for these modules.

Obvious improvements are:
At present the XML parser just gets the basic interaction data, not the full dataset.
A psi_xml writer.

Richard

--
Dr Richard Adams
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU

Tel: 44 131 651 1084
richard.adams@ed.ac.uk

From barry.moore at genetics.utah.edu  Tue Jul 27 17:55:40 2004
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 27 21:07:41 2004
Subject: [Bioperl-l] Getting chromosome from GenPept?
In-Reply-To: <0E3E7E8F6E23DF4C8127A063568356B5084A29B1@nihexchange12.nih.gov>
References: <0E3E7E8F6E23DF4C8127A063568356B5084A29B1@nihexchange12.nih.gov>
Message-ID: <4106CF5C.7020903@genetics.utah.edu>

Well, how about reading loc2acc (ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc) into a hash with the accession as key and LocusLink ID as value.
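The lookup Barry describes here and in his next few sentences comes down to two local hashes combined into one. Below is a minimal sketch under stated assumptions. Both files are taken to be tab-delimited. In loc2acc, the LocusLink ID is assumed to be in column 1 and an accession in column 2. In the LL.out-style file, the LocusLink ID is assumed to be in column 1 and the chromosome in column 5, the column Barry mentions. These positions should be checked against NCBI's README before use. The subroutine names are hypothetical, and the in-memory rows are made-up placeholders, not real NCBI records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a tab-delimited file handle into a hash mapping one column to another.
# Column numbers are 0-based. Lines starting with '#' and blank lines are skipped.
sub load_map {
    my ($fh, $key_col, $value_col) = @_;
    my %map;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;
        my @f = split /\t/, $line, -1;
        next unless defined $f[$key_col] && defined $f[$value_col];
        $map{ $f[$key_col] } = $f[$value_col];
    }
    return \%map;
}

# Combine accession -> LocusID and LocusID -> chromosome into a single
# accession -> chromosome hash, so each later lookup is one step.
sub combine_maps {
    my ($acc2locus, $locus2chr) = @_;
    my %acc2chr;
    while (my ($acc, $locus) = each %$acc2locus) {
        $acc2chr{$acc} = $locus2chr->{$locus} if exists $locus2chr->{$locus};
    }
    return \%acc2chr;
}

# Made-up sample rows standing in for loc2acc and LL.out; with the real files,
# open them by name instead (the column positions here are assumptions).
my $loc2acc = "1001\tACC_A\n1002\tACC_B\n";
my $ll_out  = "1001\tx\tx\tx\t10\n1002\tx\tx\tx\tX\n";

open my $acc_fh, '<', \$loc2acc or die "cannot read loc2acc data: $!";
open my $ll_fh,  '<', \$ll_out  or die "cannot read LL.out data: $!";

my $acc2locus = load_map($acc_fh, 1, 0);   # accession (col 2) -> LocusID (col 1)
my $locus2chr = load_map($ll_fh,  0, 4);   # LocusID (col 1) -> chromosome (col 5)
my $acc2chr   = combine_maps($acc2locus, $locus2chr);

for my $acc (sort keys %$acc2chr) {
    print "$acc\tchromosome $acc2chr->{$acc}\n";
}
```

Building the combined hash once means each protein lookup is a single hash access, with no web requests or HTML scraping. If an accession can map to several LocusLink IDs, this sketch keeps only the last one it reads.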
Use that to convert from gi to LL ID. Then, having read LL.out (or the LL.out.??.gz file for your organism) into a hash with LL ID as key and the other fields (or just the chromosome number in column 5) as values, you use the LL ID to look up the chromosome. Better yet, combine the two files above into one hash with accession as key, and do a one-step lookup.

Sexy...very sexy - got to run take a cold shower.

Barry

--
Barry Moore
Dept.
of Human Genetics University of Utah Salt Lake City, UT From gillies82 at hotmail.com Wed Jul 28 06:22:55 2004 From: gillies82 at hotmail.com (Stuart Gillies) Date: Wed Jul 28 06:24:28 2004 Subject: [Bioperl-l] ensembl database Message-ID: Hi, i am trying to write a bio perl script to query the ensembl database. i am trying to retrieve data associated with a specific region on the mouse genome chromosome 17: chr17:31,132,104-35,332,226. i want a list of ensembl genes in that region, with as much info as possible, such as transcription start point, ensembl gene ID,etc i also really need a list of conserved sequences (which should be found in the compara database) for that mouse genomic region. any help would be great stuart gillies _________________________________________________________________ Use MSN Messenger to send music and pics to your friends http://www.msn.co.uk/messenger From bmb9jrm at bmb.leeds.ac.uk Wed Jul 28 06:34:48 2004 From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning) Date: Wed Jul 28 06:36:17 2004 Subject: [Bioperl-l] ensembl database In-Reply-To: References: Message-ID: <1091010887.6813.2.camel@localhost.localdomain> I'd advise you to get hold of the Ensembl API if you can- that's what I've been using and would seem to be what you need. It's available via CVS from the Ensembl site. Jon On Wed, 2004-07-28 at 11:22, Stuart Gillies wrote: > Hi, i am trying to write a bio perl script to query the ensembl database. i > am trying to retrieve data associated with a specific region on the mouse > genome chromosome 17: chr17:31,132,104-35,332,226. i want a list of ensembl > genes in that region, with as much info as possible, such as transcription > start point, ensembl gene ID,etc > > i also really need a list of conserved sequences (which should be found in > the compara database) for that mouse genomic region. 
> > any help would be great > > stuart gillies > > _________________________________________________________________ > Use MSN Messenger to send music and pics to your friends > http://www.msn.co.uk/messenger > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From skirov at utk.edu Wed Jul 28 07:13:59 2004 From: skirov at utk.edu (Stefan Kirov) Date: Wed Jul 28 07:15:29 2004 Subject: [Bioperl-l] ensembl database In-Reply-To: References: Message-ID: <41078A77.2050207@utk.edu> Why wouldn't you use ensembl API. It is great for exactly this type of stuff. Something like: $hsdbadapt= new Bio::EnsEMBL::DBSQL::DBAdaptor(-host => $host, -user => 'anonymous', -dbname =>'homo_sapiens_core_21_34d',-pass=>''); my $slice_adaptor=$hsdbadapt->get_SliceAdaptor; my $slice = $slice_adaptor->fetch_by_chr_start_end($chr,$begin,$end); my @genes=@{$slice->get_all_Genes}; foreach my $gene(@genes) { push @ids,$gene->stable_id; } ...etc. The current server is ensembldb.ensembl.org. See the ensembl API documentation for more help. Hope this helps Stefan Stuart Gillies wrote: > Hi, i am trying to write a bio perl script to query the ensembl > database. i am trying to retrieve data associated with a specific > region on the mouse genome chromosome 17: chr17:31,132,104-35,332,226. > i want a list of ensembl genes in that region, with as much info as > possible, such as transcription start point, ensembl gene ID,etc > > i also really need a list of conserved sequences (which should be > found in the compara database) for that mouse genomic region. 
-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov@utk.edu
sao@ornl.gov

From james.wasmuth at ed.ac.uk Wed Jul 28 12:34:34 2004
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Wed Jul 28 12:43:33 2004
Subject: [Bioperl-l] Help for using Clustalw.pm
In-Reply-To:
References:
Message-ID: <4107D59A.9080608@ed.ac.uk>

Oops. The next_aln method is for AlignIO, which allows one to read in an
alignment. It is used to create a SimpleAlign object, but you already have
one, so I think that

  $out->write_aln($aln);

instead of the while loop is the way forward.

-james

-- 
I like nonsense, it wakes up the brain cells. -- Dr. Seuss

Blaxter Nematode Genomics Group | School of Biological Sciences |
Ashworth Laboratories | tel: +44 131 650 7403
University of Edinburgh | web: www.nematodes.org
Edinburgh | EH9 3JT | UK |

From djkojeti at unity.ncsu.edu Wed Jul 28 13:03:42 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Wed Jul 28 13:05:13 2004
Subject: [Bioperl-l] pdb files/structure system
Message-ID: <1403A370-E0B8-11D8-AC07-000A9597278C@unity.ncsu.edu>

Hi All,

Can anyone point me towards a tutorial dealing with using Bioperl's
Structure system?
I'm able to read in a PDB file and print out a sequence, but I cannot figure
out how to extract atoms or coordinates from the one-line examples given here:

http://bioperl.org/Core/Latest/bptutorial.html#iii.9.1_using_3d_structure_objects_and_reading_pdb_files_(structurei,_structure::io)

Using that example, I've tried setting $res to a number of things (1, MET1,
MET-1, MET, etc.), but I think it's looking for something more sophisticated
(i.e. input from another system module)?

Is there a HOWTO under development (or in the near future) for the Structure
system?

Thanks,
Doug

From jsun at utdallas.edu Wed Jul 28 17:16:21 2004
From: jsun at utdallas.edu (Sun, Jian)
Date: Wed Jul 28 17:18:36 2004
Subject: [Bioperl-l] Retrive the sequences from Blast report
Message-ID:

Dear Bioperl:

Could anybody give me some advice on how to retrieve the sequences from a
Blast search report, so that they can be saved in a sequence file and then
used for a Clustalw alignment?

Any help will be appreciated.
Jian

From jsun at utdallas.edu Wed Jul 28 12:19:37 2004
From: jsun at utdallas.edu (Sun, Jian)
Date: Wed Jul 28 17:20:40 2004
Subject: [Bioperl-l] Help for using Clustalw.pm
Message-ID:

Dear James;

Thanks again. I added the script code you presented in your last message to
mine, as shown below, but I still have a problem:

******************************************************************
#!C:\Perl\bin\perl.exe
# To install in web, make a directory to hold your Perl modules in web space
use lib "C:\Perl\lib";
use FileIO;
use SeqFileIO;
use IO::String;
use CGI qw/:standard/;
use Bio::Perl;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::AlignIO;
use strict;
use warnings;
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $ktuple = 3;
$factory->ktuple($ktuple);  # change the parameter before executing
my $str = Bio::SeqIO->new(-file=> 'Clustseq.fa', '-format' => 'Fasta');
my @seq_array =();
while ( my $seq = $str->next_seq() ) {push (@seq_array, $seq) ;}
my $seq_array_ref = \@seq_array;
# where @seq_array is an array of Bio::Seq objects

my $aln = $factory->align($seq_array_ref);
#print "\nThe alignment result is : ", $aln;
#$aln is a SimpleAlign object
my $out = Bio::AlignIO->new(-file => ">aln.out",
                            -format => 'msf');
while ( my $single = $aln->next_aln() ) { $out->write_aln($single); }
*************************************************************************

The error message I get is:

////////////////////////////////////////////////////////////
"Clustalw run successfully and some result information displayed...."
..........
GCG-Alignment file created [c:\....\temp\7i0vb1kph6]
cannot locate object method "next_aln" via package "Bio::SimpleAlign" (perhaps
you forgot to load "Bio::SimpleAlign"?) at test728.pl line 40.
////////////////////////////////////////////////////////////

But I have obviously already loaded Bio::SimpleAlign with

  use Bio::SimpleAlign

as shown in the source code above. What's the problem here?

Thank you in advance.
Jane

________________________________

From: James Wasmuth [mailto:james.wasmuth@ed.ac.uk]
Sent: Tue 7/27/2004 11:41 AM
To: James Wasmuth
Cc: Sun, Jian; bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Help for using Clustalw.pm

I'm not sure whether $ENV{CLUSTALDIR} does not work because you're working
in Windows. Anyone have an idea?

As for keeping the output, there are two ways:

1. In the params, use 'outfile' => something.aln

I think it's 'outfile'. You'll need to check the clustalw documentation for
the command-line option that specifies the name of the outfile. Sorry I
can't remember.

or

2.

  $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
  $aln = $factory->align($inputfilename);

  #$aln is a SimpleAlign object which you can write to file.

  $out = Bio::AlignIO->new(-file => ">outputfilename",
                           -format => 'msf');

  while ( my $single = $aln->next_aln() ) { $out->write_aln($single); }

I think this should work. Let me know if it doesn't. It's been a while since
I've used these.

-james

From sdavis2 at mail.nih.gov Wed Jul 28 20:46:29 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jul 28 20:48:03 2004
Subject: [Bioperl-l] Retrive the sequences from Blast report
References:
Message-ID: <003c01c47505$7f2628b0$04653744@WATSON>

Jian,

If you have something like (from the Bio::Search::Hit::HitI documentation):

  my $searchio = Bio::SearchIO->new(-format => 'blast',
                                    -file   => 'result.bls');
  my $result = $searchio->next_result;
  my $hit    = $result->next_hit;

  my $hit_name = $hit->name();
  my $desc     = $hit->description();

Depending on where you are getting your blast results, you can probably grab
the name, description, locus, or accession and use Bio::DB::GenBank to look
up the sequence. See Bio::DB::GenBank for details.
Sean

From jurgen.pletinckx at vt4.net Wed Jul 28 18:15:49 2004
From: jurgen.pletinckx at vt4.net (Jurgen Pletinckx)
Date: Wed Jul 28 21:15:17 2004
Subject: [Bioperl-l] pdb files/structure system
In-Reply-To: <1403A370-E0B8-11D8-AC07-000A9597278C@unity.ncsu.edu>
Message-ID:

| Can anyone point me towards a tutorial dealing with using Bioperl's
| Structure system? I'm able to read in a PDB file and print out a
| sequence, but I cannot figure out how to extract atoms or coordinates
| from the one line examples given here:
|
| http://bioperl.org/Core/Latest/bptutorial.html#iii.9.1_using_3d_structure_objects_and_reading_pdb_files_(structurei,_structure::io)
|
| Using that example, I've tried setting $res to a number of things (1,
| MET1, MET-1, MET, etc.), but I think it's look for something more
| sophisticated (i.e. input from another system module)?

Yes. That example is pretty unenlightening. $res is a
Bio::Structure::Residue object in that line. Here's one way to get at these
objects:

  #!/usr/bin/perl -w

  use Bio::Structure::IO;
  use strict;

  my $structio = Bio::Structure::IO->new(-file => "/PDB/ca/pdb1cam.ent");
  my $struc = $structio->next_structure;

  for my $chain ($struc->get_chains) {
      # one-letter chaincode if present, 'default' otherwise
      my $chainid = $chain->id;
      for my $res ($struc->get_residues($chain)) {
          # format is 3-lettercode - dash - residue number, e.g. PHE-20
          my $resid = $res->id;
          # actually a list of atom objects, used here to get a count
          my $atoms = $struc->get_atoms($res);
          print join "\t", $chainid, $resid, $atoms, "\n";
      }
  }

That kind of loop over all objects is often sufficient for me. When I do
need direct access, I first construct an index:

  my %resindex;
  my %atindex;
  for my $chain ($struc->get_chains) {
      for my $res ($struc->get_residues($chain)) {
          $resindex{$chain->id}{$res->id} = $res;
          for my $atom ($struc->get_atoms($res)) {
              $atindex{$chain->id}{$res->id}{$atom->id} = $atom;
          }
      }
  }

  print join "\t", $atindex{'default'}{'PHE-20'}{'CA'}->xyz, "\n";

and then use that for lookups. Yet another tool I would like to include in
the Bio::Structure modules (as get_res_by_name?).

| Is there a HOWTO under development (or in the near future) for the
| Structure system?

There wasn't, actually. Perhaps there should be. I find myself rather
reticent to enshrine the current sad state of affairs by describing the
workarounds :/

-- 
Jurgen Pletinckx
AlgoNomics NV
jurgen.pletinckx@algonomics.com

From bmb9jrm at bmb.leeds.ac.uk Thu Jul 29 00:35:40 2004
From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning)
Date: Thu Jul 29 00:35:43 2004
Subject: [Bioperl-l] Bioperl Vista module and CGI
Message-ID: <1089127215.6751.28.camel@localhost.localdomain>

Hi to all,

Sorry if the Bioperl relevance of this is fairly limited, but I thought
someone here might have come across the same problem.

I'm using the bioperl module wrapper for Vista, and was trying to plug it
into a CGI script. The trouble is my Apache server can't 'see' the java
class library of Vista, presumably because it doesn't have access to the
user CLASSPATH, and I get an error like:

  java.lang.NoClassDefFoundError: Vista

I've tried:

  system("export CLASSPATH=$CLASSPATH:/usr/java/j2sdk1.4.2_04/lib/Vista.jar");

from within the script without success. The script works fine when run from
the command line.
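A few lines of plain Perl show why the system("export ...") call has no lasting effect. The following is a minimal sketch, assuming a Unix-like system with /bin/sh. DEMO_CLASSPATH is a made-up variable name used only for the demonstration. The export runs inside a child shell that exits immediately. Assigning to %ENV, by contrast, changes the environment that later child processes inherit, presumably including the java process the Vista wrapper starts. Note also that in a double-quoted Perl string, $CLASSPATH is interpolated by Perl as a Perl variable, not read from the environment. The environment value is $ENV{CLASSPATH}.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start a fresh perl as a child process and return the value it sees for
# the named environment variable ("unset" if it is not defined).
sub child_sees {
    my ($name) = @_;
    open my $fh, '-|', $^X, '-e',
        'print defined $ENV{$ARGV[0]} ? $ENV{$ARGV[0]} : "unset"', $name
        or die "Cannot start child perl: $!";
    my $value = do { local $/; <$fh> };
    close $fh;
    return $value;
}

delete $ENV{DEMO_CLASSPATH};

# 1. "export" inside system(): the variable lives only in that short-lived
#    child shell and is gone as soon as the shell exits.
system('/bin/sh', '-c', 'export DEMO_CLASSPATH=/tmp/from_export');
my $after_export = child_sees('DEMO_CLASSPATH');

# 2. Assigning to %ENV changes this process's environment, which every
#    child started afterwards inherits.
$ENV{DEMO_CLASSPATH} = '/tmp/from_ENV';
my $after_env = child_sees('DEMO_CLASSPATH');

print "after system(export): $after_export\n";   # unset
print "after \$ENV{...} = ...: $after_env\n";    # /tmp/from_ENV
```

In a CGI script, the same idea means setting $ENV{CLASSPATH} before the wrapper launches java. Joel Martin suggests this later in the thread.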
Thanks, appreciate any pointers,

Jon

From jan.aerts at wur.nl Thu Jul 29 02:19:15 2004
From: jan.aerts at wur.nl (Jan Aerts)
Date: Thu Jul 29 02:12:55 2004
Subject: [Bioperl-l] Retrive the sequences from Blast report
In-Reply-To: <003c01c47505$7f2628b0$04653744@WATSON>
References: <003c01c47505$7f2628b0$04653744@WATSON>
Message-ID: <1091081955.2590.4.camel@hyena.local>

Of course, there's the wonderful HOWTO of SearchIO at
http://bioperl.org/HOWTOs/html/SearchIO.html

jan.

From bmb9jrm at bmb.leeds.ac.uk Thu Jul 29 05:49:06 2004
From: bmb9jrm at bmb.leeds.ac.uk (Jonathan Manning)
Date: Thu Jul 29 05:50:33 2004
Subject: [Bioperl-l] Bioperl Vista module and CGI
In-Reply-To: <20040729045934.GA14777@eniac.jgi-psf.org>
References: <1089127215.6751.28.camel@localhost.localdomain> <20040729045934.GA14777@eniac.jgi-psf.org>
Message-ID: <1091094546.6049.2.camel@localhost.localdomain>

Sorry, I sent that ages ago; it seems to have rematerialised from somewhere!

I solved it by giving java a CLASSPATH argument via the Vista wrapper.

Jon

On Thu, 2004-07-29 at 05:59, Joel Martin wrote:
> Hello,
>     This is just a stab in the murk but usually if you want
> to set an environment variable in perl you set it in the %ENV
> hash; what you're doing looks like it would only apply to that
> system call (disappearing as it returns).
>   $ENV{CLASSPATH} = "$ENV{CLASSPATH}:/usr/java/j2sdk1.4.2_04/lib/Vista.jar";
> and I'm sometimes confused about whether or not to wrap that in
> a BEGIN statement (don't think it would hurt) like before you "use"
> anything:
>   BEGIN{$ENV{CLASSPATH} =
>   "$ENV{CLASSPATH}:/usr/java/j2sdk1.4.2_04/lib/Vista.jar";}
>
> Joel

From brian_osborne at cognia.com Thu Jul 29 08:11:18 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jul 29 08:12:52 2004
Subject: RE: [Bioperl-l] pdb files/structure system
In-Reply-To:
Message-ID:

Jurgen,

I'll put your example into the bptutorial and into a new examples/structure
directory, if you don't mind. If anyone else has useful example code please
post it; I'd be happy to add it to examples/.

Brian O.
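Jurgen's nested-index idea does not itself depend on Bio::Structure. The following is a minimal plain-Perl sketch, and it is not the Bio::Structure API. It reads ATOM/HETATM records directly using the fixed PDB column positions: atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and coordinates in 31-54. From these it builds the same chain / "RES-number" / atom keys, using 'default' for a blank chain ID as Bio::Structure does. index_atoms is a hypothetical helper name. The three demo records are generated with sprintf only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build $index->{chain}{"RES-number"}{atom} = [x, y, z] from PDB ATOM/HETATM
# records, reading the fixed PDB columns directly (not the Bio::Structure API).
sub index_atoms {
    my %atindex;
    for my $line (@_) {
        next unless $line =~ /^(?:ATOM  |HETATM)/;
        my $atom  = substr($line, 12, 4);    # columns 13-16: atom name
        my $res   = substr($line, 17, 3);    # columns 18-20: residue name
        my $chain = substr($line, 21, 1);    # column 22: chain identifier
        my $resno = substr($line, 22, 4);    # columns 23-26: residue number
        my @xyz   = map { substr($line, $_, 8) + 0 } 30, 38, 46;   # columns 31-54
        s/^\s+|\s+$//g for $atom, $res, $chain, $resno;             # trim padding
        $chain = 'default' if $chain eq '';  # blank chain ID, as in Bio::Structure
        $atindex{$chain}{"$res-$resno"}{$atom} = \@xyz;
    }
    return \%atindex;
}

# Demo records written with the standard ATOM column layout.
my $fmt = "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f";
my @records = (
    sprintf($fmt, 'ATOM', 1, ' N',  'PHE', 'A', 20, 11.104, 6.134, -6.504),
    sprintf($fmt, 'ATOM', 2, ' CA', 'PHE', 'A', 20, 11.639, 6.071, -5.147),
    sprintf($fmt, 'ATOM', 3, ' CA', 'GLY', '',  21, 12.900, 5.500, -4.800),
);

my $atindex = index_atoms(@records);
# CA coordinates of PHE-20 in chain A: 11.639, 6.071, -5.147 (tab-separated)
print join("\t", @{ $atindex->{A}{'PHE-20'}{CA} }), "\n";
```

With a real entry, the records would come from reading the .ent file line by line instead of from sprintf.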
From jurgen.pletinckx at vt4.net Thu Jul 29 08:26:29 2004
From: jurgen.pletinckx at vt4.net (Jurgen Pletinckx)
Date: Thu Jul 29 10:25:24 2004
Subject: RE: [Bioperl-l] pdb files/structure system
In-Reply-To:
Message-ID:

| I'll put your example into the bptutorial and into a new
| examples/structure directory, if you don't mind.

Not in the least. I had been wondering (ever since last night) how best to
go about updating the few lines in the tutorial, and I'm glad I know now :)

-- 
Jurgen Pletinckx
AlgoNomics NV
jurgen.pletinckx@algonomics.com

From amey_desh at rediffmail.com Fri Jul 30 04:02:06 2004
From: amey_desh at rediffmail.com (amey vinod deshpande)
Date: Fri Jul 30 10:01:55 2004
Subject: [Bioperl-l] Regarding Bio-Informatics Softwares.
Message-ID: <20040730080206.1945.qmail@webmail17.rediffmail.com>

Hello,

We are Information Technology Engineering students from India. We are
undertaking a bioinformatics project: a restriction mapper for Type II
restriction enzymes. Could you guide us on this?

Our mail IDs are
amey_desh@rediffmail.com
kirty_modi@sify.com
abhijeet_e@rediffmail.com

Thanking you.

From mebradley at chem.ufl.edu Thu Jul 29 11:06:52 2004
From: mebradley at chem.ufl.edu (Michael Bradley)
Date: Fri Jul 30 10:02:47 2004
Subject: [Bioperl-l] registry configuration and access
Message-ID: <000001c4757d$ae418db0$8601a8c0@bradleydell>

If anyone can help me figure out why I can't access my locally indexed files
with the registry system, I would greatly appreciate it.

I have indexed some swissprot entries using bioflat_index.pl. This appeared
to work fine (no errors). The config.dat file looks like this:

  index                 BerkeleyDB/1
  format                URN:LSID:open-bio.org:swiss/protein
  fileid_0              /home/bradley/PIPELINE/swissprot/uniprot_sprot.dat  575198688
  primary_namespace     ID
  secondary_namespaces  ACC VERSION

The seqdatabase.ini file looks like this:

  [swissprot]
  protocol=flat
  location=/home/bradley/biodb_index/registryspandtrembl
  dbname=registrySPandTrEMBL

My problem starts when attempting to use the registry:

  use Bio::DB::Registry;
  my $registry = new Bio::DB::Registry;
  my $sp = $registry->get_database('swissprot');

I am getting the following error:

  -------------------- WARNING ---------------------
  MSG: Couldn't call new_from_registry on [Bio::DB::Flat]

  ------------- EXCEPTION -------------
  MSG: you must specify an indexing scheme
  STACK Bio::DB::Flat::new /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Flat.pm:163
  STACK Bio::DB::Flat::new_from_registry /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Flat.pm:254
  STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Registry.pm:182
  STACK Bio::DB::Registry::_load_registry /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Registry.pm:181
  STACK Bio::DB::Registry::new /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Registry.pm:97
  STACK toplevel /home/bradley/PIPELINE/perl/scripts/Local_SP_EMBL_Retrieval.pl:232
  --------------------------------------

From poking around in Flat.pm, it appears that Flat.pm is unable to read the
index scheme from config.dat because it can't open config.dat. I know this
because I edited Flat.pm to add two print statements, and only the first one
is executed:

  # read the configuration file
  sub _read_config {
    my $self = shift;
    my $path = $self->_config_path;
    print "in sub _read_config\n";   ## this works
    return unless -e $path;
    print "reading config\n";        ## this doesn't execute
    open (F,$path) or $self->throw("open error on $path: $!");
    my %config;
    while () {

*****************************
Michael E. Bradley, Ph.D.
Postdoctoral Researcher
University of Florida
Department of Chemistry
Foundation for Applied Molecular Evolution
Ph. (352) 271-7005
Fax (353) 271-7076
mebradley@chem.ufl.edu
****************************

From miroslavac at health.nb.ca Thu Jul 29 14:24:14 2004
From: miroslavac at health.nb.ca (Miroslava Cuperlovic-Culf)
Date: Fri Jul 30 10:02:49 2004
Subject: [Bioperl-l] error in Bio::Graphics print $panel->png
Message-ID: <410940CE.6010805@health.nb.ca>

Dear All,

I am trying to run Bio::Graphics for the first time. I think that I have
downloaded all the software necessary, but when I try to run just a simple
test program:

  use Bio::Graphics;
  my $panel = Bio::Graphics::Panel->new(-length    => 1000,
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10,
                                       );
  print $panel->png;

I get the error:

  perl: relocation error: /usr/lib/perl5/site_perl/5.6.0/i386-linux/auto/GD/GD.so: undefined symbol: gdImagePngPtr

What could this be? Any help would be GREATLY appreciated, and sorry to bug
you all with most likely a trivial problem.

Mira

From jesper at krogh.cc Fri Jul 30 10:14:49 2004
From: jesper at krogh.cc (Jesper Krogh)
Date: Fri Jul 30 10:52:08 2004
Subject: [Bioperl-l] Re: error in Bio::Graphics print $panel->png
References: <410940CE.6010805@health.nb.ca>
Message-ID:

There is something wrong with your GD module installation in perl. Probably
too old a libgd or something.

Jesper
-- 
./Jesper Krogh, jesper@krogh.cc
Jabber ID: jesper@jabbernet.dk

From jason at cgt.duhs.duke.edu Fri Jul 30 10:57:14 2004
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Jul 30 10:58:39 2004
Subject: [Bioperl-l] error in Bio::Graphics print $panel->png
In-Reply-To: <410940CE.6010805@health.nb.ca>
References: <410940CE.6010805@health.nb.ca>
Message-ID:

Perhaps the version of GD you have installed is older and does not support
png? What does this print out?

  % perl -MGD -e 'print "$GD::VERSION\n"'

What if you change the line to print a GIF instead:

  print $panel->gd->gif;

-jason
-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From sdavis2 at mail.nih.gov Fri Jul 30 11:43:21 2004
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri Jul 30 11:43:31 2004
Subject: [Bioperl-l] gibbs sampler question
Message-ID: <2F5BD290-E23F-11D8-89EE-000A95D7BA10@mail.nih.gov>

I have a fairly random question: I am interested in aligning some text
(non-protein, non-nucleic acid). Is there a generic multiple aligner for
doing multiple alignments with an arbitrary alphabet?
Thanks, Sean From brian_osborne at cognia.com Fri Jul 30 12:05:55 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jul 30 12:07:32 2004 Subject: [Bioperl-l] registry configuration and access In-Reply-To: <000001c4757d$ae418db0$8601a8c0@bradleydell> Message-ID: Michael, Your seqdatabase.ini file is saying that there should be a directory called /home/bradley/biodb_index/registryspandtrembl/registrySPandTrEMBL. Does this directory exist? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Michael Bradley Sent: Thursday, July 29, 2004 11:07 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] registry configuration and access If anyone can help me figure out why I can't access my locally indexed files with the registry system I would greatly appreciate it. I have indexed some swissprot entries using bioflat_index.pl. This appeared to work fine (no errors). The config.dat file looks like this: index BerkeleyDB/1 format URN:LSID:open-bio.org:swiss/protein fileid_0 /home/bradley/PIPELINE/swissprot/uniprot_sprot.dat 575198688 primary_namespace ID secondary_namespaces ACC VERSION The seqdatabase.ini file looks like this: [swissprot] protocol=flat location=/home/bradley/biodb_index/registryspandtrembl dbname=registrySPandTrEMBL My problem starts when attempting to use the registry. 
use Bio::DB::Registry; my $registry = new Bio::DB::Registry; my $sp = $registry->get_database('swissprot'); I am getting the following error: -------------------- WARNING --------------------- MSG: Couldn't call new_from_registry on [Bio::DB::Flat] ------------- EXCEPTION ------------- MSG: you must specify an indexing scheme STACK Bio::DB::Flat::new /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Flat.pm:163 STACK Bio::DB::Flat::new_from_registry /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Flat.pm:254 STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Registry.pm:182 STACK Bio::DB::Registry::_load_registry /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Registry.pm:181 STACK Bio::DB::Registry::new /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Registry.pm:97 STACK toplevel /home/bradley/PIPELINE/perl/scripts/Local_SP_EMBL_Retrieval.pl:232 -------------------------------------- By poking around in Flat.pm it appears that Flat.pm is unable to read the index scheme from config.dat because it can't open config.dat. I know this because I edited Flat.pm with two print statements and only the first statement is executed: # read the configuration file sub _read_config { my $self = shift; my $path = $self->_config_path; print "in sub _read_config\n"; ## this works return unless -e $path; print "reading config\n"; ## this doesn't execute open (F,$path) or $self->throw("open error on $path: $!"); my %config; while (<F>) { ***************************** Michael E. Bradley, Ph.D. Postdoctoral Researcher University of Florida Department of Chemistry Foundation for Applied Molecular Evolution Ph. (352) 271-7005 Fax (353) 271-7076 mebradley@chem.ufl.edu **************************** From brian_osborne at cognia.com Fri Jul 30 12:19:31 2004 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Jul 30 12:21:01 2004 Subject: [Bioperl-l] Regarding Bio-Informatics Softwares.
In-Reply-To: <20040730080206.1945.qmail@webmail17.rediffmail.com> Message-ID: Amey, See: http://doc.bioperl.org/releases/bioperl-1.4/Bio/Restriction/Enzyme.html http://doc.bioperl.org/releases/bioperl-1.4/Bio/Restriction/EnzymeCollection.html http://bioperl.org/Core/Latest/bptutorial.html Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of amey vinod deshpande Sent: Friday, July 30, 2004 4:02 AM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Regarding Bio-Informatics Softwares. Hello We are Information Technology Engineering students from India. We are undertaking a project in bioinformatics, i.e. a restriction mapper for type II restriction enzymes. Could you guide us on this? Our mail IDs are amey_desh@rediffmail.com kirty_modi@sify.com abhijeet_e@rediffmail.com Thanking you. From hlapp at gmx.net Fri Jul 30 12:27:18 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Jul 30 12:28:30 2004 Subject: [Bioperl-l] Load_ontology.pl warnings and exceptions In-Reply-To: <10C94843061E094A98C02EB77CFC328722FE63@nrcmrdex1d.imsb.nrc.ca> Message-ID: <533D4E00-E245-11D8-BD4F-000A959EB4C4@gmx.net> It doesn't seem you need to be concerned. The warnings are most likely all triggered by entries that have been obsoleted or merged. It does appear, though, that your unique index on the term table does not include the is_obsolete column. Unless you made some strict decisions to keep obsolete terms out and you enforce them in each load or update by the respective choice of arguments, I'd recommend you change the unique key to be over the tuple of (name, is_obsolete, ontology_id). -hilmar On Wednesday, July 21, 2004, at 07:50 PM, Law, Annie wrote: > Hi, > > Previously, I used load_ontology.pl and got about 5 exceptions or > warnings. > Recently, while using the same bioperl and bioperl-db > Setups.
I tried to use the same version of load_ontology.pl again with > the > latest information from GO ontology. Now it seems that I get about 90 > warnings exceptions about 50 listed after ..terms and others listed > after > ...relationships during the run. I'm pretty sure that only the input > to the > scripts has changed. Is this a normal outcome? > > Are these warnings only a reflection of the source file and an > annotation of > work in progress or is there something that I > Am missing? I wanted to eliminate the interference of previously > existing > data so I went and created a new database, loaded the > Bioperl schema and used the load_ncbi_taxonomy.pl script then I used > the > load_ontology.pl script. > > Here is the output of the run. It seems to me that they are all > complaints > of duplicate entries. > All of the bioperl is 1.4 and was installed around this past March. > > I would appreciate some insight. > Thanks, > Annie. > > Script started on Wed 21 Jul 2004 01:27:45 PM EDT >> perl /root/bioperl-db/scripts/biosql/load_ontology.pl --dbuser=user1 --dbpass=pass1 --dbname mydatabase --safe --computetc > --noobsolete > --namespace "Gene Ontology" --format goflat --fmtargs > "-defsfile,/root/bioperl-db/data/GO.defs" /root/bioperl-db/data/function.ontology > /root/bioperl-db/data/process.ontology > /root/bioperl-db/data/component.ontology > Parsing input ... > Loading ontology Gene Ontology: > ...
terms > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0001529","elastin","","") FKs (1) > Duplicate entry 'elastin-1' for key 2 > --------------------------------------------------- > Could not store GO:0001529 (elastin): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005581","collagen","","") FKs (1) > Duplicate entry 'collagen-1' for key 2 > --------------------------------------------------- > Could not store GO:0005581 (collagen): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel 
/root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005676","condensin complex","","") FKs (1) > Duplicate entry 'condensin complex-1' for key 2 > --------------------------------------------------- > Could not store GO:0005676 (condensin complex): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005699","kinetochore","","") FKs (1) > Duplicate entry 'kinetochore-1' for key 2 > --------------------------------------------------- > Could not store GO:0005699 (kinetochore): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > 
PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005716","synaptonemal complex","","") FKs (1) > Duplicate entry 'synaptonemal complex-1' for key 2 > --------------------------------------------------- > Could not store GO:0005716 (synaptonemal complex): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005717","chromatin","","") FKs (1) > Duplicate entry 'chromatin-1' for key 2 > --------------------------------------------------- > Could not store GO:0005717 (chromatin): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > 
STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005718","nucleosome","","") FKs (1) > Duplicate entry 'nucleosome-1' for key 2 > --------------------------------------------------- > Could not store GO:0005718 (nucleosome): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0005733","small nucleolar RNA","","") FKs (1) > Duplicate entry 'small nucleolar RNA-1' for key 2 > --------------------------------------------------- > Could not store GO:0005733 (small nucleolar RNA): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK 
Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were > ("GO:0006326","bent DNA binding","","") FKs (1) > Duplicate entry 'bent DNA binding-1' for key 2 > --------------------------------------------------- > Could not store GO:0006326 (bent DNA binding): > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be > found > by unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /root/bioperl-db/scripts/biosql/load_ontology.pl:508 > STACK toplevel /root/bioperl-db/scripts/biosql/load_ontology.pl:490 > > -------------------------------------- > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From mjanis at chem.ucla.edu Fri Jul 30 16:48:02 2004 From: mjanis at chem.ucla.edu (Michael Janis) Date: Fri Jul 30 19:55:07 2004 Subject: [Bioperl-l] RNA fold Message-ID: <20040730204802.GH21869@GFC1.chem.ucla.edu> Hello, First time posting... I'd like to re-open a discussion thread that was started Fri Dec 5, 2003 by Vesselin Baev concerning the existence / need for theoretical RNA fold output parsers (such as RNAfold, mfold, and the like). Chris Fields posted intentions to work on such a project, and the thread morphed into considerations of ct output (which I currently use in my own database for structural information) vs. bracket notation output vs. RNAML for storage and interpretation of structural data. The licensing for mfold is restrictive in that it cannot be re-distributed freely. However, the extensive mfold sub-optimal fold lists are an important consideration when probing hypothetical folds (especially since it's really guesswork to assign parameters such as temperature and ion content). mfold gives .ct output like other programs, which can be easily converted on the fly to any bracket notation you like (I personally store covariance information in my extended bracket notation using lots of canonical and non-canonical specific characters). However, bracket notation, as pointed out, is great for inline GFF db tables (such as the '$feat->add_tag_value('secondary_structure',$str);' suggestion from Jason Stajich) but really does not carry forward all covariance information. .ct output is just the opposite in terms of GFF db format - not exactly inline, but a wealth of structural and primary sequence information is retained in this format. So the question is, what work has been done in this area? My knowledge expertise breaks down when I try to incorporate my .ct db tables with my GFF - built dbase.
In other words, I lose the ability to utilize bioperl tools to query and analyze this data since I have deviated so far from the standard Bio::DB::GFF dbase format. I'd like to work to create a parser for .ct output that fits well into a bp scheme (like Seq::Meta etc.), and while RNAML seems overly complicated for my needs, it would be nice to have a common data definition that supersedes all others in information content, thus allowing a SeqIO-like converter to load / dump data from such a master data definition (with warnings where appropriate). Before I begin, however, I would like to know if any further work has been done in this area. Likewise feedback from others much better at bioperl than myself: suggestions for storing such lengthy .ct definitions within the GFF framework, where each potential fold may have suboptimal folds grouped together, each with their own .ct data. Apologies for the train-of-thought style of email. Yours, Michael Janis -- Michael Janis, UCLA Biochemistry Graduate Student Every message PGP signed. "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair." -Douglas Adams From hlapp at gmx.net Sat Jul 31 10:47:15 2004 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Jul 31 10:48:24 2004 Subject: [Bioperl-l] SwissProt/UniProt GN line format changed In-Reply-To: <1090426332.4365.337.camel@shire.cgb.ki.se> Message-ID: <839F27C9-E300-11D8-8420-000A959EB4C4@gmx.net> I added the capability to parse this and a test to the main trunk. The parser will write out the GN line(s) in the old format though. Does anybody have the requirement at this point that the parser write the new format?
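In the new-style GN line (for example "GN   Name=RCHY1; Synonyms=ZNF363, CHIMP, ARNIP;") every value is introduced by a tag. As a rough illustration only, here is a minimal plain-Perl sketch, not the Bio::SeqIO::swiss code, of reading such lines while keeping each value's tag. The sub name parse_gn_lines is hypothetical, and the handling of fields that wrap onto the next GN line and of a "GN   and" line separating several genes reflects my understanding of the UniProt convention, not anything stated in this thread.

```perl
use strict;
use warnings;

# Parse new-style GN lines into one hash per gene, mapping each tag
# (Name, Synonyms, ...) to its list of values so every value keeps its
# context. Assumption: a "GN   and" line starts a new gene, and a field
# may wrap onto the following GN line. Old-style lines such as
# "GN   ZNF36 OR KOX18 OR ZNF139." contain no "Tag=" fields and yield nothing.
sub parse_gn_lines {
    my @chunks = ('');
    for my $line (@_) {
        (my $body = $line) =~ s/^GN\s+//;
        if ($body =~ /^and\s*$/) {             # separator between genes
            push @chunks, '';
            next;
        }
        $chunks[-1] .= " $body";               # rejoin wrapped fields
    }
    my @genes;
    for my $chunk (@chunks) {
        my %gene;
        for my $field (split /;/, $chunk) {
            next unless $field =~ /^\s*(\w+)=(.*\S)/;
            my ($tag, $values) = ($1, $2);     # copy captures before split
            push @{ $gene{$tag} }, split /\s*,\s*/, $values;
        }
        push @genes, \%gene if %gene;
    }
    return \@genes;
}

my $genes = parse_gn_lines('GN   Name=RCHY1; Synonyms=ZNF363, CHIMP, ARNIP;');
for my $gene (@$genes) {
    print "$_: ", join(', ', @{ $gene->{$_} }), "\n" for sort keys %$gene;
}
# prints:
# Name: RCHY1
# Synonyms: ZNF363, CHIMP, ARNIP
```

Keeping the tag next to each value is the extra information an annotation object would have to carry in order to write the new format back out.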
The problem with writing the new format is that that would require some additions to the object model, because it would require context for the individual annotation values ('Name', 'Synonym', 'OrderedLocusName', or ORFName). Presently, annotation values do not have context in Bioperl. -hilmar On Wednesday, July 21, 2004, at 05:12 PM, Boris Lenhard wrote: > I do not know if it has been discussed yet, but the GN (gene name) line > format in recent versions of SwissProt files has changed: > > e.g. old: > > GN ZNF36 OR KOX18 OR ZNF139. > > new: > > GN Name=RCHY1; Synonyms=ZNF363, CHIMP, ARNIP; > > > This renders Bio::SeqIO::swiss unable to parse the GN line; as a > consequence, the resulting annotation object lacks the 'gene_name' key. > > Boris > > -- > > ========================================== > Boris Lenhard, Ph.D. > Group Leader, Applied Genome Informatics > Center for Genomics and Bioinformatics > Karolinska Institutet > Berzelius väg 35, B326b > 171 77 Stockholm, SWEDEN > Phone: +46 (0)8 5248 6391 > FAX: +46 (0)8 32 48 26 > E-mail: Boris.Lenhard@cgb.ki.se > ========================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cjfields at uiuc.edu Sat Jul 31 11:42:15 2004 From: cjfields at uiuc.edu (Chris Fields) Date: Sat Jul 31 11:44:45 2004 Subject: [Bioperl-l] RNA fold In-Reply-To: <20040730204802.GH21869@GFC1.chem.ucla.edu> References: <20040730204802.GH21869@GFC1.chem.ucla.edu> Message-ID: <324F907B-E308-11D8-8A39-000A9568B714@uiuc.edu> On Jul 30, 2004, at 3:48 PM, Michael Janis wrote: > Hello, > > First time posting...
> > I'd like to re-open a discussion thread that was started Fri Dec 5, > 2003 by Vesselin Baev concerning the existence / need for theoretical > RNA fold output parsers (such as RNAfold, mfold, and the like). Chris > Fields posted intentions to work on such a project, and the thread > morphed into considerations of ct output (which I currently use in my > own database for structural information) vs. bracket notation output > vs. RNAML for storage and interpretation of structural data. > > The licensing for mfold is restrictive in that it cannot be > re-distributed freely. Somewhat true. Basically, you need to agree to a license when using the software (like most software, even freeware). The main difference is that the license needs to be signed by the end-user (usually the PI or the institution). One could always use the web interface for most analyses, but for the (relatively few) who want to modify some of the parameters, the licensed program is available. You can actually download it from the web now (the link is found here: http://www.bioinfo.rpi.edu/~zukerm/rna/mfold-3.1.html). The other alternative is using the Vienna Package, which comes with a perl interface. > However, the extensive mfold sub-optimal fold lists are an important > consideration when probing hypothetical folds (especially since it's > really guesswork to assign parameters such as temperature and ion > content). I disagree. The mfold parameters are based on real-world experimentation to determine conditions for folding based on different temperatures and ionic conditions. Biochemically and biologically speaking, the temperature and ionic range for a particular fold can be extrapolated from other studies (such as optimum growth temp, in vivo ionic conditions, etc.) to determine approximate folds (the key word being approximate, as mfold doesn't predict pseudoknots or tertiary interactions). For instance, E. coli grows best at 37 deg.
C, and the detailed biochemical makeup of the cell has been determined (including ionic concentrations in vivo). If you were doing something like RNA interference, then learning these conditions is very important. In essence, there's no "guesswork" involved, just a bit of research. > mfold gives .ct output like other programs, which can be easily > converted on the fly to any bracket notation you like (I personally > store covariance information in my extended bracket notation using > lots of canonical and non-canonical specific characters). However, > bracket notation, as pointed out, is great for inline GFF db tables > (such as the '$feat->add_tag_value('secondary_structure',$str);' > suggestion from Jason Stajich) but really does not carry forward all > covariance information. .ct output is just the opposite in terms of > GFF db format - not exactly inline, but a wealth of structural and > primary sequence information is retained in this format. The problem with RNA structure notation right now is that many different formats are in use. It would be great to have a standard notation for all of these, which is what RNAML is about. > So the question is, what work has been done in this area? My > knowledge expertise breaks down when I try to incorporate my .ct db > tables with my GFF - built dbase. In other words, I lose the ability > to utilize bioperl tools to query and analyze this data since I have > deviated so far from the standard Bio::DB::GFF dbase format. I think Jason (or somebody else?) had mentioned that one could store a tag for a file location containing the information. The file could then be opened and parsed.
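As a rough illustration of opening and parsing such a file on demand, here is a minimal plain-Perl sketch (not part of BioPerl or bioperl-run; the sub names parse_ct and ct_to_dot_bracket are hypothetical) that reads one structure from .ct text and converts it to dot-bracket notation. It assumes the common six-column layout (index, base, previous, next, pair partner, natural numbering, with partner 0 meaning unpaired) and reads only the first fold when a file holds several suboptimal ones. The toy hairpin input is made up for illustration and is not real mfold output.

```perl
use strict;
use warnings;

# Parse one structure from connect-table (.ct) text. Assumes the usual
# six columns per base: index, base, previous, next, partner, numbering,
# where partner is 0 for an unpaired base. Only the first structure is
# read; files with several suboptimal folds repeat a header per fold.
sub parse_ct {
    my ($text) = @_;
    my @lines  = split /\n/, $text;
    my $header = shift @lines;                  # e.g. "8  dG = -1.0  name"
    my ($len)  = $header =~ /^\s*(\d+)/
        or die "No sequence length in header: $header\n";
    my (@bases, @partner);
    for my $line (@lines) {
        last if @bases == $len;                 # stop before the next fold
        my @f = split ' ', $line;
        next unless @f >= 6;                    # skip blank or short lines
        my ($i, $base, $j) = @f[0, 1, 4];
        $bases[$i - 1]   = $base;
        $partner[$i - 1] = $j;                  # 1-based partner, 0 = unpaired
    }
    die "Expected $len bases, got " . scalar(@bases) . "\n"
        unless @bases == $len;
    return (join('', @bases), \@partner);
}

# Convert 1-based partner indices to dot-bracket notation.
sub ct_to_dot_bracket {
    my ($partner) = @_;
    my $db = '';
    for my $k (0 .. $#$partner) {
        my $j = $partner->[$k];
        $db .= $j == 0     ? '.'                # unpaired
             : $j - 1 > $k ? '('                # partner lies downstream
             :               ')';               # partner lies upstream
    }
    return $db;
}

# Toy hairpin: G1-C8 and G2-C7 paired, A3..A6 in the loop.
my $ct = <<'CT';
8  dG = -1.0  toy hairpin
1  G  0  2  8  1
2  G  1  3  7  2
3  A  2  4  0  3
4  A  3  5  0  4
5  A  4  6  0  5
6  A  5  7  0  6
7  C  6  8  2  7
8  C  7  0  1  8
CT

my ($seq, $partner) = parse_ct($ct);
print "$seq\n", ct_to_dot_bracket($partner), "\n";   # GGAAAACC, then ((....))
```

A single bracket type cannot express crossing pairs, so this conversion would lose pseudoknots; that is one concrete way in which a bracket string carries less than the full .ct data.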
> I'd like to work to create a parser for .ct output that fits well into > a bp scheme (like Seq::Meta etc.), and while RNAML seems overly > complicated for my needs, it would be nice to have a common data > definition that supersedes all others in information content, thus > allowing a SeqIO-like converter to load / dump data from such a master > data definition (with warnings where appropriate). Before I begin, > however, I would like to know if any further work has been done in > this area. I haven't worked on it in a while b/c of benchwork taking first priority. However, I plan on returning to it at some point, starting with an RNAMotif parser. You might want to check bioperl-run. I believe there are some modules for the Vienna programs (RNAfold, etc.) and the Pise mfold interface by Catherine Letondal. As mentioned above, the Vienna package also has a perl interface (though not affiliated with Bioperl). > Likewise feedback from others much better at bioperl than myself: > suggestions for storing such lengthy .ct definitions within the GFF > framework, where each potential fold may have suboptimal folds grouped together, each with their own .ct data. I would store a tag in the GFF framework pointing to the file that contains the structural information, to avoid storing this very complex data directly. I can't see GFF storing very complex information in the current form w/o making the format much more (unnecessarily) complicated. > Apologies for the train-of-thought style of email. > > Yours, > > Michael Janis > -- > > > Michael Janis, UCLA Biochemistry Graduate Student > Every message PGP signed. > > "The major difference between a thing that might go wrong and a thing > that cannot possibly go wrong is that when a thing that cannot > possibly go wrong goes wrong, it usually turns out to be impossible to > get at or repair."
> -Douglas Adams > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > Chris Fields Postdoctoral Researcher - Dept. of Biochemistry Laboratory of Dr. Robert Switzer University of Illinois at Urbana-Champaign From birney at ebi.ac.uk Sat Jul 31 14:05:41 2004 From: birney at ebi.ac.uk (Ewan Birney) Date: Sat Jul 31 14:07:14 2004 Subject: [Bioperl-l] SwissProt/UniProt GN line format changed In-Reply-To: <839F27C9-E300-11D8-8420-000A959EB4C4@gmx.net> Message-ID: On Sat, 31 Jul 2004, Hilmar Lapp wrote: > I added the capability to parse this and a test to the main trunk. > > The parser will write out the GN line(s) in the old format though. Does > anybody have the requirement at this point that the parser write the > new format? > > The problem with writing the new format is that that would require some > additions to the object model, because it would require context for the > individual annotation values ('Name', 'Synonym', 'OrderedLocusName', or > ORFName). Presently, annotation values do not have context in Bioperl. > Or rather I think we would have to make a new Annotation type, being something like 'MultiTaggedValue', which would have tags 'Name', 'Synonym', etc., and then the 'gene_name' annotation key would give a list of 'MultiTaggedValues' --- presumably with some magic to detect "old style" simplevalue tags as well. I might need this in the ensembl-bioperl bridge, so I'll keep this in mind. > -hilmar > > On Wednesday, July 21, 2004, at 05:12 PM, Boris Lenhard wrote: > > > I do not know if it has been discussed yet, but the GN (gene name) line > > format in recent versions of SwissProt files has changed: > > > > e.g. old: > > > > GN ZNF36 OR KOX18 OR ZNF139.
> > > > new: > > > > GN Name=RCHY1; Synonyms=ZNF363, CHIMP, ARNIP; > > > > > > This renders Bio::SeqIO::swiss unable to parse the GN line; as a > > consequence, the resulting annotation object lacks the 'gene_name' key. > > > > Boris > > > > -- > > > > ========================================== > > Boris Lenhard, Ph.D. > > Group Leader, Applied Genome Informatics > > Center for Genomics and Bioinformatics > > Karolinska Institutet > > Berzelius väg 35, B326b > > 171 77 Stockholm, SWEDEN > > Phone: +46 (0)8 5248 6391 > > FAX: +46 (0)8 32 48 26 > > E-mail: Boris.Lenhard@cgb.ki.se > > ========================================== > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From rmaps005 at cib.csic.es Fri Jul 30 13:06:35 2004 From: rmaps005 at cib.csic.es (rmaps005@cib.csic.es) Date: Sun Aug 1 18:10:38 2004 Subject: [Bioperl-l] xml converter Message-ID: <200407301706.i6UH6xgN017334@calisto.cib.csic.es> Hello, I am a newbie at bioperl and was trying to convert a hmmer result to XML format. I found that there is a package to convert GenBank to XML (gb2xml); is there one for hmmer results? IlohaCIB From zongli at email.unc.edu Fri Jul 30 13:27:08 2004 From: zongli at email.unc.edu (Zong-Li Xu) Date: Sun Aug 1 18:10:43 2004 Subject: [Bioperl-l] add in mail list Message-ID: <410A84EC.4090006@email.unc.edu>