From gulban at sickkids.ca Mon Dec 1 14:12:38 2003
From: gulban at sickkids.ca (omid gulban)
Date: Mon Dec 1 18:18:15 2003
Subject: [Bioperl-l] running remoteblast
Message-ID: <000801c3b83f$15be4f70$6fc1148e@omid>

Hi,

I would like to know how to run a remoteblast query using bioperl modules. I have looked in the FAQs, but the instructions provided were for StandaloneBlast. I followed the instructions but received the following message:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I assume that you need to install blast on your local machine.

I am new to Bioperl and don't know which modules to use. How can I run a simple blast query using NCBI?

Thanks

ACGT
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031201/a8348683/attachment.htm

From jason at cgt.duhs.duke.edu Mon Dec 1 20:14:12 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Dec 1 20:20:15 2003
Subject: [Bioperl-l] running remoteblast
In-Reply-To: <000801c3b83f$15be4f70$6fc1148e@omid>
References: <000801c3b83f$15be4f70$6fc1148e@omid>
Message-ID:

Bio::Tools::Run::RemoteBlast is for running BLAST on NCBI through their web interface.

-jason

On Mon, 1 Dec 2003, omid gulban wrote:

> Hi,
>
> I would like to know how to run a remoteblast query using bioperl modules. I have looked in the FAQs, but the instructions provided were for StandaloneBlast. I followed the instructions but received the following message:
>
>
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
>
> I assume that you need to install blast on your local machine.
>
> I am new to Bioperl and don't know which modules to use.
> How can I run a simple blast query using NCBI?
>
>
> Thanks
>
> ACGT

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From Wiepert.Mathieu at mayo.edu Tue Dec 2 08:11:15 2003
From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu)
Date: Tue Dec 2 08:39:12 2003
Subject: [Bioperl-l] running remoteblast
Message-ID: <2F41CC6C9777D311ACBD009027B108EA06E9AC85@excsrv32.mayo.edu>

Hi,

As Jason pointed out, you need to use the RemoteBlast module. I think you may have missed the documentation (I can't verify how up to date it is, but it looks current to me):

http://www.bioperl.org/Core/Latest/bptutorial.html#iii.4.1_running_blast_(using_remoteblast.pm)

There is also documentation in the header for the object, which can be viewed in the cvs repository at

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?rev=1.17&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup

(or just look at the object on your local machine with your favorite editor or text browser).

HTH,

-mat

-----Original Message-----
From: omid gulban [mailto:gulban@sickkids.ca]
Sent: Monday, December 01, 2003 1:13 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] running remoteblast

Hi,

I would like to know how to run a remoteblast query using bioperl modules. I have looked in the FAQs, but the instructions provided were for StandaloneBlast. I followed the instructions but received the following message:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I assume that you need to install blast on your local machine.

I am new to Bioperl and don't know which modules to use. How can I run a simple blast query using NCBI?

Thanks

ACGT
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031202/59f4db24/attachment-0001.htm

From heikki at nildram.co.uk Tue Dec 2 15:25:37 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Tue Dec 2 15:31:56 2003
Subject: [Bioperl-l] guessing sequence format
Message-ID: <200312022025.37843.heikki@nildram.co.uk>

Andreas Kähäri has written a module that gives SeqIO and AlignIO the ability to look into input files and guess the format of the sequence: Bio::Tools::GuessSeqFormat. See the POD docs in the module for formats and details.

I have made initial modifications to Bio::SeqIO::new() and Bio::AlignIO::new() so that they try to determine the format in this order:

1. given in argument (-format)
2. based on the file name extension
3. looking into the file by calling Bio::Tools::GuessSeqFormat

No verification of the format is done if conditions 1 or 2 are met. I think it would be neat to have an option to do that. It could, for example, be linked to verbosity. Suggestions or implementations are welcome.
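
The three-step order above can be sketched in plain Perl. This is only an illustration of the precedence, not the code inside Bio::SeqIO or Bio::AlignIO: the extension table, the helper names resolve_format() and sniff_format(), and the -first_line argument are all assumptions made up for the example, and the content check is a toy stand-in for what Bio::Tools::GuessSeqFormat does.

```perl
use strict;
use warnings;

# Hypothetical extension table; the real mapping lives inside BioPerl.
my %EXT_TO_FORMAT = (
    fa   => 'fasta',   fasta => 'fasta',
    gb   => 'genbank', gbk   => 'genbank',
    embl => 'embl',
);

# Toy stand-in for a content-based guess (step 3): look at the first line only.
sub sniff_format {
    my ($first_line) = @_;
    return 'fasta'   if $first_line =~ /^>/;
    return 'genbank' if $first_line =~ /^LOCUS\s/;
    return 'embl'    if $first_line =~ /^ID\s/;
    return 'unknown';
}

# Resolve a format in the order described above:
# explicit argument, then file name extension, then file content.
sub resolve_format {
    my (%arg) = @_;

    # Step 1: an explicit -format always wins (normalised to lower case).
    return lc $arg{'-format'} if defined $arg{'-format'};

    # Step 2: fall back to the file name extension, if it is a known one.
    if ( defined $arg{'-file'} && $arg{'-file'} =~ /\.([^.\/]+)$/ ) {
        my $by_ext = $EXT_TO_FORMAT{ lc $1 };
        return $by_ext if defined $by_ext;
    }

    # Step 3: look into the data itself.
    my $line = defined $arg{'-first_line'} ? $arg{'-first_line'} : '';
    return sniff_format($line);
}

print resolve_format( '-format' => 'GenBank', '-file' => 'x.fa' ), "\n";
print resolve_format( '-file' => 'reads.fa' ), "\n";
print resolve_format( '-file' => 'mystery.dat',
                      '-first_line' => 'LOCUS       DDU63596' ), "\n";
```

Note that in this sketch, as in the description, steps 1 and 2 return without ever looking at the data, so a wrong -format or a misleading extension is not detected.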
Tests have been written for reading all formats from files, and even reading from a file handle works, which is really cool:

----------------- snip --------------------
use IO::String;
use Bio::SeqIO;

my $string = ">test1 no comment
agtgctagctagctagctagct
>test2 no comment
gtagttatgc
";
my $stringfh = new IO::String($string);
my $seqio = new Bio::SeqIO(-fh => $stringfh);
while( my $seq = $seqio->next_seq ) {
    print $seq->id, "\n";
}
----------------- snip --------------------

It would be really good if people could try this out now so that the possible bugs could be ironed out before the 1.4.

-Heikki

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki at nildram.co.uk Tue Dec 2 15:37:23 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Tue Dec 2 15:44:18 2003
Subject: [Bioperl-l] Re: What happened to my dpAlign module?
In-Reply-To: <49A57E7C-24C4-11D8-A49D-000A958C5008@virginia.edu>
References: <49A57E7C-24C4-11D8-A49D-000A958C5008@virginia.edu>
Message-ID: <200312022037.03484.heikki@ebi.ac.uk>

Aaron,

I have been concentrating on getting the bioperl-core in shape and have not done anything to bioperl-ext and bioperl-run. I was hoping to get them out some time after core.

Is the ext in shape? There have not been that many changes in there, have there? What is your feeling? Could we release a developer snapshot together with the bioperl-core at the end of the week and expect to be able to release it in a week or two maximum?
-Heikki On Tuesday 02 Dec 2003 12:37 pm, Aaron J.Mackey wrote: > Hi Yee, > > It is still in bioperl-live CVS, and is available in developer's > releases (the 1.3.x series); it will presumably be a part of the stable > 1.4.x release series, whenever that happens (Heikki knows more about > this than I do). > > -Aaron > > On Dec 1, 2003, at 2:34 PM, Yee Man Chan wrote: > > Hi Aaron > > > > I looked at the latest bioperl release but I couldn't find > > anything about my dpAlign module. What is its status now? You can find > > my > > code at > > > > http://www.stanford.edu/~yeeman/bioperl-ext.tar.gz > > http://www.stanford.edu/~yeeman/dpAlign.pm > > > > Thanks > > Yee Man -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From fsanchez at cifn.unam.mx Tue Dec 2 16:00:55 2003 From: fsanchez at cifn.unam.mx (=?ISO-8859-1?Q?Fabiola_S=E1nchez?=) Date: Tue Dec 2 15:56:19 2003 Subject: [Bioperl-l] parser genbank Message-ID: <3FCCFD87.9020200@cifn.unam.mx> Hi ! How can I get Keywords and Comments from a file in Genbank format Thanks Fabi From lstein at cshl.edu Tue Dec 2 16:13:30 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Dec 2 16:19:48 2003 Subject: [Bioperl-l] Re: [DAS] ProServer, a pluggable DAS server, Bio::SeqIO support added In-Reply-To: <20031128164645.GA27626@ebi.ac.uk> References: <20031128164645.GA27626@ebi.ac.uk> Message-ID: <200312021613.30476.lstein@cshl.edu> Neat! Can you turn this into a one paragraph news announcement? I will post it to the biodas web site. 
Lincoln On Friday 28 November 2003 11:46 am, Andreas Kahari wrote: > Hi lists (sorry for the cross-posting), > > This for those of you who are interested in DAS but not aware of > ProServer: > > ProServer is a DAS server implementation written in Perl by > Roger Pettett at the Sanger Institute, here outside Cambridge in > the UK. It builds on top of ideas from Tony Cox, also at the > Sanger Institute. > > The point with ProServer is that it is pluggable, so that any > data source may be used as a source to serve DAS features from, > as long as there is source adaptor and a transport module for > it. There are source adaptors already written for a number of > types of sources, and they are fairly easy to extend to other > types of sources or transports (I recently wrote a toy "wgetz" > transport module from the already existing "getz" module which > is used by the Swissprot source adaptor). Other DAS servers > often requires you to create a dedicated database of DAS data. > > I thought it might be of interest to a couple of you to note > that you now also can serve features or sequence data from any > type of file that Bio::SeqIO can read. This, of course, is > only of interest to people with smallish amount of data since > queries are looked up sequentially in the files (unless the > Bio::DB::Flat support in the code is used, which reduces the > lookup time but which doesn't support all formats). > > ProServer is part of the Bio-Das2 module in the biodas CVS > repository: > > > http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/Bio-Das2/?cvsroot=biodas > > > > Cheers, > Andreas -- ======================================================================== Lincoln D. 
Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================

From brian_osborne at cognia.com Tue Dec 2 16:45:27 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Dec 2 16:55:05 2003
Subject: [Bioperl-l] parser genbank
In-Reply-To: <3FCCFD87.9020200@cifn.unam.mx>
Message-ID:

Fabiola,

Something like this, I think:

    use Bio::SeqIO;

    my $io = Bio::SeqIO->new(-file => $file, -format => "genbank" );
    my $seq_obj = $io->next_seq;
    my $anno_collection = $seq_obj->annotation;
    my @annotations = $anno_collection->get_Annotations('comment'); # or 'keyword'
    foreach my $value ( @annotations ) {
        print "tagname : ", $value->tagname, "\n";
        print "annotation value: ", $value->as_text, "\n";
    }

I didn't actually run this though...

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Fabiola Sánchez
Sent: Tuesday, December 02, 2003 4:01 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] parser genbank

Hi !

How can I get Keywords and Comments from a file in Genbank format

Thanks

Fabi

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From lstein at cshl.edu Tue Dec 2 16:53:57 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Dec 2 17:00:25 2003
Subject: [Bioperl-l] GFF file output missing semicolon
In-Reply-To: <3FC15033.7030500@csiro.au>
References: <3FBD9254.8080008@csiro.au> <200311211241.25744.lstein@cshl.edu> <3FC15033.7030500@csiro.au>
Message-ID: <200312021653.57400.lstein@cshl.edu>

OK, this is a bug with the Bio::DB::GFF parsing code. I will fix it.

Lincoln

On Sunday 23 November 2003 07:26 pm, Wes Barris wrote:
> Lincoln Stein wrote:
> > Hi,
> >
> > The GFF2 spec specifies that the semicolon separates tag/value pairs. It
> > does not say that the last tag/value should be terminated by a semicolon.
> > It also specifies that any amount of whitespace can occur around the > > semicolon. > > Ok, fair enough. But then, gbrowse appears to not be able to handle this > format properly. I know that I must be wrong about this but this is what > I am seeing. > > Here is a gff line as created by Bio::Tools::GFF: > > AF354168 blast s-m-100-10 61437 61530 186 - . > Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021" ; Accession > "sheep_#25_61538..61445" > > Note that there is a lot of wrapping going on when displayed in this > message. > > If I load this file (using fast_load_gff.pl) into a mysql database and view > with gbrowse, there are two problems: > > 1) The accession is displayed above the item inside double quotes like > this: "sheep_#25_61538..61445". > > 2) When mousing over the item, neither the accession nor the start and end > are displayed. Instead all I see is the track key: > QRNA Sheep-Mouse 100-10: > > If I manually add a semi-colon after the accession at the end of each line > of the gff file and load that into the mysql database, gbrowse proplerly > displays these two items like this: > > sheep_#25_61538..61445 (note no double quote marks any more) > > QRNA Sheep-Mouse 100-10: sheep_#25_61538..61445 AF354168: 61437..61530 > > > Lincoln > > > > On Thursday 20 November 2003 11:19 pm, Wes Barris wrote: > >>Hi, > >> > >>I have written a bioperl program that parses blast files and generates > >>a gff file. I have everything working except there is one small detail > >>that I have not been able to figure out. When generating each line > >>of gff output, the semicolon is left off at the end of the Accession > >>name. Here is a sample line from a gff file that I generated: > >> > >>AF354168 mirseeker pred_miRNA 188152 188251 198 - > >> . Note "mirseeker score 17.58" ; Accession > >>"s-h_19_r_99330000-99363000" > >> > >>Notice that: > >> > >>1) There are three space characters after the note and the semicolon > >> that occurs before "Accession". 
> >> > >>2) At the end of the line, after the Accession, there are three space > >> characters and no semicolon. Without that semicolon, the genome > >> browser doesn't display the "rollover" information properly. > >> > >>3) The "Note" field is written before the "Accession" field. I thought > >> that the Accession should come first. > >> > >>Here is the relevant portion of my code: > >> > >> while( my $hsp = $hit->next_hsp ) { > >> my $strand = 1; > >> $strand = -1 if ($hsp->strand('query') == -1 || > >>$hsp->strand('hit') == -1); my $feature = new Bio::SeqFeature::Generic( > >> -source_tag=>$source, > >> -primary_tag=>$feature_type, > >> -start=>$hsp->start('hit'), > >> -end=>$hsp->end('hit'), > >> -score=>$hit->raw_score, > >> -strand=>$strand, > >> -tag=>{ > >> Accession=>$result->query_name, > >> Note=>$result->query_description, > >> } > >> ); > >> $feature->seq_id($hit->accession); > >> $gffio->write_feature($feature); #Bio::SeqFeatureI > >> } > >> > >>Perhaps I am not adding the "Accession" and "Note" fields properly??? -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From lstein at cshl.edu Tue Dec 2 16:56:43 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Dec 2 17:03:38 2003 Subject: [Bioperl-l] Graphics:Panel /SeqFeature::Generic In-Reply-To: <3FC1FF30.2010101@biologie.uni-freiburg.de> References: <3FB89DB0.5070303@biologie.uni-freiburg.de> <200311181353.34763.lstein@cshl.edu> <3FC1FF30.2010101@biologie.uni-freiburg.de> Message-ID: <200312021656.43782.lstein@cshl.edu> Hi Dan, You need to check that the tag value exists with has_tag() before trying to fetch it. If you create an key at the bottom, then a synthetic feature will be created and passed to your callback. It's probably raising the exception. 
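
A minimal sketch of the has_tag() guard described above, written so that it runs without BioPerl. The StubFeature package is a made-up stand-in, not Bio::SeqFeature::Generic: it only mimics has_tag() and get_tag_values() and dies on a missing tag in the same spirit as the reported exception. The tag name 'sig' and the 1e-10 green/red cutoff are taken from this thread; the 'gray' fallback colour is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object (NOT a BioPerl class).
package StubFeature;

sub new {
    my ( $class, %tags ) = @_;
    return bless { tags => {%tags} }, $class;
}

sub has_tag {
    my ( $self, $tag ) = @_;
    return exists $self->{tags}{$tag};
}

sub get_tag_values {
    my ( $self, $tag ) = @_;
    die "asking for tag value that does not exist $tag\n"
        unless exists $self->{tags}{$tag};
    return @{ $self->{tags}{$tag} };
}

package main;

# Colour callback: check has_tag() first, so features without the tag
# (for example a synthetic feature created for a key) get a default colour
# instead of raising an exception.
my $bgcolor = sub {
    my $feature = shift;
    return 'gray' unless $feature->has_tag('sig');
    my ($sig) = $feature->get_tag_values('sig');
    return $sig < 1e-10 ? 'green' : 'red';
};

my $hit = StubFeature->new( sig => [1e-30] );   # feature carrying the tag
my $key = StubFeature->new();                   # feature without the tag

print $bgcolor->($hit), "\n";
print $bgcolor->($key), "\n";
```

With a real track the same anonymous sub would be passed as the -bgcolor option; the guard on the first line of the callback is the only part that matters here.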
Lincoln On Monday 24 November 2003 07:53 am, Daniel Lang wrote: > Hi again, > I tested it also on the command line using an additional tag called > 'sig' but the problem is the same: > ------------- EXCEPTION ------------- > MSG: asking for tag value that does not exist sig > STACK Bio::SeqFeature::Generic::get_tag_values > /usr/lib/perl5/site_perl/5.6.1/Bio/SeqFeature/Generic.pm:504 > STACK main::__ANON__ ./htmlresult1.pl:105 > STACK (eval) > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph/Factory.pm:394 > STACK Bio::Graphics::Glyph::Factory::option > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph/Factory.pm:394 > STACK Bio::Graphics::Glyph::option > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph.pm:321 > STACK Bio::Graphics::Glyph::bgcolor > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph.pm:386 > STACK Bio::Graphics::Glyph::filled_box > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph.pm:830 > STACK Bio::Graphics::Glyph::draw_component > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph.pm:959 > STACK Bio::Graphics::Glyph::segments::draw_component > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph/segments.pm:63 > STACK Bio::Graphics::Glyph::draw > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph.pm:650 > STACK Bio::Graphics::Glyph::generic::draw > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph/generic.pm:107 > STACK Bio::Graphics::Glyph::draw > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph.pm:642 > STACK Bio::Graphics::Glyph::generic::draw > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph/generic.pm:107 > STACK Bio::Graphics::Glyph::track::draw > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Glyph/track.pm:21 > STACK Bio::Graphics::Panel::gd > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Panel.pm:461 > STACK Bio::Graphics::Panel::png > /usr/lib/perl5/site_perl/5.6.1/Bio/Graphics/Panel.pm:781 > STACK main::create_overview ./htmlresult1.pl:147 > STACK toplevel ./htmlresult1.pl:20 > > -------------------------------------- > Any hints? 
> > Daniel > > Lincoln Stein wrote: > > Hi Dan, > > > > Try changing the "generic" glyph to "segments." The first glyph doesn't > > know how to deal with subparts (such as HSPs), the second does. > > > > Lincoln > > > > On Monday 17 November 2003 05:06 am, Daniel Lang wrote: > >>Hi, > >>I want to generate overview graphics from BLAST reports, where the hits > >>are sorted and colored (>1e-10 -->green, ...)according their evalues... > >> > >>So I thought, I could solve this using a callback function for the > >>bgcolor and using the 'low_score' sort_order, but when applied to a > >>BLAST report, it results in sorted but only red hits? > >>I also tried introducing the evalues as additional tags like done with > >>'bits' or 'range', but when testing for this tag in the callback > >>(has_tag) its not available? > >>So I wander if the function is envoked for each hit in the while loop? > >> > >>Here the code sniplet: > >> > >>my $track = $panel->add_track(-glyph => 'generic', > >> -label => 1, > >> -connector => 'dashed', > >> -height => 5, > >> -bgcolor => sub { > >> my $feature = shift; > >> my $evalue = $feature->score; > >> if ($evalue < 1e-10) {return 'green';} > >> else {return 'red';}} > >> , > >> -fontcolor => 'green', > >> -font2color => 'red', > >> -sort_order => 'low_score', > >> -min_score => '1e-1000', > >> -max_score => '10000', > >> -description => sub { > >> my $feature = shift; > >> return unless > >> $feature->has_tag('bits'); my ($description) = > >>$feature->each_tag_value('bits'); > >> my $score = $feature->score; > >> my ($range) = > >>$feature->each_tag_value('range'); > >> "Score=$description bits, E-value=$score, $range"; > >> }); > >> > >> while( my $hit = $result->next_hit ) { > >> my $evalue = $hit->significance; > >> my $feature = Bio::SeqFeature::Generic->new(-score => $evalue, > >> -display_name => $hit->name, > >> -tag => { 'bits' => $hit->bits, > >> 'range' => "from ". $hit->start('query') . " to " . 
> >>$hit->end('query'), > >> }, > >> ); > >> while( my $hsp = $hit->next_hsp ) { > >> $feature->add_sub_SeqFeature($hsp,'EXPAND'); > >> } > >> $track->add_feature($feature); > >> } > >> > >>Thanks in advance, > >>Daniel > >> > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l@portal.open-bio.org > >>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From amackey at pcbi.upenn.edu Tue Dec 2 17:00:10 2003 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Dec 2 17:06:26 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: <200312022004.hB2K40pw024793@pub.open-bio.org> References: <200312022004.hB2K40pw024793@pub.open-bio.org> Message-ID: Does this actually work with piped filehandles (i.e. 
STDIN): perl -MBio::SeqIO -le 'print Bio::SeqIO->new(-fh=>\*STDIN)->next_seq->id' < test.gbk -Aaron On Dec 2, 2003, at 3:04 PM, Heikki Lehvaslaiho wrote: > > heikki > Tue Dec 2 15:03:59 EST 2003 > Update of /home/repository/bioperl/bioperl-live/Bio > In directory pub.open-bio.org:/tmp/cvs-serv24770/Bio > > Modified Files: > AlignIO.pm SeqIO.pm > Log Message: > guessing sequence and align formats by looking into file > > bioperl-live/Bio AlignIO.pm,1.30,1.31 SeqIO.pm,1.69,1.70 > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/AlignIO.pm,v > retrieving revision 1.30 > retrieving revision 1.31 > diff -u -r1.30 -r1.31 > --- /home/repository/bioperl/bioperl-live/Bio/AlignIO.pm 2003/11/07 > 00:56:49 1.30 > +++ /home/repository/bioperl/bioperl-live/Bio/AlignIO.pm 2003/12/02 > 20:03:59 1.31 > @@ -299,6 +299,7 @@ > use Bio::LocatableSeq; > use Bio::SimpleAlign; > use Bio::Root::IO; > +use Bio::Tools::GuessSeqFormat; > @ISA = qw(Bio::Root::Root Bio::Root::IO); > > =head2 new > @@ -330,11 +331,17 @@ > my %param = @args; > @param{ map { lc $_ } keys %param } = values %param; # lowercase keys > my $format = $param{'-format'} || > - $class->_guess_format( $param{-file} || $ARGV[0] ) || > - 'fasta'; > + $class->_guess_format( $param{-file} || $ARGV[0] ); > + if ($param{-file}) { > + $format = Bio::Tools::GuessSeqFormat->new(-file => > $param{-file}||$ARGV[0] )->guess; > + } > + elsif ($param{-fh}) { > + $format = Bio::Tools::GuessSeqFormat->new(-fh => > $param{-fh}||$ARGV[0] )->guess; > + } > $format = "\L$format"; # normalize capitalization to lower case > + $class->throw("Unknown format given or could not determine it > [$format]") > + if $format eq 'unknown'; > > - # normalize capitalization > return undef unless( $class->_load_format_module($format) ); > return "Bio::AlignIO::$format"->new(@args); > } > > =================================================================== > RCS file: 
/home/repository/bioperl/bioperl-live/Bio/SeqIO.pm,v > retrieving revision 1.69 > retrieving revision 1.70 > diff -u -r1.69 -r1.70 > --- /home/repository/bioperl/bioperl-live/Bio/SeqIO.pm 2003/10/28 > 07:07:42 1.69 > +++ /home/repository/bioperl/bioperl-live/Bio/SeqIO.pm 2003/12/02 > 20:03:59 1.70 > @@ -312,6 +312,7 @@ > use Bio::Factory::SequenceStreamI; > use Bio::Factory::FTLocationFactory; > use Bio::Seq::SeqBuilder; > +use Bio::Tools::GuessSeqFormat; > use Symbol(); > > @ISA = qw(Bio::Root::Root Bio::Root::IO > Bio::Factory::SequenceStreamI); > @@ -349,23 +350,29 @@ > sub new { > my ($caller,@args) = @_; > my $class = ref($caller) || $caller; > - > + > # or do we want to call SUPER on an object if $caller is an > # object? > if( $class =~ /Bio::SeqIO::(\S+)/ ) { > my ($self) = $class->SUPER::new(@args); > $self->_initialize(@args); > return $self; > - } else { > - > + } else { > + > my %param = @args; > @param{ map { lc $_ } keys %param } = values %param; # lowercase keys > - my $format = $param{'-format'} || > - $class->_guess_format( $param{-file} || $ARGV[0] ) || > - 'fasta'; > - $format = "\L$format"; # normalize capitalization to lower case > + my $format = $param{'-format'} || > + $class->_guess_format( $param{-file} || $ARGV[0] ); > > - # normalize capitalization > + if ($param{-file}) { > + $format = Bio::Tools::GuessSeqFormat->new(-file => > $param{-file}||$ARGV[0] )->guess; > + } > + elsif ($param{-fh}) { > + $format = Bio::Tools::GuessSeqFormat->new(-fh => > $param{-fh}||$ARGV[0] )->guess; > + } > + $format = "\L$format"; # normalize capitalization to lower case > + $class->throw("Unknown format given or could not determine it > [$format]") > + if $format eq 'unknown'; > return undef unless( $class->_load_format_module($format) ); > return "Bio::SeqIO::$format"->new(@args); > } > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > 
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > From heikki at nildram.co.uk Tue Dec 2 17:28:11 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Dec 2 17:34:28 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: References: <200312022004.hB2K40pw024793@pub.open-bio.org> Message-ID: <200312022228.11152.heikki@nildram.co.uk> Works for me: bala ~/src/bioperl/core> perl -MBio::SeqIO -le 'print Bio::SeqIO->new(-fh=>\*STDIN)->next_seq->id' < t/data/test.genbank DDU63596 bala ~/src/bioperl/core> -Heikki On Tuesday 02 Dec 2003 10:00 pm, Aaron J. Mackey wrote: > Bio::SeqIO->new(-fh=>\*STDIN)->next_seq->id' < test.gbk -Aaron -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at nildram.co.uk Tue Dec 2 17:31:23 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Dec 2 17:37:40 2003 Subject: [Bioperl-l] Re: What happened to my dpAlign module? In-Reply-To: <01D60568-2514-11D8-A49D-000A958C5008@virginia.edu> References: <200312022037.03484.heikki@ebi.ac.uk> <01D60568-2514-11D8-A49D-000A958C5008@virginia.edu> Message-ID: <200312022231.09403.heikki@ebi.ac.uk> On Tuesday 02 Dec 2003 10:08 pm, you wrote: > Sure; there hasn't been anything added to bioperl-ext except Yee's > dpAlign support code. OK. That makes it easy. I'll release the ext snap shot together with core. -Heikki > -Aaron > > On Dec 2, 2003, at 3:37 PM, Heikki Lehvaslaiho wrote: > > Aaron, > > > > I have been concentrating in getting the bioperl-core in shape and > > have not > > done anything to bioperl-ext and bioperl-run. 
I was hoping of getting > > them > > out some time after core. > > > > Is the ext in shape? There has not been that many changes in there, > > has there? > > What is your feeling? Could we release a developer snap shot together > > with the > > bioperl-core at the end of the week and epect to be able to release it > > in a > > week or two maximum? > > > > -Heikki > > > > On Tuesday 02 Dec 2003 12:37 pm, Aaron J.Mackey wrote: > >> Hi Yee, > >> > >> It is still in bioperl-live CVS, and is available in developer's > >> releases (the 1.3.x series); it will presumably be a part of the > >> stable > >> 1.4.x release series, whenever that happens (Heikki knows more about > >> this than I do). > >> > >> -Aaron > >> > >> On Dec 1, 2003, at 2:34 PM, Yee Man Chan wrote: > >>> Hi Aaron > >>> > >>> I looked at the latest bioperl release but I couldn't find > >>> anything about my dpAlign module. What is its status now? You can > >>> find > >>> my > >>> code at > >>> > >>> http://www.stanford.edu/~yeeman/bioperl-ext.tar.gz > >>> http://www.stanford.edu/~yeeman/dpAlign.pm > >>> > >>> Thanks > >>> Yee Man > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From amackey at pcbi.upenn.edu Tue Dec 2 17:52:12 2003 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Dec 2 17:58:28 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: <200312022228.11152.heikki@nildram.co.uk> References: <200312022004.hB2K40pw024793@pub.open-bio.org> <200312022228.11152.heikki@nildram.co.uk> Message-ID: <2ABDC2CA-251A-11D8-A49D-000A958C5008@pcbi.upenn.edu> Wow. I don't know how that can possibly work. It seems to be breaking the laws of UNIX pipes (which shouldn't be seekable). What am I missing here? -Aaron On Dec 2, 2003, at 5:28 PM, Heikki Lehvaslaiho wrote: > Works for me: > > bala ~/src/bioperl/core> perl -MBio::SeqIO -le 'print > Bio::SeqIO->new(-fh=>\*STDIN)->next_seq->id' < t/data/test.genbank > DDU63596 > bala ~/src/bioperl/core> > > -Heikki > > On Tuesday 02 Dec 2003 10:00 pm, Aaron J. 
Mackey wrote:
>> Bio::SeqIO->new(-fh=>\*STDIN)->next_seq->id' < test.gbk
>
> -Aaron
>

From don79don at singnet.com.sg Tue Dec 2 14:03:34 2003
From: don79don at singnet.com.sg (Terry)
Date: Tue Dec 2 21:58:22 2003
Subject: [Bioperl-l] ComAlign
Message-ID: <3FCCE206.000007.00596@dondon>

Hello,

I've just been to the page documenting ComAlign, http://docs.bioperl.org/bioperl-run/Bio/Tools/Run/PiseApplication/comalign.html. I'm an undergraduate in Singapore doing research on multiple alignment techniques. I've come across a few other techniques which have web interfaces, including MultAlin, DiAlign etc.

I would like to enquire whether there are any applications using these techniques or ComAlign that can be downloaded from the Internet.

Thank you for any assistance provided; I look forward to your reply.

Regards
Terry Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031203/932bd354/attachment.htm

From lstein at cshl.edu Tue Dec 2 17:21:19 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Dec 2 21:58:40 2003
Subject: [Bioperl-l] Graphics:Panel /SeqFeature::Generic
In-Reply-To: <3FC1F8A8.3060306@biologie.uni-freiburg.de>
References: <3FB89DB0.5070303@biologie.uni-freiburg.de> <200311181353.34763.lstein@cshl.edu> <3FC1F8A8.3060306@biologie.uni-freiburg.de>
Message-ID: <200312021721.19307.lstein@cshl.edu>

Make sure to set the tag on the subfeatures and to use the segments glyph.
The attached demo script encodes the score as red/blue and the significance as height.

Lincoln

On Monday 24 November 2003 07:25 am, Daniel Lang wrote:
> Hi Lincoln,
> Thanks for your help, but this didn't improve the situation:(
> I know now for sure, that the $feature->score is not the evalue, that is
> set in the while loop, but the normal score!!
> Additionally, I tried again introducing it as an additional tag, but
> this tag isn't available in the callback with e.g.
> get_tag_values('evalue'). I'm using it in a mod_perl Handler, could this be
> part of the problem? Thanks in advance,
> Daniel
>
> Lincoln Stein wrote:
> > Hi Dan,
> >
> > Try changing the "generic" glyph to "segments." The first glyph doesn't
> > know how to deal with subparts (such as HSPs), the second does.
> >
> > Lincoln
> >
> > On Monday 17 November 2003 05:06 am, Daniel Lang wrote:
> >>Hi,
> >>I want to generate overview graphics from BLAST reports, where the hits
> >>are sorted and colored (>1e-10 -->green, ...)according their evalues...
> >>
> >>So I thought, I could solve this using a callback function for the
> >>bgcolor and using the 'low_score' sort_order, but when applied to a
> >>BLAST report, it results in sorted but only red hits?
> >>I also tried introducing the evalues as additional tags like done with
> >>'bits' or 'range', but when testing for this tag in the callback
> >>(has_tag) its not available?
> >>So I wander if the function is envoked for each hit in the while loop?
> >> > >>Here the code sniplet: > >> > >>my $track = $panel->add_track(-glyph => 'generic', > >> -label => 1, > >> -connector => 'dashed', > >> -height => 5, > >> -bgcolor => sub { > >> my $feature = shift; > >> my $evalue = $feature->score; > >> if ($evalue < 1e-10) {return 'green';} > >> else {return 'red';}} > >> , > >> -fontcolor => 'green', > >> -font2color => 'red', > >> -sort_order => 'low_score', > >> -min_score => '1e-1000', > >> -max_score => '10000', > >> -description => sub { > >> my $feature = shift; > >> return unless > >> $feature->has_tag('bits'); my ($description) = > >>$feature->each_tag_value('bits'); > >> my $score = $feature->score; > >> my ($range) = > >>$feature->each_tag_value('range'); > >> "Score=$description bits, E-value=$score, $range"; > >> }); > >> > >> while( my $hit = $result->next_hit ) { > >> my $evalue = $hit->significance; > >> my $feature = Bio::SeqFeature::Generic->new(-score => $evalue, > >> -display_name => $hit->name, > >> -tag => { 'bits' => $hit->bits, > >> 'range' => "from ". $hit->start('query') . " to " . > >>$hit->end('query'), > >> }, > >> ); > >> while( my $hsp = $hit->next_hsp ) { > >> $feature->add_sub_SeqFeature($hsp,'EXPAND'); > >> } > >> $track->add_feature($feature); > >> } > >> > >>Thanks in advance, > >>Daniel > >> > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l@portal.open-bio.org > >>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. 
Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== -------------- next part -------------- full_length: hit: description Human non-histone chromatin protein HMG1 (HMG1) gene, complete cds. name U51677 hit: description Mus musculus (clone Clebp-1) high mobility group 1 protein (HMG-1) name L38477 hit: description M.musculus HMG1 gene name X80457 hit: description Mus musculus HMG-1 mRNA, complete cds. name U00431 hit: description Human non-histone chromosomal protein (HMG-1) retropseudogene. name L08048 hit: description Human mRNA for high mobility group-1 protein (HMG-1). name X12597 hit: description Rat amphoterin mRNA, complete cds. name M64986 hit: description M.musculus mRNA for non-histone chromosomal high-mobility group 1 name Z11997 hit: description Human mRNA for HMG-1, complete cds. name D63874 hit: description Human DNA sequence from BAC 445C9 on chromosome 22q12.1. name Z95115 hit: description Bovine mRNA for high mobility group 1 (HMG1) protein name X12796 hit: description Bovine high-mobility-group protein (HMG-1) mRNA, 3' end. name M26110 hit: description Mus musculus HMG-like protein (Trf) mRNA, complete cds. name AF009343 hit: description Pig nonhistone protein HMG1 mRNA, complete cds. name M21683;M21684 hit: description Homo sapiens (clone 06) high mobility group 1 protein mRNA name L13805 hit: description Human chromosomal protein HMG1 related gene. name D14718 hit: description Chinese hamster HMG-1 gene for high mobility group protein 1 name Y00365 hit: description Rat high mobility group 1 protein synthetic gene, complete cds. 
name M63852 hit: description Rat mRNA for high mobility group protein HMG1 name Y00463 hit: description M.musculus HMG1-R-227 gene name X80466 hit: description M.musculus HMG1-R-154 gene name X80462 hit: description M.musculus HMG1-R-145 gene name X80461 hit: description M.musculus HMG1-R-177 gene name X80459 hit: description M.musculus HMG1-R-87 gene name X80467 hit: description M.musculus HMG1-R-168 gene name X80465 hit: description M.musculus HMG1-R-159 gene name X80463 hit: description Rainbow trout HMG-1 gene exons 2-5, complete cds. name L32859 hit: description M.musculus HMG1-R-135 gene name X80460 hit: description M.musculus HMG1-R-161 gene name X80464 hit: description Trout mRNA for high mobility group protein HMG-T name X02666 hit: description Xenopus laevis high mobility group protein-1 (HMG-1) mRNA, complete name U21933 -------------- next part -------------- A non-text attachment was scrubbed... Name: test_blastgraph.pl Type: text/x-perl Size: 2791 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031202/825d6b36/test_blastgraph-0001.bin From rysz_c5 at yahoo.com Tue Dec 2 21:16:01 2003 From: rysz_c5 at yahoo.com (Engineer - Mgr) Date: Wed Dec 3 00:26:42 2003 Subject: [Bioperl-l] Sending Resume Message-ID: <200312030526.hB35QbFC024815@portal.open-bio.org> RIC SIE Tel (408) 482-2840 rzbig@yahoo.com OBJECTIVE: STRUCTURAL & MECHANICAL DESIGNER CIVIL, ARCHITECTURAL, TRANSPORTATION CAD Operator EXPERIENCE: 93 - present DESIGNER, ENGINEER, CAD MANAGER; "Mech-Tronic" Engineering & Design Service, Project Management & Development. Preparing technical documentation, calculations, layouts drawings & propositions. CAD Management and Operations, drafting & redesigning. Intergraph, MicroStation, Autodesk, ACAD, Win, Net, Softdesk Mgmt Civil, Bridges and Structural Design, Plans, Mapping, Detail Freeway & Roadway, data translation & inserting. Script & CAD automation. 
Geological Structures, Viaducts, Freeways, Highways, Shopping Center. Architectural and Environmental Projects and cooperation; military facilities and plans, Cities, Airports, remediation drawings upgrade, correcting and redesign. Traffic design & problem analyzes-reorganize. Freeway Design & Drafting Support, Site analyzing for Caltrans, Architectural, Archeotype & Electrical drawings, "as is" and initial design; Develop remediation procedure and equipment for lead painted buildings. Construction management, Job site inspection, civil & structural support Mechanical Evaluations - Design - Service and Maintenance; R&D. EDUCATION: Institute for Business & Technology, California CAD Engineer, Programming, Design, Management Electro - Mechanical College Mechanical Engineering - BS Degree DOS, UNIX, MAC, SUN computers; WP, dBASE, Lotus, Network, Lisp, Windows & Appl., PFS, Graphics, CAD/CAM, Basic, Fortran, Analyzes. METRIC, SOLAR, AutoCAD/Computer Instructor. Transportation Spec. Personal Designer, MS Project, MS Works, Excel, Access, C, Script, File Management, File transfer. Learn quickly, work independly, shift, overtime
From Daniel.Lang at biologie.uni-freiburg.de Wed Dec 3 04:06:23 2003 From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang) Date: Wed Dec 3 04:12:44 2003 Subject: [Bioperl-l] Graphics:Panel /SeqFeature::Generic In-Reply-To: <200312021721.19307.lstein@cshl.edu> References: <3FB89DB0.5070303@biologie.uni-freiburg.de> <200311181353.34763.lstein@cshl.edu> <3FC1F8A8.3060306@biologie.uni-freiburg.de> <200312021721.19307.lstein@cshl.edu> Message-ID: <3FCDA78F.3000602@biologie.uni-freiburg.de> Thanks a lot Lincoln! That worked for me:) Of course there are some questions remaining;) I wounder why I can use something like $feature->significance if I didn?t set that tag? Are the whole hits attributes available once a hit is linked to a feature?
I'm using this (BLAST overview graphics with Bio::Graphics::Panel) in a mod_perl handler. Most of the time everything works absolutely fine, but sometimes when the handler is called the graphics come out black and white?! (Apache/1.3.27 (Unix)(Red-Hat/Linux) Embperl/2.0b5 mod_perl/1.26) (All the modules are preloaded in a startup script.) I fear this might be off-topic, but has anyone else experienced behaviour like this?

Thanks in advance.

Daniel

Lincoln Stein wrote:
> Make sure to set the tag on the subfeatures and to use the segments glyph.
> The attached demo script encodes the score as red/blue and the significance
> as height.
>
> Lincoln
>
> On Monday 24 November 2003 07:25 am, Daniel Lang wrote:
>
>>Hi Lincoln,
>>Thanks for your help, but this didn't improve the situation:(
>>I know now for sure, that the $feature->score is not the evalue, that is
>>set in the while loop, but the normal score!!
>>Additionally, I tried again introducing it as an additional tag, but
>>this tag isn't available in the callback with e.g.
>>get_tag_values('evalue'). I'm using it in a mod_perl Handler, could this be
>>part of the problem? Thanks in advance,
>>Daniel
>>
>>Lincoln Stein wrote:
>>
>>>Hi Dan,
>>>
>>>Try changing the "generic" glyph to "segments." The first glyph doesn't
>>>know how to deal with subparts (such as HSPs), the second does.
>>>
>>>Lincoln
>>>
>>>On Monday 17 November 2003 05:06 am, Daniel Lang wrote:
>>>
>>>>Hi,
>>>>I want to generate overview graphics from BLAST reports, where the hits
>>>>are sorted and colored (>1e-10 -->green, ...)according their evalues...
>>>>
>>>>So I thought, I could solve this using a callback function for the
>>>>bgcolor and using the 'low_score' sort_order, but when applied to a
>>>>BLAST report, it results in sorted but only red hits?
>>>>I also tried introducing the evalues as additional tags like done with
>>>>'bits' or 'range', but when testing for this tag in the callback
>>>>(has_tag) its not available?
>>>>So I wander if the function is envoked for each hit in the while loop? >>>> >>>>Here the code sniplet: >>>> >>>>my $track = $panel->add_track(-glyph => 'generic', >>>> -label => 1, >>>> -connector => 'dashed', >>>> -height => 5, >>>> -bgcolor => sub { >>>> my $feature = shift; >>>> my $evalue = $feature->score; >>>> if ($evalue < 1e-10) {return 'green';} >>>> else {return 'red';}} >>>> , >>>> -fontcolor => 'green', >>>> -font2color => 'red', >>>> -sort_order => 'low_score', >>>> -min_score => '1e-1000', >>>> -max_score => '10000', >>>> -description => sub { >>>> my $feature = shift; >>>> return unless >>>>$feature->has_tag('bits'); my ($description) = >>>>$feature->each_tag_value('bits'); >>>> my $score = $feature->score; >>>> my ($range) = >>>>$feature->each_tag_value('range'); >>>> "Score=$description bits, E-value=$score, $range"; >>>> }); >>>> >>>> while( my $hit = $result->next_hit ) { >>>> my $evalue = $hit->significance; >>>> my $feature = Bio::SeqFeature::Generic->new(-score => $evalue, >>>> -display_name => $hit->name, >>>> -tag => { 'bits' => $hit->bits, >>>> 'range' => "from ". $hit->start('query') . " to " . >>>>$hit->end('query'), >>>> }, >>>> ); >>>> while( my $hsp = $hit->next_hsp ) { >>>> $feature->add_sub_SeqFeature($hsp,'EXPAND'); >>>> } >>>> $track->add_feature($feature); >>>> } >>>> >>>>Thanks in advance, >>>>Daniel >>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > ------------------------------------------------------------------------ > > full_length: > > hit: > description Human non-histone chromatin protein HMG1 (HMG1) gene, complete cds. 
> name U51677 > > hit: > description Mus musculus (clone Clebp-1) high mobility group 1 protein (HMG-1) > name L38477 > > hit: > description M.musculus HMG1 gene > name X80457 > > hit: > description Mus musculus HMG-1 mRNA, complete cds. > name U00431 > > hit: > description Human non-histone chromosomal protein (HMG-1) retropseudogene. > name L08048 > > hit: > description Human mRNA for high mobility group-1 protein (HMG-1). > name X12597 > > hit: > description Rat amphoterin mRNA, complete cds. > name M64986 > > hit: > description M.musculus mRNA for non-histone chromosomal high-mobility group 1 > name Z11997 > > hit: > description Human mRNA for HMG-1, complete cds. > name D63874 > > hit: > description Human DNA sequence from BAC 445C9 on chromosome 22q12.1. > name Z95115 > > hit: > description Bovine mRNA for high mobility group 1 (HMG1) protein > name X12796 > > hit: > description Bovine high-mobility-group protein (HMG-1) mRNA, 3' end. > name M26110 > > hit: > description Mus musculus HMG-like protein (Trf) mRNA, complete cds. > name AF009343 > > hit: > description Pig nonhistone protein HMG1 mRNA, complete cds. > name M21683;M21684 > > hit: > description Homo sapiens (clone 06) high mobility group 1 protein mRNA > name L13805 > > hit: > description Human chromosomal protein HMG1 related gene. > name D14718 > > hit: > description Chinese hamster HMG-1 gene for high mobility group protein 1 > name Y00365 > > hit: > description Rat high mobility group 1 protein synthetic gene, complete cds. 
> name M63852 > > hit: > description Rat mRNA for high mobility group protein HMG1 > name Y00463 > > hit: > description M.musculus HMG1-R-227 gene > name X80466 > > hit: > description M.musculus HMG1-R-154 gene > name X80462 > > hit: > description M.musculus HMG1-R-145 gene > name X80461 > > hit: > description M.musculus HMG1-R-177 gene > name X80459 > > hit: > description M.musculus HMG1-R-87 gene > name X80467 > > hit: > description M.musculus HMG1-R-168 gene > name X80465 > > hit: > description M.musculus HMG1-R-159 gene > name X80463 > > hit: > description Rainbow trout HMG-1 gene exons 2-5, complete cds. > name L32859 > > hit: > description M.musculus HMG1-R-135 gene > name X80460 > > hit: > description M.musculus HMG1-R-161 gene > name X80464 > > hit: > description Trout mRNA for high mobility group protein HMG-T > name X02666 > > hit: > description Xenopus laevis high mobility group protein-1 (HMG-1) mRNA, complete > name U21933 > > > > ------------------------------------------------------------------------ > > #!/lab/bin/perl > > use strict; > use lib '.'; > use Bio::Graphics; > use Bio::SearchIO; > > use constant BLAST_FILE => './doc/howto/examples/graphics/blastn.out'; > > my $searchio = new Bio::SearchIO (-format => 'blast', > -file => BLAST_FILE); > > my $result = $searchio->next_result; > > #Create a panel object > my $panel = Bio::Graphics::Panel->new( -length => $result->query_length, > -width => 1000, > -pad_left => 10, > -pad_right => 10, > ); > > my $full_length = Bio::SeqFeature::Generic->new(-start => 1, > -end => $result->query_length, > -primary_tag => 'full_length', > -seq_id=> $result->query_name > ); > $panel->add_track($full_length, > -glyph => 'arrow', > -tick => 2, > -fgcolor => 'black', > -double => 1, > -label => 1, > ); > > my $track = $panel->add_track(-glyph => 'segments', > -label => 1, > -connector => 'dashed', > -bgcolor => sub { > my $feature = shift; > my $score = $feature->score; > $score < 50 ? 
'blue' : 'red';
>     },
>     -height => sub {
>         my $feature = shift;
>         $feature->significance < 1e-50 ? 20 : 10;
>     },
>     -font2color => 'red',
>     -sort_order => 'high_score',
>     -description => sub {
>         my $feature = shift;
>         return unless $feature->has_tag('description');
>         my ($description) = $feature->each_tag_value('description');
>         my $score = $feature->score;
>         "$description, score=$score";
>     }
> );
>
> while( my $hit = $result->next_hit ) {
>     next unless $hit->significance < 1e-20;
>     my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
>         -seq_id => $hit->name,
>         -primary_tag => 'hit',
>         -tag => {
>             description => $hit->description,
>             name => $hit->name,
>         },
>     );
>     while( my $hsp = $hit->next_hsp ) {
>         $feature->add_sub_SeqFeature($hsp,'EXPAND');
>     }
>
>     $track->add_feature($feature);
> }
>
> print $panel->png;
>
> __END__
>
> my @boxes = $panel->boxes;
> foreach ( $panel->boxes() ) {
>     my $feature_box = $_->[0];
>     my $coords = join( ',', @{$_}[1..4] );
>
>     print $feature_box->primary_tag,":\n";
>     my @tags = $feature_box->get_all_tags();
>     for my $x (@tags) {
>         print $x,"\t",$feature_box->each_tag_value($x),"\n";
>     }
>     print "\n";
> }

From heikki at nildram.co.uk Wed Dec 3 05:12:50 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Wed Dec 3 05:19:10 2003
Subject: [Bioperl-l] Re: [Bioperl-guts-l] Bio::Restriction modifications
In-Reply-To: <200312021950.51773.pblaiklo@110.net>
References: <200311211807.hALI7e12028431@pub.open-bio.org> <200312021950.51773.pblaiklo@110.net>
Message-ID: <200312031012.21731.heikki@ebi.ac.uk>

Peter,

First, thank you for all your efforts: bug reports and now fixes. It is really great to have more people committing to bioperl.

You must have noticed that Rob Edwards beat you by half an hour by committing his fixes into cvs. Could you work with Rob to make sure that all the issues are solved and, where necessary, refactor the code.
It would be good to start by running the two versions of the code in parallel to check that they give identical results.

It is not too late to change the interface. We have not made the stable release yet. Go ahead and do what is needed, and keep posting to the list to make sure everyone interested is informed.

Cheers,
-Heikki

P.S. All hand-written messages should go to bioperl-l. The guts list is only for cvs and bugzilla logs.

On Wednesday 03 Dec 2003 12:50 am, Peter Blaiklock wrote:
> Hi, everyone
>
> I have modified some of the modules in Bio::Restriction to take care of
> some bugs and add a couple of features. It would be great if people could
> try them out and see if they are useful. I have tried to preserve backward
> compatibility, but they will probably break some existing scripts. The
> modifications are summarized below, the actual modules are linked to.
>
> Bio::Restriction::Analysis
>
> Changed the site finding algorithm in '_new_cuts'. The new algorithm
> uses pos to find the recognition site regex and finds the actual cut point
> using $enz->cut. The cut positions are put in an array and substr is used
> to get digestion fragments.
>
> Fixed the origin bug for circular target sequences. If the sequence is
> circular the first 40 bases are appended to the end of the sequence. Sites
> that span the origin are now detected. First and last fragments are
> reattached as before.
>
> Fixed the non-palindromes bug. Non-palindromic sites are detected in
> the reverse orientation by searching the reverse complement of the target
> sequence, using $enz->complementary_cut and then subtracting from the
> sequence length.
>
> Fixed the dual cutters bug. Both cuts of a dual cut enzyme are now
> reported automatically.
>
> Fixed a bug where a site would be detected even if the complementary cut
> was off the end of the target sequence.
>
> Added a method 'fragment_maps($enzyme_name)'. This method returns an
> array of hashes where each hash corresponds to a restriction fragment.
The > hash keys are 'start', 'end' and 'seq'. 'seq' holds the sequence of the > fragment, while 'start' and 'end' hold the positions of the first and last > bases of the fragment in the target sequence. This will allow restriction > fragments to be turned into features. > > Added a method 'positions($enzyme_name)'. This method returns an array > of cut positions for the specified enzyme. > > Added internal methods '_find_cut', '_multicut' and ' _digest'. > '_find_cut' figures out the cut point given the position of the recognition > site and checks whether the cut point is off the end of the molecule. This > has to be done twice for non-palindromic enzymes, so it makes sense to make > it a separate method. > '_digest' gets the restriction fragments from the list of cut points. I > split this off of '_new_cuts' for the sake of readability, particularly > after we start dealing with overlapping sites. > '_multicut' checks whether the "other" cut of a dual cut enzyme is off > the end of the sequence. Could probably be folded back into '_find_cut'. > > Bio::Restriction::Enzyme > > Changed the enzyme name correction code so only a single trailing '1' > gets changed to an 'I'. This should prevent name mangling of enzymes like > Alw21I. > > Changed '$cut && $self->cut($cut);' to 'defined $cut && > $self->cut($cut);' (same for complementary cut) in the constructor. This > allows $self->cut to be zero. > > Some minor changes in methods 'cut' and 'site'. > > Bio::Restriction::IO::base > > Changed method '_make_multicuts' so that the second enzyme is not added > to the collection and the first enzyme is not added to the second enzyme's > "others" array, only the other way around. The idea is to prevent dual cut > sites from being reported twice. This will break scripts that used this > behavior to work around the dual cut bug, but you shouldn't have to do that > any more. > > Similar changes made to '_make_multisites'. 
> > http://www.restrictionmapper.org/bioperl/Analysis.pm > http://www.restrictionmapper.org/bioperl/Enzyme.pm > http://www.restrictionmapper.org/bioperl/base.pm > > Peter Blaiklock > > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From redwards at utmem.edu Wed Dec 3 08:16:13 2003 From: redwards at utmem.edu (Rob Edwards) Date: Wed Dec 3 08:22:35 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] Bio::Restriction modifications In-Reply-To: <200312031012.21731.heikki@ebi.ac.uk> Message-ID: Peter, It sounds like we used the same approach to fix the same problems - great minds :) However, we both fixed extra things and so we need to merge. Here > Bio::Restriction::Analysis > > Changed the site finding algorithm in '_new_cuts'. The new > algorithm > uses pos to find the recognition site regex and finds the actual cut > point > using $enz->cut. The cut positions are put in an array and substr is > used > to get digestion fragments. This is the best way to do it, it allows you to do a lot more with the sequences. I also used this approach for ambiguous enzymes. For non-ambiguous enzymes it is a lot more efficient to use an index based string search rather than a regexp, and I added an internal method to deal with this. > Fixed the origin bug for circular target sequences. If the sequence > is > circular the first 40 bases are appended to the end of the sequence. > Sites > that span the origin are now detected. 
First and last fragments are > reattached as before. I added a method to EnzymeCollection to return the longest cut site in the collection and then added that length of sequence. This will correctly deal with any site > 40 bp. Using position instead of fragments sure makes this a lot easier, huh? > Fixed the non-palindromes bug. Non-palindromic sites are detected > in > the reverse orientation by searching the reverse complement of the > target > sequence, using $enz->complementary_cut and then subtracting from the > sequence length. I added a new method to Enzyme.pm to return the complementary cut site. I don't particularly like the way I did this. I'd like Analysis to correctly use the Enzyme->cut and Enzyme->complementary_cut which it doesn't at the moment. I think this is a lot better than using the "^" in the site routine. > Fixed a bug where a site would be detected even if the > complementary cut > was off the end of the target sequence. I am not sure if I deal with this properly. I think it should be handled properly with circularization? > Added a method 'fragment_maps($enzyme_name)'. This method returns an > array of hashes where each hash corresponds to a restriction fragment. > The > hash keys are 'start', 'end' and 'seq'. 'seq' holds the sequence of the > fragment, while 'start' and 'end' hold the positions of the first and > last > bases of the fragment in the target sequence. This will allow > restriction > fragments to be turned into features. This is cool. > Added internal methods '_find_cut', '_multicut' and ' _digest'. > '_find_cut' figures out the cut point given the position of the > recognition > site and checks whether the cut point is off the end of the molecule. > This > has to be done twice for non-palindromic enzymes, so it makes sense to > make > it a separate method. > '_digest' gets the restriction fragments from the list of cut > points. 
I > split this off of '_new_cuts' for the sake of readability, particularly > after we start dealing with overlapping sites. > '_multicut' checks whether the "other" cut of a dual cut enzyme is > off > the end of the sequence. Could probably be folded back into > '_find_cut'. I did something similar, but probably used about 5x as many methods (making it much less clear I am sure). > Bio::Restriction::Enzyme > > Changed the enzyme name correction code so only a single trailing > '1' > gets changed to an 'I'. This should prevent name mangling of enzymes > like > Alw21I. > > Changed '$cut && $self->cut($cut);' to 'defined $cut && > $self->cut($cut);' (same for complementary cut) in the constructor. > This > allows $self->cut to be zero. > > Some minor changes in methods 'cut' and 'site'. These should be committed. > Bio::Restriction::IO::base > > Changed method '_make_multicuts' so that the second enzyme is not > added > to the collection and the first enzyme is not added to the second > enzyme's > "others" array, only the other way around. The idea is to prevent dual > cut > sites from being reported twice. This will break scripts that used this > behavior to work around the dual cut bug, but you shouldn't have to do > that > any more. > > Similar changes made to '_make_multisites'. These should be committed too. I'll try and merge this code with what I wrote and add it to cvs in a day or so. Once I get a near working version, I'll let you know, perhaps you can help debug the changes. Take a look at the latest versions in cvs and let me know what you think. 
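
The two ideas discussed in this thread (extending a circular target past the origin so that sites spanning it are found, and picking up the reverse orientation of a non-palindromic site) can be sketched in plain Perl as below. This is only an illustration under stated assumptions, not the code from either patch or from Bio::Restriction: find_sites and revcom are hypothetical helper names, only unambiguous recognition sites are handled (so an index() search is enough), the target is extended by the site length minus one rather than a fixed 40 bases, and the reverse orientation is found by searching the forward strand for the reverse complement of the site, which avoids converting coordinates back from the other strand.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Restriction): reverse-complement
# an unambiguous DNA string.
sub revcom {
    my ($s) = @_;
    $s = reverse $s;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical helper (not part of Bio::Restriction): return the 1-based
# start positions of an unambiguous recognition site on the forward strand.
# If $circular is true, sites that span the origin are also reported.
sub find_sites {
    my ($seq, $site, $circular) = @_;

    # For a circular target, append just enough of the start of the
    # sequence that a site straddling the origin becomes visible.
    my $target = $circular
        ? $seq . substr($seq, 0, length($site) - 1)
        : $seq;

    my @starts;
    my $pos = 0;
    while ( ( my $hit = index($target, $site, $pos) ) >= 0 ) {
        push @starts, $hit + 1;   # report 1-based coordinates
        $pos = $hit + 1;          # advance by one so overlapping sites are kept
    }
    return @starts;
}

# Made-up 20-base target: GAATTC is split across the origin (GAA...TTC).
my $plasmid = 'TTCAAAAGGATCCAAAAGAA';
print join(',', find_sites($plasmid, 'GAATTC', 1)), "\n";   # prints 18
print join(',', find_sites($plasmid, 'GGATCC', 1)), "\n";   # prints 8

# Reverse orientation of a non-palindromic site (GGATG) on a linear target.
print join(',', find_sites('AACATCCAA', revcom('GGATG'), 0)), "\n";   # prints 3
```

Because only length($site) - 1 bases are appended, every reported start lies inside the original sequence, so no site is counted twice; turning a start position into an actual cut position would still need the enzyme's cut offsets, which this sketch leaves out.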
Rob

From lstein at cshl.edu Wed Dec 3 08:41:53 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Dec 3 08:48:19 2003
Subject: [Bioperl-l] Graphics:Panel /SeqFeature::Generic
In-Reply-To: <3FCDA78F.3000602@biologie.uni-freiburg.de>
References: <3FB89DB0.5070303@biologie.uni-freiburg.de> <200312021721.19307.lstein@cshl.edu> <3FCDA78F.3000602@biologie.uni-freiburg.de>
Message-ID: <200312030841.53418.lstein@cshl.edu>

Hi Dan,

I think the issue here is that the top level feature is a Bio::SeqFeature::Generic, while the subfeatures are actually Bio::Search::Hit objects. Therefore you've got to be careful about whether you're looking at the top level or the subfeature.

I can't explain the black & white problem. I'm using Bio::Graphics in a mod_perl environment on a site that gets a hundred thousand hits per week and have not observed this at all. Perhaps you are memory limited?

Lincoln

On Wednesday 03 December 2003 04:06 am, Daniel Lang wrote:
> Thanks a lot Lincoln! That worked for me:)
>
> Of course there are some questions remaining;)
> I wounder why I can use something like $feature->significance if I
> didn't set that tag? Are the whole hits attributes available once a hit
> is linked to a feature?
>
> I'm using this (BLAST overview graphics with Bio::Graphics::panel)in a
> modperl handler...Most of the time everything works out absolutely fine,
> but sometimes when the handler is called the graphics are black'n
> white?! (Apache/1.3.27 (Unix)(Red-Hat/Linux) Embperl/2.0b5
> mod_perl/1.26) (All the modules are preloaded in a startup script)
> I fear this might be off-topic but has anyone also experienced behaviour
> like this?
>
> Thanks in advance.
>
> Daniel
>
> Lincoln Stein wrote:
> > Make sure to set the tag on the subfeatures and to use the segments
> > glyph. The attached demo script encodes the score as red/blue and the
> > significance as height.
> >
> > Lincoln
> >
> > On Monday 24 November 2003 07:25 am, Daniel Lang wrote:
> >>Hi Lincoln,
> >>Thanks for your help, but this didn't improve the situation:(
> >>I know now for sure, that the $feature->score is not the evalue, that is
> >>set in the while loop, but the normal score!!
> >>Additionally, I tried again introducing it as an additional tag, but
> >>this tag isn't available in the callback with e.g.
> >>get_tag_values('evalue'). I'm using it in a mod_perl Handler, could this
> >> be part of the problem? Thanks in advance,
> >>Daniel
> >>
> >>Lincoln Stein wrote:
> >>>Hi Dan,
> >>>
> >>>Try changing the "generic" glyph to "segments." The first glyph doesn't
> >>>know how to deal with subparts (such as HSPs), the second does.
> >>>
> >>>Lincoln
> >>>
> >>>On Monday 17 November 2003 05:06 am, Daniel Lang wrote:
> >>>>Hi,
> >>>>I want to generate overview graphics from BLAST reports, where the hits
> >>>>are sorted and colored (>1e-10 -->green, ...)according their evalues...
> >>>>
> >>>>So I thought, I could solve this using a callback function for the
> >>>>bgcolor and using the 'low_score' sort_order, but when applied to a
> >>>>BLAST report, it results in sorted but only red hits?
> >>>>I also tried introducing the evalues as additional tags like done with
> >>>>'bits' or 'range', but when testing for this tag in the callback
> >>>>(has_tag) its not available?
> >>>>So I wander if the function is envoked for each hit in the while loop?
> >>>> > >>>>Here the code sniplet: > >>>> > >>>>my $track = $panel->add_track(-glyph => 'generic', > >>>> -label => 1, > >>>> -connector => 'dashed', > >>>> -height => 5, > >>>> -bgcolor => sub { > >>>> my $feature = shift; > >>>> my $evalue = $feature->score; > >>>> if ($evalue < 1e-10) {return 'green';} > >>>> else {return 'red';}} > >>>> , > >>>> -fontcolor => 'green', > >>>> -font2color => 'red', > >>>> -sort_order => 'low_score', > >>>> -min_score => '1e-1000', > >>>> -max_score => '10000', > >>>> -description => sub { > >>>> my $feature = shift; > >>>> return unless > >>>>$feature->has_tag('bits'); my ($description) = > >>>>$feature->each_tag_value('bits'); > >>>> my $score = $feature->score; > >>>> my ($range) = > >>>>$feature->each_tag_value('range'); > >>>> "Score=$description bits, E-value=$score, $range"; > >>>> }); > >>>> > >>>> while( my $hit = $result->next_hit ) { > >>>> my $evalue = $hit->significance; > >>>> my $feature = Bio::SeqFeature::Generic->new(-score => $evalue, > >>>> -display_name => $hit->name, > >>>> -tag => { 'bits' => $hit->bits, > >>>> 'range' => "from ". $hit->start('query') . " to " . > >>>>$hit->end('query'), > >>>> }, > >>>> ); > >>>> while( my $hsp = $hit->next_hsp ) { > >>>> $feature->add_sub_SeqFeature($hsp,'EXPAND'); > >>>> } > >>>> $track->add_feature($feature); > >>>> } > >>>> > >>>>Thanks in advance, > >>>>Daniel > >>>> > >>>>_______________________________________________ > >>>>Bioperl-l mailing list > >>>>Bioperl-l@portal.open-bio.org > >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l@portal.open-bio.org > >>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > ------------------------------------------------------------------------ > > > > full_length: > > > > hit: > > description Human non-histone chromatin protein HMG1 (HMG1) gene, > > complete cds. 
name U51677 > > > > hit: > > description Mus musculus (clone Clebp-1) high mobility group 1 protein > > (HMG-1) name L38477 > > > > hit: > > description M.musculus HMG1 gene > > name X80457 > > > > hit: > > description Mus musculus HMG-1 mRNA, complete cds. > > name U00431 > > > > hit: > > description Human non-histone chromosomal protein (HMG-1) > > retropseudogene. name L08048 > > > > hit: > > description Human mRNA for high mobility group-1 protein (HMG-1). > > name X12597 > > > > hit: > > description Rat amphoterin mRNA, complete cds. > > name M64986 > > > > hit: > > description M.musculus mRNA for non-histone chromosomal high-mobility > > group 1 name Z11997 > > > > hit: > > description Human mRNA for HMG-1, complete cds. > > name D63874 > > > > hit: > > description Human DNA sequence from BAC 445C9 on chromosome 22q12.1. > > name Z95115 > > > > hit: > > description Bovine mRNA for high mobility group 1 (HMG1) protein > > name X12796 > > > > hit: > > description Bovine high-mobility-group protein (HMG-1) mRNA, 3' end. > > name M26110 > > > > hit: > > description Mus musculus HMG-like protein (Trf) mRNA, complete cds. > > name AF009343 > > > > hit: > > description Pig nonhistone protein HMG1 mRNA, complete cds. > > name M21683;M21684 > > > > hit: > > description Homo sapiens (clone 06) high mobility group 1 protein mRNA > > name L13805 > > > > hit: > > description Human chromosomal protein HMG1 related gene. > > name D14718 > > > > hit: > > description Chinese hamster HMG-1 gene for high mobility group protein 1 > > name Y00365 > > > > hit: > > description Rat high mobility group 1 protein synthetic gene, complete > > cds. 
name M63852 > > > > hit: > > description Rat mRNA for high mobility group protein HMG1 > > name Y00463 > > > > hit: > > description M.musculus HMG1-R-227 gene > > name X80466 > > > > hit: > > description M.musculus HMG1-R-154 gene > > name X80462 > > > > hit: > > description M.musculus HMG1-R-145 gene > > name X80461 > > > > hit: > > description M.musculus HMG1-R-177 gene > > name X80459 > > > > hit: > > description M.musculus HMG1-R-87 gene > > name X80467 > > > > hit: > > description M.musculus HMG1-R-168 gene > > name X80465 > > > > hit: > > description M.musculus HMG1-R-159 gene > > name X80463 > > > > hit: > > description Rainbow trout HMG-1 gene exons 2-5, complete cds. > > name L32859 > > > > hit: > > description M.musculus HMG1-R-135 gene > > name X80460 > > > > hit: > > description M.musculus HMG1-R-161 gene > > name X80464 > > > > hit: > > description Trout mRNA for high mobility group protein HMG-T > > name X02666 > > > > hit: > > description Xenopus laevis high mobility group protein-1 (HMG-1) mRNA, > > complete name U21933 > > > > > > > > ------------------------------------------------------------------------ > > > > #!/lab/bin/perl > > > > use strict; > > use lib '.'; > > use Bio::Graphics; > > use Bio::SearchIO; > > > > use constant BLAST_FILE => './doc/howto/examples/graphics/blastn.out'; > > > > my $searchio = new Bio::SearchIO (-format => 'blast', > > -file => BLAST_FILE); > > > > my $result = $searchio->next_result; > > > > #Create a panel object > > my $panel = Bio::Graphics::Panel->new( -length => $result->query_length, > > -width => 1000, > > -pad_left => 10, > > -pad_right => 10, > > ); > > > > my $full_length = Bio::SeqFeature::Generic->new(-start => 1, > > -end => > > $result->query_length, -primary_tag => 'full_length', > > -seq_id=> > > $result->query_name ); > > $panel->add_track($full_length, > > -glyph => 'arrow', > > -tick => 2, > > -fgcolor => 'black', > > -double => 1, > > -label => 1, > > ); > > > > my $track = 
$panel->add_track(-glyph => 'segments',
> >                   -label => 1,
> >                   -connector => 'dashed',
> >                   -bgcolor => sub {
> >                       my $feature = shift;
> >                       my $score = $feature->score;
> >                       $score < 50 ? 'blue' : 'red';
> >                   },
> >                   -height => sub {
> >                       my $feature = shift;
> >                       $feature->significance < 1e-50 ? 20 : 10;
> >                   },
> >                   -font2color => 'red',
> >                   -sort_order => 'high_score',
> >                   -description => sub {
> >                       my $feature = shift;
> >                       return unless $feature->has_tag('description');
> >                       my ($description) = $feature->each_tag_value('description');
> >                       my $score = $feature->score;
> >                       "$description, score=$score";
> >                   }
> >                  );
> >
> > while( my $hit = $result->next_hit ) {
> >   next unless $hit->significance < 1e-20;
> >   my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
> >                                               -seq_id => $hit->name,
> >                                               -primary_tag => 'hit',
> >                                               -tag => {
> >                                                   description => $hit->description,
> >                                                   name => $hit->name,
> >                                               },
> >                                              );
> >   while( my $hsp = $hit->next_hsp ) {
> >     $feature->add_sub_SeqFeature($hsp,'EXPAND');
> >   }
> >
> >   $track->add_feature($feature);
> > }
> >
> > print $panel->png;
> >
> > __END__
> >
> > my @boxes = $panel->boxes;
> > foreach ( $panel->boxes() ) {
> >   my $feature_box = $_->[0];
> >   my $coords = join( ',', @{$_}[1..4] );
> >
> >   print $feature_box->primary_tag,":\n";
> >   my @tags = $feature_box->get_all_tags();
> >   for my $x (@tags) {
> >     print $x,"\t",$feature_box->each_tag_value($x),"\n";
> >   }
> >   print "\n";
> > }
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From oliver.burren at cimr.cam.ac.uk Wed Dec 3 09:14:46 2003
From: oliver.burren at cimr.cam.ac.uk (Oliver Burren)
Date: Wed Dec 3 09:21:07 2003
Subject: [Bioperl-l] Sorting Bio:Graphics::Glyphs
Message-ID:
<1070460886.2244.136.camel@jakarta> Hi, I have a lot of transcripts that I'm rendering using the excellent Bio::Graphics module. I want to sort these by the last two characters in the display_name attribute. I used the following callback sub sort_trans($$){substr($_[0]->display_name,-2)<=> substr($_[1]->display_name,-2)} # the last two chars are numbers therefore using <=> I then added the following track glyph my $ttrack=$panel->add_track(\@transcripts, -glyph=>'transcript', #......more options here -sort_order=>&sort_trans # should this be # \&sort_trans ? ); If i then run this I get the following error Can't call method "display_name" on unblessed reference at gene_page.cgi line 93, line 191. In the pod documentation for Glyphs it says to use $a and $b vars if I use these though i get an exception that tells me to use a ($$) prototype (which I think I'm doing by using a call back with prototype). I'm using Version 1.3 of bioperl. Can anyone point out what I should be doing or am doing wrong ? Thanks, Olly Burren -- ------------------------------------------------------------------------------- JDRF/WT Diabetes and Inflammation Laboratory Cambridge Institute for Medical Research Addenbrooke's Hospital Site Hills Road, Cambridge CB2 2XY Tel. +44 (0)1223 762598 Fax. +44 (0)1223 762102 ------------------------------------------------------------------------------- From lstein at cshl.edu Wed Dec 3 10:25:00 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Dec 3 10:31:21 2003 Subject: [Bioperl-l] Sorting Bio:Graphics::Glyphs In-Reply-To: <1070460886.2244.136.camel@jakarta> References: <1070460886.2244.136.camel@jakarta> Message-ID: <200312031025.00212.lstein@cshl.edu> I'm confused that the POD documentation tells you to use $a and $b. In my version of 1.3, the docs say this: >>>>>>>>>>>>>>>>>> Finally, a subroutine coderef with a $$ prototype can be provided. 
It will receive two B as arguments and should return -1, 0 or 1 (see Perl's sort() function for more information). For example, to sort a set of database search hits by bits (stored in the features' "score" fields), scaled by the log of the alignment length (with "start" position breaking any ties): sort_order = sub ($$) { my ($glyph1,$glyph2) = @_; my $a = $glyph1->feature; my $b = $glyph2->feature; ( $b->score/log($b->length) <=> $a->score/log($a->length) ) || ( $a->start <=> $b->start ) } It is important to remember to use the $$ prototype as shown in the example. Otherwise Bio::Graphics will quit with an exception. The arguments are subclasses of Bio::Graphics::Glyph, not the features themselves. While glyphs implement some, but not all, of the feature methods, to be safe call the two glyphs' feature() methods in order to convert them into the actual features. <<<<<<<<<<<<<<< This, of course, also explains the main problem you are having. Lincoln On Wednesday 03 December 2003 09:14 am, Oliver Burren wrote: > Hi, > > I have a lot of transcripts that I'm rendering using the excellent > Bio::Graphics module. I want to sort these by the last two characters in > the display_name attribute. I used the following callback > > sub sort_trans($$){substr($_[0]->display_name,-2)<=> > substr($_[1]->display_name,-2)} > # the last two chars are numbers therefore using <=> > > I then added > the following track glyph > > my $ttrack=$panel->add_track(\@transcripts, > -glyph=>'transcript', > #......more options here > -sort_order=>&sort_trans # should this be > # \&sort_trans ? > ); > > If i then run this I get the following error > > Can't call method "display_name" on unblessed reference at gene_page.cgi > line 93, line 191. > > In the pod documentation for Glyphs it says to use $a and $b vars if I > use these though i get an exception that tells me to use a ($$) > prototype (which I think I'm doing by using a call back with > prototype). > > I'm using Version 1.3 of bioperl. 
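A minimal sketch of the corrected callback, assuming (as the POD quoted in this reply states) that the sort routine receives glyphs rather than features. The MockFeature and MockGlyph packages are hypothetical stand-ins so that the sketch runs without Bio::Graphics; only the method names feature() and display_name() are taken from this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a feature and its glyph (not BioPerl classes).
package MockFeature;
sub new          { my ($class, $name) = @_; return bless { name => $name }, $class }
sub display_name { return $_[0]{name} }

package MockGlyph;
sub new     { my ($class, $feature) = @_; return bless { feature => $feature }, $class }
sub feature { return $_[0]{feature} }

package main;

# The ($$) prototype makes sort pass the two items in @_.
# Each argument is a glyph, so unwrap it with feature() first.
sub sort_trans ($$) {
    my ($glyph1, $glyph2) = @_;
    return substr($glyph1->feature->display_name, -2)
       <=> substr($glyph2->feature->display_name, -2);
}

my @glyphs = map { MockGlyph->new(MockFeature->new($_)) }
             qw(TRANS_12 TRANS_03 TRANS_07);

# Pass a code reference (\&sort_trans), not the result of calling &sort_trans.
my $sort_order = \&sort_trans;
my @sorted = map { $_->feature->display_name } sort $sort_order @glyphs;

print join(',', @sorted), "\n";   # prints TRANS_03,TRANS_07,TRANS_12
```

In the add_track call from the question, the equivalent change would be to write -sort_order => \&sort_trans, since a bare &sort_trans calls the subroutine instead of passing it.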
> > Can anyone point out what I should be doing or am doing wrong ? > > Thanks, > > Olly Burren -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From oliver.burren at cimr.cam.ac.uk Wed Dec 3 11:01:49 2003 From: oliver.burren at cimr.cam.ac.uk (Oliver Burren) Date: Wed Dec 3 11:08:15 2003 Subject: [Bioperl-l] Sorting Bio:Graphics::Glyphs In-Reply-To: <200312031025.00212.lstein@cshl.edu> References: <1070460886.2244.136.camel@jakarta> <200312031025.00212.lstein@cshl.edu> Message-ID: <1070467309.2247.184.camel@jakarta> Lincoln, Thanks for pointing this out. My fault I was using perldoc on the wrong version of Glyph.pm (1.2) to clear up confusion. Works perfectly now, Apologies, Olly On Wed, 2003-12-03 at 15:25, Lincoln Stein wrote: > I'm confused that the POD documentation tells you to use $a and $b. In my > version of 1.3, the docs say this: > > >>>>>>>>>>>>>>>>>> > Finally, a subroutine coderef with a $$ prototype can be provided. It > will receive two B as arguments and should return -1, 0 or 1 > (see Perl's sort() function for more information). For example, to > sort a set of database search hits by bits (stored in the features' > "score" fields), scaled by the log of the alignment length (with > "start" position breaking any ties): > > sort_order = sub ($$) { > my ($glyph1,$glyph2) = @_; > my $a = $glyph1->feature; > my $b = $glyph2->feature; > ( $b->score/log($b->length) > <=> > $a->score/log($a->length) ) > || > ( $a->start <=> $b->start ) > } > > It is important to remember to use the $$ prototype as shown in the > example. Otherwise Bio::Graphics will quit with an exception. The > arguments are subclasses of Bio::Graphics::Glyph, not the features > themselves. 
While glyphs implement some, but not all, of the feature > methods, to be safe call the two glyphs' feature() methods in order to > convert them into the actual features. > <<<<<<<<<<<<<<< > > This, of course, also explains the main problem you are having. > > Lincoln > > > On Wednesday 03 December 2003 09:14 am, Oliver Burren wrote: > > Hi, > > > > I have a lot of transcripts that I'm rendering using the excellent > > Bio::Graphics module. I want to sort these by the last two characters in > > the display_name attribute. I used the following callback > > > > sub sort_trans($$){substr($_[0]->display_name,-2)<=> > > substr($_[1]->display_name,-2)} > > # the last two chars are numbers therefore using <=> > > > > I then added > > the following track glyph > > > > my $ttrack=$panel->add_track(\@transcripts, > > -glyph=>'transcript', > > #......more options here > > -sort_order=>&sort_trans # should this be > > # \&sort_trans ? > > ); > > > > If i then run this I get the following error > > > > Can't call method "display_name" on unblessed reference at gene_page.cgi > > line 93, line 191. > > > > In the pod documentation for Glyphs it says to use $a and $b vars if I > > use these though i get an exception that tells me to use a ($$) > > prototype (which I think I'm doing by using a call back with > > prototype). > > > > I'm using Version 1.3 of bioperl. > > > > Can anyone point out what I should be doing or am doing wrong ? > > > > Thanks, > > > > Olly Burren -- ------------------------------------------------------------------------------- JDRF/WT Diabetes and Inflammation Laboratory Cambridge Institute for Medical Research Addenbrooke's Hospital Site Hills Road, Cambridge CB2 2XY Tel. +44 (0)1223 762598 Fax. 
+44 (0)1223 762102 ------------------------------------------------------------------------------- From wwhsiao at sfu.ca Wed Dec 3 14:06:50 2003 From: wwhsiao at sfu.ca (William Hsiao) Date: Wed Dec 3 14:13:14 2003 Subject: [Bioperl-l] change in Tools::SeqWords not reflected in tutorial link Message-ID: <000b01c3b9d0$a1afb950$a50aa8c0@microbe> Hi, I noticed that the link for Tool::SeqWords documentation from the tutorial page (http://bioperl.org/Core/Latest/bptutorial.html) is to the old version of SeqWords which only allows counting of "ATCG" as shown in the following code while($seqstring =~ /(([ACGT]){$word_length})/gim)... The new version that's distributed with Bioperl 1.2.3 (supposedly that's the one covered by the tutorial) counts any words (using /w instead of [ACGT]). Since there is no checking mechanism for input sequences, the change can cause drastic effect on the result (especially when ambiguous nucleotide sequences are involved). It would be great if the change can be reflected in the tutorial to avoid confusion. The link should be to (http://doc.bioperl.org/releases/bioperl-1.2.3/Bio/Tools/SeqWords.html). Thanks Cheers Will William Hsiao Graduate Student, Brinkman Laboratory Department of Molecular Biology and Biochemistry Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6 Phone: 604-291-4206 Fax: 604-291-5583 From cjm at fruitfly.org Wed Dec 3 14:46:29 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Wed Dec 3 14:53:45 2003 Subject: [Bioperl-l] proposed additions to SeqFeatureI, RangeI and FeatureHolderI In-Reply-To: <200311202024.36691.lstein@cshl.edu> Message-ID: On Thu, 20 Nov 2003, Lincoln Stein wrote: > On Wednesday 19 November 2003 09:47 pm, Chris Mungall wrote: > > I have some proposed changes I would like to commit to bioperl, mostly > > for using GFF3. > > > > In both SeqFeatureI and SeqFeature::Generic I would like to add some > > accessor methods. They would all map to tag-values. 
> > > > ID - synonym for tag_value('ID')[0] > > ParentIDs - synonym for tag_value('Parent') > > I like this. > > > add_ParentID > > remove_ParentID > > remove_ParentIDs > > > > Question - should the method be Parent or ParentID? In GFF3, the tag > > is "Parent". But an accessor method called "Parents()" feels like it > > should return objects, so I think ParentIDs() is better. > > Do the methods return IDs or objects? If they're returning IDs, then the > ParentID() name sounds right. Ok, let's go for ParentID > > Also, I realise it's contrary to bioperl convention to have method > > names in caps, but it's nice to be consistent with the GFF3 tags. > > If you want to be completely consistent with convention, how about get_ID() > and get_ParentIDs()? I have a private convention that initial capitalized > methods are autoloaded/autogenerated, but this is just me. I had imagined these to be 'first-class' accessors, like primary_tag(), seq(), etc (although they would be synonyms for get_tag_values('ID'), set_tag_values('ID'), ...) there seems to be 3 different kinds of attributes: foo() foo($foo) get_foo() set_foo($foo) get_tag_values('foo') set_tag_values('foo', [$foo]) I'm not sure what the rules are for deciding which attributes have which kinds of accessor > > I also notice that in SeqFeatureI we have an accessor definition and > > implementation for "primary_id". There is no definition for this. > > > > I propose either eliminating this, or making it a synonym of ID() > > Good with me. Ok > > I think we need clearly defined semantics for these fields. I think > > the semantics should be such that the ID should uniquely identify the > > feature. This is problemmatic, as most sources don't issue a unique > > accession or identifier for features. For example, genbank files > > provide a /gene for a lot of features, but this isn't even unique > > e.g. with multicopy genes. 
In cases where the data source does not > > provide a unique ID, we may want a way to generate them. So I think > > there should also be a method: > > > > generateID() > > > > which sets the ID field to something that's guaranteed unique. I'm not > > sure how. Perhaps a combination of the timestamp and the object memory > > reference? > > I think there was a proposal for globally_unique_ID() at some point. Perhaps > time to resurrect that thread? This is a tricky one... > > Because I'm lazy I'd rather do all this in SeqFeatureI - it all > > delegates to existing methods. But I am unsure as to bioperl > > conventions regarding when an 'interface' has implementation code. > > Happy to see it. Ok > > > > ---- > > > > I also want to add some code to FeatureHolderI, for dealing with the > > "nesting hierarchy" in bioperl, i.e. features that contain other > > features. > > > > The methods are: > > > > nest_features() > > > > creates a feature nesting hierarchy based on the "ID" and "Parent" > > tags. This is useful when parsing GFF3. > > Yes, I like this. > > > > > Also: > > > > flatten_features() > > > > for flattening the nesting hierarchy (so top_SeqFeatures and > > get_SeqFeatures return the same thing) > > I like this too. > > > > > Also: > > > > set_ParentIDs_from_hierarchy() > > > > This will go through the FeatureHolder hierarchy; any time it sees a > > feature with subfeatures, it will set the children's "Parent" tag > > according to the "ID" tag of the parent. If the parent does not have > > an ID, one will be generated. > > This sounds like an internal method that nobody should ever see in the API! Ok > > And nothing to do with the above code, I would like to add methods to > > RangeI for interbase coordinates. Love em or hate em, these methods > > will make some people's code easier at no cost to bioperl. 
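The arithmetic behind the interbase accessors proposed here (istart/iend and ifrom/ito, listed just below) can be sketched in plain Perl. This is only an illustration, assuming the usual 1-based inclusive start/end; the accessor names follow the proposal, but the hash-based range is hypothetical and is not the RangeI implementation.

```perl
use strict;
use warnings;

# Hypothetical range: 1-based inclusive start/end plus strand (+1 or -1).
sub istart { my ($r) = @_; return $r->{start} - 1 }   # interbase start
sub iend   { my ($r) = @_; return $r->{end} }         # same value as end

# Directional interbase coordinates: reversed on the minus strand.
sub ifrom { my ($r) = @_; return $r->{strand} < 0 ? iend($r)   : istart($r) }
sub ito   { my ($r) = @_; return $r->{strand} < 0 ? istart($r) : iend($r) }

my $plus  = { start => 11, end => 20, strand =>  1 };
my $minus = { start => 11, end => 20, strand => -1 };

printf "plus:  istart=%d iend=%d ifrom=%d ito=%d\n",
    istart($plus), iend($plus), ifrom($plus), ito($plus);
printf "minus: istart=%d iend=%d ifrom=%d ito=%d\n",
    istart($minus), iend($minus), ifrom($minus), ito($minus);

# In interbase coordinates the length is a plain difference.
print "length=", iend($plus) - istart($plus), "\n";
```

One reason people like this representation is visible in the last line: no "+ 1" correction is needed when computing lengths.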
> > > > First the interbase equivalent of start/end: > > > > istart > > iend > > > > Of course, iend is just a synonym for end, but it's nice for > > completion > > > > This is the equivalent of chado fmin/fmax. > > > > I would also like: > > > > ifrom > > ito > > > > For interbase directional coordinates. This is equivalent to > > istart,iend in the + strand, and the reverse of this in the - strand. > > I have no objection to these guys going into the Interface as the appropriate > implemented methods. That way they'd be available everywhere. Ok > Lincoln Chris From brian_osborne at cognia.com Wed Dec 3 15:39:14 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Dec 3 15:48:48 2003 Subject: [Bioperl-l] change in Tools::SeqWords not reflected in tutorial link In-Reply-To: <000b01c3b9d0$a1afb950$a50aa8c0@microbe> Message-ID: William, Fixed. Probably the most appropriate place to submit a requested fix like this is http://bugzilla.bioperl.org/, that way it won't get lost. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of William Hsiao Sent: Wednesday, December 03, 2003 2:07 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] change in Tools::SeqWords not reflected in tutorial link Hi, I noticed that the link for Tool::SeqWords documentation from the tutorial page (http://bioperl.org/Core/Latest/bptutorial.html) is to the old version of SeqWords which only allows counting of "ATCG" as shown in the following code while($seqstring =~ /(([ACGT]){$word_length})/gim)... The new version that's distributed with Bioperl 1.2.3 (supposedly that's the one covered by the tutorial) counts any words (using /w instead of [ACGT]). Since there is no checking mechanism for input sequences, the change can cause drastic effect on the result (especially when ambiguous nucleotide sequences are involved). It would be great if the change can be reflected in the tutorial to avoid confusion. 
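The effect of the two patterns described in this report can be sketched in plain Perl. This is not the SeqWords code: count_words is a hypothetical helper, the "/w" in the message is assumed to mean the \w character class, and the non-overlapping scan only mirrors the while loop quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping words of $len characters,
# where each character must match the character class in $class.
sub count_words {
    my ($seq, $len, $class) = @_;
    my %count;
    my $re = qr/((?:$class){$len})/i;
    while ($seq =~ /$re/g) {
        $count{ uc $1 }++;
    }
    return \%count;
}

my $seq = 'ACGTNNACGT';   # contains two ambiguous bases

my $strict = count_words($seq, 2, '[ACGT]');
my $loose  = count_words($seq, 2, '\w');

print "[ACGT]: ", join(' ', map { "$_=$strict->{$_}" } sort keys %$strict), "\n";
print "\\w:     ", join(' ', map { "$_=$loose->{$_}"  } sort keys %$loose),  "\n";
```

With the stricter class the NN word is skipped, while \w counts it like any other word, which is why unchecked ambiguous sequences can change the totals.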
The link should be to (http://doc.bioperl.org/releases/bioperl-1.2.3/Bio/Tools/SeqWords.html). Thanks Cheers Will William Hsiao Graduate Student, Brinkman Laboratory Department of Molecular Biology and Biochemistry Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6 Phone: 604-291-4206 Fax: 604-291-5583 _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From pblaiklo at 110.net Wed Dec 3 22:13:28 2003 From: pblaiklo at 110.net (Peter Blaiklock) Date: Wed Dec 3 22:20:10 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] Bio::Restriction modifications In-Reply-To: References: Message-ID: <200312032213.29340.pblaiklo@110.net> On Wednesday 03 December 2003 08:16, Rob Edwards wrote: > Peter, > > It sounds like we used the same approach to fix the same problems - > great minds :) > > However, we both fixed extra things and so we need to merge. > > Here > > > Bio::Restriction::Analysis > > > > Changed the site finding algorithm in '_new_cuts'. The new > > algorithm > > uses pos to find the recognition site regex and finds the actual cut > > point > > using $enz->cut. The cut positions are put in an array and substr is > > used > > to get digestion fragments. > > This is the best way to do it, it allows you to do a lot more with the > sequences. I also used this approach for ambiguous enzymes. For > non-ambiguous enzymes it is a lot more efficient to use an index based > string search rather than a regexp, and I added an internal method to > deal with this. Nice. > > Fixed the origin bug for circular target sequences. If the sequence > > is > > circular the first 40 bases are appended to the end of the sequence. > > Sites > > that span the origin are now detected. First and last fragments are > > reattached as before. > > I added a method to EnzymeCollection to return the longest cut site in > the collection and then added that length of sequence. 
This will > correctly deal with any site > 40 bp. Using position instead of > fragments sure makes this a lot easier, huh? Sure does. You might want to make sure that the length includes the padding for offset cut points and dual cutters. > > Fixed the non-palindromes bug. Non-palindromic sites are detected > > in > > the reverse orientation by searching the reverse complement of the > > target > > sequence, using $enz->complementary_cut and then subtracting from the > > sequence length. > > I added a new method to Enzyme.pm to return the complementary cut site. > I don't particularly like the way I did this. I'd like Analysis to > correctly use the Enzyme->cut and Enzyme->complementary_cut which it > doesn't at the moment. I think this is a lot better than using the "^" > in the site routine. You mean having site, cut and complementary_cut just be get/setters and letting the parser(s) deal with everything else? Sounds good. > > Fixed a bug where a site would be detected even if the > > complementary cut > > was off the end of the target sequence. > > I am not sure if I deal with this properly. I think it should be > handled properly with circularization? The problem is with offset cutters. If the cut point on the complementary strand is off the end, the target strand won't be cut either. Wouldn't matter if they were all blunt ended, but... There's a related problem with dual cutters. If either precut is off the end the enzyme won't digest at the postcut and should not be reported. As far as I could tell your code doesn't check for this but maybe I missed something? > > Added a method 'fragment_maps($enzyme_name)'. This method returns an > > array of hashes where each hash corresponds to a restriction fragment. > > The > > hash keys are 'start', 'end' and 'seq'. 'seq' holds the sequence of the > > fragment, while 'start' and 'end' hold the positions of the first and > > last > > bases of the fragment in the target sequence. 
This will allow > > restriction fragments to be turned into features. > > This is cool. > > > Added internal methods '_find_cut', '_multicut' and ' _digest'. > > '_find_cut' figures out the cut point given the position of the > > recognition > > site and checks whether the cut point is off the end of the molecule. > > This > > has to be done twice for non-palindromic enzymes, so it makes sense to > > make > > it a separate method. > > '_digest' gets the restriction fragments from the list of cut > > points. I > > split this off of '_new_cuts' for the sake of readability, particularly > > after we start dealing with overlapping sites. > > '_multicut' checks whether the "other" cut of a dual cut enzyme is > > off > > the end of the sequence. Could probably be folded back into > > '_find_cut'. > > I did something similar, but probably used about 5x as many methods > (making it much less clear I am sure). > > > Bio::Restriction::Enzyme > > > > Changed the enzyme name correction code so only a single trailing > > '1' > > gets changed to an 'I'. This should prevent name mangling of enzymes > > like Alw21I. > > > > Changed '$cut && $self->cut($cut);' to 'defined $cut && > > $self->cut($cut);' (same for complementary cut) in the constructor. > > This allows $self->cut to be zero. > > > > Some minor changes in methods 'cut' and 'site'. > > These should be committed. > > > Bio::Restriction::IO::base > > > > Changed method '_make_multicuts' so that the second enzyme is not > > added > > to the collection and the first enzyme is not added to the second > > enzyme's > > "others" array, only the other way around. The idea is to prevent dual > > cut > > sites from being reported twice. This will break scripts that used this > > behavior to work around the dual cut bug, but you shouldn't have to do > > that > > any more. > > > > Similar changes made to '_make_multisites'. > > These should be committed too. 
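The position-based approach discussed in this message (collect cut positions first, then derive fragments with substr) can be sketched in plain Perl. This is not the Bio::Restriction::Analysis code: cut_positions and fragment_maps below are hypothetical helpers, the enzyme is assumed to be a palindromic cutter with an unambiguous site, and the circular case is padded by the site length minus one rather than a fixed 40 bases.

```perl
use strict;
use warnings;

# Return the positions after which the sequence is cut (1-based).
# $site is a plain recognition sequence, $cut the offset of the cut
# within the site (1 for G^AATTC). Hypothetical helper, not BioPerl API.
sub cut_positions {
    my ($seq, $site, $cut, $circular) = @_;
    my $len = length $seq;
    # For circular targets, pad with the start of the sequence so that
    # sites spanning the origin are seen.
    my $target = $circular ? $seq . substr($seq, 0, length($site) - 1) : $seq;

    my %seen;
    while ($target =~ /\Q$site\E/gi) {
        my $site_start = pos($target) - length($site);   # 0-based
        pos($target)   = $site_start + 1;                # allow overlapping sites
        my $cut_after  = $site_start + $cut;             # bases before the cut
        if ($circular) {
            $cut_after = (($cut_after - 1) % $len) + 1;  # wrap around the origin
        }
        else {
            next if $cut_after < 1 || $cut_after >= $len;   # cut is off the end
        }
        $seen{$cut_after} = 1;
    }
    return sort { $a <=> $b } keys %seen;
}

# Turn cut positions on a linear sequence into start/end/seq hashes.
sub fragment_maps {
    my ($seq, @cuts) = @_;
    my @maps;
    my $start = 1;
    for my $end (@cuts, length $seq) {
        push @maps, { start => $start, end => $end,
                      seq   => substr($seq, $start - 1, $end - $start + 1) };
        $start = $end + 1;
    }
    return @maps;
}

my $linear = 'AAGAATTCTTGAATTCAA';
my @frags  = fragment_maps($linear, cut_positions($linear, 'GAATTC', 1, 0));
print "$_->{start}-$_->{end} $_->{seq}\n" for @frags;

# The site spans the origin here, so it is only found when circular.
my $plasmid = 'TTCAAAAAGAA';
my @lin  = cut_positions($plasmid, 'GAATTC', 1, 0);
my @circ = cut_positions($plasmid, 'GAATTC', 1, 1);
print "as linear: @lin\n";
print "as circular: @circ\n";
```

Offset and dual cutters, non-palindromic sites and ambiguity codes are deliberately left out; they would need the extra end-of-sequence checks described in the message.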
> > I'll try and merge this code with what I wrote and add it to cvs in a > day or so. Once I get a near working version, I'll let you know, > perhaps you can help debug the changes. Great. I'll assemble some tests. > Take a look at the latest versions in cvs and let me know what you > think. > > Rob Peter Blaiklock From d.gatherer at vir.gla.ac.uk Thu Dec 4 04:12:01 2003 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Thu Dec 4 04:16:36 2003 Subject: [Bioperl-l] Re: change in Tools::SeqWords not reflected in tutorial link In-Reply-To: <200312031956.hB3Jt1FF029474@portal.open-bio.org> Message-ID: <5.2.1.1.1.20031204090556.00ad6c60@udcf.gla.ac.uk> Hello SeqWords has been worked on recently. http://bugzilla.bioperl.org/show_bug.cgi?id=1554 How would you like to change the POD? If you let me know, I do the necessary and add it to bugzilla. I'm assuming here that the page http://doc.bioperl.org/releases/bioperl-1.2.3/Bio/Tools/SeqWords.html is generated automatically from the POD in the actual module code..... which is what it seems like to me. Cheers Derek From juguang at tll.org.sg Thu Dec 4 12:58:41 2003 From: juguang at tll.org.sg (Juguang Xiao) Date: Thu Dec 4 13:04:42 2003 Subject: [Bioperl-l] Interpro parser example Message-ID: <7EC8C1EC-2683-11D8-937F-000A957702FE@tll.org.sg> Hi, After fixing the few bugs and making the existing Interpro parser able to run throughout the current Interpro xml file, I would like to give an example to the list, in case someone was, like me, trying to use it without knowing how. 
  use Bio::OntologyIO;

  my $file = 'interpro.xml';
  my $io = Bio::OntologyIO->new(
      -format          => 'interpro',
      -file            => $file,
      -ontology_engine => 'simple'
  );

  while (my $ontology = $io->next_ontology) {
      # Actually, there is only one ontology for an InterPro
      # print $ontology->name, "\n";   # InterPro

      # For the InterPro, there are always and only six root items,
      # according to the type, see
      # http://www.ebi.ac.uk/interpro/user_manual.html#N673
      my @roots = $ontology->get_root_terms;
      my @interpros;
      foreach (@roots) {
          push @interpros, $ontology->get_child_terms($_);
      }

      # each in @interpros is of Bio::Ontology::InterProTerm.
      foreach my $interpro (@interpros) {
          # identifier is the same as the method interpro_id
          print "Interpro id:\t", $interpro->identifier, "\n";
          print "Interpro (short) name:\t(", $interpro->short_name, ')',
              $interpro->name, "\n";

          # examines the related records in member databases
          foreach my $member ($interpro->get_members) {
              # of Bio::Annotation::DBLink
              print "\tMember database:\t", $member->database, "\n";      # Such as SWISSPROT, PFAM
              print "\taccession number:\t", $member->primary_id, "\n";   # Such as P01234, PF00000
          }

          # examines the publication
          foreach my $ref ($interpro->get_references) {
              # of Bio::Annotation::Reference
          }
      }
  }

C&C? (Corrections and Comments)

Juguang

From kyliu at gate.sinica.edu.tw Fri Dec 5 01:51:10 2003
From: kyliu at gate.sinica.edu.tw (kyliu)
Date: Fri Dec 5 08:47:41 2003
Subject: [Bioperl-l] Loading SwissProt Data into Oracle
Message-ID: <00d201c3bafc$2b37ada0$8eae6d8c@sinica.edu.tw>

Hi!

As the title says. Although I have read the previous discussion mails, I still do not know how to use your tools to do it. Could you tell me, step by step,

1. how to create the SwissProt database schema on Oracle
2. how to load the SwissProt database
3.
how to query the data from the SwissProt database

Sincerely,
Kawn-Yu Liu
Academia Sinica Computing Center, The Republic of China, Taiwan

From vesko_baev at abv.bg Fri Dec 5 15:22:53 2003
From: vesko_baev at abv.bg (Vesko Baev)
Date: Fri Dec 5 15:29:14 2003
Subject: [Bioperl-l] RNA fold
Message-ID: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg>

Hi to all,
if anyone knows a module or external program (which can be linked to bioperl) for folding a RNA predicting hairpins and calculating a free energy?

Thanks to ALL!

Vesselin Baev
Bulgaria

From biolist at brinkman.mbb.sfu.ca Fri Dec 5 15:26:49 2003
From: biolist at brinkman.mbb.sfu.ca (Matthew Laird)
Date: Fri Dec 5 15:31:58 2003
Subject: [Bioperl-l] Blast return codes
Message-ID:

This is on the fringe of being off-topic... but still relevant. :)

I've been trying to google around to find documentation regarding the return codes Blast gives upon exit. Is there a list of return codes and the conditions that cause non-zero return codes?

The reason I ask is that I'm having a problem calling Blast from bioperl. For some reason Blastall is returning a -1 exit code, which of course causes bioperl to throw an exception. However, I have modified my local install of bioperl to tell me the exact command that was being run. I've then run this exact command with the exact same input files from the command line, and a standard 0 exit code is returned.

Might anyone have any thoughts on why Blastall would be returning a -1 exit code?

Thanks for any assistance you might be able to provide.

--
Matthew Laird
SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept.
Simon Fraser University From allenday at ucla.edu Fri Dec 5 16:20:47 2003 From: allenday at ucla.edu (Allen Day) Date: Fri Dec 5 16:27:08 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg> Message-ID: mfold can do this. there is not a parser written yet. On Fri, 5 Dec 2003, Vesko Baev wrote: > Hi to all, > if anyone knows a module or external program (which can be linked to bioperl) for folding a RNA predicting hairpins and calculating a free energy? > > Thanks to ALL! > > Vesselin Baev > Bulgaria > > ----------------------------------------------------------------- > http://www.pari.bg - ???????? ?? ???? ????? ??? ? > ?? ?????????? ?? ???????! > ?????????? ??! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Fri Dec 5 17:56:30 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Dec 5 18:02:52 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg> References: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg> Message-ID: RNAFold is a nice freely available library for this (ViennaRNA fold) and there is a perl interface to it as well so you don't have to write any parsers you can get the data out from the structures. I found it very easy to use and you can link c or perl code against the libraries it provides. There is no specific integration with Bioperl however. -jason On Fri, 5 Dec 2003, Vesko Baev wrote: > Hi to all, > if anyone knows a module or external program (which can be linked to bioperl) for folding a RNA predicting hairpins and calculating a free energy? > > Thanks to ALL! > > Vesselin Baev > Bulgaria > > ----------------------------------------------------------------- > http://www.pari.bg - ???????? ?? ???? ????? ??? ? > ?? ?????????? ?? ???????! > ?????????? ??! 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From letondal at pasteur.fr Sat Dec 6 05:12:37 2003 From: letondal at pasteur.fr (Catherine Letondal) Date: Sat Dec 6 05:19:02 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg>; from vesko_baev@abv.bg on Fri, Dec 05, 2003 at 10:22:53PM +0200 References: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg> Message-ID: <20031206111237.A272269@electre.pasteur.fr> On Fri, Dec 05, 2003 at 10:22:53PM +0200, Vesko Baev wrote: > Hi to all, > if anyone knows a module or external program (which can be linked to bioperl) for folding a RNA predicting hairpins and calculating a free energy? > > Thanks to ALL! > > Vesselin Baev > Bulgaria > You can run a remote rnafold by using the rnafold Pise/bioperl client. See: http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/PiseApplication/rnafold.html for the available parameters. See: http://www.pasteur.fr/recherche/unites/sis/Pise/#pisebioperl for documentation. See an example of use of this module here: http://www.pasteur.fr/recherche/unites/sis/Pise/bioperl-examples/ I don't know how it can be connected to the parser, but I would be interested if you have an example (for inclusion in the example code). -- Catherine Letondal -- Pasteur Institute Computing Center From morten at binf.ku.dk Sat Dec 6 11:04:45 2003 From: morten at binf.ku.dk (Morten Lindow) Date: Sat Dec 6 11:10:58 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: <20031206111237.A272269@electre.pasteur.fr> Message-ID: <000001c3bc12$aab35ae0$6701a8c0@futkarl> On Fri, Dec 05, 2003 at 10:22:53PM +0200, Vesko Baev wrote: > Hi to all, > if anyone knows a module or external program (which can be linked to >bioperl) for folding a RNA predicting hairpins and calculating a free >energy? 
Yes, the Vienna RNA package has a Perl wrapper, which among other things allows you to write code like this:

use RNA;
my ($structure, $minimum_free_energy) = RNA::fold('auccuaacuggucuuagg');

--
Morten Lindow

From cjfields at uiuc.edu Sat Dec 6 14:48:45 2003
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat Dec 6 22:13:01 2003
Subject: [Bioperl-l] RNA fold
In-Reply-To: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg>
References: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg>
Message-ID: <341007BC-2825-11D8-B7AC-000A9568B714@uiuc.edu>

I think, like the rest, that RNAFold may be the easiest way to go. mfold is a free program, but its distribution is bound up by licensing issues (I have it but can't redistribute it due to this; the web interfaces available have some limitations which I couldn't do without). RNAFold doesn't have these problems and the source code is available on the web, plus (as Jason pointed out) there are perl interfaces. There is also something in the book Genomic Perl on calculating energies and drawing secondary structures, but I haven't checked it out in detail.

Personally, I am working on a bioperl parser for the RNAmotif program suite (used to search for conserved secondary structures based on a descriptor). The rnamotif program is able to pass the motif hits to efn or efn2 for calculating free energy (based on different energy rules) and can output CT format files. I'm also thinking about doing something similar for tRNAscan-SE and ERPIN at some point. The problem I'm running into is how to store the secondary structure output for inclusion into GFF databases (I'm currently using Bio::SeqFeature::Generic for storing features). Anyone?

Chris Fields
Postdoctoral Researcher - Dept.
of Biochemistry University of Illinois at Urbana-Champaign On Dec 5, 2003, at 2:22 PM, Vesko Baev wrote: > Hi to all, > if anyone knows a module or external program (which can be linked to > bioperl) for folding a RNA predicting hairpins and calculating a free > energy? > > Thanks to ALL! > > Vesselin Baev > Bulgaria > > ----------------------------------------------------------------- > http://www.pari.bg - ???????? ?? ???? ????? ??? ? > ?? ?????????? ?? ???????! > ?????????? ??! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2559 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031206/396c24d9/attachment.bin From sanjib at uchicago.edu Sat Dec 6 21:50:24 2003 From: sanjib at uchicago.edu (Sanjib Dutta) Date: Sat Dec 6 22:13:16 2003 Subject: [Bioperl-l] indexing nr database Message-ID: <000e01c3bc6c$ddc2d910$f223fea9@Sanjib> Hi , I am trying to index the nr database (which I downloaded from the ncbi website in Fasta format) from the blast website using Bio::Index::Fasta and its making two files nr.pag and nr.dir which are in some strange format. Can anybody please tell me if there is a way to deal with this? Finally I want to get several protein sequences for feeding into ClustalW using the gi identification, and I need an index file for that. 
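One possible way to approach the indexing question above, as a hedged sketch rather than a tested recipe: the .pag/.dir pair is most likely the index itself (a DBM database written by Perl), so it is not meant to be read directly but reopened through Bio::Index::Fasta. The script name, file names and the gi_from_header helper below are made up for illustration; it also assumes NCBI-style headers beginning with ">gi|NUMBER|" and indexes only the first gi on each header line.

```perl
use strict;
use warnings;
use File::Spec;
use Bio::Index::Fasta;
use Bio::SeqIO;

# Hypothetical usage:
#   perl gi_index.pl build nr.idx /path/to/nr
#   perl gi_index.pl fetch nr.idx 12345 67890 > for_clustalw.fa

# Extract the gi number from a header such as ">gi|12345|ref|...".
# Returns undef when the header does not start with a gi identifier.
sub gi_from_header {
    my ($header) = @_;
    return $header =~ /^>\s*gi\|(\d+)/ ? $1 : undef;
}

my ($mode, $index_file, @rest) = @ARGV;
$mode = '' unless defined $mode;

if ($mode eq 'build' && @rest) {
    # Write a new index keyed by gi number instead of the full first word.
    my $inx = Bio::Index::Fasta->new(-filename => $index_file, -write_flag => 1);
    $inx->id_parser(\&gi_from_header);
    $inx->make_index(map { File::Spec->rel2abs($_) } @rest);
}
elsif ($mode eq 'fetch' && @rest) {
    # Reopen the existing index read-only and write the hits as FASTA.
    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);
    for my $gi (@rest) {
        my $seq = eval { $inx->fetch($gi) };
        if ($seq) { $out->write_seq($seq) }
        else      { warn "gi $gi not found in index\n" }
    }
}
else {
    warn "usage: $0 build INDEX FASTA...  |  $0 fetch INDEX GI...\n";
}
```

The FASTA file written by the fetch step could then be handed to ClustalW. Building the index is needed only once per database file.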
Thanks a lot Sanjib From pblaiklo at 110.net Sun Dec 7 22:32:52 2003 From: pblaiklo at 110.net (Peter Blaiklock) Date: Sun Dec 7 22:39:26 2003 Subject: [Bioperl-l] Bio::Restriction::Analysis feature merge In-Reply-To: <200312021950.51773.pblaiklo@110.net> References: <200311211807.hALI7e12028431@pub.open-bio.org> <200312021950.51773.pblaiklo@110.net> Message-ID: <200312072232.52575.pblaiklo@110.net> Hi I modified my new Bio::Restriction::Analysis module to incorporate Rob's changes, including the use of index to locate nonambiguous sites and a digestion method that can handle multiple enzymes. Please let me know about any problems, suggestions, bugs, etc. I should have tests ready in a day or two. http://www.restrictionmapper.org/bioperl/Analysis.pm http://www.restrictionmapper.org/bioperl/Enzyme.pm http://www.restrictionmapper.org/bioperl/base.pm Peter Blaiklock From heikki at nildram.co.uk Mon Dec 8 09:08:16 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Mon Dec 8 09:14:26 2003 Subject: [Bioperl-l] Re: I want to construct BioC#. Can you help me? In-Reply-To: References: Message-ID: <200312081408.16041.heikki@nildram.co.uk> Dear Feng Yu, I'll cc this to the bioperl mailing list as your question really is far beyond my capabilities to answer. Bioperl has been built around sequence classes with interface files determining ways to have sequence objects in varying complexity. These are created by SeqIO modules; one of which exists for each different text format. There is a well written HOWTO document explaining SeqIO in bioperl. Most of the Bioperl have grown organically around these sequence objects. See the models directory in the distribution for some rough graphs. There have been some expressions of interests to rewrite everything from scratch which is an other reason to post this to the list. The third reason is that these kind of projects need a community to survive. 
It has to be run in very open way so that new people can continuously join in and contribute to it - especially once the people who started it move on. That principle leads us to a really big problem that I can see your project: The C# programming language is not open source and it is bound to closed operating system. For a new programming language to take hold is has to have some quality which makes people want to start using it. Most of bioinformatics is currently done using *NIX operating systems. I can not see any reason why MSWindows would quickly become the OS of choice for bioinformatics. Quite the contrary as a matter of fact. Good luck with your project. Yours, -Heikki On Monday 08 Dec 2003 12:49 pm, you wrote: > Hi Heikki Lehvaslaiho: > I know you are one of the main contributors to BioPrel. I'm very > interested in bioinformatics, too. Although there are biojava and bioperl > now, I think we should have libraries for biological applications written > by C, the most important computer language, also. So I'm determined to > advocate the BioC# project, using the latest and most powerful language C#, > an extensive and enhanced variant of C. Yet since computer has so many > applications in bioinformatics, with so many classes to be constructed, I'm > absolutely puzzled. You have made a perfect work in bioPerl. So, can you > put me out with your experience? Tell me how to design so many classes and > clarify their relations. I'm waiting for your reply. > Thank > you! > Yours Sincerely > Feng Yu > 12/08/2003 > > ¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡Feng Yu > ¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡yfbio@hotmail.com > ¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡2003-12-08 > > http://www.biosino.org/members/bioinformatics/ -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From p.lord at russet.org.uk Mon Dec 8 09:38:17 2003 From: p.lord at russet.org.uk (Phillip Lord) Date: Mon Dec 8 09:44:25 2003 Subject: [Bioperl-l] Re: I want to construct BioC#. Can you help me? In-Reply-To: <200312081408.16041.heikki@nildram.co.uk> References: <200312081408.16041.heikki@nildram.co.uk> Message-ID: >>>>> "Heikki" == Heikki Lehvaslaiho writes: Heikki> That principle leads us to a really big problem that I can Heikki> see your project: The C# programming language is not open Heikki> source and it is bound to closed operating system. For a new Heikki> programming language to take hold is has to have some Heikki> quality which makes people want to start using it. Most of Heikki> bioinformatics is currently done using *NIX operating Heikki> systems. I can not see any reason why MSWindows would Heikki> quickly become the OS of choice for bioinformatics. Quite Heikki> the contrary as a matter of fact. I think that you are confusing C# and .Net. I've not had any experience of this myself, but a good friend of mine does all of his development in C# on windows (because he likes the development environment), and then runs them all on linux. Mono, the GCC front end for C# works perfectly well I'm told. The practical upshot of all this is that C# is more open source than Java. Of course what bioC# would add to the situation that bioperl/java/python/add other language here, does not already provide I'm less sure of. Right back to lurking. Phil From james.wasmuth at ed.ac.uk Mon Dec 8 12:17:14 2003 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Mon Dec 8 12:30:53 2003 Subject: [Bioperl-l] Getting a Gene Object Message-ID: <3FD4B21A.1080501@ed.ac.uk> Dear All, I am trying to extract the CDS sequence from EBML files. 
I've got so far as pulling back the features from the EMBL file, but noticed that $feat->seq->seq gives me the sequence from the start to the end of the feature, ignoring the presence of introns.

SeqFeature::Gene seems to provide me with what I want with @exons=$gene->exons. But how do I go from the EMBL file to a SeqFeature::Gene::GeneStructure object? I can't seem to see where the required object is returned from...

Many thanks

james

--
"I have not failed. I've just found 10,000 ways that don't work."
--- Thomas Edison

Nematode Bioinformatics
Blaxter Nematode Genomics Group
School of Biological Sciences
Ashworth Laboratories
King's Buildings
University of Edinburgh
Edinburgh
EH9 3JT
UK

+44 (0) 131 650 7403

http://www.nematodes.org

From jason at cgt.duhs.duke.edu Mon Dec 8 12:38:12 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Dec 8 12:44:24 2003
Subject: [Bioperl-l] Getting a Gene Object
In-Reply-To: <3FD4B21A.1080501@ed.ac.uk>
References: <3FD4B21A.1080501@ed.ac.uk>
Message-ID:

On Mon, 8 Dec 2003, James Wasmuth wrote:

> Dear All,
>
> I am trying to extract the CDS sequence from EBML files. I've got so
> far as pulling back the features from the EMBL file, but noticed that
> $feat->seq->seq give me the sequence from the start to the end of the
> feature ignoring the presence of introns.
>

$feat->seq returns the part of the underlying sequence that the feature spans, from its start to its end; $feat->spliced_seq gives you the sequence defined by the location in $feat, spliced together.

> SeqFeature::Gene seems to provide me with what I want with
> @exons=$gene->exons. But how do I go from the EMBL file to a
> SeqFeature::Gene::GeneStructure object? I can't seem to see where the
> required object is returned from...

There is currently no way in bioperl to make this happen automatically because it is a bit of a hard problem to do it correctly all the time.
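To make the difference between seq and spliced_seq concrete, here is a small sketch on a made-up 18-base sequence with a two-exon CDS built in memory (in practice the feature would come from the EMBL entry via Bio::SeqIO). The toy sequence and coordinates are invented for illustration only.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Split;
use Bio::Location::Simple;

# Toy "gene": exon 1 = bases 1-6, intron = bases 7-12, exon 2 = bases 13-18.
my $seq = Bio::Seq->new(-id => 'toy', -seq => 'ATGAAAGTAAGTCCCTAG');

# A split location plays the role of an EMBL join(1..6,13..18).
my $location = Bio::Location::Split->new;
$location->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 6, -strand => 1));
$location->add_sub_Location(
    Bio::Location::Simple->new(-start => 13, -end => 18, -strand => 1));

my $cds = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS');
$cds->location($location);
$seq->add_SeqFeature($cds);    # attaches the sequence to the feature

# seq: everything from feature start to feature end, intron included.
my $span = $cds->seq->seq;

# spliced_seq: only the sub-locations, joined in order.
my $spliced = $cds->spliced_seq->seq;

print "span:    $span\n";
print "spliced: $spliced\n";
```

With a feature read from a file, the same two calls apply to each CDS returned by the sequence object's feature list.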
Some say it requires an ontology (enter Sequence Ontology [SO/SOFA]) to map the EMBL/GenBank annotations into objects which have semantics like exon/intron from CDS.

Chris Mungall has put some effort into Bio::SeqFeature::Tools::Unflattener to begin to achieve this. However, the next step is to take the unflattened objects and build SeqFeature::Gene objects (where appropriate) from this proper hierarchy of gene, mRNA, exon.

A simple way is to use the Unflattener to produce GFF3-compliant features, load these into Bio::DB::GFF and use Lincoln's aggregators to get out the genes. I'm not sure if GFF3 (>2 level hierarchies) is completely supported there yet, though.

All of this is a bit bleeding edge, so you have to follow up some of the past mailing list traffic to get the whole picture and/or bug people for a project's status.

Best,
-jason

>
> Many thanks
>
> james
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From birney at ebi.ac.uk Mon Dec 8 12:48:23 2003
From: birney at ebi.ac.uk (Ewan Birney)
Date: Mon Dec 8 12:54:42 2003
Subject: [Bioperl-l] Getting a Gene Object
In-Reply-To: <3FD4B21A.1080501@ed.ac.uk>
Message-ID:

On Mon, 8 Dec 2003, James Wasmuth wrote:

> Dear All,
>
> I am trying to extract the CDS sequence from EBML files. I've got so
> far as pulling back the features from the EMBL file, but noticed that
> $feat->seq->seq give me the sequence from the start to the end of the
> feature ignoring the presence of introns.

You can use $feat->spliced_seq() to get the sequences and then potentially translate them, and of course you can pick up the translated sequence as a tag from $feat with each_tag_value('translation').

spliced_seq() is your friend here.

>
> SeqFeature::Gene seems to provide me with what I want with
> @exons=$gene->exons. But how do I go from the EMBL file to a
> SeqFeature::Gene::GeneStructure object? I can't seem to see where the
> required object is returned from...
>

I am not sure if anyone has written this.
Solving the general case of "set of EMBL features" --> "Gene Structure" is surprisingly hard as (for example) the mRNA and CDS lines can be subtly different, leading to "interesting" semantic decisions about what to do... Of course, a 90% decent method is probably relatively easy. Contributions welcome ;). > Many thanks > > james > > -- > "I have not failed. I've just found 10,000 ways that don't work." > --- Thomas Edison > > Nematode Bioinformatics > Blaxter Nematode Genomics Group > School of Biological Sciences > Ashworth Laboratories > King's Buildings > University of Edinburgh > Edinburgh > EH9 3JT > UK > > +44 (0) 131 650 7403 > > http://www.nematodes.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Mon Dec 8 13:06:43 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Dec 8 13:13:09 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: <341007BC-2825-11D8-B7AC-000A9568B714@uiuc.edu> References: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg> <341007BC-2825-11D8-B7AC-000A9568B714@uiuc.edu> Message-ID: On Sat, 6 Dec 2003, Chris Fields wrote: > I think, like the rest, that RNAFold may be the easiest way to go. > mfold is a free program but distribution is bound up by licensing > issues (I have it but can't redistribute it due to this; the web > interfaces available have some limitations which I couldn't do > without). RNAFold doesn't have these problems and the source code is > available on the web, plus (like Jason pointed out) there are perl > interfaces. There is also something in the book Genomic Perl on > calculating energies and drawing secondary structures, but I haven't > checked it out in detail. > > Personally, I am working on a bioperl parser for the RNAmotif program > suite (used to search for conserved secondary structures based on a > descriptor). 
The rnamotif program is able to pass the motif hits to
> efn or efn2 for calculating free energy (based on different energy
> rules) and can output CT format files. I'm also thinking about doing
> something similar for tRNAscan-SE and ERPIN at some point. The problem
> I'm running into is how to store the secondary structure output for
> inclusion into GFF databases (I'm currently using
> Bio::SeqFeature::Generic for storing features). Anyone?

Chris - I assume the structure is represented as a string like
<<<<...>>>> or ((((...)))) ?
If you do
$feat->add_tag_value('secondary_structure',$str);

this should store okay in a DB::GFF db, or is that not really working for you?

There are some newish bioperl objects, Seq::Meta, which are for representing some bit of information about each base - maybe this is the place where RNA or protein secondary structure information can be coded. I'm not sure what the best way to store these data is - Heikki and others have mostly worked on them, so I can only hand wave at this point.

I'm not sure what type of computing you want to do on the data; depending on what you want to do, it might dictate creating/using different objects. I.e. if you wanted to get the residues of the stems, I think you might want to build a special object which can represent the pairing after parsing it out of the structure string.

-jason

>
> Chris Fields
> Postdoctoral Reseacher - Dept. of Biochemistry
> University of Illinois at Urbana-Champaign
>
> On Dec 5, 2003, at 2:22 PM, Vesko Baev wrote:
>
> > Hi to all,
> > if anyone knows a module or external program (which can be linked to
> > bioperl) for folding a RNA predicting hairpins and calculating a free
> > energy?
> >
> > Thanks to ALL!
> >
> > Vesselin Baev
> > Bulgaria
> >
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From ymc at paxil.stanford.edu Mon Dec 8 14:44:13 2003 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Mon Dec 8 14:56:58 2003 Subject: [Bioperl-l] Bioperl parser for PolyPhred? Message-ID: Hi people My boss asked me to parse output generated by SNP identification program called PolyPhred (http://droog.mbt.washington.edu/PolyPhred.html). Is there a parser already in bioperl that can save me the pain of writing one? Thanks a lot. Yee Man From heikki at nildram.co.uk Mon Dec 8 15:18:54 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Mon Dec 8 15:51:13 2003 Subject: [Bioperl-l] Bioperl-run Developer snapshot 1.3.01 Message-ID: <200312082018.54358.heikki@nildram.co.uk> Bioperl-run Developer snapshot 1.3.01 ------------------------------------- This is a developer release 1.3.01 of Bioperl-run which contains wrappers for external programs. The stable one will be released together with bioperl core 1.4. http://bioperl.org/DIST/current_run_unstable.tar.gz http://bioperl.org/DIST/bioperl-run-devel-1.3.01.tar.gz Highlights: New wrappers for primer3 and hmmer. There is now a workflow modulue for Pise. Critical fixes for the following wrappers: Neighbor (PHYLIP), Vista, mfold (support for new v. 3.1.2) Enjoy, -Heikki on behalf of the Bioperl core team NEW FILES: + Bio::Tools::Run::Primer3 - Porting Rob Edwards extension of Primer3 wrapper and parser + Bio::Tools::Run::Hmmer - Hmmer wrapper factory, currently supports hmmsearch, hmmalign,hmmpfam, hmmbuild and hmmcalibrate + t/Hmmer.t - test for Hmmer + Bio::Tools::Run::PiseWorkflow - PiseWorkflow is a module for building and running a workflow on top of PiseApplication instances. 
CHANGED FILES: * Bio::Tools::Run::Phylo::Phylip::Neighbor - properly get the hash value * Bio::Tools::Run::Phylo::Molphy::ProtML - align the code * Bio::Tools::Run::Vista - Added java param to allow setting of java parameters, for example to set higher heap sizes.. - Major bug, need to project aligmnents, basically need to remove position in pairwise aligmnet if both are gap characters, else vista will count wrongly * Bio::Tools::Run::Alignment::Lagan - Allowing of setting of outfile - can't declare and expect a variable at the same time * Bio::Tools::Run::FootPrinter - fix stupidity with tempfilehandle and outfh not being closed properly as reported by Ben Westover * Bio::Tools::Run::EMBOSSApplication - no need for this debugging * Bio::Tools::Run::PiseApplication::mfold - Pise/bioperl module for a recent version of mfold (3.1.2). * Bio::Tools::Run::Phylo::Phylip::Consense - typo * Bio::Tools::Run::Phylo::PAML::Codeml - fix typo reported in bug #1505 - default should be an arrayref - George Hartzell fixes * Bio::Tools::Run::Phylo::PAML::Yn00 - bug #1507, SYNOPSIS is runnable now with API changes - bug #1507 add MSwin .exe extension where appropriate - wait -this isn't necessary , we already pad the PROGRAMNAME with the .exe when appropriate - George Hartzell fixes * Bio::Tools::Run::FootPrinter - POD fixes * Bio::Tools::Run::Alignment::Blat - well, at least it runs the blat program now, although it isn't seemingly able to find the output tempfile to retrieve the results * t/Blat.t - well, at least it runs the blat program now, although it isn't seemingly able to find the output tempfile to retrieve the results - various fixes to get blat runnable in line with bioperl-live 1.2.3 all tests now pass. 
* t/Mdust.t - typo * Bio::Tools::Run::Mdust - use save_tempfiles, use PROGRAMNAME as a variable, use tempfile instead of randomly generating filename - Modified call to io->tmpfile to get file name as well as handle * Makefile.PL - George Hartzell fixes * scripts/panalysis.PLS - Patched, thank you Albert * AUTHORS - updated using info from maintanence/authors.pl script From cjfields at uiuc.edu Mon Dec 8 16:12:20 2003 From: cjfields at uiuc.edu (Chris Fields) Date: Mon Dec 8 16:18:37 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: References: <1208941531.1070655773114.JavaMail.nobody@app2.ni.bg> <341007BC-2825-11D8-B7AC-000A9568B714@uiuc.edu> Message-ID: <1070917939.11116.66.camel@chrisfields.life.uiuc.edu> On Mon, 2003-12-08 at 12:06, Jason Stajich wrote: > On Sat, 6 Dec 2003, Chris Fields wrote: > > > I think, like the rest, that RNAFold may be the easiest way to go. > > mfold is a free program but distribution is bound up by licensing > > issues (I have it but can't redistribute it due to this; the web > > interfaces available have some limitations which I couldn't do > > without). RNAFold doesn't have these problems and the source code is > > available on the web, plus (like Jason pointed out) there are perl > > interfaces. There is also something in the book Genomic Perl on > > calculating energies and drawing secondary structures, but I haven't > > checked it out in detail. > > > > Personally, I am working on a bioperl parser for the RNAmotif program > > suite (used to search for conserved secondary structures based on a > > descriptor). The rnamotif program is able to pass the motif hits to > > efn or efn2 for calculating free energy (based on different energy > > rules) and can output CT format files. I'm also thinking about doing > > something similar for tRNAscan-SE and ERPIN at some point. 
The problem
> > I'm running into is how to store the secondary structure output for
> > inclusion into GFF databases (I'm currently using
> > Bio::SeqFeature::Generic for storing features). Anyone?
>
> Chris - I assume the structure is represented as string like
> <<<...>>>> or ((((...)))) ?
> If you do
> $feat->add_tag_value('secondary_structure',$str);
>
> This should store okay in a DB::GFF db or is that not really working for
> you?

I think that would work. I will have to do some fiddling with the program output to get it into that format. One problem is that RNAmotif allows mismatches in some of the segments. RNAmotif's raw output is a bit like FASTA. Here's a bit from one of my analyses (the PyrR mRNA-binding site in Bacillus subtilis, run from the GenBank file):

#RM scored
#RM descr h5(tag='H1') ss(tag='S1') h5(tag='H2') h5(tag='H2t') ss(tag='S2') h3(tag='H2t') h3(tag='H2') ss(tag='S3') h3(tag='H1')
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -6.300 0 1617567 35 attctt taaaa cagt c cagaga g gctg ag aaggat
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -8.000 0 1617567 35 attcttt aaaa cagt c cagaga g gctg a gaaggat
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -5.200 0 1617568 33 ttctt taaaa cagt c cagaga g gctg ag aagga
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -6.900 0 1617568 33 ttcttt aaaa cagt c cagaga g gctg a gaagga
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -0.400 0 1617568 32 ttcttt aaaa cagt c cagaga g gctg . agaagg
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -7.200 0 1617569 32 tcttt aaaa cagt c cagaga g gctg ag aagga
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -3.900 0 1617569 31 tctt taaaa cagt c cagaga g gctg ag aagg
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -5.600 0 1617569 31 tcttt aaaa cagt c cagaga g gctg a gaagg
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -4.800 0 1617570 30 cttt aaaa cagt c cagaga g gctg ag aagg
>gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, complete genome.
gi|16077068|gb|NC_000964|NC_000964 -4.100 0 1617570 29 cttt aaaa cagt c cagaga g gctg a gaag
....

The first two lines (marked with #RM) are the initialization line and a bit from the descriptor file (describing the secondary structural characteristics). The different segments of the structure are given a designation (ss = single stranded, etc.) and a tag (any name, although I use simple ones). The tags help when describing more complex structures by allowing for pairing between distant sites and higher level interactions (pseudoknots and tertiary and quaternary structures, although I haven't needed these).

The output is like FASTA, but the sequence data is replaced by the database hit (usually the acc. #), score (in this case, free energy), strand of hit, start of hit, length of hit and the sequence itself, broken up into segments matching the elements in the descriptor. This is where the trouble lies: as RNAmotif allows for mismatches in the descriptor (to allow for internal bulges), the parser for the sequence elements will need to be intelligent enough to pick this out. Also note that the data hits are redundant (they are retained b/c they fall below a predetermined threshold from the calculated free energy, determined in the descriptor file).
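As a rough illustration of the layout just described, one hit line could be split into its fields along these lines. This is only a sketch under assumptions: the fields are taken to be whitespace-separated in the order name, score, strand, start, length, segments, and a lone "." is taken to mean a descriptor element that matched no bases (inferred from the length column in the sample output above, not from the RNAmotif documentation). parse_rnamotif_hit is a made-up name, not an existing bioperl method.

```perl
use strict;
use warnings;

# Split one RNAmotif hit line into a hash reference.
# Returns nothing if the line does not have the expected minimum of fields.
sub parse_rnamotif_hit {
    my ($line) = @_;
    my ($name, $score, $strand, $start, $length, @segments) = split ' ', $line;
    return unless defined $length && @segments;

    # Assumption: "." marks an element with zero matched bases.
    my @bases = map { $_ eq '.' ? '' : $_ } @segments;

    return {
        name     => $name,
        score    => $score + 0,     # free energy in this run
        strand   => $strand,
        start    => $start,
        length   => $length,
        segments => \@bases,        # one entry per descriptor element
        sequence => join('', @bases),
    };
}

my $line = 'gi|16077068|gb|NC_000964|NC_000964 -7.200 0 1617569 32 '
         . 'tcttt aaaa cagt c cagaga g gctg ag aagga';
my $hit = parse_rnamotif_hit($line);

printf "%s score=%.3f start=%d length=%d elements=%d\n",
    $hit->{name}, $hit->{score}, $hit->{start}, $hit->{length},
    scalar @{ $hit->{segments} };
```

Keeping the segments as a list, in descriptor order, means each one can later be matched to its tag (H1, S1, ...) from the "#RM descr" line, which is one way the structural information could be carried along without loss.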
I plan on including a parser to clean this up (retain the best score of a fold located within a certain sequence range, probably less than 10 bp). There's a program in the RNAmotif suite to do this (rmprune), but it doesn't always "prune" to the best sequence hit. > There are some newish bioperl objects Seq::Meta which are for representing > some bit of information about each base - maybe this is the place RNA or > Protein secondary structure information can be coded. > I'm not sure of what is best way to store these data - Heikki and others > have mostly worked on them so I can only hand wave at this point. > > > I'm not sure what type of computing you want to do on the data, depending > on what you want to do, might dictate creating/using different objects. > i.e. if you wanted to get the residues of the stems I think you might want > to build a special object which can represent the pairing after parsing it > out of the structure string. My main use for this is to map these database hits against the sequence using Gbrowse. I would like to add a Gbrowse plugin to link to some sort of secondary structure output, maybe from the Vienna package to represent the secondary structure (if using the parenthetical notation). I can also get CT format output from another program in the RNAmotif suite (rm2ct), so changing formats shouldn't be too hard but does require passing the output file through rm2ct. My main concern is getting the data into some format that could retain structural information that would prevent informational loss. > -jason > > > > > Chris Fields > > Postdoctoral Reseacher - Dept. of Biochemistry > > University of Illinois at Urbana-Champaign > > > > On Dec 5, 2003, at 2:22 PM, Vesko Baev wrote: > > > > > Hi to all, > > > if anyone knows a module or external program (which can be linked to > > > bioperl) for folding a RNA predicting hairpins and calculating a free > > > energy? > > > > > > Thanks to ALL! 
> > >
> > > Vesselin Baev
> > > Bulgaria
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Christopher Fields
Lab of Dr. Robert Switzer
Dept. of Biochemistry
University of Illinois at Urbana-Champaign

From cmiltonperl at yahoo.com Mon Dec 8 16:33:15 2003
From: cmiltonperl at yahoo.com (Christopher Milton)
Date: Mon Dec 8 16:39:21 2003
Subject: [Bioperl-l] Bioperl parser for PolyPhred?
In-Reply-To:
Message-ID: <20031208213315.51738.qmail@web20811.mail.yahoo.com>

Yee,
--- Yee Man Chan wrote:
> Hi people
>
> My boss asked me to parse output generated by SNP identification
> program called PolyPhred (http://droog.mbt.washington.edu/PolyPhred.html).
> Is there a parser already in bioperl that can save me the pain of writing
> one?

I use Search.CPAN.org to check for (Bio)Perl modules:
http://search.cpan.org/

http://search.cpan.org/search?query=phred&mode=all
searching for "phred" got me
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Assembly/IO/phrap.pm

which is part of the standard BioPerl distribution.

Chris Milton

From heikki at nildram.co.uk Mon Dec 8 15:18:37 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Mon Dec 8 16:41:09 2003
Subject: [Bioperl-l] Bioperl-ext Developer snapshot 1.3.01
Message-ID: <200312082018.37975.heikki@nildram.co.uk>

Bioperl-ext Developer snapshot 1.3.01
-------------------------------------

This is a developer release 1.3.01 of Bioperl-Ext which contains C compiled Extensions.
The stable one will be released together with bioperl core 1.4.

http://bioperl.org/DIST/current_ext_unstable.tar.gz
http://bioperl.org/DIST/bioperl-ext-devel-1.3.01.tar.gz

Changes since 0.6
-----------------

Yee Man Chan has written alternative pairwise alignment code into Bio/Ext/Align which is accessed by Bio::Tools::dpAlign in bioperl-live. Read the README for details.

Enjoy,
-Heikki
on behalf of the Bioperl core team

NEW FILES
=========

* Bio/Ext/Align/libs/{dpalign.c dpalign.h linspc.c}
  incorporate Yee Man Chan's alternative alignment code

CHANGES
=======

* README
  updated to include info about dpAlign and common io_lib problems
* Bio::Ext::Align {Align.xs Makefile.PL typemap}
  incorporate Yee Man Chan's alternative alignment code

From ymc at paxil.stanford.edu Mon Dec 8 16:37:55 2003
From: ymc at paxil.stanford.edu (Yee Man Chan)
Date: Mon Dec 8 16:43:59 2003
Subject: [Bioperl-l] Bioperl parser for PolyPhred?
In-Reply-To: <20031208213315.51738.qmail@web20811.mail.yahoo.com>
Message-ID:

Thanks for your reply, Chris. I am aware of this phrap module. However, it only parses phred output but not polyphred output. Phred is a base calling program whereas PolyPhred is a SNP identification program. They are quite different in terms of purpose. However, their output does share a very similar structure. If there is no such module, maybe I can write one and contribute it back to bioperl.

Regards,
Yee Man

On Mon, 8 Dec 2003, Christopher Milton wrote:

> Yee,
> --- Yee Man Chan wrote:
> > Hi people
> >
> > My boss asked me to parse output generated by SNP identification
> > program called PolyPhred (http://droog.mbt.washington.edu/PolyPhred.html).
> > Is there a parser already in bioperl that can save me the pain of writing
> > one?
> > I use Search.CPAN.org to check for (Bio)Perl modules: > http://search.cpan.org/ > > http://search.cpan.org/search?query=phred&mode=all > searching for "phred" got me > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Assembly/IO/phrap.pm > > which is part of the standard BioPerl distribution. > > Chris Milton > From jason at cgt.duhs.duke.edu Mon Dec 8 17:57:58 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Dec 8 18:04:07 2003 Subject: [Bioperl-l] Bioperl parser for PolyPhred? In-Reply-To: References: Message-ID: Yee Man - AFAIK no one has written one so you will probably have to roll your own. We would be delighted to have you contribute it to Bioperl. You might also look at PolyBayes as another potential automated SNP calling system that Gabor Marth, Ian Korf and others worked on which uses phred/phrap as part of the system. http://www.genome.wustl.edu/groups/informatics/software/polybayes/ -jason On Mon, 8 Dec 2003, Yee Man Chan wrote: > Thanks for your reply, Chris. I am aware of this phrap module. However, it > only parses phred output but not polyphred output. Phred is a base calling > program whereas PolyPhred is a SNP identification program. They are quite > different in terms of purposes. However, they do share very similar > structure in terms of output. If there is not such a module, maybe I can > write one and contribute it back to bioperl. > > Regards, > Yee Man > > On Mon, 8 Dec 2003, Christopher Milton wrote: > > > Yee, > > --- Yee Man Chan wrote: > > > Hi people > > > > > > My boss asked me to parse output generated by SNP identification > > > program called PolyPhred (http://droog.mbt.washington.edu/PolyPhred.html). > > > Is there a parser already in bioperl that can save me the pain of writing > > > one? 
> > > > I use Search.CPAN.org to check for (Bio)Perl modules: > > http://search.cpan.org/ > > > > http://search.cpan.org/search?query=phred&mode=all > > searching for "phred" got me > > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Assembly/IO/phrap.pm > > > > which is part of the standard BioPerl distribution. > > > > Chris Milton > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From hlapp at gmx.net Tue Dec 9 00:16:26 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Dec 9 00:22:34 2003 Subject: [Bioperl-l] Loading SwissProt Data into Oracle In-Reply-To: <00d201c3bafc$2b37ada0$8eae6d8c@sinica.edu.tw> Message-ID: Some very nice people have written up instructions in README and INSTALL documents that come with the biosql-schema download, and also with bioperl-db. Specifically check out README and INSTALL in the biosql-schema directory, as well as two files of the same name in the biosql-schema/sql/biosql-ora directory. As for the relational model, check out the ERD in biosql-schema/doc/biosql-ERD.pdf. Have you read those documents and tried to follow them step-by-step? If no, do so. If yes, and you encountered errors that prevent you from proceeding, post those errors here verbatim along with a description at which particular step you encountered the failure. -hilmar On Thursday, December 4, 2003, at 10:51 PM, kyliu wrote: > Hi! > > As title. Although I read the previous discussion mail. I still not > know how to use your tools to do it. Could you tell me step by step > 1. how to create the SwissPort database schema on Oracle > 2. how to load the SwissPort database > 3. 
how to query the data from the SwissPort database > > Sincerely, > Kawn-Yu Liu > Academia Sinica Computing Center, > The Republic of China,Taiwan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jbedell at oriongenomics.com Tue Dec 9 01:03:27 2003 From: jbedell at oriongenomics.com (Joseph Bedell) Date: Tue Dec 9 01:09:48 2003 Subject: [Bioperl-l] Blast return codes Message-ID: <434AF352F9D03C4C896782B8CC78BC7628D2D2@vader.oriongenomics.com> Hi Matthew, I didn't see that anyone answered this question for you. See my comments below. >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >bounces@portal.open-bio.org] On Behalf Of Matthew Laird >Sent: Friday, December 05, 2003 2:27 PM >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] Blast return codes > >This is on the fringe of being off-topic... but still relivant. :) > >I've been trying to google around to find documentation regarding return >codes Blast gives upon exit. Is there a list of return codes and >conditions that cause non-zero return codes? No, there is no such list. We recently wrote an O'Reilly book on BLAST but we did not track down the return codes. We'll probably do that for the next edition. ;) > >The reason I ask is I'm having a problem calling Blast from bioperl. For >some reason Blastall is returning a -1 exit code which of course causes >bioperl to throw an exception. However I have modified my local install >of bioperl to tell me the exact command that was being run. I've then run >this exact command with the exact same input files from the command line >and a standard 0 exit code is returned. 
Might anyone have any thoughts on
>why Blastall would be returning a -1 exit code?

I believe that all the return codes from blastall are positive integers. A -1 return code may signify a problem with the system call from Perl. The error messages are usually pretty comprehensive. Can you send the error message that's produced?

Thanks,
Joey

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Joseph A Bedell, Ph.D.
Director, Bioinformatics
Orion Genomics, LLC
4041 Forest Park Ave.
St. Louis, MO 63108
(314)615-6979; fax:(314)615-6975
http://www.oriongenomics.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From heikki at nildram.co.uk Tue Dec 9 05:23:56 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Tue Dec 9 06:04:05 2003
Subject: [Bioperl-l] How to proceed with 1.4?
Message-ID: <200312091023.56271.heikki@nildram.co.uk>

I have so far released three snapshots from the bioperl core/live cvs head. Things have settled down a bit, but there are still outstanding issues. Especially:

- restriction analysis fixes need to be merged and committed (Rob)
- SearchIO::psiblast & related module removal (Steve)
- really long qualifier names in sequence feature tables (Ewan?) #1561

I'd like to see these in before I release the next and hopefully last snapshot. Or would some like to see a snapshot out now?

Then there is the issue of other cvs modules closely tied to core. Ext is simple: there has been one major addition during the last six months, which is well documented and seems to work without problems. I can release that the same day as core.

Run is a bit more complicated. There are issues with
- newer version of EMBOSS, #1481
- TCoffee, #1453, #1557 (and #1510, #1514)

We need someone to look into these. It would be great to have them fixed this week so that we could have all three packages out before Christmas.
Any comments and contributions to that effect welcome, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From birney at ebi.ac.uk Tue Dec 9 06:15:20 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Dec 9 06:22:05 2003 Subject: [Bioperl-l] How to proceed with 1.4? In-Reply-To: <200312091023.56271.heikki@nildram.co.uk> Message-ID: On Tue, 9 Dec 2003, Heikki Lehvaslaiho wrote: > > I have so far relased three snapshots from the bioperl core/live cvs head. > Things have settled down a bit, but there are still outstanding issues. > Especially: > > - restriction analysis fixes need to merged and commited (Rob) > - SearchIO::psiblast & related module removal (Steve) > - really long qualifier names in sequence feature tables (Ewan?) #1561 > Ok. Going to do this now... > I'd like to see these in before I release the next and hopefully last snap > shot. Or would some like to see a snapshot out now? > > > Then there is the issue of other cvs modlues closely tied to core. Ext is > simle there have been one major addition during last six months which is well > documented and seems to work without problems. I can release that the same > day as core. > > Run is a bit more complicated. There are issues with > - newer version of EMBOSS, #1481 > - TCoffee, #1453, #1557 ( and #1510, #1514) > > We need someone to look into these. It would be great to have them fixed this > week so the we could have all three packages out before Christmas. 
> > Any comments and contributions to that effect welcome, > > -Heikki > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From sbaird at mgcheo3.med.uottawa.ca Tue Dec 9 09:45:07 2003 From: sbaird at mgcheo3.med.uottawa.ca (Stephen Baird) Date: Tue Dec 9 09:55:46 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: <1070917939.11116.66.camel@chrisfields.life.uiuc.edu> Message-ID: Dear hardworking guys, Sorry....but I am a little worried about the <<...>>>> format having trouble with pseudoknots and non-canonical base pairing...something that happens more often than is apparent by programs that predict RNA basepairing based on thermodynamics like MFOLD and the like. RNAmotifs which is doing pattern searching can accomodate all the weird things that might happen in a RNA structure, it is up to the user who designs the pattern. Mapping a simple < or > or . to each nucleotide might not be enough to work all the time. Is there a way to store to a base the specific nucleotide that it is basepairing to in a structural field? This would allow non-canonical basepairing and pseudoknots. There is a new RNA structure XML file format which is suppose to be a new standard...RNAML http://www-lbit.iro.umontreal.ca/rnaml/.... 
which will store the secondary and tertiary structural data. As RNA prediction and analysis develops more and more data will need to be added that is not just the basepairing of canonical bases. Stephen Baird Molecular Genetics Children's Hospital of Eastern Ontario Ottawa, Ontario Canada > On Mon, 2003-12-08 at 12:06, Jason Stajich wrote: > > On Sat, 6 Dec 2003, Chris Fields wrote: > > > > > I think, like the rest, that RNAFold may be the easiest way to go. > > > mfold is a free program but distribution is bound up by licensing > > > issues (I have it but can't redistribute it due to this; the web > > > interfaces available have some limitations which I couldn't do > > > without). RNAFold doesn't have these problems and the source code is > > > available on the web, plus (like Jason pointed out) there are perl > > > interfaces. There is also something in the book Genomic Perl on > > > calculating energies and drawing secondary structures, but I haven't > > > checked it out in detail. > > > > > > Personally, I am working on a bioperl parser for the RNAmotif program > > > suite (used to search for conserved secondary structures based on a > > > descriptor). The rnamotif program is able to pass the motif hits to > > > efn or efn2 for calculating free energy (based on different energy > > > rules) and can output CT format files. I'm also thinking about doing > > > something similar for tRNAscan-SE and ERPIN at some point. The problem > > > I'm running into is how to store the secondary structure output for > > > inclusion into GFF databases (I'm currently using > > > Bio::SeqFeature::Generic for storing features). Anyone? > > > > Chris - I assume the structure is represented as string like > > <<<...>>>> or ((((...)))) ? > > If you do > > $feat->add_tag_value('secondary_structure',$str); > > > > This should store okay in a DB::GFF db or is that not really working for > > you? > > I think that would work. 
I will have to do some fiddling with the > program output to get it into that format. One problem is taht RNAmotif > allows mismatches in some of the segments. > > RNAmotif's raw output is a bit like FASTA. Here's a bit from one of my > analyses (the PyrR mRNA-binding site in Bacillus subtilis, rub from the > Genbank file): > > #RM scored > #RM descr h5(tag='H1') ss(tag='S1') h5(tag='H2') h5(tag='H2t') > ss(tag='S2') h3(tag='H2t') h3(tag='H2') ss(tag='S3') h3(tag='H1') > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -6.300 0 1617567 35 attctt taaaa > cagt c cagaga g gctg ag aaggat > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -8.000 0 1617567 35 attcttt aaaa > cagt c cagaga g gctg a gaaggat > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -5.200 0 1617568 33 ttctt taaaa > cagt c cagaga g gctg ag aagga > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -6.900 0 1617568 33 ttcttt aaaa > cagt c cagaga g gctg a gaagga > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -0.400 0 1617568 32 ttcttt aaaa > cagt c cagaga g gctg . agaagg > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -7.200 0 1617569 32 tcttt aaaa > cagt c cagaga g gctg ag aagga > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -3.900 0 1617569 31 tctt taaaa > cagt c cagaga g gctg ag aagg > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. 
> gi|16077068|gb|NC_000964|NC_000964 -5.600 0 1617569 31 tcttt aaaa > cagt c cagaga g gctg a gaagg > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -4.800 0 1617570 30 cttt aaaa > cagt c cagaga g gctg ag aagg > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > complete genome. > gi|16077068|gb|NC_000964|NC_000964 -4.100 0 1617570 29 cttt aaaa > cagt c cagaga g gctg a gaag > > .... > > > The first two lines (marked with ##) are the initialization line and a > bit from the descriptor file (describing the secondary structural > characteristics). The different segments of the structure are given a > designation (ss=single stranded, etc) and a tag (any name, although I > use simple ones). The tags help when describing more complex structures > by allowing for pairing between distant sites and higher level > interactions (pseudoknots and tertiary and quaternary structures, > although I haven't needed these). The output is like fasta, but the > sequence data is replaced by the database hit (usually the acc. #), > score (in this case, free energy), strand of hit, start of hit, length > of hit and the sequence itself, broken up into segments matching the > elements in the descriptor. This is where the trouble lies; as RNAmotif > allows for mismatches in the descriptor (to allow for internal bulges), > the parser for the sequence elements will need to be intelligent enough > to pick this out. > > Also note that the data hits are redundant (they are retained b/c they > fall below a predetermined threshold from the calculated free energy, > determined in the descriptor file. I plan on including a parser to > clean this up (retain the best score of a fold located within a certain > sequence range, probably less than 10 bp). There's a program in the > RNAmotif suite to do this (rmprune), but it doesn't always "prune" to > the best sequence hit. 
> > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- > Christopher Fields > Lab of Dr. Robert Switzer > Dept. of Biochemistry > University of Illinois at Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Tue Dec 9 10:17:44 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 9 10:24:49 2003 Subject: [Bioperl-l] RNA fold In-Reply-To: References: Message-ID: On Tue, 9 Dec 2003, Stephen Baird wrote: > Dear hardworking guys, > Sorry....but I am a little worried about the <<...>>>> format having > trouble with pseudoknots and non-canonical base pairing...something that > happens more often than is apparent by programs that predict RNA > basepairing based on thermodynamics like MFOLD and the like. RNAmotifs > which is doing pattern searching can accomodate all the weird things that > might happen in a RNA structure, it is up to the user who designs the > pattern. > Mapping a simple < or > or . to each nucleotide might not be > enough to work all the time. Is there a way to store to a base the > specific nucleotide that it is basepairing to in a structural field? This > would allow non-canonical basepairing and pseudoknots. > There is a new RNA structure XML file format which is suppose to be a new > standard...RNAML http://www-lbit.iro.umontreal.ca/rnaml/.... which will > store the secondary and tertiary structural data. 
As RNA prediction and > analysis develops more and more data will need to be added that is not > just the basepairing of canonical bases. > Good point. Had heard that an XML format was on the way - this seems more intelligent system for storage without information loss - but of course it won't fit into the simple GFF system that Chris was thinking about. Probably means Chris would want to use GFF to store the representation of the genomic location of the RNAs but a separate CGI type script will do all the heavy lifting of getting an ID, looking up the structure representation, and generating the plots/summary info/etc. We really have no objects for RNA struture in Bioperl at this point so pretty much a blank slate for someone to exert their will... I would much rather see us move up the sophistication ladder here, but someone new has to be willing to take it on as a project. The afforementioned hard working guys will do our best to help in any way possible with design/programming issues but can't drive this beast. -jason > > Stephen Baird > Molecular Genetics > Children's Hospital of Eastern Ontario > Ottawa, Ontario > Canada > > > On Mon, 2003-12-08 at 12:06, Jason Stajich wrote: > > > On Sat, 6 Dec 2003, Chris Fields wrote: > > > > > > > I think, like the rest, that RNAFold may be the easiest way to go. > > > > mfold is a free program but distribution is bound up by licensing > > > > issues (I have it but can't redistribute it due to this; the web > > > > interfaces available have some limitations which I couldn't do > > > > without). RNAFold doesn't have these problems and the source code is > > > > available on the web, plus (like Jason pointed out) there are perl > > > > interfaces. There is also something in the book Genomic Perl on > > > > calculating energies and drawing secondary structures, but I haven't > > > > checked it out in detail. 
> > > > > > > > Personally, I am working on a bioperl parser for the RNAmotif program > > > > suite (used to search for conserved secondary structures based on a > > > > descriptor). The rnamotif program is able to pass the motif hits to > > > > efn or efn2 for calculating free energy (based on different energy > > > > rules) and can output CT format files. I'm also thinking about doing > > > > something similar for tRNAscan-SE and ERPIN at some point. The problem > > > > I'm running into is how to store the secondary structure output for > > > > inclusion into GFF databases (I'm currently using > > > > Bio::SeqFeature::Generic for storing features). Anyone? > > > > > > Chris - I assume the structure is represented as string like > > > <<<...>>>> or ((((...)))) ? > > > If you do > > > $feat->add_tag_value('secondary_structure',$str); > > > > > > This should store okay in a DB::GFF db or is that not really working for > > > you? > > > > I think that would work. I will have to do some fiddling with the > > program output to get it into that format. One problem is taht RNAmotif > > allows mismatches in some of the segments. > > > > RNAmotif's raw output is a bit like FASTA. Here's a bit from one of my > > analyses (the PyrR mRNA-binding site in Bacillus subtilis, rub from the > > Genbank file): > > > > #RM scored > > #RM descr h5(tag='H1') ss(tag='S1') h5(tag='H2') h5(tag='H2t') > > ss(tag='S2') h3(tag='H2t') h3(tag='H2') ss(tag='S3') h3(tag='H1') > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -6.300 0 1617567 35 attctt taaaa > > cagt c cagaga g gctg ag aaggat > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -8.000 0 1617567 35 attcttt aaaa > > cagt c cagaga g gctg a gaaggat > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. 
> > gi|16077068|gb|NC_000964|NC_000964 -5.200 0 1617568 33 ttctt taaaa > > cagt c cagaga g gctg ag aagga > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -6.900 0 1617568 33 ttcttt aaaa > > cagt c cagaga g gctg a gaagga > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -0.400 0 1617568 32 ttcttt aaaa > > cagt c cagaga g gctg . agaagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -7.200 0 1617569 32 tcttt aaaa > > cagt c cagaga g gctg ag aagga > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -3.900 0 1617569 31 tctt taaaa > > cagt c cagaga g gctg ag aagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -5.600 0 1617569 31 tcttt aaaa > > cagt c cagaga g gctg a gaagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -4.800 0 1617570 30 cttt aaaa > > cagt c cagaga g gctg ag aagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -4.100 0 1617570 29 cttt aaaa > > cagt c cagaga g gctg a gaag > > > > .... > > > > > > The first two lines (marked with ##) are the initialization line and a > > bit from the descriptor file (describing the secondary structural > > characteristics). The different segments of the structure are given a > > designation (ss=single stranded, etc) and a tag (any name, although I > > use simple ones). 
The tags help when describing more complex structures > > by allowing for pairing between distant sites and higher level > > interactions (pseudoknots and tertiary and quaternary structures, > > although I haven't needed these). The output is like fasta, but the > > sequence data is replaced by the database hit (usually the acc. #), > > score (in this case, free energy), strand of hit, start of hit, length > > of hit and the sequence itself, broken up into segments matching the > > elements in the descriptor. This is where the trouble lies; as RNAmotif > > allows for mismatches in the descriptor (to allow for internal bulges), > > the parser for the sequence elements will need to be intelligent enough > > to pick this out. > > > > Also note that the data hits are redundant (they are retained b/c they > > fall below a predetermined threshold from the calculated free energy, > > determined in the descriptor file. I plan on including a parser to > > clean this up (retain the best score of a fold located within a certain > > sequence range, probably less than 10 bp). There's a program in the > > RNAmotif suite to do this (rmprune), but it doesn't always "prune" to > > the best sequence hit. > > > > > There are some newish bioperl objects Seq::Meta which are for representing > > > some bit of information about each base - maybe this is the place RNA or > > > Protein secondary structure information can be coded. > > > I'm not sure of what is best way to store these data - Heikki and others > > > have mostly worked on them so I can only hand wave at this point. > > > > > > > > > I'm not sure what type of computing you want to do on the data, depending > > > on what you want to do, might dictate creating/using different objects. > > > i.e. if you wanted to get the residues of the stems I think you might want > > > to build a special object which can represent the pairing after parsing it > > > out of the structure string. 
> > > > My main use for this is to map these database hits against the sequence > > using Gbrowse. I would like to add a Gbrowse plugin to link to some > > sort of secondary structure output, maybe from the Vienna package to > > represent the secondary structure (if using the parenthetical > > notation). I can also get CT format output from another program in the > > RNAmotif suite (rm2ct), so changing formats shouldn't be too hard but > > does require passing the output file through rm2ct. My main concern is > > getting the data into some format that could retain structural > > information that would prevent informational loss. > > > > > -jason > > > > > > > > > > > Chris Fields > > > > Postdoctoral Reseacher - Dept. of Biochemistry > > > > University of Illinois at Urbana-Champaign > > > > > > > > On Dec 5, 2003, at 2:22 PM, Vesko Baev wrote: > > > > > > > > > Hi to all, > > > > > if anyone knows a module or external program (which can be linked to > > > > > bioperl) for folding a RNA predicting hairpins and calculating a free > > > > > energy? > > > > > > > > > > Thanks to ALL! > > > > > > > > > > Vesselin Baev > > > > > Bulgaria > > > > > > > > > > ----------------------------------------------------------------- > > > > > http://www.pari.bg - 篧ѧާڧѧ ݧ ç±§á§² ӧ֧ܧ Õ§Ö§ ? > > > > > 篧 ѧ٧ڧѧۧ ß§ ܧާ֧! > > > > > 硧ҧߧڧѧۧ ! > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@portal.open-bio.org > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > -- > > > Jason Stajich > > > Duke University > > > jason at cgt.mc.duke.edu > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > > Christopher Fields > > Lab of Dr. Robert Switzer > > Dept. 
> > of Biochemistry
> > University of Illinois at Urbana-Champaign
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From cjfields at uiuc.edu Tue Dec 9 11:29:14 2003
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Dec 9 11:35:23 2003
Subject: [Bioperl-l] RNA fold
In-Reply-To:
References:
Message-ID: <1070987352.3834.71.camel@chrisfields.life.uiuc.edu>

I think that you can use parenthetical formats for pseudoknot-like
structures (improperly nested Watson-Crick helices). The idea is that
() would represent secondary structure, and other brackets {}[] would
represent higher-order structures, like so:

     Helix        Pseudoknot
______________ _______________
|            | |             |
(((((....))))).[[[...((((..]]]..))))
                      |___________|

Of course this is where the problem lies, b/c all structures in this
format are constrained to simple 1:1 base associations, such as simple
Watson-Crick base pairs or noncanonical base pairs (A-G, G-U, etc).
Some higher order structures, like triple-helices (A:U:U) and quaternary
helices (G:G:G:G) can't be accounted for. Also, the parenthetical
syntax gets a bit confusing for very large sequences (16s rRNA, for
instance).

I think that the format all really depends on the program and the
particular use. Our motif is very simple (two consecutive helices
containing consensus sequences with purine-rich internal bulge and an
embedded GNRA tetraloop). mfold and the Vienna package look for
secondary structure only with the constraint that all potential 3'
helices must be paired to the closest potential 5' helices, a rule
which pseudoknots break. These programs do not search for a specific
motif, however; they mainly look for the most optimal folds for a
sequence under various conditions (salt cond., temp, etc) given the
current energy rules (Turner's rules).
RNAmotif is a program which uses a descriptor of the sequence to search
for motifs (through a modified regex engine, or as T. Macke says, "a
pattern language which represents helices and structural motifs"), then
uses an arbitrary scoring system (based on Turner's free energy rules,
sequence comparisons, and other features) to "weed out" sequences that
don't constrain (awk-like, according to the manual). In other words, it
is looking for a pattern which matches a hypothetical descriptor
element, then scores it afterwards. Other programs (ERPIN, infernal)
use consensus searches built from alignments of known structural
motifs. In other words, the motif is known ahead of time, likely
determined through structural studies.

After all this babbling, I do think that RNAML is the way to go with
this. However, when it comes to adding a tag for a SeqFeature object,
using RNAML becomes very problematic (not to mention an RNAML
parser/converter would need to be written!). I'll think about it some
more.

Chris

On Tue, 2003-12-09 at 08:45, Stephen Baird wrote:
> Dear hardworking guys,
> Sorry....but I am a little worried about the <<...>>>> format having
> trouble with pseudoknots and non-canonical base pairing...something that
> happens more often than is apparent by programs that predict RNA
> basepairing based on thermodynamics like MFOLD and the like. RNAmotifs
> which is doing pattern searching can accommodate all the weird things that
> might happen in a RNA structure, it is up to the user who designs the
> pattern.
> Mapping a simple < or > or . to each nucleotide might not be
> enough to work all the time. Is there a way to store to a base the
> specific nucleotide that it is basepairing to in a structural field? This
> would allow non-canonical basepairing and pseudoknots.
> There is a new RNA structure XML file format which is supposed to be a new
> standard...RNAML http://www-lbit.iro.umontreal.ca/rnaml/.... which will
> store the secondary and tertiary structural data.
As RNA prediction and > analysis develops more and more data will need to be added that is not > just the basepairing of canonical bases. > > > Stephen Baird > Molecular Genetics > Children's Hospital of Eastern Ontario > Ottawa, Ontario > Canada > > > On Mon, 2003-12-08 at 12:06, Jason Stajich wrote: > > > On Sat, 6 Dec 2003, Chris Fields wrote: > > > > > > > I think, like the rest, that RNAFold may be the easiest way to go. > > > > mfold is a free program but distribution is bound up by licensing > > > > issues (I have it but can't redistribute it due to this; the web > > > > interfaces available have some limitations which I couldn't do > > > > without). RNAFold doesn't have these problems and the source code is > > > > available on the web, plus (like Jason pointed out) there are perl > > > > interfaces. There is also something in the book Genomic Perl on > > > > calculating energies and drawing secondary structures, but I haven't > > > > checked it out in detail. > > > > > > > > Personally, I am working on a bioperl parser for the RNAmotif program > > > > suite (used to search for conserved secondary structures based on a > > > > descriptor). The rnamotif program is able to pass the motif hits to > > > > efn or efn2 for calculating free energy (based on different energy > > > > rules) and can output CT format files. I'm also thinking about doing > > > > something similar for tRNAscan-SE and ERPIN at some point. The problem > > > > I'm running into is how to store the secondary structure output for > > > > inclusion into GFF databases (I'm currently using > > > > Bio::SeqFeature::Generic for storing features). Anyone? > > > > > > Chris - I assume the structure is represented as string like > > > <<<...>>>> or ((((...)))) ? > > > If you do > > > $feat->add_tag_value('secondary_structure',$str); > > > > > > This should store okay in a DB::GFF db or is that not really working for > > > you? > > > > I think that would work. 
I will have to do some fiddling with the > > program output to get it into that format. One problem is taht RNAmotif > > allows mismatches in some of the segments. > > > > RNAmotif's raw output is a bit like FASTA. Here's a bit from one of my > > analyses (the PyrR mRNA-binding site in Bacillus subtilis, rub from the > > Genbank file): > > > > #RM scored > > #RM descr h5(tag='H1') ss(tag='S1') h5(tag='H2') h5(tag='H2t') > > ss(tag='S2') h3(tag='H2t') h3(tag='H2') ss(tag='S3') h3(tag='H1') > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -6.300 0 1617567 35 attctt taaaa > > cagt c cagaga g gctg ag aaggat > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -8.000 0 1617567 35 attcttt aaaa > > cagt c cagaga g gctg a gaaggat > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -5.200 0 1617568 33 ttctt taaaa > > cagt c cagaga g gctg ag aagga > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -6.900 0 1617568 33 ttcttt aaaa > > cagt c cagaga g gctg a gaagga > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -0.400 0 1617568 32 ttcttt aaaa > > cagt c cagaga g gctg . agaagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -7.200 0 1617569 32 tcttt aaaa > > cagt c cagaga g gctg ag aagga > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -3.900 0 1617569 31 tctt taaaa > > cagt c cagaga g gctg ag aagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. 
> > gi|16077068|gb|NC_000964|NC_000964 -5.600 0 1617569 31 tcttt aaaa > > cagt c cagaga g gctg a gaagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -4.800 0 1617570 30 cttt aaaa > > cagt c cagaga g gctg ag aagg > > >gi|16077068|gb|NC_000964|NC_000964 DEFINITION Bacillus subtilis, > > complete genome. > > gi|16077068|gb|NC_000964|NC_000964 -4.100 0 1617570 29 cttt aaaa > > cagt c cagaga g gctg a gaag > > > > .... > > > > > > The first two lines (marked with ##) are the initialization line and a > > bit from the descriptor file (describing the secondary structural > > characteristics). The different segments of the structure are given a > > designation (ss=single stranded, etc) and a tag (any name, although I > > use simple ones). The tags help when describing more complex structures > > by allowing for pairing between distant sites and higher level > > interactions (pseudoknots and tertiary and quaternary structures, > > although I haven't needed these). The output is like fasta, but the > > sequence data is replaced by the database hit (usually the acc. #), > > score (in this case, free energy), strand of hit, start of hit, length > > of hit and the sequence itself, broken up into segments matching the > > elements in the descriptor. This is where the trouble lies; as RNAmotif > > allows for mismatches in the descriptor (to allow for internal bulges), > > the parser for the sequence elements will need to be intelligent enough > > to pick this out. > > > > Also note that the data hits are redundant (they are retained b/c they > > fall below a predetermined threshold from the calculated free energy, > > determined in the descriptor file. I plan on including a parser to > > clean this up (retain the best score of a fold located within a certain > > sequence range, probably less than 10 bp). 
There's a program in the > > RNAmotif suite to do this (rmprune), but it doesn't always "prune" to > > the best sequence hit. > > > > > There are some newish bioperl objects Seq::Meta which are for representing > > > some bit of information about each base - maybe this is the place RNA or > > > Protein secondary structure information can be coded. > > > I'm not sure of what is best way to store these data - Heikki and others > > > have mostly worked on them so I can only hand wave at this point. > > > > > > > > > I'm not sure what type of computing you want to do on the data, depending > > > on what you want to do, might dictate creating/using different objects. > > > i.e. if you wanted to get the residues of the stems I think you might want > > > to build a special object which can represent the pairing after parsing it > > > out of the structure string. > > > > My main use for this is to map these database hits against the sequence > > using Gbrowse. I would like to add a Gbrowse plugin to link to some > > sort of secondary structure output, maybe from the Vienna package to > > represent the secondary structure (if using the parenthetical > > notation). I can also get CT format output from another program in the > > RNAmotif suite (rm2ct), so changing formats shouldn't be too hard but > > does require passing the output file through rm2ct. My main concern is > > getting the data into some format that could retain structural > > information that would prevent informational loss. > > > > > -jason > > > > > > > > > > > Chris Fields > > > > Postdoctoral Reseacher - Dept. of Biochemistry > > > > University of Illinois at Urbana-Champaign > > > > > > > > On Dec 5, 2003, at 2:22 PM, Vesko Baev wrote: > > > > > > > > > Hi to all, > > > > > if anyone knows a module or external program (which can be linked to > > > > > bioperl) for folding a RNA predicting hairpins and calculating a free > > > > > energy? > > > > > > > > > > Thanks to ALL! 
> > > > >
> > > > > Vesselin Baev
> > > > > Bulgaria
> > > > >
> > > > > -----------------------------------------------------------------
> > > > > http://www.pari.bg
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l@portal.open-bio.org
> > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > --
> > Christopher Fields
> > Lab of Dr. Robert Switzer
> > Dept. of Biochemistry
> > University of Illinois at Urbana-Champaign
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>

--
Christopher Fields
Lab of Dr. Robert Switzer
Dept. of Biochemistry
University of Illinois at Urbana-Champaign

From birney at ebi.ac.uk Tue Dec 9 11:33:00 2003
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Dec 9 11:39:47 2003
Subject: [Bioperl-l] How to proceed with 1.4?
In-Reply-To: <200312091023.56271.heikki@nildram.co.uk>
Message-ID:

On Tue, 9 Dec 2003, Heikki Lehvaslaiho wrote:

>
> I have so far released three snapshots from the bioperl core/live cvs head.
> Things have settled down a bit, but there are still outstanding issues.
> Especially:
>
> - restriction analysis fixes need to be merged and committed (Rob)
> - SearchIO::psiblast & related module removal (Steve)
> - really long qualifier names in sequence feature tables (Ewan?) #1561
>

Ok... This is now "fixed" to behave in the same way as the EMBL feature
dumping here at EBI does.

A little detour here:

- The feature table document just has a line limit on it.
  It does not specify how to break words across lines, in particular
  in "continuation" mode.

- At EBI the suggestion (implemented in the bioperl code) is to break
  at hyphens and commas as well as spaces.

- This causes a round-tripping problem, because the break will get
  interpreted as white space.

This round-tripping problem is present now in the GenBank/EMBL system
anyway, so... I've just made Bioperl work the same way ;).

Proof:

ID   id         standard; DNA; UNK; 27 BP.
XX
AC   unknown;
XX
DE
XX
FH   Key             Location/Qualifiers
FH
FT   CDS             1..2
FT                   /product="this-is-a-really-long-silly-name-which-goes-aaay-
FT                   too-long-and-makes-life-difficult"
FT                   /product2="N-acetylglucosaminyl-
FT                   phosphatidylinositolbiosyntheticprotein, putative"
XX
SQ   Sequence 27 BP; 11 A; 0 C; 6 G; 10 T; 0 other;
     atggttagaa attttgaaat tgaaatg                                     27
//

I also have some tests failing on my installation, which is linux,

[birney@localhost bioperl-live]$ perl -v

This is perl, v5.6.0 built for i386-linux

Copyright 1987-2000, Larry Wall

Failed Test       Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/BioDBGFF.t           0   139   133    4   3.01%  130-133
t/Coalescent.t       255 65280    11   10  90.91%  2-11
t/GFF.t              255 65280    32   12  37.50%  21-32
t/GuessSeqForma                   44    1   2.27%  9
t/RestrictionAn                  111    9   8.11%  80, 82, 85, 98-99, 101-104
t/RestrictionIO                   14    2  14.29%  4, 10
t/RootI.t              2   512    10    3  30.00%  8-10
t/tutorial.t           2   512    21    3  14.29%  19-21

Not sure how many of these you know about, Heikki - I tend not to have
XML things installed, so that triggered some game stuff somewhere...
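
A minimal sketch of the breaking rule described above (break after a
hyphen or comma, or at a space), using core Perl only. It is not the
code in Bioperl's EMBL writer: wrap_qualifier is a hypothetical helper,
and the 59-column width is an assumption (80 columns minus the "FT"
prefix and its 19 spaces of padding).

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into pieces of at most $width
# characters. A break is taken after the last hyphen or comma in the
# window, or at the last space (the space itself is dropped).
sub wrap_qualifier {
    my ($text, $width) = @_;
    my @lines;
    while (length($text) > $width) {
        my $window = substr($text, 0, $width);
        my $cut = $width;                    # fallback: hard split
        if ($window =~ /^(.*[-, ])/) {       # greedy, so this finds the last break
            $cut = length($1);
        }
        my $line = substr($text, 0, $cut);
        $text = substr($text, $cut);
        $line =~ s/ +$//;                    # do not keep the space at the break
        $text =~ s/^ +//;
        push @lines, $line if length($line);
    }
    push @lines, $text if length($text);
    return @lines;
}

my $value = '/product="this-is-a-really-long-silly-name-which-goes-aaay-'
          . 'too-long-and-makes-life-difficult"';
my @lines = wrap_qualifier($value, 59);
print 'FT', ' ' x 19, $_, "\n" for @lines;

# A reader that joins continuation lines with a space does not get the
# original value back: a space now follows "aaay-".
my $rejoined = join ' ', @lines;
print 'round trip ', ($rejoined eq $value ? 'ok' : 'changed the value'), "\n";
```

The hyphen stays at the end of the first line, so nothing in the flat
file says whether the break replaced a space or not; that ambiguity is
the round-tripping problem.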
From cjfields at uiuc.edu Tue Dec 9 12:00:54 2003
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Dec 9 12:07:08 2003
Subject: [Bioperl-l] RNA fold
In-Reply-To:
References:
Message-ID: <1070989253.3834.104.camel@chrisfields.life.uiuc.edu>

On Tue, 2003-12-09 at 09:17, Jason Stajich wrote:
> On Tue, 9 Dec 2003, Stephen Baird wrote:
>
> > Dear hardworking guys,
> > Sorry....but I am a little worried about the <<...>>>> format having
> > trouble with pseudoknots and non-canonical base pairing...something that
> > happens more often than is apparent by programs that predict RNA
> > basepairing based on thermodynamics like MFOLD and the like. RNAmotifs
> > which is doing pattern searching can accommodate all the weird things that
> > might happen in a RNA structure, it is up to the user who designs the
> > pattern.
> > Mapping a simple < or > or . to each nucleotide might not be
> > enough to work all the time. Is there a way to store to a base the
> > specific nucleotide that it is basepairing to in a structural field? This
> > would allow non-canonical basepairing and pseudoknots.
> > There is a new RNA structure XML file format which is supposed to be a new
> > standard...RNAML http://www-lbit.iro.umontreal.ca/rnaml/.... which will
> > store the secondary and tertiary structural data. As RNA prediction and
> > analysis develops more and more data will need to be added that is not
> > just the basepairing of canonical bases.
> >
>
> Good point.
>
> Had heard that an XML format was on the way - this seems more intelligent
> system for storage without information loss - but of course it won't fit
> into the simple GFF system that Chris was thinking about. Probably means
> Chris would want to use GFF to store the representation of the
> genomic location of the RNAs but a separate CGI type script will do all
> the heavy lifting of getting an ID, looking up the structure
> representation, and generating the plots/summary info/etc.

Aha! This seems like a good idea!
Maybe use the tag for storing a database location (ID), then use the
CGI script to pull it out, set up the plot, etc. Nice, and shouldn't be
too hard (although I could be kicking myself later for saying that...)

> We really have no objects for RNA structure in Bioperl at this point so
> pretty much a blank slate for someone to exert their will...

I think RNAML is the way to go (as I told Stephen previously). It would
be nice to get an RNAML object going...maybe Bio::SeqFeature::RNAML?
Bio::Tools::RNAML? Bio::Tools::Run::rnatools::RNAML? (that's a
mouthful....)

> I would much rather see us move up the sophistication ladder here, but
> someone new has to be willing to take it on as a project.
>
> The aforementioned hard working guys will do our best to help in any way
> possible with design/programming issues but can't drive this beast.

I have to admit that I'm still somewhat of a newbie, though I have
picked up quite a bit from reading and, of course, using the Camel and
Llama books (plus Conway's OO Perl and Schwartz's Learning with Perl
Objects and References). I'm an RNA researcher at heart and have been
programming for ~1 year off and on, mainly out of an interest in Perl
but also for research as a postdoc. I would like to help out in this
area, but I am also constrained by "wet-bench" research. For my part
I'll definitely do what I can. On the plus side, I would be able to
test on three different platforms (Mac OS X, Fedora Core 1 Linux, and
Windows XP)!

I'll read up on RNAML to see what can be done. I'll also look at the
Bio::Tools::Run::PiseApplication::mfold in bioperl-run and the perl
scripts in the Vienna package to see how output is processed for those
programs.

Chris

> -jason
>
> > > > > issues (I have it but can't redistribute it due to this; the web
> > > > > interfaces available have some limitations which I couldn't do
> > > > > without).
> > > > > > _______________________________________________
> > > > > > Bioperl-l mailing list
> > > > > > Bioperl-l@portal.open-bio.org
> > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > > > >
> > > > --
> > > > Jason Stajich
> > > > Duke University
> > > > jason at cgt.mc.duke.edu
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l@portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > --
> > > Christopher Fields
> > > Lab of Dr. Robert Switzer
> > > Dept. of Biochemistry
> > > University of Illinois at Urbana-Champaign
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Christopher Fields
Lab of Dr. Robert Switzer
Dept. of Biochemistry
University of Illinois at Urbana-Champaign

From sgj at sanger.ac.uk Tue Dec 9 12:12:26 2003
From: sgj at sanger.ac.uk (Sam Griffiths-Jones)
Date: Tue Dec 9 12:18:46 2003
Subject: [Bioperl-l] RNA fold
In-Reply-To: <1070987352.3834.71.camel@chrisfields.life.uiuc.edu>
Message-ID:

On Tue, 9 Dec 2003, Chris Fields wrote:

> I think that you can use parenthetical formats for pseudoknot-like
> structures (improperly nested Watson-Crick helices).
> The idea is that
> () would represent secondary structure, and other brackets {}[] would
> represent higher-order structures, like so:
>
>      Helix        Pseudoknot
> ______________ _______________
> |            | |             |
> (((((....))))).[[[...((((..]]]..))))
>                       |___________|
>

Ah, if you're thinking of a generic format then different brackets are
going to get you into trouble - Sean Eddy's INFERNAL suite (think
HMMer for RNAs) already uses different brackets to mark up different
layers of nesting, like:

    ..[[[[..<<<<<..>>>>>....<<<..>>>..]]]]..

There is an informal standard to incorporate pseudoknot info into the
bracket notation using letters for non-nested base pairs:

    <<<<<<<.<<<...AAAA..>>>>>>>>>>..aaaa......

The upper case stuff base pairs with the lower case stuff. This seems
like a really bad idea, but given that you're parsing the vast majority
of the structure with brackets, and the most complicated known nested
pseudoknot (in the alpha operon leader) only involves letters A, B and
C, it's not so bad. Also this provides a natural separation for the
algorithms that can only deal with nested interactions (SCFGs and the
like) from those that can use everything. For what it's worth this is
how we mark up such non-nested things in the Rfam database.

> Of course this is where the problem lies, b/c all structures in this
> format are constrained to simple 1:1 base associations, such as simple
> Watson-Crick base pairs or noncanonical base pairs (A-G, G-U, etc).
> Some higher order structures, like triple-helices (A:U:U) and quaternary
> helices (G:G:G:G) can't be accounted for. Also, the parenthetical
> syntax gets a bit confusing for very large sequences (16s rRNA, for
> instance).
>

Yep - tough in a single line. We've also been thinking about how to
mark these up in alignments of RNAs in Rfam, but without decision. You
might think of things which aren't 1:1 as tertiary interactions and
therefore separable from the secondary structure which the bracket
notation is designed to cope with.
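
The bracket-plus-letter notation can be turned into an explicit partner
for every base, which is roughly what Stephen asked for earlier in the
thread. What follows is only a sketch in core Perl under stated
assumptions: pair_table is a hypothetical helper (not a Bioperl, Rfam or
INFERNAL routine), each bracket type and each letter gets its own stack,
and an upper-case letter is assumed to be the 5' partner of the matching
lower-case letter.

```perl
use strict;
use warnings;

# Assumed bracket alphabet; any other character counts as unpaired.
my %CLOSE_FOR = ( '(' => ')', '<' => '>', '[' => ']', '{' => '}' );
my %OPEN_FOR  = reverse %CLOSE_FOR;

# Hypothetical helper: return a list with one entry per position giving
# the 1-based position of its partner, or 0 if the base is unpaired.
sub pair_table {
    my ($structure) = @_;
    my @partner = (0) x length($structure);
    my %stack;                      # one stack per bracket type or letter
    my $pos = 0;
    for my $ch (split //, $structure) {
        $pos++;
        my ($key, $is_open);
        if    (exists $CLOSE_FOR{$ch}) { ($key, $is_open) = ($ch, 1) }
        elsif (exists $OPEN_FOR{$ch})  { ($key, $is_open) = ($OPEN_FOR{$ch}, 0) }
        elsif ($ch =~ /^[A-Z]$/)       { ($key, $is_open) = ($ch, 1) }
        elsif ($ch =~ /^[a-z]$/)       { ($key, $is_open) = (uc $ch, 0) }
        else                           { next }     # '.' and friends: unpaired
        if ($is_open) {
            push @{ $stack{$key} }, $pos;
        }
        else {
            my $open = pop @{ $stack{$key} || [] };
            die "unmatched '$ch' at position $pos\n" unless defined $open;
            $partner[ $open - 1 ] = $pos;
            $partner[ $pos - 1 ]  = $open;
        }
    }
    for my $key (sort keys %stack) {
        die "unclosed '$key'\n" if @{ $stack{$key} };
    }
    return @partner;
}

my $rfam_style = '<<<<<<<.<<<...AAAA..>>>>>>>>>>..aaaa......';
my @partner    = pair_table($rfam_style);
for my $i (1 .. scalar @partner) {
    printf "%2d %s %d\n", $i, substr($rfam_style, $i - 1, 1), $partner[ $i - 1 ];
}
```

Because each symbol family is matched independently, crossing helices
such as [[[ ... ((( ... ]]] ... ))) come out fine. A table with one
partner per base still cannot hold base triples or quadruples, so the
1:1 limitation raised in the thread remains.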
> I think that the format all really depends on the program and the
> particular use.
> After all this babbling, I do think that RNAML is the way to go with
> this.

These two seem contradictory to me :)

I don't know much about RNAML but I get the impression it's trying to solve all RNA sequence/markup/annotation issues in one go. Depending on your point of view this is either a great idea or very bad. I haven't decided yet :)

Sam

--------------------------------------------------------------------
Sam Griffiths-Jones sgj@sanger.ac.uk
http://www.sanger.ac.uk/Users/sgj +44 (0)1223 834244

Wisdom #8002: Always try to do things in chronological order;
it's less confusing that way.
--------------------------------------------------------------------

From cain at cshl.org Tue Dec 9 15:00:34 2003
From: cain at cshl.org (Scott Cain)
Date: Tue Dec 9 15:06:39 2003
Subject: [Bioperl-l] Problem with Unflattener
Message-ID: <1071000033.1440.48.camel@localhost.localdomain>

Hello Chris,

I am using Unflattener to create a genbank2gff script that is more robust than what we have now. As one of my example Genbank files, I am using an A. gambiae chromosome:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1

When I try to run the simplified script below, I get the following error:

------------- EXCEPTION -------------
MSG: structure_type 2 is currently unknown
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345
STACK toplevel ./simple.pl:19

--------------------------------------

As I read Unflattener, structure_type should only be set if I set it explicitly, right? So how is it getting set here, and how do I make it stop?

Here's the script:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;

my $seqio = Bio::SeqIO->new(
    -file   => 'NW_045730.1.gbk',
    -format => 'GenBank'
);

open OUT, '>out.gff';

while ( my $seq = $seqio->next_seq() ) {
    my $acc = $seq->accession;

    # get top level unflattened SeqFeatureI objects
    my @sfs = $unflattener->unflatten_seq(
        -seq       => $seq,
        -use_magic => 1
    );

    foreach my $sf (@sfs) {
        my $gffio =
          $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) );

        $sf->seq_id($acc);

        if ( $sf->primary_tag() eq 'source' ) {
            $sf->add_tag_value( 'ID', $acc );
            $sf->primary_tag('region');
        }
        print OUT $sf->gff_string . "\n";
    }
}
close OUT;
---------------------------

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain@cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Tue Dec 9 15:03:08 2003
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Dec 9 15:09:22 2003
Subject: [Bioperl-l] RNA fold
In-Reply-To:
References:
Message-ID: <1071000188.3834.248.camel@chrisfields.life.uiuc.edu>

On Tue, 2003-12-09 at 11:12, Sam Griffiths-Jones wrote:
> On Tue, 9 Dec 2003, Chris Fields wrote:
>
> > I think that you can use parenthetical formats for pseudoknot-like
> > structures (improperly nested Watson-Crick helices). The idea is that
> > () would represent secondary structure, and other brackets {}[] would
> > represent higher-order structures, like so:
> >
> >     Helix        Pseudoknot
> > ______________ _______________
> > |            | |             |
> > (((((....))))).[[[...((((..]]]..))))
> >                       |___________|
> >
>
> Ah, if you're thinking of a generic format then different brackets are
> going to get you into trouble - Sean Eddy's INFERNAL suite (think
> HMMer for RNAs) already uses different brackets to markup different
> layers of nesting, like:
>
> ..[[[[..<<<<<..>>>>>....<<<..>>>..]]]]..
> > There is an informal standard to incorporate pseudoknot info into the > bracket notation using letters for non-nested base pairs: > > <<<<<<<.<<<...AAAA..>>>>>>>>>>..aaaa...... > > The upper case stuff base pairs with the lower case stuff. This seems > like a really bad idea, but given that you're parsing the vast > majority of the structure with brackets, and the most complicated > known nested pseudoknot (in the alpha operon leader) only involves > letters A, B and C, its not so bad. Also this provides a natural > separation for the algorithms that can only deal with nested > interactions (SCFGs and the like) from those that can use everything. > For what its worth this is how we markup such non-nested things in the > Rfam database. Yikes! This is a problem, b/c I have seen many different ways of showing secondary structure (CT table format, parenthetical, XML, etc). > > Of course this is where the problem lies, b/c all structures in this > > format are constrained to simple 1:1 base associations, such as simple > > Watson-Crick base pairs or noncanonical base pairs (A-G, G-U, etc). > > Some higher order structures, like triple-helices (A:U:U) and quaternary > > helices (G:G:G:G) can't be accounted for. Also, the parenthetical > > syntax gets a bit confusing for very large sequences (16s rRNA, for > > instance). > > > > Yep - tough in a single line. We've also been thinking about how to > mark these up in alignments of RNAs in Rfam, but without decision. > You might think of things which aren't 1:1 as tertiary interactions > and therefore seperable from the secondary structure which the bracket > notation is designed to cope with. > > > I think that the format all really depends on the program and the > > particular use. > > > After all this babbling, I do think that RNAML is the way to go with > > this. > > These two seem contradictory to me :) A bit contradictory, yes. I also tend to write as a stream of thought, so I sometimes change my mind. 
I'm a bit confused as how to approach the original issue (tagging the structure in some way for a plugin). This is b/c there doesn't seem to be a consensus yet on an approach to retain as much structural information as possible. RNAML seems to be the best way so far (and the list of people on board is pretty impressive), but it's a bit complex. Personally, I think the best way to approach the problem of having multiple formats is the same approach used by Bio::SeqIO. That is, by using specific parsers for getting all information into a Bioperl-specific format or a format in which information was retained at the highest possible level (RNAML, INFERNAL, etc). That way, data could be converted into alternative formats which may or may not retain higher level information depending on the input and output formats. With this approach, one could have a file parser for each format (INFERNAL, CT, RNAML, etc) and output would be the same, possibly with warnings for loss of information (RNAML structure format to CT, for instance). I may have to delve into Bio::SeqIO a bit to get an idea of how they handle things. I guess the real issue is coming up with a way to deal with all levels of information (secondary, tertiary, etc). Maybe a modified CT format, something like a table consisting of bases and their interactions with other bases? Maybe with a tagged designation for pairs signifying if they are in non-WC pairs, triplets? Anyway as for now, I plan on just getting my motif results into a simple parenthetical format which I'll parse in a program outside of RNAmotif.pm. I could always change it to another format later, when some of the formatting issues are resolved. > I don't kow much about RNAML but I get the impression its trying to > solve all RNA sequence/markup/annotation issues in one go. Depending > on your point of view this is either a great idea or very bad. I > haven't decided yet :) Yeah, but they've got some pretty big people in the field backing it up! 
:>

Chris

> Sam
>
>
> --------------------------------------------------------------------
> Sam Griffiths-Jones sgj@sanger.ac.uk
> http://www.sanger.ac.uk/Users/sgj +44 (0)1223 834244
>
> Wisdom #8002: Always try to do things in chronological order;
> it's less confusing that way.
> --------------------------------------------------------------------
-- 
Christopher Fields
Lab of Dr. Robert Switzer
Dept. of Biochemistry
University of Illinois at Urbana-Champaign

From brian_osborne at cognia.com Tue Dec 9 17:14:31 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Dec 9 17:23:51 2003
Subject: [Bioperl-l] How to proceed with 1.4?
In-Reply-To: <200312091023.56271.heikki@nildram.co.uk>
Message-ID:

Heikki,

I just added a Feature and Annotation HOWTO, it's almost done. Comments welcome, of course. When we're close to release I'll remake all the HOWTOs for the Web site, add new links if necessary, and you can create the text versions for the package, I trust.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Heikki Lehvaslaiho
Sent: Tuesday, December 09, 2003 5:24 AM
To: Bioperl
Cc: Rob Edwards; steve_chervitz @affymetrix.com>, Ewan Birney
Subject: [Bioperl-l] How to proceed with 1.4?

I have so far released three snapshots from the bioperl core/live cvs head. Things have settled down a bit, but there are still outstanding issues. Especially:

- restriction analysis fixes need to be merged and committed (Rob)
- SearchIO::psiblast & related module removal (Steve)
- really long qualifier names in sequence feature tables (Ewan?) #1561

I'd like to see these in before I release the next and hopefully last snapshot. Or would some like to see a snapshot out now?

Then there is the issue of other cvs modules closely tied to core. Ext is simple: there has been one major addition during the last six months, which is well documented and seems to work without problems. I can release that the same day as core.

Run is a bit more complicated. There are issues with

- newer version of EMBOSS, #1481
- TCoffee, #1453, #1557 (and #1510, #1514)

We need someone to look into these. It would be great to have them fixed this week so that we could have all three packages out before Christmas.

Any comments and contributions to that effect welcome,

-Heikki

-- 
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From redwards at utmem.edu Tue Dec 9 18:39:49 2003
From: redwards at utmem.edu (Rob Edwards)
Date: Tue Dec 9 18:48:15 2003
Subject: [Bioperl-l] Bio::Restriction::Analysis feature merge
In-Reply-To: <200312072232.52575.pblaiklo@110.net>
Message-ID:

Peter,

I have a few comments on these revised modules:

1. Does circularity work OK? It fails the tests I devised at the moment (there is an MwoI site near the 0 point on t/data/dna1.fa)

2. It doesn't seem to handle multiple digests - did you add this in and am I missing it?

3. Do you have other test scripts that you have run on these so I can compare?

4. I don't like the idea of reverse complementing the target sequence. If you are using a large target sequence (e.g. 1 Mbp or more) this will be horrible. It would be a lot better to reverse the enzyme and should be the same (of course, you'd have to reverse the complementary strand of the non-palindromic enzyme).

5.
The newer version of Enzyme.pm (in cvs) doesn't require Storable.pm, and this is better than demanding it be there. If you are done working on this I'll fix it up and alter the method that gets the cut site to move that into Enzyme.pm so that Analysis just works with the integers, and add the changed modules into cvs before the weekend. If you have made any other changes can you email them to me so I can incorporate them too? Thanks Rob On Sunday, December 7, 2003, at 08:32 PM, Peter Blaiklock wrote: > Hi > > I modified my new Bio::Restriction::Analysis module to incorporate > Rob's > changes, including the use of index to locate nonambiguous sites and a > digestion method that can handle multiple enzymes. Please let me know > about > any problems, suggestions, bugs, etc. I should have tests ready in a > day or > two. > > http://www.restrictionmapper.org/bioperl/Analysis.pm > http://www.restrictionmapper.org/bioperl/Enzyme.pm > http://www.restrictionmapper.org/bioperl/base.pm > > Peter Blaiklock > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From cjm at fruitfly.org Tue Dec 9 21:19:16 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Dec 9 19:16:34 2003 Subject: [Bioperl-l] Re: Problem with Unflattener In-Reply-To: <1071000033.1440.48.camel@localhost.localdomain> Message-ID: Hi Scott Bug squashed, do a cvs update and it should work The problem was that this record uses /locus_tag instead of /gene - the unflattener should be able to detect this in magic mode, but there was one place where "/gene" was hardcoded. By the way, for this particular record you can get the exact same data from ensembl, already unflattened (or rather, never flattened into genbank format in the first place). Nevertheless, this sort of thing is extremely useful for testing Unflattener.pm, so carry on testing! 
Really I should do a full QC by comparing ensembl sourced GFF and the results of ensembl->genbank->unflattener->gff, but I haven't got round to this yet. Cheers Chris On Tue, 9 Dec 2003, Scott Cain wrote: > Hello Chris, > > I am using Unflattener to create a genbank2gff script that is more > robust than what we have now. As one of my example Genbank files, I am > using an A. gambiae chromosome: > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1 > > When I try to run the simplified script below, I get the following > error: > > ------------- EXCEPTION ------------- > MSG: structure_type 2 is currently unknown > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345 > STACK toplevel ./simple.pl:19 > > -------------------------------------- > > As I read Unflattener, structure_type should only be set if I set it > explicitly, right? So how is it getting set here, and how do I make it > stop? > > Here's the script: > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > use Bio::SeqFeature::Tools::Unflattener; > > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new; > > my $seqio = Bio::SeqIO->new( > -file => 'NW_045730.1.gbk', > -format => 'GenBank' > ); > > open OUT, '>out.gff'; > > while ( my $seq = $seqio->next_seq() ) { > my $acc = $seq->accession; > > # get top level unflattended SeqFeatureI objects > my @sfs = $unflattener->unflatten_seq( > -seq => $seq, > -use_magic => 1 > ); > > foreach my $sf (@sfs) { > my $gffio = > $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) ); > > $sf->seq_id($acc); > > if ( $sf->primary_tag() eq 'source' ) { > $sf->add_tag_value( 'ID', $acc ); > $sf->primary_tag('region'); > } > print OUT $sf->gff_string . 
"\n"; > } > } > close OUT; > --------------------------- > > Thanks, > Scott > > From cain at cshl.org Tue Dec 9 19:34:47 2003 From: cain at cshl.org (Scott Cain) Date: Tue Dec 9 19:40:50 2003 Subject: [Bioperl-l] Re: Problem with Unflattener In-Reply-To: References: Message-ID: <1071016487.1440.60.camel@localhost.localdomain> Thanks. Re: GFF from ensembl: You can get it as GFF3? Could you send me a link if so. (You can tell I'm a little incredulous.) Scott On Tue, 2003-12-09 at 21:19, Chris Mungall wrote: > Hi Scott > > Bug squashed, do a cvs update and it should work > > The problem was that this record uses /locus_tag instead of /gene - the > unflattener should be able to detect this in magic mode, but there was one > place where "/gene" was hardcoded. > > By the way, for this particular record you can get the exact same data > from ensembl, already unflattened (or rather, never flattened into genbank > format in the first place). Nevertheless, this sort of thing is extremely > useful for testing Unflattener.pm, so carry on testing! Really I should do > a full QC by comparing ensembl sourced GFF and the results of > ensembl->genbank->unflattener->gff, but I haven't got round to this yet. > > Cheers > Chris > > On Tue, 9 Dec 2003, Scott Cain wrote: > > > Hello Chris, > > > > I am using Unflattener to create a genbank2gff script that is more > > robust than what we have now. As one of my example Genbank files, I am > > using an A. 
gambiae chromosome: > > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1 > > > > When I try to run the simplified script below, I get the following > > error: > > > > ------------- EXCEPTION ------------- > > MSG: structure_type 2 is currently unknown > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345 > > STACK toplevel ./simple.pl:19 > > > > -------------------------------------- > > > > As I read Unflattener, structure_type should only be set if I set it > > explicitly, right? So how is it getting set here, and how do I make it > > stop? > > > > Here's the script: > > #!/usr/bin/perl -w > > use strict; > > use Bio::SeqIO; > > use Bio::SeqFeature::Tools::Unflattener; > > > > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new; > > > > my $seqio = Bio::SeqIO->new( > > -file => 'NW_045730.1.gbk', > > -format => 'GenBank' > > ); > > > > open OUT, '>out.gff'; > > > > while ( my $seq = $seqio->next_seq() ) { > > my $acc = $seq->accession; > > > > # get top level unflattended SeqFeatureI objects > > my @sfs = $unflattener->unflatten_seq( > > -seq => $seq, > > -use_magic => 1 > > ); > > > > foreach my $sf (@sfs) { > > my $gffio = > > $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) ); > > > > $sf->seq_id($acc); > > > > if ( $sf->primary_tag() eq 'source' ) { > > $sf->add_tag_value( 'ID', $acc ); > > $sf->primary_tag('region'); > > } > > print OUT $sf->gff_string . "\n"; > > } > > } > > close OUT; > > --------------------------- > > > > Thanks, > > Scott > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjm at fruitfly.org Tue Dec 9 21:52:17 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Tue Dec 9 19:49:38 2003 Subject: [Bioperl-l] Re: Problem with Unflattener In-Reply-To: <1071016487.1440.60.camel@localhost.localdomain> Message-ID: Oh, not quite. Sorry, not an expert on the various pre-gff3 flavours But I imagine you could take the ensembl GFF (which is of 'GTF' flavour?) OR an actual ensembl mysql database, both of which sensibly preserve gene/transcript/protein/exon nesting structure, and turn that into GFF3 without too much difficulty. Ok, you start losing yourself in a maze of converters, but the advantage is you never have to go through that lossy step to genbank format where you lose the nesting structure. There are some situations where it could be simply impossible for the Unflattener to recover some of this lost structure. On Tue, 9 Dec 2003, Scott Cain wrote: > Thanks. > > Re: GFF from ensembl: You can get it as GFF3? Could you send me a link > if so. (You can tell I'm a little incredulous.) > > Scott > > On Tue, 2003-12-09 at 21:19, Chris Mungall wrote: > > Hi Scott > > > > Bug squashed, do a cvs update and it should work > > > > The problem was that this record uses /locus_tag instead of /gene - the > > unflattener should be able to detect this in magic mode, but there was one > > place where "/gene" was hardcoded. > > > > By the way, for this particular record you can get the exact same data > > from ensembl, already unflattened (or rather, never flattened into genbank > > format in the first place). Nevertheless, this sort of thing is extremely > > useful for testing Unflattener.pm, so carry on testing! Really I should do > > a full QC by comparing ensembl sourced GFF and the results of > > ensembl->genbank->unflattener->gff, but I haven't got round to this yet. 
> > > > Cheers > > Chris > > > > On Tue, 9 Dec 2003, Scott Cain wrote: > > > > > Hello Chris, > > > > > > I am using Unflattener to create a genbank2gff script that is more > > > robust than what we have now. As one of my example Genbank files, I am > > > using an A. gambiae chromosome: > > > > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1 > > > > > > When I try to run the simplified script below, I get the following > > > error: > > > > > > ------------- EXCEPTION ------------- > > > MSG: structure_type 2 is currently unknown > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345 > > > STACK toplevel ./simple.pl:19 > > > > > > -------------------------------------- > > > > > > As I read Unflattener, structure_type should only be set if I set it > > > explicitly, right? So how is it getting set here, and how do I make it > > > stop? > > > > > > Here's the script: > > > #!/usr/bin/perl -w > > > use strict; > > > use Bio::SeqIO; > > > use Bio::SeqFeature::Tools::Unflattener; > > > > > > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new; > > > > > > my $seqio = Bio::SeqIO->new( > > > -file => 'NW_045730.1.gbk', > > > -format => 'GenBank' > > > ); > > > > > > open OUT, '>out.gff'; > > > > > > while ( my $seq = $seqio->next_seq() ) { > > > my $acc = $seq->accession; > > > > > > # get top level unflattended SeqFeatureI objects > > > my @sfs = $unflattener->unflatten_seq( > > > -seq => $seq, > > > -use_magic => 1 > > > ); > > > > > > foreach my $sf (@sfs) { > > > my $gffio = > > > $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) ); > > > > > > $sf->seq_id($acc); > > > > > > if ( $sf->primary_tag() eq 'source' ) { > > > $sf->add_tag_value( 'ID', $acc ); > > > $sf->primary_tag('region'); > > > } > > > print OUT $sf->gff_string . 
"\n"; > > > } > > > } > > > close OUT; > > > --------------------------- > > > > > > Thanks, > > > Scott > > > > > > > > > From cain at cshl.org Tue Dec 9 22:12:02 2003 From: cain at cshl.org (Scott Cain) Date: Tue Dec 9 22:18:05 2003 Subject: [Bioperl-l] Re: Problem with Unflattener In-Reply-To: References: Message-ID: <1071025922.1439.63.camel@localhost.localdomain> Well, really the goal is to write a converter that can be used by a MOD to seed an instance of GMOD, and since Genbank is a defacto standard, there really needs to be a converter for it. At the moment, my approach is to warn liberally when the script can't figure something out. Thanks, Scott On Tue, 2003-12-09 at 21:52, Chris Mungall wrote: > Oh, not quite. Sorry, not an expert on the various pre-gff3 flavours > > But I imagine you could take the ensembl GFF (which is of 'GTF' flavour?) > OR an actual ensembl mysql database, both of which sensibly preserve > gene/transcript/protein/exon nesting structure, and turn that into GFF3 > without too much difficulty. Ok, you start losing yourself in a maze of > converters, but the advantage is you never have to go through that lossy > step to genbank format where you lose the nesting structure. There are > some situations where it could be simply impossible for the Unflattener to > recover some of this lost structure. > > On Tue, 9 Dec 2003, Scott Cain wrote: > > > Thanks. > > > > Re: GFF from ensembl: You can get it as GFF3? Could you send me a link > > if so. (You can tell I'm a little incredulous.) > > > > Scott > > > > On Tue, 2003-12-09 at 21:19, Chris Mungall wrote: > > > Hi Scott > > > > > > Bug squashed, do a cvs update and it should work > > > > > > The problem was that this record uses /locus_tag instead of /gene - the > > > unflattener should be able to detect this in magic mode, but there was one > > > place where "/gene" was hardcoded. 
> > > > > > By the way, for this particular record you can get the exact same data > > > from ensembl, already unflattened (or rather, never flattened into genbank > > > format in the first place). Nevertheless, this sort of thing is extremely > > > useful for testing Unflattener.pm, so carry on testing! Really I should do > > > a full QC by comparing ensembl sourced GFF and the results of > > > ensembl->genbank->unflattener->gff, but I haven't got round to this yet. > > > > > > Cheers > > > Chris > > > > > > On Tue, 9 Dec 2003, Scott Cain wrote: > > > > > > > Hello Chris, > > > > > > > > I am using Unflattener to create a genbank2gff script that is more > > > > robust than what we have now. As one of my example Genbank files, I am > > > > using an A. gambiae chromosome: > > > > > > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1 > > > > > > > > When I try to run the simplified script below, I get the following > > > > error: > > > > > > > > ------------- EXCEPTION ------------- > > > > MSG: structure_type 2 is currently unknown > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345 > > > > STACK toplevel ./simple.pl:19 > > > > > > > > -------------------------------------- > > > > > > > > As I read Unflattener, structure_type should only be set if I set it > > > > explicitly, right? So how is it getting set here, and how do I make it > > > > stop? 
> > > > > > > > Here's the script: > > > > #!/usr/bin/perl -w > > > > use strict; > > > > use Bio::SeqIO; > > > > use Bio::SeqFeature::Tools::Unflattener; > > > > > > > > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new; > > > > > > > > my $seqio = Bio::SeqIO->new( > > > > -file => 'NW_045730.1.gbk', > > > > -format => 'GenBank' > > > > ); > > > > > > > > open OUT, '>out.gff'; > > > > > > > > while ( my $seq = $seqio->next_seq() ) { > > > > my $acc = $seq->accession; > > > > > > > > # get top level unflattended SeqFeatureI objects > > > > my @sfs = $unflattener->unflatten_seq( > > > > -seq => $seq, > > > > -use_magic => 1 > > > > ); > > > > > > > > foreach my $sf (@sfs) { > > > > my $gffio = > > > > $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) ); > > > > > > > > $sf->seq_id($acc); > > > > > > > > if ( $sf->primary_tag() eq 'source' ) { > > > > $sf->add_tag_value( 'ID', $acc ); > > > > $sf->primary_tag('region'); > > > > } > > > > print OUT $sf->gff_string . "\n"; > > > > } > > > > } > > > > close OUT; > > > > --------------------------- > > > > > > > > Thanks, > > > > Scott > > > > > > > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From pblaiklo at 110.net Wed Dec 10 01:28:01 2003 From: pblaiklo at 110.net (Peter Blaiklock) Date: Wed Dec 10 01:36:47 2003 Subject: [Bioperl-l] Bio::Restriction::Analysis feature merge In-Reply-To: References: Message-ID: <200312100128.01421.pblaiklo@110.net> On Tuesday 09 December 2003 18:39, Rob Edwards wrote: > Peter, > > I have a few comments on these revised modules: > > 1. Does circularity work OK? It fails the tests I devised at the moment > (there is an MwoI site near the 0 point on t/data/dna1.fa) It does now. http://www.restrictionmapper.org/bioperl/Analysis.pm > 2. 
It doesn't seem to handle multiple digests - did you add this in and > am I missing it? Sorry I wasn't clearer before. Analysis->get_digest can be passed an enzyme name, an Enzyme object or an EnzymeCollection object. It returns an array of hashes with each hash corresponding to a restriction fragment. The hash keys are the same as from the 'fragment_maps' method, i.e. 'start', 'end' and 'seq'. > 3. Do you have other test scripts that you have run on these so I can > compare? http://www.restrictionmapper.org/bioperl/ratest.pl > 4. I don't like the idea of reverse complementing the target sequence. > If you are using a large target sequence (e.g. 1 Mbp or more) this will > be horrible. It would be a lot better to reverse the enzyme and should > be the same (of course, you'd have to reverse the complementary strand > of the non-palindroimic enzyme). It's easier to debug this way and I suspect that the large number of Enzyme objects is our worst performance bottleneck anyway. But if you want to reverse complement the recognition site instead I will write tests for the different possibilities (cut before site, cut in site, cut after site). > 5. The newer version of Enzyme.pm (in cvs) doesn't require Storable.pm, > and this is better than demanding it be there. Go ahead and get rid of it. > > If you are done working on this I'll fix it up and alter the method > that gets the cut site to move that into Enzyme.pm so that Analysis > just works with the integers, and add the changed modules into cvs > before the weekend. If you have made any other changes can you email > them to me so I can incorporate them too? > > Thanks > > Rob > > On Sunday, December 7, 2003, at 08:32 PM, Peter Blaiklock wrote: > > Hi > > > > I modified my new Bio::Restriction::Analysis module to incorporate > > Rob's > > changes, including the use of index to locate nonambiguous sites and a > > digestion method that can handle multiple enzymes. 
Please let me know
> > about
> > any problems, suggestions, bugs, etc. I should have tests ready in a
> > day or
> > two.
> >
> > http://www.restrictionmapper.org/bioperl/Analysis.pm
> > http://www.restrictionmapper.org/bioperl/Enzyme.pm
> > http://www.restrictionmapper.org/bioperl/base.pm
> >
> > Peter Blaiklock
> >

Peter Blaiklock

From heikki at nildram.co.uk Wed Dec 10 04:35:55 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Wed Dec 10 04:42:01 2003
Subject: [Bioperl-l] Bio::Tools::GuessSeqFormat unknown return value?
Message-ID: <200312100935.42018.heikki@ebi.ac.uk>

Bio::Tools::GuessSeqFormat now returns the string 'unknown'. My question is: how strongly should we enforce the convention that unknown values return undef?

In practice, the choices are (1) undef, (2) empty string, (3) zero, (4) predefined string value. What are the advantages and disadvantages of each? Discuss.

I seem to remember that Hilmar felt strongly about this. ;-)

-Heikki

-- 
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From ak at ebi.ac.uk Wed Dec 10 05:06:56 2003
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Wed Dec 10 05:13:07 2003
Subject: [Bioperl-l] Bio::Tools::GuessSeqFormat unknown return value?
In-Reply-To: <200312100935.42018.heikki@ebi.ac.uk>
References: <200312100935.42018.heikki@ebi.ac.uk>
Message-ID: <20031210100656.GA6821@ebi.ac.uk>

On Wed, Dec 10, 2003 at 09:35:55AM +0000, Heikki Lehvaslaiho wrote:
>
> Bio::Tools::GuessSeqFormat returns now string 'unknown'. My question is: how
> strongly should we enforce the convention that unknown values return undef?
>
> In practice, the choises are (1) undef, (2) empty string, (3) zero, (4)
> predefined string value. What are are the advantages and disadvantages of
> this. Discuss.
>
> I seem to remember that Hilmar felt strongly about this. ;-)

I chose to return "unknown" when the module wasn't able to come up with one single good guess because I thought it would be better than to default to a valid format, such as "fasta" (this was the way of bioperl-1.2.3, see the new() method of e.g. Bio::SeqIO).

In retrospect, I think undef would probably be a better value. Here's a patch for it:

Index: GuessSeqFormat.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Tools/GuessSeqFormat.pm,v
retrieving revision 1.2
diff -u -r1.2 GuessSeqFormat.pm
--- GuessSeqFormat.pm 2003/12/04 13:32:35 1.2
+++ GuessSeqFormat.pm 2003/12/10 10:03:05
@@ -50,7 +50,7 @@
 The guess() method of a Bio::Tools::GuessSeqFormat object will
 examine the data, line by line, until it finds a line to which
 only one format can be assigned. If no conclusive guess can be
-made, the "unknown" format is returned.
+made, undef is returned.
 
 If the Bio::Tools::GuessSeqFormat object is given a filehandle
 which is seekable, it will be restored to its original position
@@ -383,8 +383,7 @@
  Function  : Guesses the format of the data accociated with the
              object.
  Returns   : A format string such as "swiss" or "pir". If a
-             format can not be found, the "unknown" format will
-             be returned.
+             format can not be found, undef is returned.
  Arguments : None.
 
  If the object is associated with a filehandle and if that
@@ -502,7 +501,7 @@
         # Seek to the start position.
         $fh->setpos($start_pos);
     }
-    return ($done ? $fmt_string : "unknown");
+    return ($done ? $fmt_string : undef);
 }

Cheers,
Andreas

-- 
|(--)| Andreas Kähäri |-][-|
|-)(-| EMBL, European Bioinformatics Institute |[--]|
|(--)| Wellcome Trust Genome Campus, Hinxton |-][-|
|-)(-| Cambridge, CB10 1SD |[--]|
|(--)| United Kingdom |-][-|

From juguang at tll.org.sg Wed Dec 10 06:26:55 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Wed Dec 10 06:32:57 2003
Subject: [Bioperl-l] Bio::Ontology::Term needs Bio::Annotation::Reference
Message-ID:

Hi,

I am trying to write the script and modules for loading InterPro into biosql. (I will announce the usage etc when I get it done)

The term representing an InterPro record has lists of attributes, such as examples in the member databases, which act as Bio::Annotation::DBLink, and the related publications, which should be represented by Bio::Annotation::Reference, but the current Term does not contain those. So I will append the following pretty simple methods if there are no objections from the list.

add_reference (accept an array of arguments)
get_references
remove_references

Accordingly, the biosql schema should have a term_ref table, similar to term_dbxref. (I leave this to Hilmar)

Comments?

Juguang

From amackey at pcbi.upenn.edu Wed Dec 10 07:26:25 2003
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Dec 10 07:32:27 2003
Subject: [Bioperl-l] Bio::Tools::GuessSeqFormat unknown return value?
In-Reply-To: <200312100935.42018.heikki@ebi.ac.uk>
References: <200312100935.42018.heikki@ebi.ac.uk>
Message-ID: <121E4401-2B0C-11D8-B84F-000A958C5008@pcbi.upenn.edu>

I'd vote for undef as the most useful "Perlish" convention:

die "Say what?" unless $format = $guesser->guess_format($file);

vs:

die "Say what?" unless ($format = $guesser->guess_format($file)) &&
    $format ne "unknown";

-Aaron

On Dec 10, 2003, at 4:35 AM, Heikki Lehvaslaiho wrote:

>
> Bio::Tools::GuessSeqFormat returns now string 'unknown'. My question
> is: how
> strongly should we enforce the convention that unknown values return
> undef?
> > In practice, the choises are (1) undef, (2) empty string, (3) zero, (4) > predefined string value. What are are the advantages and > disadvantages of > this. Discuss. > > I seem to remember that Hilmar felt strongly about this. ;-) > > -Heikki > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Dec 10 10:52:03 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 10 10:58:09 2003 Subject: [Bioperl-l] Bio::Tools::GuessSeqFormat unknown return value? In-Reply-To: <121E4401-2B0C-11D8-B84F-000A958C5008@pcbi.upenn.edu> Message-ID: That pretty much says what I've said as well. If there's no value then don't return one. -hilmar On Wednesday, December 10, 2003, at 04:26 AM, Aaron J. Mackey wrote: > I'd vote for undef as the most useful "Perlish" convention: > > die "Say what?" unless $format = $guesser->guess_format($file); > > vs: > > die "Say what?" unless ($format = $guesser->guess_format($file)) && > $format ne "unknown"; > > -Aaron > > > > On Dec 10, 2003, at 4:35 AM, Heikki Lehvaslaiho wrote: > >> >> Bio::Tools::GuessSeqFormat returns now string 'unknown'. My question >> is: how >> strongly should we enforce the convention that unknown values return >> undef? >> >> In practice, the choises are (1) undef, (2) empty string, (3) zero, >> (4) >> predefined string value. What are are the advantages and >> disadvantages of >> this. Discuss. 
>> >> I seem to remember that Hilmar felt strongly about this. ;-) >> >> -Heikki >> >> -- >> ______ _/ _/_____________________________________________________ >> _/ _/ http://www.ebi.ac.uk/mutations/ >> _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk >> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute >> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton >> _/ _/ _/ Cambs. CB10 1SD, United Kingdom >> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 >> ___ _/_/_/_/_/________________________________________________________ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lairdm at sfu.ca Wed Dec 10 13:04:31 2003 From: lairdm at sfu.ca (Matthew Laird) Date: Wed Dec 10 13:08:36 2003 Subject: [Bioperl-l] Blast return codes In-Reply-To: <434AF352F9D03C4C896782B8CC78BC7628D2D2@vader.oriongenomics.com> Message-ID: Thanks for the replay, the error is as follows: Fatal error: ------------- EXCEPTION ------------- MSG: blastall call crashed: -1 /usr/local/blast/blastall -p blastp -d /usr/local/psort/conf/analysis/sclblast/sclblast -i /tmp/rvtNYtglud -e 1e-09 -o /tmp/jFVFD1Pxt3 STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:633 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:603 STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:489 STACK 
Bio::Tools::Run::SCLBlast::blast /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/SCLBlast.pm:83 STACK Bio::Tools::PSort::Module::SCLBlast::run /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/PSort/Module/SCLBlast.pm:34 STACK Bio::Tools::PSort::Pathway::__ANON__ /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/PSort/Pathway.pm:154 STACK Bio::Tools::PSort::Pathway::__ANON__ /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/PSort/Pathway.pm:156 STACK Bio::Tools::PSort::Pathway::traverse /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/PSort/Pathway.pm:117 STACK Bio::Tools::PSort::classify /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/PSort.pm:120 STACK (eval) /usr/local/psort/bin/psort:186 STACK toplevel /usr/local/psort/bin/psort:186 -------------------------------------- I modified bioperl to make a copy of the input file just before it executes the command thinking it could be a problem with the sequence. Unfortuantely when I then run the command by hand with that exact same input file I receive an exit status of 0. It's also even more odd that the exact same installation steps on the exact same flavour of linux with the exact same version of perl works on a different machine. It seems to be about a 50-50 success rate when I try it on various machines. This produce any thoughts on what might be going on with blast and bioperl? Thanks. On Tue, 9 Dec 2003, Joseph Bedell wrote: > > Hi Matthew, > > I didn't see that anyone answered this question for you. See my comments > below. > > >-----Original Message----- > >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > >bounces@portal.open-bio.org] On Behalf Of Matthew Laird > >Sent: Friday, December 05, 2003 2:27 PM > >To: bioperl-l@bioperl.org > >Subject: [Bioperl-l] Blast return codes > > > >This is on the fringe of being off-topic... but still relivant. :) > > > >I've been trying to google around to find documentation regarding > return > >codes Blast gives upon exit. 
Is there a list of return codes and > >conditions that cause non-zero return codes? > > No, there is no such list. We recently wrote an O'Reilly book on BLAST > but we did not track down the return codes. We'll probably do that for > the next edition. ;) > > > > >The reason I ask is I'm having a problem calling Blast from bioperl. > For > >some reason Blastall is returning a -1 exit code which of course causes > >bioperl to throw an exception. However I have modified my local > install > >of bioperl to tell me the exact command that was being run. I've then > run > >this exact command with the exact same input files from the command > line > >and a standard 0 exit code is returned. Might anyone have any thoughts > on > >why Blastall would be returning a -1 exit code? > > I believe that all the return codes from blastall are positive integers. > A -1 return code may signify a problem with the system call from PERL. > The error messages are usually pretty comprehensive. Can you send the > error message that's produced? > > Thanks, > Joey > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Joseph A Bedell, Ph.D. > Director, Bioinformatics > Orion Genomics, LLC > 4041 Forest Park Ave. > St. Louis, MO 63108 > (314)615-6979; fax:(314)615-6975 > http://www.oriongenomics.com > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Matthew Laird SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University From pm66 at nyu.edu Wed Dec 10 13:24:09 2003 From: pm66 at nyu.edu (Philip MacMenamin) Date: Wed Dec 10 13:36:21 2003 Subject: [Bioperl-l] Bio::DB::GFF dna method not working for wormbase115. Message-ID: <200312101830.hBAIUHX8018586@mx4.nyu.edu> Hi, I have just loaded wormbase fatsa files to a GFF SQL database using Lincolns load_gff script, and everything was fine. 
However, when I try to get the DNA back out, the same script that worked (and still works) for wormbase110 no longer works, i.e.:

    use Bio::DB::GFF;
    # database connection setup omitted; $db is the Bio::DB::GFF handle
    my $segment1 = $db->segment('I', 0, 2000);
    my $dna = $segment1->dna;
    print $dna if $debug;

I can step through the Perl debugger and find out how the story differs between versions 110 and 115, but at first look the databases seem similar. I looked through the mail archives to see if others have had this problem, but drew a blank, so I want to make sure that this is known.

All the best,
Philip.

--
Philip MacMenamin
Center for Comparative Functional Genomics
NYU Department of Biology
1009 Silver Building
100 Washington Square East
New York, NY 10003-6688

From kdj at sanger.ac.uk Wed Dec 10 16:21:49 2003
From: kdj at sanger.ac.uk (Keith James)
Date: Wed Dec 10 16:27:53 2003
Subject: [Bioperl-l] Blast return codes
In-Reply-To:
References:
Message-ID:

>>>>> "Matthew" == Matthew Laird writes:

    Matthew> Thanks for the replay, the error is as follows: Fatal
    Matthew> error: ------------- EXCEPTION ------------- MSG:
    Matthew> blastall call crashed: -1 /usr/local/blast/blastall -p
    Matthew> blastp -d
    Matthew> /usr/local/psort/conf/analysis/sclblast/sclblast -i
    Matthew> /tmp/rvtNYtglud -e 1e-09 -o /tmp/jFVFD1Pxt3

I think the previous reply could have hit the answer, i.e. the -1 means that blast is not getting run at all (perldoc -f system seems to confirm this).

The code in StandAloneBlast is

    my $status = system($commandstring);
    $self->throw("$executable call crashed: $? $commandstring\n") unless ($status==0);

Perl seems to set $? to a value (-1 again) even though the child process never starts. It may be worth including $! in the error message if the system call fails, as this may be helpful, e.g.

    perl -e '$a = system("flub"); $a and print "$a : $?, $!\n"'

    -1 : -1, No such file or directory

Unfortunately I can't shed any light on the crux of your problem...
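
A minimal sketch of such a check, following the pattern documented in perldoc -f system. The helper name describe_system_status and the test command are made up for illustration; this is not the StandAloneBlast code, only a way to see which of the three cases (could not start, killed by a signal, non-zero exit) a given status falls into:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn the outcome of a
# system() call into a readable message.
#   $status is the return value of system(), i.e. the value of $?
#   $errno  is the stringified $! captured right after the call
sub describe_system_status {
    my ($status, $errno) = @_;
    if ($status == -1) {
        # The child could not be started or could not be waited for;
        # only in this case does $! say why.
        return "failed to execute: $errno";
    }
    if ($status & 127) {
        # The low 7 bits hold the number of the signal that killed the child.
        return sprintf "died with signal %d", $status & 127;
    }
    # The high byte holds the exit code of the child.
    return sprintf "exited with value %d", $status >> 8;
}

# Illustrative command: ask the running perl to exit with code 3.
my $status = system($^X, '-e', 'exit 3');
my $errno  = "$!";    # capture immediately, before anything resets it
print describe_system_status($status, $errno), "\n";
```

Reporting $! only in the -1 case matters, because $! is not meaningful when the child did start and simply returned a non-zero exit code.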
Keith -- - Keith James Microarray Facility, Team 65 - - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From lairdm at sfu.ca Wed Dec 10 16:40:28 2003 From: lairdm at sfu.ca (Matthew Laird) Date: Wed Dec 10 16:44:29 2003 Subject: [Bioperl-l] Blast return codes In-Reply-To: Message-ID: Well, that's a step in the right direction, I have a little more information now. I added a $! before the $? and received: ------------- EXCEPTION ------------- MSG: blastall call crashed: -1 No child processes /usr/local/blast/blastall -p blastp -d /usr/local/psort/conf/analysis/sclblast/sclblast -i /tmp/8Dt6zF1U59 -e 1e-09 -o /tmp/ojP9n04LZh STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:640 This is where we get into Perl voodoo beyond my league, "No child processes" - does that ring bells for anyone? Thanks again. On 10 Dec 2003, Keith James wrote: > >>>>> "Matthew" == Matthew Laird writes: > > Matthew> Thanks for the replay, the error is as follows: Fatal > Matthew> error: ------------- EXCEPTION ------------- MSG: > Matthew> blastall call crashed: -1 /usr/local/blast/blastall -p > Matthew> blastp -d > Matthew> /usr/local/psort/conf/analysis/sclblast/sclblast -i > Matthew> /tmp/rvtNYtglud -e 1e-09 -o /tmp/jFVFD1Pxt3 > > I think the previous reply could have hit the answer i.e. the -1 means > that blast is not getting run at all (perldoc system seems to confirm > this). > > The code in StandAloneBlast is > > my $status = system($commandstring); > > $self->throw("$executable call crashed: $? $commandstring\n") > unless ($status==0) ; > > Perl seems to set $? to a value (-1 again) even though the child > process never starts. It may be worth including $! in the error > message if the system call fails as this may be helpful. > > e.g. > > perl -e '$a = system("flub"); $a and print "$a : $?, $!\n" > > -1 : -1, No such file or directory > > Unfortunately I can't shed any light on the crux of your problem... 
> > Keith > > -- Matthew Laird SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University From MEC at Stowers-Institute.org Wed Dec 10 16:52:36 2003 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Wed Dec 10 16:58:38 2003 Subject: [Bioperl-l] Bioperl parser for PolyPhred? Message-ID: Yee! Take a look at ftp://ftp.genome.ou.edu/pub/programs/report_polyphred.pl for a head start Malcolm Cook Database Applications Manager - BioInformatics Stowers Institute for Medical Research -----Original Message----- From: Yee Man Chan [mailto:ymc@paxil.stanford.edu] Sent: Monday, December 08, 2003 1:44 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Bioperl parser for PolyPhred? Hi people My boss asked me to parse output generated by SNP identification program called PolyPhred (http://droog.mbt.washington.edu/PolyPhred.html). Is there a parser already in bioperl that can save me the pain of writing one? Thanks a lot. Yee Man _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From fsun at plantbio.uga.edu Wed Dec 10 16:55:54 2003 From: fsun at plantbio.uga.edu (Feng Sun) Date: Wed Dec 10 17:01:56 2003 Subject: [Bioperl-l] Can Bio::SearchIO::psl.pm parse BlastZ output? Message-ID: Hi Folks, I am using Bioperl1.303 to parse BlastZ(blastz-2003-05-14.tar.gz) output. But it looks like the psl.pm treat blastz file as psl file. Here is how I create the SearchIO object: $searchin = new Bio::SearchIO( -file => $file, -format => 'psl', -program_name => 'BLASTZ' ); Here is the error message: Argument "" isn't numeric in addition (+) at /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line 1. Argument "#:lav" isn't numeric in addition (+) at /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line 1. Use of uninitialized value in addition (+) at /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line 1. 
Use of uninitialized value in division (/) at /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line 1.
Illegal division by zero at /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line 1.

Here is line 191 in psl.pm:

    my $score = sprintf "%.2f", ( 100 * ( $matches + $mismatches + $rep_matches ) / $q_length );

The "$matches", "$mismatches" and "$rep_matches" are all fields in psl files. Also, I can't find anything in psl.pm indicating that it uses the "-program_name => 'BLASTZ'" information to treat BlastZ files differently.

Has anyone used this module to parse BlastZ output correctly? If yes, which version of Bioperl are you using? Thanks for your help!

--
Feng Sun

Laboratory for Genomics and Bioinformatics
The University of Georgia, Department of Plant Biology
Plant Sciences Building, Rm. 2502,
Athens, GA 30602-7271, USA
Tel:(706)5830791

From jason at cgt.duhs.duke.edu Wed Dec 10 17:14:36 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Dec 10 17:20:44 2003
Subject: [Bioperl-l] Can Bio::SearchIO::psl.pm parse BlastZ output?
In-Reply-To:
References:
Message-ID:

On Wed, 10 Dec 2003, Feng Sun wrote:

> Hi Folks,
>
> I am using Bioperl1.303 to parse BlastZ(blastz-2003-05-14.tar.gz) output.
> But it looks like the psl.pm treat blastz file as psl file.

I'm confused - you are using the psl format SearchIO module and you're wondering why you need to pass in psl format data?

You need to run lavToPsl to turn the blastz output into psl, as we don't make an effort to parse BLASTZ's lav natively.
(http://www.cse.ucsc.edu/~kent/src/unzipped/hg/mouseStuff/lavToPsl/)

The -program_name => 'BLASTZ' is a convenience for the type of features that are created (source_tag is getting filled with $program_name).

[From mail message where I introduced the SearchIO::psl module
http://portal.open-bio.org/pipermail/bioped-l/2003-August/000016.html]

[Bio::SearchIO]
Added some more SearchIO parsers.
Borrowing from Bala's Tools::Blat impelementation I made a SearchIO::psl parser which can parse PSL output. It needs to be tweaked a little more to skip the header lines if they are produced but works for me for output from Jim's lav2Psl code. > > Here is how I create the SearchIO object: > > $searchin = new Bio::SearchIO( -file => $file, > -format => 'psl', > -program_name => 'BLASTZ' ); > > Here is the error message: > > Argument "" isn't numeric in addition (+) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Argument "#:lav" isn't numeric in addition (+) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Use of uninitialized value in addition (+) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Use of uninitialized value in division (/) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Illegal division by zero at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > > Here is line 191 in psl.pm: > my $score = sprintf "%.2f", ( 100 * ( $matches + $mismatches + > $rep_matches ) / $q_length ); > > The "$matches", "$mismatches" and "$rep_matches" are all fields in psl > files. Also, I can't find anything in psl.pm indicating that it use the > "-program_name => 'BLASTZ'" information to treat BlastZ files differently. > > Has anyone used this module to parse BlastZ output correctly? If yes, > which version of Bioperl are you using? Thanks for your help! > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From wes.barris at csiro.au Wed Dec 10 17:34:01 2003 From: wes.barris at csiro.au (Wes Barris) Date: Wed Dec 10 17:40:22 2003 Subject: [Bioperl-l] Can Bio::SearchIO::psl.pm parse BlastZ output? In-Reply-To: References: Message-ID: <3FD79F59.8040500@csiro.au> Feng Sun wrote: > Hi Folks, > > I am using Bioperl1.303 to parse BlastZ(blastz-2003-05-14.tar.gz) output. > But it looks like the psl.pm treat blastz file as psl file. 
> > Here is how I create the SearchIO object: > > $searchin = new Bio::SearchIO( -file => $file, > -format => 'psl', > -program_name => 'BLASTZ' ); Perhaps I am missing something but I thought that blastz outputs .lav format not .psl format. Blat, outputs .psl files. > > Here is the error message: > > Argument "" isn't numeric in addition (+) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Argument "#:lav" isn't numeric in addition (+) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Use of uninitialized value in addition (+) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Use of uninitialized value in division (/) at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > Illegal division by zero at > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > 1. > > Here is line 191 in psl.pm: > my $score = sprintf "%.2f", ( 100 * ( $matches + $mismatches + > $rep_matches ) / $q_length ); > > The "$matches", "$mismatches" and "$rep_matches" are all fields in psl > files. Also, I can't find anything in psl.pm indicating that it use the > "-program_name => 'BLASTZ'" information to treat BlastZ files differently. > > Has anyone used this module to parse BlastZ output correctly? If yes, > which version of Bioperl are you using? Thanks for your help! > > -- > Feng Sun > > Laboratory for Genomics and Bioinformatics > The University of Georgia, Department of Plant Biology > Plant Sciences Building, Rm. 
2502, > Athens, GA 30602-7271, USA > Tel:(706)5830791 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Wes Barris E-Mail: Wes.Barris@csiro.au From heikki at nildram.co.uk Wed Dec 10 17:48:43 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Wed Dec 10 17:54:50 2003 Subject: [Bioperl-l] Bio::Tools::GuessSeqFormat unknown return value? In-Reply-To: References: Message-ID: <200312102248.43319.heikki@nildram.co.uk> Just what I wanted to hear. Changes commited. -Heikki On Wednesday 10 Dec 2003 3:52 pm, Hilmar Lapp wrote: > That pretty much says what I've said as well. If there's no value then > don't return one. > > -hilmar > > On Wednesday, December 10, 2003, at 04:26 AM, Aaron J. Mackey wrote: > > I'd vote for undef as the most useful "Perlish" convention: > > > > die "Say what?" unless $format = $guesser->guess_format($file); > > > > vs: > > > > die "Say what?" unless ($format = $guesser->guess_format($file)) && > > $format ne "unknown"; > > > > -Aaron > > > > On Dec 10, 2003, at 4:35 AM, Heikki Lehvaslaiho wrote: > >> Bio::Tools::GuessSeqFormat returns now string 'unknown'. My question > >> is: how > >> strongly should we enforce the convention that unknown values return > >> undef? > >> > >> In practice, the choises are (1) undef, (2) empty string, (3) zero, > >> (4) > >> predefined string value. What are are the advantages and > >> disadvantages of > >> this. Discuss. > >> > >> I seem to remember that Hilmar felt strongly about this. ;-) > >> > >> -Heikki > >> > >> -- > >> ______ _/ _/_____________________________________________________ > >> _/ _/ http://www.ebi.ac.uk/mutations/ > >> _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > >> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > >> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > >> _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom > >> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > >> ___ _/_/_/_/_/________________________________________________________ > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From fsun at plantbio.uga.edu Wed Dec 10 17:54:26 2003 From: fsun at plantbio.uga.edu (Feng Sun) Date: Wed Dec 10 18:00:34 2003 Subject: [Bioperl-l] Can Bio::SearchIO::psl.pm parse BlastZ output? In-Reply-To: Message-ID: Thanks! Jason. I am clear now. I will use lavToPsl. -- Feng Sun On Wed, 10 Dec 2003, Jason Stajich wrote: > > On Wed, 10 Dec 2003, Feng Sun wrote: > > > Hi Folks, > > > > I am using Bioperl1.303 to parse BlastZ(blastz-2003-05-14.tar.gz) output. > > But it looks like the psl.pm treat blastz file as psl file. > > I'm confused - you are using the psl format SearchIO module and you're > wondering why you need to pass in psl format data? > > You need to run lavToPsl to turn the blastz output into psl, as we don't > make an effor to parse BLASTZ's lav natively. > (http://www.cse.ucsc.edu/~kent/src/unzipped/hg/mouseStuff/lavToPsl/) > > The -program_name => 'BLASTZ' is a convience for the type of features that > are created (source_tag is getting filled with $program_name). 
> > [From mail message where I introduced the SearchIO::psl module > http://portal.open-bio.org/pipermail/bioped-l/2003-August/000016.html] > > [Bio::SearchIO] > Added some more SearchIO parsers. Borrowing from Bala's Tools::Blat > impelementation I made a SearchIO::psl parser which can parse PSL output. > It needs to be tweaked a little more to skip the header lines if they are > produced but works for me for output from Jim's lav2Psl code. > > > > > > Here is how I create the SearchIO object: > > > > $searchin = new Bio::SearchIO( -file => $file, > > -format => 'psl', > > -program_name => 'BLASTZ' ); > > > > Here is the error message: > > > > Argument "" isn't numeric in addition (+) at > > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > > 1. > > Argument "#:lav" isn't numeric in addition (+) at > > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > > 1. > > Use of uninitialized value in addition (+) at > > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > > 1. > > Use of uninitialized value in division (/) at > > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > > 1. > > Illegal division by zero at > > /usr/lib/perl5/site_perl/5.6.1/Bio/SearchIO/psl.pm line 191, line > > 1. > > > > Here is line 191 in psl.pm: > > my $score = sprintf "%.2f", ( 100 * ( $matches + $mismatches + > > $rep_matches ) / $q_length ); > > > > The "$matches", "$mismatches" and "$rep_matches" are all fields in psl > > files. Also, I can't find anything in psl.pm indicating that it use the > > "-program_name => 'BLASTZ'" information to treat BlastZ files differently. > > > > Has anyone used this module to parse BlastZ output correctly? If yes, > > which version of Bioperl are you using? Thanks for your help! 
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

From atmarvin at hotmail.com Wed Dec 10 14:43:19 2003
From: atmarvin at hotmail.com (Sebastian Jünemann)
Date: Wed Dec 10 18:47:01 2003
Subject: [Bioperl-l] Bio::MAGE
Message-ID:

Hi!

I've just subscribed to this list, so I'm new to Perl and of course to handling BioPerl modules!

I'm working on a project where we try to store genetic information and make it accessible via the internet. It's my job to build a way for a user to import MAGE files (XML files based on the MAGE schema; more at http://www.mged.org/Workgroups/MAGE/mage.html) into our system and to export them again.

So I came across the module Bio::MAGE, which already has an event handler that can handle XML events (thrown by the Xerces XML parser). This event handler (the path of the Perl module is /somewhere/site_perl/Bio/MAGE/XMLUtils.pm) then stores the input in a MySQL database.

Has anyone already worked with these classes? I've searched all the way from Google to bioperl, but nowhere can I find a good HowTo, tutorial, or at least a usable API description. Because I'm new to Perl, I need examples which use Bio::MAGE::XMLUtils.

So thanks a lot, and please give me examples or hints!

Sebastian

From lookafar at hotmail.com Wed Dec 10 19:22:15 2003
From: lookafar at hotmail.com (John Yao)
Date: Wed Dec 10 19:28:22 2003
Subject: [Bioperl-l] problem using load_seqdatabase.pl with biosql
Message-ID:

I created the biosql schema in a MySQL server and tried to load a swissprot database.
I was able to populate the database with taxonomy data using load_ncbi_taxonomy.pl. However, when I tried to load a swissprot database I got the following errors:

    /load_seqdatabase.pl --sqldb swiss --format swiss swissprot/sprot42.dat
    Reading swissprot/sprot42.dat
    DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list' at /usr/lib/perl5/site_perl/5.6.0/Bio/DB/SQL/SeqAdaptor.pm line 427, line 53.
    DBD::mysql::st execute failed: Unknown column 'display_id' in 'field list' at /usr/lib/perl5/site_perl/5.6.0/Bio/DB/SQL/SeqAdaptor.pm line 427, line 53.

It complained about not being able to find display_id. When I looked at the schema diagram (in PDF) and also checked the database tables, there is indeed no display_id column. display_id is described in schema-overview.txt but is missing from the actual schema.

Help!

John Yao

From sjmiller at email.arizona.edu Wed Dec 10 20:29:39 2003
From: sjmiller at email.arizona.edu (Susan J. Miller)
Date: Wed Dec 10 20:35:41 2003
Subject: [Bioperl-l] Bio::Biblio
Message-ID: <3FD7C883.1050708@email.arizona.edu>

We are running Perl 5.6, Bioperl 1.2 on a Solaris 8 machine. A couple of questions about Bio::Biblio:

1. When I try to use the Bio::Biblio->find method I get an error message saying that it is not implemented, yet looking in the bioperl list archive I see replies that mention this method. Should I be able to use it? Do I need a more recent version of BioPerl?

2. I would like to be able to take a GenBank accession number and find any PubMed citations that are related to the accession number. NCBI Entrez search will turn up PubMed hits given an accession number - is there a way to do this using BioPerl?

--
Thanks,
-susan

Susan J.
Miller Biotechnology Computing Facility Arizona Research Laboratories Bio West 228 University of Arizona Tucson, AZ 85721 (520) 626-2597 From dwtrusty at hotmail.com Wed Dec 10 23:14:41 2003 From: dwtrusty at hotmail.com (David Trusty) Date: Wed Dec 10 23:20:45 2003 Subject: [Bioperl-l] Please help: upgraded to 1.2.3 and interfaces changed Message-ID: Hi, I am maintaining some code which uses Bioperl. I had to upgrade our Bioperl version, and now the code which uses the Bioperl functions is not working. Here is a piece of code which is no longer working: my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $blast_report = $factory->blastall($seqA); print ERROR_LOG "BLAST HITS TABLE\n\n"; print ERROR_LOG $blast_report->table_labels_tiled(); print ERROR_LOG $blast_report->table_tiled; I get this error: Can't locate object method "table_labels_tiled" via package "Bio::SearchIO::blast" at exon.cgi line 511. And for this code foreach $hit ($blast_report->hits) { I get this error: Can't locate object method "hits" via package "Bio::SearchIO::blast" at exon.cgi line 534. Is there a replacement for table_labels_tiled? I think I need to ask the factory to give me a Bio::Search::Result::BlastResult object, and then a Bio::Search::Hit::BlastHit. Do you agree? I've been looking for an example, but can't seem to find one. How can I change the code to get a Bio::Search::Result::BlastResult and then a Bio::Search::Hit::BlastHit object? The web site mentions examples in a directory called examples/search-blast, but I can't find it. Is there an example I can look at? Thanks, David _________________________________________________________________ Take advantage of our best MSN Dial-up offer of the year — six months @$9.95/month. Sign up now! 
From fangl at genomics.org.cn Thu Dec 11 01:59:35 2003
From: fangl at genomics.org.cn (Magic Fang)
Date: Thu Dec 11 02:04:27 2003
Subject: [Bioperl-l] Bio::Seq::RichSeq error
Message-ID: <200312111501921.SM01860@magicpc>

test file:
LOCUS       AY007677                1433 bp    DNA     linear   BCT 29-OCT-2001
DEFINITION  Unknown marine alpha proteobacterium JP66.1 16S ribosomal RNA,
            partial sequence.
ACCESSION   AY007677
VERSION     AY007677.1  GI:12000363
KEYWORDS    .
SOURCE      unknown marine alpha proteobacterium JP66.1
  ORGANISM  unknown marine alpha proteobacterium JP66.1
            Bacteria; Proteobacteria; Alphaproteobacteria.
REFERENCE   1  (bases 1 to 1433)
  AUTHORS   Eilers,H., Pernthaler,J., Peplies,J., Glockner,F.O., Gerdts,G.
            and Amann,R.
  TITLE     Isolation of novel pelagic bacteria from the German bight and
            their seasonal contributions to surface picoplankton
  JOURNAL   Appl. Environ. Microbiol. 67 (11), 5134-5142 (2001)
  MEDLINE   21536174
   PUBMED   11679337
REFERENCE   2  (bases 1 to 1433)
  AUTHORS   Eilers,H., Pernthaler,J., Peplies,J., Gloeckner,F.O., Gerdts,G.,
            Schuett,C. and Amann,R.
  TITLE     Identification and seasonal dominance of culturable marine
            bacteria
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 1433)
  AUTHORS   Eilers,H., Pernthaler,J., Peplies,J., Gloeckner,F.O., Gerdts,G.,
            Schuett,C. and Amann,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-AUG-2000) Molecular Ecology, Max-Planck-Institute,
            Celsiusstrasse 1, Bremen 28359, Germany
FEATURES             Location/Qualifiers
     source          1..1433
                     /organism="unknown marine alpha proteobacterium JP66.1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:145652"
     rRNA            <1..>1433
                     /product="16S ribosomal RNA"
ORIGIN
        1 tcatggctca gaacgaacgc tggcggcagg cttaacacat gcaagtcgaa cgatctcttc
       61 ggagatagtg gcagacgggt gagtaacgcg tgggaaccta ccttattcta cggaataaca
      121 gttagaaatg actgctaata ccgtatacgc ccttcggggg aaagatttat cggagtagga
      181 tgggcccgcg ttggattagc tagttggtgg ggtaatggcc taccaaggcg acgatctata
      241 gctggtctga gaggatgatc agccacactg gaactgagac acggtccaga ctcctacggg
      301 aggcagcagt ggggaatatt ggacaatggg cgcaagcctg atccagccat gccgcctgag
      361 tgatgaaggc cttagggttg taaagctctt tcaacggtga agataatgac ggtaaccgta
      421 gaagaagccc cggctaactt cgtgccagca gccgcggtaa tacgaagggg gctagcgttg
      481 ttcggaatta ctgggcgtaa agcgtacgta ggcggattag aaagttaggg gtgaaatccc
      541 agggctcaac cctggaactg cctctaaaac tcctaatctt gagttcgaga gaggtgagtg
      601 gaattccgag tgtagaggtg aaattcgtag atattcggag gaacaccagt ggcgaaggcg
      661 gctcactggc tcgatactga cgctgaggta cgaaagcgtg gggagcaaac aggattagat
      721 accctggtag tccacgccgt aaacgatgaa tgttagccgt cgggcagtat actgttcggt
      781 ggcgcagcta acgcattaaa cattccgcct ggggagtacg gtcgcaagat taaaactcaa
      841 aggaattgac gggggcccgc acaagcggtg gagcatgtgg tttaattcga agcaacgcgc
      901 agaaccttac cagcccttga cataccaatc gcggttagtg gagacacttt ccttcagttc
      961 ggctggattg gatacaggtg ctgcatggct gtcgtcagct cgtgtcgtga gatgttgggt
     1021 taagtcccgc aacgagcgca accctcgcct ttagttgcca gcatttagtt gggcactcta
     1081 gagggactgc cggtgataag ccggaggaag gtggggatga cgtcaagtcc tcatggccct
     1141 tacgggctgg gctacacacg tgctacaatg gtggtgacag tgggcagcga gacggcaacg
     1201 tcgagctaat ctccaaaaac catctcartt cggattgggg tctgcaactc gacccccatg
     1261 aagttggaat cgctagtaat cgcggatcag catgccgcgg tgaatacgtt cccgggcctt
     1321 gtacacaccg cccgtcacac catgggagtt ggtcttaccc gaaggcgatg cgctaaccag
     1381 caatggaggc agtcgaccac ggtagggtca gcgactgggg
          tgaagtcgta aca
//

test command:
$ perl -e 'use Bio::SeqIO;$in=Bio::SeqIO->new(-file=>"test.gbk", -format=>"genbank");$seq=$in->next_seq;print $seq->display_id, "\t", $seq->species->species, "\n";'

error message:
Can't call method "species" on an undefined value at -e line 1, line 61.

bioperl version 1.3.01

From heikki at nildram.co.uk Thu Dec 11 05:45:33 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Thu Dec 11 05:51:41 2003
Subject: [Bioperl-l] Bio::Seq::RichSeq error
In-Reply-To: <200312111501921.SM01860@magicpc>
References: <200312111501921.SM01860@magicpc>
Message-ID: <200312111045.33997.heikki@nildram.co.uk>

Fang,

It is not a bug but a feature. In the EMBL, GenBank and Swiss-Prot parsers
you'll find these lines:

    # Don't make a species object if it's empty or "Unknown" or "None"
    return unless $genus and $genus !~ /^(Unknown|None)$/oi;

So for an entry like yours, whose ORGANISM line starts with "unknown", no
species object is created and $seq->species returns undef.

There are 58 entries with Unknown as the first word in the OS line in the
current EMBL databank. It would not be too difficult to modify the parsers
to include these, but would it be useful, and how should it be done?

binomial() should return a valid scientific name, so we should not use
species, I guess. Higher taxa might be of some use. We already have one
exception written in for Viri, but these unknown species are even fuzzier.

If someone can come up with a plan, I am happy to code it in.

-Heikki

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From kdj at sanger.ac.uk Thu Dec 11 05:58:11 2003
From: kdj at sanger.ac.uk (Keith James)
Date: Thu Dec 11 06:04:13 2003
Subject: [Bioperl-l] Blast return codes
In-Reply-To:
References:
Message-ID:

>>>>> "Matthew" == Matthew Laird writes:

    Matthew> Well, that's a step in the right direction, I have a
    Matthew> little more information now. I added a $! before the $?
    Matthew> and received:

    Matthew> ------------- EXCEPTION ------------- MSG: blastall call
    Matthew> crashed: -1 No child processes /usr/local/blast/blastall
    Matthew> -p blastp -d
    Matthew> /usr/local/psort/conf/analysis/sclblast/sclblast -i
    Matthew> /tmp/8Dt6zF1U59 -e 1e-09 -o /tmp/ojP9n04LZh

    Matthew> STACK Bio::Tools::Run::StandAloneBlast::_runblast
    Matthew> /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:640

    Matthew> This is where we get into Perl voodoo beyond my league,
    Matthew> "No child processes" - does that ring bells for anyone?
    Matthew> Thanks again.

That's interesting. I think we need to know your OS platform and Perl
version to get any further.

I think that the value left in $? by a system call is the same as if a
wait system call had been made. "No child processes" is a Unix error code
(ECHILD), which can be caused by a wait and is being reported by Perl.

What happens if you run the test for StandAloneBlast? (t/StandAloneBlast.t)

Keith

--

- Keith James Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -

From heikki at nildram.co.uk Thu Dec 11 06:17:50 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Thu Dec 11 06:23:51 2003
Subject: [Bioperl-l] updated bioperl Download page
Message-ID: <200312111117.50856.heikki@nildram.co.uk>

In preparation for the 1.4 release, I put some information about
downloadable bioperl packages on

http://www.bioperl.org/Core/Latest/

See what you think, and if I have created a wrong link or done something
stupid, let me know.

I've created two categories: "Current Bioperl Releases" and "Latest
Development Releases", but I am open to other suggestions. For example, DB
and Microarray might belong in another, third category?

-Heikki

P.S. It might be time for another biosql/bioperl-db release.
;-)

From lstein at cshl.edu Thu Dec 11 08:41:45 2003
From: lstein at cshl.edu (lstein@cshl.edu)
Date: Thu Dec 11 08:47:45 2003
Subject: [Bioperl-l] Problems in Bio::Graphics
Message-ID: <200312111341.hBBDfjso004192@pronto.lsjs.org>

Hi Folks,

There are new problems in Bio::Graphics. It's going to take a few days for
me to fix them. Please hold the 1.4 release until I say it's OK, OK?

Lincoln

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From jurgen.pletinckx at algonomics.com Thu Dec 11 09:32:02 2003
From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx)
Date: Thu Dec 11 09:29:23 2003
Subject: [Bioperl-l] RE: :structure
In-Reply-To:
Message-ID:

# I'm using bioperl to parse a pdb file,
#
# however I can't seem to get information on whether an
# atom is a HETATM or ATOM
#
# Do you know if it is possible to get this?
#

Dear Jonathan,

first off - I'm not Kris - he left the company a while ago. I rather like
his Bio::Structure modules, though, and I can help out here.

As it happens, there is no built-in method to do this as yet. I offer for
your amusement the following sample script, achieving what you seek. It
is based on the same technique used deep inside
Bio::Structure::IO::pdb->write_structure.

The assumption is that all HETATM lines in the file have been announced
in the PDB header, in HET lines. This holds for (all?) current official
PDB files, but is bound to fail for user-generated files.
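
For user-generated files that lack HET header lines, one alternative is
to read the record name directly from columns 1-6 of each coordinate
line. What follows is only a minimal sketch under that assumption, and it
is separate from the Bio::Structure-based script in this message: it does
not use Bio::Structure at all, the sub name classify_pdb_records and the
two sample records are made up for illustration, and the column positions
are those of the fixed-width PDB coordinate record layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Structure): map each residue to the
# record type (ATOM or HETATM) found in columns 1-6 of its coordinate lines.
sub classify_pdb_records {
    my ($fh) = @_;
    my %record_for;    # "chain:resname:resseq" => "ATOM" or "HETATM"
    while (my $line = <$fh>) {
        # Only coordinate records are of interest.
        next unless $line =~ /^(ATOM  |HETATM)/;
        (my $record = $1) =~ s/\s+$//;
        next if length($line) < 26;    # too short to hold the fields below
        # Fixed-width fields: residue name in columns 18-20, chain ID in
        # column 22, residue sequence number in columns 23-26.
        my $resname = substr($line, 17, 3);
        my $chain   = substr($line, 21, 1);
        my $resseq  = substr($line, 22, 4);
        s/^\s+|\s+$//g for $resname, $resseq;
        $record_for{"$chain:$resname:$resseq"} = $record;
    }
    return \%record_for;
}

# Two made-up sample records, built with sprintf so the columns line up.
my $fmt    = "%-6s%5d %-4s%1s%3s %1s%4d\n";
my $sample = sprintf($fmt, 'ATOM',   1, ' N', '', 'MET', 'A', 1)
           . sprintf($fmt, 'HETATM', 2, ' O', '', 'HOH', 'A', 101);

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $record_for = classify_pdb_records($fh);
close $fh;

for my $key (sort keys %$record_for) {
    print "$key\t$record_for->{$key}\n";
}
```

To try it on a real file, open that file and pass its handle to the sub
instead of the in-memory sample. Unlike the HET-line lookup, this only
reports what the coordinate records themselves say, so it needs no header
information.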

This will work for atoms where you know the residue they belong to (the
usual situation). Let us know if this doesn't work for you, so we can look
further into it.

#!/usr/bin/perl -w
use strict;
use Bio::Structure::IO;

my $f = "/PDB/ab/pdb1abz.ent";
my $entry = Bio::Structure::IO->new(-file=>$f)->next_structure;

# obtain text of HET lines
(my $hetstring = ($entry->annotation->get_Annotations("het"))[0]->as_text) =~ s/^Value: //;

# create lookup hash of HET residues
my %het_res;
$het_res{'HOH'}++; # HOH is implicit hetero-atom
while (my $l = substr($hetstring,0,63,'')) {
    $l =~ s/^\s*(\S+)\s+.*$/$1/;
    $het_res{$l}++;
}

for my $chain ($entry->get_chains) {
    my $chid = $chain->id;
    for my $residue ($entry->get_residues($chain)) {
        my $resid = $residue->id;
        my ($label, $pos) = split /-/, $resid;
        if ($het_res{$label}) {
            print "HETR ",$chid,$resid,"\n";
        } else {
            print "NORM ",$chid,$resid,"\n";
        }
    }
}

--
Jurgen Pletinckx
AlgoNomics NV

From jason at cgt.duhs.duke.edu Thu Dec 11 09:42:03 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Dec 11 09:48:06 2003
Subject: [Bioperl-l] Please help: upgraded to 1.2.3 and interfaces changed
In-Reply-To:
References:
Message-ID:

On Wed, 10 Dec 2003, David Trusty wrote:

> Hi,
>
> I am maintaining some code which uses Bioperl. I had to upgrade our Bioperl
> version,
> and now the code which uses the Bioperl functions is not working.
>
> Here is a piece of code which is no longer working:
> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $blast_report = $factory->blastall($seqA);
>
> print ERROR_LOG "BLAST HITS TABLE\n\n";
> print ERROR_LOG $blast_report->table_labels_tiled();
> print ERROR_LOG $blast_report->table_tiled;
>

This is particularly old stuff, as it used Bio::Tools::Blast, which has
been deprecated since 1.0.x I believe.

> I get this error:
> Can't locate object method "table_labels_tiled" via package
> "Bio::SearchIO::blast" at exon.cgi line 511.
>
> And for this code
>
> foreach $hit ($blast_report->hits) {
>
> I get this error:
> Can't locate object method "hits" via package "Bio::SearchIO::blast" at
> exon.cgi line 534.
>
> Is there a replacement for table_labels_tiled?
>

You need to use Bio::SearchIO::Writer::HitTableWriter

> I think I need to ask the factory to give me a
> Bio::Search::Result::BlastResult object, and then a
> Bio::Search::Hit::BlastHit. Do you agree?
>
> I've been looking for an example, but can't seem to find one. How can I
> change the code to get a Bio::Search::Result::BlastResult and then a
> Bio::Search::Hit::BlastHit object?

    while( my $r = $blast_report->next_result ) {
        # $r is a Result object
        while( my $hit = $r->next_hit ) {
            # $hit is a Hit object
        }
        # or if you prefer
        # for my $hit ( $r->hits ) {
        # }
    }

I don't know what the table_labels_tiled output looks like exactly, but
Bio::SearchIO::Writer::HitTableWriter and
Bio::SearchIO::Writer::ResultTableWriter are a starting point.

You can also look at scripts/searchio and see a bunch of usable scripts -
I think some of the examples directory got rearranged after the 1.2.x
branch, so a lot of things that are in the scripts directory in 1.2.x are
(post-1.2 branch) now in the examples directory.

>
> The web site mentions examples in a directory called examples/search-blast,
> but I can't find it.
> Is there an example I can look at?
>
> Thanks,
>
> David

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Thu Dec 11 09:43:02 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Dec 11 09:49:03 2003
Subject: [Bioperl-l] Bio::DB::GenBank (fwd)
Message-ID:

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

---------- Forwarded message ----------
Date: Wed, 10 Dec 2003 16:45:58 -0800 (PST)
From: Shuai Weng
To: amackey@virginia.edu
Cc: jason@bioperl.org
Subject: Bio::DB::GenBank

Hi,

Bio::DB::GenBank doesn't work for me now.

The example code I used is below.

======================
use strict;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank;

my $seqId = 'L00683';

my $seqObj = $gb->get_Stream_by_id($seqId);

while ( my $seq = $seqObj->next_seq() ) {

    ### do something here

}
======================

This module worked great before. I was wondering
if the problem is due to the NCBI entrez url change...

Cheers,

Shuai

From markw at illuminae.com Thu Dec 11 09:55:03 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu Dec 11 10:01:09 2003
Subject: [BioPerl] [Bioperl-l] Bio::DB::GenBank (fwd)
In-Reply-To:
References:
Message-ID: <1071154502.3849.30.camel@localhost.localdomain>

Yeah, that code stopped working "reliably" yesterday. Only about 1 in 4
calls were getting through yesterday afternoon, and the ones that did
make it were taking forever. It appears to be a problem at GB's end,
not with the bioperl code. All my MOBY GB services died inauspiciously
:-)

As a result, I am giving up on GB as a data host, and will now serve GB
data through SeqHound - a Canadian warehouse of GB/SP/... data that
updates nightly. I'm tired of dealing with GB's flakiness.
Maybe I'm unlucky, but the last few times I have tried to do stuff with
GB it has failed (not bioperl's fault!), so I just can't be bothered with
them anymore. I've actually just finished (about to commit!) the re-coding
of all GB-related MOBY services to work off of SeqHound - they run MUCH
faster now :-)

Cheers all!

Mark

On Thu, 2003-12-11 at 08:43, Jason Stajich wrote:

> The Bio::DB::GenBank doesn't work for me now.

--
Mark Wilkinson
markw@illuminae.com
------------------------------------------------------------------------
It just goes to show you that SOAP::Lite is more intuitive than you
might think, if you know enough Perl and have the patience to dive
into the source code.
  -Byrne Reese
  -http://builder.com.com/5100-6389_14-1045078-2.html
------------------------------------------------------------------------

From heikki at nildram.co.uk Thu Dec 11 10:09:08 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Thu Dec 11 10:15:11 2003
Subject: [Bioperl-l] Bio::DB::GenBank (fwd)
In-Reply-To:
References:
Message-ID: <200312111509.08630.heikki@nildram.co.uk>

Dear Shuai Weng,

You are right. The Entrez URL changed. You must be using bioperl < 1.2.3.
Upgrade and the script will work again.

Yours,
-Heikki

From alicelu at Rubicongenomics.com Thu Dec 11 10:14:46 2003
From: alicelu at Rubicongenomics.com (Alice Lu)
Date: Thu Dec 11 10:20:16 2003
Subject: [Bioperl-l] can't receive group emails
Message-ID: <3F20FEEE1A5AE0428CCE3F4CC62D5E302CEB6A@gene-jini.RUBIGEN.RUBICONGENOMICS.COM>

Hi everyone,

For whatever reason, I stopped receiving the Bioperl group emails. Any
ideas from anyone?

Thanks,
alice

From heikki at nildram.co.uk Thu Dec 11 10:17:17 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Thu Dec 11 10:23:18 2003
Subject: [BioPerl] [Bioperl-l] Bio::DB::GenBank (fwd)
In-Reply-To: <1071154502.3849.30.camel@localhost.localdomain>
References: <1071154502.3849.30.camel@localhost.localdomain>
Message-ID: <200312111517.17286.heikki@nildram.co.uk>

Mark,

I replied first before seeing your message. After that I tried the script
with several accession numbers. All worked. It seems to me that whatever
glitch it was is gone now.

-Heikki

From markw at illuminae.com Thu Dec 11 10:57:37 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu Dec 11 11:03:43 2003
Subject: [BioPerl] [Bioperl-l] Bio::DB::GenBank (fwd)
In-Reply-To: <200312111517.17286.heikki@nildram.co.uk>
References: <1071154502.3849.30.camel@localhost.localdomain> <200312111517.17286.heikki@nildram.co.uk>
Message-ID: <1071158256.3850.47.camel@localhost.localdomain>

Hi Heikki,

it was weird... I would make exactly the same call over and over again
and about 1 in 4 would actually return; the rest gave an error HTML page
(to STDERR) saying that the problem was "temporary" and blah blah blah.
It was the same problem both with the SeqStream and the single getSeq
calls, so I wasn't able to work around it. Eventually I just gave up.

Anyway, if it works today, that's good :-)

Cheers from the frozen north!

M

--
Mark Wilkinson
Illuminae

From jason at cgt.duhs.duke.edu Thu Dec 11 11:22:18 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Dec 11 11:28:21 2003
Subject: [BioPerl] [Bioperl-l] Bio::DB::GenBank (fwd)
In-Reply-To: <1071154502.3849.30.camel@localhost.localdomain>
References: <1071154502.3849.30.camel@localhost.localdomain>
Message-ID:

Hey great - been waiting for someone to write a SeqHound interface for a
while. Great that you took it on!

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Thu Dec 11 11:32:06 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Dec 11 11:38:08 2003
Subject: [Bioperl-l] Bio::DB::Fasta
Message-ID:

There seems to be something wrong with the offsets in Bio::DB::Fasta

    $db->seq($id, $start => $end);

is returning an invalid value. We need to fix this pre-1.4 release. I
don't have time to delve into the mysteries right now, but have submitted
a bug report.

#!/usr/bin/perl -w
use strict;
use Bio::DB::Fasta;
use Bio::Index::Fasta;
use Test;
BEGIN { plan tests => 1}
use Bio::SeqIO;
use Bio::PrimarySeq;

my $dbfile = 'db.fas';
my $fas_db = Bio::DB::Fasta->new($dbfile);
my $idx_db = Bio::Index::Fasta->new(-filename   => '/tmp/test.idx',
                                    -write_flag => 1);
$idx_db->make_index($dbfile);

my ($id,$start,$end) = ('id1', 78408, 80349);
my $seq1 = $fas_db->seq($id, $start => $end);
my $seq2 = $idx_db->get_Seq_by_acc($id);
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file   => ">o.fa");
$out->write_seq(Bio::PrimarySeq->new(-id  => 'db::fasta',
                                     -seq => $seq1));
$out->write_seq(Bio::PrimarySeq->new(-id  => 'index::fasta',
                                     -seq => $seq2->subseq($start,$end)));

ok($seq1, $seq2->subseq($start,$end) );

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From hlapp at gmx.net Thu Dec 11 11:48:46 2003
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Dec 11 11:54:46 2003
Subject: [Bioperl-l] problem using load_seqdatabase.pl with biosql
In-Reply-To:
Message-ID:

display_id was renamed to name in the Singapore version. You need to get
a recent download of biosql-schema and make sure that's the schema version
you have instantiated.

-hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From markw at illuminae.com Thu Dec 11 12:16:47 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu Dec 11 12:25:27 2003
Subject: [BioPerl] [Bioperl-l] Bio::DB::GenBank (fwd)
In-Reply-To:
References: <1071154502.3849.30.camel@localhost.localdomain>
Message-ID: <1071163007.3849.63.camel@localhost.localdomain>

The SeqHound developers apparently have a rudimentary set of BioPerl
modules written, but they aren't "production quality" yet from what I have
heard (I haven't tried them - I just use their procedural API with great
satisfaction!). I'm sure they will put these modules into the BioPerl CVS
as soon as they are up to snuff. They're a good bunch :-)

M

--
Mark Wilkinson
Illuminae

From lairdm at sfu.ca Thu Dec 11 16:11:32 2003
From: lairdm at sfu.ca (Matthew Laird)
Date: Thu Dec 11 16:15:28 2003
Subject: [Bioperl-l] Blast return codes
In-Reply-To:
Message-ID:

Thanks for your assistance so far. I've been trying to find the
difference between the machines that do work and the ones that don't.

The most similar machines, some of which work and some of which don't,
are a group of Red Hat 9 machines. They are running Red Hat 9 and
Perl 5.8.0. I've tried both bioperl 1.2.1 and 1.2.3.

The only other difference I could find was that perl was built from
source on one of the machines that worked. I tried doing that on one of
the non-working machines and had no success.

I've tried running StandAloneBlast.t; what environment variables do I
need to set? I receive:

[root@ssb7121-5 t]# perl StandAloneBlast.t
1..10
ok 1
ok 2
ok 3
ok 4
Blast Database ecoli.nt not found at StandAloneBlast.t line 67.
Blast Database swissprot not found at StandAloneBlast.t line 72. Blast databases(s) not found, skipping remaining tests at StandAloneBlast.t line 76. ok 5 # skip Blast or env variables not installed correctly ok 6 # skip Blast or env variables not installed correctly ok 7 # skip Blast or env variables not installed correctly ok 8 # skip Blast or env variables not installed correctly ok 9 # skip Blast or env variables not installed correctly ok 10 # skip Blast or env variables not installed correctly Obviously I need to set some variable so it can find the blast database. Thanks again. On 11 Dec 2003, Keith James wrote: > >>>>> "Matthew" == Matthew Laird writes: > > Matthew> Well, that's a step in the right direction, I have a > Matthew> little more information now. I added a $! before the $? > Matthew> and received: > > Matthew> ------------- EXCEPTION ------------- MSG: blastall call > Matthew> crashed: -1 No child processes /usr/local/blast/blastall > Matthew> -p blastp -d > Matthew> /usr/local/psort/conf/analysis/sclblast/sclblast -i > Matthew> /tmp/8Dt6zF1U59 -e 1e-09 -o /tmp/ojP9n04LZh > > Matthew> STACK Bio::Tools::Run::StandAloneBlast::_runblast > Matthew> /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:640 > > Matthew> This is where we get into Perl voodoo beyond my league, > Matthew> "No child processes" - does that ring bells for anyone? > Matthew> Thanks again. > > That's interesting. I think we need to know your OS platform and Perl > version to get any further. > > I think that the value left by a system call in $? is the same as if a > wait system call were made. No child processes is a Unix error code > (ECHILD) which can be caused by a wait, being reported by Perl. > > What happens if you run the test for StandAloneBlast? (t/StandAloneBlast.t) > > Keith > > -- Matthew Laird SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept. 
Simon Fraser University From ymc at paxil.stanford.edu Thu Dec 11 16:30:26 2003 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Thu Dec 11 16:36:25 2003 Subject: [Bioperl-l] Re: What happened to my dpAlign module? In-Reply-To: <200312022037.03484.heikki@ebi.ac.uk> Message-ID: Hi Heikki and Aaron, If there are no other changes since the ext-06 release, then you can use my file: http://www.stanford.edu/~yeeman/bioperl-ext.tar.gz as the ext-07 right off. It is ext-06 with my code and test.pl. I tested it to make sure it can install with no problems. To really use it you also need to include http://www.stanford.edu/~yeeman/dpAlign.pm in the core distribution under Bio::Tools. Regards, Yee Man On Tue, 2 Dec 2003, Heikki Lehvaslaiho wrote: > Aaron, > > I have been concentrating in getting the bioperl-core in shape and have not > done anything to bioperl-ext and bioperl-run. I was hoping of getting them > out some time after core. > > Is the ext in shape? There has not been that many changes in there, has there? > What is your feeling? Could we release a developer snap shot together with the > bioperl-core at the end of the week and epect to be able to release it in a > week or two maximum? > > -Heikki > > On Tuesday 02 Dec 2003 12:37 pm, Aaron J.Mackey wrote: > > Hi Yee, > > > > It is still in bioperl-live CVS, and is available in developer's > > releases (the 1.3.x series); it will presumably be a part of the stable > > 1.4.x release series, whenever that happens (Heikki knows more about > > this than I do). > > > > -Aaron > > > > On Dec 1, 2003, at 2:34 PM, Yee Man Chan wrote: > > > Hi Aaron > > > > > > I looked at the latest bioperl release but I couldn't find > > > anything about my dpAlign module. What is its status now? 
You can find > > > my > > > code at > > > > > > http://www.stanford.edu/~yeeman/bioperl-ext.tar.gz > > > http://www.stanford.edu/~yeeman/dpAlign.pm > > > > > > Thanks > > > Yee Man > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > From lstein at cshl.edu Thu Dec 11 17:11:22 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Thu Dec 11 17:17:28 2003 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: References: Message-ID: <200312111711.22144.lstein@cshl.edu> Odd. That module hasn't been changed in ages. I'll look into it after I fix the Bio::Graphics problems. I want to get 1.4 out too. Lincoln On Thursday 11 December 2003 11:32 am, Jason Stajich wrote: > There seems to be something wrong with the offsets in Bio::DB::Fasta > > $db->seq($id, $start => $end); > > Is returning and invalid value. > > We need to fix this pre-1.4 release. > I don't have time to delve into the mysteries right now, but have > submitted a bug report. 
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Fasta;
> use Bio::Index::Fasta;
> use Test;
> BEGIN { plan tests => 1}
> use Bio::SeqIO;
> use Bio::PrimarySeq;
> my $dbfile = 'db.fas';
> my $fas_db = Bio::DB::Fasta->new($dbfile);
>
> my $idx_db = Bio::Index::Fasta->new(-filename => '/tmp/test.idx',
>                                     -write_flag=> 1);
> $idx_db->make_index($dbfile);
>
> my ($id,$start,$end) = ('id1', 78408, 80349);
>
> my $seq1 = $fas_db->seq($id, $start => $end);
> my $seq2 = $idx_db->get_Seq_by_acc($id);
>
> my $out = Bio::SeqIO->new(-format => 'fasta', -file => ">o.fa");
> $out->write_seq(Bio::PrimarySeq->new(-id => 'db::fasta',
>                                      -seq => $seq1));
>
> $out->write_seq(Bio::PrimarySeq->new(-id => 'index::fasta',
>                                      -seq => $seq2->subseq($start,$end)));
>
>
> ok($seq1, $seq2->subseq($start,$end) );

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                            Cold Spring Harbor, NY
========================================================================

From heikki at nildram.co.uk Thu Dec 11 17:12:12 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Thu Dec 11 17:18:38 2003
Subject: [Bioperl-l] Re: What happened to my dpAlign module?
In-Reply-To:
References:
Message-ID: <200312112212.00787.heikki@ebi.ac.uk>

Yee Man,

Could you check
http://bioperl.org/DIST/current_ext_unstable.tar.gz
which now contains your additions and comments by Aaron.

I announced it a few days ago.

-Heikki

On Thursday 11 Dec 2003 9:30 pm, Yee Man Chan wrote:
> Hi Heikki and Aaron,
>
> If there are no other changes since the ext-06 release, then you
> can use my file:
>
> http://www.stanford.edu/~yeeman/bioperl-ext.tar.gz
>
> as the ext-07 right off. It is ext-06 with my code and test.pl. I tested
> it to make sure it can install with no problems. To really use it you
> also need to include
>
> http://www.stanford.edu/~yeeman/dpAlign.pm
>
> in the core distribution under Bio::Tools.
>
> Regards,
> Yee Man

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From ymc at paxil.stanford.edu Thu Dec 11 17:29:05 2003
From: ymc at paxil.stanford.edu (Yee Man Chan)
Date: Thu Dec 11 17:35:03 2003
Subject: [Bioperl-l] Re: What happened to my dpAlign module?
In-Reply-To: <200312112212.00787.heikki@ebi.ac.uk>
Message-ID:

On Thu, 11 Dec 2003, Heikki Lehvaslaiho wrote:

> Yee Man,
>
> Could you check
> http://bioperl.org/DIST/current_ext_unstable.tar.gz
> which now contains your additions and comments by Aaron.
>
> I announced it a few days ago.
>

Hi Heikki

I just tried it and it seems to be working (it passed test.pl).
Where is Aaron's comment?

Thanks
Yee Man

> -Heikki

From andreas.bernauer at gmx.de Thu Dec 11 18:55:26 2003
From: andreas.bernauer at gmx.de (andreas.bernauer@gmx.de)
Date: Thu Dec 11 19:24:08 2003
Subject: [Bioperl-l] Experiences from a newbie
Message-ID: <20031211235423.GF14666@hgt.mcb.uconn.edu>

Hi,

I was not going to write an email, but I've read the Bioperl overview
and it encouraged me to do so. The reason why I was not going to write
an email is that I don't know where to start with the difficulties I
had. I don't want to blame anybody and usually I had the feeling I
didn't get some things to work because I am missing something very
little.

OK, here we go. My background: lots of programming experience, but not
in perl, first time user of Bioperl. Running on Linux.

The installation was easy as far as I remember. Some things to add
like with almost every Linux program installation, but nothing
exciting as far as I remember.

The first point I got overwhelmed was in the tutorial: It was very
huge and covered a lot of stuff I didn't want to do, so I tried to
read only the parts I was interested in. I read the section about the
different objects BioPerl is using.
I found this section rather confusing, as I don't understand when to
use which object and why I had to care about them. I just wanted to
read in a file and get some information out of it.

My task was to extract the translation information out of a GenBank
file that I got from NCBI and that describes a whole genome of a
bacterium. The file has the nucleotide information and, in the
FEATURES section, all predicted proteins. I wanted those proteins to
be written into a FASTA file. So I used the following code:

  use Bio::SeqIO;
  $in  = Bio::SeqIO->new(-file => shift @ARGV,
                         -format => 'GenBank');
  $out = Bio::SeqIO->new(-file => ">outputfilename",
                         -format => 'FASTA');
  while ( my $seqobj = $in->next_seq() ) {
      $out->write_seq($seqobj);
  }

which I got from the tutorial (III.2). I ran this script and got a
FASTA file with the DNA. Hm, well, that's what's in the GenBank file
and that's what I got, but not what I wanted. OK, my fault, let's get
the FEATURES.

So I looked into the tutorial and went to section III.7.1
"Representing sequence annotations" and found this line:

  @allfeatures = $seqobj->all_SeqFeatures(); # descend into sub features

Adding this to my script yielded an error:

  Can't locate object method "all_SeqFeatures" via package
  "Bio::SeqIO::genbank" (perhaps you forgot to load
  "Bio::SeqIO::genbank"?) at ./test.pl line 10.

Hm, the section probably meant something else by $seqobj. Reviewing
the title, I saw that it should be a RichSeq object. OK, how can I
translate my SeqIO object into a RichSeq object? (And by the way, why
do I have to do it? I mean, I've already read the file, why can't I
access the data? This is where I think I am missing part of the
BioPerl philosophy or assuming too much.)

So I looked through the SeqIO manpage, but I didn't find anything that
told me how I could get a RichSeq out of a SeqIO. I looked through
the RichSeq manpage, but it can only create new RichSeqs, not read
them out of a SeqIO.
I went to the SeqIO HOWTO afterwards
(http://bioperl.org/HOWTOs/html/SeqIO.html). (By the way, in chapter
4 there is a small bug in the very first working example. It uses
only a single colon instead of double colons, like Bio:SeqIO instead
of Bio::SeqIO.) The description of the splitgb.pl script tells me
that it uses the species attribute. Looks good, as I probably want
the FEATURES attribute.

OK, so I looked at the Seq manpage. There I found a way to get an
AnnotationCollectionI, from which I can get an AnnotationI. So I
changed the body of the while loop above to the following, using the
code from the AnnotationCollectionI manual:

  while ( my $seq = $in->next_seq() ) {
      my $ann = $seq->annotation();
      foreach my $key ( $ann->get_all_annotation_keys() ) {
          my @values = $ann->get_Annotations($key);
          foreach my $value ( @values ) {
              # value is an Bio::AnnotationI, and defines a "as_text" method
              print "Annotation ",$key," stringified value ",$value->as_text,"\n";
              # you can also use a generic hash_tree method for getting
              # stuff out say into XML format
              #$hash_tree = $value->hash_tree();
          }
      }
  }

The output is the following:

  [andreas@hgt tmp]$ ./test.pl ~/research/genomes/Actinobacteria/NC_000962.gbk
  Annotation origin stringified value Value:
  Annotation comment stringified value Comment: COMPLETENESS: full length.
  Annotation reference stringified value Reference: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence
  Annotation reference stringified value Reference: Direct Submission

Which looks a little bit like garbage to me. There is no text
"Value:" in the whole file. The ordering is surprisingly reversed.
The reference entries only print the title of the reference. And the
worst thing is that it omits the FEATURES section that I am actually
looking for, not to mention that it obviously misses all the other
annotations like LOCUS, DEFINITION, ACCESSION, etc. But they might be
stored in some other object I am not aware of.
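For reference, the task described in this message (writing the
/translation of each predicted protein to a FASTA file) goes through
the sequence features rather than the annotation collection. What
follows is only a minimal sketch built on the two methods named in the
postscript of this message, get_SeqFeatures() and get_tag_values().
The function name write_translations, the file names in the usage line
and the ID scheme (accession plus a running number) are assumptions
made for illustration, and the sketch looks at top-level CDS features
only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::PrimarySeq;

# Write the /translation qualifier of every top-level CDS feature to FASTA.
# Function name, file names and the "<accession>_<n>" ID scheme are
# illustrative assumptions, not part of BioPerl.
sub write_translations {
    my ($gbk_file, $fasta_file) = @_;

    my $in  = Bio::SeqIO->new(-file => $gbk_file,      -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$fasta_file", -format => 'fasta');

    my $count = 0;
    while ( my $seq = $in->next_seq ) {
        # Features belong to the sequence object returned by next_seq(),
        # not to the SeqIO stream that produced it.
        for my $feat ( $seq->get_SeqFeatures ) {
            next unless $feat->primary_tag eq 'CDS';
            next unless $feat->has_tag('translation');

            # get_tag_values returns a list; a CDS normally has one translation.
            my ($protein) = $feat->get_tag_values('translation');
            $count++;
            $out->write_seq( Bio::PrimarySeq->new(
                -id       => $seq->accession_number . "_$count",
                -seq      => $protein,
                -alphabet => 'protein',
            ) );
        }
    }
    return $count;
}

# Usage: perl cds2fasta.pl genome.gbk proteins.fa
if ( @ARGV == 2 ) {
    my $n = write_translations(@ARGV);
    print "wrote $n protein sequences\n";
}
```

The point of the sketch is the division of labour: the SeqIO object is
only a stream, each call to next_seq() yields a sequence object, and
the entries of the FEATURES table are reached from that sequence
object.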
I gave up at this point. I spent almost an hour on a task that
should be easy to accomplish. In fact, in probably half the time, I
wrote a Scheme script that extracted the information for me, as the
GenBank file is pretty structured.

But I did want to give BioPerl another chance. Some time later, I
had run a blast on some databases and wanted to extract the results.
I used something like:

  my $report = new Bio::SearchIO (-format => 'blast',
                                  -file => $file);
  while (my $result = $report->next_result ) {
      print "Next ($blastId) result (" . $result->num_hits . " hits)\n";
      my $hitNo = 0;
      while (my $hit = $result->next_hit) {
          $hitNo++;
          # do something with hit.
      }
  }

I found the SearchIO tutorial which was pretty cool, as it was the
first page that really gave me an overview of all the functions I can
call and what they will give me as a result (not only what object, but
what part of the BLAST result file), as opposed to the manpages, which
were not so informative and clear for me.

As I didn't want to look at the actual sequences and only needed the
hits themselves, I told blast to omit the detailed listing of the
alignments. This turned out to be a mistake, as for some reason that
I cannot think of, I can only get HITS in $result when I include the
alignments (which are accessed via HSP, which I never use). Again
something that I cannot access with my logic.

The bottom line for me is: BioPerl is valuable for me in so far that I
don't have to write a parser for BLAST output by myself. But with all
its different kinds of objects, interfaces and collections I am
totally lost, and either I spend an hour figuring out how to do simple
tasks or I just write a small script that does it for me, as most
files are decently structured.

Although I am not completely satisfied with BioPerl, I like the idea
itself and the ability to program over biological data. I would
appreciate it if somebody could point me to what I am missing or where
I have wrong assumptions.
Maybe a webpage that gives me the big picture and that I've missed. I
always had the feeling that I am just missing a little bit to
understand why this has to work this (for me complicated) way and not
the other.

Thank you for your attention.

Best regards,

Andreas Bernauer.

PS: While I reviewed my email, I realized how to get the FEATURES I
wanted: by using $seqobj->get_SeqFeatures() and
$feat->get_tag_values('translation'). So, finally, it worked. I am
still wondering why it was so difficult for me to get to this. The
task doesn't look so special to me. Maybe I don't understand what
BioPerl means by Annotation, Feature, Sequence, etc. Is there a page
that gives me an overview of this? I couldn't find one but maybe I
haven't tried hard enough.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031211/2f2f85ef/attachment.bin

From markw at illuminae.com Thu Dec 11 12:25:52 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu Dec 11 19:24:36 2003
Subject: [Bioperl-l] here's a cool bug!
In-Reply-To:
References:
Message-ID: <1071163551.3847.68.camel@localhost.localdomain>

This bug is an interesting one! It doesn't happen in all cases, only on
certain genbank records, like the one attached. Try this out:

  perl -MBio::SeqIO -e '$sio=Bio::SeqIO->new(-file => "blah.gb"); $s=$sio->next_seq; print $s->primary_id;'

that should lead to some raised eyebrows :-)

M

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blah.gb
Type: application/x-gameboy-rom
Size: 7461 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031211/01f8e6f8/blah-0001.bin

From lstein at cshl.edu Thu Dec 11 19:18:38 2003
From: lstein at cshl.edu (lstein@cshl.edu)
Date: Thu Dec 11 19:24:41 2003
Subject: [Bioperl-l] Changes to GFF 2.5 "unflattening" code
Message-ID: <200312120018.hBC0Ic7H006088@pronto.lsjs.org>

Hi Mark, Sheldon,

I saw your change to the _parse_gff2_group code in Bio::DB::GFF, which
prioritizes "gene", "locus_tag" and "transcript" as group fields in
the column 9 attributes. I like it, but unfortunately it breaks some
other code that I have, including the GMOD tutorial.

I think you'll like what I've done instead. I've added a
preferred_groups() method to which you pass a list of group names.
Then, this list will be used as the priority list to pluck out groups
from the GFF2 attribute list. To get your previous behavior, you need
to do this:

  $db = Bio::DB::GFF->new(-preferred_groups=>['gene','locus_tag','transcript'],
                          @other_args);
  $db->load_gff(...);

or this

  $db = Bio::DB::GFF->new(@other_args);
  $db->preferred_groups('gene','locus_tag','transcript');
  $db->load_gff(...);

You'll have to change your existing scripts accordingly. Sure, this
should be merged with Chris's unflattener, but then again let's just
get to GFF3 as quickly as we possibly can and leave this nightmare
behind us!

Lincoln

--
Lincoln Stein
lstein@cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From smckay at bcgsc.bc.ca Thu Dec 11 20:21:25 2003
From: smckay at bcgsc.bc.ca (Sheldon McKay)
Date: Thu Dec 11 20:26:26 2003
Subject: [Bioperl-l] Re: Changes to GFF 2.5 "unflattening" code
References: <200312120018.hBC0Ic7H006088@pronto.lsjs.org>
Message-ID: <000f01c3c04e$42d84e40$c201010a@celegans01>

I'm the guilty party.
We better hurry with the fix or the entire GFF2.5 user community (Mark) will start complaining. I just found out today that only flies and mosquitoes use the locus_tag qualifier, I've been neck deep in game for too long. Sheldon ----- Original Message ----- From: To: ; Cc: Sent: Thursday, December 11, 2003 4:18 PM Subject: Changes to GFF 2.5 "unflattening" code > Hi Mark, Sheldon, > > I saw your change to the _parse_gff2_group code in Bio::DB::GFF, which > prioritizes "gene", "locus_tag" and "transcript" as group fields in > the column 9 attributes. I like it, but unfortunately it breaks some > other code that I have, including the GMOD tutorial. > > I think you'll like what I've done instead. I've added a > preferred_groups() method to which you pass a list of group names. > Then, this list will be used as the priority list to pluck out groups > from the GFF2 attribute list. To get your previous behavior, you need > to do this: > > $db = Bio::DB::GFF->new(-preferred_groups=>['gene','locus_tag','transcript'], > @other_args); > $db->load_gff(...); > > or this > > $db = Bio::DB::GFF->new(@other_args); > $db->preferred_groups('gene','locus_tag','transcript'); > $db->load_gff(...); > > You'll have to change your existing scripts accordingly. Sure, this > should be merged with Chris's unflattener, but then again let's just > get to GFF3 as quickly as we possibly can and leave this nightmare > behind us! > > Lincoln > > -- > Lincoln Stein > lstein@cshl.edu > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > From jason at cgt.duhs.duke.edu Thu Dec 11 21:01:43 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Dec 11 21:07:48 2003 Subject: [Bioperl-l] here's a cool bug! 
In-Reply-To: <1071163551.3847.68.camel@localhost.localdomain>
References: <1071163551.3847.68.camel@localhost.localdomain>
Message-ID:

[jason@ascona jason]$ perl -mBio::SeqIO -e 'print $sio=Bio::SeqIO->new(-file => "blah.gb")->next_seq->primary_id,"\n";'

Bio::Seq::RichSeq=HASH(0x8518394)

So there is no VERSION for the sequence...
VERSION GI:230940

Is that valid Genbank...

Need to make this loop here a little tighter then I guess
in genbank.pm

  #Version number
  elsif( /^VERSION\s+(.+)$/ ) {
      my ($acc,$gi) = split(' ',$1);
      if($acc =~ /^\w+\.(\d+)/) {
          $params{'-version'} = $1;
          $params{'-seq_version'} = $1;
      }
      if($gi && (index($gi,"GI:") == 0)) {
          $params{'-primary_id'} = substr($gi,3);
      }
  }

Any takers...? Make that regexp a little more greedy for GI: perhaps?

-jason

On Thu, 11 Dec 2003, Mark Wilkinson wrote:

> This bug is an interesting one! It doesn't happen in all cases, only on
> certain genbank records, like the one attached. Try this out:
>
> perl -MBio::SeqIO -e '$sio=Bio::SeqIO->new(-file => "blah.gb");
> $s=$sio->next_seq; print $s->primary_id;'
>
> that should lead to some raised eyebrows :-)
>
> M
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From markw at illuminae.com Thu Dec 11 22:23:40 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu Dec 11 22:29:54 2003
Subject: [BioPerl] Re: [Bioperl-l] here's a cool bug!
In-Reply-To:
References: <1071163551.3847.68.camel@localhost.localdomain>
Message-ID: <1071199420.3849.135.camel@localhost.localdomain>

Hey Jason,

Actually, let's hold off on making any adjustments. This genbank file
came from a SeqHound call, and I'm a bit suspicious of it. I compared
it to the actual genbank record and they are non-identical, though
SeqHound is supposed to (from what I have been told) be providing bona
fide original genbank records, not synthesized ones... so I don't know
why they are different, but the differences are pretty substantial.
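The behaviour of the VERSION handling quoted above can be reproduced
outside BioPerl. The following is a minimal sketch in plain Perl, not
the actual genbank.pm code path: parse_version_strict copies the
quoted logic into a standalone function, while parse_version_tolerant
is a hypothetical variant that accepts the GI: token in any position.
Both function names and the accession X12345.1 are made up for
illustration.

```perl
use strict;
use warnings;

# Same logic as the genbank.pm excerpt: the accession.version token is
# expected first and the GI: token second.
sub parse_version_strict {
    my ($payload) = @_;
    my %params;
    my ($acc, $gi) = split(' ', $payload);
    if ($acc =~ /^\w+\.(\d+)/) {
        $params{'-version'}     = $1;
        $params{'-seq_version'} = $1;
    }
    if ($gi && index($gi, "GI:") == 0) {
        $params{'-primary_id'} = substr($gi, 3);
    }
    return \%params;
}

# Hypothetical variant: classify each token by its shape, so a VERSION
# line that carries only "GI:nnn" still yields a primary id.
sub parse_version_tolerant {
    my ($payload) = @_;
    my %params;
    for my $token (split ' ', $payload) {
        if ($token =~ /^GI:(\d+)$/) {
            $params{'-primary_id'} = $1;
        }
        elsif ($token =~ /^\w+\.(\d+)$/) {
            $params{'-version'}     = $1;
            $params{'-seq_version'} = $1;
        }
    }
    return \%params;
}

# Compare both parsers on a conventional line and on the line from blah.gb.
for my $payload ('X12345.1  GI:230940', 'GI:230940') {
    my $strict   = parse_version_strict($payload)->{'-primary_id'};
    my $tolerant = parse_version_tolerant($payload)->{'-primary_id'};
    printf "%-20s strict=%s tolerant=%s\n", $payload,
        defined $strict   ? $strict   : 'unset',
        defined $tolerant ? $tolerant : 'unset';
}
```

With the line from blah.gb, the strict version treats GI:230940 as the
accession token, finds no second token and leaves -primary_id unset,
which would be consistent with primary_id printing an object reference
instead of a GI number. Whether such a record should be accepted at
all is the open question in this thread.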
I've contacted the SeqHound developers with this question to try to find out what's up. If this is *not* a genuine genbank record, then we probably shouldn't change the code. M On Thu, 2003-12-11 at 20:01, Jason Stajich wrote: > [jason@ascona jason]$ perl -mBio::SeqIO -e 'print > $sio=Bio::SeqIO->new(-file => "blah.gb")->next_seq->primary_id,"\n";' > > Bio::Seq::RichSeq=HASH(0x8518394) > > So there is no VERSION for the sequence... > VERSION GI:230940 > > Is that valid Genbank... > > Need to make this loop here a little tighter then I guess > in genbank.pm > > #Version number > elsif( /^VERSION\s+(.+)$/ ) { > my ($acc,$gi) = split(' ',$1); > if($acc =~ /^\w+\.(\d+)/) { > $params{'-version'} = $1; > $params{'-seq_version'} = $1; > } > if($gi && (index($gi,"GI:") == 0)) { > $params{'-primary_id'} = substr($gi,3); > } > } > > Any takers...? Make that regexp a little more greedy for GI: perhaps? > > -jason > On Thu, 11 Dec 2003, Mark Wilkinson wrote: > > > This bug is an interesting one! It doesn't happen in all cases, only on > > certain genbank records, like the one attached. Try this out: > > > > perl -MBio::SeqIO -e '$sio=Bio::SeqIO->new(-file => "blah.gb"); > > $s=$sio->next_seq; print $s->primary_id;' > > > > that should lead to some raised eyebrows :-) > > > > M > > > > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Mark Wilkinson Illuminae From lhaifeng at dso.org.sg Fri Dec 12 01:49:04 2003 From: lhaifeng at dso.org.sg (Liu Haifeng) Date: Fri Dec 12 01:58:17 2003 Subject: [Bioperl-l] bl2seq hang and its performace Message-ID: <002301c3c07c$0886de40$706712ac@GENETHON> Hi all, I noticed that one of my program written using bioperl-1.2.3 runs very slow and consumes huge memory, and I doubted that it is due to the call of bl2seq in the program. 
Thus, I wrote a small program (bl2seq of sequences against themselves
from a FASTA file) below to see if that is true:

#!/usr/bin/perl -w
use Bio::SeqIO;
use Bio::Tools::Blast;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Tools::BPlite;

my $infile = shift;
my $sno = 0;
my $blastalgo = "blastp"; #blastp ,blastx, tblastn, tblastx
my $pin = Bio::SeqIO->new('-file' => "$infile", '-format' => 'Fasta');
while ( my $proseq = $pin->next_seq()) {
    $sno++;
    print "bl2seq $sno ..............................\n";
    my @params = ('program' => $blastalgo);
    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    $factory->io->_io_cleanup();
    my $report = $factory->bl2seq($proseq, $proseq);
    while (my $hsp = $report->next_feature) {
        #only need the first hsp
        $report->close();
    }
    undef $report;
}
print "running is over\n";
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The program runs OK for a small FASTA file. However, when I input a
FASTA file of around 2.6M containing 10,000 protein sequences, the
program hangs when it compares the 1782nd sequence. I also noticed
that the program had consumed 12M of memory at that time.

I searched the archive and found that similar bl2seq problems have
occurred before. However, they should have been solved in the latest
version. Can anyone show me some clues to improve the performance of
calling bl2seq?

Thank you.

Regards
Haifeng Liu

From heikki at nildram.co.uk Fri Dec 12 03:13:26 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Fri Dec 12 03:19:29 2003
Subject: [Bioperl-l] Re: What happened to my dpAlign module?
In-Reply-To:
References:
Message-ID: <200312120813.13623.heikki@ebi.ac.uk>

I meant the README. (Not much in there)

-Heikki

On Thursday 11 Dec 2003 10:29 pm, Yee Man Chan wrote:
> On Thu, 11 Dec 2003, Heikki Lehvaslaiho wrote:
> > Yee Man,
> >
> > Could you check
> > http://bioperl.org/DIST/current_ext_unstable.tar.gz
> > which now contains your additions and comments by Aaron.
> >
> > I announced it a few days ago.
> >
> Hi Heikki
>
> I just tried it and it seems to be working (it passed test.pl).
> Where is Aaron's comment?
>
> Thanks
> Yee Man
>
> > -Heikki

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs.
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From brian_osborne at cognia.com Fri Dec 12 07:55:24 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Dec 12 08:03:55 2003 Subject: [Bioperl-l] Experiences from a newbie In-Reply-To: <20031211235423.GF14666@hgt.mcb.uconn.edu> Message-ID: Andreas, I found your comments quite reasonable. Bioperl has a lot to offer, but the documentation is not exactly coherent. Personally I think that a good HOWTO goes far to solve this problem as most people probably start to use Bioperl because they have a specific task in mind. I've started writing a HOWTO on Features and Annotations, I would appreciate your taking a look at it and telling me what you think, here's the URL: http://bioperl.org/HOWTOs/html/Feature-Annotation.html Ignore the blank sections, please! Any comments would be appreciated, since you're precisely the sort of user I'm trying to write for, someone with substantial experience in one of those fields that comprises bioinformatics, but not all. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of andreas.bernauer@gmx.de Sent: Thursday, December 11, 2003 6:55 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Experiences from a newbie Hi, I was not going to write an email, but I've read the Bioperl overview and it encouraged me to do so. The reason why I was not going to write an email is that I don't know where to start with the difficulties I had. I don't want to blame anybody and usually I had the feeling I didn't get some things to work because I am missing something very little. OK, here we go. My background: lots of programming experience, but not in perl, first time user of Bioperl. Running on Linux. The installation was easy as far as I remember. 
Some things had to be added, as with almost every Linux program installation, but nothing exciting as far as I remember.

The first point where I got overwhelmed was the tutorial: it was huge and covered a lot of stuff I didn't want to do, so I tried to read only the parts I was interested in. I read the section about the different objects BioPerl uses. I found this section rather confusing, as I didn't understand when to use which object and why I had to care about them. I just wanted to read in a file and get some information out of it.

My task was to extract the translation information from a GenBank file that I got from NCBI and that describes the whole genome of a bacterium. The file has the nucleotide information and, in the FEATURES section, all predicted proteins. I wanted those proteins to be written into a FASTA file. So I used the following code:

use Bio::SeqIO;
$in  = Bio::SeqIO->new(-file => shift @ARGV, -format => 'GenBank');
$out = Bio::SeqIO->new(-file => ">outputfilename", -format => 'FASTA');
while ( my $seqobj = $in->next_seq() ) {
    $out->write_seq($seqobj);
}

which I got from the tutorial (III.2). I ran this script and got a FASTA file with the DNA. Hm, well, that's what's in the GenBank file and that's what I got, but not what I wanted. OK, my fault, let's get the FEATURES. So I looked into the tutorial, went to section III.7.1 "Representing sequence annotations" and found this line:

@allfeatures = $seqobj->all_SeqFeatures(); # descend into sub features

Adding this to my script yielded an error:

Can't locate object method "all_SeqFeatures" via package "Bio::SeqIO::genbank" (perhaps you forgot to load "Bio::SeqIO::genbank"?) at ./test.pl line 10.

Hm, the section probably meant something else by $seqobj. Reviewing the title, I saw that it should be a RichSeq object. OK, how can I translate my SeqIO object into a RichSeq object? (And by the way, why do I have to do it? I mean, I've already read the file, why can't I access the data?
This is where I think I am missing part of the BioPerl philosophy or assuming too much.)

So I looked through the SeqIO manpage, but I didn't find anything that told me how I could get a RichSeq out of a SeqIO. I looked through the RichSeq manpage, but it can only create new RichSeqs, not read them out of a SeqIO.

Afterwards I went to the SeqIO HOWTO (http://bioperl.org/HOWTOs/html/SeqIO.html). (By the way, in chapter 4 there is a small bug in the very first working example. It uses a single colon instead of a double colon: Bio:SeqIO instead of Bio::SeqIO.) The description of the splitgb.pl script tells me that it uses the species attribute. Looks good, as I probably want the FEATURES attribute. OK, so I looked at the Seq manpage. There I found a way to get an AnnotationCollectionI, from which I can get an AnnotationI. So I changed the while loop above to the following, using the code from the AnnotationCollectionI manual:

while ( my $seq = $in->next_seq() ) {
    my $ann = $seq->annotation();
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            # value is an Bio::AnnotationI, and defines a "as_text" method
            print "Annotation ",$key," stringified value ",$value->as_text,"\n";
            # you can also use a generic hash_tree method for getting
            # stuff out say into XML format
            #$hash_tree = $value->hash_tree();
        }
    }
}

The output is the following:

[andreas@hgt tmp]$ ./test.pl ~/research/genomes/Actinobacteria/NC_000962.gbk
Annotation origin stringified value Value:
Annotation comment stringified value Comment: COMPLETENESS: full length.
Annotation reference stringified value Reference: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence
Annotation reference stringified value Reference: Direct Submission

Which looks a little bit like garbage to me. There is no text "Value:" in the whole file. The ordering is surprisingly reversed.
The References only print the title of the reference. And the worst thing is that it omits the FEATURES section that I am actually looking for, not to mention that it obviously misses all the other annotations like LOCUS, DEFINITION, ACCESSION, etc. But they might be stored in some other object I am not aware of.

I gave up at this point. I had spent almost an hour on a task that should be easy to accomplish. In fact, in probably half the time, I wrote a Scheme script that extracted the information for me, as the GenBank file is pretty structured.

But I did want to give BioPerl another chance. Some time later, I had run a blast on some databases and wanted to extract the results. I used something like:

my $report = new Bio::SearchIO (-format => 'blast', -file => $file);
while (my $result = $report->next_result ) {
    print "Next ($blastId) result (" . $result->num_hits . " hits)\n";
    my $hitNo = 0;
    while (my $hit = $result->next_hit) {
        $hitNo++;
        # do something with hit.
    }
}

I found the SearchIO tutorial, which was pretty cool, as it was the first page that really gave me an overview of all the functions I can call and what they will give me as a result (not only what object, but what part of the BLAST result file), as opposed to the manpages, which were not so informative and clear for me.

As I didn't want to look at the actual sequences and only needed the hits themselves, I told blast to omit the detailed listing of the alignments. This turned out to be a mistake, as for some reason that I cannot think of, I can only get HITS in $result when I include the alignments (which are accessed via HSP, which I never use). Again something that I cannot access with my logic.

The bottom line for me is: BioPerl is valuable for me in so far as I don't have to write a parser for BLAST output myself.
But with all its different kinds of objects, interfaces and collections I am totally lost, and either I spend an hour figuring out how to do simple tasks or I just write a small script that does it for me, as most files are decently structured.

Although I am not completely satisfied with BioPerl, I like the idea itself and the ability to program over biological data. I would appreciate it if somebody could point me to what I am missing or where I have wrong assumptions. Maybe there is a webpage that gives me the big picture and that I've missed. I always had the feeling that I am just missing a little bit to understand why this has to work this (for me complicated) way and not the other.

Thank you for your attention.

Best regards,

Andreas Bernauer.

PS: While I reviewed my email, I realized how to get the FEATURES I wanted: by using $seqobj->get_SeqFeatures() and $feat->get_tag_values('translation'). So, finally, it worked. I am still wondering why it was so difficult for me to get to this. The task doesn't look so special to me. Maybe I don't understand what BioPerl means by Annotation, Feature, Sequence, etc. Is there a page that gives me an overview of this? I couldn't find one but maybe I haven't tried hard enough.

From brian_osborne at cognia.com Fri Dec 12 08:11:05 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Dec 12 08:19:35 2003
Subject: [Bioperl-l] Experiences from a newbie
In-Reply-To:
Message-ID:

Andreas,

>As I didn't want to look at the actual sequences and only needed the
>hits themselves, I told blast to omit the detailed listing of the
>alignments. This turned out to be a mistake, as for some reason that
>I can never think of, I can only get HITS in $result, when I include
>the alignments (which are accessed via HSP which I never use). Again
>something that I cannot access with my logic.

I couldn't understand this. Could you restate this, or tell us exactly what information you wanted to get from the Hit object?
Offhand I'd say you are seeing the expected behavior: the Result object gets you access to the Hit objects, and Hit objects get you access to the HSP objects.

Brian O.

From brian_osborne at cognia.com Fri Dec 12 09:36:13 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Dec 12 09:44:44 2003
Subject: [Bioperl-l] Bio::Biblio
In-Reply-To: <3FD7C883.1050708@email.arizona.edu>
Message-ID:

Susan,

Bioperl 1.2 supports Bio::Biblio->find, yes. You should show us the code that's giving you the error message. Or try some of the examples in examples/biblio/biblio_examples.pl.

By Pubmed citations "related" to a Genbank accession do you mean the references found in the Genbank entry? If so then this is easily done using Bioperl.
First retrieve the sequence object for that accession number (see the section in the bptutorial that discusses Bio::DB::GenBank), then get the references using the Annotation methods. Perhaps this is not what you meant... Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Susan J. Miller Sent: Wednesday, December 10, 2003 8:30 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Bio::Biblio We are running Perl 5.6, Bioperl 1.2 on a Solaris8 machine. A couple of questions about Bio::Biblio: 1. When I try to use the Bio::Biblio->find method I get an error message saying that it is not implemented, yet looking in the bioperl list archive I see replies that mention this method. Should I be able to use it? Do I need a more recent version of BioPerl? 2. I would like to be able to take a Genbank accession number and find any Pubmed citations that are related to the accession number. NCBI Entrez search will turn up Pubmed hits given an Accession number - is there a way to do this using BioPerl? -- Thanks, -susan Susan J. Miller Biotechnology Computing Facility Arizona Research Laboratories Bio West 228 University of Arizona Tucson, AZ 85721 (520) 626-2597 _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Fri Dec 12 10:02:20 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Dec 12 10:08:18 2003 Subject: [Bioperl-l] Issues with Bio::Graphics resolved Message-ID: <200312121002.20069.lstein@cshl.edu> Hi All, I think I've gotten the Bio::Graphics issues fixed. I also found and fixed a problem with Bio::DB::GFF's handling of the ninth column of GFF files. So from my point of view, all my stuff is ready for 1.4. Lincoln -- ======================================================================== Lincoln D. 
Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================

From vince.forgetta at staff.mcgill.ca Fri Dec 12 10:13:01 2003
From: vince.forgetta at staff.mcgill.ca (Vince Forgetta)
Date: Fri Dec 12 10:22:41 2003
Subject: [Bioperl-l] using Bio::DB::GenBank get translation
Message-ID: <3FD9DAFD.3060301@staff.mcgill.ca>

Hi all,

I have seen in some previous posts that you can retrieve the CDS start and stop from a GenBank DNA sequence accession if the file is stored locally and read in using Bio::SeqIO. How would I retrieve the translation start and stop of a GenBank accession downloaded using Bio::DB::GenBank? For example, the code below does not seem to find a tag "translation":

use Bio::DB::GenBank;
my $gb = new Bio::DB::GenBank;
my $seq;
my $accession;
$seq = $gb->get_Seq_by_acc($accession);
while (not(defined($seq)));
foreach my $feat ($seq->all_SeqFeatures()){
    my $CDS = "";
    if ($feat->has_tag('translation')){
        $CDS = $feat->start."..".$feat->end;
        return "$CDS\n";
    }else{
        return "Not found\n";
    }

Thank you for your time.

From jason at cgt.duhs.duke.edu Fri Dec 12 10:41:56 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Dec 12 10:48:14 2003
Subject: [Bioperl-l] using Bio::DB::GenBank get translation
In-Reply-To: <3FD9DAFD.3060301@staff.mcgill.ca>
References: <3FD9DAFD.3060301@staff.mcgill.ca>
Message-ID:

Code for local and remotely obtained sequences should be identical - that is the whole point of engineering these Sequence/Feature/Annotation objects.

Does the accession actually have a translation tag when you pull it up in Entrez?

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From vince.forgetta at staff.mcgill.ca Fri Dec 12 10:43:51 2003
From: vince.forgetta at staff.mcgill.ca (Vince Forgetta)
Date: Fri Dec 12 10:53:41 2003
Subject: [Bioperl-l] using Bio::DB::GenBank get translation
In-Reply-To:
References: <3FD9DAFD.3060301@staff.mcgill.ca>
Message-ID: <3FD9E237.1090707@staff.mcgill.ca>

Jason Stajich wrote:

>Code for local and remotely obtained sequences should be identical - that
>is the whole point of the engineering these Sequence/Feature/Annotation
>objects.
>
>Does the accession actually have a translation tag when you pull
>it up in Entrez?

Yes. I tried NM_021044.

Thanks.

Vince

--
+-----------------------------------------------------------+
| Vincenzo Forgetta                                         |
| Computational Biology                                     |
| McGill University and Genome Quebec Innovation Centre     |
| 740 Dr. Penfield Avenue                                   |
| Room 7211                                                 |
| Montreal, Quebec Canada, H3A 1A4                          |
| Tel: 514-398-3311 00476                                   |
| Email: vince.forgetta@staff.mcgill.ca                     |
+-----------------------------------------------------------+

From brian_osborne at cognia.com Fri Dec 12 10:46:08 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Dec 12 10:54:49 2003
Subject: [Bioperl-l] using Bio::DB::GenBank get translation
In-Reply-To: <3FD9DAFD.3060301@staff.mcgill.ca>
Message-ID:

Vince,

Your code looked correct to me, I was puzzled, so I ran it and it worked. Here's the output:

No translation
1..864
No translation
No translation
No translation
No translation
No translation

Here's the code:

use Bio::DB::GenBank;
my $gb = new Bio::DB::GenBank;
my $seq;
my $accession = shift or die "No accession\n";

$seq = $gb->get_Seq_by_acc($accession);

foreach my $feat ($seq->all_SeqFeatures){
    my $CDS = "";
    if ($feat->has_tag('translation')){
        $CDS = $feat->start."..".$feat->end;
        print "$CDS\n";
    }else{
        print "No translation\n";
    }
}

Try it with accession "AB072353", for example.

Brian O.
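
One visible difference from the originally posted snippet is that this loop prints once per feature, whereas the original called return inside the loop. A minimal sketch of that pattern follows. So that it runs without BioPerl or network access, it uses a hypothetical stand-in class, Local::MockFeature, which mocks only the start, end and has_tag methods called above; the helper name translated_ranges and the feature values are likewise made up for illustration. With a real sequence object, the list passed in would presumably be $seq->all_SeqFeatures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a BioPerl feature object, so that this sketch
# runs without BioPerl or network access. It mocks only the three methods
# used in the loop: start, end and has_tag.
package Local::MockFeature;

sub new     { my ($class, %args) = @_; return bless {%args}, $class; }
sub start   { return $_[0]{start}; }
sub end     { return $_[0]{end}; }
sub has_tag { return exists $_[0]{tags}{ $_[1] }; }

package main;

# Collect "start..end" for every feature that carries a translation tag.
# Nothing is returned from inside the loop, so all features are examined.
sub translated_ranges {
    my @features = @_;
    my @ranges;
    foreach my $feat (@features) {
        next unless $feat->has_tag('translation');
        push @ranges, $feat->start . '..' . $feat->end;
    }
    return @ranges;
}

# Made-up feature table: only the second entry has a translation tag.
my @features = (
    Local::MockFeature->new(start => 1, end => 900, tags => {}),
    Local::MockFeature->new(start => 1, end => 864, tags => { translation => 'MKV' }),
    Local::MockFeature->new(start => 1, end => 864, tags => { gene => 'demo' }),
);

my @ranges = translated_ranges(@features);
my $report = @ranges ? join("\n", @ranges) . "\n" : "Not found\n";
print $report;
```

Because the first mock feature has no translation tag, a return in the else branch would have reported "Not found" before the second feature was ever looked at; collecting into a list and deciding after the loop avoids that.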
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Vince Forgetta Sent: Friday, December 12, 2003 10:13 AM To: bioperl-l@bioperl.org Subject: [Bioperl-l] using Bio::DB::GenBank get translation Hi all, i have seen on some previous posts that you can retrieve the CDS start and stop from a GenBank DNA sequence accessionif the file is stored locally and read in using Bio::SeqIO. How would I retrieve the translation start and stop of a GenBank accession downloaded using Bio::DB::GenBank. For example, the code below does not seem to find a tag "translation": use Bio::DB::GenBank; my $gb = new Bio::DB::GenBank; my $seq; my $accession; $seq = $gb->get_Seq_by_acc($accession); while (not(defined($seq))); foreach my $feat ($seq->all_SeqFeatures()){ my $CDS = ""; if ($feat->has_tag('translation')){ $CDS = $feat->start."..".$feat->end; return "$CDS\n"; }else{ return "Not found\n"; } Thank you for your time. _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From vince.forgetta at staff.mcgill.ca Fri Dec 12 10:49:51 2003 From: vince.forgetta at staff.mcgill.ca (Vince Forgetta) Date: Fri Dec 12 10:59:29 2003 Subject: [Bioperl-l] using Bio::DB::GenBank get translation In-Reply-To: References: Message-ID: <3FD9E39F.7030509@staff.mcgill.ca> It doesn't seem to work if I try NM_021044 or AF148226. the first is a RefSeq, the second has a wierd trnaslation field. Thanks Brian Osborne wrote: >Vince, > >Your code looked correct to me, I was puzzled, so I ran it and it worked. 
>Here's the output: > >No translation >1..864 >No translation >No translation >No translation >No translation >No translation > >Here's the code: > >use Bio::DB::GenBank; >my $gb = new Bio::DB::GenBank; >my $seq; >my $accession = shift or die "No accession\n"; > >$seq = $gb->get_Seq_by_acc($accession); > >foreach my $feat ($seq->all_SeqFeatures){ > my $CDS = ""; > if ($feat->has_tag('translation')){ > $CDS = $feat->start."..".$feat->end; > print "$CDS\n"; > }else{ > print "No translation\n"; > } >} > >Try it with accession "AB072353", for example. > >Brian O. > > >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Vince Forgetta >Sent: Friday, December 12, 2003 10:13 AM >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] using Bio::DB::GenBank get translation > >Hi all, > >i have seen on some previous posts that you can retrieve the CDS start >and stop from a GenBank DNA sequence accessionif the file is stored >locally and read in using Bio::SeqIO. How would I retrieve the >translation start and stop of a GenBank accession downloaded using >Bio::DB::GenBank. For example, the code below does not seem to find a >tag "translation": > >use Bio::DB::GenBank; >my $gb = new Bio::DB::GenBank; >my $seq; > my $accession; > $seq = $gb->get_Seq_by_acc($accession); > while (not(defined($seq))); >foreach my $feat ($seq->all_SeqFeatures()){ > my $CDS = ""; > if ($feat->has_tag('translation')){ > $CDS = $feat->start."..".$feat->end; > return "$CDS\n"; > }else{ > return "Not found\n"; > } > > >Thank you for your time. > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- +-----------------------------------------------------------+ | Vincenzo Forgetta | | Computational Biology | | McGill University and Genome Quebec Innovation Centre | | 740 Dr. 
Penfield Avenue | | Room 7211 | | Montreal, Quebec Canada, H3A 1A4 | | Tel: 514-398-3311 00476 | | Email: vince.forgetta@staff.mcgill.ca | +-----------------------------------------------------------+

From vince.forgetta at staff.mcgill.ca Fri Dec 12 10:54:00 2003
From: vince.forgetta at staff.mcgill.ca (Vince Forgetta)
Date: Fri Dec 12 11:03:48 2003
Subject: [Bioperl-l] using Bio::DB::GenBank get translation
In-Reply-To:
References:
Message-ID: <3FD9E498.6040101@staff.mcgill.ca>

I figured out the problem. Call it a semantic lapse in Perl coding, but my code below returns a result and exits the first time a feature fails to have a translation tag!! Doh!!

Sorry for the bother, and thanks for the help.

Vince

Brian Osborne wrote:

>Vince,
>
>Your code looked correct to me, I was puzzled, so I ran it and it worked.
>Here's the output:
>
>No translation
>1..864
>No translation
>No translation
>No translation
>No translation
>No translation
>
>Here's the code:
>
>use Bio::DB::GenBank;
>my $gb = new Bio::DB::GenBank;
>my $seq;
>my $accession = shift or die "No accession\n";
>
>$seq = $gb->get_Seq_by_acc($accession);
>
>foreach my $feat ($seq->all_SeqFeatures){
>    my $CDS = "";
>    if ($feat->has_tag('translation')){
>        $CDS = $feat->start."..".$feat->end;
>        print "$CDS\n";
>    }else{
>        print "No translation\n";
>    }
>}
>
>Try it with accession "AB072353", for example.
>
>Brian O.
>
>
>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org
>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Vince Forgetta
>Sent: Friday, December 12, 2003 10:13 AM
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] using Bio::DB::GenBank get translation
>
>Hi all,
>
>i have seen on some previous posts that you can retrieve the CDS start
>and stop from a GenBank DNA sequence accessionif the file is stored
>locally and read in using Bio::SeqIO. How would I retrieve the
>translation start and stop of a GenBank accession downloaded using
>Bio::DB::GenBank.
For example, the code below does not seem to find a >tag "translation": > >use Bio::DB::GenBank; >my $gb = new Bio::DB::GenBank; >my $seq; > my $accession; > $seq = $gb->get_Seq_by_acc($accession); > while (not(defined($seq))); >foreach my $feat ($seq->all_SeqFeatures()){ > my $CDS = ""; > if ($feat->has_tag('translation')){ > $CDS = $feat->start."..".$feat->end; > return "$CDS\n"; > }else{ > return "Not found\n"; > } > > >Thank you for your time. > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- +-----------------------------------------------------------+ | Vincenzo Forgetta | | Computational Biology | | McGill University and Genome Quebec Innovation Centre | | 740 Dr. Penfield Avenue | | Room 7211 | | Montreal, Quebec Canada, H3A 1A4 | | Tel: 514-398-3311 00476 | | Email: vince.forgetta@staff.mcgill.ca | +-----------------------------------------------------------+ From Deborah.Simon at ingenium-ag.com Fri Dec 12 10:57:59 2003 From: Deborah.Simon at ingenium-ag.com (Simon, Deborah) Date: Fri Dec 12 11:04:29 2003 Subject: [Bioperl-l] using Bio::DB::GenBank get translation Message-ID: <6F829EB012AE3F479BE397616627A27E12EA31@newyork.ing-ag.it.local> I do something like... use Bio::DB::GenBank; my $gb = new Bio::DB::GenBank; my $seqobj = $gb->get_Seq_by_gi($gi); foreach my $featobj ( $seqobj->top_SeqFeatures() ) { if ($featobj->primary_tag eq "CDS") { print $featobj->start; print $featobj->end; } } ... always works fine for me. If this is not the "right" way please let me know. -deb > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Friday, 12. 
December 2003 16:42 > To: Vince Forgetta > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] using Bio::DB::GenBank get translation > > > Code for local and remotely obtained sequences should be > identical - that > is the whole point of the engineering these > Sequence/Feature/Annotation > objects. > > > Does the accession actually have a translation tag when you pull > it up in Entrez? > > -jason > On Fri, 12 Dec 2003, Vince Forgetta wrote: > > > Hi all, > > > > i have seen on some previous posts that you can retrieve > the CDS start > > and stop from a GenBank DNA sequence accessionif the file is stored > > locally and read in using Bio::SeqIO. How would I retrieve the > > translation start and stop of a GenBank accession downloaded using > > Bio::DB::GenBank. For example, the code below does not seem > to find a > > tag "translation": > > > > use Bio::DB::GenBank; > > my $gb = new Bio::DB::GenBank; > > my $seq; > > my $accession; > > $seq = $gb->get_Seq_by_acc($accession); > > while (not(defined($seq))); > > foreach my $feat ($seq->all_SeqFeatures()){ > > my $CDS = ""; > > if ($feat->has_tag('translation')){ > > $CDS = $feat->start."..".$feat->end; > > return "$CDS\n"; > > }else{ > > return "Not found\n"; > > } > > > > > > Thank you for your time. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From brian_osborne at cognia.com Fri Dec 12 10:56:01 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Dec 12 11:04:37 2003 Subject: [Bioperl-l] using Bio::DB::GenBank get translation In-Reply-To: <3FD9E39F.7030509@staff.mcgill.ca> Message-ID: Vince, But the GenBank id, AB072353, works? Brian O. 
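The two points raised in this thread (do not return from inside the feature loop, and select features by primary tag as in Deborah's version) can be combined. What follows is only a minimal sketch under stated assumptions, not code from any poster: the helper name cds_ranges is made up for illustration, and the network lookup runs only when an accession is passed on the command line. The BioPerl calls used (all_SeqFeatures, primary_tag, has_tag, start, end, get_Seq_by_acc) are the ones already shown in the messages above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect "start..end" for every CDS feature that
# carries a translation tag. Nothing is returned until the loop has seen
# all features, so a leading 'source' or 'gene' feature cannot end the
# search early.
sub cds_ranges {
    my ($seq) = @_;
    my @ranges;
    for my $feat ( $seq->all_SeqFeatures ) {
        next unless $feat->primary_tag eq 'CDS';
        next unless $feat->has_tag('translation');
        push @ranges, $feat->start . '..' . $feat->end;
    }
    return @ranges;
}

# Only contact GenBank when an accession is given on the command line.
if (@ARGV) {
    require Bio::DB::GenBank;
    my $gb  = Bio::DB::GenBank->new;
    my $seq = $gb->get_Seq_by_acc( $ARGV[0] )
        or die "Could not retrieve $ARGV[0]\n";

    my @ranges = cds_ranges($seq);
    if (@ranges) {
        print "$_\n" for @ranges;
    }
    else {
        print "No CDS with a translation found\n";
    }
}
```

Run as, for example, "perl cds_ranges.pl AB072353" (the file name is arbitrary). Returning a list rather than printing inside the loop keeps the "not found" decision in one place, after every feature has been examined.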
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Vince Forgetta Sent: Friday, December 12, 2003 10:50 AM To: Brian Osborne Cc: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] using Bio::DB::GenBank get translation It doesn't seem to work if I try NM_021044 or AF148226. the first is a RefSeq, the second has a wierd trnaslation field. Thanks Brian Osborne wrote: >Vince, > >Your code looked correct to me, I was puzzled, so I ran it and it worked. >Here's the output: > >No translation >1..864 >No translation >No translation >No translation >No translation >No translation > >Here's the code: > >use Bio::DB::GenBank; >my $gb = new Bio::DB::GenBank; >my $seq; >my $accession = shift or die "No accession\n"; > >$seq = $gb->get_Seq_by_acc($accession); > >foreach my $feat ($seq->all_SeqFeatures){ > my $CDS = ""; > if ($feat->has_tag('translation')){ > $CDS = $feat->start."..".$feat->end; > print "$CDS\n"; > }else{ > print "No translation\n"; > } >} > >Try it with accession "AB072353", for example. > >Brian O. > > >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Vince Forgetta >Sent: Friday, December 12, 2003 10:13 AM >To: bioperl-l@bioperl.org >Subject: [Bioperl-l] using Bio::DB::GenBank get translation > >Hi all, > >i have seen on some previous posts that you can retrieve the CDS start >and stop from a GenBank DNA sequence accessionif the file is stored >locally and read in using Bio::SeqIO. How would I retrieve the >translation start and stop of a GenBank accession downloaded using >Bio::DB::GenBank. 
For example, the code below does not seem to find a >tag "translation": > >use Bio::DB::GenBank; >my $gb = new Bio::DB::GenBank; >my $seq; > my $accession; > $seq = $gb->get_Seq_by_acc($accession); > while (not(defined($seq))); >foreach my $feat ($seq->all_SeqFeatures()){ > my $CDS = ""; > if ($feat->has_tag('translation')){ > $CDS = $feat->start."..".$feat->end; > return "$CDS\n"; > }else{ > return "Not found\n"; > } > > >Thank you for your time. > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- +-----------------------------------------------------------+ | Vincenzo Forgetta | | Computational Biology | | McGill University and Genome Quebec Innovation Centre | | 740 Dr. Penfield Avenue | | Room 7211 | | Montreal, Quebec Canada, H3A 1A4 | | Tel: 514-398-3311 00476 | | Email: vince.forgetta@staff.mcgill.ca | +-----------------------------------------------------------+ _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From cain at cshl.org Fri Dec 12 11:26:31 2003 From: cain at cshl.org (Scott Cain) Date: Fri Dec 12 11:32:30 2003 Subject: [Bioperl-l] Re: Changes to GFF 2.5 "unflattening" code In-Reply-To: <200312121556.hBCFt3FD004940@portal.open-bio.org> References: <200312121556.hBCFt3FD004940@portal.open-bio.org> Message-ID: <1071246391.1476.33.camel@localhost.localdomain> Lincoln and Sheldon, For your information, I wrote a new genbank2gff3.pl script for use with the pending GMOD release. I anticipate that it will form the foundation for rewriting the biofetch adaptor. It uses Unflattener.pm and seems to work for the organisms I tested (human, worm, fly, mosquito, and Ecoli). It is in the GMOD cvs in the schema repository at schema/chado/load/bin/genbank2gff.PLS. 
Scott On Fri, 2003-12-12 at 10:56, bioperl-l-request@portal.open-bio.org wrote: > Hi Mark, Sheldon, > > I saw your change to the _parse_gff2_group code in Bio::DB::GFF, which > prioritizes "gene", "locus_tag" and "transcript" as group fields in > the column 9 attributes. I like it, but unfortunately it breaks some > other code that I have, including the GMOD tutorial. > > I think you'll like what I've done instead. I've added a > preferred_groups() method to which you pass a list of group names. > Then, this list will be used as the priority list to pluck out groups > from the GFF2 attribute list. To get your previous behavior, you need > to do this: > > $db = Bio::DB::GFF->new(-preferred_groups=>['gene','locus_tag','transcript'], > @other_args); > $db->load_gff(...); > > or this > > $db = Bio::DB::GFF->new(@other_args); > $db->preferred_groups('gene','locus_tag','transcript'); > $db->load_gff(...); > > You'll have to change your existing scripts accordingly. Sure, this > should be merged with Chris's unflattener, but then again let's just > get to GFF3 as quickly as we possibly can and leave this nightmare > behind us! > > Lincoln -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From brian_osborne at cognia.com Fri Dec 12 12:06:36 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Dec 12 12:15:18 2003 Subject: [Bioperl-l] Please help: upgraded to 1.2.3 and interfaces changed In-Reply-To: Message-ID: David, "examples/search-blast" is an older directory name, and I've fixed the documentation. Try "examples/searchio" instead. Brian O. 
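For the question quoted below: the error messages show that blastall() is handing back a Bio::SearchIO::blast object, so results and hits are reached by iterating with next_result and next_hit rather than by calling hits() on the report. The following is a minimal sketch under stated assumptions, not a drop-in replacement for table_tiled: the helper name hit_rows and the choice of columns are made up for illustration, and BioPerl is loaded only when a BLAST report file is named on the command line. The methods called (next_result, query_name, next_hit, name, significance, next_hsp, length) are the standard Bio::SearchIO, Result, Hit and HSP accessors.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk a SearchIO-style report and return one row
# per hit: [ query name, hit name, significance, length of first HSP ].
# $report is anything offering next_result(), such as the object returned
# by Bio::SearchIO->new or by StandAloneBlast's blastall().
sub hit_rows {
    my ($report) = @_;
    my @rows;
    while ( my $result = $report->next_result ) {
        while ( my $hit = $result->next_hit ) {
            my $hsp = $hit->next_hsp;    # first HSP; may be undef
            push @rows, [
                $result->query_name,
                $hit->name,
                $hit->significance,
                defined $hsp ? $hsp->length('total') : undef,
            ];
        }
    }
    return @rows;
}

# Only load BioPerl when a BLAST report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $report = Bio::SearchIO->new( -format => 'blast', -file => $ARGV[0] );
    for my $row ( hit_rows($report) ) {
        print join( "\t", map { defined $_ ? $_ : '-' } @$row ), "\n";
    }
}
```

In the code quoted below, the same helper could be called as hit_rows($blast_report), since $blast_report is already a SearchIO object. The HSP column is allowed to be undefined so that hits without parsed alignments still produce a row.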
-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of David Trusty
Sent: Wednesday, December 10, 2003 11:15 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Please help: upgraded to 1.2.3 and interfaces changed

Hi,

I am maintaining some code which uses Bioperl. I had to upgrade our Bioperl version, and now the code which uses the Bioperl functions is not working.

Here is a piece of code which is no longer working:

    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    my $blast_report = $factory->blastall($seqA);
    print ERROR_LOG "BLAST HITS TABLE\n\n";
    print ERROR_LOG $blast_report->table_labels_tiled();
    print ERROR_LOG $blast_report->table_tiled;

I get this error:

    Can't locate object method "table_labels_tiled" via package "Bio::SearchIO::blast" at exon.cgi line 511.

And for this code

    foreach $hit ($blast_report->hits) {

I get this error:

    Can't locate object method "hits" via package "Bio::SearchIO::blast" at exon.cgi line 534.

Is there a replacement for table_labels_tiled? I think I need to ask the factory to give me a Bio::Search::Result::BlastResult object, and then a Bio::Search::Hit::BlastHit. Do you agree? I've been looking for an example, but can't seem to find one. How can I change the code to get a Bio::Search::Result::BlastResult and then a Bio::Search::Hit::BlastHit object?

The web site mentions examples in a directory called examples/search-blast, but I can't find it. Is there an example I can look at?

Thanks,

David

From laurichj at bioinfo.ucr.edu Fri Dec 12 12:34:48 2003
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Fri Dec 12 12:40:49 2003
Subject: [Bioperl-l] Bio::Graphics::Panel section
Message-ID: <20031212173448.GA7324@bioinfo.ucr.edu>

I came across a really strange error with Bio::Graphics::Panel.pm and its DATA section. The regex:

    last if /^__END__/;

failed to match the line:

    __END__

However, it only did this in one of the scripts that I use. After a couple of hours of messing with it, I still can't figure out what the problem is, so I just got rid of it by moving the DATA section to the bottom of the file.

Does anyone have any idea what could cause a problem like this? Is it probably just some strange conflict with another package in this script?

-- ---------------------------- | Josh Lauricha | | laurichj@bioinfo.ucr.edu | | Bioinformatics, UCR | |--------------------------|

From lembark at wrkhors.com Fri Dec 12 13:41:22 2003
From: lembark at wrkhors.com (Steven Lembark)
Date: Fri Dec 12 13:42:29 2003
Subject: [Bioperl-l] Question on whole-genome comparisions
Message-ID: <1517912704.1071254482@[192.168.200.4]>

perl-5.8.2 w/ bio-perl 1.2.3

I am testing code for whole-genome comparisons. Right now the two are m.genatilium and m.pneumoniae. For the comparison I'm trying to use their GenBank files:

ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Mycoplasma_pneumoniae/U00089.gbk
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Mycoplasma_genitalium/L43967.gbk

Catch is that I cannot seem to iterate the genes within the whole-genome GenBank files. I'm using the code below to try and walk down the genes. $genome gets set to the entire genome -- which makes some sense as this is a whole-genome file.
The genes are IN $genome, I just cannot seem to get at them using get_SeqFeatures. The problem is that @genome ends up with 1003 entries in it for m.genatilium, which has 400+ genes... $genome[0] is obviously the entire DNA given the start and end: '_gsf_tag_hash' => HASH(0x8a4f944) 'db_xref' => ARRAY(0x8a4f974) 0 'taxon:2097' 'isolate' => ARRAY(0x8a4f95c) 0 'G37' 'organism' => ARRAY(0x8a4f950) 0 'Mycoplasma genitalium' '_location' => Bio::Location::Simple=HASH(0x8a4f848) '_end' => 580074 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 1 '_strand' => 1 '_primary_tag' => 'source' '_source_tag' => 'EMBL/GenBank/SwissProt' $genome[1] looks like a gene: '_gsf_tag_hash' => HASH(0x8a4fa04) 'db_xref' => ARRAY(0x8a4fa40) 0 'GenBank:3844619' 'gene' => ARRAY(0x8a4fa7c) 0 'MG001' '_location' => Bio::Location::Simple=HASH(0x8a4fb60) '_end' => 1829 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 735 '_strand' => 1 '_primary_tag' => 'gene' '_source_tag' => 'EMBL/GenBank/SwissProt' But short of hacking the hash key's, I cannot find any good way to iterate @genome to get the DNA sequences for each of the genes. The elements of @genome are not blessed, so there is something I'm missing... I've gone through the tutorial and sequence I/O doc's, Bio::DB::GenBank. Q: Is there any doc on processing whole-genome GenBank files? #!/opt/bin/perl use strict; use warnings; use Bio::Seq; sub read_genome { use Bio::SeqIO; my $in = Bio::SeqIO->new( qw( -format genbank -file ), shift ); my @a = (); # whole genome gives an array of all genes as a feature of the # first sequence... my $genome = $in->next_seq; # this includes the whole genome as the first item, # followed by individual genes. my @genz = $genome->get_SeqFeatures; } # q: at this point what is the best way to iterate the # DNA sequences for each of the genes? 
my @genome = read_genome shift; $DB::single = 1; 0 __END__ x $genome 1 Bio::SeqFeature::Generic=HASH(0x8a4a420) '_gsf_seq' => Bio::PrimarySeq=HASH(0x8c2aea8) -> REUSED_ADDRESS '_gsf_tag_hash' => HASH(0x8a4e190) 'db_xref' => ARRAY(0x8a4e1cc) 0 'GenBank:3844619' 'gene' => ARRAY(0x8a501d4) 0 'MG001' '_location' => Bio::Location::Simple=HASH(0x8a502b8) '_end' => 1829 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 735 '_strand' => 1 '_primary_tag' => 'gene' '_source_tag' => 'EMBL/GenBank/SwissProt' 2 Bio::SeqFeature::Generic=HASH(0x8a4a468) '_gsf_seq' => Bio::PrimarySeq=HASH(0x8c2aea8) -> REUSED_ADDRESS '_gsf_tag_hash' => HASH(0x8a5036c) 'codon_start' => ARRAY(0x8a50474) 0 1 'db_xref' => ARRAY(0x8a50450) 0 'GI:3844620' 'gene' => ARRAY(0x8a503f0) 0 'MG001' 'note' => ARRAY(0x8a504e0) 0 'similar to GB:U00089 SP:Q50313 PID:1209517 PID:1673814 percent identity: 70.87; identified by sequence similarity; putative' 'product' => ARRAY(0x8a504ec) 0 'DNA polymerase III, subunit beta (dnaN)' 'protein_id' => ARRAY(0x8a50348) 0 'AAC71217.1' 'transl_table' => ARRAY(0x8a50438) 0 4 'translation' => ARRAY(0x8a504a4) 0 'MNNVIISNNKIKPHHSYFLIEAKEKEINFYANNEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIIT IQEKDQTLLVKTKKTSINLNTINVNEFPRIRFNEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREISSKFNGV NFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLLSFINSFNPEEDKSIVFYYRKDNKDSFSTEMLISM DNFMISYTSVNEKFPEVNYFFEFEPETKIVVQKNELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGN SLEEISCLKFEGYKLNISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVPSR' '_location' => Bio::Location::Simple=HASH(0x8a5048c) '_end' => 1829 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 735 '_strand' => 1 '_primary_tag' => 'CDS' '_source_tag' => 'EMBL/GenBank/SwissProt' -- Steven Lembark 2930 W. 
Palmer Workhorse Computing Chicago, IL 60647 +1 888 359 3508 From brian_osborne at cognia.com Fri Dec 12 14:08:11 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Dec 12 14:16:52 2003 Subject: [Bioperl-l] Question on whole-genome comparisions In-Reply-To: <1517912704.1071254482@[192.168.200.4]> Message-ID: Steven, >seem to get at them using get_SeqFeatures. The problem is >that @genome ends up with 1003 entries in it for m.genatilium, >which has 400+ genes... If I understand you correctly you're saying you want to get only those sequence features which represent genes, yes? Any given gene in the sequence is likely to have at least 2 features, "CDS" and "gene", and there may be other interesting features that the authors have noted in the Genbank file as well. This explains your numeric discrepancy. Since you're new to this Genbank parsing business and I'm starting to write a HOWTO on Features could you take a look at my first draft and see if it helps you? I'd be interested in your comments. It's here: http://bioperl.org/HOWTOs/html/Feature-Annotation.html Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Steven Lembark Sent: Friday, December 12, 2003 1:41 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Question on whole-genome comparisions perl-5.8.2 w/ bio-perl 1.2.3 I am testing code for whole-genome comparisions. Right now the two are m.genatilium and m.pneumoniae. For the comparision I'm trying to use their GenBank files: ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Mycoplasma_pneumoniae/U0008 9.gbk ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Mycoplasma_genitalium/L4396 7.gbk Catch is that I cannot seem to iterate the genes within the whole-genome GenBank files. I'm using the code below to try and walk down the genes. $genome gets set to the entire genome -- which makes some sense as this is a whole-genome file. 
The genes are IN $genome, I just cannot seem to get at them using get_SeqFeatures. The problem is that @genome ends up with 1003 entries in it for m.genatilium, which has 400+ genes... $genome[0] is obviously the entire DNA given the start and end: '_gsf_tag_hash' => HASH(0x8a4f944) 'db_xref' => ARRAY(0x8a4f974) 0 'taxon:2097' 'isolate' => ARRAY(0x8a4f95c) 0 'G37' 'organism' => ARRAY(0x8a4f950) 0 'Mycoplasma genitalium' '_location' => Bio::Location::Simple=HASH(0x8a4f848) '_end' => 580074 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 1 '_strand' => 1 '_primary_tag' => 'source' '_source_tag' => 'EMBL/GenBank/SwissProt' $genome[1] looks like a gene: '_gsf_tag_hash' => HASH(0x8a4fa04) 'db_xref' => ARRAY(0x8a4fa40) 0 'GenBank:3844619' 'gene' => ARRAY(0x8a4fa7c) 0 'MG001' '_location' => Bio::Location::Simple=HASH(0x8a4fb60) '_end' => 1829 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 735 '_strand' => 1 '_primary_tag' => 'gene' '_source_tag' => 'EMBL/GenBank/SwissProt' But short of hacking the hash key's, I cannot find any good way to iterate @genome to get the DNA sequences for each of the genes. The elements of @genome are not blessed, so there is something I'm missing... I've gone through the tutorial and sequence I/O doc's, Bio::DB::GenBank. Q: Is there any doc on processing whole-genome GenBank files? #!/opt/bin/perl use strict; use warnings; use Bio::Seq; sub read_genome { use Bio::SeqIO; my $in = Bio::SeqIO->new( qw( -format genbank -file ), shift ); my @a = (); # whole genome gives an array of all genes as a feature of the # first sequence... my $genome = $in->next_seq; # this includes the whole genome as the first item, # followed by individual genes. my @genz = $genome->get_SeqFeatures; } # q: at this point what is the best way to iterate the # DNA sequences for each of the genes? 
my @genome = read_genome shift; $DB::single = 1; 0 __END__ x $genome 1 Bio::SeqFeature::Generic=HASH(0x8a4a420) '_gsf_seq' => Bio::PrimarySeq=HASH(0x8c2aea8) -> REUSED_ADDRESS '_gsf_tag_hash' => HASH(0x8a4e190) 'db_xref' => ARRAY(0x8a4e1cc) 0 'GenBank:3844619' 'gene' => ARRAY(0x8a501d4) 0 'MG001' '_location' => Bio::Location::Simple=HASH(0x8a502b8) '_end' => 1829 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 735 '_strand' => 1 '_primary_tag' => 'gene' '_source_tag' => 'EMBL/GenBank/SwissProt' 2 Bio::SeqFeature::Generic=HASH(0x8a4a468) '_gsf_seq' => Bio::PrimarySeq=HASH(0x8c2aea8) -> REUSED_ADDRESS '_gsf_tag_hash' => HASH(0x8a5036c) 'codon_start' => ARRAY(0x8a50474) 0 1 'db_xref' => ARRAY(0x8a50450) 0 'GI:3844620' 'gene' => ARRAY(0x8a503f0) 0 'MG001' 'note' => ARRAY(0x8a504e0) 0 'similar to GB:U00089 SP:Q50313 PID:1209517 PID:1673814 percent identity: 70.87; identified by sequence similarity; putative' 'product' => ARRAY(0x8a504ec) 0 'DNA polymerase III, subunit beta (dnaN)' 'protein_id' => ARRAY(0x8a50348) 0 'AAC71217.1' 'transl_table' => ARRAY(0x8a50438) 0 4 'translation' => ARRAY(0x8a504a4) 0 'MNNVIISNNKIKPHHSYFLIEAKEKEINFYANNEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIIT IQEKDQTLLVKTKKTSINLNTINVNEFPRIRFNEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREISSKFNGV NFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLLSFINSFNPEEDKSIVFYYRKDNKDSFSTEMLISM DNFMISYTSVNEKFPEVNYFFEFEPETKIVVQKNELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGN SLEEISCLKFEGYKLNISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVPSR' '_location' => Bio::Location::Simple=HASH(0x8a5048c) '_end' => 1829 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'L43967' '_start' => 735 '_strand' => 1 '_primary_tag' => 'CDS' '_source_tag' => 'EMBL/GenBank/SwissProt' -- Steven Lembark 2930 W. 
Palmer Workhorse Computing Chicago, IL 60647 +1 888 359 3508

From heikki at nildram.co.uk Fri Dec 12 15:17:06 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Fri Dec 12 15:23:06 2003
Subject: [Bioperl-l] using Bio::DB::GenBank get translation
In-Reply-To: <3FD9DAFD.3060301@staff.mcgill.ca>
References: <3FD9DAFD.3060301@staff.mcgill.ca>
Message-ID: <200312122017.06569.heikki@nildram.co.uk>

Vince,

That

    while (not(defined($seq)));

line does not make sense. Something like:

    exit unless $seq;

would do the trick.

The seqfeature part of the code works, so you have a problem accessing sequences. The following code:

---------------- /tmp/cds.pl -------------------------------------------
use Bio::SeqIO;
$in = Bio::SeqIO->new(-file => shift, -format => 'genbank');
my $seq = $in->next_seq();
foreach my $feat ($seq->all_SeqFeatures){
    my $CDS = "";
    if ($feat->has_tag('translation')){
        $CDS = $feat->start."..".$feat->end;
        print "$CDS\n";
    }else{
        print "Not found\n";
    }
}
---------------- /tmp/cds.pl -------------------------------------------

when run like this:

    perl /tmp/cds.pl ~/src/bioperl/core/t/data/AB077698.gb

prints out:

Not found
Not found
Not found
80..1144
Not found
Not found
Not found
Not found
Not found
Not found
Not found

which, I suppose, is what you wanted.

Yours,
-Heikki

On Friday 12 Dec 2003 3:13 pm, Vince Forgetta wrote:
> Hi all,
>
> i have seen on some previous posts that you can retrieve the CDS start
> and stop from a GenBank DNA sequence accessionif the file is stored
> locally and read in using Bio::SeqIO. How would I retrieve the
> translation start and stop of a GenBank accession downloaded using
> Bio::DB::GenBank.
For example, the code below does not seem to find a > tag "translation": > > use Bio::DB::GenBank; > my $gb = new Bio::DB::GenBank; > my $seq; > my $accession; > $seq = $gb->get_Seq_by_acc($accession); > while (not(defined($seq))); > foreach my $feat ($seq->all_SeqFeatures()){ > my $CDS = ""; > if ($feat->has_tag('translation')){ > $CDS = $feat->start."..".$feat->end; > return "$CDS\n"; > }else{ > return "Not found\n"; > } > > > Thank you for your time. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From cjm at fruitfly.org Fri Dec 12 18:58:23 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Dec 12 16:55:42 2003 Subject: [Bioperl-l] Re: Changes to GFF 2.5 "unflattening" code In-Reply-To: <1071246391.1476.33.camel@localhost.localdomain> Message-ID: Nice one Scott! I imagine this script would be v useful to plenty of non GMOD/chado folks. Is there anything chado or GMOD specific about this? can we add it to bioperl instead of GMOD? (IMHO there are far too few scripts in bioperl, which is fine for the hardcore object-heads who'll roll up their own in a few minutes, but not so great for new users) What do you think of rolling some of the logic up from the script into bioperl modules? 
For example, the typemapping stuff could go into Bio::SeqFeature::Tools::TypeMapper, which already has a method for mapping to the Sequence Ontology.

Mapping of the SeqFeature nesting hierarchy to GFF ID/Parent tags could also take place in FeatureHolderI, as discussed on this list the other week. By the way, what are you doing for parent features that don't have a natural ID? Are you creating artificial surrogate IDs?

That way we could easily roll out genbank2chadoxml, genbank2ensembl, genbank2game, genbank2das, genbank2biosql and fastafile generators like genbank2intron_fasta, genbank2spliced_utr_fasta, genbank2exon_fasta, genbank2intergenic_fasta, genbank2my_favourite_SO_type_fasta and so on - I think this is the sort of thing people are really often after when they start downloading and wrestling with the bioperl object model.

By the way, we often use genbank, when what we really mean is genbank/embl(/ddbj?). Is there a handy short catchy name for this collective, or shall we carry on just using the term genbank to denote the collection of genbank-like formats?

This is all incredibly useful stuff in my opinion - for ages we've been able to say "we have a parser for format X" in bioperl, but really it's still been a semantic quagmire; the parsing is just the first step.

Cheers
Chris

On Fri, 12 Dec 2003, Scott Cain wrote:

> Lincoln and Sheldon,
>
> For your information, I wrote a new genbank2gff3.pl script for use with
> the pending GMOD release. I anticipate that it will form the foundation
> for rewriting the biofetch adaptor. It uses Unflattener.pm and seems to
> work for the organisms I tested (human, worm, fly, mosquito, and
> Ecoli). It is in the GMOD cvs in the schema repository at
> schema/chado/load/bin/genbank2gff.PLS.
> > Scott > > On Fri, 2003-12-12 at 10:56, bioperl-l-request@portal.open-bio.org > wrote: > > Hi Mark, Sheldon, > > > > I saw your change to the _parse_gff2_group code in Bio::DB::GFF, which > > prioritizes "gene", "locus_tag" and "transcript" as group fields in > > the column 9 attributes. I like it, but unfortunately it breaks some > > other code that I have, including the GMOD tutorial. > > > > I think you'll like what I've done instead. I've added a > > preferred_groups() method to which you pass a list of group names. > > Then, this list will be used as the priority list to pluck out groups > > from the GFF2 attribute list. To get your previous behavior, you need > > to do this: > > > > $db = Bio::DB::GFF->new(-preferred_groups=>['gene','locus_tag','transcript'], > > @other_args); > > $db->load_gff(...); > > > > or this > > > > $db = Bio::DB::GFF->new(@other_args); > > $db->preferred_groups('gene','locus_tag','transcript'); > > $db->load_gff(...); > > > > You'll have to change your existing scripts accordingly. Sure, this > > should be merged with Chris's unflattener, but then again let's just > > get to GFF3 as quickly as we possibly can and leave this nightmare > > behind us! > > > > Lincoln > From lstein at cshl.edu Fri Dec 12 17:12:54 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Dec 12 17:18:58 2003 Subject: [Bioperl-l] Bio::DB::GFF dna method not working for wormbase115. In-Reply-To: <200312101830.hBAIUHX8018586@mx4.nyu.edu> References: <200312101830.hBAIUHX8018586@mx4.nyu.edu> Message-ID: <200312121712.54301.lstein@cshl.edu> It's working OK with me on Wormbase 115. One issue is that DNA is numbered from 1 onward, not from 0. Lincoln On Wednesday 10 December 2003 01:24 pm, Philip MacMenamin wrote: > Hi, > I have just loaded wormbase fatsa files to a GFF SQL database using > Lincolns load_gff script, and everything was fine. 
> > However when I try to get the dna back out, using the same script > that worked (works) for wormbase110 does not work now, ie: > > use Bio::DB::GFF; > doConection stuff blah blah blah; > my $segment1 = $db->segment('I',0, 2000); > my $dna = $segment1->dna; > print $dna if $debug; > > I can step through the perl debugger and find out how the story is > differant between version 110 and 115, but at first look the > databases seem similar. > > I looked through the mail archives to see if others have had this > problem, but drew a blank, so I want to make sure that this is > known. > > All the best, > Philip. From lairdm at sfu.ca Fri Dec 12 18:23:56 2003 From: lairdm at sfu.ca (Matthew Laird) Date: Fri Dec 12 18:27:17 2003 Subject: [Bioperl-l] Blast return codes In-Reply-To: Message-ID: I did some more investigating and it's beginning to look a lot more bizarre. I changed the bioperl code to see if the blast command was actually run regardless of the -1 return code. And yes, blast did run and the result file is there - it's only that perl is returning a -1 to bioperl for some reason. I was actually just speaking with someone in another lab and he said he used to experience this problem too, his solution was to just write his own modules. :) So this certainly seems like some deep down perl/OS voodoo. I'd still be interested in hearing what envirnoment variables to set to run t/StandAloneBlast.t. Thanks. On Thu, 11 Dec 2003, Matthew Laird wrote: > Thanks for your assistance so far, I've been trying to find the difference > between the machines that do work and the ones that don't. The two most > similar machines which some do and some don't work are a group of Red Hat > 9 machines. > > The machines are running Red Hat 9 and Perl 5.8.0. I've tried both > bioperl 1.2.1 and 1.2.3. The only other difference I could find was that > perl was built from source on one of the machines that worked. I tried > doing that on one of the non-working machines and had no success. 
> > I've tried running the StandAloneBlast.t, what environment variables do I > need set? I receive: > [root@ssb7121-5 t]# perl StandAloneBlast.t > 1..10 > ok 1 > ok 2 > ok 3 > ok 4 > Blast Database ecoli.nt not found at StandAloneBlast.t line 67. > Blast Database swissprot not found at StandAloneBlast.t line 72. > Blast databases(s) not found, skipping remaining tests at > StandAloneBlast.t line 76. > ok 5 # skip Blast or env variables not installed correctly > ok 6 # skip Blast or env variables not installed correctly > ok 7 # skip Blast or env variables not installed correctly > ok 8 # skip Blast or env variables not installed correctly > ok 9 # skip Blast or env variables not installed correctly > ok 10 # skip Blast or env variables not installed correctly > > Obviously I need to set some variable so it can find the blast database. > > Thanks again. > > On 11 Dec 2003, Keith James wrote: > > > >>>>> "Matthew" == Matthew Laird writes: > > > > Matthew> Well, that's a step in the right direction, I have a > > Matthew> little more information now. I added a $! before the $? > > Matthew> and received: > > > > Matthew> ------------- EXCEPTION ------------- MSG: blastall call > > Matthew> crashed: -1 No child processes /usr/local/blast/blastall > > Matthew> -p blastp -d > > Matthew> /usr/local/psort/conf/analysis/sclblast/sclblast -i > > Matthew> /tmp/8Dt6zF1U59 -e 1e-09 -o /tmp/ojP9n04LZh > > > > Matthew> STACK Bio::Tools::Run::StandAloneBlast::_runblast > > Matthew> /usr/lib/perl5/site_perl/5.8.0/Bio/Tools/Run/StandAloneBlast.pm:640 > > > > Matthew> This is where we get into Perl voodoo beyond my league, > > Matthew> "No child processes" - does that ring bells for anyone? > > Matthew> Thanks again. > > > > That's interesting. I think we need to know your OS platform and Perl > > version to get any further. > > > > I think that the value left by a system call in $? is the same as if a > > wait system call were made. 
No child processes is a Unix error code > > (ECHILD) which can be caused by a wait, being reported by Perl. > > > > What happens if you run the test for StandAloneBlast? (t/StandAloneBlast.t) > > > > Keith > > > > > > -- Matthew Laird SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University From andreas.bernauer at gmx.de Fri Dec 12 14:39:19 2003 From: andreas.bernauer at gmx.de (Andreas Bernauer) Date: Fri Dec 12 18:31:39 2003 Subject: [Bioperl-l] Experiences from a newbie In-Reply-To: References: Message-ID: <20031212193919.GA22391@hgt.mcb.uconn.edu> Brian Osborne wrote: > Andreas, > > >As I din't want to look at the actual sequences and only needed the > >hits themselves, I told blast to omit the detailed listing of the > >alignments. This turned out to be a mistake, as for some reason that > >I can never think of, I can only get HITS in $result, when I include > >the alignments (which are accessed via HSP which I never use). Again > >something that I cannot access with my logic. > > I couldn't understand this. Could you restate this? or tell us exactly what > information you wanted to get from the Hit object? 0Sure: Let's have a look at this excerpt from a BLAST search result: BLASTP 2.2.1 [Apr-13-2001] ## header omitted ## This is the HIT LIST: Score E Sequences producing significant alignments: (bits) Value gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Pr... 63 2e-10 gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Pro... 63 2e-10 ## ... etc. etc. ... gi|33237342|ref|AAP99410.1| Septum formation inhibitor [ Prochlo... 31 0.60 ## this is the DETAIL LIST: >gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Prochlorococcus marinus subsp. marinus str. 
CCMP1375 ] Length = 385 Score = 62.8 bits (151), Expect = 2e-10 Identities = 71/348 (20%), Positives = 153/348 (43%), Gaps = 45/348 (12%) Query: 38 VKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEE---IITIQEKDQTLLVKTKKTSINLN 94 ++ +L+ +I+ G++ V K+F ++I+ + E ++ + + + +K+K + + Sbjct: 55 IQTSLSASIE--SSGAITVPSKLFGEIISKLSSESSITLSTDDSSEQVNLKSKSGNYQVR 112 ## etc. etc. Query: 315 NISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVP 362 I+FN LL+ ++ E+N I F + + T E +++P Sbjct: 333 QIAFNSRYLLEGLKIIETNTILLKFNAPTTPAIFTPNDETNFVYLVMP 380 >gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Prochlorococcus marinus subsp. pastoris str. CCMP1378] Length = 385 Score = 62.8 bits (151), Expect = 2e-10 Identities = 74/323 (22%), Positives = 150/323 (45%), Gaps = 38/323 (11%) Query: 36 FSVKCNLNKNID--ILEQGSLIVKGKIFNDLINGIKEEI---ITIQEKDQTLLVKTKKTS 90 F + + + D + + G++ + K+ ++++N + E + + E +L+K+ + S Sbjct: 49 FDLNLGIQTSFDATVNKSGAITIPSKLLSEIVNKLPSETPVSLDVDESSDNILIKSDRGS 108 ## etc. etc. I only need the gi number of the protein that was hit, the score and the evalue. I don't need the alignment or anything else from the DETAIL list. So I told blast to omit the detail list. (Idea: I am searching through several whole genomes, so this would also save me a significant amount of drive space.) But when I read the blast file with the hit list omitted, bioperl didn't report any hits at all, although the file still contained the hit list. I found this counterintuitive, as I expected the HIT object to give me the members of the HIT list, and the HSP object to give me more information about the alignment, i.e. informations about the members of the DETAIL list. When I told blast, no to omit the DETAIL list, everything worked fine. As I've written previously, I don't use the HSP object at all (get_gi extracts the gi number, $blastId refers to the blast $file): use Bio::SearchIO; # ...omitted for breviety... 
my $report = new Bio::SearchIO (-format => 'blast',
                                -file   => $file);

while (my $result = $report->next_result ) {
    print "Next ($blastId) result (" . $result->num_hits . " hits)\n";
    my $hitNo = 0;
    while (my $hit = $result->next_hit) {
        $hitNo++;
        printf "%s\t%s\t%s\t%d\t%d\n",
            get_gi($result->query_name), get_gi($hit->name),
            $hit->significance, $blastId, $hitNo;
    }
}

Andreas.

From andreas.bernauer at gmx.de Fri Dec 12 14:55:08 2003
From: andreas.bernauer at gmx.de (Andreas Bernauer)
Date: Fri Dec 12 18:31:42 2003
Subject: [Bioperl-l] Question on whole-genome comparisions
In-Reply-To: <1517912704.1071254482@[192.168.200.4]>
References: <1517912704.1071254482@[192.168.200.4]>
Message-ID: <20031212195508.GB22391@hgt.mcb.uconn.edu>

Steven Lembark wrote:
> Q: Is there any doc on processing whole-genome GenBank files?
>
> #!/opt/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::Seq;
>
> sub read_genome
> {
>     use Bio::SeqIO;
>
>     my $in = Bio::SeqIO->new( qw( -format genbank -file ), shift );
>
>     my @a = ();
>
>     # whole genome gives an array of all genes as a feature of the
>     # first sequence...
>
>     my $genome = $in->next_seq;
>
>     # this includes the whole genome as the first item,
>     # followed by individual genes.
>
>     my @genz = $genome->get_SeqFeatures; # ***
> }
>
> # q: at this point what is the best way to iterate the
> # DNA sequences for each of the genes?
>

I am not sure if I understood you correctly. Do you think this will
work at the place marked by *** ?
@features = $genome->get_SeqFeatures(); # just top level
foreach my $feat ( @features ) {
    print "Feature ",$feat->primary_tag," starts ",$feat->start," ends ",
        $feat->end," strand ",$feat->strand,"\n";

    # features retain link to underlying sequence object
    print "Feature sequence is ",$feat->seq->seq(),"\n"
}

(This is from the Seq man page.
http://doc.bioperl.org/releases/bioperl-1.2.3/Bio/Seq.html)

foreach $feat ( $seq->top_SeqFeatures() ) {
    print "Feature from ", $feat->start, "to ",
        $feat->end, " Primary tag ", $feat->primary_tag,
        ", produced by ", $feat->source_tag(), "\n";

    if ( $feat->strand == 0 ) {
        print "Feature applicable to either strand\n";
    } else {
        print "Feature on strand ", $feat->strand,"\n"; # -1,1
    }

    foreach $tag ( $feat->all_tags() ) {
        print "Feature has tag ", $tag, "with values, ",
            join(' ',$feat->each_tag_value($tag)), "\n";
    }
    print "new feature\n" if $feat->has_tag('new');
    # features can have sub features
    my @subfeat = $feat->get_SeqFeatures();
}

(This is from the SeqFeatureI man page.
http://doc.bioperl.org/releases/bioperl-1.2.3/Bio/SeqFeatureI.html)

The man pages contain further information on how to access the
tags. I don't think you need the subfeature feature.

Hope this helps.

Andreas.

From kvddrift at earthlink.net Fri Dec 12 21:12:47 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri Dec 12 21:19:41 2003
Subject: [Bioperl-l] some tests fail on Mac OS X
Message-ID:

Hi,

The following tests fail on Mac OS X (10.3.1) for the 1.303 release:

AlignIO.t         # 10
ELM.t             # 11, 13, 14
ESEfinder.t       # 4-5, 8, 11
RefSeq.t          # 4-7, 9
RestrictionIO.t   # 7-14
tutorial.t        # 19-21

The tutorial test also complained that Bio::SeqIO: game cannot be found.
Maybe there are some modules I am missing on my system? thanks, - Koen. From brian_osborne at cognia.com Sat Dec 13 00:24:45 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Sat Dec 13 00:33:09 2003 Subject: [Bioperl-l] Experiences from a newbie In-Reply-To: <20031212193919.GA22391@hgt.mcb.uconn.edu> Message-ID: Andreas, >When I told blast, not to omit the DETAIL list, everything worked fine. Interesting. I'd guess that when you just ask BLAST for hits the parsing would simply not work correctly. But for an accurate understanding of the internals we'll need to hear from someone like Jason! Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Andreas Bernauer Sent: Friday, December 12, 2003 2:39 PM To: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] Experiences from a newbie Brian Osborne wrote: > Andreas, > > >As I din't want to look at the actual sequences and only needed the > >hits themselves, I told blast to omit the detailed listing of the > >alignments. This turned out to be a mistake, as for some reason that > >I can never think of, I can only get HITS in $result, when I include > >the alignments (which are accessed via HSP which I never use). Again > >something that I cannot access with my logic. > > I couldn't understand this. Could you restate this? or tell us exactly what > information you wanted to get from the Hit object? 0Sure: Let's have a look at this excerpt from a BLAST search result: BLASTP 2.2.1 [Apr-13-2001] ## header omitted ## This is the HIT LIST: Score E Sequences producing significant alignments: (bits) Value gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Pr... 63 2e-10 gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Pro... 63 2e-10 ## ... etc. etc. ... gi|33237342|ref|AAP99410.1| Septum formation inhibitor [ Prochlo... 
31 0.60 ## this is the DETAIL LIST: >gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Prochlorococcus marinus subsp. marinus str. CCMP1375 ] Length = 385 Score = 62.8 bits (151), Expect = 2e-10 Identities = 71/348 (20%), Positives = 153/348 (43%), Gaps = 45/348 (12%) Query: 38 VKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEE---IITIQEKDQTLLVKTKKTSINLN 94 ++ +L+ +I+ G++ V K+F ++I+ + E ++ + + + +K+K + + Sbjct: 55 IQTSLSASIE--SSGAITVPSKLFGEIISKLSSESSITLSTDDSSEQVNLKSKSGNYQVR 112 ## etc. etc. Query: 315 NISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVP 362 I+FN LL+ ++ E+N I F + + T E +++P Sbjct: 333 QIAFNSRYLLEGLKIIETNTILLKFNAPTTPAIFTPNDETNFVYLVMP 380 >gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Prochlorococcus marinus subsp. pastoris str. CCMP1378] Length = 385 Score = 62.8 bits (151), Expect = 2e-10 Identities = 74/323 (22%), Positives = 150/323 (45%), Gaps = 38/323 (11%) Query: 36 FSVKCNLNKNID--ILEQGSLIVKGKIFNDLINGIKEEI---ITIQEKDQTLLVKTKKTS 90 F + + + D + + G++ + K+ ++++N + E + + E +L+K+ + S Sbjct: 49 FDLNLGIQTSFDATVNKSGAITIPSKLLSEIVNKLPSETPVSLDVDESSDNILIKSDRGS 108 ## etc. etc. I only need the gi number of the protein that was hit, the score and the evalue. I don't need the alignment or anything else from the DETAIL list. So I told blast to omit the detail list. (Idea: I am searching through several whole genomes, so this would also save me a significant amount of drive space.) But when I read the blast file with the hit list omitted, bioperl didn't report any hits at all, although the file still contained the hit list. I found this counterintuitive, as I expected the HIT object to give me the members of the HIT list, and the HSP object to give me more information about the alignment, i.e. informations about the members of the DETAIL list. When I told blast, no to omit the DETAIL list, everything worked fine. 
As I've written previously, I don't use the HSP object at all (get_gi extracts the gi number, $blastId refers to the blast $file): use Bio::SearchIO; # ...omitted for breviety... my $report = new Bio::SearchIO (-format => 'blast', -file => $file); while (my $result = $report->next_result ) { print "Next ($blastId) result (" . $result->num_hits . " hits)\n"; my $hitNo = 0; while (my $hit = $result->next_hit) { $hitNo++; printf "%s\t%s\t%s\t%d\t%d\n", get_gi($result->query_name), get_gi($hit->name), $hit->significance, $blastId, $hitNo; } } Andreas. From atmarvin at hotmail.com Sat Dec 13 08:48:58 2003 From: atmarvin at hotmail.com (=?iso-8859-1?B?U2ViYXN0aWFuIErDvG5lbWFubg==?=) Date: Sat Dec 13 08:54:57 2003 Subject: [Bioperl-l] Bio::MAGE Message-ID: > >I may have lost track, but to my knowledge Bio::MAGE is not part of >bioperl. Have you checked out the MAGEstk repository on sourceforge? > > -hilmar > Ok..i think you're right! Bio::MAGE is part of Perl5.8, but not of BioPerl.....so i think i will leave this mail list due to it's a little bit to advanced for me to understand ! So...thx for help anyway.....and still much fun by developing! By the way...i've foudn a mail-list dicussing only the mged-mage-theme: http://lists.sourceforge.net/lists/listinfo/mged-mage-perl Cheers, Sebastian >On Wednesday, December 10, 2003, at 11:43 AM, Sebastian J?nemann wrote: > >>Hi! >> >>Ive just described to this list...so im new in perl and of curse in >>handling Bioplerl Modules! >> >>Im working in a Project where we try to store genetik information and make >>them reachable via internet. >>Its on me to build a possibility for an user to import MAGE-Files ( that >>are xml Files based on the MAGE-Shem >>-- more : http://www.mged.org/Workgroups/MAGE/mage.html ) in our system >>and vice versa. >>So i get on the modul Bio::MAGE, which has already an implemented >>Eventhandler which can handle XML-Events (thrown from the Xerces >>XML-Parser). 
This Evendhalder ( the directory for the PerlModule : >>/somwehre/site_perl/Bio/MAGE/XMLUtils.pm ) then stores the Input into a >>mysqlDB. >> >> >> >>Did someone of did already near with these classes? Ive searched alle the >>way long from goolge to bioperl but nowhere can i find a cool HowTo, >>DiveIn or at least a useable API. Becouse im new in Perl i need Examples >>which use Bio::MAGE::XMLUtils ....... >> >> >>SO thx a lot.....and please give me esamples or hints ! >> >>Sebastian >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >-- >------------------------------------------------------------- >Hilmar Lapp email: lapp at gnf.org >GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >------------------------------------------------------------- > > _________________________________________________________________ E-Mails sind Ihnen nicht schnell genug? http://messenger.msn.de MSN Messenger - Kommunikation in Echtzeit From awitney at sghms.ac.uk Sat Dec 13 09:27:55 2003 From: awitney at sghms.ac.uk (Adam Witney) Date: Sat Dec 13 09:35:07 2003 Subject: [Bioperl-l] Bio::MAGE In-Reply-To: Message-ID: You can look at the MAGEstk pages at http://mged.sourceforge.net http://mged.sourceforge.net/software/index.php There is some documentation and a few examples on the pages there. Also on that page you will find links to the mailing lists Cheers adam >> I may have lost track, but to my knowledge Bio::MAGE is not part of >> bioperl. Have you checked out the MAGEstk repository on sourceforge? >> >> -hilmar >> > Ok..i think you're right! Bio::MAGE is part of Perl5.8, but not of > BioPerl.....so i think i will leave this mail list due to it's a little bit > to advanced for me to understand ! > > So...thx for help anyway.....and still much fun by developing! 
> > By the way...i've foudn a mail-list dicussing only the mged-mage-theme: > http://lists.sourceforge.net/lists/listinfo/mged-mage-perl > > Cheers, > Sebastian > > > > > > >> On Wednesday, December 10, 2003, at 11:43 AM, Sebastian J?nemann wrote: >> >>> Hi! >>> >>> Ive just described to this list...so im new in perl and of curse in >>> handling Bioplerl Modules! >>> >>> Im working in a Project where we try to store genetik information and make >>> them reachable via internet. >>> Its on me to build a possibility for an user to import MAGE-Files ( that >>> are xml Files based on the MAGE-Shem >>> -- more : http://www.mged.org/Workgroups/MAGE/mage.html ) in our system >>> and vice versa. >>> So i get on the modul Bio::MAGE, which has already an implemented >>> Eventhandler which can handle XML-Events (thrown from the Xerces >>> XML-Parser). This Evendhalder ( the directory for the PerlModule : >>> /somwehre/site_perl/Bio/MAGE/XMLUtils.pm ) then stores the Input into a >>> mysqlDB. >>> >>> >>> >>> Did someone of did already near with these classes? Ive searched alle the >>> way long from goolge to bioperl but nowhere can i find a cool HowTo, >>> DiveIn or at least a useable API. Becouse im new in Perl i need Examples >>> which use Bio::MAGE::XMLUtils ....... >>> >>> >>> SO thx a lot.....and please give me esamples or hints ! >>> >>> Sebastian >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> > > _________________________________________________________________ > E-Mails sind Ihnen nicht schnell genug? 
From jason at cgt.duhs.duke.edu Sat Dec 13 10:14:28 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Sat Dec 13 10:20:41 2003
Subject: [Bioperl-l] Experiences from a newbie
In-Reply-To: <20031212193919.GA22391@hgt.mcb.uconn.edu>
References: <20031212193919.GA22391@hgt.mcb.uconn.edu>
Message-ID:

On Fri, 12 Dec 2003, Andreas Bernauer wrote:

> Brian Osborne wrote:
> > Andreas,
> >
> > >As I din't want to look at the actual sequences and only needed the
> > >hits themselves, I told blast to omit the detailed listing of the
> > >alignments. This turned out to be a mistake, as for some reason that
> > >I can never think of, I can only get HITS in $result, when I include

The reason would be that the parser expects a certain subset of the many
possible blast formats. I have improved the ability of SearchIO::blast
to read some of the variants of BLAST, but I have not tried every
possible parameter combination.

As with all of the modules, we have developed the tools that work for
us. If there are things you would like to see supported, a feature
request with an example report at bugzilla.open-bio.org is the best way
to document what you would like.

If you want just query, hit, evalue, query start, query end, etc., I
would suggest you just use the -m 8 output from NCBI blast and use
SearchIO::blasttable - it is simpler - BLAST even runs a bit faster,
and it will use up less disk space. If you are using WU-BLAST, I
typically integrate a script with SearchIO as a wrapper around a BLAST
process to grab this simple stuff as well.

> > >the alignments (which are accessed via HSP which I never use).
Again > > >something that I cannot access with my logic. > > > > I couldn't understand this. Could you restate this? or tell us exactly what > > information you wanted to get from the Hit object? > > 0Sure: > > Let's have a look at this excerpt from a BLAST search result: > > > BLASTP 2.2.1 [Apr-13-2001] > > ## header omitted > > ## This is the HIT LIST: > Score E > Sequences producing significant alignments: (bits) Value > > gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Pr... 63 2e-10 > gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Pro... 63 2e-10 > ## ... etc. etc. ... > gi|33237342|ref|AAP99410.1| Septum formation inhibitor [ Prochlo... 31 0.60 > > > ## this is the DETAIL LIST: > > >gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ > Prochlorococcus marinus subsp. marinus str. CCMP1375 ] > Length = 385 > > Score = 62.8 bits (151), Expect = 2e-10 > Identities = 71/348 (20%), Positives = 153/348 (43%), Gaps = 45/348 (12%) > > Query: 38 VKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEE---IITIQEKDQTLLVKTKKTSINLN 94 > ++ +L+ +I+ G++ V K+F ++I+ + E ++ + + + +K+K + + > Sbjct: 55 IQTSLSASIE--SSGAITVPSKLFGEIISKLSSESSITLSTDDSSEQVNLKSKSGNYQVR 112 > > ## etc. etc. > > Query: 315 NISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVP 362 > I+FN LL+ ++ E+N I F + + T E +++P > Sbjct: 333 QIAFNSRYLLEGLKIIETNTILLKFNAPTTPAIFTPNDETNFVYLVMP 380 > > > >gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain > [Prochlorococcus marinus subsp. pastoris str. CCMP1378] > Length = 385 > > Score = 62.8 bits (151), Expect = 2e-10 > Identities = 74/323 (22%), Positives = 150/323 (45%), Gaps = 38/323 (11%) > > Query: 36 FSVKCNLNKNID--ILEQGSLIVKGKIFNDLINGIKEEI---ITIQEKDQTLLVKTKKTS 90 > F + + + D + + G++ + K+ ++++N + E + + E +L+K+ + S > Sbjct: 49 FDLNLGIQTSFDATVNKSGAITIPSKLLSEIVNKLPSETPVSLDVDESSDNILIKSDRGS 108 > > ## etc. etc. > > > I only need the gi number of the protein that was hit, the score and > the evalue. 
I don't need the alignment or anything else from the
> DETAIL list. So I told blast to omit the detail list. (Idea: I am
> searching through several whole genomes, so this would also save me a
> significant amount of drive space.) But when I read the blast file
> with the hit list omitted, bioperl didn't report any hits at all,
> although the file still contained the hit list.
>
> I found this counterintuitive, as I expected the HIT object to give me
> the members of the HIT list, and the HSP object to give me more
> information about the alignment, i.e. informations about the members
> of the DETAIL list.
>
> When I told blast, no to omit the DETAIL list, everything worked fine.
>

Because we make connections between this summary list and the details,
the parser initially expected to find the details below to 'finish off
building' the Hit objects. I have since added some code on the main
trunk which I think handles this case okay, though I haven't been
testing that aspect of late. If you write a test for t/SearchIO.t and
supply a test report, I can make sure it is supported in future versions.

But remember - in all of these cases you have LESS information, so HSP
objects will not be constructed - you can only call the methods
$hit->name, $hit->description, $hit->significance, $hit->raw_score.
There will be no $hit->length as that is not listed in the summary part
of the report.

> As I've written previously, I don't use the HSP object at all (get_gi
> extracts the gi number, $blastId refers to the blast $file):
>

[for the more advanced folks]
You can also use a speedup if you only want to get hits: you can attach
a different kind of event listener than the default ResultBuilder - see
SearchIO::FastHitEventBuilder, which won't even build HSP objects in the
first place for you.
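
A minimal sketch of the FastHitEventBuilder approach mentioned above,
assuming BioPerl is installed and a BLAST report file is passed on the
command line. The sub name print_hit_summary and the tab-separated
output layout are illustrative choices, not something from this thread;
check the method names against the POD of your installed BioPerl
version. Hits built this way carry no HSPs, so only the summary-level
accessors are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SearchIO::FastHitEventBuilder;

# print_hit_summary is a hypothetical helper name, not part of BioPerl.
# It prints query name, hit name, e-value and raw score for every hit.
sub print_hit_summary {
    my ($file) = @_;
    my $report = Bio::SearchIO->new(-format => 'blast', -file => $file);

    # Replace the default event listener with one that never builds HSPs.
    $report->attach_EventHandler(Bio::SearchIO::FastHitEventBuilder->new);

    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            # Fields may be undefined depending on the report variant,
            # so substitute an empty string before joining.
            my @fields = map { defined $_ ? $_ : '' }
                ($result->query_name, $hit->name,
                 $hit->significance, $hit->raw_score);
            print join("\t", @fields), "\n";
        }
    }
}

# Only run when a report file is given, e.g.: perl hits_only.pl report.bls
print_hit_summary($ARGV[0]) if @ARGV;
```

Whether this is noticeably faster than the default listener depends on
report size; for very large searches the -m 8 tabular route is likely
still the simpler option.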
As for what you actually want from the report below as below, I would strongly suggest you just use -m 8 output from BLAST as you'll just get exactly what you want in column format and you can either use SearchIO::blasttable to parse it or write your own that looks something like this: while(<>) { my ($qname,$hname, ... ) = split(/\t/,$_); } > use Bio::SearchIO; > # ...omitted for breviety... > my $report = new Bio::SearchIO (-format => 'blast', > -file => $file); > > while (my $result = $report->next_result ) { > print "Next ($blastId) result (" . $result->num_hits . " hits)\n"; > my $hitNo = 0; > while (my $hit = $result->next_hit) { > $hitNo++; > printf "%s\t%s\t%s\t%d\t%d\n", > get_gi($result->query_name), get_gi($hit->name), > $hit->significance, $blastId, $hitNo; > } > } > > > > > Andreas. > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Sat Dec 13 10:44:43 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Dec 13 10:50:56 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: Message-ID: AlignIO.t - passes fine for me on all platforms for latest bioperl code so bug have been fixed. ESEfinder & ELM ESEfinder is failing because it requires that HTML::HeadParser also be installed - I have added some bulletproofing code in module so it will at least throw an error installing fink pkg html-parser-pm solves problem for me, so a prereq I suppose. I'm adding some prereqs to the Makefile.PL - need to see how that operates in real life though.. Not had time to investidate the rest. -jason On Fri, 12 Dec 2003, Koen van der Drift wrote: > Hi, > > The following tests fail on Mac OS X (10.3.1) for the 1.303 release: > > AlignIO.t # 10 > ELM.t # 11, 13, 14 > ESEfinder.t # 4-5, 8, 11 > RefSeq.t # 4-7, 9 > RestrictionIO.t # 7-14 > tutorial.t # 19-21 > > > The tutorial test also complained that Bio::SeqIO: game cannot be found. > > > Maybe there are some modules I am missing on my system? 
>
>
>
> thanks,
>
>
> - Koen.
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From pm66 at nyu.edu Sat Dec 13 12:25:51 2003
From: pm66 at nyu.edu (Philip E Macmenamin)
Date: Sat Dec 13 12:31:51 2003
Subject: [Bioperl-l] Bio::DB::GFF dna method not working for wormbase115.
Message-ID: <25f800025f2879.25f287925f8000@homemail.nyu.edu>

Hi,

In all other copies of wormbase the fref part of the fdna table has the
chromosome as: I ... X.

The fref for wormbase115 is Chromosome_I ... Chromosome_X.

When the GFF program asks for dna it passes the chromosome ref as just
I or X, and so finds nothing.

This could be fixed in the db by changing the entries in the fdna
table, by changing the original sequence files back to the previous
Wormbase format (i.e. I, II, III, etc.), or by tinkering with Lincoln's
load_gff script.

Philip.

----- Original Message -----
From: Lincoln Stein
Date: Friday, December 12, 2003 4:12 pm
Subject: Re: [Bioperl-l] Bio::DB::GFF dna method not working for wormbase115.

> It's working OK with me on Wormbase 115. One issue is that DNA is
> numbered from 1 onward, not from 0.
>
> Lincoln
>
> On Wednesday 10 December 2003 01:24 pm, Philip MacMenamin wrote:
> > Hi,
> > I have just loaded wormbase fatsa files to a GFF SQL database using
> > Lincolns load_gff script, and everything was fine.
> >
> > However when I try to get the dna back out, using the same script
> > that worked (works) for wormbase110 does not work now, ie:
> >
> > use Bio::DB::GFF;
> > doConection stuff blah blah blah;
> > my $segment1 = $db->segment('I',0, 2000);
> > my $dna = $segment1->dna;
> > print $dna if $debug;
> >
> > I can step through the perl debugger and find out how the story is
> > differant between version 110 and 115, but at first look the
> > databases seem similar.
> > > > I looked through the mail archives to see if others have had this > > problem, but drew a blank, so I want to make sure that this is > > known. > > > > All the best, > > Philip. > From kvddrift at earthlink.net Sat Dec 13 16:16:20 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat Dec 13 16:23:16 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: Message-ID: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> On Dec 13, 2003, at 10:44 AM, Jason Stajich wrote: > > AlignIO.t - passes fine for me on all platforms for latest bioperl code > so bug have been fixed. > > ESEfinder & ELM > > ESEfinder is failing because it requires that HTML::HeadParser also > be installed - I have added some bulletproofing code in module so it > will > at least throw an error > > installing fink pkg html-parser-pm solves problem for me, so a prereq > I > suppose. Weird, I have it installed too, but these two tests still fail. > > I'm adding some prereqs to the Makefile.PL - need to see how that > operates in real life though.. > > > Not had time to investidate the rest. I'd like to help, but am not sure how. Is there a way to run an individual test without using 'make test' ? thanks, - Koen. From redwards at utmem.edu Sat Dec 13 16:38:56 2003 From: redwards at utmem.edu (Rob Edwards) Date: Sat Dec 13 16:44:59 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: Message-ID: On Friday, December 12, 2003, at 08:12 PM, Koen van der Drift wrote: > Hi, > > The following tests fail on Mac OS X (10.3.1) for the 1.303 release: > > AlignIO.t # 10 > ELM.t # 11, 13, 14 > ESEfinder.t # 4-5, 8, 11 > RefSeq.t # 4-7, 9 > RestrictionIO.t # 7-14 > tutorial.t # 19-21 > > Mine and Peter's changes to Restriction modules will be submitted tomorrow and will fix these tests that fail (as well as adding new tests...). 
Rob From jason at cgt.duhs.duke.edu Sat Dec 13 17:04:52 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Dec 13 17:10:59 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> Message-ID: On Sat, 13 Dec 2003, Koen van der Drift wrote: > > On Dec 13, 2003, at 10:44 AM, Jason Stajich wrote: > > > > > AlignIO.t - passes fine for me on all platforms for latest bioperl code > > so bug have been fixed. > > > > ESEfinder & ELM > > > > ESEfinder is failing because it requires that HTML::HeadParser also > > be installed - I have added some bulletproofing code in module so it > > will > > at least throw an error > > > > installing fink pkg html-parser-pm solves problem for me, so a prereq > > I > > suppose. > > Weird, I have it installed too, but these two tests still fail. > > you sure it is the right version for your perl? perldoc HTML::Entities definitely works? > > > > I'm adding some prereqs to the Makefile.PL - need to see how that > > operates in real life though.. > > > > > > Not had time to investidate the rest. > > I'd like to help, but am not sure how. Is there a way to run an > individual test without using 'make test' ? > % make test_ESEfinder OR % perl -I. -w t/ESEfinder.t -jason > > thanks, > > - Koen. 
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From andreas.bernauer at gmx.de Sat Dec 13 14:48:25 2003
From: andreas.bernauer at gmx.de (Andreas Bernauer)
Date: Sat Dec 13 18:34:41 2003
Subject: [Bioperl-l] Experiences from a newbie
In-Reply-To:
References: <20031212193919.GA22391@hgt.mcb.uconn.edu>
Message-ID: <20031213194825.GA25949@hgt.mcb.uconn.edu>

Jason Stajich wrote:
> If you want just query,hit,evalue,query start, query end..., etc I would
> suggest you just use the -m 8 output from NCBI blast and use the
> SearchIO::blasttable - it is simpler - BLAST even runs a bit faster, and
> will use up less disk space.

Hey, thanks for this hint. Looks like what I wanted to have :-)

Andreas.

From kvddrift at earthlink.net Sun Dec 14 10:47:56 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun Dec 14 10:54:53 2003
Subject: [Bioperl-l] some tests fail on Mac OS X
In-Reply-To:
References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net>
Message-ID:

On Dec 13, 2003, at 5:04 PM, Jason Stajich wrote:

>> Weird, I have it installed too, but these two tests still fail.
>>
>>
> you sure it is the right version for your perl? perldoc HTML::Entities
> definitely works?

Yes - it's version 3.27 on 10.3.1, also installed with fink.

>> I'd like to help, but am not sure how. Is there a way to run an
>> individual test without using 'make test' ?
>>
>
> % make test_ESEfinder
>
> OR
>
> % perl -I. -w t/ESEfinder.t

thanks, I'll try that.

BTW, I am the maintainer of bioperl for fink. I am using the 1.303
release to test for the next package version.
One thing that I noticed is that the scripts are directly installed into /sw/bin instead of in the directory that is actually used to build the package (/sw/src/root-bioperl-pm/sw/bin/). I have looked at Makefile.PL, but my knowledge of perl is not that good that I could figure out why this happens. Is there a way that I can patch Makefile.PL so that the scripts get installed in the place required for fink? thanks, - Koen. From jason at cgt.duhs.duke.edu Sun Dec 14 11:57:12 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sun Dec 14 12:03:20 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> Message-ID: On Sun, 14 Dec 2003, Koen van der Drift wrote: > > On Dec 13, 2003, at 5:04 PM, Jason Stajich wrote: > > >> Weird, I have it installed too, but these two tests still fail. > >> > >> > > you sure it is the right version for your perl? perldoc HTML::Entities > > definitely works? > > Yes - it's version 3.27 on 10.3.1, also installed with fink. > > > >> I'd like to help, but am not sure how. Is there a way to run an > >> individual test without using 'make test' ? > >> > > > > % make test_ESEfinder > > > > OR > > > > % perl -I. -w t/ESEfinder.t > > thanks, I'll try that. > > > BTW, I am the maintainer of bioperl for fink. I am using the 1.303 > release to test for the next package version. > > One thing that I noticed is that the scripts are directly installed > into /sw/bin instead of in the directory that is actually used to build > the package (/sw/src/root-bioperl-pm/sw/bin/). I have looked at > Makefile.PL, but my knowledge of perl is not that good that I could > figure out why this happens. Is there a way that I can patch > Makefile.PL so that the scripts get installed in the place required for > fink? 
I think when you run perl Makefile.PL you want to give it perl Makefile.PL PREFIX=/sw/src/root-bioperl-pm/sw See how LWP or other pkgs which have associated scripts in them install I guess. > thanks, > > > - Koen. > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From kvddrift at earthlink.net Sun Dec 14 14:28:41 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun Dec 14 14:35:32 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: Message-ID: >> tutorial.t # 19-21 >> >> >> The tutorial test also complained that Bio::SeqIO: game cannot be >> found. >> This was solved after installing xml-writer (through fink). - Koen. From kvddrift at earthlink.net Sun Dec 14 14:28:35 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun Dec 14 14:35:35 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> Message-ID: On Dec 14, 2003, at 11:57 AM, Jason Stajich wrote: > I think when you run perl Makefile.PL you want to give it > perl Makefile.PL PREFIX=/sw/src/root-bioperl-pm/sw > > See how LWP or other pkgs which have associated scripts in them > install I > guess. > LWP is not a fink package, but I found another one that uses INSTALLSCRIPT which is passed to make install. Now it works :) I am also wondering how many external modules I should add to the dependencies. I understand that missing modules will only result in less functionality for bioperl. One thing I did not put in sofar are database related modules. Is that something that most users won't miss? Are there other modules that you (or other bioperl developers) think are really needed? thanks again, - Koen. 
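The question of which external modules to declare as dependencies can be explored with a small script that reports which optional modules this perl can load. This is a minimal sketch. The module list is an assumption for illustration only and is not an official BioPerl prerequisite list, and module_available is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Accept only plain package names so the string eval below stays safe.
    return 0 unless defined $module && $module =~ /\A\w+(?:::\w+)*\z/;
    return eval "require $module; 1" ? 1 : 0;
}

# Hypothetical list, for illustration only; adjust to the release in question.
my @optional = qw(IO::String LWP::UserAgent GD XML::Parser XML::Writer DBI);

for my $module (@optional) {
    printf "%-16s %s\n", $module, module_available($module) ? 'ok' : 'MISSING';
}
```

A module reported as MISSING only means the features that need it would be unavailable; whether it belongs in a package's hard dependencies remains a packaging decision.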
From kdj at sanger.ac.uk Sun Dec 14 15:11:44 2003 From: kdj at sanger.ac.uk (Keith James) Date: Sun Dec 14 15:17:40 2003 Subject: [Bioperl-l] Blast return codes In-Reply-To: References: Message-ID: >>>>> "Matthew" == Matthew Laird writes: Matthew> I did some more investigating and it's beginning to look Matthew> a lot more bizarre. I changed the bioperl code to see if Matthew> the blast command was actually run regardless of the -1 Matthew> return code. And yes, blast did run and the result file Matthew> is there - it's only that perl is returning a -1 to Matthew> bioperl for some reason. Matthew> I was actually just speaking with someone in another lab Matthew> and he said he used to experience this problem too, his Matthew> solution was to just write his own modules. :) Matthew> So this certainly seems like some deep down perl/OS Matthew> voodoo. Unfortunately, yes :( Especially as building Perl from source rather than using RPMs seems to fix it. Matthew> I'd still be interested in hearing what envirnoment Matthew> variables to set to run t/StandAloneBlast.t. I think it should pick up the standard blast environment variables BLASTDIR and BLASTDATADIR via %ENV. Keith -- - Keith James Microarray Facility, Team 65 - - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From heikki at nildram.co.uk Sun Dec 14 15:14:40 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Sun Dec 14 15:20:36 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: Message-ID: <200312142014.40132.heikki@nildram.co.uk> Koen, What do you mean by database related modules? DBI::mysql? (You are not meaning bioperl-db, are you?) You might want to follow up what Chris does every time there s a new release. He updates a CPAN bundle: Bundle-BioPerl which contains links into all bioperl-core dependencies. Users can install them all using one cpan command. 
-Heikki On Sunday 14 Dec 2003 7:28 pm, Koen van der Drift wrote: > I am also wondering how many external modules I should add to the > dependencies. I understand that missing modules will only result in > less functionality for bioperl. One thing I did not put in sofar are > database related modules. Is that something that most users won't miss? > Are there other modules that you (or other bioperl developers) think > are really needed? > > > thanks again, > > > - Koen. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From dag at sonsorol.org Sun Dec 14 15:21:32 2003 From: dag at sonsorol.org (Chris Dagdigian) Date: Sun Dec 14 15:33:56 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: <200312142014.40132.heikki@nildram.co.uk> References: <200312142014.40132.heikki@nildram.co.uk> Message-ID: <3FDCC64C.70200@sonsorol.org> It's past time for a new Bundle::Bioperl release in CPAN. I'll work with the release developers to make sure the bundle is in sync with the next release. -c Heikki Lehvaslaiho wrote: > Koen, > > What do you mean by database related modules? DBI::mysql? (You are not meaning > bioperl-db, are you?) > > You might want to follow up what Chris does every time there s a new release. > He updates a CPAN bundle: Bundle-BioPerl which contains links into all > bioperl-core dependencies. Users can install them all using one cpan command. 
> > -Heikki > > > > On Sunday 14 Dec 2003 7:28 pm, Koen van der Drift wrote: > >>I am also wondering how many external modules I should add to the >>dependencies. I understand that missing modules will only result in >>less functionality for bioperl. One thing I did not put in sofar are >>database related modules. Is that something that most users won't miss? >>Are there other modules that you (or other bioperl developers) think >>are really needed? >> >> >>thanks again, >> >> >>- Koen. >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Chris Dagdigian, Independent life science IT & informatics consulting Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193 PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net From kvddrift at earthlink.net Sun Dec 14 16:24:49 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun Dec 14 16:31:50 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: <200312142014.40132.heikki@nildram.co.uk> References: <200312142014.40132.heikki@nildram.co.uk> Message-ID: On Dec 14, 2003, at 3:14 PM, Heikki Lehvaslaiho wrote: > What do you mean by database related modules? DBI::mysql? (You are not > meaning > bioperl-db, are you?) Yes, db-mysql, acedb. > > You might want to follow up what Chris does every time there s a new > release. > He updates a CPAN bundle: Bundle-BioPerl which contains links into all > bioperl-core dependencies. Users can install them all using one cpan > command. > That's how fink works too. However, not all modules are available for fink (eg ace-db). - Koen. From lhaifeng at dso.org.sg Sun Dec 14 20:44:25 2003 From: lhaifeng at dso.org.sg (Liu Haifeng) Date: Sun Dec 14 20:53:53 2003 Subject: [Bioperl-l] repost the problem --- Re: bl2seq hang and its performace Message-ID: <001b01c3c2ac$f8d64020$706712ac@GENETHON> Anyone can help? Really urgent! 
Haifeng Liu

----- Original Message -----
From: "Liu Haifeng"
To:
Sent: 12 December 2003 14:49
Subject: bl2seq hang and its performace

> Hi all,
>
> I noticed that one of my programs, written using bioperl-1.2.3, runs very slowly
> and consumes a huge amount of memory, and I suspected that this is due to the call to bl2seq
> in the program. So I wrote the small program below (it runs bl2seq on each sequence
> from a fasta file against itself) to see whether that is true:
>
>
> #!/usr/bin/perl -w
> use Bio::SeqIO;
> use Bio::Tools::Blast;
> use Bio::Tools::Run::StandAloneBlast;
> use Bio::Tools::BPlite;
>
> my $infile =shift;
> my $sno=0;
> my $blastalgo="blastp"; #blastp ,blastx, tblastn, tblastx
> my $pin = Bio::SeqIO->new('-file' => "$infile", '-format' => 'Fasta');
> while ( my $proseq = $pin -> next_seq()) {
>     $sno++;
>     print "bl2seq $sno ..............................\n";
>     my @params=('program' => $blastalgo);
>     my $factory= Bio::Tools::Run::StandAloneBlast->new(@params);
>     $factory->io->_io_cleanup();
>     my $report=$factory->bl2seq($proseq, $proseq);
>     while (my $hsp=$report->next_feature) {
>         #only need the first hsp
>         $report->close();
>     }
>     undef $report;
> }
> print "running is over\n";
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> The program runs fine on a small fasta file. However, when I give it a
> fasta file of around 2.6M containing 10,000 protein sequences, the program
> hangs when it compares the 1782nd sequence. I also noticed that the program
> had consumed 12M of memory at that point. I searched the archive and found that
> similar bl2seq problems have been reported before. However, they should have been
> solved in the latest version.
>
> Can anyone give me some clues on how to improve the performance of calling bl2seq?
> Thank you.
> > Regards > Haifeng Liu > > > From jason at cgt.duhs.duke.edu Sun Dec 14 21:28:26 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sun Dec 14 21:34:35 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> Message-ID: On Sun, 14 Dec 2003, Koen van der Drift wrote: > > On Dec 14, 2003, at 11:57 AM, Jason Stajich wrote: > > > I think when you run perl Makefile.PL you want to give it > > perl Makefile.PL PREFIX=/sw/src/root-bioperl-pm/sw > > > > See how LWP or other pkgs which have associated scripts in them > > install I > > guess. > > > > LWP is not a fink package, but I found another one that uses > INSTALLSCRIPT which is passed to make install. > I think it is libwwww-pm actually. > Now it works :) > > I am also wondering how many external modules I should add to the > dependencies. I understand that missing modules will only result in > less functionality for bioperl. One thing I did not put in sofar are > database related modules. Is that something that most users won't miss? > Are there other modules that you (or other bioperl developers) think > are really needed? definitely io-string, libwwww-pm. gd-pm would be nice as well. xml-parser-pm libxml-pm also are good I think. Beauty of fink is it should be pretty easy for these dependancies to be pulled in. acedb is pretty minor and not used much so I wouldn't worry about that. > > > thanks again, > > > - Koen. > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From redwards at utmem.edu Sun Dec 14 22:49:38 2003 From: redwards at utmem.edu (Rob Edwards) Date: Sun Dec 14 22:55:32 2003 Subject: [Bioperl-l] Bio::Restriction Message-ID: I have just added the latest version of Bio::Restriction to cvs. The changes here are mainly to Restriction::Analysis, and should speed up looking at longer sequences, and adding functionality suggested by Peter. 
Restriction::Analysis should correctly handle the following: - overlapping sites - sites at or near the join points of circular sequences - non-palindromic restriction enzymes - enzymes that cut at more than one site - enzymes that cut more than once per site Restriction::Analysis should be more suitable for handling much larger sequences, using less memory and (hopefully) being faster than previous versions, especially for collections of non-ambiguous enzymes. The only thing that is stored for all enzymes are the cut positions (integers), and some statistics about how frequently each enzyme cuts and so forth. You can still get fragments and fragment maps out, they're just generated each time they are requested for each enzyme. The code should be cleaner and more readable too, now! I also fixed the tests that were failing with RestrictionIO. I removed the dependency on the caret (^) in the cut site, and now Analysis.pm just uses the values returned by cut and complementary_cut. Next on the list is to clean up Enzyme.pm and remove the caret unless it is essential. I also submitted the beginnings of a bairoch format rebase IO filter. This is barely limping in at the moment so use with caution, but this is the format that MacVector, VectorNTI, and PC/Gene support so I wanted to add this in. Rob From redwards at utmem.edu Sun Dec 14 23:04:15 2003 From: redwards at utmem.edu (Rob Edwards) Date: Sun Dec 14 23:10:08 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: Message-ID: The Scansite test 7 is taking for ever to run tonight (is the server down, and can/should it be timed it out somehow?) 
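On the question of timing out a test that depends on a remote server, one common Perl idiom is alarm inside an eval. This is a minimal sketch, assuming a Unix-like platform where alarm can interrupt the blocking call. run_with_timeout is a hypothetical helper and is not part of the BioPerl test suite.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds.
# Returns (1, result) on success, or (0, error message) on timeout or failure.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;    # make sure no alarm is left pending if $code died
    return $ok ? (1, $result) : (0, $error);
}

my ($ok, $value) = run_with_timeout(2, sub { return "done" });
print $ok ? "finished: $value\n" : "gave up: $value";
```

A test script could use the failure branch to skip the remote tests rather than hang when the server does not answer.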
Other than that, the following failed on OS X (I know several of these have been discussed in this thread):

Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Coalescent.t       255 65280    11   10  90.91%  2-11
t/GuessSeqFormat.t                46    2   4.35%  10-11
t/SeqIO.t                        248    4   1.61%  61 159 196 198
t/game.t             255 65280    23   22  95.65%  2-23
t/hmmer.t                        134    2   1.49%  42 85
24 subtests skipped.
Failed 5/179 test scripts, 97.21% okay. 40/8279 subtests failed, 99.52% okay.

Rob

From lairdm at sfu.ca Mon Dec 15 00:53:59 2003 From: lairdm at sfu.ca (Matthew Laird) Date: Mon Dec 15 00:56:36 2003 Subject: [Bioperl-l] Blast return codes In-Reply-To: Message-ID:

On 14 Dec 2003, Keith James wrote:

> Matthew> So this certainly seems like some deep down perl/OS
> Matthew> voodoo.
>
> Unfortunately, yes :( Especially as building Perl from source rather
> than using RPMs seems to fix it.

Unfortunately it doesn't seem to always fix it... I'm not sure what to do at this point.

> Matthew> I'd still be interested in hearing what environment
> Matthew> variables to set to run t/StandAloneBlast.t.
>
> I think it should pick up the standard blast environment variables
> BLASTDIR and BLASTDATADIR via %ENV.

Those are set; the two variables it's looking for are nt_database and amino_database_file. For StandAloneBlast.t, what should those be set to? Then I can run the test script and see what happens...

Thanks.

-- Matthew Laird SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University

From m_conte at hotmail.com Mon Dec 15 03:25:29 2003 From: m_conte at hotmail.com (matthieu CONTE) Date: Mon Dec 15 03:31:26 2003 Subject: [Bioperl-l] biopipe/Pipeline Manager Message-ID:

Hello, I am working on Biopipe using Pipeline Manager.
I work with modified version of "phylip_tree_pipeline.xml" http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-pipeline/xml/examples/xml/phylip_tree_pipeline.xml?cvsroot=bioperl My problem is that I do not have only one input file but 1128 differents files in input ! My input files are multifasta with a set of protein from Arabidopsis thaliana and Oryza sativa containing a specific PFAM motif. The path for example for the motif PF00012 is: /PF00012/fasta_PF00012/PF00012.fa and so on for the 1128 motifs. What I want to do is to loop on the 1128 PFAM files. Enter the PFAM number in a let's say $pf Start the job with the Path /$pf/fasta_$pf/$pf.fa file But I don't find any way to work with a non global variable Does somebody have an idea? Is it possible to modify Pipeline Manager so that it can hang argument an input file? _________________________________________________________________ MSN Messenger : discutez en direct avec vos amis ! http://www.msn.fr/msger/default.asp From heikki at nildram.co.uk Mon Dec 15 06:46:44 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Mon Dec 15 06:53:30 2003 Subject: [Bioperl-l] Bio::Restriction In-Reply-To: References: Message-ID: <200312151146.44879.heikki@nildram.co.uk> Great! Thanks, Rob and Peter! -Heikki On Monday 15 Dec 2003 3:49 am, Rob Edwards wrote: > I have just added the latest version of Bio::Restriction to cvs. The > changes here are mainly to Restriction::Analysis, and should speed up > looking at longer sequences, and adding functionality suggested by > Peter. 
> > Restriction::Analysis should correctly handle the following: > > - overlapping sites > - sites at or near the join points of circular sequences > - non-palindromic restriction enzymes > - enzymes that cut at more than one site > - enzymes that cut more than once per site > > Restriction::Analysis should be more suitable for handling much larger > sequences, using less memory and (hopefully) being faster than previous > versions, especially for collections of non-ambiguous enzymes. > > The only thing that is stored for all enzymes are the cut positions > (integers), and some statistics about how frequently each enzyme cuts > and so forth. You can still get fragments and fragment maps out, > they're just generated each time they are requested for each enzyme. > > The code should be cleaner and more readable too, now! > > I also fixed the tests that were failing with RestrictionIO. > > I removed the dependency on the caret (^) in the cut site, and now > Analysis.pm just uses the values returned by cut and complementary_cut. > Next on the list is to clean up Enzyme.pm and remove the caret unless > it is essential. > > I also submitted the beginnings of a bairoch format rebase IO filter. > This is barely limping in at the moment so use with caution, but this > is the format that MacVector, VectorNTI, and PC/Gene support so I > wanted to add this in. > > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From amackey at pcbi.upenn.edu Mon Dec 15 08:01:32 2003 From: amackey at pcbi.upenn.edu (Aaron J Mackey) Date: Mon Dec 15 08:07:26 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] bioperl commit In-Reply-To: <200312151150.hBFBoch7026721@pub.open-bio.org> Message-ID: $patch =~ s/E/E/g; On Mon, 15 Dec 2003, Heikki Lehvaslaiho wrote: > > heikki > Mon Dec 15 06:50:38 EST 2003 > Update of /home/repository/bioperl/bioperl-live/Bio/Graphics > In directory pub.open-bio.org:/tmp/cvs-serv26698/Bio/Graphics > > Modified Files: > Glyph.pm Panel.pm > Log Message: > pod fixes > > bioperl-live/Bio/Graphics Glyph.pm,1.55,1.56 Panel.pm,1.66,1.67 > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/Glyph.pm,v > retrieving revision 1.55 > retrieving revision 1.56 > diff -u -r1.55 -r1.56 > --- /home/repository/bioperl/bioperl-live/Bio/Graphics/Glyph.pm 2003/12/13 17:17:50 1.55 > +++ /home/repository/bioperl/bioperl-live/Bio/Graphics/Glyph.pm 2003/12/15 11:50:38 1.56 > @@ -1127,7 +1127,7 @@ > This is similar to add_feature(), but the list of features is treated > as a group and can be configured as a set. > > -=item $glyph->Efinished > +=item $glyph-Efinished > > When you are finished with a glyph, you can call its finished() method > in order to break cycles that would otherwise cause memory leaks. 
> > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/Panel.pm,v > retrieving revision 1.66 > retrieving revision 1.67 > diff -u -r1.66 -r1.67 > --- /home/repository/bioperl/bioperl-live/Bio/Graphics/Panel.pm 2003/12/13 17:17:50 1.66 > +++ /home/repository/bioperl/bioperl-live/Bio/Graphics/Panel.pm 2003/12/15 11:50:38 1.67 > @@ -1264,7 +1264,7 @@ > $width = gdMediumBoldFont->width * length($longest_key) +3; > > In order to obtain scalable vector graphics (SVG) output, you should > -pass new() the -image_class=>'GD::SVG' parameter. This will cause > +pass new() the -image_class=E'GD::SVG' parameter. This will cause > Bio::Graphics::Panel to load the optional GD::SVG module. See the gd() > and svg() methods below for additional information. > > @@ -1495,7 +1495,7 @@ > height() methods first to ensure that the image has sufficient > dimensions. > > -If you passed new() the -image_class=>'GD::SVG' parameter, the gd() method > +If you passed new() the -image_class=E'GD::SVG' parameter, the gd() method > returns a GD::SVG::Image object. This object overrides GD::Image > methods in order to generate SVG output. It behaves exactly as > described for GD::Image objects with one exception: it implements and > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > From cain at cshl.org Mon Dec 15 11:50:28 2003 From: cain at cshl.org (Scott Cain) Date: Mon Dec 15 11:56:21 2003 Subject: [Bioperl-l] Re: Changes to GFF 2.5 "unflattening" code In-Reply-To: References: Message-ID: <1071507028.1469.34.camel@localhost.localdomain> On Fri, 2003-12-12 at 18:58, Chris Mungall wrote: > Nice one Scott! > > I imagine this script would be v useful to plenty of non GMOD/chado folks. > Is there anything chado or GMOD specific about this? can we add it to > bioperl instead of GMOD? 
(IMHO there are far too few scripts in bioperl, > which is fine for the hardcore object-heads who'll roll up their own in a > few minutes, but not so great for new users) I agree, and I intend on moving my script from GMOD to bioperl when it is ready for prime time--it's not there yet. > > What do you think of rolling some of the logic up from the script into > bioperl modules? For example, the typemapping stuff could go into > Bio::SeqFeature::Tools::TypeMapper, which already has a method for mapping > to the Sequence Ontology I think TypeMapper is fine (that is, now that I know about it), but it doesn't solve the fundamental problem of letting users know how things should be mapped so as to be consistent with both what was intended by the original authors of the db entry and with how other people will interpret it when converting formats. I am thinking there may need to be an online resource much like SO that gives "standard" mappings, allowing individual users to override them. > > Mapping of the SeqFeature nesting hierarchy to GFF ID/Parent tags could > also take place in FeatureHolderI, as discussed on this list the other > week. Yep. > > By the way, what are you doing for parent features that don't have a > natural ID? Are you creating artificial surrogate IDs? Arificial IDs where necesssary, though this is an evolving part of the script (I think it will always use artificial IDs--I don't see a way around that--but the way I am creating them is changing. > > That way we could easily roll out genbank2chadoxml, genbank2ensembl, > genbank2game, genbank2das, genbank2biosql and fastafile generators like > genbank2intron_fasta, genbank2spliced_utr_fasta, genbank2exon_fasta, > genbank2intergenic_fasta, genbank2my_favourite_SO_type_fasta and so on - I > think this is the sort of thing people are really often after when they > start downloading and wrestling with the bioperl object model. 
> > By the way, we often use genbank, when what we really mean is > genbank/eml(/ddbj?). is there a handy short catchy name for this > collective, or shall we carry on just using the term genbank to denote the > collection of genbank-like formats? I'm fine with continuing to refer to Genbank/EMBL/DDBJ as Genbank. It's just shorter than 'Genbank/EMBL/DDBJ'. > > This is all incredibly useful stuff in my opinion - for ages we've been > able to say "we have a parser for format X" in bioperl, but really it's > still been a semantic quagmire, the parsing is just the first step. > > Cheers > Chris > > > On Fri, 12 Dec 2003, Scott Cain wrote: > > > Lincoln and Sheldon, > > > > For your information, I wrote a new genbank2gff3.pl script for use with > > the pending GMOD release. I anticipate that it will form the foundation > > for rewriting the biofetch adaptor. It uses Unflattener.pm and seems to > > work for the organisms I tested (human, worm, fly, mosquito, and > > Ecoli). It is in the GMOD cvs in the schema repository at > > schema/chado/load/bin/genbank2gff.PLS. > > > > Scott > > > > On Fri, 2003-12-12 at 10:56, bioperl-l-request@portal.open-bio.org > > wrote: > > > Hi Mark, Sheldon, > > > > > > I saw your change to the _parse_gff2_group code in Bio::DB::GFF, which > > > prioritizes "gene", "locus_tag" and "transcript" as group fields in > > > the column 9 attributes. I like it, but unfortunately it breaks some > > > other code that I have, including the GMOD tutorial. > > > > > > I think you'll like what I've done instead. I've added a > > > preferred_groups() method to which you pass a list of group names. > > > Then, this list will be used as the priority list to pluck out groups > > > from the GFF2 attribute list. 
To get your previous behavior, you need > > > to do this: > > > > > > $db = Bio::DB::GFF->new(-preferred_groups=>['gene','locus_tag','transcript'], > > > @other_args); > > > $db->load_gff(...); > > > > > > or this > > > > > > $db = Bio::DB::GFF->new(@other_args); > > > $db->preferred_groups('gene','locus_tag','transcript'); > > > $db->load_gff(...); > > > > > > You'll have to change your existing scripts accordingly. Sure, this > > > should be merged with Chris's unflattener, but then again let's just > > > get to GFF3 as quickly as we possibly can and leave this nightmare > > > behind us! > > > > > > Lincoln > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From neil.saunders at unsw.edu.au Mon Dec 15 12:29:17 2003 From: neil.saunders at unsw.edu.au (Neil Saunders) Date: Mon Dec 15 12:35:13 2003 Subject: [Bioperl-l] bp_genbank2gff problems Message-ID: <20031215172917.GA17947@psychro> I'm having a frustating time with the bp_genbank2gff.pl script. I have 2 systems: (1) Debian sid, perl 5.8.2, latest CVS bioperl-live, bioperl-run and Gbrowse. (2) Debian woody, perl 5.6.1, CVS versions as above. I have written a script that takes a set of GenBank files and pipes them through various processes to generate GFF, Fasta and conf files for use with Gbrowse. On system (1) above, I use: bp_genbank2gff.pl -file -stdout No problems, GFF3 file output appears. 
On system (2) above, the same command gives errors of the type: ------------- EXCEPTION ------------- MSG: Can't connect to database: Access denied for user: '@localhost' to database 'test' STACK Bio::DB::GFF::Adaptor::dbi::caching_handle::new /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm:89 STACK Bio::DB::GFF::Adaptor::dbi::new /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi.pm:93 STACK Bio::DB::GFF::Adaptor::dbi::mysql::new /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi/mysql.pm:270 STACK Bio::DB::GFF::Adaptor::biofetch::new /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/biofetch.pm:95 STACK Bio::DB::GFF::new /usr/local/share/perl/5.6.1/Bio/DB/GFF.pm:599 STACK toplevel /home/neil/gbrowse/genomes/scripts/bp_genbank2gff.pl:218 -------------------------------------- DBI->connect(test) failed: Access denied for user: '@localhost' to database 'test' at /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm line 139 Clearly the bp_genbank2gff script is trying to access a database 'test' on 'localhost' with no user. I guess my question is: why? I have told it to send to stdout. As I have the same bioperl version on both machines, I'm pretty confused. The only thing I noticed from 'make test' on the woody machine was this failure: t/GFF.t 255 65280 32 24 75.00% 21-32 Relevant? thanks for any pointers, Neil -- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney 2052, Australia http://psychro.bioinformatics.unsw.edu.au/neil/index.php From shawnh at stanford.edu Mon Dec 15 12:50:34 2003 From: shawnh at stanford.edu (Shawn Hoon) Date: Mon Dec 15 12:53:21 2003 Subject: [Bioperl-l] Re: [Bioperl-pipeline] biopipe/Pipeline Manager In-Reply-To: References: Message-ID: <2F081760-2F27-11D8-9AFA-000A95783436@stanford.edu> Its relatively easy to have multiple inputs. Have all your input files in one directory. 
In the xml, change the input_file parameter to input_dir, like so: input_dir setup_file 1 tag infile input_dir $inputdir SCALAR

What this will do is create one job for each input file. Each file should contain the multi-fasta sequences. Let me know if this works for you.

hth,

shawn

On Monday, December 15, 2003, at 12:25AM, matthieu CONTE wrote:

>
> Hello,
> I am working on Biopipe using Pipeline Manager. I work with modified
> version of "phylip_tree_pipeline.xml"
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-pipeline/
> xml/examples/xml/phylip_tree_pipeline.xml?cvsroot=bioperl
>
> My problem is that I do not have only one input file but 1128
> differents files in input !
>
> My input files are multifasta with a set of protein from Arabidopsis
> thaliana and Oryza sativa
> containing a specific PFAM motif.
> The path for example for the motif PF00012 is:
> /PF00012/fasta_PF00012/PF00012.fa and so on for the 1128 motifs.
>
> What I want to do is to loop on the 1128 PFAM files.
>
> Enter the PFAM number in a let's say $pf
>
> Start the job with the Path /$pf/fasta_$pf/$pf.fa file
>
> But I don't find any way to work with a non global variable
> Does somebody have an idea?
> Is it possible to modify Pipeline Manager so that it can hang argument
> an input file?
>
> _________________________________________________________________
> MSN Messenger: chat live with your friends!
> http://www.msn.fr/msger/default.asp > > _______________________________________________ > bioperl-pipeline mailing list > bioperl-pipeline@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-pipeline From jason at cgt.duhs.duke.edu Mon Dec 15 12:49:17 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Dec 15 12:55:17 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: Message-ID: On Sun, 14 Dec 2003, Rob Edwards wrote: > The Scansite test 7 is taking for ever to run tonight (is the server > down, and can/should it be timed it out somehow?) > > Other than that, the following failed on OS X (I know several of these > have been discussed in this thread): > > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > t/Coalescent.t 255 65280 11 10 90.91% 2-11 fixed > t/GuessSeqFormat.t 46 2 4.35% 10-11 Sheldon fixed this > t/SeqIO.t 248 4 1.61% 61 159 196 198 fixed - linewraps for features per Ewan's fixes I think? > t/game.t 255 65280 23 22 95.65% 2-23 removed it - Sheldon's new GAME code has tests in SeqIO.t and Brad's old code is now gone. > t/hmmer.t 134 2 1.49% 42 85 fixed > 24 subtests skipped. > Failed 5/179 test scripts, 97.21% okay. 40/8279 subtests failed, 99.52% > okay. 
>
>
> Rob
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Mon Dec 15 13:02:32 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Dec 15 13:08:29 2003
Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF
Message-ID:

We currently have this code in Bio::Root::IO to handle stripping linefeeds

$line =~ s/\r\n/\n/g if( (!$param{-raw}) && (defined $line) );

This only matches Mac LF, to handle windows we need to also strip \n\r
so I am going to change it to the following:

    $line =~ s/\r\n/\n/g, $line =~ s/\n\r/\n/g
      if( (!$param{-raw}) && (defined $line) );

Since this is a core critical module wanted to just post it to see if
anyone has objections/suggestions.

-jason
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From cain at cshl.org Mon Dec 15 13:50:28 2003
From: cain at cshl.org (Scott Cain)
Date: Mon Dec 15 13:56:20 2003
Subject: [Bioperl-l] Unflattener and GFF3 questions
Message-ID: <1071514227.1471.45.camel@localhost.localdomain>

Chris,

More Unflattener questions. When I process the Genbank record for AE003644, I produce the following GFF3:

AE003644 EMBL/GenBank/SwissProt gene 20111 23268 . + . ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001
AE003644 EMBL/GenBank/SwissProt mRNA 20111 23268 . + . ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA
AE003644 EMBL/GenBank/SwissProt CDS 20495 22410 . + . Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGG...
AE003644 EMBL/GenBank/SwissProt exon 20111 20584 . + . Parent=noc_mRNA_1
AE003644 EMBL/GenBank/SwissProt exon 20887 23268 . + . Parent=noc_mRNA_1

The first question directly relates to Unflattener: the bounds on the CDS feature don't seem right; that is, they include intronic regions in the CDS, whereas in the Genbank file, the CDS is indicated properly with a 'join':

     CDS             join(20495..20584,20887..22410)

I am guessing this is a problem with the way the CDS feature is created, correct?

The second question has less to do with Unflattener and more to do with GFF3. Do you have any suggestions for encoding relationship types in GFF3 that is generated like this? It really matters that exons are 'part_of' and CDSs are 'product_of' mRNAs. I am trying to decide if this should be done when the GFF3 is produced, or when the GFF3 is loaded to the database. Any suggestions?

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From allenday at ucla.edu Mon Dec 15 13:52:28 2003
From: allenday at ucla.edu (Allen Day)
Date: Mon Dec 15 13:58:22 2003
Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF
In-Reply-To:
Message-ID:

sounds good

On Mon, 15 Dec 2003, Jason Stajich wrote:

> We currently have this code in Bio::Root::IO to handle stripping linefeeds
> $line =~ s/\r\n/\n/g if( (!$param{-raw}) && (defined $line) );
>
> This only matches Mac LF, to handle windows we need to also strip \n\r
> so I am going to change it to the following:
>
>     $line =~ s/\r\n/\n/g, $line =~ s/\n\r/\n/g
>       if( (!$param{-raw}) && (defined $line) );
>
> Since this is a core critical module wanted to just post it to see if
> anyone has objections/suggestions.
>
> -jason
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

From amackey at pcbi.upenn.edu Mon Dec 15 14:39:43 2003
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Dec 15 14:45:38 2003
Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF
In-Reply-To:
References:
Message-ID: <6E633B11-2F36-11D8-B94E-000A958C5008@pcbi.upenn.edu>

If $/ = "\n", then your second regexp won't happen (the \r is at the
beginning of the next line), right?

So how about instead, simply:

$line =~ s/\r//g; # strip any carriage returns, regardless of position

-Aaron

On Dec 15, 2003, at 1:02 PM, Jason Stajich wrote:

> We currently have this code in Bio::Root::IO to handle stripping
> linefeeds
> $line =~ s/\r\n/\n/g if( (!$param{-raw}) && (defined $line) );
>
> This only matches Mac LF, to handle windows we need to also strip \n\r
> so I am going to change it to the following:
>
>     $line =~ s/\r\n/\n/g, $line =~ s/\n\r/\n/g
>       if( (!$param{-raw}) && (defined $line) );
>
> Since this is a core critical module wanted to just post it to see if
> anyone has objections/suggestions.
>
> -jason
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

From allenday at ucla.edu Mon Dec 15 15:00:13 2003
From: allenday at ucla.edu (Allen Day)
Date: Mon Dec 15 15:06:05 2003
Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF
In-Reply-To: <6E633B11-2F36-11D8-B94E-000A958C5008@pcbi.upenn.edu>
Message-ID:

Are you sure you want to do that? Maybe

  $line =~ s/^\r// if $^O eq 'MacOS'; #or whatever $^O is for Mac.

is better.

-Allen

On Mon, 15 Dec 2003, Aaron J.
Mackey wrote:

> If $/ = "\n", then your second regexp won't happen (the \r is at the
> beginning of the next line), right?
>
> So how about instead, simply:
>
> $line =~ s/\r//g; # strip any carriage returns, regardless of position
>
> -Aaron
>
> On Dec 15, 2003, at 1:02 PM, Jason Stajich wrote:
>
> > We currently have this code in Bio::Root::IO to handle stripping
> > linefeeds
> > $line =~ s/\r\n/\n/g if( (!$param{-raw}) && (defined $line) );
> >
> > This only matches Mac LF, to handle windows we need to also strip \n\r
> > so I am going to change it to the following:
> >
> >     $line =~ s/\r\n/\n/g, $line =~ s/\n\r/\n/g
> >       if( (!$param{-raw}) && (defined $line) );
> >
> > Since this is a core critical module wanted to just post it to see if
> > anyone has objections/suggestions.
> >
> > -jason
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu

From heikki at nildram.co.uk Mon Dec 15 17:49:23 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Mon Dec 15 17:55:17 2003
Subject: [Bioperl-l] game tests
Message-ID: <200312152249.23737.heikki@nildram.co.uk>

Sheldon,

Thanks for the new game parsing code.

The bioperl tests in t/game.t fail. Do the tests in t/SeqIO.t completely
supersede the older tests? In that case, can t/game.t be deleted?

-Heikki

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs.
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From kvddrift at earthlink.net Mon Dec 15 19:58:47 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon Dec 15 20:05:43 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> Message-ID: <00FA2E4B-2F63-11D8-9B0B-003065A5FDCC@earthlink.net> Hi, I checked out bioperl-live this afternoon (EST), so I would get the latest and greatest. However, I still get some tests that fail. Note that I did the tests while offline (I only have a modem at home). Here's some output: Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- t/Coalescent.t 11 1 9.09% 9 t/ESEfinder.t 12 4 33.33% 4-5 8 11 t/RefSeq.t 13 5 38.46% 4-7 9 t/Scansite.t 255 65280 12 0 0.00% ?? t/game.t 255 65280 23 44 191.30% 2-23 17 subtests skipped. Failed 5/179 test scripts, 97.21% okay. 32/8135 subtests failed, 99.61% okay. make: *** [test_dynamic] Error 2 [ModusOperandi:~/Desktop/bioperl-live] koen% When I am online, ESEFinder.t and RefSeq.t pass, the others still fail. >>> % perl -I. -w t/ESEfinder.t Using the command above I got more info on each failing tests: [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/Coalescent.t 1..11 ok 1 ok 2 ok 3 ok 4 ok 5 ok 6 ok 7 ok 8 not ok 9 # Test 9 got: '' (t/Coalescent.t at line 67) # Expected: '1' (fu and li D*) ok 10 ok 11 [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/game.t 1..23 Parameterless "use IO" deprecated at /sw/lib/perl5/XML/Writer.pm line 16 ok 1 Can't locate object method "next_primary_seq" via package "Bio::SeqIO::game" at t/game.t line 65. [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. 
t/ESEfinder.t 1..12 ok 1 ok 2 -------------------- WARNING --------------------- MSG: Bio::Tools::Analysis::DNA::ESEfinder Request Error: 500 (Internal Server Error) Can't connect to exon.cshl.org:80 (Bad hostname 'exon.cshl.org') Client-Date: Tue, 16 Dec 2003 00:48:44 GMT --------------------------------------------------- ok 3 not ok 4 # Failed test 4 in t/ESEfinder.t at line 72 not ok 5 # Failed test 5 in t/ESEfinder.t at line 73 ok 6 ok 7 not ok 8 # Test 8 got: (t/ESEfinder.t at line 76) # Expected: '41' ok 9 # No network access - could not connect to ESEfinder server ok 10 Use of uninitialized value in join or string at Bio/Seq/Meta/Array.pm line 434. Use of uninitialized value in join or string at Bio/Seq/Meta/Array.pm line 434. not ok 11 # Test 11 got: ' ' (t/ESEfinder.t at line 84) # Expected: '-3.221149 -1.602223' ok 12 [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/RefSeq.t 1..13 ok 1 ok 2 ok 3 sleeping for 3 seconds not ok 4 # Failed test 4 in t/RefSeq.t at line 78 not ok 5 # Failed test 5 in t/RefSeq.t at line 79 not ok 6 # Failed test 6 in t/RefSeq.t at line 80 not ok 7 # Failed test 7 in t/RefSeq.t at line 81 ok 8 -------------------- WARNING --------------------- MSG: acc (NM_006732) does not exist --------------------------------------------------- not ok 9 # Failed test 9 in t/RefSeq.t at line 97 ok 10 # Unable to RefSeq test - probably no network connection. ok 11 # Unable to RefSeq test - probably no network connection. ok 12 # Unable to RefSeq test - probably no network connection. ok 13 # Unable to RefSeq test - probably no network connection. [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/Scansite.t 1..12 Transliteration pattern not terminated at Bio/Tools/Analysis/Protein/Scansite.pm line 372. Compilation failed in require at t/Scansite.t line 44. 
ok 1 # unable to run all of the tests depending on web access ok 2 # unable to run all of the tests depending on web access ok 3 # unable to run all of the tests depending on web access ok 4 # unable to run all of the tests depending on web access ok 5 # unable to run all of the tests depending on web access ok 6 # unable to run all of the tests depending on web access ok 7 # unable to run all of the tests depending on web access ok 8 # unable to run all of the tests depending on web access ok 9 # unable to run all of the tests depending on web access ok 10 # unable to run all of the tests depending on web access ok 11 # unable to run all of the tests depending on web access ok 12 # unable to run all of the tests depending on web access [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/game.t 1..23 Parameterless "use IO" deprecated at /sw/lib/perl5/XML/Writer.pm line 16 ok 1 Can't locate object method "next_primary_seq" via package "Bio::SeqIO::game" at t/game.t line 65. thanks, - Koen. From jason at cgt.duhs.duke.edu Mon Dec 15 20:15:35 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Dec 15 20:21:39 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: <00FA2E4B-2F63-11D8-9B0B-003065A5FDCC@earthlink.net> References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> <00FA2E4B-2F63-11D8-9B0B-003065A5FDCC@earthlink.net> Message-ID: On Mon, 15 Dec 2003, Koen van der Drift wrote: > Hi, > > I checked out bioperl-live this afternoon (EST), so I would get the > latest and greatest. However, I still get some tests that fail. Note > that I did the tests while offline (I only have a modem at home). 
> Here's some output:
>
>
> Failed Test      Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------------
> -------
> t/Coalescent.t                 11    1   9.09%  9

fixed - changed the test to be more lenient

> t/ESEfinder.t                  12    4  33.33%  4-5 8 11

i dunno - works fine for me right now - this depends on your network
connection working

> t/RefSeq.t                     13    5  38.46%  4-7 9

this also depends on the network connection and the remote servers being up,
worked for me

> t/Scansite.t      255 65280    12    0   0.00%  ??

this one seems to hang for me - but gets to test 6 for me. also needs a
network connection

> t/game.t          255 65280    23   44 191.30%  2-23

removed this now that sheldon has updated to apollo game
and fixed the module - the old tests will not work.

For the devs:
We may need to think about how to set a flag for tests which should not
run when no network connection is found (or perhaps not be run as part of
the default tests no matter what unless someone asks for them
specifically - this might save a lot of future pain).

> 17 subtests skipped.
> Failed 5/179 test scripts, 97.21% okay. 32/8135 subtests failed, 99.61%
> okay.
> make: *** [test_dynamic] Error 2
> [ModusOperandi:~/Desktop/bioperl-live] koen%
>
>
> When I am online, ESEFinder.t and RefSeq.t pass, the others still fail.
>
>
>
> >>> % perl -I. -w t/ESEfinder.t
>
> Using the command above I got more info on each failing tests:
>
>
> [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/Coalescent.t
> 1..11
> ok 1
> ok 2
> ok 3
> ok 4
> ok 5
> ok 6
> ok 7
> ok 8
> not ok 9
> # Test 9 got: '' (t/Coalescent.t at line 67)
> # Expected: '1' (fu and li D*)
> ok 10
> ok 11
>
> [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/game.t
> 1..23
> Parameterless "use IO" deprecated at /sw/lib/perl5/XML/Writer.pm line 16
> ok 1
> Can't locate object method "next_primary_seq" via package
> "Bio::SeqIO::game" at t/game.t line 65.
>
> [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I.
t/ESEfinder.t > 1..12 > ok 1 > ok 2 > > -------------------- WARNING --------------------- > MSG: Bio::Tools::Analysis::DNA::ESEfinder Request Error: > 500 (Internal Server Error) Can't connect to exon.cshl.org:80 (Bad > hostname 'exon.cshl.org') > Client-Date: Tue, 16 Dec 2003 00:48:44 GMT > > > --------------------------------------------------- > ok 3 > not ok 4 > # Failed test 4 in t/ESEfinder.t at line 72 > not ok 5 > # Failed test 5 in t/ESEfinder.t at line 73 > ok 6 > ok 7 > not ok 8 > # Test 8 got: (t/ESEfinder.t at line 76) > # Expected: '41' > ok 9 # No network access - could not connect to ESEfinder server > ok 10 > Use of uninitialized value in join or string at Bio/Seq/Meta/Array.pm > line 434. > Use of uninitialized value in join or string at Bio/Seq/Meta/Array.pm > line 434. > not ok 11 > # Test 11 got: ' ' (t/ESEfinder.t at line 84) > # Expected: '-3.221149 -1.602223' > ok 12 > > [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/RefSeq.t > 1..13 > ok 1 > ok 2 > ok 3 > sleeping for 3 seconds > not ok 4 > # Failed test 4 in t/RefSeq.t at line 78 > not ok 5 > # Failed test 5 in t/RefSeq.t at line 79 > not ok 6 > # Failed test 6 in t/RefSeq.t at line 80 > not ok 7 > # Failed test 7 in t/RefSeq.t at line 81 > ok 8 > > -------------------- WARNING --------------------- > MSG: acc (NM_006732) does not exist > --------------------------------------------------- > not ok 9 > # Failed test 9 in t/RefSeq.t at line 97 > ok 10 # Unable to RefSeq test - probably no network connection. > ok 11 # Unable to RefSeq test - probably no network connection. > ok 12 # Unable to RefSeq test - probably no network connection. > ok 13 # Unable to RefSeq test - probably no network connection. > > [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/Scansite.t > 1..12 > Transliteration pattern not terminated at > Bio/Tools/Analysis/Protein/Scansite.pm line 372. > Compilation failed in require at t/Scansite.t line 44. 
> ok 1 # unable to run all of the tests depending on web access
> ok 2 # unable to run all of the tests depending on web access
> ok 3 # unable to run all of the tests depending on web access
> ok 4 # unable to run all of the tests depending on web access
> ok 5 # unable to run all of the tests depending on web access
> ok 6 # unable to run all of the tests depending on web access
> ok 7 # unable to run all of the tests depending on web access
> ok 8 # unable to run all of the tests depending on web access
> ok 9 # unable to run all of the tests depending on web access
> ok 10 # unable to run all of the tests depending on web access
> ok 11 # unable to run all of the tests depending on web access
> ok 12 # unable to run all of the tests depending on web access
>
> [ModusOperandi:~/Desktop/bioperl-live] koen% perl -w -I. t/game.t
> 1..23
> Parameterless "use IO" deprecated at /sw/lib/perl5/XML/Writer.pm line 16
> ok 1
> Can't locate object method "next_primary_seq" via package
> "Bio::SeqIO::game" at t/game.t line 65.
>
>
> thanks,
>
> - Koen.
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Mon Dec 15 20:25:37 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Mon Dec 15 20:31:54 2003
Subject: [Bioperl-l] repost the problem --- Re: bl2seq hang and its performace
In-Reply-To: <001b01c3c2ac$f8d64020$706712ac@GENETHON>
References: <001b01c3c2ac$f8d64020$706712ac@GENETHON>
Message-ID:

bioperl sequence objects aren't particularly robust for huge sequences and
can have you run out of memory - presumably if you run your query on the
cmd line with the files and take bioperl out of the loop it runs fine?

You may need to rethink your strategy for searching and pre-create your
sequence files to be more IO and memory efficient.

Personally I find StandAloneBlast not the best module for lots of
searches and prepare my pipeline to be leaner when I need to.

You can do this yourself in a simple script.

#pseudo code
-- create your sequence files - see Bio::Seq::LargeSeq or Bio::DB::Fasta
   for more memory efficient ways to manipulate large sequence files.
-- generate unique names for your subsequences, use SeqIO to create the
   files presumably if that will work.
-- do the bl2seq
  my $bl2seqfh;
  open($bl2seqfh, "bl2seq -i $file1 -j $file2 -p blastn ... |")
        || die($!);
  # Bioperl 1.3.x only code
  my $searchio = Bio::SearchIO->new(-format => 'blast',
                                    -fh     => $bl2seqfh);

  my $r = $searchio->next_result;
  # or use Bio::Tools::BPbl2seq if you have an earlier
  # version of the toolkit.


This is essentially what StandAloneBlast should be doing for you, but with
the overhead and assumptions that you are passing Bio::SeqI objects and
creating the temporary files for you, and cleaning them up as well. One
drawback/bug is I think it will still open and try to create Bio::SeqI
objects even when you are passing filenames - which may be the source of your
problem, not sure - this may also have been fixed, I've not dug into the
code lately.

-jason

On Mon, 15 Dec 2003, Liu Haifeng wrote:

> Anyone can help? Really urgent!
>
> Haifeng Liu
> ----- Original Message -----
> From: "Liu Haifeng"
> To:
> Sent: December 12, 2003 14:49
> Subject: bl2seq hang and its performace
>
>
> > Hi all,
> >
> > I noticed that one of my program written using bioperl-1.2.3 runs very
> slow
> > and consumes huge memory, and I doubted that it is due to the call of
> bl2seq
> > in the program.
Thus, I wrote a small program (bl2seq sequences against > > themselves from a fasta file) below to see if it is the ture: > > > > > > #!/usr/bin/perl -w > > use Bio::SeqIO; > > use Bio::Tools::Blast; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::Tools::BPlite; > > > > my $infile =shift; > > my $sno=0; > > my $blastalgo="blastp"; #blastp ,blastx, tblastn, tblastx > > my $pin = Bio::SeqIO->new('-file' => "$infile", '-format' => > > 'Fasta'); > > while ( my $proseq = $pin -> next_seq()) { > > $sno++; > > print "bl2seq $sno ..............................\n"; > > my @params=('program' => $blastalo); > > my $factory= Bio::Tools::Run::StandAloneBlast->new(@params); > > $factory->io->_io_cleanup(); > > my $report=$factory->bl2seq($proseq, $proseq); > > while (my $hsp=$report->next_feature) { > > #only need the first hsp > > $report->close(); > > } > > undef $report; > > } > > print "running is over\n"; > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The program runs ok for the small fastat file. However, when I input a > > fasat file around 2.6M containing 10,000 protein sequences, the program > > hangs when it compare the 1782th sequence. Also I noticed that the > program > > has consume 12M of memory at that time. I searched the archive that > there > > have been similar bl2seq problem occurred. However, it should have been > > solved in the latest version. > > > > Anyone can show me some clues to improve the performance of calling > bl2seq? > > Thank you. 
> > > > Regards > > Haifeng Liu > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From kvddrift at earthlink.net Mon Dec 15 20:47:49 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon Dec 15 20:54:41 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: <991CE8E0-2DB1-11D8-AE11-003065A5FDCC@earthlink.net> <00FA2E4B-2F63-11D8-9B0B-003065A5FDCC@earthlink.net> Message-ID: On Dec 15, 2003, at 8:15 PM, Jason Stajich wrote: > >> t/Coalescent.t 11 1 9.09% 9 > fixed - changed the test to be more lenient >> t/ESEfinder.t 12 4 33.33% 4-5 8 11 > i dunno - works fine for me right now - this depending on your network > connection working >> t/RefSeq.t 13 5 38.46% 4-7 9 > this also depends on network connection and the remote servers being > up, > worked for me >> t/Scansite.t 255 65280 12 0 0.00% ?? > > this one seems to hang for me - but gets to test 6 for me. also need > network connection > >> t/game.t 255 65280 23 44 191.30% 2-23 > removed this now that sheldon has updated to apollo game > and fixed the module - the old tests will not work. > Great, they all work now (with a network connection)! thanks, - Koen. From lhaifeng at dso.org.sg Mon Dec 15 21:55:01 2003 From: lhaifeng at dso.org.sg (Liu Haifeng) Date: Mon Dec 15 22:04:51 2003 Subject: [Bioperl-l] repost the problem --- Re: bl2seq hang and itsperformace References: <001b01c3c2ac$f8d64020$706712ac@GENETHON> Message-ID: <002401c3c380$0094fc60$706712ac@GENETHON> Thank you a lot, Jason! I have revised the code as you advised. It is really a great saving! Now the program runs at 9M memory with the same 2.6M seq file as input. Actually, I noticed that the memory consumed won't vary with the number of sequences in the input file. 
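For readers following along, the leaner approach suggested earlier in this thread (run bl2seq on files that already exist and parse its output from a pipe) might look roughly like the sketch below. It is not the code actually used here. The helper names `bl2seq_command` and `first_hsp` are made up for illustration, the script assumes the legacy NCBI `bl2seq` binary is on the PATH, and it assumes a BioPerl release (1.3 or later, per the earlier message) whose Bio::SearchIO blast parser reads bl2seq reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for the legacy bl2seq program.
# -i, -j and -p are the flags quoted earlier in the thread; the file names
# are whatever the caller has already written to disk.
sub bl2seq_command {
    my ($query_file, $subject_file, $program) = @_;
    return ('bl2seq', '-i', $query_file, '-j', $subject_file, '-p', $program);
}

# Hypothetical helper: run bl2seq on two FASTA files and return the first
# HSP object, or undef when the report contains no hit.
sub first_hsp {
    my ($query_file, $subject_file, $program) = @_;
    require Bio::SearchIO;    # loaded lazily; assumes BioPerl 1.3 or later

    # List-form pipe open: the file names never pass through a shell.
    open(my $fh, '-|', bl2seq_command($query_file, $subject_file, $program))
        or die "cannot run bl2seq: $!";

    my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
    my $hsp;
    if (my $result = $searchio->next_result) {
        if (my $hit = $result->next_hit) {
            $hsp = $hit->next_hsp;
        }
    }
    close $fh;
    return $hsp;
}

# Usage: perl first_hsp.pl a.fa b.fa [blastp]
if (@ARGV >= 2) {
    my ($file1, $file2, $program) = @ARGV;
    my $hsp = first_hsp($file1, $file2, $program || 'blastp');
    print $hsp ? 'first HSP score: ' . $hsp->score . "\n" : "no HSP found\n";
}
```

Because no Bio::Seq objects are created and nothing is kept between calls, memory use in a sketch like this should not depend on how many sequence pairs are compared, which would be consistent with the observation above.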
Regards
Haifeng

From shawnh at stanford.edu Mon Dec 15 22:35:00 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Mon Dec 15 22:37:41 2003
Subject: [Bioperl-l] using graphics xyplot
Message-ID:

hi,
I'm trying to use the SVG/PNG component for drawing xyplots along a
segment. It's not clear to me how to do it without going through a
Bio::DB::GFF::Aggregator when I just have a bunch of scores (i.e.
features) that I want to plot along a segment of DNA.
Something like this doesn't seem to be possible since there is a call
to $feat->parts in xyplot.pm :

my $panel = Bio::Graphics::Panel->new(-length      => 1000,
                                      -key_style   => 'between',
                                      -width       => 800,
                                      -pad_left    => 10,
                                      -pad_right   => 10,
                                      -image_class => 'GD::SVG'
                                     );

$panel->add_track(arrow =>
    Bio::SeqFeature::Generic->new(-start=>1,-end=>1000));
my $feat = Bio::SeqFeature::Generic->new();
my $i = 1;    # first window starts at base 1
while($i < 1000){
    my $f = Bio::SeqFeature::Generic->new(-start  => $i,
                                          -end    => $i+50,
                                          -strand => 1,
                                          -score  => int(rand(100)));
    $feat->add_sub_SeqFeature($f,'EXPAND');
    $i+=50;
}
$panel->add_track($feat,
                  -graph_type => 'line',
                  -bump       => 0,
                  -glyph      => 'xyplot');
my $gd = $panel->gd;

print $gd->svg;

am I trying to use it wrongly?

thanks,

shawn

From shawnh at stanford.edu Mon Dec 15 23:37:58 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Mon Dec 15 23:40:40 2003
Subject: [Bioperl-l] using graphics xyplot
In-Reply-To:
References:
Message-ID: <9FF1FEE6-2F81-11D8-969C-000A95783436@stanford.edu>

Sorry, ignore the last mail. I figured it out.
shawn

From wes.barris at csiro.au Mon Dec 15 23:38:56 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Mon Dec 15 23:44:57 2003
Subject: [Bioperl-l] Bug in SeqIO genbank output
Message-ID: <3FDE8C60.90809@csiro.au>

Hi,

I have just succeeded in tracking down a bug that prevents genbank files
written from bioperl from being properly imported into StackPack
(clustering software). The problem is due to a subtle difference
between a genbank entry downloaded from NCBI and a genbank entry produced
using genbank.pm.

If you use "od -c" to look at a genbank record from NCBI, you will notice
that the word "ORIGIN" is followed by six space characters.

ORIGIN      
        1 cggccgcgtc gacttttttt ttaggtattt ttctcttatt atttctaaaa tataaatttt
       61 ggacattcaa aagtgcaaca ngttaatgtg cctgtgggga atatcacagt taaaaaaata

If I process this file using bioperl and then write out a new genbank
format file, the word "ORIGIN" is followed immediately by a carriage
return (newline) character. It seems silly to me that spaces should
be required after the word "ORIGIN", but they do exist in files
downloaded from NCBI and StackPack seems to require these space characters
in order to import a genbank file.

Is there an official specification for the genbank format? I have sent
a bug report to the makers of StackPack too.

In the meantime, I have modified my installed copy of Bio/SeqIO/genbank.pm
changing this line:

$self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));

to this:

$self->_print(sprintf("%-12s%s\n",'ORIGIN ',$o ? $o->value : ''));

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From lhaifeng at dso.org.sg Mon Dec 15 23:45:25 2003
From: lhaifeng at dso.org.sg (Liu Haifeng)
Date: Mon Dec 15 23:54:58 2003
Subject: [Bioperl-l] blast warning -- no valid letters to be indexed
Message-ID: <003401c3c38f$6bf90f50$706712ac@GENETHON>

Can anyone tell me whether the following warning is critical, or what it
means, when I perform bl2seq?

---------------------------------------------------------------------------------------
[bl2seq] WARNING: [000.000] Blast: No valid letters to be indexed on
context 0
---------------------------------------------------------------------------------------

thanks in advance!

Haifeng

From dhoworth at mrc-lmb.cam.ac.uk Tue Dec 16 06:05:35 2003
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue Dec 16 06:11:28 2003
Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF
In-Reply-To:
References:
Message-ID: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk>

I'm a bit confused by this discussion. I think it's best to go back to
basics and then probably approach it slightly differently.
Q1: What byte sequence in the data do you want to change to what? Q2: What operating system is the code running on? The problem is that the meaning of \r and \n is different on different OS as well as different bytes being stored in files; from http://www.perldoc.com/perl5.6.1/pod/perlop.html: 'All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on a Mac, these are reversed, and on systems without line terminator, printing "\n" may emit no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII when you need an exact character. For example, most networking protocols expect and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned some day.' I.e Windows terminates lines with \r\n but a Mac perversely reads them as \n\r. I think for portable code it's better to write the regexps using the octal values: \015 instead of CR and \012 instead of LF. Plus as Aaron says, the pattern will be broken up. (This goes back to Q1 - is there ever any reason to preserve a CR? Or for that matter an LF?) Then test on all architectures bioperl is supported on :) Cheers, Dave Allen Day wrote: > Are you sure you want to do that? Maybe > > $line =~ s/^\r// if $^O eq 'MacOS'; #or whatever $^O is for Mac. > > is better. > > -Allen > > > On Mon, 15 Dec 2003, Aaron J. Mackey wrote: > > >>If $/ = "\n", then your second regexp won't happen (the \r is at the >>beginning of the next line), right? 
>> >>So how about instead, simply: >> >>$line =~ s/\r//g; # strip any linefeeds, regardless of position >> >>-Aaron >> >>On Dec 15, 2003, at 1:02 PM, Jason Stajich wrote: >> >> >>>We currently have this code in Bio::Root::IO to handle stripping >>>linefeeds >>>$line =~ s/\r\n/\n/g if( (!$param{-raw}) && (defined $line) ); >>> >>>This only matches Mac LF, to handle windows we need to also strip \n\r >>>so I am going to change it to the following: >>> >>> $line =~ s/\r\n/\n/g, $line =~ s/\n\r/\n/g >>> if( (!$param{-raw}) && (defined $line) ); >>> >>>Since this is a core critical module wanted to just post it to see if >>>anyone has objections/suggestions. >>> >>>-jason >>>-- >>>Jason Stajich >>>Duke University >>>jason at cgt.mc.duke.edu -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From amackey at pcbi.upenn.edu Tue Dec 16 06:38:31 2003 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Dec 16 06:44:25 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> Message-ID: <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> > I.e Windows terminates lines with \r\n but a Mac perversely reads them > as \n\r. Actually, it seems that there are some Mac-derived files with only \r, and no \n at all (as a recent example, EndNote 6 exported bibliographies have no \n's, only \r's by od -c's reckoning). > I think for portable code it's better to write the regexps using the > octal values: \015 instead of CR and \012 instead of LF. We don't have issues writing files, only reading one-line-at-a-time and canonicalizing it (why do we need to canonicalize it again, Jason?)
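A minimal sketch of the canonicalization being discussed in this thread, assuming the goal is to map CR LF, LF CR and a lone CR to a single LF. It spells the bytes as octal escapes (\015 for CR, \012 for LF), as suggested above, so the pattern means the same thing on every platform. The helper name normalize_eol is hypothetical and is not part of Bio::Root::IO.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Root::IO: rewrite every kind of
# line terminator to a single LF. Explicit octal escapes are used because
# the meaning of "\n" and "\r" can vary between platforms.
sub normalize_eol {
    my ($text) = @_;
    return $text unless defined $text;
    # Two-byte forms come first in the alternation so that a CR LF or
    # LF CR pair becomes one LF rather than two.
    $text =~ s/\015\012|\012\015|\015/\012/g;
    return $text;
}

# Print the byte values of a normalized Windows-style line.
print join(' ', map { sprintf '%03o', ord } split //, normalize_eol("ab\015\012")), "\n";
```

Whether Bio::Root::IO should do this for every caller, or only when -raw is not set, is the open question in the thread; the sketch only shows the pattern itself.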
From dhoworth at mrc-lmb.cam.ac.uk Tue Dec 16 07:04:52 2003 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Tue Dec 16 07:10:45 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> Message-ID: <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> Aaron J. Mackey wrote: >> I.e Windows terminates lines with \r\n but a Mac perversely reads them >> as \n\r. > Actually, it seems that there are some Mac-derived files with only \r, > and no \n at all (as a recent example, EndNote 6 exported bibliographies > have no \n's, only \r's by od -c's reckoning). Now you've confused me again. What do you mean by \r? Are you saying there are some Mac files with only \012 or with only \015? That is are you speaking as a Unix/Linux/Windows user or as a Mac user? This is why it's better not to use \r and \n at all in this context. >> I think for portable code it's better to write the regexps using the >> octal values: \015 instead of CR and \012 instead of LF. > We don't have issues writing files, only reading one-line-at-a-time and > canonicalizing it (why do we need to canonicalize it again, Jason?) I wasn't talking about writing files, I was talking about writing the regexps that are used for reading files. (But as the section I quoted from Perldoc points out, there *are* issues with writing files if you want to use them with some network protocols :) Cheers, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From ik1 at sanger.ac.uk Tue Dec 16 07:17:32 2003 From: ik1 at sanger.ac.uk (Ian Korf) Date: Tue Dec 16 07:23:39 2003 Subject: [Bioperl-l] blast warning -- no valid letters to be indexed In-Reply-To: <003401c3c38f$6bf90f50$706712ac@GENETHON> References: <003401c3c38f$6bf90f50$706712ac@GENETHON> Message-ID: My guess is your sequence is all N's. 
Or maybe all lowercase and you have soft masking enabled. Or there are no letters at all. -Ian On 16 Dec 2003, at 04:45, Liu Haifeng wrote: > Can anyone tell me if the following waring is critical or what is its > meaning when I perform bl2seq? > ----------------------------------------------------------------------- > ----- > ----------- > [bl2seq] WARNING: [000.000] Blast: No valid letters to be indexed on > context > 0 > ----------------------------------------------------------------------- > ----- > ----------- > thanks in advance! > > Haifeng > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From amackey at pcbi.upenn.edu Tue Dec 16 07:53:15 2003 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Dec 16 07:59:05 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> Message-ID: I meant that when I examine a text file created by a Mac application (in this case, Endnote) using the unix tool "od -c" I see only "\r". I agree it's all very confusing; I apologize if I've only added to the uproar. -Aaron On Dec 16, 2003, at 7:04 AM, Dave Howorth wrote: > Aaron J. Mackey wrote: >>> I.e Windows terminates lines with \r\n but a Mac perversely reads >>> them as \n\r. >> Actually, it seems that there are some Mac-derived files with only >> \r, and no \n at all (as a recent example, EndNote 6 exported >> bibliographies have no \n's, only \r's by od -c's reckoning). > > Now you've confused me again. What do you mean by \r? Are you saying > there are some Mac files with only \012 or with only \015? That is > are you speaking as a Unix/Linux/Windows user or as a Mac user? > > This is why it's better not to use \r and \n at all in this context. 
> >>> I think for portable code it's better to write the regexps using the >>> octal values: \015 instead of CR and \012 instead of LF. >> We don't have issues writing files, only reading one-line-at-a-time >> and canonicalizing it (why do we need to canonicalize it again, >> Jason?) > > I wasn't talking about writing files, I was talking about writing the > regexps that are used for reading files. (But as the section I quoted > from Perldoc points out, there *are* issues with writing files if you > want to use them with some network protocols :) > > Cheers, Dave > -- > Dave Howorth > MRC Centre for Protein Engineering > Hills Road, Cambridge, CB2 2QH > 01223 252960 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Deborah.Simon at ingenium-ag.com Tue Dec 16 05:28:41 2003 From: Deborah.Simon at ingenium-ag.com (Simon, Deborah) Date: Tue Dec 16 08:43:12 2003 Subject: [Bioperl-l] Authenticating apache against LDAP Message-ID: <6F829EB012AE3F479BE397616627A27E12EA34@newyork.ing-ag.it.local> First of all apologies if this is totally off the Bioperl beaten path, but I would like to squeeze the question in anyhow.... I have a perl module (which uses Net::LDAP) for authenticating apache against our LDAP (Lightweight Directory Access Protocol) server here in house (which is actually MS Active Directory). OK this is nothing new and works fine. What I would like to do however, is try the authentication against 2 or more servers... for 2 reasons... 1) If one LDAP server is 'down' I want to check the redundant one so users are not locked out in such a case, 2) We have 2 different LDAP domains (internal and external users), and I want both to be able to authenticate. 
Obviously I could just write a simple array with the list of the LDAP servers in and try to authenticate against each one in turn (which I currently do as a quick fix), but this requires I maintain an up-to-date list of servers. What I would really like to do is make a call to retrieve the LDAP services available in the network. Perhaps something like SLP (Service Location Protocol) .. but I am getting a bit lost now, so what I would like to ask is does anyone have experience with this kind of thing or indeed any ideas at all ????? Many thanks, and again sorry for the off-posting, -deb ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Deborah Simon, M.Sc. Senior Scientist, Bioinformatics Ingenium Pharmaceuticals AG Phone: +49 (0)89 8565 2335 Fraunhoferstr. 13 Fax: +49 (0)89 8565 2351 82152 Martinsried, Munich Email: deborah.simon@ingenium-ag.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From nathanhaigh at ukonline.co.uk Tue Dec 16 03:52:24 2003 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Tue Dec 16 08:43:44 2003 Subject: [Bioperl-l] Trees as graphics Message-ID: Hello, I was reading a response you made on the Bioperl discussion board about an adapter that draws trees as SVG from newick format trees. I was wondering if you have released the adapter or if you know of a way I might achieve the following: I am building a database driven website, where I would like to store trees in newick format and display them in a browser window as graphics via a link. Regards Nathan Haigh
From heikki at nildram.co.uk Tue Dec 16 09:30:34 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Dec 16 09:36:37 2003 Subject: [Bioperl-l] some tests fail on Mac OS X In-Reply-To: References: <00FA2E4B-2F63-11D8-9B0B-003065A5FDCC@earthlink.net> Message-ID: <200312161430.34208.heikki@nildram.co.uk> That is a good idea. What is the best way to do it? Is there a flag we can pass to Test harness or should we decide upon an environmental variable like "BIOPERLDEBUG"? -Heikki On Tuesday 16 Dec 2003 1:15 am, Jason Stajich wrote: > We may need to think about how to set a flag for tests which should not > run when no network connection is found (or perhaps not be run as part of > the default tests no matter what unless someone asks for them specifically > - this might save a lot of future pain). -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason at cgt.duhs.duke.edu Tue Dec 16 09:52:25 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 16 09:59:11 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> Message-ID: It stems from this report http://bugzilla.bioperl.org/show_bug.cgi?id=1570 I don't know if he is running clustalw on windows and then trying to run perl on the file in unix or what.
If that is the case I think it is in order to unix-ify the file when they are moved over and not up to bioperl. We already had code in Root::IO like this: $line =~ s/\r\n/\n/ if( (! $param{-raw}) && (defined $line) ); I have no recollection of when it was added or by whom, I could be the guilty party but I really don't remember. So I don't have the answers to your questions: Q1: What byte sequence in the data do you want to change to what? Q2: What operating system is the code running on? I think the intention here was that if perl -i -p -e 's/\n\r/\n/g' file.dnd cleaned up the problem, why shouldn't that be part of the IO input automatically. I'm going to pass on fixing this bug for now. Hopefully someone else will get inspired from this discussion and test and propose THE RIGHT solution. -jason On Tue, 16 Dec 2003, Aaron J. Mackey wrote: > > I meant that when I examine a text file created by a Mac application > (in this case, Endnote) using the unix tool "od -c" I see only "\r". > > I agree it's all very confusing; I apologize if I've only added to the > uproar. > > -Aaron > > On Dec 16, 2003, at 7:04 AM, Dave Howorth wrote: > > > Aaron J. Mackey wrote: > >>> I.e Windows terminates lines with \r\n but a Mac perversely reads > >>> them as \n\r. > >> Actually, it seems that there are some Mac-derived files with only > >> \r, and no \n at all (as a recent example, EndNote 6 exported > >> bibliographies have no \n's, only \r's by od -c's reckoning). > > > > Now you've confused me again. What do you mean by \r? Are you saying > > there are some Mac files with only \012 or with only \015? That is > > are you speaking as a Unix/Linux/Windows user or as a Mac user? > > > > This is why it's better not to use \r and \n at all in this context. > > > >>> I think for portable code it's better to write the regexps using the > >>> octal values: \015 instead of CR and \012 instead of LF.
> >> We don't have issues writing files, only reading one-line-at-a-time > >> and canonicalizing it (why do we need to canonicalize it again, > >> Jason?) > > > > I wasn't talking about writing files, I was talking about writing the > > regexps that are used for reading files. (But as the section I quoted > > from Perldoc points out, there *are* issues with writing files if you > > want to use them with some network protocols :) > > > > Cheers, Dave > > -- > > Dave Howorth > > MRC Centre for Protein Engineering > > Hills Road, Cambridge, CB2 2QH > > 01223 252960 > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From iain.wallace at ucd.ie Tue Dec 16 10:28:46 2003 From: iain.wallace at ucd.ie (Iain Wallace) Date: Tue Dec 16 10:34:40 2003 Subject: [Bioperl-l] Bio::TreeIO Message-ID: <1071588526.28148.5.camel@bioinf10> Hi all, I am trying to use the Bio::TreeIO module to split my sequences into two groups. My problem is that I want these two groups to contain as many sequences as possible. At the moment I am using NJplot to define an outgroup, so that the tree is as evenly split as i can make it. I am wondering if any one knows an easier way to do this.....? 
Any help would be great Thanks Iain From dhoworth at mrc-lmb.cam.ac.uk Tue Dec 16 11:03:36 2003 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Tue Dec 16 11:09:51 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> Message-ID: <3FDF2CD8.9040604@mrc-lmb.cam.ac.uk> Jason Stajich wrote: > It stems from this report > http://bugzilla.bioperl.org/show_bug.cgi?id=1570 > > I don't know if he is running clustalw on windows and then trying to run > perl on the file in unix or what. If that is the case I think it is in > order to unix-ify the file when they are moved over and not up to bioperl. > > We already had code in Root::IO like this: > $line =~ s/\r\n/\n/ if( (! $param{-raw}) && (defined $line) ); > > I have no recollection of when it was added or by whom, I could be the > guilty party but I really don't remember. > > So I don't have the answers to your questions: > Q1: What byte sequence in the data do you want to change to what? > Q2: What operating system is the code running on? > > I think the intention here was that if > perl -i -p -e 's/\n\r/\n/g' file.dnd > cleaned up the problem, why shouldn't that be part of the IO input > automatically. > > I'm going to pass on fixing this bug for now. Hopefully someone else will > get inspired from this discussion and test and propose THE RIGHT solution. > > -jason Ah, now that's interesting. In this specific case the application, newick.pm, has explicitly opted out of Perl's end-of-line handling by redefining $/ so it can slurp the whole tree at once: local $/ = ";\n"; return unless $_ = $self->_readline; Which, IMHO, makes it its problem to deal with line breaks.
So, unless the problem also occurs in regular code using Perl's default line break handling, I'd say the bug should be fixed by adding whatever code is required in the newick module, not by adding complexity in Root::IO for that special case. Cheers, Dave -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From jason at cgt.duhs.duke.edu Tue Dec 16 11:44:18 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 16 11:50:56 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: <3FDF2CD8.9040604@mrc-lmb.cam.ac.uk> References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> <3FDF2CD8.9040604@mrc-lmb.cam.ac.uk> Message-ID: On Tue, 16 Dec 2003, Dave Howorth wrote: > Ah, now that's interesting. In this specific case the application, > newick.pm, has explicitly opted out of Perl's end-of-line handling by > redefining $/ so it can slurp the whole tree at once: > > local $/ = ";\n"; > return unless $_ = $self->_readline; > > Which, IMHO, makes it its problem to deal with line breaks. Hmmm - SeqIO::fasta does this sort of thing as well. This has nothing to do with the individual fields though - it only defines how much to slurp in, if it weren't working we'd get two trees mooshed together as one record and doesn't affect the multi-lined reports since they only have a ; at the end. In the end this had nothing to do with Windows LF problems once I had Valentin's test file in front of me. Adding this to newick.pm after the record is slurped in takes care of the problem: s/[\n\r]+//g As any sort of newline needs to be stripped out as that is what is getting converted to spaces. It really wasn't a windows problem but a problem with Allen's changes to the newick parsing code to replace WS with _ but not handling LF separately. 
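The record-at-a-time reading and the newline stripping described above can be sketched with core Perl only. This is an illustration under assumptions, not the newick.pm code: the sample trees and the name read_tree_records are made up, and an in-memory filehandle stands in for a tree file.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::TreeIO::newick implementation: read
# one tree per record by setting the input record separator to ";\n",
# then strip line breaks that fall inside the record.
sub read_tree_records {
    my ($fh) = @_;
    my @records;
    local $/ = ";\n";    # a record ends at a semicolon followed by a newline
    while (defined(my $record = <$fh>)) {
        $record =~ s/[\n\r]+//g;    # remove embedded and trailing line breaks
        push @records, $record if length $record;
    }
    return @records;
}

# Made-up sample: the first tree is wrapped over two lines.
my $sample = "(A:0.1,\n (B:0.2,C:0.3));\n(D,E);\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @trees = read_tree_records($fh);
close $fh;
print "$_\n" for @trees;
```

Because $/ only decides where one record stops, a line break in the middle of a tree stays inside the record until it is stripped explicitly, which is the step the one-line fix adds.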
>From the log: revision 1.22 date: 2003/08/15 17:07:27; author: allenday; state: Exp; lines: +3 -2 removed unnecessary escap char in space removing regex. added regex to remove quotes and leading/trailing spaces from node labels as necessary. ---------------------------- revision 1.21 date: 2003/08/15 08:31:46; author: allenday; state: Exp; lines: +5 -2 fixing over-zealous whitespace removal from node labels. we do this by not tampering with " quoted strings. i'm not sure if newick allows " to be escaped within these labels... if so, there may be a bug here. ---------------------------- My original code stripped all whitespace and thus we never had this problem because there shouldn't be any in the node names in Newick http://evolution.genetics.washington.edu/phylip/newicktree.html "A name can be any string of printable characters except --->blanks<---, colons, semcolons, parentheses, and square brackets." but apparently he wants to support this for his purposes. I think my small change above takes care of the bug. -jason > > So, unless the problem also occurs in regular code using Perl's default > line break handling, I'd say the bug should be fixed by adding whatever > code is required in the newick module, not by adding complexity in > Root::IO for that special case. > > Cheers, Dave > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From mike.muratet at torchtechnologies.com Tue Dec 16 12:19:43 2003 From: mike.muratet at torchtechnologies.com (Mike Muratet) Date: Tue Dec 16 12:25:35 2003 Subject: [Bioperl-l] Graphical alignments with blast Message-ID: <021e01c3c3f8$cc8e9d30$5301a8c0@muratet> Greetings I am trying to roll my own simple graphical alignment from blast data (unless of course someone already knows of one). As near as I can figure from bptutorial and the man pages, the approach would be create a set of LocatableSeq objects to manually create a SimpleAlign object. 
At that point, one could write out a file which could be plotted with one of existing utilities, or use a Bio::Graphics object. Has anyone gone down this path? Thanks Mike From allenday at ucla.edu Tue Dec 16 12:28:26 2003 From: allenday at ucla.edu (Allen Day) Date: Tue Dec 16 12:34:18 2003 Subject: [Bioperl-l] Trees as graphics In-Reply-To: Message-ID: Nathan, As far as I know, there is nothing in an existing distribution; you'll have to use CVS. Have a look at Bio::TreeIO::svggraph. -Allen On Tue, 16 Dec 2003, Nathan Haigh wrote: > Hello, > > > > I was reading a response you made on the Bioperl discussion board about an > adapter that draws trees as SVG from newick format trees. I was wondering > if you have released the adapter or if you know or a way I might achieve > the following: > > > > I am building a database driven website, where I would like to store trees > in newick format and display them in a browser window as graphics via a > link. > > > > Regards > > Nathan Haigh > > > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 11/12/2003 > Tested on: 15/12/2003 17:08:32 > avast! is copyright (c) 2000-2003 ALWIL Software. > http://www.avast.com > > > From jason at cgt.duhs.duke.edu Tue Dec 16 12:35:09 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 16 12:41:41 2003 Subject: [Bioperl-l] Bio::TreeIO In-Reply-To: <1071588526.28148.5.camel@bioinf10> References: <1071588526.28148.5.camel@bioinf10> Message-ID: I'm not sure I understand - you want to reroot the tree so it is most balanced? -jason On Tue, 16 Dec 2003, Iain Wallace wrote: > Hi all, > > I am trying to use the Bio::TreeIO module to split my sequences into two > groups. > > My problem is that I want these two groups to contain as many sequences > as possible. At the moment I am using NJplot to define an outgroup, so > that the tree is as evenly split as i can make it. > > I am wondering if any one knows an easier way to do this.....? 
> > Any help would be great > > Thanks > > Iain > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From allenday at ucla.edu Tue Dec 16 12:36:23 2003 From: allenday at ucla.edu (Allen Day) Date: Tue Dec 16 12:42:13 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: Message-ID: > Adding this to newick.pm after the record is slurped in takes > care of the problem: > s/[\n\r]+//g > > As any sort of newline needs to be stripped out as that is what is > getting converted to spaces. It really wasn't a windows problem but > a problem with Allen's changes to the newick parsing code to replace WS > with _ but not handling LF separately. > > >From the log: > > revision 1.22 > date: 2003/08/15 17:07:27; author: allenday; state: Exp; lines: +3 -2 > removed unnecessary escap char in space removing regex. added regex to > remove quotes and leading/trailing spaces > from node labels as necessary. > ---------------------------- > revision 1.21 > date: 2003/08/15 08:31:46; author: allenday; state: Exp; lines: +5 -2 > fixing over-zealous whitespace removal from node labels. we do this by > not tampering with " quoted strings. i'm not sure if newick allows " to > be escaped within these labels... if so, there may be a bug here. > ---------------------------- > > My original code stripped all whitespace and thus we never had this > problem because there shouldn't be any in the node names in Newick > http://evolution.genetics.washington.edu/phylip/newicktree.html > "A name can be any string of printable characters except --->blanks<---, > colons, semcolons, parentheses, and square brackets." > > but apparently he wants to support this for his purposes. Yes, I have had to parse newick files that do contain spaces in node names. I'd like to preserve these in the input. 
I think it would be a good idea when writing a tree to throw an error and/or remove any illegal characters (blanks, colons, semicolons, etc), but at the time of modification I didn't have to deal with writing trees. -allen From jason at cgt.duhs.duke.edu Tue Dec 16 12:37:21 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 16 12:43:56 2003 Subject: [Bioperl-l] Graphical alignments with blast In-Reply-To: <021e01c3c3f8$cc8e9d30$5301a8c0@muratet> References: <021e01c3c3f8$cc8e9d30$5301a8c0@muratet> Message-ID: On Tue, 16 Dec 2003, Mike Muratet wrote: > Greetings > > I am trying to roll my own simple graphical alignment from blast data > (unless of course someone already knows of one). As near as I can figure > from bptutorial and the man pages, the approach would be create a set of > LocatableSeq objects to manually create a SimpleAlign object. At that point, You can already get this out with my $aln = $hsp->get_aln(); > one could write out a file which could be plotted with one of existing > utilities, or use a Bio::Graphics object. Has anyone gone down this path? As for drawing it - you want to see the alignment with the bases drawn out, right? Some of the glyphs in Bio::Graphics do allow for drawing pairwise alignments pretty well, but I've only used them in the context of Gbrowse - would be a useful example script to have working to show how to generate this sort of thing from biographics. Alternatively you can use something like alscript to generate the pretty alignment: http://www.compbio.dundee.ac.uk/Software/Alscript/alscript.html Boxshade may also provide what you want as well. 
The pasteur site has a review of a bunch of software that falls in this category here: http://bioweb.pasteur.fr/cgi-bin/seqanal/review-edital.pl > > Thanks > > Mike > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From dhoworth at mrc-lmb.cam.ac.uk Tue Dec 16 12:48:33 2003 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Tue Dec 16 12:54:32 2003 Subject: [Bioperl-l] Root::IO handle Mac and Win32 LF In-Reply-To: References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> <3FDF2CD8.9040604@mrc-lmb.cam.ac.uk> Message-ID: <3FDF4571.9030602@mrc-lmb.cam.ac.uk> Jason Stajich wrote: > On Tue, 16 Dec 2003, Dave Howorth wrote: >>Ah, now that's interesting. In this specific case the application, >>newick.pm, has explicitly opted out of Perl's end-of-line handling by >>redefining $/ so it can slurp the whole tree at once: >> >> local $/ = ";\n"; >> return unless $_ = $self->_readline; >> >>Which, IMHO, makes it its problem to deal with line breaks. > > Hmmm - SeqIO::fasta does this sort of thing as well. > > This has nothing to do with the individual fields though - it only defines > how much to slurp in, if it weren't working we'd get two trees mooshed > together as one record and doesn't affect the multi-lined reports since > they only have a ; at the end. > > In the end this had nothing to do with Windows LF problems once I had > Valentin's test file in front of me. > > Adding this to newick.pm after the record is slurped in takes > care of the problem: > s/[\n\r]+//g > > As any sort of newline needs to be stripped out as that is what is > getting converted to spaces. 
It really wasn't a windows problem but > a problem with Allen's changes to the newick parsing code to replace WS > with _ but not handling LF separately. > >>From the log: > > revision 1.22 > date: 2003/08/15 17:07:27; author: allenday; state: Exp; lines: +3 -2 > removed unnecessary escap char in space removing regex. added regex to > remove quotes and leading/trailing spaces > from node labels as necessary. > ---------------------------- > revision 1.21 > date: 2003/08/15 08:31:46; author: allenday; state: Exp; lines: +5 -2 > fixing over-zealous whitespace removal from node labels. we do this by > not tampering with " quoted strings. i'm not sure if newick allows " to > be escaped within these labels... if so, there may be a bug here. > ---------------------------- "Single quote characters in a quoted label are represented by two single quotes." See below for reference. > My original code stripped all whitespace and thus we never had this > problem because there shouldn't be any in the node names in Newick > http://evolution.genetics.washington.edu/phylip/newicktree.html > "A name can be any string of printable characters except --->blanks<---, > colons, semcolons, parentheses, and square brackets." I agree that page says that, but it also says: "The above description is actually of a subset of the Newick Standard" and on the page which it points to as the closest thing to a standard: you can see that those characters *can* appear in labels. But newlines can't. And what should also happen is that underscores in unquoted labels are translated to spaces in the internal format, because underscore and space are two different valid characters in the quoted format (and thus look different in graphical output in a tool that can deal with it e.g. TreeTool). But this breaks Bio::Tree or somesuch (don't ask me how I know :( > but apparently he wants to support this for his purposes. ... as do I :) > I think my small change above takes care of the bug. 
> > -jason Cheers, Dave PS If anybody wants code that reads full Newick ... -- Dave Howorth MRC Centre for Protein Engineering Hills Road, Cambridge, CB2 2QH 01223 252960 From lembark at wrkhors.com Tue Dec 16 13:10:35 2003 From: lembark at wrkhors.com (Steven Lembark) Date: Tue Dec 16 13:18:56 2003 Subject: [Bioperl-l] Graphical alignments with blast In-Reply-To: References: <021e01c3c3f8$cc8e9d30$5301a8c0@muratet> Message-ID: <321130000.1071598234@[192.168.100.3]> -- Jason Stajich > On Tue, 16 Dec 2003, Mike Muratet wrote: > >> Greetings >> >> I am trying to roll my own simple graphical alignment from blast data >> (unless of course someone already knows of one). As near as I can figure >> from bptutorial and the man pages, the approach would be create a set of >> LocatableSeq objects to manually create a SimpleAlign object. At that >> point, You can also look at non-alignment graphical viewers such as the w-curve (see bioinfo.org for example code). The curve-based tools use some sort of state machine to generate a trace of the DNA, which then becomes the curve used for testing similarity. We've had good luck with variations on the w-curve so far at IIT. -- Steven Lembark 2930 W. Palmer Workhorse Computing Chicago, IL 60647 +1 888 359 3508 From jason at cgt.duhs.duke.edu Tue Dec 16 13:15:01 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 16 13:21:39 2003 Subject: full newick support (was Re: [Bioperl-l] Root::IO handle Mac and Win32 LF) In-Reply-To: <3FDF4571.9030602@mrc-lmb.cam.ac.uk> References: <3FDEE6FF.5030305@mrc-lmb.cam.ac.uk> <6010E5B6-2FBC-11D8-B94E-000A958C5008@pcbi.upenn.edu> <3FDEF4E4.9090203@mrc-lmb.cam.ac.uk> <3FDF2CD8.9040604@mrc-lmb.cam.ac.uk> <3FDF4571.9030602@mrc-lmb.cam.ac.uk> Message-ID: okay - I see your point Dave, supporting the quoted spaces must be done. clearly my flu-clouded brain is trying to do too much right now. Apologies Allen, you were on the right track I am clearly not reading up on things as I should have. 
Internally translating unquoted underscores to spaces can be done presumably at the parser level. Will probably have to generate an auto-quoting aspect for the writing of code. I guess we need to strip out the comments as well - not sure we have a slot for storing them in the current Tree::Tree object right now, but that can be easily added. If there are other things missing that I am unaware of speak up. I don't suppose anyone else is keen on working on this? -jason On Tue, 16 Dec 2003, Dave Howorth wrote: > Jason Stajich wrote: > > On Tue, 16 Dec 2003, Dave Howorth wrote: > >>Ah, now that's interesting. In this specific case the application, > >>newick.pm, has explicitly opted out of Perl's end-of-line handling by > >>redefining $/ so it can slurp the whole tree at once: > >> > >> local $/ = ";\n"; > >> return unless $_ = $self->_readline; > >> > >>Which, IMHO, makes it its problem to deal with line breaks. > > > > Hmmm - SeqIO::fasta does this sort of thing as well. > > > > This has nothing to do with the individual fields though - it only defines > > how much to slurp in, if it weren't working we'd get two trees mooshed > > together as one record and doesn't affect the multi-lined reports since > > they only have a ; at the end. > > > > In the end this had nothing to do with Windows LF problems once I had > > Valentin's test file in front of me. > > > > Adding this to newick.pm after the record is slurped in takes > > care of the problem: > > s/[\n\r]+//g > > > > As any sort of newline needs to be stripped out as that is what is > > getting converted to spaces. It really wasn't a windows problem but > > a problem with Allen's changes to the newick parsing code to replace WS > > with _ but not handling LF separately. > > > >>From the log: > > > > revision 1.22 > > date: 2003/08/15 17:07:27; author: allenday; state: Exp; lines: +3 -2 > > removed unnecessary escap char in space removing regex. 
added regex to > > remove quotes and leading/trailing spaces > > from node labels as necessary. > > ---------------------------- > > revision 1.21 > > date: 2003/08/15 08:31:46; author: allenday; state: Exp; lines: +5 -2 > > fixing over-zealous whitespace removal from node labels. we do this by > > not tampering with " quoted strings. i'm not sure if newick allows " to > > be escaped within these labels... if so, there may be a bug here. > > ---------------------------- > > "Single quote characters in a quoted label are represented by two single > quotes." See below for reference. > > > My original code stripped all whitespace and thus we never had this > > problem because there shouldn't be any in the node names in Newick > > http://evolution.genetics.washington.edu/phylip/newicktree.html > > "A name can be any string of printable characters except --->blanks<---, > > colons, semcolons, parentheses, and square brackets." > > > I agree that page says that, but it also says: > "The above description is actually of a subset of the Newick Standard" > and on the page which it points to as the closest thing to a standard: > > you can see that those characters *can* appear in labels. But newlines > can't. > > And what should also happen is that underscores in unquoted labels are > translated to spaces in the internal format, because underscore and > space are two different valid characters in the quoted format (and thus > look different in graphical output in a tool that can deal with it e.g. > TreeTool). But this breaks Bio::Tree or somesuch (don't ask me how I know :( > > > but apparently he wants to support this for his purposes. > > ... as do I :) > > > I think my small change above takes care of the bug. > > > > -jason > > Cheers, Dave > > PS If anybody wants code that reads full Newick ... 
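The label rules quoted in this thread (no line breaks inside labels, underscores in unquoted labels standing for blanks, and a doubled single quote standing for one quote inside a quoted label) can be sketched in a few lines of Perl. This is only an illustration under those stated rules: `normalize_newick_label` is a hypothetical helper, not a function in Bio::TreeIO::newick, and it assumes the label has already been cut out of the tree string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::TreeIO): normalise one raw
# Newick label following the rules quoted in the thread.
sub normalize_newick_label {
    my ($raw) = @_;

    # Line breaks are never part of a label, so drop them first.
    (my $label = $raw) =~ s/[\r\n]+//g;

    # Blanks around the label are layout, not content.
    $label =~ s/^\s+|\s+$//g;

    if ($label =~ /^'(.*)'$/s) {
        # Quoted label: keep blanks and underscores as written and
        # turn each doubled single quote back into one quote.
        $label = $1;
        $label =~ s/''/'/g;
    }
    else {
        # Unquoted label: an underscore stands for a blank.
        $label =~ tr/_/ /;
    }
    return $label;
}

print normalize_newick_label("Homo_sapiens"), "\n";
print normalize_newick_label("'it''s a_b'"), "\n";
```

Writing a tree back out would need the reverse step: quote any label that contains a blank, underscore, or quote character, which is the auto-quoting mentioned above.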
> -- Jason Stajich Duke University jason at cgt.mc.duke.edu From gulban at sickkids.ca Tue Dec 16 14:07:02 2003 From: gulban at sickkids.ca (omid gulban) Date: Tue Dec 16 18:17:03 2003 Subject: [Bioperl-l] How can annotate a list of genes functionally using bioperl Message-ID: <000801c3c407$c9e1cee0$6fc1148e@omid> Hello, Are there any bioperl modules that would assist in annotation of genes fucntionally? Basically I have a list of genes (>1000 with Ensembl identifiers) I would like know what the fucntion of these genes are plus any pathway information available on them. Thanks Omid From redwards at utmem.edu Tue Dec 16 21:48:25 2003 From: redwards at utmem.edu (Rob Edwards) Date: Tue Dec 16 21:54:18 2003 Subject: [Bioperl-l] Bio::TreeIO In-Reply-To: <1071588526.28148.5.camel@bioinf10> Message-ID: <7C448399-303B-11D8-B753-000A959E1622@utmem.edu> On Tuesday, December 16, 2003, at 09:28 AM, Iain Wallace wrote: > Hi all, > > I am trying to use the Bio::TreeIO module to split my sequences into > two > groups. > > My problem is that I want these two groups to contain as many sequences > as possible. At the moment I am using NJplot to define an outgroup, so > that the tree is as evenly split as i can make it. > > I am wondering if any one knows an easier way to do this.....? > > Any help would be great > > Thanks > > Iain Here is one (totally inelegant) way to get parts of a tree. The two attached scripts work together. The first adds CountXXX (where XXX is the number of the node) to each node as an id on internal nodes and writes out the tree. You can then look at the tree (e.g. using atv or treeview) and decide which node you want to cut the tree at. The second script takes the tree file and the node id (CountXXX) and just writes out that part of the tree. Its not exactly what you want (I don't think) but it may put you on the right track. 
add_tree_node_counts.pl
--------begin
#!/usr/bin/perl -w
# add node counts to a tree

use strict;
use Bio::TreeIO;

my $file = shift || die "tree file?";

my $tio  = Bio::TreeIO->new(-file => $file,        -format => 'newick');
my $tout = Bio::TreeIO->new(-file => ">$file.nwk", -format => 'newick');

my $count = 1;
while (my $tree = $tio->next_tree) {
    foreach my $node ($tree->get_nodes) {
        # label only the nodes that do not already have an id
        $node->id("Count$count") unless ($node->id);
        $count++;
    }
    $tout->write_tree($tree);
}
-------end

trim_tree.pl
-------begin
#!/usr/bin/perl -w
# trim a tree to a selected node

use strict;
use Bio::TreeIO;
use Bio::Tree::Tree;

my ($file, $nodeid) = @ARGV;
unless ($file && $nodeid) { die "Usage: $0 tree_file node_id\n" }

my $tio  = Bio::TreeIO->new(-file => $file,          -format => 'newick');
my $tout = Bio::TreeIO->new(-file => ">$nodeid.nwk", -format => 'newick');

while (my $tree = $tio->next_tree) {
    print "Whole tree total length is ", $tree->total_branch_length, "\n";
    foreach my $node ($tree->get_nodes) {
        my $val = $node->id;
        if ($val && $val eq $nodeid) {
            # build a new tree rooted at the chosen node and write it out
            my $newtree = Bio::Tree::Tree->new(-root => $node);
            print "Trimmed total length is ", $newtree->total_branch_length, "\n";
            $tout->write_tree($newtree);
        }
    }
}
------end
From hlapp at gmx.net Wed Dec 17 01:22:19 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 17 01:28:11 2003 Subject: [Bioperl-l] Bio::Ontology::Term needs Bio::Annotation::Reference In-Reply-To: Message-ID: <5E083F58-3059-11D8-9B25-000A959EB4C4@gmx.net> On Wednesday, December 10, 2003, at 03:26 AM, Juguang Xiao wrote: > The term presenting InterPro record has the lists of attribute such as > examples in the member database, which acts as > Bio::Annotation::DBLink, and the related publications, which should be > presented by Bio::Annotation::Reference, but the current Term does not > contain that. So I will append the following pretty simple methods if > no objections from the list. > > add_reference (accept an array of arguments) > get_references > remove_references > Do you want to add this to Term or to InterProTerm only?
Could make sense to add it to Term already (Bio::Ontology::Term, from which InterProTerm inherits). If you do that, don't forget to add it to Bio::Ontology::TermI as well. > Accordingly, the biosql schema should have term_ref table, similar to > term_dbxref. (I leave this to Hilmar) > Hm. What bothers me about this is that slowly but steadily terms become redundant in their properties to bioentries. Also, when retrieving terms, the language binding code needs to look up all possible relationships for every term, as it doesn't know in advance that e.g. for GO terms there aren't going to be any references. (Uhm, maybe this isn't really true anymore anyway?) OK so maybe that's what needs to be done then ... -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Dec 17 01:26:41 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 17 01:32:31 2003 Subject: [Bioperl-l] How to proceed with 1.4? In-Reply-To: <200312091023.56271.heikki@nildram.co.uk> Message-ID: Heikki, is Xmas your deadline for getting 1.4 out the door? I think it'd be useful to have Juguang's ontology term changes in before code freeze. OTOH the changes will be largely untested, but that could be handled with bug-fix releases. Have you set a deadline for code-freeze already? -hilmar On Tuesday, December 9, 2003, at 02:23 AM, Heikki Lehvaslaiho wrote: > > I have so far relased three snapshots from the bioperl core/live cvs > head. > Things have settled down a bit, but there are still outstanding issues. > Especially: > > - restriction analysis fixes need to merged and commited (Rob) > - SearchIO::psiblast & related module removal (Steve) > - really long qualifier names in sequence feature tables (Ewan?) #1561 > > I'd like to see these in before I release the next and hopefully last > snap > shot. 
Or would some like to see a snapshot out now? > > > Then there is the issue of other cvs modlues closely tied to core. Ext > is > simle there have been one major addition during last six months which > is well > documented and seems to work without problems. I can release that the > same > day as core. > > Run is a bit more complicated. There are issues with > - newer version of EMBOSS, #1481 > - TCoffee, #1453, #1557 ( and #1510, #1514) > > We need someone to look into these. It would be great to have them > fixed this > week so the we could have all three packages out before Christmas. > > Any comments and contributions to that effect welcome, > > -Heikki > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Dec 17 01:31:58 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 17 01:37:55 2003 Subject: [Bioperl-l] proposed additions to SeqFeatureI, RangeI and FeatureHolderI In-Reply-To: Message-ID: On Wednesday, December 3, 2003, at 11:46 AM, Chris Mungall wrote: > there seems to be 3 different kinds of attributes: > > foo() foo($foo) > get_foo() set_foo($foo) > get_tag_values('foo') set_tag_values('foo', [$foo]) > > I'm not sure what the rules are for deciding which attributes have > which > kinds of accessor > Sort of late reply, but generally speaking scalar properties get a simple-named dual getter/setter accessor (foo(), foo($newfoo)), whereas array properties have the get_XXXX() naming. Theoretically at least ... -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Wed Dec 17 03:14:31 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Dec 17 03:20:20 2003 Subject: [Bioperl-l] SeqIO tests fail Message-ID: <0AA5135C-3069-11D8-AD56-000A959EB4C4@gmx.net> I get 4 failures for SeqIO.t that seem to be related to a newline problem: not ok 159 # Test 159 got: '(CALM1 OR CAM1 OR CALM OR CAM) AND (CALM2 OR CAM2 OR CAMB) AND (CALM3 OR CAM3 OR CAMC)' (t/SeqIO.t at line 408) # Expected: '(CALM1 OR CAM1 OR CALM OR CAM) AND (CALM2 OR CAM2 OR CAMB) AND (CALM3 OR CAM3 OR CAMC)' Not sure the email client won't introduce more random line breaks, but essentially the deal is that the two values are identical except for an additional linebreak in the expected value. I have 3 more such failures. This is on Mac OSX. Before I investigate, does anybody have an upfront idea what's been changed here that could cause this? 
Jason, may those recent \r\n-related regexp changes be involved here? -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From heikki at nildram.co.uk Wed Dec 17 06:09:44 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Wed Dec 17 06:15:44 2003 Subject: [Bioperl-l] How to proceed with 1.4? In-Reply-To: References: Message-ID: <200312171109.44873.heikki@nildram.co.uk> Hilmar, I am preparing the last snapshot today. The issues I've known in tests and fixes needed for modules are all in. I have not been strict about code freeze. I will not branch before I have to. Unless massive new code bases start sudenly coming in (highly unlikely and strongly discouraged), I will not branch before the 1.4. Christmas seems to be the ultimate deadline. I was hoping to get 1.4 out before that but some fixes were slow to get in. If the ontology changes you suggest are in and tested within week, they will be in the 1.4. To repeat, the last snapshot will be out today. Unless something really big turns up before Monday, I'll release the 1.4 then (or the next day). -Heikki On Wednesday 17 Dec 2003 6:26 am, Hilmar Lapp wrote: > Heikki, is Xmas your deadline for getting 1.4 out the door? > > I think it'd be useful to have Juguang's ontology term changes in > before code freeze. OTOH the changes will be largely untested, but that > could be handled with bug-fix releases. > > Have you set a deadline for code-freeze already? > > -hilmar > > On Tuesday, December 9, 2003, at 02:23 AM, Heikki Lehvaslaiho wrote: > > I have so far relased three snapshots from the bioperl core/live cvs > > head. > > Things have settled down a bit, but there are still outstanding issues. 
> > Especially: > > > > - restriction analysis fixes need to merged and commited (Rob) > > - SearchIO::psiblast & related module removal (Steve) > > - really long qualifier names in sequence feature tables (Ewan?) #1561 > > > > I'd like to see these in before I release the next and hopefully last > > snap > > shot. Or would some like to see a snapshot out now? > > > > > > Then there is the issue of other cvs modlues closely tied to core. Ext > > is > > simle there have been one major addition during last six months which > > is well > > documented and seems to work without problems. I can release that the > > same > > day as core. > > > > Run is a bit more complicated. There are issues with > > - newer version of EMBOSS, #1481 > > - TCoffee, #1453, #1557 ( and #1510, #1514) > > > > We need someone to look into these. It would be great to have them > > fixed this > > week so the we could have all three packages out before Christmas. > > > > Any comments and contributions to that effect welcome, > > > > -Heikki > > > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at nildram.co.uk Wed Dec 17 06:45:26 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Wed Dec 17 06:51:17 2003 Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X] In-Reply-To: <200312161430.34208.heikki@nildram.co.uk> References: <200312161430.34208.heikki@nildram.co.uk> Message-ID: <200312171145.26753.heikki@nildram.co.uk> I disabled web dependent tests in t/Scansite.t for the time being since they have been hanging tests. I did this so that these tests are run only if BIOPERLDEBUG is set. Is this a way forward? -Heikki On Tuesday 16 Dec 2003 2:30 pm, Heikki Lehvaslaiho wrote: > That is good idea. What is the best way to do it? Is there a flag we can > pass to Test harness or should we decide upon an environmental variable > like "BIOPERLDEBUG"? > > -Heikki > > On Tuesday 16 Dec 2003 1:15 am, Jason Stajich wrote: > > We may need to think about how to set a flag for tests which should not > > run when no network connecion is found (or perhaps not be run as part of > > the default tests no matter what unless someone asks for them > > specifically - this might save a lot of future pain). -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at nildram.co.uk Wed Dec 17 08:09:53 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Wed Dec 17 08:21:49 2003 Subject: [Bioperl-l] SeqIO tests fail In-Reply-To: <0AA5135C-3069-11D8-AD56-000A959EB4C4@gmx.net> References: <0AA5135C-3069-11D8-AD56-000A959EB4C4@gmx.net> Message-ID: <200312171309.53813.heikki@nildram.co.uk> It seems that Sheldon is using some test editor that has a tendency to wrap lines. Reverted. -Heikki On Wednesday 17 Dec 2003 8:14 am, Hilmar Lapp wrote: > I get 4 failures for SeqIO.t that seem to be related to a newline > problem: > > not ok 159 > # Test 159 got: '(CALM1 OR CAM1 OR CALM OR CAM) AND (CALM2 OR CAM2 OR > CAMB) AND (CALM3 OR CAM3 OR CAMC)' (t/SeqIO.t at line 408) > # Expected: '(CALM1 OR CAM1 OR CALM OR CAM) AND (CALM2 OR CAM2 OR > CAMB) AND (CALM3 OR > CAM3 OR CAMC)' > > Not sure the email client won't introduce more random line breaks, but > essentially the deal is that the two values are identical except for an > additional linebreak in the expected value. I have 3 more such failures. > > This is on Mac OSX. Before I investigate, does anybody have an upfront > idea what's been changed here that could cause this? Jason, may those > recent \r\n-related regexp changes be involved here? > > -hilmar -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at nildram.co.uk Wed Dec 17 08:30:32 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Wed Dec 17 08:36:23 2003 Subject: [Bioperl-l] Bug in SeqIO genbank output In-Reply-To: <3FDE8C60.90809@csiro.au> References: <3FDE8C60.90809@csiro.au> Message-ID: <200312171330.32205.heikki@nildram.co.uk> Wes, You did not say which version of bioperl you are using. For some reason that I cannot quite understand, the current code: $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : '')); does print out the required six spaces after the word ORIGIN. This was recently fixed. Now, why doesn't it work for you? Could you check that you do not have multiple copies of bioperl on your computer, with the older one accidentally getting executed? Sorry, I cannot come up with any better explanation, -Heikki On Tuesday 16 Dec 2003 4:38 am, Wes Barris wrote: > Hi, > > I have just succeeded in tracking down a bug that prevents genbank files > written from bioperl from being properly imported into StackPack > (clustering software). The problem is due to a subtle difference in > a genbank entry downloaded from NCBI and a genbank entry produced using > genbank.pm. If you use "od -c" to look at a genbank record from NCBI, > you will notice that the word "ORIGIN" is followed by six space characters. > > ORIGIN > 1 cggccgcgtc gacttttttt ttaggtattt ttctcttatt atttctaaaa > tataaatttt 61 ggacattcaa aagtgcaaca ngttaatgtg cctgtgggga atatcacagt > taaaaaaata > > If I process this file using bioperl and then write out a new genbank > format file, the word "ORIGIN" is followed immediately by a carriage return > (newline) character.
> > It seems silly to me that spaces should be required after the word > "ORIGIN", but they do exist in files downloaded from NCBI and StackPack > seems to require these space characters in order to import a genbank file. > Is there an official specification for the genbank format? I have sent a > bug report to the makers of StackPack too. > > In the meantime, I have modified my installed copy of Bio/SeqIO/genbank.pm > changing this line: > > $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : '')); > > to this: > > $self->_print(sprintf("%-12s%s\n",'ORIGIN ',$o ? $o->value : > '')); -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs.
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From lstein at cshl.edu Wed Dec 17 08:46:21 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Dec 17 08:52:15 2003 Subject: [Bioperl-l] bp_genbank2gff problems In-Reply-To: <20031215195612.GA25494@psychro> References: <20031215172917.GA17947@psychro> <200312151444.37105.lstein@cshl.edu> <20031215195612.GA25494@psychro> Message-ID: <200312170846.21527.lstein@cshl.edu> The bp_genbank2gff problem is now fixed. I am now looking into the loaders to see whether it is true that they don't handle the combined annotation/fasta format. Lincoln On Monday 15 December 2003 02:56 pm, Neil Saunders wrote: > > This is a bug and it shall be fixed. > > Good to know! Thanks. > > I switched back to bioperl 1.2.3 on the woody box. The > bp_genbank2gff script with that version seems to behave differently > and works quite well, although it throws exceptions on some gb > files leading to incomplete GFF output. > > Another query - Currently I'm using csplit to split my GFF3 files > into a 'GFF' portion and a 'fasta' portion, then passing them as > arguments to bp_load_gff.pl. Is this strictly necessary? I guess > the ideal situation would be a working bp_genbank2gff and a > bp_load_gff that could process the fasta portion too. > > Look forward to the latest (I know you guys work fast), > > Neil From lstein at cshl.edu Wed Dec 17 08:49:48 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Dec 17 08:55:43 2003 Subject: [Bioperl-l] bp_genbank2gff problems In-Reply-To: <20031215172917.GA17947@psychro> References: <20031215172917.GA17947@psychro> Message-ID: <200312170849.48339.lstein@cshl.edu> I've just confirmed that you do *not* need to split the GFF3 files into the annotation and DNA parts in order to use the current generation of Bio::DB::GFF loaders. Have fun! 
Lincoln On Monday 15 December 2003 12:29 pm, Neil Saunders wrote: > I'm having a frustating time with the bp_genbank2gff.pl script. > > I have 2 systems: > > (1) Debian sid, perl 5.8.2, latest CVS bioperl-live, bioperl-run > and Gbrowse. > (2) Debian woody, perl 5.6.1, CVS versions as above. > > I have written a script that takes a set of GenBank files and pipes > them through various processes to generate GFF, Fasta and conf > files for use with Gbrowse. On system (1) above, I use: > > bp_genbank2gff.pl -file -stdout > > No problems, GFF3 file output appears. > > On system (2) above, the same command gives errors of the type: > > ------------- EXCEPTION ------------- > MSG: Can't connect to database: Access denied for user: > '@localhost' to database 'test' > STACK Bio::DB::GFF::Adaptor::dbi::caching_handle::new > /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi/caching_handle.p >m:89 STACK Bio::DB::GFF::Adaptor::dbi::new > /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi.pm:93 > STACK Bio::DB::GFF::Adaptor::dbi::mysql::new > /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi/mysql.pm:270 > STACK Bio::DB::GFF::Adaptor::biofetch::new > /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/biofetch.pm:95 > STACK Bio::DB::GFF::new > /usr/local/share/perl/5.6.1/Bio/DB/GFF.pm:599 STACK toplevel > /home/neil/gbrowse/genomes/scripts/bp_genbank2gff.pl:218 > > -------------------------------------- > DBI->connect(test) failed: Access denied for user: '@localhost' to > database 'test' at > /usr/local/share/perl/5.6.1/Bio/DB/GFF/Adaptor/dbi/caching_handle.p >m line 139 > > > Clearly the bp_genbank2gff script is trying to access a database > 'test' on 'localhost' with no user. I guess my question is: why? > I have told it to send to stdout. As I have the same bioperl > version on both machines, I'm pretty confused. The only thing I > noticed from 'make test' on the woody machine was this failure: > > t/GFF.t 255 65280 32 24 75.00% 21-32 > > Relevant? 
> > thanks for any pointers, > > Neil From jason at cgt.duhs.duke.edu Wed Dec 17 08:58:19 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Dec 17 09:05:01 2003 Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X] In-Reply-To: <200312171145.26753.heikki@nildram.co.uk> References: <200312161430.34208.heikki@nildram.co.uk> <200312171145.26753.heikki@nildram.co.uk> Message-ID: Sounds fine to me. On Wed, 17 Dec 2003, Heikki Lehvaslaiho wrote: > > > I disabled web dependent tests in t/Scansite.t for the time being since they > have been hanging tests. > > I did this so that these tests are run only if BIOPERLDEBUG is set. Is this a > way forward? > > -Heikki > > On Tuesday 16 Dec 2003 2:30 pm, Heikki Lehvaslaiho wrote: > > That is good idea. What is the best way to do it? Is there a flag we can > > pass to Test harness or should we decide upon an environmental variable > > like "BIOPERLDEBUG"? > > > > -Heikki > > > > On Tuesday 16 Dec 2003 1:15 am, Jason Stajich wrote: > > > We may need to think about how to set a flag for tests which should not > > > run when no network connecion is found (or perhaps not be run as part of > > > the default tests no matter what unless someone asks for them > > > specifically - this might save a lot of future pain). > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Wed Dec 17 14:05:09 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Dec 17 14:11:39 2003 Subject: [Bioperl-l] AUTHORS Message-ID: To those new bioperl contributors - don't forget to add yourself to the AUTHORS file in the repository before the 1.4 release so you can be acknowledged for your work. 
-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From heikki at nildram.co.uk Wed Dec 17 18:28:59 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Wed Dec 17 18:34:49 2003
Subject: [Bioperl-l] Bioperl Developer snapshot 1.3.04
Message-ID: <200312172328.59980.heikki@nildram.co.uk>

Bioperl Developer snapshot 1.3.04
----------------------------------

This is the fourth and last developer snapshot from the BioPerl CVS head
before release 1.4, which is due early next week. Bug fixes are still
welcome, but please do not commit anything you do not have time to test
well.

http://bioperl.org/DIST/current_core_unstable.tar.gz
http://bioperl.org/DIST/bioperl-1.3.04.tar.gz

Changes since 1.3.04
--------------------

This time I did not have the time to collect details from all the cvs
commits (maybe that level of detail is not necessary? It is all in
bioperl-guts, anyway.) The highlights are:

* SVG::Graphics fine tuning; SVG output,
* Bio::Tree major changes plus a new howto. There is a new
  Feature-Annotation howto, too.
* Parse more BLAST statistics and lots of BLAST-related fixes
* Bio::Restriction::Analysis is now fixed and can handle overlapping and
  multiple cuts as well as circular sequences.
* Bio::SearchIO and Bio::AlignIO can guess the incoming format by looking
  ahead into the stream, thanks to Bio::Tools::GuessSeqFormat.
* GenBank and EMBL parsers now tolerate words longer than line width.
  Also, they parse non-binomial virus names better into Bio::Species
  objects. EMBL parser now does better roundtrip of files.
* New GAME format modules.
* last minute fix: scripts/Bio-DB-GFF/bp_genbank2gff.PLS can now write to
  stdout without depending on mysql.
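
The look-ahead format guessing mentioned in the highlights can be pictured
with a small sketch. This is not the code of Bio::Tools::GuessSeqFormat; the
helper name guess_format and the three line signatures it checks are
assumptions made purely for illustration, and it uses only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): look at the first
# non-blank line of some text and return a format name, or undef
# when that line matches none of the signatures below.
sub guess_format {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                 # skip leading blank lines
        return 'fasta'   if $line =~ /^>/;        # FASTA header line
        return 'genbank' if $line =~ /^LOCUS\s/;  # GenBank flat file
        return 'embl'    if $line =~ /^ID\s/;     # EMBL flat file
        return undef;                             # first real line is unknown
    }
    return undef;                                 # nothing but blank lines
}

print guess_format(">seq1 test\nACGT\n"), "\n";   # prints "fasta"
```

A real guesser has to cope with many more formats and with ambiguous
input; the module's own documentation is the place to check what it
actually supports.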
NEW FILES:

* Bio/OntologyIO/Handlers/InterPro_BioSQL_Handler.pm
* Bio/Restriction/IO/bairoch.pm
* Bio/SeqIO/game.pm
* Bio/Tools/GuessSeqFormat.pm
* doc/howto/sgml/Feature-Annotation.sgml
* doc/howto/sgml/Trees.sgml
* examples/biographics/all_glyphs.pl

Enjoy,

-Heikki & the rest of the bioperl core team

P.S. The web site was updated over five hours ago when I wrote the first
version of this message, which seems to have vanished into a black hole.

-- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________

From kvddrift at earthlink.net Wed Dec 17 21:44:30 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed Dec 17 21:51:33 2003
Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X]
In-Reply-To:
References: <200312161430.34208.heikki@nildram.co.uk> <200312171145.26753.heikki@nildram.co.uk>
Message-ID: <1ACD4E59-3104-11D8-A893-003065A5FDCC@earthlink.net>

FYI,

The following tests now fail when offline (v 1.304):

HNN.t
ESEfinder.t
GOR4.t
Domcut.t
Sopma.t
MitoProt.t
DBCUTG.t

- Koen.

On Dec 17, 2003, at 8:58 AM, Jason Stajich wrote:
> Sounds fine to me.
>
> On Wed, 17 Dec 2003, Heikki Lehvaslaiho wrote:
>
>>
>>
>> I disabled web dependent tests in t/Scansite.t for the time being
>> since they
>> have been hanging tests.
>>
>> I did this so that these tests are run only if BIOPERLDEBUG is set.
>> Is this a
>> way forward?
>>
>> -Heikki
>>
>> On Tuesday 16 Dec 2003 2:30 pm, Heikki Lehvaslaiho wrote:
>>> That is good idea. What is the best way to do it? Is there a flag
>>> we can
>>> pass to Test harness or should we decide upon an environmental
>>> variable
>>> like "BIOPERLDEBUG"?
>>> >>> -Heikki >>> >>> On Tuesday 16 Dec 2003 1:15 am, Jason Stajich wrote: >>>> We may need to think about how to set a flag for tests which should >>>> not >>>> run when no network connecion is found (or perhaps not be run as >>>> part of >>>> the default tests no matter what unless someone asks for them >>>> specifically - this might save a lot of future pain). >> >> > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > From jason at cgt.duhs.duke.edu Wed Dec 17 21:56:38 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Dec 17 22:03:15 2003 Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X] In-Reply-To: <1ACD4E59-3104-11D8-A893-003065A5FDCC@earthlink.net> References: <200312161430.34208.heikki@nildram.co.uk> <200312171145.26753.heikki@nildram.co.uk> <1ACD4E59-3104-11D8-A893-003065A5FDCC@earthlink.net> Message-ID: updated so they are skipped without BIOPERLDEBUG set to 1 - we may still need to do this for DB.t, RefSeq.t, and whatever else out there is making remote connections. -jason On Wed, 17 Dec 2003, Koen van der Drift wrote: > FYI, > > The following tests now fail when offline (v 1.304): > > HNN.t > ESEfinder.t > GOR4.t > Domcut.t > Sopma.t > MitoProt.t > DBCUTG.t > > > - Koen. > > > On Dec 17, 2003, at 8:58 AM, Jason Stajich wrote: > > > Sounds fine to me. > > > > On Wed, 17 Dec 2003, Heikki Lehvaslaiho wrote: > > > >> > >> > >> I disabled web dependent tests in t/Scansite.t for the time being > >> since they > >> have been hanging tests. > >> > >> I did this so that these tests are run only if BIOPERLDEBUG is set. > >> Is this a > >> way forward? > >> > >> -Heikki > >> > >> On Tuesday 16 Dec 2003 2:30 pm, Heikki Lehvaslaiho wrote: > >>> That is good idea. What is the best way to do it? Is there a flag > >>> we can > >>> pass to Test harness or should we decide upon an environmental > >>> variable > >>> like "BIOPERLDEBUG"? 
> >>> > >>> -Heikki > >>> > >>> On Tuesday 16 Dec 2003 1:15 am, Jason Stajich wrote: > >>>> We may need to think about how to set a flag for tests which should > >>>> not > >>>> run when no network connecion is found (or perhaps not be run as > >>>> part of > >>>> the default tests no matter what unless someone asks for them > >>>> specifically - this might save a lot of future pain). > >> > >> > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From kvddrift at earthlink.net Wed Dec 17 22:09:16 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed Dec 17 22:16:10 2003 Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X] In-Reply-To: References: <200312161430.34208.heikki@nildram.co.uk> <200312171145.26753.heikki@nildram.co.uk> <1ACD4E59-3104-11D8-A893-003065A5FDCC@earthlink.net> Message-ID: <902BB530-3107-11D8-A893-003065A5FDCC@earthlink.net> On Dec 17, 2003, at 9:56 PM, Jason Stajich wrote: > updated so they are skipped without BIOPERLDEBUG set to 1 - we may > still > need to do this for DB.t, RefSeq.t, and whatever else out there is > making > remote connections. > Ah, I forgot RefSeq.t. DB.t works fine offline. - Koen. From heikki at nildram.co.uk Thu Dec 18 04:40:37 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Thu Dec 18 04:46:29 2003 Subject: [Bioperl-l] AUTHORS In-Reply-To: References: Message-ID: <200312180940.37309.heikki@nildram.co.uk> Most of you might be there already, but please check and add any comments about your contributions. There is a new script maintenance/authors.pl that I've used to go find authors and contributors from module documentation and then add into the AUTHORS file. Also, if you know that you've used multiple email addresses and want to unify them, you can modify the email synonym list within the script to see where those synonymes have been used. 
-Heikki On Wednesday 17 Dec 2003 7:05 pm, Jason Stajich wrote: > To those new bioperl contributors - don't forget to add yourself to the > AUTHORS file in the repository before the 1.4 release so you can be > acknowledged for your work. > > -jason > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________
From heikki at nildram.co.uk Thu Dec 18 05:30:00 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Thu Dec 18 05:37:39 2003 Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X] In-Reply-To: <902BB530-3107-11D8-A893-003065A5FDCC@earthlink.net> References: <902BB530-3107-11D8-A893-003065A5FDCC@earthlink.net> Message-ID: <200312181030.00924.heikki@nildram.co.uk> Fixed RefSeq.t. All tests pass offline. -Heikki On Thursday 18 Dec 2003 3:09 am, Koen van der Drift wrote: > On Dec 17, 2003, at 9:56 PM, Jason Stajich wrote: > > updated so they are skipped without BIOPERLDEBUG set to 1 - we may > > still > > need to do this for DB.t, RefSeq.t, and whatever else out there is > > making > > remote connections. > > Ah, I forgot RefSeq.t. DB.t works fine offline. > > - Koen. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs.
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________

From cain at cshl.org Thu Dec 18 09:45:47 2003
From: cain at cshl.org (Scott Cain)
Date: Thu Dec 18 09:51:35 2003
Subject: [Bioperl-l] Getting CDS boundaries from Unflattener
Message-ID: <1071758747.1467.21.camel@localhost.localdomain>

Hi Chris,

I very much want to reimplement Bio::DB::GFF::Adaptor::biofetch using
Unflattener, but there are a few problems I am having. Below is a
section of GFF that I generate using Unflattener from AE003644:

AE003644 EMBL/GenBank/SwissProt gene 20111 23268 . + . ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001
AE003644 EMBL/GenBank/SwissProt mRNA 20111 23268 . + . ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA
AE003644 EMBL/GenBank/SwissProt CDS 20495 22410 . + . Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGGGGV...
AE003644 EMBL/GenBank/SwissProt exon 20111 20584 . + . Parent=noc_mRNA_1
AE003644 EMBL/GenBank/SwissProt exon 20887 23268 . + . Parent=noc_mRNA_1

The biggest problem with this set of data is that the CDS spans
introns. The CDS really ought to be broken up into segments to match
the exon boundaries. As it is, it breaks display in gbrowse whether it
is using chado or a GFF database as a backend.

The other problem is that the exons' parentage is incorrect. The exons
should be features of the gene, not the mRNA.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From dag at bioteam.net Thu Dec 18 10:22:36 2003 From: dag at bioteam.net (Chris Dagdigian) Date: Thu Dec 18 10:33:06 2003 Subject: [Bioperl-l] Bundle::Bioperl updated for bioperl-1.4 release In-Reply-To: <3FE0F740.7000105@sonsorol.org> Message-ID: Hi folks, A potential 1.4 ready Bundle::BioPerl is in CPAN now at: http://search.cpan.org/~craffi/Bundle-BioPerl-2.1.0/BioPerl.pm Please make sure I got all the dependencies listed. The primary new editions seem to be: HTML::Entities HTML::Parser GD::SVG SVG:: This is the old bundle info: > Bundle id = Bundle::BioPerl > CPAN_USERID CRAFFI (Chris Dagdigian ) > CPAN_VERSION 2.05 > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > modules used by BioPerl > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > IO::String HTTP::Request::Common HTTP::Status LWP::UserAgent URI::Escape > XML::Parser XML::Parser::PerlSAX XML::Writer XML::Node XML::Twig > Text::Iconv Scalar::Util XML::DOM SOAP::Lite GD Storable > Text::Shellwords Data::Stag Graph::Directed > INST_FILE /root/.cpan/Bundle/BioPerl.pm > INST_VERSION 2.05 This is the new bundle: > cpan> i Bundle::BioPerl > Strange distribution name [Bundle::BioPerl] > Bundle id = Bundle::BioPerl > CPAN_USERID CRAFFI (Chris Dagdigian ) > CPAN_VERSION 2.05 > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > modules used by BioPerl > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > IO::String HTML::Entities HTML::Parser HTTP::Request::Common > HTTP::Status LWP::UserAgent URI::Escape XML::Parser XML::Parser::PerlSAX > XML::Writer XML::Node XML::Twig Text::Iconv Scalar::Util XML::DOM > SOAP::Lite GD GD::SVG SVG Storable Text::Shellwords Data::Stag > Graph::Directed > INST_FILE /usr/lib/perl5/site_perl/5.8.0/Bundle/BioPerl.pm > INST_VERSION 2.1.0 > -Chris From 
allenday at ucla.edu Thu Dec 18 12:08:38 2003 From: allenday at ucla.edu (Allen Day) Date: Thu Dec 18 12:14:29 2003 Subject: [Bioperl-l] Bundle::Bioperl updated for bioperl-1.4 release In-Reply-To: Message-ID: On Thu, 18 Dec 2003, Chris Dagdigian wrote: > > Hi folks, > > A potential 1.4 ready Bundle::BioPerl is in CPAN now at: > > http://search.cpan.org/~craffi/Bundle-BioPerl-2.1.0/BioPerl.pm > > Please make sure I got all the dependencies listed. The primary new > editions seem to be: > > HTML::Entities > HTML::Parser > GD::SVG > SVG:: there is a dependency on SVG::Graph 0.01 by module Bio::TreeIO::svggraph. -allen > > > > This is the old bundle info: > > > Bundle id = Bundle::BioPerl > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > CPAN_VERSION 2.05 > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > > modules used by BioPerl > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > IO::String HTTP::Request::Common HTTP::Status LWP::UserAgent URI::Escape > > XML::Parser XML::Parser::PerlSAX XML::Writer XML::Node XML::Twig > > Text::Iconv Scalar::Util XML::DOM SOAP::Lite GD Storable > > Text::Shellwords Data::Stag Graph::Directed > > INST_FILE /root/.cpan/Bundle/BioPerl.pm > > INST_VERSION 2.05 > > > This is the new bundle: > > > > cpan> i Bundle::BioPerl > > Strange distribution name [Bundle::BioPerl] > > Bundle id = Bundle::BioPerl > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > CPAN_VERSION 2.05 > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > > modules used by BioPerl > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > IO::String HTML::Entities HTML::Parser HTTP::Request::Common > > HTTP::Status LWP::UserAgent URI::Escape XML::Parser XML::Parser::PerlSAX > > XML::Writer XML::Node XML::Twig Text::Iconv Scalar::Util XML::DOM > > SOAP::Lite GD GD::SVG SVG Storable Text::Shellwords Data::Stag > > Graph::Directed > > 
INST_FILE /usr/lib/perl5/site_perl/5.8.0/Bundle/BioPerl.pm > > INST_VERSION 2.1.0 > > > > > -Chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From heikki at nildram.co.uk Thu Dec 18 13:24:28 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Thu Dec 18 13:30:18 2003 Subject: [Bioperl-l] Bundle::Bioperl updated for bioperl-1.4 release In-Reply-To: References: Message-ID: <200312181824.28759.heikki@nildram.co.uk> Allen, There is a problem. I can find SVG::Graph in search.cpan.org, but I could not install it. I found out that is an other, conflicting, SVG::Graph module by Mike Miller: http://search.cpan.org/~mrmike/ . -Heikki On Thursday 18 Dec 2003 5:08 pm, Allen Day wrote: > there is a dependency on SVG::Graph 0.01 by module Bio::TreeIO::svggraph. > > -allen > > > This is the old bundle info: > > > Bundle id = Bundle::BioPerl > > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > > CPAN_VERSION 2.05 > > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > > > modules used by BioPerl > > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > > IO::String HTTP::Request::Common HTTP::Status LWP::UserAgent > > > URI::Escape XML::Parser XML::Parser::PerlSAX XML::Writer XML::Node > > > XML::Twig Text::Iconv Scalar::Util XML::DOM SOAP::Lite GD Storable > > > Text::Shellwords Data::Stag Graph::Directed > > > INST_FILE /root/.cpan/Bundle/BioPerl.pm > > > INST_VERSION 2.05 > > > > This is the new bundle: > > > cpan> i Bundle::BioPerl > > > Strange distribution name [Bundle::BioPerl] > > > Bundle id = Bundle::BioPerl > > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > > CPAN_VERSION 2.05 > > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > > > modules used by BioPerl > > > CONTAINS Bundle::LWP File::Temp File::Spec 
IO::Scalar > > > IO::String HTML::Entities HTML::Parser HTTP::Request::Common > > > HTTP::Status LWP::UserAgent URI::Escape XML::Parser > > > XML::Parser::PerlSAX XML::Writer XML::Node XML::Twig Text::Iconv > > > Scalar::Util XML::DOM SOAP::Lite GD GD::SVG SVG Storable > > > Text::Shellwords Data::Stag Graph::Directed > > > INST_FILE /usr/lib/perl5/site_perl/5.8.0/Bundle/BioPerl.pm > > > INST_VERSION 2.1.0 > > > > -Chris > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From allenday at ucla.edu Thu Dec 18 13:56:07 2003 From: allenday at ucla.edu (Allen Day) Date: Thu Dec 18 14:02:04 2003 Subject: [Bioperl-l] Bundle::Bioperl updated for bioperl-1.4 release In-Reply-To: <200312181824.28759.heikki@nildram.co.uk> Message-ID: On Thu, 18 Dec 2003, Heikki Lehvaslaiho wrote: > Allen, > > There is a problem. I can find SVG::Graph in search.cpan.org, but I could not > install it. I found out that is an other, conflicting, SVG::Graph module by > Mike Miller: http://search.cpan.org/~mrmike/ . It looks like Mike Miller has registered the namespace, but he has not uploaded a distribution yet. I'm not sure why he controls it... I've already cleared it with Ronan Oger (registrant of SVG namespace) that it was okay for me to upload SVG::Graph. 
Does this cause problems in creation of the bundle? -Allen > > -Heikki > > On Thursday 18 Dec 2003 5:08 pm, Allen Day wrote: > > there is a dependency on SVG::Graph 0.01 by module Bio::TreeIO::svggraph. > > > > -allen > > > > > This is the old bundle info: > > > > Bundle id = Bundle::BioPerl > > > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > > > CPAN_VERSION 2.05 > > > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > > > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > > > > modules used by BioPerl > > > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > > > IO::String HTTP::Request::Common HTTP::Status LWP::UserAgent > > > > URI::Escape XML::Parser XML::Parser::PerlSAX XML::Writer XML::Node > > > > XML::Twig Text::Iconv Scalar::Util XML::DOM SOAP::Lite GD Storable > > > > Text::Shellwords Data::Stag Graph::Directed > > > > INST_FILE /root/.cpan/Bundle/BioPerl.pm > > > > INST_VERSION 2.05 > > > > > > This is the new bundle: > > > > cpan> i Bundle::BioPerl > > > > Strange distribution name [Bundle::BioPerl] > > > > Bundle id = Bundle::BioPerl > > > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > > > CPAN_VERSION 2.05 > > > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > > > MANPAGE Bundle::BioPerl - A bundle to install external CPAN > > > > modules used by BioPerl > > > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > > > IO::String HTML::Entities HTML::Parser HTTP::Request::Common > > > > HTTP::Status LWP::UserAgent URI::Escape XML::Parser > > > > XML::Parser::PerlSAX XML::Writer XML::Node XML::Twig Text::Iconv > > > > Scalar::Util XML::DOM SOAP::Lite GD GD::SVG SVG Storable > > > > Text::Shellwords Data::Stag Graph::Directed > > > > INST_FILE /usr/lib/perl5/site_perl/5.8.0/Bundle/BioPerl.pm > > > > INST_VERSION 2.1.0 > > > > > > -Chris > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l 
> > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From cjm at fruitfly.org Thu Dec 18 16:52:19 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Thu Dec 18 15:04:27 2003 Subject: [Bioperl-l] Re: Getting CDS boundaries from Unflattener In-Reply-To: <1071758747.1467.21.camel@localhost.localdomain> Message-ID: On Thu, 18 Dec 2003, Scott Cain wrote: > Hi Chris, > > I very much what to reimplement Bio::DB::GFF::Adaptor::biofetch using > Unflattener, but but there are a few problems I am having. Below is a > section of GFF that I generate using Unflattener from AE003644: > > AE003644 EMBL/GenBank/SwissProt gene 20111 23268 . + . ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001 > AE003644 EMBL/GenBank/SwissProt mRNA 20111 23268 . + . ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA > AE003644 EMBL/GenBank/SwissProt CDS 20495 22410 . + . Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGGGGV... > AE003644 EMBL/GenBank/SwissProt exon 20111 20584 . + . Parent=noc_mRNA_1 > AE003644 EMBL/GenBank/SwissProt exon 20887 23268 . + . Parent=noc_mRNA_1 > > The biggest problem with this set of data is that the CDS spans > introns. The CDS really ought to be broken up into segments to match > the exon boundaries. As it is, it breaks display in gbrowse whether it > is using chado or a GFF database as a backend. When I use the unflattener on AE003644, the CDSs I get out have split locations which match the coding exon boundaries - are you sure this isn't a problem with the GFF code? 
Are you doing all the usual weird stuff like:

  if ($sf->location->isa("Bio::Location::SplitLocationI")) {
      @locs = $sf->location->each_Location;
  }

> The other problem is that the exons' parentage is incorrect. The exons
> should be features of the gene, not the mRNA.

I think you have this the wrong way round. Again, this must be a problem
with how you're assigning parent tags in the GFF output; when I try
AE003644 the exons are children of the mRNA, which is correct.

Try this using the unflattener script (which uses SeqIO::asciitree as
default output)

  cd bioperl-live
  perl scripts/seq/unflatten_seq.PLS t/data/AE003644_Adh-genomic.gb

Seq: AE003644
  databank_entry                1..263309[+]
  gene
    mRNA CG4491-RA
      CDS CG4491-PA             20495..22410[+] ; SPLIT: 20495..20584[+] 20887..22410[+]
      exon                      20111..20584[+]
      exon                      20887..23268[+]
  gene
    tRNA tRNA-Pro
      exon                      25127..25198[+]

exons under mRNA, and CDS boundaries matching exons

(do a cvs update for the latest asciitree.pm)

cheers
chris

> Thanks,
> Scott
>
>
>
>

From e-just at northwestern.edu Thu Dec 18 18:00:38 2003
From: e-just at northwestern.edu (Eric Just)
Date: Thu Dec 18 18:06:28 2003
Subject: [Bioperl-l] Bio::DB::Fasta
Message-ID: <5.1.1.6.0.20031218163538.02ceebe0@hecky.it.northwestern.edu>

Hi

I have 2 points about Bio::DB::Fasta

1. In Windows it seems that the file being indexed needs to have unix
style line breaks. Windows style does not work. I have attempted to
retrieve a subsequence of a genomic sequence file in a database. To
compare I used a seq object created through SeqIO. I got two seqs of
different lengths. The one gotten from SeqIO is what I was expecting.

  use Data::Dumper;
  use Bio::DB::Fasta;
  use Bio::Seq;
  use Bio::SeqIO;

  # sequence fetched through the Bio::DB::Fasta index
  my $db   = Bio::DB::Fasta->new('C:/dicty/bin/blast_scripts/fasta');
  my $prim = $db->get_Seq_by_id('DDB0183747');

  # the same sequence read directly with SeqIO, for comparison
  my $fasta = Bio::SeqIO->new(
      -file   => 'C:/dicty/bin/blast_scripts/fasta/dictyChromosome6.fa',
      -format => 'Fasta');
  my $seq = $fasta->next_seq();

  print Dumper( $seq->subseq(1001,1025 ));
  print Dumper( $prim->subseq(1001,1025 ));

------------------------output------------------------------------------
$VAR1 = 'ATAAATCAAATTGTTTTTTAGTTTT';
$VAR1 = 'NNNNNNNNNNNNNNNNATAAATCAAA';

2. If I save the file with unix style line breaks it is better, but I
still get a different sequence than I do for SeqIO: it seems to be offset
by 1.

------------------------output------------------------------------------
$VAR1 = 'ATAAATCAAATTGTTTTTTAGTTTT';
$VAR1 = 'TAAATCAAATTGTTTTTTAGTTTTT';

I am using the latest version of Bio::DB::Fasta (downloaded from cvs
tree) in bioperl 1.2. and DB_File 1.807 (from ppm).

Thanks for any help/suggestions.

Eric

============================================
Eric Just
e-just@northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org
============================================

From kvddrift at earthlink.net Thu Dec 18 21:43:17 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu Dec 18 21:50:19 2003
Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X]
In-Reply-To: <200312181030.00924.heikki@nildram.co.uk>
References: <902BB530-3107-11D8-A893-003065A5FDCC@earthlink.net> <200312181030.00924.heikki@nildram.co.uk>
Message-ID: <1994B97D-31CD-11D8-BF8A-003065A5FDCC@earthlink.net>

Hi,

Thanks for fixing all those tests which fail when offline. I want to
include the tests in the fink-package for the upcoming, which is why I
am so picky ;-)

I found one more oddity, though. When an older bioperl is present, the
Search.t test passes; however, if no bioperl is installed, it fails as
follows:

...
...
ok 930 Bio::SearchIO: psiblast cannot be found Exception ------------- EXCEPTION ------------- MSG: Failed to load module Bio::SearchIO::psiblast. Can't locate Bio/Search/Result/PsiBlastResult.pm in @INC (@INC contains: t . /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /sw/lib/perl5//5.8.1/darwin-thread-multi-2level /sw/lib/perl5//5.8.1 /sw/lib/perl5//darwin-thread-multi-2level /sw/lib/perl5/ /System/Library/Perl/5.8.1/darwin-thread-multi-2level /System/Library/Perl/5.8.1 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level /Network/Library/Perl/5.8.1 /Network/Library/Perl) at Bio/SearchIO/psiblast.pm line 152, line 245. BEGIN failed--compilation aborted at Bio/SearchIO/psiblast.pm line 152, line 245. Compilation failed in require at Bio/Root/Root.pm line 394, line 245. STACK Bio::Root::Root::_load_module Bio/Root/Root.pm:396 STACK (eval) Bio/SearchIO.pm:387 STACK Bio::SearchIO::_load_format_module Bio/SearchIO.pm:386 STACK Bio::SearchIO::new Bio/SearchIO.pm:166 STACK toplevel t/SearchIO.t:1303 -------------------------------------- For more information about the SearchIO system please see the SearchIO docs. This includes ways of checking for formats at compile time, not run time Can't call method "next_result" on an undefined value at t/SearchIO.t line 1307. [ModusOperandi:~/Desktop/bioperl-live] koen% - Koen. From heikki at nildram.co.uk Fri Dec 19 07:30:14 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Fri Dec 19 07:37:44 2003 Subject: [Bioperl-l] Disabling web tests [some tests fail on Mac OS X] In-Reply-To: <1994B97D-31CD-11D8-BF8A-003065A5FDCC@earthlink.net> References: <200312181030.00924.heikki@nildram.co.uk> <1994B97D-31CD-11D8-BF8A-003065A5FDCC@earthlink.net> Message-ID: <200312191230.14437.heikki@nildram.co.uk> Koen, I really like you being picky. 
;-) All the help in debbugging bioperl is welcome! That error must be caused by me deleting outdated psiblast from distribution. I'll fix that later today. Jason, There are still two psiblast scripts left in core: examples/searchio/psiblast_features.pl examples/searchio/psiblast_iterations.pl Do you have any interest in migrating them into new code base (like Steve suggested) or shall I remove them? -Heikki On Friday 19 Dec 2003 2:43 am, Koen van der Drift wrote: > Hi, > > Thanks for fixing all those tests which fail when offline. I want to > include the tests in the fink-package for the upcoming, which is why I > am so picky ;-) > > I found one more oddity, though. When an older bioperl is present, the > Search.t test passes, however, if no bioperl is installed, it fails as > follows: > > ... > ... > ok 930 > Bio::SearchIO: psiblast cannot be found > Exception > ------------- EXCEPTION ------------- > MSG: Failed to load module Bio::SearchIO::psiblast. Can't locate > Bio/Search/Result/PsiBlastResult.pm in @INC (@INC contains: t . > /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 > /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 > /sw/lib/perl5/darwin /sw/lib/perl5//5.8.1/darwin-thread-multi-2level > /sw/lib/perl5//5.8.1 /sw/lib/perl5//darwin-thread-multi-2level > /sw/lib/perl5/ /System/Library/Perl/5.8.1/darwin-thread-multi-2level > /System/Library/Perl/5.8.1 > /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 > /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level > /Network/Library/Perl/5.8.1 /Network/Library/Perl) at > Bio/SearchIO/psiblast.pm line 152, line 245. > BEGIN failed--compilation aborted at Bio/SearchIO/psiblast.pm line 152, > line 245. > Compilation failed in require at Bio/Root/Root.pm line 394, > line 245. 
> > STACK Bio::Root::Root::_load_module Bio/Root/Root.pm:396 > STACK (eval) Bio/SearchIO.pm:387 > STACK Bio::SearchIO::_load_format_module Bio/SearchIO.pm:386 > STACK Bio::SearchIO::new Bio/SearchIO.pm:166 > STACK toplevel t/SearchIO.t:1303 > > -------------------------------------- > > For more information about the SearchIO system please see the SearchIO > docs. > This includes ways of checking for formats at compile time, not run time > Can't call method "next_result" on an undefined value at t/SearchIO.t > line 1307. > [ModusOperandi:~/Desktop/bioperl-live] koen% > > > > - Koen. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki at nildram.co.uk Fri Dec 19 07:33:42 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Fri Dec 19 07:39:29 2003 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <5.1.1.6.0.20031218163538.02ceebe0@hecky.it.northwestern.edu> References: <5.1.1.6.0.20031218163538.02ceebe0@hecky.it.northwestern.edu> Message-ID: <200312191233.42524.heikki@nildram.co.uk> Eric, Thanks for the error report. Jason hinted earlier that he saw similar problems in that module. Lincoln, do you have time to have go at this before Monday? -Heikki On Thursday 18 Dec 2003 11:00 pm, Eric Just wrote: > Hi I have 2 points about Bio::DB::Fasta > > 1. In windows it seems that the file being indexed needs to have unix style > line breaks. WIndows style does not work. 
> > I have attempted to retreive a subsequence of a genomic sequence file in a > database. To compare I used a seq object created through SeqIO. I got tow > seqs of different lenghths. The one gotten from SeqIO is what I was > expecting. > > use Bio::DB::Fasta; > use Bio::Seq; > use Bio::SeqIO; > > my $db = Bio::DB::Fasta->new('C:/dicty/bin/blast_scripts/fasta'); > my $prim = $db->get_Seq_by_id('DDB0183747'); > my $fasta = new Bio::SeqIO( -file => > 'C:/dicty/bin/blast_scripts/fasta/dictyChromosome6.fa', -format => > 'Fasta'); my $seq = $fasta->next_seq(); > > print Dumper( $seq->subseq(1001,1025 )); > print Dumper( $prim->subseq(1001,1025 )); > > ------------------------output------------------------------------------ > > $VAR1 = 'ATAAATCAAATTGTTTTTTAGTTTT'; > $VAR1 = 'NNNNNNNNNNNNNNNNATAAATCAAA'; > > 2. If I save the file with unix style line breaks it is better but I still > get a different sequence than I do for SeqIO: It seems to be offset by 1. > > ------------------------output------------------------------------------ > > $VAR1 = 'ATAAATCAAATTGTTTTTTAGTTTT'; > $VAR1 = 'TAAATCAAATTGTTTTTTAGTTTTT'; > > I am using the latest version of Bio::DB::Fasta (downloaded from cvs tree) > in bioperl 1.2. and DB_File 1.807 (from ppm). > > Thanks for any help/suggestions. > > Eric > > > > > ============================================ > > Eric Just > e-just@northwestern.edu > dictyBase Programmer > Center for Genetic Medicine > Northwestern University > http://dictybase.org > > ============================================ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From cain at cshl.org Fri Dec 19 10:48:17 2003 From: cain at cshl.org (Scott Cain) Date: Fri Dec 19 10:54:04 2003 Subject: [Bioperl-l] Re: Getting CDS boundaries from Unflattener In-Reply-To: References: Message-ID: <1071848897.1468.52.camel@localhost.localdomain> On Thu, 2003-12-18 at 16:52, Chris Mungall wrote: > On Thu, 18 Dec 2003, Scott Cain wrote: > > The biggest problem with this set of data is that the CDS spans > > introns. The CDS really ought to be broken up into segments to match > > the exon boundaries.
As it is, it breaks display in gbrowse whether it > > is using chado or a GFF database as a backend. > > When I use the unflattener on AE003644, the CDSs I get out have split > locations which match the coding exon boundaries - are you sure this isn't > a problem with the GFF code? Are you doing all the usual weird stuff like: > > if ($sf->location->isa("Bio::Location::SplitLocationI")) { > @locs = $sf->location->each_Location; > } Oops--read that documentation, Scott. OK, I fixed Bio::Tools::GFF to deal with split locations. > > > The other problem is that the exons' parentage is incorrect. The exons > > should be features of the gene, not the mRNA. > > I think you have this the wrong way round. Again, this must be a problem > with how you're assigning parent tags in the GFF output, when I try > AE003644 the exons are children of the mRNA, which is correct. > I don't think so; here are the relevant lines from SO: @is_a@gene ; SO:0000704 ; SOFA:SOFA ; SOFA:region @part_of@transcript ; SO:0000673 ; SOFA:SOFA ; SOFA:region @part_of@exon ; SO:0000147 ; SOFA:SOFA ; SOFA:region @is_a@processed_transcript ; SO:0000233 ; SOFA:SOFA ; SOFA:region @is_a@mRNA ; SO:0000234 ; SOFA:SOFA ; SOFA:region ; synonym:messenger_RNA @part_of@CDS ; SO:0000316 ; SOFA:SOFA ; SOFA:region ; synonym:coding_sequence Now, I am not one to be lecturing on ontologies, so I may have misinterpreted something here, but it looks to me like exon is part of a transcript, but not part of an mRNA. And since we typically don't have transcript features in Genbank records, exon should be part_of gene. An alternative would be to infer a transcript feature for each mRNA feature and tie the exons to the transcript features, but leaving the mRNAs and CDSs as is. Thanks, Scott > > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain at cshl.org Fri Dec 19 11:33:18 2003 From: cain at cshl.org (Scott Cain) Date: Fri Dec 19 11:39:04 2003 Subject: [Bioperl-l] Re: Getting CDS boundaries from Unflattener In-Reply-To: <1071848897.1468.52.camel@localhost.localdomain> References: <1071848897.1468.52.camel@localhost.localdomain> Message-ID: <1071851598.1465.92.camel@localhost.localdomain> On Fri, 2003-12-19 at 10:48, Scott Cain wrote: > On Thu, 2003-12-18 at 16:52, Chris Mungall wrote: > > On Thu, 18 Dec 2003, Scott Cain wrote: > > > > The biggest problem with this set of data is that the CDS spans > > > introns. The CDS really ought to be broken up into segments to match > > > the exon boundaries. As it is, it breaks display in gbrowse whether it > > > is using chado or a GFF database as a backend. > > > > When I use the unflattener on AE003644, the CDSs I get out have split > > locations which match the coding exon boundaries - are you sure this isn't > > a problem with the GFF code? Are you doing all the usual weird stuff like: > > > > if ($sf->location->isa("Bio::Location::SplitLocationI")) { > > @locs = $sf->location->each_Location; > > } > > Oops--read that documentation, Scott. OK, I fixed Bio::Tools::GFF to > deal with split locations. > > > > > The other problem is that the exons' parentage is incorrect. The exons > > > should be features of the gene, not the mRNA. > > > > I think you have this the wrong way round. Again, this must be a problem > > with how you're assigning parent tags in the GFF output, when I try > > AE003644 the exons are children of the mRNA, which is correct. 
> > > I don't think so; here are the relevant lines from SO: > > @is_a@gene ; SO:0000704 ; SOFA:SOFA ; SOFA:region > @part_of@transcript ; SO:0000673 ; SOFA:SOFA ; SOFA:region > @part_of@exon ; SO:0000147 ; SOFA:SOFA ; SOFA:region > @is_a@processed_transcript ; SO:0000233 ; SOFA:SOFA ; SOFA:region > @is_a@mRNA ; SO:0000234 ; SOFA:SOFA ; SOFA:region ; synonym:messenger_RNA > @part_of@CDS ; SO:0000316 ; SOFA:SOFA ; SOFA:region ; synonym:coding_sequence > > Now, I am not one to be lecturing on ontologies, so I may have > misinterpreted something here, but it looks to me like exon is part of a > transcript, but not part of an mRNA. And since we typically don't have > transcript features in Genbank records, exon should be part_of gene. An > alternative would be to infer a transcript feature for each mRNA feature > and tie the exons to the transcript features, but leaving the mRNAs and > CDSs as is. > OK, the real problem is that the thing that is labeled an mRNA in the feature from Unflattener (which it is getting from the genbank record) is a transcript, not an mRNA/processed transcript. That is not to say the genbank record is wrong--its not. Generally, the mRNA feature is a collection of ranges in a join. What Unflattener gives for an mRNA feature is really a primary transcript. -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From heikki at nildram.co.uk Sun Dec 21 06:45:17 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Sun Dec 21 17:43:50 2003 Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1573] New: Setting illegal ids In-Reply-To: <200312201210.hBKCAli6023532@portal.open-bio.org> References: <200312201210.hBKCAli6023532@portal.open-bio.org> Message-ID: <200312211145.18018.heikki@nildram.co.uk> Valentin, I think you are right in this. Whitespace in display_id is bad news and should not be allowed. 
This is one of the many conventions in sequence formats; however, I am a bit hesitant to extend this ban to other fields without hard data and real need. Suggestions welcome, though. I found these from database format documents: EMBL: Entryname: stable identifier, consisting of alphanumeric characters, starting with a letter. All letters should be in upper case. SWISS-PROT: The Swiss-Prot entry name consists of up to ten uppercase alphanumeric characters. Swiss-Prot uses a general purpose naming convention that can be symbolized as X_Y, Several formats can, in principle, tolerate whitespace in IDs. A quick look at the formats identified these as the ones to tackle now: fasta gcg genbank embl mase pir swiss My only concern is that there might be some unforeseen effect if I enforce this just before the release. I suggest that for now I add: $self->warn("No whitespace allowed in SWISS-PROT display id [". $seq->display_id. "]") if $seq->display_id =~ /\s/; Setting $seq->verbose(2) before printing out will then convert this into a throw. A cleaner and simpler alternative would be to add a warning to the value-setting code of Bio::PrimarySeq::display_id(). $self->warn("It is a REALLY bad idea to have whitespace in display_id [". $seq->display_id. "]") if $seq->display_id =~ /\s/; but is that too intrusive? -Heikki P.S. open-bio.org/bioperl.org domain has been down since yesterday. On Saturday 20 Dec 2003 12:10 pm, bugzilla-daemon@portal.open-bio.org wrote: > http://bugzilla.bioperl.org/show_bug.cgi?id=1573 > > Summary: Setting illegal ids > Product: Bioperl > Version: main-trunk > Platform: PC > OS/Version: Windows 2000 > Status: NEW > Severity: enhancement > Priority: P2 > Component: Bio::SeqIO > AssignedTo: bioperl-guts-l@bioperl.org > ReportedBy: valentin_ruano@yahoo.es > > > In the swiss format, perhaps in some others as well, a sequence id must not > contain blanks; an exception is thrown when reading a > blank-containing-idded sequence from the input stream.
> > It is possible to set the id of a SeqI instance with blanks in it, so far > so good since we may write this sequence in a format that stands it. > > The problem is that SeqIO outputting in swiss format does not complain when > such a sequence is written into the output stream. > > Subsequent reading on the resulting file will throw an exception. > > Would not be better to provide a more strict validation step when writing > into a swiss foramted file? Throw an exception?. > Personally, I do not believe in converting illegal characters into legal > ones on the fly (e.g. blank -> '_') as adopted in other modules, since this > will silence possible programming mistakes and does not allow customisation > (e.g. I may rather want '#' for blanks). > > I guess the same story may well apply to other fields. > > ==================== > > Follows the exception when trying to read a seq file with blank containg > ids: > > ------------- EXCEPTION ------------- > MSG: swissprot stream with no ID. Not swissprot in my book > STACK Bio::SeqIO::swiss::next_seq > /usr/lib/perl5/site_perl/5.8.2/Bio/SeqIO/swiss .pm:180 > STACK Bio::SeqIO::READLINE /usr/lib/perl5/site_perl/5.8.2/Bio/SeqIO.pm:640 > STACK toplevel /cygdrive/c/Program > Files/eclipse/workspace/meb-toolbox/perl/seqr en.pl:282 > > -------------------------------------- > > > > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From iain.wallace at ucd.ie Sun Dec 21 18:39:18 2003 From: iain.wallace at ucd.ie (iain wallace) Date: Sun Dec 21 18:48:41 2003 Subject: [Bioperl-l] Question about Bio::TreeIO Message-ID: <2832592.1072049958942.JavaMail.cpadmin@iowa.ucd.ie> Hi all, I am trying to write a function that can read in a tree, go through each of the nodes, and tell me how many leaves are a descendent of that node. e.g the root node would be an ancestor to all the leaves. My code simply looks at all the nodes, and calls get_all_descendents(), and if any of the descendents is a leaf the count is incremented by one. However, my code below doesn't seem to work, as it says some nodes have no descendents which are leaves. Is there a flaw in my logic, or is my code simply wrong??? Before I go, there is one other question i would like to ask, how can i pass a bioperl object (like a node) in this case to a function? I tried passing by reference function(\$node) but that wouldn't work for me... Any help would be great, Thanks Iain foreach my $node ( $rootnode->get_all_Descendents() ) { $count=0; foreach my $newnode ( $node-> get_all_Descendents()){ if($newnode->is_Leaf){ $count++; } } $nodeid=$node->internal_id; $node->add_tag_value($count,1); if ($count==0){ print "$nodeid has no leaves...."; } } From juguang at tll.org.sg Mon Dec 22 03:28:59 2003 From: juguang at tll.org.sg (Juguang Xiao) Date: Mon Dec 22 03:36:04 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl Message-ID: Hi all, I was Java programmer learning Perl from Bioperl code. I did learn a lot for you ushers. Recently, I decide to read Programming Perl, the Perl Bible, and found some common-sense code in bioperl is not the best. Here are my findings. 1) getset Months ago, Hilmar gave us a tip for the getter/setter to accept undef. 
The code is like sub name { my $self=shift; return $self->{_name} = shift if @_; return $self->{_name}; } It is no fault about it until this super or sub module have a method with different name but use the same hash key. So choosing the hash key is not on your own ease. Here is the code in Section 12.7 sub name { my $self = shift; my $field = __PACKAGE__ . "::name"; if (@_) { $self->{$field} = shift } return $self->{$field}; } Your getset generator may need to be updated. ;-) ############################################## 2) use vars (@ISA); This is copied from Chapter 31. 31.21. use vars use vars qw($frobbed @munge %seen); This pragma, once used to declare a global variable, is now somewhat deprecated in favor of the our modifier. The previous declaration is better accomplished using: our($frobbed, @munge, %seen); or even: our $frobbed = "F"; our @munge = "A" .. $frobbed; our %seen = (); ######################################### 3) auto getset, again I really cannot stand individual getset any more, after I read Section 12.7. Do yourself a favor, read it, please. One year ago, I suggest to use AUTOLOAD replace all getset methods. The idea was mercilessly extinguish. Now I have big boss's support in his book. Anyone wants to say anything? ;-) ( I do not mean to use AUTOLOAD again, but the rest ways in that section should be discussed) Just do not stay your Perl wisdom and braveness at high school, though your bioinformatics achievement reach above the Ph. D height. I prefer the idea on Section 12.7.4. Generating Accessors with Closures. It is listsub'able. my $0.02 Juguang From ARYES at wicc.weizmann.ac.il Mon Dec 22 06:28:44 2003 From: ARYES at wicc.weizmann.ac.il (Arye Shemesh) Date: Mon Dec 22 05:46:19 2003 Subject: [Bioperl-l] problems with Bio::Tools::BPbl2seq Message-ID: <3FD0B894@wiccweb> Hi, I have a problem with Bio::Tools::BPbl2seq. 
When I try to run bl2seq on 2 Bio::Seq objects, i get these error messages: Use of uninitialized value in print at /usr/local/lib/perl5/site_perl/5.6.1/Bio/Root/IO.pm line 305, line 7 (#1) The weird thing is that i do get the ruslts i need from my script! Can you please help me find out what's going on here? This is the context in which I use the bl2seq: sub BlastSeqs { my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp'); $factory->F("F"); my $StructSeqObj = Bio::Seq->new(-seq=>$SeqFromStruct); foreach my $seqName (keys %{$DBSeqHashRef}) #iterating over the data hash { my $alignedSeqObj = Bio::Seq->new( -id=>$seqName, -seq=>$DBSeqHashRef->{$seqName}); my $blast_report; eval { #THIS IS THE LINE THAT CAUSES THE ERROR: $blast_report = $factory->bl2seq ($StructSeqObj, alignedSeqObj); }; #of eval print STDERR $@ if ($@); } } # of sub BlastSeqs Thanks, Arye From heikki at nildram.co.uk Mon Dec 22 06:40:07 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Mon Dec 22 06:47:35 2003 Subject: [Bioperl-l] Bundle::Bioperl updated for bioperl-1.4 release In-Reply-To: References: Message-ID: <200312221140.07944.heikki@nildram.co.uk> Todd, I do not think we should add the dependency before it can be installed. I'd hate to add it and then get users complaining and not being able to suggest a fix. Looks like we'll have to leave its addition after the initial 1.4. I just tried installing it again with cpan: ----------------------------------------------------------------------- Running install for module SVG::Graph The module SVG::Graph isn't available on CPAN. Either the module has not yet been uploaded to CPAN, or it is temporary unavailable. Please contact the author to find out more about the status. Try 'i SVG::Graph'. ----------------------------------------------------------------------- Have you had any luck in starting to sort this out? -Heikki On Monday 22 Dec 2003 12:22 am, you wrote: > so are we okay on adding the SVG::Graph dependency? 
> > -Allen > > On Thu, 18 Dec 2003, Heikki Lehvaslaiho wrote: > > Allen, > > > > There is a problem. I can find SVG::Graph in search.cpan.org, but I could > > not install it. I found out that is an other, conflicting, SVG::Graph > > module by Mike Miller: http://search.cpan.org/~mrmike/ . > > > > -Heikki > > > > On Thursday 18 Dec 2003 5:08 pm, Allen Day wrote: > > > there is a dependency on SVG::Graph 0.01 by module > > > Bio::TreeIO::svggraph. > > > > > > -allen > > > > > > > This is the old bundle info: > > > > > Bundle id = Bundle::BioPerl > > > > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > > > > CPAN_VERSION 2.05 > > > > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > > > > MANPAGE Bundle::BioPerl - A bundle to install external > > > > > CPAN modules used by BioPerl > > > > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > > > > IO::String HTTP::Request::Common HTTP::Status LWP::UserAgent > > > > > URI::Escape XML::Parser XML::Parser::PerlSAX XML::Writer XML::Node > > > > > XML::Twig Text::Iconv Scalar::Util XML::DOM SOAP::Lite GD Storable > > > > > Text::Shellwords Data::Stag Graph::Directed > > > > > INST_FILE /root/.cpan/Bundle/BioPerl.pm > > > > > INST_VERSION 2.05 > > > > > > > > This is the new bundle: > > > > > cpan> i Bundle::BioPerl > > > > > Strange distribution name [Bundle::BioPerl] > > > > > Bundle id = Bundle::BioPerl > > > > > CPAN_USERID CRAFFI (Chris Dagdigian ) > > > > > CPAN_VERSION 2.05 > > > > > CPAN_FILE C/CR/CRAFFI/Bundle-BioPerl-2.05.tar.gz > > > > > MANPAGE Bundle::BioPerl - A bundle to install external > > > > > CPAN modules used by BioPerl > > > > > CONTAINS Bundle::LWP File::Temp File::Spec IO::Scalar > > > > > IO::String HTML::Entities HTML::Parser HTTP::Request::Common > > > > > HTTP::Status LWP::UserAgent URI::Escape XML::Parser > > > > > XML::Parser::PerlSAX XML::Writer XML::Node XML::Twig Text::Iconv > > > > > Scalar::Util XML::DOM SOAP::Lite GD GD::SVG SVG Storable > > > > > Text::Shellwords Data::Stag 
Graph::Directed > > > > > INST_FILE /usr/lib/perl5/site_perl/5.8.0/Bundle/BioPerl.pm > > > > > INST_VERSION 2.1.0 > > > > > > > > -Chris > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason at cgt.duhs.duke.edu Mon Dec 22 09:50:36 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Dec 22 09:58:48 2003 Subject: [Bioperl-l] Question about Bio::TreeIO In-Reply-To: <2832592.1072049958942.JavaMail.cpadmin@iowa.ucd.ie> References: <2832592.1072049958942.JavaMail.cpadmin@iowa.ucd.ie> Message-ID: On Sun, 21 Dec 2003, iain wallace wrote: > Hi all, > > I am trying to write a function that can read in a tree, go through > each of the nodes, and tell me how many leaves are a descendent of that > node. > e.g the root node would be an ancestor to all the leaves. > > My code simply looks at all the nodes, and calls get_all_descendents(), > and if any of the descendents is a leaf the count is incremented by > one. However, my code below doesn't seem to work, as it says some nodes > have no descendents which are leaves. Well you're going to get leaf nodes in the outer get_all_Descendents foreach loop which won't have descendents so that should be okay. 
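To illustrate that point, here is a minimal sketch in plain Perl with no BioPerl dependency. The nested-array tree and the count_leaves and label helpers are hypothetical stand-ins, not Bio::Tree methods; the only idea it is meant to show is that leaves contribute to their ancestors' counts and never receive a count of their own, so a zero for a leaf in the original loop is expected rather than a bug.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed tree: an array ref is an internal
# node, a plain string is a leaf. This is NOT the Bio::Tree API.
my $tree = [ [ [ [ 'Bosta2', 'Preen' ], 'Homsa' ], 'Papan' ], 'Equca', 'Ratno1' ];

# Build a printable label for a node from the leaves beneath it.
sub label {
    my $node = shift;
    return ref $node
        ? '(' . join( ',', map { label($_) } @$node ) . ')'
        : $node;
}

# Return the number of leaves at or below $node, and record that number
# in %$counts for every internal node. Leaves are never recorded.
sub count_leaves {
    my ( $node, $counts ) = @_;
    return 1 unless ref $node;    # a leaf has no descendants of its own
    my $total = 0;
    $total += count_leaves( $_, $counts ) for @$node;
    $counts->{ label($node) } = $total;
    return $total;
}

my %counts;
my $root_total = count_leaves( $tree, \%counts );

# Print internal nodes from the smallest clade up to the root.
print "$counts{$_} $_\n" for sort { $counts{$a} <=> $counts{$b} } keys %counts;
```

With real Bio::Tree objects the same shape should apply: skip nodes for which is_Leaf is true in the outer loop, or count upward from the leaves.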
>
> Is there a flaw in my logic, or is my code simply wrong???
>
> Before I go, there is one other question i would like to ask, how can i
> pass a bioperl object (like a node) in this case to a function? I tried
> passing by reference function(\$node) but that wouldn't work for me...
>
> Any help would be great, Thanks
> Iain
>

You could do it bottom up with a little recursive function. This code
doesn't assign the counts, since you need to process the whole tree
before you'll have the final counts (although you could do it as you go,
and assign the value within each node instead of using the %data hash).

You can either update the values as you go, or post-process and use this
code:

for my $id ( keys %data ) {
    my $node = $tree->find_node(-id => $id);
    $node->add_tag_value('count',$data{$id});
}

This seems to work for me - the root node (id=9) has 6 leaf nodes below
it.

#!/usr/bin/perl -w
use strict;
use Bio::TreeIO;

# read a single newick tree from the __DATA__ section below
my $in = new Bio::TreeIO(-fh => \*DATA, -format => 'newick');
my $tree = $in->next_tree;

# internal_id of each ancestor => number of leaves below it
my %data;
for my $node ( $tree->get_leaf_nodes ) {
    &inc_ancestor($node,\%data);
}

for my $id ( sort { $a <=> $b} keys %data ) {
    print $id, " ", $data{$id}, "\n";
}

# walk from a node up to the root, adding one to every ancestor's count
sub inc_ancestor {
    my $node = shift;
    my $data = shift;
    return unless defined $node && defined $node->ancestor;
    my $id = $node->ancestor->internal_id;
    $data->{$id}++;
    &inc_ancestor($node->ancestor,$data);
}

__DATA__
((((Bosta2,Preen),Homsa),Papan),Equca,Ratno1);

>
>
> foreach my $node ( $rootnode->get_all_Descendents() ) {
>
> $count=0;
> foreach my $newnode ( $node-> get_all_Descendents()){
> if($newnode->is_Leaf){
> $count++;
> }
> }
>
> $nodeid=$node->internal_id;
> $node->add_tag_value($count,1);
> if ($count==0){
> print "$nodeid has no leaves....";
> }
> }
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From lstein at
cshl.edu Mon Dec 22 09:57:02 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Dec 22 10:04:42 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: References: Message-ID: <200312220957.02897.lstein@cshl.edu> Hi Juguang, A lot of people are going to be offended at your way of giving advice. Do not get upset if you are flamed. Your advice is good in many ways, but your manner of presenting it is poor. Nobody likes being compared to a high school student, particularly those of us in bioperl who *are* high school students. The bioperl core developers are aware that much of the current framework should be discarded in favor of more sophisticated class definition systems, such as Class::Accessor. However, there are also significant downsides to these systems that need to be discussed in detail. A big problem in using any autoloader-based accessor system is that the UNIVERSAL::can() method will no longer work properly on autoloaded methods. There are also performance penalties. Something that authors of the Perl Bible and other didactic texts seem to miss is that, for better or worse, Perl is fundamentally unlike Java. With Java, one must assume that the superclass acts like a black box and subclasses have no knowledge of its internals. Perl is much more informal, since the internal workings of the superclass are always exposed in both source code form and in subclass-visible data structures. Yes, it's not terrific style to use hash keys that might inadvertently be overwritten by subclasses, but it isn't a fatal error either. It is relatively easy to see the conflict and fix it. I cannot think of a case in which a persistent Bioperl bug turned out to be due to an inadvertently overwritten instance variable. A bigger issue is that changing the accessors is relatively minor compared to a long overdue overhaul of the architecture and underlying data model of the sequence and sequence feature classes. This is a big and important job.
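On the accessor point above, a minimal sketch of one middle ground, offered as an illustration rather than as anything Bioperl does: accessors generated with closures and installed as real named subs, so that can() still finds them, with package-qualified hash keys in the style of the Section 12.7 example quoted in this thread. The class My::Feature and its field names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Feature;    # hypothetical class, not part of BioPerl

sub new { return bless {}, shift }

# Install one real subroutine per field. Because each accessor ends up
# as a named sub in the symbol table, can() finds it, which is not the
# case for methods dispatched through an AUTOLOAD fallback.
for my $field (qw(name start stop)) {
    my $key = __PACKAGE__ . "::$field";    # package-qualified hash key
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self = shift;
        $self->{$key} = shift if @_;       # also accepts an explicit undef
        return $self->{$key};
    };
}

package main;

my $f = My::Feature->new;
$f->name('CG4491-RA');
print $f->name, "\n";
print My::Feature->can('start') ? "can start\n" : "cannot start\n";
```

The generated subs are ordinary methods after the loop has run, so the usual method-call cost applies on each call; whether that trade-off suits a framework as large as Bioperl is a separate question.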
Please don't focus on the trees and miss the forest. Lincoln On Monday 22 December 2003 03:28 am, Juguang Xiao wrote: > Hi all, > > I was Java programmer learning Perl from Bioperl code. I did learn > a lot for you ushers. Recently, I decide to read Programming Perl, > the Perl Bible, and found some common-sense code in bioperl is not > the best. Here are my findings. > > 1) getset > > Months ago, Hilmar gave us a tip for the getter/setter to accept > undef. The code is like > > sub name { > my $self=shift; > return $self->{_name} = shift if @_; > return $self->{_name}; > } > > It is no fault about it until this super or sub module have a > method with different name but use the same hash key. So choosing > the hash key is not on your own ease. Here is the code in Section > 12.7 > > sub name { > my $self = shift; > my $field = __PACKAGE__ . "::name"; > if (@_) { $self->{$field} = shift } > return $self->{$field}; > } > > Your getset generator may need to be updated. ;-) > ############################################## > 2) use vars (@ISA); > > This is copied from Chapter 31. > > 31.21. use vars > > use vars qw($frobbed @munge %seen); > > > This pragma, once used to declare a global variable, is now > somewhat deprecated in favor of the our modifier. The previous > declaration is better accomplished using: > > our($frobbed, @munge, %seen); > > or even: > our $frobbed = "F"; > our @munge = "A" .. $frobbed; > our %seen = (); > > ######################################### > 3) auto getset, again > > I really cannot stand individual getset any more, after I read > Section 12.7. Do yourself a favor, read it, please. One year ago, I > suggest to use AUTOLOAD replace all getset methods. The idea was > mercilessly extinguish. Now I have big boss's support in his book. > Anyone wants to say anything? 
;-) ( I do not mean to use AUTOLOAD > again, but the rest ways in that section should be discussed) Just > do not stay your Perl wisdom and braveness at high school, though > your bioinformatics achievement reach above the Ph. D height. > > I prefer the idea on Section 12.7.4. Generating Accessors with > Closures. It is listsub'able. > > > my $0.02 > > Juguang > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gnf.org Mon Dec 22 11:34:43 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Dec 22 11:42:20 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: Message-ID: On Monday, December 22, 2003, at 12:28 AM, Juguang Xiao wrote: > It is no fault about it until this super or sub module have a method > with different name but use the same hash key. So choosing the hash > key is not on your own ease. Here is the code in Section 12.7 > > sub name { > my $self = shift; > my $field = __PACKAGE__ . "::name"; > if (@_) { $self->{$field} = shift } > return $self->{$field}; > } > Good idea. > Your getset generator may need to be updated. ;-) Care to go ahead and do it? > ############################################## > 2) use vars (@ISA); > > This is copied from Chapter 31. > > 31.21. use vars > > use vars qw($frobbed @munge %seen); > > > This pragma, once used to declare a global variable, is now somewhat > deprecated in favor of the our modifier. The previous declaration is > better accomplished using: > > our($frobbed, @munge, %seen); I once started this and then was scolded for making bioperl incompatible with pre-5.005. But - don't we require 5.005 at minimum anyway meanwhile? > > or even: > our $frobbed = "F"; > our @munge = "A" .. $frobbed; > our %seen = (); > > ######################################### > 3) auto getset, again > > I really cannot stand individual getset any more, after I read Section > 12.7. 
Do yourself a favor, read it, please. One year ago, I suggest to > use AUTOLOAD replace all getset methods. The idea was mercilessly > extinguish. Now I have big boss's support in his book. Anyone wants to > say anything? ;-) ( I do not mean to use AUTOLOAD again, but the rest > ways in that section should be discussed) Just do not stay your Perl > wisdom and braveness at high school, though your bioinformatics > achievement reach above the Ph. D height. > You had big boss' support at that time already. My stance on that is unchanged: I can't see how auto-loaded getter/setters make the code any better or increase coding productivity in any way. Especially once you're debugging. If you can't stand individual (uhm - in fact auto-generated by emacs lisp macros BTW) getter/setters then use auto-loading in your code, but don't be surprised if someone goes in and changes it to getter/setters for code clarity and better debugging. -hilmar > I prefer the idea on Section 12.7.4. Generating Accessors with > Closures. It is listsub'able. > > > my $0.02 > > Juguang > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Dec 22 11:46:37 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Dec 22 11:54:13 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: <200312220957.02897.lstein@cshl.edu> Message-ID: <68ED0AE1-349E-11D8-AE25-000A959EB4C4@gnf.org> On Monday, December 22, 2003, at 06:57 AM, Lincoln Stein wrote: > A big problem in using any autoloader-based accessor system > is that the UNIVERSAL::can() method will no longer work properly on > autoloaded methods Very true, thanks for pointing this out. 
bioperl-db would immediately and completely break, for instance, as it relies entirely on can() for wrapping objects with a persistence shell. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From birney at ebi.ac.uk Mon Dec 22 11:52:56 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon Dec 22 12:00:42 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: Message-ID: > You had big boss' support at that time already. My stance on that is > unchanged: I can't see how auto-loaded getter/setters make the code any > better or increase coding productivity in any way. Especially once > you're debugging. If you can't stand individual (uhm - in fact > auto-generated by emacs lisp macros BTW) getter/setters then use > auto-loading in your code, but don't be surprised if someone goes in > and changes it to getter/setters for code clarity and better debugging. Same here. I think AUTOLOAD should be used v. sparingly. It simply doesn't help readability, stability or speed. emacs macros much better ;) From redwards at utmem.edu Mon Dec 22 21:30:54 2003 From: redwards at utmem.edu (Rob Edwards) Date: Mon Dec 22 21:38:24 2003 Subject: [Bioperl-l] Really Minor (but annoying) things Message-ID: <08348E60-34F0-11D8-9225-000A959E1622@utmem.edu> Here are a couple of really minor things that I picked up recently (and if I don't send them now, I am bound to forget). I can go ahead and submit these changes unless there are objections: Bio::Tools::RemoteBlast The line open(SAVEOUT, ">$filename") or $self->throw("cannot open $filename"); should probably be open(SAVEOUT, ">>$filename") or $self->throw("cannot open $filename"); so that you can put all the BLAST results into one file. 
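The difference between the two open modes can be sketched in a few lines. This is only an illustration using temporary files and made-up report strings; save_output and count_lines are hypothetical helpers, not part of RemoteBlast or any other BioPerl module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not BioPerl): write $text to $filename with the
# given open mode. '>' truncates the file first, '>>' appends to it.
sub save_output {
    my ($filename, $text, $mode) = @_;
    open(my $out, $mode, $filename) or die "cannot open $filename: $!";
    print {$out} $text;
    close($out) or die "cannot close $filename: $!";
}

# Hypothetical helper: count the lines currently stored in $filename.
sub count_lines {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "cannot open $filename: $!";
    my @lines = <$in>;
    close($in);
    return scalar @lines;
}

my (undef, $truncated) = tempfile(UNLINK => 1);
my (undef, $appended)  = tempfile(UNLINK => 1);

# Save three one-line "reports" to each file, one call per report.
for my $n (1 .. 3) {
    save_output($truncated, "report $n\n", '>');
    save_output($appended,  "report $n\n", '>>');
}

print "'>'  keeps ", count_lines($truncated), " report(s)\n";
print "'>>' keeps ", count_lines($appended),  " report(s)\n";
```

With '>' only the last report survives; with '>>' all three end up in the same file, which is the behaviour the proposed change is after.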
Bio::Matrix::PhylipDist->new

The docs are completely wrong for this because the module was updated to
use the new Matrix::IO system, and if it is called as described in the
docs it should either silently pass this information onto the correct
module (Bio::Matrix::IO) or throw a warning.

Rob

From juguang at tll.org.sg  Mon Dec 22 21:40:49 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Mon Dec 22 21:47:55 2003
Subject: [Bioperl-l] some not-so-good perl practice in bioperl
In-Reply-To: <68ED0AE1-349E-11D8-AE25-000A959EB4C4@gnf.org>
Message-ID: <6B0F312F-34F1-11D8-B381-000A957702FE@tll.org.sg>

On Tuesday, December 23, 2003, at 12:46 am, Hilmar Lapp wrote:

>
> On Monday, December 22, 2003, at 06:57 AM, Lincoln Stein wrote:
>
>> A big problem in using any autoloader-based accessor system
>> is that the UNIVERSAL::can() method will no longer work properly on
>> autoloaded methods
>
> Very true, thanks for pointing this out. bioperl-db would immediately
> and completely break, for instance, as it relies entirely on can() for
> wrapping objects with a persistence shell.

Sorry, I may have misled you into thinking of AUTOLOAD again; I should
have come with an example. This time I mean the approach shown below, in
the package Person.

I am using perl 5.8 on Mac OS X 10.2. UNIVERSAL::can is able to detect
accessor functions created this way, since this technique installs true,
individual Perl methods in the symbol table at compile time.

Regards,
Juguang

###############
package Person;

sub new {
    my $invocant = shift;
    my $self = bless({}, ref $invocant || $invocant);
    $self->init();
    return $self;
}

sub init {
    my $self = shift;
    $self->name("unnamed");
    $self->race("unknown");
    $self->aliases([]);
}

for my $field (qw(name race aliases)) {
    my $slot = __PACKAGE__ . "::$field";
    no strict "refs";          # So symbolic ref to typeglob works.

    *$field = sub {
        my $self = shift;
        $self->{$slot} = shift if @_;
        return $self->{$slot};
    };
}

package main;
use Person;
my $he=Person->new();
# extra parentheses so that "\n" is part of print's argument list
print(($he->can('name')?'Y':'N'), "\n");

From juguang at tll.org.sg  Mon Dec 22 22:29:35 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Mon Dec 22 22:36:38 2003
Subject: [Bioperl-l] some not-so-good perl practice in bioperl
In-Reply-To: <200312220957.02897.lstein@cshl.edu>
Message-ID: <3B466B23-34F8-11D8-B381-000A957702FE@tll.org.sg>

On Monday, December 22, 2003, at 10:57 pm, Lincoln Stein wrote:

> Hi Juguang,
>
> A lot of people are going to be offended at your way of giving advice.
> Do not get upset if you are flamed. Your advice is good in many
> ways, but your manner of presenting it is poor. Nobody likes being
> compared to a high school student, particularly those of us in
> bioperl who *are* high school students.

Hi Lincoln,

I respect each developer of bioperl, as I am one of them. Sorry for my
impolite manner.

> The bioperl core developers are aware that much of the current
> framework should be discarded in favor of more sophisticated class
> definition systems, such as Class::Accessor. However, there are also
> significant downsides to these systems that need to be discussed in
> detail. A big problem in using any autoloader-based accessor system
> is that the UNIVERSAL::can() method will no longer work properly on
> autoloaded methods. There are also performance penalties.

As in my previous reply, there is no performance penalty in that method,
since it takes place at compile time. Sorry again for misleading you
into remembering AUTOLOAD.

> Something that authors of the Perl Bible and other didactic texts seem
> to miss is that for better or worse Perl is fundamentally unlike
> Java. With Java, one must assume that the superclass acts like a
> black box and subclasses have no knowledge of its internals.
Perl is > much more informal, since the internal workings of the superclass are > always exposed in both source code form and in subclass-visible data > structures. Yes, it's not terrific style to use hash keys that might > inadvertently be overwritten by subclasses, but it isn't a fatal > error either. It is relatively easy to see the conflict and fix it. I > cannot think of a case in which a persistent Bioperl bug turned out > to be due to an inadvertently overwritten instance variable. No, there is no bug because of it, ... so far. > A bigger issue is that changes to the accessors is relatively minor > compared to a long overdue overhaul of the architecture and > underlying data model of the sequence and sequence feature classes. > This is a big and important job. Please don't focus on the trees and > miss the forest. The coders are at their own ease and risk to have the suggested way in the own package. Architecture is not bothered at all, I think. Well, my experience may be shallow, however, I am in favor of central control, which also means high reusability in some sense. If we can have some smart shortcut for generate and *maintain* the cheap accessors, then we will have more time with our biological logic code. That is my hope. Thanks for guidance from all you guys this year. Merry Christmas! Juguang From hlapp at gnf.org Mon Dec 22 22:58:50 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Dec 22 23:06:25 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: <6B0F312F-34F1-11D8-B381-000A957702FE@tll.org.sg> Message-ID: <50E65439-34FC-11D8-A0FE-000A959EB4C4@gnf.org> What if you overrode a method in this way? Would you still be able to call $obj->SUPER::overridden_method() or would you have messed up the inherited symbol table? 
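One way to check this is a small experiment. The sketch below reuses the closure-generated accessors from the Person example and adds Employee, a made-up subclass used only for illustration, whose name method is also installed by glob assignment. It relies on the fact that Perl resolves SUPER:: relative to the package in which the calling sub was compiled, so an override written under "package Employee" should still reach the accessor generated in Person.

```perl
use strict;
use warnings;

package Person;

sub new {
    my $invocant = shift;
    return bless {}, ref($invocant) || $invocant;
}

# Generate one accessor per field and install it in Person's symbol table.
for my $field (qw(name race)) {
    my $slot = __PACKAGE__ . "::$field";    # e.g. "Person::name"
    no strict 'refs';                       # symbolic ref to the typeglob
    *{$slot} = sub {
        my $self = shift;
        $self->{$slot} = shift if @_;
        return $self->{$slot};
    };
}

package Employee;    # hypothetical subclass, for illustration only
our @ISA = ('Person');

{
    no strict 'refs';
    # Override 'name' the same way. The anonymous sub is compiled in
    # package Employee, so SUPER:: searches @Employee::ISA.
    *{'Employee::name'} = sub {
        my $self = shift;
        my $name = $self->SUPER::name(@_);
        return defined $name ? "Dr. $name" : undef;
    };
}

package main;

my $e = Employee->new;
$e->name('Smith');
print $e->name, "\n";

# Inherited generated accessors remain visible to can().
my $can_race = Employee->can('race');
print(($can_race ? 'Y' : 'N'), "\n");
```

The override stores the plain value under the key "Person::name" through the parent accessor and decorates it on the way out, so the parent's entry in the symbol table is left untouched.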
-hilmar On Monday, December 22, 2003, at 06:40 PM, Juguang Xiao wrote: > > On Tuesday, December 23, 2003, at 12:46 am, Hilmar Lapp wrote: > >> >> On Monday, December 22, 2003, at 06:57 AM, Lincoln Stein wrote: >> >>> A big problem in using any autoloader-based accessor system >>> is that the UNIVERSAL::can() method will no longer work properly on >>> autoloaded methods >> >> Very true, thanks for pointing this out. bioperl-db would immediately >> and completely break, for instance, as it relies entirely on can() >> for wrapping objects with a persistence shell. > > Sorry, I may mislead you to remember the AUTOLOAD again, and should > come with an example. This time I mean the methodology exampled below, > the package Person. > > I am using perl 5.8, on Mac OSX 10.2. UNIVERSAL::can is able to detect > the accessor functions in this way, since this tip makes the truly and > individual perl methods in symbol table, at compile time. > > Regards, > Juguang > > ############### > package Person; > > sub new { > my $invocant = shift; > my $self = bless({}, ref $invocant || $invocant); > $self->init(); > return $self; > } > > sub init { > my $self = shift; > $self->name("unnamed"); > $self->race("unknown"); > $self->aliases([]); > } > > for my $field (qw(name race aliases)) { > my $slot = __PACKAGE__ . "::$field"; > no strict "refs"; # So symbolic ref to typeglob works. > > *$field = sub { > my $self = shift; > $self->{$slot} = shift if @_; > return $self->{$slot}; > }; > } > > package main; > use Person; > my $he=Person->new(); > print ($he->can('name')?'Y':'N'), "\n"; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From juguang at tll.org.sg Tue Dec 23 02:00:04 2003 From: juguang at tll.org.sg (Juguang Xiao) Date: Tue Dec 23 02:07:07 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: <50E65439-34FC-11D8-A0FE-000A959EB4C4@gnf.org> Message-ID: On Tuesday, December 23, 2003, at 11:58 am, Hilmar Lapp wrote: > What if you overrode a method in this way? Would you still be able to > call $obj->SUPER::overridden_method() or would you have messed up the > inherited symbol table? > Good question. I am recently hacking the Perl symbol table and able to answer this, hopefully correctly. :-) $obj->SUPER::overridden_method() means SUPER::overriden($obj), which will look up the symbol table of SUPER, say Parent, the most left module in @ISA of this $obj module, and find the code entry that is \&{Parent::overriden_method}, and pass $obj as the first argument. *pkg::sym{SCALAR} # same as \$pkg::sym *pkg::sym{ARRAY} # same as \@pkg::sym *pkg::sym{HASH} # same as \%pkg::sym *pkg::sym{CODE} # same as \&pkg::sym >> >> for my $field (qw(name race aliases)) { >> my $slot = __PACKAGE__ . "::$field"; >> no strict "refs"; # So symbolic ref to typeglob works. >> >> *$field = sub { >> my $self = shift; >> $self->{$slot} = shift if @_; >> return $self->{$slot}; >> }; this does the same as your hard source code, say, sub name. *{Person::name} = sub {...}; also puzzlingly same as *{main::Person::name}= sub {...}; The either sub or super class of Person will have different symbol key prefix, because what Perl looks at is which package a method is defined in. If the Employee inherits Person, and you have the code like, package Employee; our @ISA=qw(Person); # this line does not affect the symbol table when we search 'name' *name = sub {...}; # the consequence will be adding a *Employee::name{CODE} into Employee symbol table. 
1;

The symbol table is constructed like this, as I understand it:

{
    Employee::name => {
        CODE => \sub {},
    },
    Employee::ISA => {
        ARRAY => [Person],  # defined as our @ISA, can be used as @{Employee::ISA}
    },
    Person::name => {
        CODE => \sub {}
    }
}

Well, we are starting to dive into hackier territory. We all should read
Programming Perl Chapter 10..12 again, if we want to go on. Let me know
if your understanding is different from mine. Thanks.

Again, you do this at your own ease and risk.

Juguang

>> }
>>
>> package main;
>> use Person;
>> my $he=Person->new();
>> print ($he->can('name')?'Y':'N'), "\n";

From heikki at nildram.co.uk  Tue Dec 23 04:11:35 2003
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Tue Dec 23 04:19:06 2003
Subject: [Bioperl-l] Bioperl Release 1.4
Message-ID: <200312230911.35200.heikki@nildram.co.uk>

Bioperl Release 1.4
--------------------

The stable Bioperl release 1.4 is available for immediate use at:

  http://bioperl.org/DIST

We are simultaneously releasing three modules:

  bioperl-core - core bioperl modules
     http://bioperl.org/DIST/current_core_stable.tar.gz
     http://bioperl.org/DIST/current_core_stable.tar.bz2
  bioperl-ext  - C compiled extensions
     http://bioperl.org/DIST/current_ext_stable.tar.gz
     http://bioperl.org/DIST/current_ext_stable.tar.bz2
  bioperl-run  - wrappers for external programs
     http://bioperl.org/DIST/current_run_stable.tar.gz
     http://bioperl.org/DIST/current_run_stable.tar.bz2

They will also appear shortly at the IUBIO mirror (later today) and in
CPAN.

Remember, all the external modules needed by bioperl-core can be
installed from CPAN under the name Bundle-BioPerl.

Changes
-------

Over 3000 file changes have gone into this release since the 1.2
development tree was branched off from the main. These are the main
feature enhancements:

 o installable scripts
 o global module version from Bio::Root::Version
 o Bio::Graphics - major improvements; added SVG support
 o Bio::Popgen - population genetics
 o Bio::Restriction - new restriction analysis modules
 o Bio::Tools::Analysis - web based DNA and Protein analysis
   framework and several implementations
 o Bio::Seq::Meta - per residue annotatable sequences
 o Bio::Matrix
 o Bio::Matrix::PSM - Position Scoring Matrix
 o Bio::Ontology - major contributions
 o Bio::Tree
 o Bio::Tools::SiRNA, Bio::SeqFeature::SiRNA - small inhibitory RNA
 o Bio::SeqFeature::Tools - seqFeature mapping tools,
   e.g. Bio::SeqFeature::Tools::Unflattener.pm
 o Bio::Tools::dpAlign - pure perl dynamic programming sequence
   alignment (needs Bioperl-ext)
 o new Bio::SearchIO formats
 o new Bio::SeqIO formats: tab, kegg, tigr, game; important fixes for
   old modules
 o Bio::AlignIO: maf
 o improved Bio::Tools::Genewise
 o Bio::SeqIO now can recognize sequence formats automatically from
   stream
 o new parsers in Bio::Tools:
   Blat, Geneid, Lagan, Mdust, Promoterwise, PrositeScan
 o several new HOWTOs: SimpleWebAnalysis, Trees, Feature Annotation,
   OBDA Access, Flat Databases
 o hundreds of new and improved files

For detailed documentation, see individual module documentation in the
distribution or in http://doc.bioperl.org/. The tutorials are
available at http://bioperl.org/HOWTOs/.

This release is a result of hard work by the bioperl core team, nearly
a hundred developers and countless suggestions and bug reports at the
bioperl mailing list (bioperl-l@bioperl.org) or the bioperl bug
tracking system (http://bugzilla.bioperl.org/).
Wishing you all Peaceful Christmas, -Heikki and all the bioperl developers From hlapp at gmx.net Tue Dec 23 04:20:53 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Dec 23 04:28:26 2003 Subject: [Bioperl-l] Bioperl Release 1.4 In-Reply-To: <200312230911.35200.heikki@nildram.co.uk> Message-ID: <4E84F453-3529-11D8-A0FE-000A959EB4C4@gmx.net> Congratulations Heikki, you got this out the door ready for being wrapped and put under the tree :) Great work. -hilmar On Tuesday, December 23, 2003, at 01:11 AM, Heikki Lehvaslaiho wrote: > > Bioperl Release 1.4 > -------------------- > > The stable Bioperl release 1.4 is available for immediate use at: > > http://bioperl.org/DIST > > We are releasing simultaneously three modules: > > bioperl-core - core bioperl modules > http://bioperl.org/DIST/current_core_stable.tar.gz > http://bioperl.org/DIST/current_core_stable.tar.bz2 > bioperl-ext - C compiled extensions > http://bioperl.org/DIST/current_ext_stable.tar.gz > http://bioperl.org/DIST/current_ext_stable.tar.bz2 > bioperl-run - wrappers for external programs > http://bioperl.org/DIST/current_run_stable.tar.gz > http://bioperl.org/DIST/current_run_stable.tar.bz2 > > > They will also appear shortly at IUBIO mirror > (later today) > and in CPAN . > > Remember, all the external modules needed by bioperl-core can be > installed from CPAN under name Bundle-BioPerl > . > > > Changes > ------- > > Over 3000 file changes have gone into this release since the 1.2 > development tree was branched off from the main. 
These are the > main feature enhancements: > > o installable scripts > o global module version from Bio::Root:Version > o Bio::Graphics - major improvements; added SVG support > o Bio::Popgen - population genetics > o Bio::Restriction - new restrion analysis modulues > o Bio::Tools::Analysis - web based DNA and Protein analysis > framework and several implementaions > o Bio::Seq::Meta - per residue annotable sequences > o Bio::Matrix > o Bio::Matrix::PSM - Position Scoring Matrix > o Bio::Ontology - major contributions > o Bio:Tree > o Bio::Tools::SiRNA, Bio::SeqFeature::SiRNA - small inhibitory RNA > o Bio::SeqFeature::Tools - seqFeature mapping tools, > e.g. Bio::SeqFeature::Tools::Unflattener.pm > o Bio::Tools::dpAlign - pure perl dynamic programming sequence > alignment > (needs Bioperl-ext) > o new Bio::SearchIO formats > o new Bio::SeqIO formats: tab, kegg, tigr, game; important fixes for > old modulues > o Bio::AlignIO: maf > o improved Bio::Tools::Genewise > o Bio::SeqIO now can recognize sequence formats automatically from > stream > o new parsers in Bio::Tools: > Blat, Geneid, Lagan, Mdust, Promoterwise, PrositeScan, > o several new HOWTOs: SimpleWebAnalysis, Trees, Feature Annotation, > OBDA Access, Flat Databases > o hundreds of new and improved files > > For detailed documentation, see individual module documentation in the > distribution or in http://doc.bioperl.org/. The tutorials are > available at http://bioperl.org/HOWTOs/. > > > This release is a result of hard work by the bioperl core team, nearly > hundred developers and countless suggestions and bug reports at the > bioperl mailing list (bioperl-l@bioperl.org) or the the bioperl bug > tracking system (http://bugzilla.bioperl.org/). 
> > > Wishing you all Peaceful Christmas, > > -Heikki and all the bioperl developers > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From juguang at tll.org.sg Tue Dec 23 04:33:19 2003 From: juguang at tll.org.sg (Juguang Xiao) Date: Tue Dec 23 04:40:21 2003 Subject: [Bioperl-l] versioning Message-ID: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> Hi list, I think this is more meaningful suggestion than what I had previously. Most of Bioperl modules does not have $VERSION, which is encouraged by CPAN community. Should we make it in release 1.4? I am aware that there are an array of emails in this list about this versioning, but they do not pass the simple condition, perl -MExtUtils::MakeMaker -le 'print MM->parse_version(shift)' 'file' in http://www.cpan.org/modules/04pause.html#conventions I have written a quite short script, maintenance/version.pl, in CVS now. It append our $VERSION="1.4"; after the package declaration line. It changes almost all bioperl modules, except 1.4 /home/juguang/src/bioperl-live//Bio/Root/Version.pm 0.50 /home/juguang/src/bioperl-live//Bio/Tools/dpAlign.pm 1.15 /home/juguang/src/bioperl-live//t/Test.pm After the changes, all my tests pass, except t/RestrictionIO, it never passes on my MacOS 10.2.8, Perl 5.6. I do not think other way except writing 'our $VERSION="1.4";' line explicitly in each file. Since we should do this before compile time. Also I don't think it is good to let Makefile to do it for us, since it should be done before distribution. Let me know if you like this idea and script. Core guys, you can run it on your machine and commit 700 modules!, or I can do it, if you do not say no. 
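The script itself is not shown in this thread, so the following is only a guess at the approach it describes: insert a $VERSION line right after the package declaration. The helper name add_version and the sample module text are hypothetical, and the sketch works on a string rather than on files in a checkout.

```perl
use strict;
use warnings;

# Hypothetical helper (not the real maintenance/version.pl): return the
# module source with a $VERSION line added after the first package
# declaration. Source that already assigns $VERSION is returned unchanged.
sub add_version {
    my ($source, $version) = @_;

    # Skip modules that set their own version, with or without 'our'.
    return $source if $source =~ /^\s*(?:our\s+)?\$VERSION\s*=/m;

    # No /g modifier: only the first package declaration gets the line.
    $source =~ s/^(package\s+[\w:]+\s*;[^\n]*\n)/${1}our \$VERSION = "$version";\n/m;
    return $source;
}

my $module = "package Bio::Example;\nuse strict;\n1;\n";
print add_version($module, '1.4');
```

For the sample text this prints the package line, then the new $VERSION line, then the rest of the module. Because the inserted line uses "our", it needs perl 5.6 or later; an older perl would need a "use vars" declaration instead.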
Juguang From birney at ebi.ac.uk Tue Dec 23 04:44:08 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Dec 23 04:51:39 2003 Subject: [Bioperl-l] Bioperl Release 1.4 In-Reply-To: <200312230911.35200.heikki@nildram.co.uk> Message-ID: Congratulations Heikki in 1.4 release. Very nice indeed... The Bioperl 1.4 release is uploaded to CPAN and should be on your local CPAN mirror in 1 or 2 days. ewan From birney at ebi.ac.uk Tue Dec 23 04:45:08 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Dec 23 04:52:43 2003 Subject: [Bioperl-l] versioning In-Reply-To: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> Message-ID: Juguang; Heikki made some deliberate policy about versions for 1.4, so hold off doing anything until he can respond. From lstein at cshl.edu Tue Dec 23 08:49:23 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Tue Dec 23 08:56:54 2003 Subject: [Bioperl-l] Bioperl Release 1.4 In-Reply-To: References: Message-ID: <200312230849.23980.lstein@cshl.edu> Merry Christmas/Hanukkah/Winter Solstice to all. What a marvelous present to the community. Lincoln On Tuesday 23 December 2003 04:44 am, Ewan Birney wrote: > Congratulations Heikki in 1.4 release. Very nice indeed... > > > The Bioperl 1.4 release is uploaded to CPAN and should be on your > local CPAN mirror in 1 or 2 days. > > > ewan > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason at cgt.duhs.duke.edu Tue Dec 23 10:11:03 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 23 10:18:38 2003 Subject: [Bioperl-l] versioning In-Reply-To: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> References: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> Message-ID: Keeping a VERSION variable up to date in every single bioperl module (711 right now) is a severe PITA. find Bio -name '*.pm' | wc -l 711 So in order to simplify this, Aaron wrote Bio::Root::Version. 
=head1 NAME Bio::Root::Version - provide global, distribution-level versioning =head1 DESCRIPTION This module provides a mechanism by which all other BioPerl modules can share the same $VERSION, without manually synchronizing each file. Bio::Root::RootI itself uses this module, so any module that directly (or indirectly) uses Bio::Root::RootI will get a global $VERSION variable set if it's not already. perl -e 'use Bio::Seq; print $Bio::Seq::VERSION, "\n"' perl -e 'use Bio::SeqIO; print $Bio::SeqIO::VERSION, "\n"'; On Tue, 23 Dec 2003, Juguang Xiao wrote: > Hi list, > > I think this is more meaningful suggestion than what I had previously. > Most of Bioperl modules does not have $VERSION, which is encouraged by > CPAN community. Should we make it in release 1.4? I am aware that there > are an array of emails in this list about this versioning, but they do > not pass the simple condition, > > perl -MExtUtils::MakeMaker -le 'print MM->parse_version(shift)' 'file' > > in http://www.cpan.org/modules/04pause.html#conventions > > I have written a quite short script, maintenance/version.pl, in CVS > now. It append > > our $VERSION="1.4"; > > after the package declaration line. It changes almost all bioperl > modules, except > > 1.4 > /home/juguang/src/bioperl-live//Bio/Root/Version.pm > 0.50 > /home/juguang/src/bioperl-live//Bio/Tools/dpAlign.pm > 1.15 > /home/juguang/src/bioperl-live//t/Test.pm > > After the changes, all my tests pass, except t/RestrictionIO, it never > passes on my MacOS 10.2.8, Perl 5.6. > > I do not think other way except writing 'our $VERSION="1.4";' line > explicitly in each file. Since we should do this before compile time. > Also I don't think it is good to let Makefile to do it for us, since it > should be done before distribution. > > Let me know if you like this idea and script. Core guys, you can run it > on your machine and commit 700 modules!, or I can do it, if you do not > say no. 
> > Juguang > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From andreas.bernauer at gmx.de Wed Dec 17 14:53:51 2003 From: andreas.bernauer at gmx.de (Andreas Bernauer) Date: Tue Dec 23 10:20:30 2003 Subject: [Bioperl-l] substitution matrices Message-ID: <20031217195351.GD13034@hgt.mcb.uconn.edu> Hi, Is there a module with which I can calculate substitution matrices (out of alignments)? I've searched through bioperl.org but I couldn't really find anything. If there is no such module, can anybody of you point me to a website or alike where I can start to search for such a program, i.e. a program that calculates substitution matrices like Dayhoff's PAM matrix or JTT did out of alignments? I am relatively new to this field and I've been searching for a couple of days for such a program, but I couldn't find anyone. The suits that are usually used and that I've found (like EMBOSS) only use the matrices but don't seem to create them. I'll appreciate any hint. Thank you very much for you time. Andreas. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031217/093eb440/attachment.bin From heikki at nildram.co.uk Tue Dec 23 10:54:25 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Dec 23 11:01:55 2003 Subject: [Bioperl-l] versioning In-Reply-To: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> References: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> Message-ID: <200312231554.25145.heikki@nildram.co.uk> Juguang, We've had quite a lot of trouble with $VERSION. Pre-1.4 we did not have any consistent system. 
I was planning a script based system probably quite similar to yours when Aaron came up with Bio::Root::Version. Now all modules get their version from Bio::Root::Version (via Bio::Root::RootI). This system has its limitations: most importantly, you can not query the verson from multiple modules. However, getting a consistent answer from any module using, e.g. perl -MBio::Perl -le 'print Bio::Perl->VERSION;' is quite an advancement in my opinion. The problem I can see in your implementation is that perl 5.005 that we are still supporting, does not know 'our'. (like pointed out by Hilmar yesterday) So again, I'd like to err to the conservative side rather than change for its own sake. Let's give the current version implementation a chance, but keep your script in mind for the future. If the current system does not work in practise,we could change it for the next release. It would be nice to modernize bioperl a bit. I guess we should poll our users if we could drop supporting versions pre 5.6. Have a good holidays, -Heikki On Tuesday 23 Dec 2003 9:33 am, Juguang Xiao wrote: > Hi list, > > I think this is more meaningful suggestion than what I had previously. > Most of Bioperl modules does not have $VERSION, which is encouraged by > CPAN community. Should we make it in release 1.4? I am aware that there > are an array of emails in this list about this versioning, but they do > not pass the simple condition, > > perl -MExtUtils::MakeMaker -le 'print MM->parse_version(shift)' 'file' > > in http://www.cpan.org/modules/04pause.html#conventions > > I have written a quite short script, maintenance/version.pl, in CVS > now. It append > > our $VERSION="1.4"; > > after the package declaration line. 
It changes almost all bioperl > modules, except > > 1.4 > /home/juguang/src/bioperl-live//Bio/Root/Version.pm > 0.50 > /home/juguang/src/bioperl-live//Bio/Tools/dpAlign.pm > 1.15 > /home/juguang/src/bioperl-live//t/Test.pm > > After the changes, all my tests pass, except t/RestrictionIO, it never > passes on my MacOS 10.2.8, Perl 5.6. > > I do not think other way except writing 'our $VERSION="1.4";' line > explicitly in each file. Since we should do this before compile time. > Also I don't think it is good to let Makefile to do it for us, since it > should be done before distribution. > > Let me know if you like this idea and script. Core guys, you can run it > on your machine and commit 700 modules!, or I can do it, if you do not > say no. > > Juguang > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________
From pm66 at nyu.edu Tue Dec 23 11:01:13 2003 From: pm66 at nyu.edu (Philip MacMenamin) Date: Tue Dec 23 11:06:25 2003 Subject: [Bioperl-l] wormbase115/Bio::DB::GFF::Aggregator wormbase_transcript problem Message-ID: <200312231558.hBNFwsX8017788@mx4.nyu.edu> Hi, Previously I ran the following code to draw a curated gene with UTRs hanging on the ends (attached): my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'transcript', -sub_parts => ['UTR:UTR','exon:curated','CDS:curated'] ); my $db = new Bio::DB::GFF(-adaptor=>'dbi::mysqlopt', -dsn=>'dbi:mysql:wormbase115Mod;host=localhost', -user=>'philip', -pass=> $passwd, -aggregator =>$aggregator ) or die(); my @all_transcripts = $searchSeg->features('transcript'); if (scalar @all_transcripts ) { $panel->add_track(wormbase_transcript=>\@all_transcripts, -bgcolor => 'wheat', -fgcolor => 'black', -forwardcolor => 'blue', -reversecolor => 'blue', -spacing => 0, -utr_color => '#D0D0D0', -font2color => 'blue',
-height => 10, -description => 1, -label => 1, -key => "Curated genes"); } This worked fine. With the latest release of wormbase this code does not work. So I have changed the -sub_parts arg for the Aggregator object to -sub_parts => ['UTR:UTR','exon:curated','CDS:curated'] which gets rid of some things that I don't want. However, I cannot get it to draw the UTRs actually hanging on the ends of the gene; it stacks/bumps them now. They do not overlap in their start/stop coordinates. ??? Thanks. -- Philip MacMenamin From jason at cgt.duhs.duke.edu Tue Dec 23 11:14:15 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 23 11:21:46 2003 Subject: [Bioperl-l] wormbase115/Bio::DB::GFF::Aggregator wormbase_transcript problem In-Reply-To: <200312231558.hBNFwsX8017788@mx4.nyu.edu> References: <200312231558.hBNFwsX8017788@mx4.nyu.edu> Message-ID: I'm sure Lincoln will answer better in full, but have you tried the processed_transcript aggregator instead? There is a wormbase_gene aggregator as part of Gbrowse as well. From Bio::DB::GFF::Aggregator::wormbase_gene which is part of Gbrowse =head1 DESCRIPTION Bio::DB::GFF::Aggregator::wormbase_gene is one of the default aggregators, and was written to be compatible with the C elegans GFF files. It aggregates raw "CDS", "5'UTR", "3'UTR", "polyA" and "TSS" features into "transcript" features. For compatibility with the idiosyncrasies of the Sanger GFF format, it expects that the full range of the transcript is contained in a main feature of type "Sequence".
-jason On Tue, 23 Dec 2003, Philip MacMenamin wrote: > Hi, > Previously I ran the following code to draw a curated gene with UTRs > hanging on the ends (attached): > > my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'transcript', > -sub_parts => ['UTR:UTR','exon:curated','CDS:curated'] > ); > my $db = new Bio::DB::GFF(-adaptor=>'dbi::mysqlopt', > -dsn=>'dbi:mysql:wormbase115Mod;host=localhost', > -user=>'philip', > -pass=> $passwd, > -aggregator =>$aggregator > ) or die(); > my @all_transcripts = $searchSeg->features('transcript'); > if (scalar @all_transcripts ) > { > $panel->add_track(wormbase_transcript=>\@all_transcripts, > -bgcolor => 'wheat', > -fgcolor => 'black', > -forwardcolor => 'blue', > -reversecolor => 'blue', > -spacing => 0, > -utr_color => '#D0D0D0', > -font2color => 'blue', > -height => 10, > -description => 1, > -label => 1, > -key => "Curated genes"); > } > > This worked fine. > > With the latest release of wormbase this code does not work. > So I have changed the -sub_parts arg for the Aggregator object to > -sub_parts => ['UTR:UTR','exon:curated','CDS:curated'] > which gets rid of some things that I dont want. However, I cannot get it to > draw the UTRs actually hanging on the ends of the gene, it stacks/bumps them > now. They do not over lap in their start /stop co-ords. > > ??? > Thanks. > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Tue Dec 23 11:15:44 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 23 11:23:14 2003 Subject: [Bioperl-l] Really Minor (but annoying) things In-Reply-To: <08348E60-34F0-11D8-9225-000A959E1622@utmem.edu> References: <08348E60-34F0-11D8-9225-000A959E1622@utmem.edu> Message-ID: Wish you could have sent this pre-1.4 release Rob and would have gotten fixed... On Mon, 22 Dec 2003, Rob Edwards wrote: > Here are a couple of really minor things that I picked up recently (and > if I don't send them now, I am bound to forget). 
I can go ahead and > submit these changes unless there are objections: > > Bio::Tools::RemoteBlast > > The line > open(SAVEOUT, ">$filename") or $self->throw("cannot open $filename"); > should probably be > open(SAVEOUT, ">>$filename") or $self->throw("cannot open $filename"); > so that you can put all the BLAST results into one file. > > Bio::Matrix::PhylipDist->new > > the docs are completely wrong for this because the module was updated > to use the new Matrix::IO system, and if it is called as described in > the docs it should either silently pass this information onto the > correct module (Bio::Matrix::IO) or throw a warning. > I'm the culprit here - I'll look into what can be done - we really needed to centralize this code since we had 2 different places doing this sort of parsing. > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From amackey at pcbi.upenn.edu Tue Dec 23 11:16:00 2003 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Tue Dec 23 11:23:29 2003 Subject: [Bioperl-l] versioning In-Reply-To: <200312231554.25145.heikki@nildram.co.uk> References: <0AF39022-352B-11D8-9746-000A957702FE@tll.org.sg> <200312231554.25145.heikki@nildram.co.uk> Message-ID: <4C12AF53-3563-11D8-BDBE-000A958C5008@pcbi.upenn.edu> To be fair, the problem Juguang is pointing out is that the method in which ExtUtils (and therefore CPAN) uses to determine the version of an already-installed module is not handled (nor can it be) by Bio::Root::Version. 
Therefore, if someone else's package "Foo" has a Makefile.PL that specifies a requirement for Bio::Perl version 1.4, when CPAN tries to install the "Foo" module it may complain that the right version isn't present (I believe, however, that when a requisite version cannot be determined by parsing, it doesn't prompt for (re)installation; furthermore, the "correct" answer is to require Bio::Root::Version 1.4). But since 1.4 is out in the wild now, we can only sit back and see what happens. -Aaron On Dec 23, 2003, at 10:54 AM, Heikki Lehvaslaiho wrote: > Juguang, > > We've had quite a lot of trouble with $VERSION. Pre-1.4 we did not > have any > consistent system. I was planning a script based system probably quite > similar to yours when Aaron came up with Bio::Root::Version. > > Now all modules get their version from Bio::Root::Version (via > Bio::Root::RootI). This system has its limitations: most importantly, > you can > not query the verson from multiple modules. However, getting a > consistent > answer from any module using, e.g. > > perl -MBio::Perl -le 'print Bio::Perl->VERSION;' > > is quite an advancement in my opinion. > > The problem I can see in your implementation is that perl 5.005 that > we are > still supporting, does not know 'our'. (like pointed out by Hilmar > yesterday) > > So again, I'd like to err to the conservative side rather than change > for its > own sake. Let's give the current version implementation a chance, but > keep > your script in mind for the future. If the current system does not > work in > practise,we could change it for the next release. > > It would be nice to modernize bioperl a bit. I guess we should poll > our users > if we could drop supporting versions pre 5.6. > > Have a good holidays, > > -Heikki > > On Tuesday 23 Dec 2003 9:33 am, Juguang Xiao wrote: >> Hi list, >> >> I think this is more meaningful suggestion than what I had previously. 
>> Most of Bioperl modules does not have $VERSION, which is encouraged by >> CPAN community. Should we make it in release 1.4? I am aware that >> there >> are an array of emails in this list about this versioning, but they do >> not pass the simple condition, >> >> perl -MExtUtils::MakeMaker -le 'print MM->parse_version(shift)' >> 'file' >> >> in http://www.cpan.org/modules/04pause.html#conventions >> >> I have written a quite short script, maintenance/version.pl, in CVS >> now. It append >> >> our $VERSION="1.4"; >> >> after the package declaration line. It changes almost all bioperl >> modules, except >> >> 1.4 >> /home/juguang/src/bioperl-live//Bio/Root/Version.pm >> 0.50 >> /home/juguang/src/bioperl-live//Bio/Tools/dpAlign.pm >> 1.15 >> /home/juguang/src/bioperl-live//t/Test.pm >> >> After the changes, all my tests pass, except t/RestrictionIO, it never >> passes on my MacOS 10.2.8, Perl 5.6. >> >> I do not think other way except writing 'our $VERSION="1.4";' line >> explicitly in each file. Since we should do this before compile time. >> Also I don't think it is good to let Makefile to do it for us, since >> it >> should be done before distribution. >> >> Let me know if you like this idea and script. Core guys, you can run >> it >> on your machine and commit 700 modules!, or I can do it, if you do not >> say no. >> >> Juguang >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From heikki at nildram.co.uk Tue Dec 23 11:52:01 2003 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Tue Dec 23 11:59:30 2003 Subject: [Bioperl-l] Reminder to post 1.4 bioperl developers Message-ID: <200312231652.01451.heikki@nildram.co.uk> Dear developers, I've added tag 'branch-1-4' to cvs head. (I know I should have added it yesterday together with 'bioperl-release-1-4-0' tag but here we go). It is highly unlikely that there will be a stable release 1.8 or 2.0 in near future. Therefore, all bug fixes need to be done in both CVS HEAD and 'branch-1-4'. New developments should go into CVS HEAD, only. While it is possible to merge changes from branch to HEAD and vice versa, with a project this big it is better that changes are merged right after changes are stable in branch (or HEAD) by the developer responsible for them. You can use this command line to check out the new branch: cvs -d :ext:LOGIN@pub.open-bio.org:/home/repository/bioperl co \ -d bioperl-1.4 -r branch-1-4 bioperl-live Happy holidays and bug hunting, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From pm66 at nyu.edu Tue Dec 23 12:07:26 2003 From: pm66 at nyu.edu (Philip MacMenamin) Date: Tue Dec 23 12:12:39 2003 Subject: Fwd: Re: [Bioperl-l] wormbase115/Bio::DB::GFF::Aggregator wormbase_transcript problem Message-ID: <200312231704.hBNH4Aiq012430@mx6.nyu.edu> > I'm sure Lincoln will answer better in full, but have you tried the > processed_transcript aggregator instead? I hadn't. But I just did, and this one draws the whole gene, with the introns, the UTRs and the exons, as one solid bar. All on the same plane, as I would like, but not differentiating intron, exon or UTR. > There is a wormbase_gene aggregator as part of Gbrowse as well? This doesn't seem to do the right thing either. If left alone, the Bio::DB::GFF::Aggregator::transcript seems to leave out the UTRs entirely. my $aggregator = Bio::DB::GFF::Aggregator->new(-method => 'transcript', -sub_parts => ['UTR:UTR','exon:curated','CDS:curated'] ); I know that the wormbase GFF files are different, and this seems to be the cause of this lack of UTRs in the original Bio::DB::GFF::Aggregator::transcript. Maybe the Aggregator above works fine, and it's just that the way that I am drawing it is off? i.e.: if (scalar @all_transcripts ) { $panel->add_track(wormbase_transcript=>\@all_transcripts, etc. } -- Philip MacMenamin From donald.jackson at bms.com Tue Dec 23 13:54:26 2003 From: donald.jackson at bms.com (Donald G. Jackson) Date: Tue Dec 23 14:01:52 2003 Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects Message-ID: <3FE88F62.5000105@bms.com> Hi, I'm working on a blast wrapper using Bio::SearchIO. I'd like to be able to sort through the hits by something besides score/input order. For example, we've crammed the taxid into the FASTA header and would like to do something like NCBI's taxblast where hits are sorted by source organism, then by score.
I'd like to use the Bio::SearchIO::HTMLResultWriter to output my hits, so I can't just get all the hits and sort them myself.

I thought I'd seen mention of how to do this, but looking over the docs and (1.2.3) code I can't find it. Does anyone have thoughts on how to do this?

Thanks,

Don Jackson
BMS Bioinformatics

From jason at cgt.duhs.duke.edu Tue Dec 23 16:20:56 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Dec 23 16:28:26 2003
Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects
In-Reply-To: <3FE88F62.5000105@bms.com>
References: <3FE88F62.5000105@bms.com>
Message-ID:

So Don - you want to apply a custom hit sorting routine to the data before it is output with SearchIO::HTMLResultWriter?

The simplest way (albeit cheating, and prone to problems if there are changes in the module) is pretty easy to do:

  @{$result->{'_hits'}} = sort { your custom sort here }
                          @{$result->{'_hits'}};

Perhaps we should add an API method which can get/set the Hits.

You can also create a new Result object, albeit in a tedious manner:

  my @hits = sort { your custom sort here } $result->hits();

  my $rewres = $result->new(-query_name        => $result->query_name,
                            -query_accession   => $result->query_accession,
                            -query_description => $result->query_description,
                            -query_length      => $result->query_length,
                            -database_name     => $result->database_name,
                            ... # lots more things.
                            -hits => \@hits);

So you put all of this into play like this:

  my $in = new Bio::SearchIO(-format => 'blast',
                             -file   => shift @ARGV);

  my $writer = new Bio::SearchIO::Writer::HTMLResultWriter();
  my $out = new Bio::SearchIO(-writer => $writer);
  my $result = $in->next_result;
  # apply the sorting on the hits
  #

  # now write the result out
  $out->write_result($result);

-jason

On Tue, 23 Dec 2003, Donald G. Jackson wrote:

> Hi,
>
> I'm working on a blast wrapper using Bio::SearchIO. I'd like to be able
> to sort through the hits by something besides score/input order.
For > example, we've crammed the taxid into the FASTA header and would like to > do something like NCBI's taxblast where hits are sorted by source > organism, then by score. I'd like to use the > Bio::SearchIO::HTMLResultWriter to output my hits, so can't just get all > the hits and sort them myself. > > I thought I'd seen mention of how to do this, but looking over the docs > and (1.2.3) code I can't find it. Does anyone have thoughts on how to > do this? > > Thanks, > > Don Jackson > BMS Bioinformatics > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From lstein at cshl.edu Wed Dec 24 12:09:17 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Dec 24 12:16:53 2003 Subject: [Bioperl-l] wormbase115/Bio::DB::GFF::Aggregator wormbase_transcript problem In-Reply-To: <200312231558.hBNFwsX8017788@mx4.nyu.edu> References: <200312231558.hBNFwsX8017788@mx4.nyu.edu> Message-ID: <200312241209.17276.lstein@cshl.edu> Hi Philip, The gene model methods are being overhauled in wormbase. Can you put this problem on hold until after the first of the year? The changes will have stabilized by then and I'll have a good answer for you. 
Lincoln On Tuesday 23 December 2003 11:01 am, Philip MacMenamin wrote: > Hi, > Previously I ran the following code to draw a curated gene with > UTRs hanging on the ends (attached): > > my $aggregator = Bio::DB::GFF::Aggregator->new(-method => > 'transcript', -sub_parts => > ['UTR:UTR','exon:curated','CDS:curated'] ); > my $db = new Bio::DB::GFF(-adaptor=>'dbi::mysqlopt', > -dsn=>'dbi:mysql:wormbase115Mod;host=localhost', > -user=>'philip', > -pass=> $passwd, > -aggregator =>$aggregator > ) or die(); > my @all_transcripts = $searchSeg->features('transcript'); > if (scalar @all_transcripts ) > { > $panel->add_track(wormbase_transcript=>\@all_transcripts, > -bgcolor => 'wheat', > -fgcolor => 'black', > -forwardcolor => 'blue', > -reversecolor => 'blue', > -spacing => 0, > -utr_color => '#D0D0D0', > -font2color => 'blue', > -height => 10, > -description => 1, > -label => 1, > -key => "Curated genes"); > } > > This worked fine. > > With the latest release of wormbase this code does not work. > So I have changed the -sub_parts arg for the Aggregator object to > -sub_parts => ['UTR:UTR','exon:curated','CDS:curated'] > which gets rid of some things that I dont want. However, I cannot > get it to draw the UTRs actually hanging on the ends of the gene, > it stacks/bumps them now. They do not over lap in their start /stop > co-ords. > > ??? > Thanks. From cjm at fruitfly.org Sat Dec 27 09:58:53 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Sat Dec 27 10:06:47 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: Message-ID: This seems to come up quite frequently on the list, and offline in various discussions between bioperl developers. What follows is my summary of these dicussions (seeing as this comes up a lot, should we think about putting this into the FAQ?) The consensus is that bioperl should be consistent, and employ consistent styles throughout modules. 
It would be disastrous if there were a mixture of both explicit get-setters and a hodge-podge of different AUTOLOAD conventions. Bioperl developers seem to be (religiously?) divided over using AUTOLOAD for accessors. I'd say the majority of those that contribute most to bioperl prefer explicit accessor methods; they feel that explicit method definitions mean easier-to-understand code. Then there are those of us for whom the multitude of explicit getsetters (and accompanying POD docs) in perl is the programming equivalent of fingernails scratching a blackboard, both anti-perl and anti every principle we hold dear in programming such as high-level declarative *compact* code and data representations, accessor methods that type-check consistently, and eliminating repetition/redundancy. However, such delicate aesthetics are often a barrier to producing vast and enormously useful modules such as bioperl. Nevertheless, we feel we have a point, and the difficulties many new users have in grokking the large and complex bioperl OM back us up, IMHO. However, the way to proceed is neither to harangue busy coders who have better things to do, nor to introduce AUTOLOADed declarative data representation formalisms in a piecemeal or ad-hoc way. This has to be a separate pilot project, along the lines of what was suggested by Aaron Mackey and Nat Goodman. I don't think we have a clear idea of what this would be yet. It may use something like Class::MethodMaker, which is extremely nice, but could perhaps be extended even further. Class::Contract is extremely powerful, and borrows features from proper, well-designed OO languages; unfortunately, C::C is more of a showpiece module and isn't very practical. Perhaps some merger of the two? I think we'll be slightly hampered here until there is a clear technical solution we can consistently use; but it is definitely worthwhile taking our time and proceeding carefully with the best AUTOLOAD solution.
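For readers following the accessor debate, here is a minimal sketch of one possible middle ground: accessors declared once in a list and installed as ordinary named subs when the package is loaded, with no AUTOLOAD involved. It is only an illustration, not code from Bioperl, Class::MethodMaker or Class::Contract; the package name My::Feature and the field names are hypothetical.

```perl
#!/usr/bin/perl
use strict;

package My::Feature;    # hypothetical class, not a Bioperl module

# Declarative list of attributes; one get/set accessor is generated per name.
my @FIELDS = qw(start end strand);

for my $field (@FIELDS) {
    no strict 'refs';
    # Install a real named sub in the symbol table, so that ->can() and
    # stack traces see it, instead of resolving the name via AUTOLOAD.
    *{"My::Feature::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    for my $field (@FIELDS) {
        $self->$field($args{$field}) if exists $args{$field};
    }
    return $self;
}

package main;

my $feat = My::Feature->new(start => 10, end => 42);
$feat->strand(-1);
print join(' ', $feat->start, $feat->end, $feat->strand), "\n";
```

Because the generated methods exist as named subs, a misspelled method name still fails with the usual "Can't locate object method" error, which is one of the debugging concerns raised against AUTOLOAD in this thread.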
Ideally someone should be able to grok the majority of the bioperl OM by scrolling through a few pages of ascii text using a compact declarative representation. Many of us are interested in this parallel project, with a view to winning over the AUTOLOAD skeptics and forming the basis of bioperl-2.x, but this is only going to happen if we get coding. Moaning (or patronising) on the list about the existing codebase achieves nothing other than annoying people. Cheers Chris On Mon, 22 Dec 2003, Ewan Birney wrote: > > > You had big boss' support at that time already. My stance on that is > > unchanged: I can't see how auto-loaded getter/setters make the code any > > better or increase coding productivity in any way. Especially once > > you're debugging. If you can't stand individual (uhm - in fact > > auto-generated by emacs lisp macros BTW) getter/setters then use > > auto-loading in your code, but don't be surprised if someone goes in > > and changes it to getter/setters for code clarity and better debugging. > > Same here. I think AUTOLOAD should be used v. sparingly. It simply doesn't > help readability, stability or speed. > > emacs macros much better ;) > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From cjm at fruitfly.org Sat Dec 27 10:19:05 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Sat Dec 27 10:26:57 2003 Subject: [Bioperl-l] Re: Getting CDS boundaries from Unflattener In-Reply-To: <1071851598.1465.92.camel@localhost.localdomain> Message-ID: On Fri, 19 Dec 2003, Scott Cain wrote: > On Fri, 2003-12-19 at 10:48, Scott Cain wrote: > > On Thu, 2003-12-18 at 16:52, Chris Mungall wrote: [snip] > > > > The other problem is that the exons' parentage is incorrect. The exons > > > > should be features of the gene, not the mRNA. > > > > > > I think you have this the wrong way round. 
Again, this must be a problem
> > > with how you're assigning parent tags in the GFF output, when I try
> > > AE003644 the exons are children of the mRNA, which is correct.
> > >
> > I don't think so; here are the relevant lines from SO:
> >
> > @is_a@gene ; SO:0000704 ; SOFA:SOFA ; SOFA:region
> > @part_of@transcript ; SO:0000673 ; SOFA:SOFA ; SOFA:region
> > @part_of@exon ; SO:0000147 ; SOFA:SOFA ; SOFA:region
> > @is_a@processed_transcript ; SO:0000233 ; SOFA:SOFA ; SOFA:region
> > @is_a@mRNA ; SO:0000234 ; SOFA:SOFA ; SOFA:region ; synonym:messenger_RNA
> > @part_of@CDS ; SO:0000316 ; SOFA:SOFA ; SOFA:region ; synonym:coding_sequence
> >
> > Now, I am not one to be lecturing on ontologies, so I may have
> > misinterpreted something here, but it looks to me like exon is part of a
> > transcript, but not part of an mRNA. And since we typically don't have
> > transcript features in Genbank records, exon should be part_of gene. An
> > alternative would be to infer a transcript feature for each mRNA feature
> > and tie the exons to the transcript features, but leaving the mRNAs and
> > CDSs as is.

exon definitely shouldn't be part of gene, as this will mess up anything involving alternative splicing. It's OK to have exon part_of mRNA, because mRNA is a subclass of transcript. The logic here is quite subtle; we should really take this to the SO list.

Without getting too much into the logic of part_of, for now we can infer the following:

  X is_a Y
  Z (necessarily) part_of Y
  => Z (can be) part_of X

I have some code on another branch of bioperl that does this kind of consistency checking on bioperl seqfeature hierarchies via SO... need to migrate this over.

In a later version, SO will have distinct notions of necessarily part_of and necessarily has_part in the inverse direction, which will allow more powerful consistency checking.
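The inference rule in the message above can be sketched in a few lines of plain Perl. This is only an illustration, not the consistency-checking code mentioned in the message: the two hashes are a hand-written reading of the quoted SO excerpt (an assumption, since the excerpt's indentation is lost), and `ancestors` and `can_be_part_of` are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;

# Toy relations, assumed from the SO lines quoted in the message;
# they are not read from the real ontology files.
my %is_a = (
    mRNA                 => 'processed_transcript',
    processed_transcript => 'transcript',
);
my %part_of = (    # part => [ declared wholes ]
    exon => ['transcript'],
    CDS  => ['mRNA'],
);

# Return the term itself followed by all of its is_a ancestors.
sub ancestors {
    my ($term) = @_;
    my @chain = ($term);
    while (defined($term = $is_a{$term})) {
        push @chain, $term;
    }
    return @chain;
}

# Rule: X is_a Y, Z part_of Y  =>  Z can be part_of X.
# Here $whole plays the role of X and $part the role of Z.
sub can_be_part_of {
    my ($part, $whole) = @_;
    for my $super (ancestors($whole)) {
        return 1 if grep { $_ eq $super } @{ $part_of{$part} || [] };
    }
    return 0;
}

for my $pair (['exon', 'mRNA'], ['exon', 'gene'], ['CDS', 'mRNA']) {
    my ($part, $whole) = @$pair;
    printf "%s part_of %s: %s\n", $part, $whole,
        can_be_part_of($part, $whole) ? 'allowed' : 'not allowed';
}
```

With these toy relations, exon part_of mRNA is accepted because mRNA is_a transcript (via processed_transcript), while exon part_of gene is rejected, which matches the position argued in the message. The sketch deliberately does not follow part_of transitively.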
> OK, the real problem is that the thing that is labeled an mRNA in the > feature from Unflattener (which it is getting from the genbank record) > is a transcript, not an mRNA/processed transcript. That is not to say > the genbank record is wrong--its not. Generally, the mRNA feature is a > collection of ranges in a join. What Unflattener gives for an mRNA > feature is really a primary transcript. To a biologist it's possibly rather strange to think of an mRNA containing exons; pre-mRNAs have exons, processed mRNAs have exon junctions. I think it's still useful to think of the mRNA as the exon container, if only conceptually. In most representations, whether it is ensembl, chado, gff3 or the bioperl objects generated by the unflattener, we economise by having one entity represent two entities, the pre- and post-processed forms. In actual fact, there are often more than two. You can think of an mRNA feature either as the processed mRNA and the implicit causative features (much like how introns are usually implicit), or as a primary protein-coding transcript with the potential/destiny to form an mRNA. The alternative is to have a full GK-like object model for representing all entities involved in transcription/translation, which isn't appropriate for a genome database/object model. Cheers Chris From cjm at fruitfly.org Sat Dec 27 10:20:58 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Sat Dec 27 10:28:50 2003 Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects In-Reply-To: Message-ID: $result->sort_hits_by_func( { ..custom sort.. } ) On Tue, 23 Dec 2003, Jason Stajich wrote: > So Don - you want to apply a custom hit sorting routine to the data before > it is output with SearchIO::HTMLResultWriter?
> > The simpliest - albeit cheating and prone problems if there are changes > in the module - but is pretty easy to do: > > @{$result->{'_hits'}} = sort { your custom sort here } > @{$result->{'_hits'}}; > > Perhaps we should add an API method which can get/set the Hits. > > You can also create a new Result object in an albeit tedious manner: > > my @hits = sort { #custom hits } $result->hits(); > > my $rewres = $result->new(-query_name => $result->query_name, > -query_accession => $result->query_accession, > -query_description => $result->query_description, > -query_length => $result->query_length, > -database_name => $result->database_name, > ... # lots more things. > -hits => \@hits); > > > So you put all of this in to play like this > my $in = new Bio::SearchIO(-format => 'blast', > -file => shift @ARGV); > > my $writer = new Bio::SearchIO::Writer::HTMLResultWriter(); > my $out = new Bio::SearchIO(-writer => $writer); > my $result = $in->next_result; > # apply the sorting on the hits > # > > # now write the result out > $out->write_result($result); > > > > -jason > > > > On Tue, 23 Dec 2003, Donald G. Jackson wrote: > > > Hi, > > > > I'm working on a blast wrapper using Bio::SearchIO. I'd like to be able > > to sort through the hits by something besides score/input order. For > > example, we've crammed the taxid into the FASTA header and would like to > > do something like NCBI's taxblast where hits are sorted by source > > organism, then by score. I'd like to use the > > Bio::SearchIO::HTMLResultWriter to output my hits, so can't just get all > > the hits and sort them myself. > > > > I thought I'd seen mention of how to do this, but looking over the docs > > and (1.2.3) code I can't find it. Does anyone have thoughts on how to > > do this? 
> > > > Thanks, > > > > Don Jackson > > BMS Bioinformatics > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From birney at ebi.ac.uk Sat Dec 27 10:40:52 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Sat Dec 27 10:50:27 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: Message-ID: > Ideally someone should be able to grok the majority of the bioperl OM by > scrolling through a few pages of ascii text using a compact declarative > representation. > > Many of us are interested in this parallel project, with a view to winning > over the AUTOLOAD skeptics and forming the basis of bioperl-2.x, but this > is only going to happen if we get coding. Moaning (or patronising) on the > list about the existing codebase achieves nothing other than annoying > people. Amen brother. You are clearly on the other side of this divide than I am, but even I have strange recurring episodes of yearnings for python/jython due to the need of the *language* giving us the support for the compact coding rather than bolting it on like some sort of steam-driven-engine contraption. (And Jython allows you to mix super-strong typed java with happy-go-lucky run-time-and-super-loose python semantics. Lovely.). But then my strict {} parser which has been hard-wired into my neurons just loses it with the tab thing, and I find the __blah__ syntax as bizarre as any @{$ref} syntax and I sulk back to Perl.... ... my fervent hope is that Perl6 has that perfect blend, and that Bioperl 2.0 moves seemlessly from Perl5 with extra-bits to Perl6. ... (and world peace of course!) 
I would like to really encourage the Chris/Aaron's etc of this world to play around with a Bioperl 2.0 style system --- I'd be very interested in the outcome. e (I hope everyone had a nice christmas...) From jason at cgt.duhs.duke.edu Sat Dec 27 11:44:05 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Dec 27 11:51:27 2003 Subject: [Bioperl-l] gapped translate Message-ID: Does anyone object to adding some code to CodonTable->translate so that atg---aar---aay becomes M-K--N Currently it will be translated as MXKXXN -jason -- Jason Stajich Duke University jason at cgt.mc.duke.edu From birney at ebi.ac.uk Sat Dec 27 11:57:26 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Sat Dec 27 12:04:49 2003 Subject: [Bioperl-l] gapped translate In-Reply-To: Message-ID: On Sat, 27 Dec 2003, Jason Stajich wrote: > Does anyone object to adding some code to CodonTable->translate > so that > atg---aar---aay > becomes > M-K--N > > Currently it will be translated as > MXKXXN surely it should be atg---aar------aay to get M-K--N? ...if so, fine by me, but I'd wait for some other views as well. I'd also map \.\.\. to \. > > > -jason > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Sat Dec 27 12:03:39 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sat Dec 27 12:11:01 2003 Subject: [Bioperl-l] gapped translate In-Reply-To: References: Message-ID: On Sat, 27 Dec 2003, Ewan Birney wrote: > > > On Sat, 27 Dec 2003, Jason Stajich wrote: > > > Does anyone object to adding some code to CodonTable->translate > > so that > > atg---aar---aay > > becomes > > M-K--N > > > > Currently it will be translated as > > MXKXXN > > surely it should be > > atg---aar------aay > > to get M-K--N? 
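For illustration only, the gap handling discussed in this thread (a codon of three gap characters becomes a single gap in the protein, and likewise "..." becomes ".") could be sketched in plain Perl as below. This is not the code committed to CodonTable->translate; the sub names translate_gapped and translate_codon and the IUPAC expansion table are assumptions made up for this sketch, and only the standard genetic code is covered.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated with bases in t,c,a,g order.
my @bases = qw(t c a g);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# IUPAC ambiguity codes mapped to the bases they stand for.
my %iupac = (
    a => ['a'], c => ['c'], g => ['g'], t => ['t'], u => ['t'],
    r => [qw(a g)],   y => [qw(c t)],   s => [qw(c g)],   w => [qw(a t)],
    k => [qw(g t)],   m => [qw(a c)],   b => [qw(c g t)], d => [qw(a g t)],
    h => [qw(a c t)], v => [qw(a c g)], n => [qw(a c g t)],
);

# Translate one codon; ambiguous codons resolve only if every
# expansion encodes the same amino acid, otherwise 'X'.
sub translate_codon {
    my ($codon) = @_;
    return 'X' unless length($codon) == 3;
    my @expansions = ('');
    for my $base (split //, $codon) {
        my $choices = $iupac{$base} or return 'X';
        @expansions = map { my $prefix = $_; map { $prefix . $_ } @$choices } @expansions;
    }
    my %seen = map { $code{$_} => 1 } @expansions;
    my @aa = keys %seen;
    return @aa == 1 ? $aa[0] : 'X';
}

# Translate a sequence codon by codon, keeping whole-codon gaps as gaps.
sub translate_gapped {
    my ($dna) = @_;
    my $protein = '';
    for my $codon (lc($dna) =~ /(.{1,3})/g) {
        if    ($codon eq '---') { $protein .= '-' }
        elsif ($codon eq '...') { $protein .= '.' }
        else                    { $protein .= translate_codon($codon) }
    }
    return $protein;
}

print translate_gapped('atg---aar------aay'), "\n";   # M-K--N
```

A codon that mixes bases and gap characters (for example "a--") falls through to 'X' here; whether that is the right behaviour is a design choice the thread does not settle.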
> oops - copy+paste of old test should have been M-K--N > > ...if so, fine by me, but I'd wait for some other views as well. > okay - I committed the basic stuff so it can be seen - can roll back if it is not good. > > I'd also map \.\.\. to \. > hmm - okay will see where to put that. > > > > > > > > > -jason > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From hlapp at gmx.net Sat Dec 27 17:20:54 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Dec 27 17:28:21 2003 Subject: [Bioperl-l] some not-so-good perl practice in bioperl In-Reply-To: Message-ID: On Saturday, December 27, 2003, at 06:58 AM, Chris Mungall wrote: > > Many of us are interested in this parallel project, with a view to > winning > over the AUTOLOAD skeptics and forming the basis of bioperl-2.x, but > this > is only going to happen if we get coding. Moaning (or patronising) on > the > list about the existing codebase achieves nothing other than annoying > people. > > I like your email and your summary Chris, and I concur with Ewan to encourage the 2.0 visionaries to start playing code. Taking Ewan's favorite metaphor of dinosaurs, even those reacted after a while given the appropriate stimulus ... -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cjm at fruitfly.org Sun Dec 28 12:16:50 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Sun Dec 28 12:24:42 2003 Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects In-Reply-To: Message-ID: oops, hit send by mistake - just musing out loud... 
maybe we could just alter the order hits() / next_hit() returns things rather than actually modifying the objects. doesn't make much difference at the end of the day... On Sat, 27 Dec 2003, Chris Mungall wrote: > > $result->sort_hits_by_func( { ..custom sort.. } ) > > On Tue, 23 Dec 2003, Jason Stajich wrote: > > > So Don - you want to apply a custom hit sorting routine to the data before > > it is output with SearchIO::HTMLResultWriter? > > > > The simpliest - albeit cheating and prone problems if there are changes > > in the module - but is pretty easy to do: > > > > @{$result->{'_hits'}} = sort { your custom sort here } > > @{$result->{'_hits'}}; > > > > Perhaps we should add an API method which can get/set the Hits. > > > > You can also create a new Result object in an albeit tedious manner: > > > > my @hits = sort { #custom hits } $result->hits(); > > > > my $rewres = $result->new(-query_name => $result->query_name, > > -query_accession => $result->query_accession, > > -query_description => $result->query_description, > > -query_length => $result->query_length, > > -database_name => $result->database_name, > > ... # lots more things. > > -hits => \@hits); > > > > > > So you put all of this in to play like this > > my $in = new Bio::SearchIO(-format => 'blast', > > -file => shift @ARGV); > > > > my $writer = new Bio::SearchIO::Writer::HTMLResultWriter(); > > my $out = new Bio::SearchIO(-writer => $writer); > > my $result = $in->next_result; > > # apply the sorting on the hits > > # > > > > # now write the result out > > $out->write_result($result); > > > > > > > > -jason > > > > > > > > On Tue, 23 Dec 2003, Donald G. Jackson wrote: > > > > > Hi, > > > > > > I'm working on a blast wrapper using Bio::SearchIO. I'd like to be able > > > to sort through the hits by something besides score/input order. 
For > > > example, we've crammed the taxid into the FASTA header and would like to > > > do something like NCBI's taxblast where hits are sorted by source > > > organism, then by score. I'd like to use the > > > Bio::SearchIO::HTMLResultWriter to output my hits, so can't just get all > > > the hits and sort them myself. > > > > > > I thought I'd seen mention of how to do this, but looking over the docs > > > and (1.2.3) code I can't find it. Does anyone have thoughts on how to > > > do this? > > > > > > Thanks, > > > > > > Don Jackson > > > BMS Bioinformatics > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason at cgt.duhs.duke.edu Sun Dec 28 16:01:08 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sun Dec 28 16:08:28 2003 Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects In-Reply-To: References: Message-ID: Yeah that would be fine - there are a couple of design decisions we made so that the parser wouldn't require the hits to all be read in at once (the only way this sorting would work), hence the next_hit method which could be used for stream based parsing. In retrospect it has never been used (all the report parser implementations right now read in an entire report anyways). So long story short - we could add a method which permitted a post-processing sorting on the hits and presumably on the HSPs as well. 
But this is all really introduced because he is using an object (XXXResultWriter) which uses ResultI objects rather than Hits directly to get the info - so I wonder if it makes more sense to add the capability to do custom sorting in the ResultWriter rather than adding more bloat to the Result/Hit/HSP storage objects? -jason On Sun, 28 Dec 2003, Chris Mungall wrote: > > oops, hit send by mistake - just musing out loud... maybe we could just > alter the order hits() / next_hit() returns things rather than actually > modifying the objects. doesn't make much difference at the end of the > day... > > On Sat, 27 Dec 2003, Chris Mungall wrote: > > > > > $result->sort_hits_by_func( { ..custom sort.. } ) > > > > On Tue, 23 Dec 2003, Jason Stajich wrote: > > > > > So Don - you want to apply a custom hit sorting routine to the data before > > > it is output with SearchIO::HTMLResultWriter? > > > > > > The simpliest - albeit cheating and prone problems if there are changes > > > in the module - but is pretty easy to do: > > > > > > @{$result->{'_hits'}} = sort { your custom sort here } > > > @{$result->{'_hits'}}; > > > > > > Perhaps we should add an API method which can get/set the Hits. > > > > > > You can also create a new Result object in an albeit tedious manner: > > > > > > my @hits = sort { #custom hits } $result->hits(); > > > > > > my $rewres = $result->new(-query_name => $result->query_name, > > > -query_accession => $result->query_accession, > > > -query_description => $result->query_description, > > > -query_length => $result->query_length, > > > -database_name => $result->database_name, > > > ... # lots more things. 
> > > -hits => \@hits); > > > > > > > > > So you put all of this in to play like this > > > my $in = new Bio::SearchIO(-format => 'blast', > > > -file => shift @ARGV); > > > > > > my $writer = new Bio::SearchIO::Writer::HTMLResultWriter(); > > > my $out = new Bio::SearchIO(-writer => $writer); > > > my $result = $in->next_result; > > > # apply the sorting on the hits > > > # > > > > > > # now write the result out > > > $out->write_result($result); > > > > > > > > > > > > -jason > > > > > > > > > > > > On Tue, 23 Dec 2003, Donald G. Jackson wrote: > > > > > > > Hi, > > > > > > > > I'm working on a blast wrapper using Bio::SearchIO. I'd like to be able > > > > to sort through the hits by something besides score/input order. For > > > > example, we've crammed the taxid into the FASTA header and would like to > > > > do something like NCBI's taxblast where hits are sorted by source > > > > organism, then by score. I'd like to use the > > > > Bio::SearchIO::HTMLResultWriter to output my hits, so can't just get all > > > > the hits and sort them myself. > > > > > > > > I thought I'd seen mention of how to do this, but looking over the docs > > > > and (1.2.3) code I can't find it. Does anyone have thoughts on how to > > > > do this? 
> > > > > > > > Thanks, > > > > > > > > Don Jackson > > > > BMS Bioinformatics > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > -- > > > Jason Stajich > > > Duke University > > > jason at cgt.mc.duke.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From donald.jackson at bms.com Sun Dec 28 20:04:59 2003 From: donald.jackson at bms.com (Donald Jackson) Date: Sun Dec 28 20:12:23 2003 Subject: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects Message-ID: <471d249b8c.49b8c471d2@bms.com> Chris and Jason, I like Chris's idea of being able to pass in the custom function of one's choice for sorting. My bias would be to include this in the Result object because it seems like it would be more generally available, both to multiple ResultWriters and to other tools (not that I could name one). I also like the idea of just changing the return order rather than the $result->{_hits} structure, but agree w/ Chris that it doesn't really matter. I'm happy to code something up and submit it - I'll try to do so over the next day or so. 
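As a rough sketch of the idea under discussion (pass in a comparison function and return the hits in that order without touching the stored list), something like the following could serve as a starting point. It uses plain hashes as stand-ins for hit objects so that it runs without Bioperl; the field names, the "taxid=" header convention and the sub names are assumptions for illustration, and sort_hits_by_func here is a free-standing sub, not an existing Result method.

```perl
use strict;
use warnings;

# Stand-in hit records; real code would hold Bio::Search::Hit objects
# and call accessors on them instead of reading hash fields.
my @hits = (
    { name => 'A', desc => 'taxid=9606 some protein',  score => 50  },
    { name => 'B', desc => 'taxid=10090 some protein', score => 200 },
    { name => 'C', desc => 'taxid=9606 other protein', score => 120 },
    { name => 'D', desc => 'taxid=10090 other protein', score => 80 },
);

# Pull the taxid out of the description (hypothetical header layout).
sub taxid_of {
    my ($hit) = @_;
    return $hit->{desc} =~ /taxid=(\d+)/ ? $1 : 0;
}

# Return the hits ordered by a caller-supplied comparator.
# The input array is left unmodified; call this in list context.
sub sort_hits_by_func {
    my ($hits, $cmp) = @_;
    return sort { $cmp->($a, $b) } @$hits;
}

# Source organism first (ascending taxid), then score (descending).
my $by_taxid_then_score = sub {
    my ($x, $y) = @_;
    return taxid_of($x) <=> taxid_of($y)
        || $y->{score} <=> $x->{score};
};

my @sorted = sort_hits_by_func(\@hits, $by_taxid_then_score);
print join(' ', map { $_->{name} } @sorted), "\n";   # C A B D
```

Because the comparator is an ordinary code reference, the same routine could be handed to a Result object or to a ResultWriter, whichever ends up owning the sorting.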
Thanks, Don Jackson BMS Bioinformatics ----- Original Message ----- From: Jason Stajich Date: Sunday, December 28, 2003 4:01 pm Subject: Re: [Bioperl-l] Alternate hit sorting for Bio::Search::Result objects > Yeah that would be fine - there are a couple of design decisions we > madeso that the parser wouldn't require the hits to all be read in > at once > (the only way this sorting would work), hence the next_hit method > whichcould be used for stream based parsing. In retrospect it has > never been > used (all the report parser implementations right now read in an > entirereport anyways). > > So long story short - we could add a method which permitted a > post-processing sorting on the hits and presumably on the HSPs as > well. > But this is all really introduced because he is using an object > (XXXResultWriter) which uses ResultI objects rather than Hits > directly to > get the info - so I wonder if it makes more sense to add the > capability to > do custom sorting in the ResultWriter rather than adding more bloat > to the > Result/Hit/HSP storage objects? > > -jason > > On Sun, 28 Dec 2003, Chris Mungall wrote: > > > > > oops, hit send by mistake - just musing out loud... maybe we > could just > > alter the order hits() / next_hit() returns things rather than > actually> modifying the objects. doesn't make much difference at > the end of the > > day... > > > > On Sat, 27 Dec 2003, Chris Mungall wrote: > > > > > > > > $result->sort_hits_by_func( { ..custom sort.. } ) > > > > > > On Tue, 23 Dec 2003, Jason Stajich wrote: > > > > > > > So Don - you want to apply a custom hit sorting routine to > the data before > > > > it is output with SearchIO::HTMLResultWriter? 
> > > > > > > > The simpliest - albeit cheating and prone problems if there > are changes > > > > in the module - but is pretty easy to do: > > > > > > > > @{$result->{'_hits'}} = sort { your custom sort here } > > > > @{$result->{'_hits'}}; > > > > > > > > Perhaps we should add an API method which can get/set the Hits. > > > > > > > > You can also create a new Result object in an albeit tedious > manner:> > > > > > > my @hits = sort { #custom hits } $result->hits(); > > > > > > > > my $rewres = $result->new(-query_name => $result->query_name, > > > > -query_accession => $result- > >query_accession,> > > -query_description => > $result->query_description, > > > > -query_length => $result- > >query_length,> > > -database_name => > $result->database_name, > > > > ... # lots more things. > > > > -hits => \@hits); > > > > > > > > > > > > So you put all of this in to play like this > > > > my $in = new Bio::SearchIO(-format => 'blast', > > > > -file => shift @ARGV); > > > > > > > > my $writer = new Bio::SearchIO::Writer::HTMLResultWriter(); > > > > my $out = new Bio::SearchIO(-writer => $writer); > > > > my $result = $in->next_result; > > > > # apply the sorting on the hits > > > > # > > > > > > > > # now write the result out > > > > $out->write_result($result); > > > > > > > > > > > > > > > > -jason > > > > > > > > > > > > > > > > On Tue, 23 Dec 2003, Donald G. Jackson wrote: > > > > > > > > > Hi, > > > > > > > > > > I'm working on a blast wrapper using Bio::SearchIO. I'd > like to be able > > > > > to sort through the hits by something besides score/input > order. For > > > > > example, we've crammed the taxid into the FASTA header and > would like to > > > > > do something like NCBI's taxblast where hits are sorted by > source> > > > organism, then by score. I'd like to use the > > > > > Bio::SearchIO::HTMLResultWriter to output my hits, so can't > just get all > > > > > the hits and sort them myself. 
> > > > > > > > > > I thought I'd seen mention of how to do this, but looking > over the docs > > > > > and (1.2.3) code I can't find it. Does anyone have > thoughts on how to > > > > > do this? > > > > > > > > > > Thanks, > > > > > > > > > > Don Jackson > > > > > BMS Bioinformatics > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@portal.open-bio.org > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > -- > > > > Jason Stajich > > > > Duke University > > > > jason at cgt.mc.duke.edu > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > From junhu54 at hotmail.com Mon Dec 29 16:54:43 2003 From: junhu54 at hotmail.com (jun hu) Date: Mon Dec 29 17:02:02 2003 Subject: [Bioperl-l] bioperl graphic question Message-ID: >From: Heikki Lehvaslaiho >Reply-To: heikki@ebi.ac.uk >To: Bioperl >Subject: [Bioperl-l] Reminder to post 1.4 bioperl developers >Date: Tue, 23 Dec 2003 16:52:01 +0000 > > >Dear developers, > >I've added tag 'branch-1-4' to cvs head. (I know I should have added it >yesterday together with 'bioperl-release-1-4-0' tag but here we go). > >It is highly unlikely that there will be a stable release 1.8 or 2.0 in >near >future. Therefore, all bug fixes need to be done in both CVS HEAD and >'branch-1-4'. New developments should go into CVS HEAD, only. 
>While it is possible to merge changes from branch to HEAD and vice versa, >with >a project this big it is better that changes are merged right after changes >are stable in branch (or HEAD) by the developer responsible for them. > >You can use this command line to check out the new branch: > >cvs -d :ext:LOGIN@pub.open-bio.org:/home/repository/bioperl co \ >-d bioperl-1.4 -r branch-1-4 bioperl-live > >Happy holidays and bug hunting, > > -Heikki > > >-- >______ _/ _/_____________________________________________________ > _/ _/ http://www.ebi.ac.uk/mutations/ > _/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 >___ _/_/_/_/_/________________________________________________________ >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l _________________________________________________________________ Make your home warm and cozy this winter with tips from MSN House & Home. http://special.msn.com/home/warmhome.armx From junhu54 at hotmail.com Mon Dec 29 17:09:55 2003 From: junhu54 at hotmail.com (jun hu) Date: Mon Dec 29 17:17:14 2003 Subject: [Bioperl-l] bioperl graphic question Message-ID: Sorry , just send an empty email... To make the story short, I just install bioperl on my linux box, to keep the rpm dependency, I do not upgrade my libgd (1.8.4) to 2.0 version which is suggest by installation README. Now the png images I get (using the totorial's example) only have bars , but have no legend or labels (text, features) ... I am wondering where or not anyone have sucessfully using libgd 1.8.4 general useable bioperl png images... or I have to upgrade... 
, probably there are some other packages causing this problem, but the bioperl code does not complain about any error itself... Happy New Year, everyone. Best regards, Jun Hu UMDNJ From lstein at cshl.edu Mon Dec 29 18:47:33 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Dec 29 20:54:23 2003 Subject: [Bioperl-l] bioperl graphic question In-Reply-To: References: Message-ID: <200312291747.33826.lstein@cshl.edu> Hi, Bioperl 1.4 requires GD version 2.07 or higher, which in turn requires libgd 2.0. This should have gotten into the requirements list, but we missed it. Lincoln On Monday 29 December 2003 04:09 pm, jun hu wrote: > Sorry , just send an empty email... > To make the story short, I just install bioperl on my linux box, to > keep the rpm dependency, I do not upgrade my libgd (1.8.4) to 2.0 > version which is suggest by installation README. Now the png images > I get (using the totorial's example) only have bars , but have no > legend or labels (text, features) ... I am wondering where or not > anyone have sucessfully using libgd 1.8.4 general useable bioperl > png images... or I have to upgrade... , probably there is some > other packages causing this problem , but bioperl code do not > compalin any error itself... > Happy New Year, everyone. > Best regards, > Jun Hu > UMDNJ > > _________________________________________________________________ > Expand your wine savvy ? and get some great new recipes ? at MSN > Wine.
http://wine.msn.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From harris at cshl.org Mon Dec 29 20:52:15 2003 From: harris at cshl.org (Todd Harris) Date: Mon Dec 29 20:59:34 2003 Subject: [Bioperl-l] bioperl graphic question In-Reply-To: Message-ID: Hi Jun Hu - As noted in the install guide, you will need to upgrade your libgd. Although using libgd 1.8.4 may not necessarily result in runtime errors, graphical elements may not appear as intended. Some glyphs will not render at all with 1.8.4. We needed to move to libgd2 in order to support the generation of both raster (png) and vector (svg) images from the same codebase. todd -- Todd W. Harris, PhD Stein Laboratory Cold Spring Harbor Laboratory Cold Spring Harbor, NY 11724 ph 516-367-8394 Fx 516-367-8389 http://www.wormbase.org/ -- > On 12/29/03 4:09 PM, jun hu wrote: > Sorry , just send an empty email... > To make the story short, I just install bioperl on my linux box, to keep the > rpm dependency, I do not upgrade my libgd (1.8.4) to 2.0 version which is > suggest by installation README. Now the png images I get (using the > totorial's example) only have bars , but have no legend or labels (text, > features) ... I am wondering where or not anyone have sucessfully using > libgd 1.8.4 general useable bioperl png images... or I have to upgrade... , > probably there is some other packages causing this problem , but bioperl > code do not compalin any error itself... > Happy New Year, everyone. > Best regards, > Jun Hu > UMDNJ > > _________________________________________________________________ > Expand your wine savvy ? and get some great new recipes ? at MSN Wine. 
> http://wine.msn.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From stakie at myrealbox.com Tue Dec 30 06:23:07 2003 From: stakie at myrealbox.com (Tim Stakenborg) Date: Tue Dec 30 06:30:31 2003 Subject: [Bioperl-l] Undefined subroutine &IO:String Message-ID: <002001c3cec7$4d5f2280$49e877d5@pandora.be> Hey When I run the following short programme: #! c:\Perl\bin\perl.exe -w -i use strict; use warnings; use Bio::Perl; use Bio::SeqIO; my $seq_object= get_sequence('swissprot',"EAEA_HAFAL"); write_sequence("intimin.fasta",'fasta',$seq_object); I receive the following error: I receive the following error: Undefined subroutine &IO::String called at C:/perl/site/lib/Bio/DB/WebDBSeqI.pm line 482. Anybody any idea how to solve this problem? Kind regards Tim From jason at cgt.duhs.duke.edu Tue Dec 30 10:24:12 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 30 10:31:44 2003 Subject: [Bioperl-l] bioperl graphic question In-Reply-To: References: Message-ID: We might want to post pointers to gd2 rpms for the redhat crowd... as RH 9 still ships with libgd 1.8.x -jason On Mon, 29 Dec 2003, Todd Harris wrote: > Hi Jun Hu - > > As noted in the install guide, you will need to upgrade your libgd. > Although using libgd 1.8.4 may not necessarily result in runtime errors, > graphical elements may not appear as intended. Some glyphs will not render > at all with 1.8.4. > > We needed to move to libgd2 in order to support the generation of both > raster (png) and vector (svg) images from the same codebase. 
> > todd > > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Tue Dec 30 10:32:34 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Dec 30 10:39:51 2003 Subject: [Bioperl-l] Undefined subroutine &IO:String In-Reply-To: <002001c3cec7$4d5f2280$49e877d5@pandora.be> References: <002001c3cec7$4d5f2280$49e877d5@pandora.be> Message-ID: Which version of Bioperl do you have installed? Via ppm? Did you also install the IO::String module? On Tue, 30 Dec 2003, Tim Stakenborg wrote: > Hey > > When I run the following short programme: > > > #! c:\Perl\bin\perl.exe -w -i > > use strict; > use warnings; > use Bio::Perl; > use Bio::SeqIO; > > my $seq_object= get_sequence('swissprot',"EAEA_HAFAL"); > write_sequence("intimin.fasta",'fasta',$seq_object); > > > I receive the following error: > > I receive the following error: > > Undefined subroutine &IO::String called at > C:/perl/site/lib/Bio/DB/WebDBSeqI.pm line 482. > > Anybody any idea how to solve this problem? > > Kind regards > Tim > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From tdhoufek at unity.ncsu.edu Tue Dec 30 12:00:47 2003 From: tdhoufek at unity.ncsu.edu (T.D. Houfek) Date: Tue Dec 30 12:08:30 2003 Subject: [Bioperl-l] my bioperl-db hacks Message-ID: <1072803546.6944.65.camel@aether> I'm monkeying around with bioperl-db 0.1, trying to see what I can get it to do. I set about following some instructions that tell you how to use the "load_seqdatabase.pl" script to fill your bioperl database with sequence from a swissprot release file. (I am using sprot42.dat). This did not work for me initially, but I made some vicious hacks to the code and now the script seems to work more or less. It's this "more or less" I'd like comments on...
I suspect other things may have broken because of what I have done, and that someone who knows the code can help me to find a more stable solution. I think the problem is arising when in parsing the sprot42.dat file, Bioperl encounters a record with a feature whose location must be expressed as a Bio::Location::Fuzzy object. The inline documentation of biosqldb-mysql indicates that Fuzzy objects are not supported yet (but gives you an idea of where you could start if you wished to do so). Anyway, I first encountered an exception around line 169, of Bio/DB/SQL/SeqLocationAdaptor.pm where a check is made to see whether $location->isa() isa the righta kinda of object. I just added the Fuzzy objects to the list of invited guests: # --start snippet --------------------- if( $location->isa('Bio::Location::SplitLocationI') ) { my $rank = 1; foreach my $sub ( $location->sub_Location ) { $self->_store_component($sub,$seqfeature_id,$rank); $rank++; } } elsif( $location->isa('Bio::Location::Simple') ) { $self->_store_component($location,$seqfeature_id,1); } elsif( $location->isa('Bio::Location::Fuzzy') ) { $self->_store_component($location,$seqfeature_id,1); } else { $self->throw("Not a simple location nor a split nor a fuzzy. Says its a $location->type. Yikes"); } # -- end snippet ---------------------- Once I fixed this the only thing that broke was around line 208. Probably because of the normal behavior supporting Fuzzy locations (but of course I mention it in case it is bad behavior) some locations passing through this section of code were missing either starts or ends. The $start and $end variables were set to the null string, and the SQL insert sequence they were passed into failed. Failure in depositing one entry would terminate the script (but did not undo prior inserts). With a two-line hack circa 208 I sidestepped outright failures. 
I just forced uninitialized endpoints to be zero: # -- start snippet ------- unless ($end) { $end=0; } ## ADDED THESE TWO unless ($start) { $start=0; } ## LINES HERE my $sth = $self->prepare("insert into seqfeature_location (seqfeature_location_id,seqfeature_id,seq_start,seq_end,seq_strand,location_rank) VALUES (NULL,$seqfeature_id,$start,$end,$strand,$rank)"); # -- end snippet --------- Of course all I have really done is provide for a completely buggy persistence of Fuzzy objects. My guess is that SeqLocationAdaptor needs to be upgraded to handle the Fuzzy locations that Bioperl wants to make out of the Swissprot input. Is anyone already undertaking this? Does anyone have any insight about what problems this hack of mine will cause downstream? ------------------------------- T.D. Houfek (email sound-alike: tdhoufek-AT-unity-DOT-ncsu-DOT-edu) bioinformatics development lead Tobacco Genome Initiative North Carolina State University ------------------------------- From hlapp at gmx.net Tue Dec 30 14:14:09 2003 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Dec 30 14:21:28 2003 Subject: [Bioperl-l] my bioperl-db hacks In-Reply-To: <1072803546.6944.65.camel@aether> Message-ID: <5830A5B0-3AFC-11D8-A5FA-000A959EB4C4@gmx.net> Note that bioperl-db 0.1 has been outdated for about a year now. It won't work with the present biosql schema either. In order to use 0.1 you will also need to use a pre-Singapore version of biosql. The current and interoperating versions of bioperl-db and biosql are the respective cvs HEADs. -hilmar On Tuesday, December 30, 2003, at 09:00 AM, T.D. Houfek wrote: > I'm monkeying around with bioperl-db 0.1, trying to see what I can get > it to do. I set about following some instructions that tell > you how to use the "load_seqdatabase.pl" script to fill your bioperl > database with sequence from a swissprot release file. (I am using > sprot42.dat).
> This did not work for me initially, but I made some
> vicious hacks to the code and now the script seems to work more or
> less. It's this "more or less" I'd like comments on... I suspect other
> things may have broken because of what I have done, and that someone
> who knows the code can help me to find a more stable solution.
>
> I think the problem is arising when in parsing the sprot42.dat file,
> Bioperl encounters a record with a feature whose location must be
> expressed as a Bio::Location::Fuzzy object. The inline documentation
> of biosqldb-mysql indicates that Fuzzy objects are not supported yet
> (but gives you an idea of where you could start if you wished to do
> so).
>
> Anyway, I first encountered an exception around line 169, of
> Bio/DB/SQL/SeqLocationAdaptor.pm where a check is made to see whether
> $location->isa() isa the righta kinda of object.
>
> I just added the Fuzzy objects to the list of invited guests:
>
> # --start snippet ---------------------
>     if( $location->isa('Bio::Location::SplitLocationI') ) {
>         my $rank = 1;
>         foreach my $sub ( $location->sub_Location ) {
>             $self->_store_component($sub,$seqfeature_id,$rank);
>             $rank++;
>         }
>     } elsif( $location->isa('Bio::Location::Simple') ) {
>         $self->_store_component($location,$seqfeature_id,1);
>     } elsif( $location->isa('Bio::Location::Fuzzy') ) {
>         $self->_store_component($location,$seqfeature_id,1);
>     } else {
>         $self->throw("Not a simple location nor a split nor a fuzzy. Says its a $location->type. Yikes");
>     }
> # -- end snippet ----------------------
>
>
> Once I fixed this the only thing that broke was around line 208.
> Probably because of the normal behavior supporting Fuzzy locations (but
> of course I mention it in case it is bad behavior) some locations
> passing through this section of code were missing either starts or
> ends. The $start and $end variables were set to the null string, and
> the SQL insert sequence they were passed into failed.
> Failure in depositing one entry
> would terminate the script (but did not undo prior inserts).
>
> With a two-line hack circa 208 I sidestepped outright failures. I just
> forced uninitialized endpoints to be zero:
>
> # -- start snippet -------
>
>     unless ($end) { $end=0; }       ## ADDED THESE TWO
>     unless ($start) { $start=0; }   ## LINES HERE
>
>     my $sth = $self->prepare("insert into seqfeature_location (seqfeature_location_id,seqfeature_id,seq_start,seq_end,seq_strand,location_rank) VALUES (NULL,$seqfeature_id,$start,$end,$strand,$rank)");
>
> # -- end snippet ---------
>
> Of course all I have really done is provide for a completely buggy
> persistence of Fuzzy objects.
>
> My guess is that SeqLocationAdaptor needs to be upgraded to handle the
> Fuzzy locations that Bioperl wants to make out of the Swissprot input.
> Is anyone already undertaking this? Does anyone have any insight
> about what problems this hack of mine will cause downstream?
>
>
> -------------------------------
> T.D. Houfek
> (email sound-alike: tdhoufek-AT-unity-DOT-ncsu-DOT-edu
> bioinformatics development lead
> Tobacco Genome Initiative
> North Carolina State University
> -------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
From billk at iinet.net.au Wed Dec 31 01:52:45 2003
From: billk at iinet.net.au (William Kenworthy)
Date: Wed Dec 31 02:09:39 2003
Subject: [Bioperl-l] my bioperl-db hacks
In-Reply-To: <5830A5B0-3AFC-11D8-A5FA-000A959EB4C4@gmx.net>
References: <5830A5B0-3AFC-11D8-A5FA-000A959EB4C4@gmx.net>
Message-ID: <1072853565.5303.3.camel@rattus.Localdomain>

Would it not be a good idea to release a current, matched pair? Seems like a lot of this stuff is never in sync, making it hard on new users.

BillK

On Wed, 2003-12-31 at 03:14, Hilmar Lapp wrote:
> Note that bioperl-db 0.1 has been outdated for about a year now. It
> won't work with the present biosql schema either. In order to use 0.1
> you will also need to use a pre-Singapore version of biosql.
>
> The current and interoperating versions of bioperl-db and biosql are
> the respective cvs HEADs.
>
> -hilmar
>
From lstein at cshl.edu Wed Dec 31 15:46:36 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Dec 31 15:54:03 2003
Subject: [Bioperl-l] bioperl graphic question
In-Reply-To:
References:
Message-ID: <200312311546.36799.lstein@cshl.edu>

Does anyone know where the gd2 RPMs can be found? I know almost nothing about RPMs.

Lincoln

On Tuesday 30 December 2003 10:24 am, Jason Stajich wrote:
> We might want to post pointers to gd2 rpms for the redhat crowd...
> as RH 9 still ships with libgd 1.8.x
>
>
> -jason
>
> On Mon, 29 Dec 2003, Todd Harris wrote:
> > Hi Jun Hu -
> >
> > As noted in the install guide, you will need to upgrade your
> > libgd. Although using libgd 1.8.4 may not necessarily result in
> > runtime errors, graphical elements may not appear as intended.
> > Some glyphs will not render at all with 1.8.4.
> >
> > We needed to move to libgd2 in order to support the generation of
> > both raster (png) and vector (svg) images from the same codebase.
> >
> > todd
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jason at cgt.duhs.duke.edu Wed Dec 31 16:24:05 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Dec 31 16:31:27 2003
Subject: [Bioperl-l] bioperl graphic question
In-Reply-To: <200312311546.36799.lstein@cshl.edu>
References: <200312311546.36799.lstein@cshl.edu>
Message-ID:

I built gd2 from the Fedora SRPMs and had no problem, apart from other RPMs' dependency on libgd 1.8 (gnuplot, php, and webalizer on RHL 7.3).

The SRPM is here:
http://download.fedora.redhat.com/pub/fedora/linux/core/1/i386/os/SRPMS/gd-2.0.15-1.src.rpm

I can put the compiled RPMs for 7.3 and 9.0 on the bioperl site if it would help.

-jason

On Wed, 31 Dec 2003, Lincoln Stein wrote:
> Does anyone know where the gd2 RPMs can be found? I know almost
> nothing about RPMs.
>
> Lincoln
>
> On Tuesday 30 December 2003 10:24 am, Jason Stajich wrote:
> > We might want to post pointers to gd2 rpms for the redhat crowd...
> > as RH 9 still ships with libgd 1.8.x
> >
> >
> > -jason
> >
> > On Mon, 29 Dec 2003, Todd Harris wrote:
> > > Hi Jun Hu -
> > >
> > > As noted in the install guide, you will need to upgrade your
> > > libgd. Although using libgd 1.8.4 may not necessarily result in
> > > runtime errors, graphical elements may not appear as intended.
> > > Some glyphs will not render at all with 1.8.4.
> > >
> > > We needed to move to libgd2 in order to support the generation of
> > > both raster (png) and vector (svg) images from the same codebase.
> > >
> > > todd
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From andreas.bernauer at gmx.de Wed Dec 17 18:08:44 2003
From: andreas.bernauer at gmx.de (Andreas Bernauer)
Date: Wed Dec 31 21:19:31 2003
Subject: [Bioperl-l] [bioperl-l-bounces@portal.open-bio.org: Your message to Bioperl-l awaits moderator approval]
Message-ID: <20031217230844.GF13034@hgt.mcb.uconn.edu>

Hi,

I know that none of us wants to receive spam through mailing lists, but can anybody tell me why every single message I send to the mailing list must be approved by the moderator although I have already signed up for the list? I always get a bounce like this one:

----- Forwarded message from bioperl-l-bounces@portal.open-bio.org -----

From: bioperl-l-bounces@portal.open-bio.org
Date: Wed, 17 Dec 2003 14:59:54 -0500
To: andreas.bernauer@gmx.de
Subject: Your message to Bioperl-l awaits moderator approval

Your mail to 'Bioperl-l' with the subject

    substitution matrices

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Message has a suspicious header

Either the message will get posted to the list, or you will receive notification of the moderator's decision. If you would like to cancel this posting, please visit the following URL:

    http://portal.open-bio.org/mailman/confirm/bioperl-l/c4...2d40e

----- End forwarded message -----

And why are subjects like "Experiences from a newbie" and "substitution matrices" suspicious? What can I do to prevent this from happening all the time?

Thanks for your input.

Andreas.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031217/89b54953/attachment.bin

From Richard.Holland at agresearch.co.nz Mon Dec 29 22:07:51 2003
From: Richard.Holland at agresearch.co.nz (Holland, Richard)
Date: Wed Dec 31 21:35:28 2003
Subject: [Bioperl-l] Cannot make test on bioperl-db
Message-ID:

Apologies for the cross-post but I am not sure if it is BioSQL or BioPerl at fault here.

I have installed the Oracle version of BioSQL and bioperl-db as downloaded last week, and they compile and install fine. However, make test on bioperl-db fails miserably with hundreds of messages similar to the following on virtually every module:

---SNIP---
bifo6.agresearch.co.nz> make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/cluster.......ok 155/160DBD::Oracle::st execute failed: ORA-00001: unique constraint (SGOWNER.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for Statement "INSERT INTO bioentry_qualifier_value (ent_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)" with ParamValues: :p3='ORG=Escherischia coli; PROTGI=8928262; PROTID=Ec_pid; PCT=24; ALN=254', :p1='417', :p4=1, :p2='421'] at /usr/users/oracle/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, line 1.
DBD::Oracle::st execute failed: ORA-00001: unique constraint (SGOWNER.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for Statement "INSERT INTO bioentry_qualifier_value (ent_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)" with ParamValues: :p3='ORG=Homo sapiens; PROTGI=114238; PROTID=sp:P11245; PCT=100; ALN=289', :p1='417', :p4=2, :p2='421'] at /usr/users/oracle/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, line 1.
DBD::Oracle::st execute failed: ORA-00001: unique constraint (SGOWNER.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for Statement "INSERT INTO bioentry_qualifier_value (ent_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)" with ParamValues: :p3='ORG=Mus musculus; PROTGI=1703436; PROTID=sp:P50295; PCT=74; ALN=289', :p1='417', :p4=3, :p2='421'] at /usr/users/oracle/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, line 1.
DBD::Oracle::st execute failed: ORA-00001: unique constraint (SGOWNER.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for Statement "INSERT INTO bioentry_qualifier_value (ent_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)" with ParamValues: :p3='ORG=Rattus norvegicus; PROTGI=1703437; PROTID=sp:P50298; PCT=73; ALN=289', :p1='417', :p4=4, :p2='421'] at /usr/users/oracle/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, line 1.
---SNIP---

My installation includes the text indexer option, and I have run the script to load the taxon information (which incidentally took almost 20 hours to run on a three-processor Compaq Alpha with 11 gigs of RAM - that's also a bit of a worry, surely?). The only tables with any data in (I have checked this) are taxon and taxon_name. The test failed before I loaded the taxon information in exactly the same way as it does now.

I am wary of wasting time trying to load real data if the test fails like this - is it a 'fatal' problem or can I ignore it and load real data anyway? Any idea what's going on here? I can't figure it out.

cheers,
Richard

---
Richard Holland
Bioinformatics Database Developer
ITS, Agresearch Invermay x3279

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================
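
When a test run produces hundreds of messages like the ones quoted in the "Cannot make test on bioperl-db" post, it can help to check whether they all trace back to the same constraint before digging further. The Perl sketch below is only an illustration of that idea: the function name tally_oracle_errors is hypothetical (it is not part of bioperl-db or DBD::Oracle), the sample lines are shortened from the errors quoted in the post, and it assumes each error message sits on a single line of the captured output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): count Oracle errors by
# ORA code and, for unique-constraint violations, by constraint name.
# Assumes one complete error message per input line.
sub tally_oracle_errors {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Matches e.g. "ORA-00001: unique constraint (OWNER.NAME) violated";
        # the constraint part is optional so other ORA codes are counted too.
        while ( $line =~ /(ORA-\d{5})(?::\s*unique constraint \(([^)]+)\))?/g ) {
            my $key = defined $2 ? "$1 $2" : $1;
            $count{$key}++;
        }
    }
    return \%count;
}

# Sample input shortened from the failures quoted in the post.
my @sample = (
    'DBD::Oracle::st execute failed: ORA-00001: unique constraint (SGOWNER.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute)',
    'DBD::Oracle::st execute failed: ORA-00001: unique constraint (SGOWNER.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute)',
);

my $count = tally_oracle_errors(@sample);

# Print the most frequent errors first.
for my $key ( sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count ) {
    printf "%6d  %s\n", $count->{$key}, $key;
}
```

To use it on a real run, the sample array would be replaced by the lines of a saved log of the make test output. A tally dominated by a single constraint name would suggest one underlying cause rather than many independent failures, though it does not say what that cause is.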