From hlapp at gmx.net Fri Apr 1 03:52:10 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Apr 1 03:46:04 2005 Subject: [Bioperl-l] EntrezGene ASN parser In-Reply-To: <424B28D0.7030101@utk.edu> Message-ID: <557D2BAA-A28B-11D9-BE99-000A959EB4C4@gmx.net> On Wednesday, March 30, 2005, at 02:31 PM, Stefan Kirov wrote: > I just finished a Bioperl EntrezGene Parser based on Mingyi Liu's ASN > Gene parser. It creates two main objects: a Bio::Seq object which > contains most of the data such as references, description, map > location, etc; and a Bio::Cluster::SequenceFamily object, which > contains the refseqs and the gene structure (through NT/NC annotation, > represented as Bio::SeqFeature::Gene objects). You added Bio::SeqFeature::Gene objects to a Bio::Cluster::SequenceFamily instance? Bio::Cluster::SequenceFamily as a Bio::ClusterI should accept only Bio::PrimarySeqI as members ... I.e., originally these clusters were meant to hold sequences. I'm not sure it's a good idea to mix bags of sequences with bags of features. Or I misunderstood and you meant something else? > Another data I make available is the uncaptured data. So each time a > some data is transfered from the hash which represents the parsed > data, I am deleting the respective key. Everything else is concidered > uncaptured. I am doing this since some records could be non-compliant > or simply there may be new data supplied by NCBI. There will be > naturally some data, which is not interesting, and therefore is not > captured (a lot of redundant data in the EntrezGene). So the parser > would act like that: > my ($egene,$assoc_seq,$uncaptured)=$egparser->next_seq; Be careful here, this is non-compliant with Bio::SeqIO which mandates that next_seq() return a sequence object. You could use wantarray to determine whether to return a single object (supposedly $egene?) or three elements, but if someone does my $seq = $egparser->next_seq(); the result should not be the scalar 3 (i.e., number of elements). 
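A minimal sketch of the wantarray approach suggested above, in plain Perl with no BioPerl dependency. The package name Demo::EGParser and the placeholder return values are hypothetical; they only stand in for the real parser and the three objects it would build:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser; NOT the real Bio::SeqIO::entrezgene.
package Demo::EGParser;

sub new { return bless {}, shift }

# Return all three items in list context, only the sequence in scalar context.
sub next_seq {
    my $self = shift;
    # Placeholder values standing in for the Bio::Seq object, the
    # Bio::Cluster::SequenceFamily object, and the uncaptured-data hash.
    my ($egene, $assoc_seq, $uncaptured) = ('SEQ', 'FAMILY', { leftover => 1 });
    return wantarray ? ($egene, $assoc_seq, $uncaptured) : $egene;
}

package main;

my $parser = Demo::EGParser->new;

# Scalar context: the caller gets the sequence itself, not an element count.
my $seq = $parser->next_seq;

# List context: the caller gets all three items.
my ($s, $family, $rest) = $parser->next_seq;

print "scalar: $seq\n";
print "list: $s $family ", join(',', keys %$rest), "\n";
```

With this shape, code that only ever calls next_seq() in scalar context keeps working as with any other Bio::SeqIO driver, while callers that want the extra data ask for it in list context.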
> There are few things I need to add (Markers and GO are not yet in > these objects), but most of work is done. Unless somebody objects, I > will commit the code (Bio::SeqIO::entrezgene?) when I write the > documentation to match the standard. Sounds like a good name. I suggest you commit so that interested others (i.e., me :) can have a look. Also, if you have certain use cases driving your work that expect certain things in certain places, it'd be good if you start writing test cases that check for those things. I certainly have such a use case as I probably indicated earlier; so if I need things in different places than you put them it'd be good to see where changes can be made easily and where not. I depend(ed) a lot on the LocusLink annotation and that will be no different for its successor. > Few notes: > 1. It would be nice if there is Bio::Annotation::DBLink::url method. > It makes sense (I think) since most DB links would refer also to a > webpage. Feel free to add, but don't expect e.g. bioperl-db to (de)serialize this. > 2. It takes now 45 minutes to parse the whole human ASN file, which is > 4 times slower. Keeping uncaptured data slows things down a bit, so I > will introduce -debug option. Anyway I think the speed is not going to > be an issue. What would -debug do? I think there should be an option to disable the keeping of what you call uncaptured data. Also, as I said before the standard way of calling is to ask for a sequence object, so if I know in advance that that's all I'm ever going to do I should have the option to disable construction of those other 2 objects you propose to return from next_seq. Sounds like the Entrez Gene parser is coming along without me having to write it. I'm thrilled Stefan!! -hilmar > 3. Due to the cyclic reference in the GeneStructure object I am > removing the Transcript->{parent} in the parser. This code should be > deleted once the Transcript object is fixed. 
> There are also some other minor issues, but I think I will be able to > fix them by the end of the week. > Please let me know what you think. > Stefan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri Apr 1 03:54:31 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Apr 1 03:48:15 2005 Subject: [Bioperl-l] patch to FeatureIO.pm for tied interface In-Reply-To: <200503312135.j2VLZZfY020817@portal.open-bio.org> Message-ID: On Thursday, March 31, 2005, at 01:41 PM, Cook, Malcolm wrote: > bioperlers, > > The following patch to bioperl-live makes up for what was probably a > copy and paste error and lets FeatureIO work with tied handle interface > too. > > I would be happy to have write access to cvs repository for this and > other such patches as discovered.... Sure, if Chris happens to read this? Didn't you once have an account? Or did I only dream that ;) -hilmar > > Cheers, > > Malcolm Cook > > > Index: FeatureIO.pm > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO.pm,v > retrieving revision 1.8 > diff -c -r1.8 FeatureIO.pm > *** FeatureIO.pm 18 Jan 2005 05:22:11 -0000 1.8 > --- FeatureIO.pm 31 Mar 2005 21:34:33 -0000 > *************** > *** 507,526 **** > > sub TIEHANDLE { > my ($class,$val) = @_; > ! return bless {'seqio' => $val}, $class; > } > > sub READLINE { > my $self = shift; > ! return $self->{'seqio'}->next_seq() unless wantarray; > my (@list, $obj); > ! push @list, $obj while $obj = $self->{'seqio'}->next_seq(); > return @list; > } > > sub PRINT { > my $self = shift; > ! 
$self->{'seqio'}->write_seq(@_); > } > > 1; > --- 507,526 ---- > > sub TIEHANDLE { > my ($class,$val) = @_; > ! return bless {'featio' => $val}, $class; > } > > sub READLINE { > my $self = shift; > ! return $self->{'featio'}->next_feature() unless wantarray; > my (@list, $obj); > ! push @list, $obj while $obj = $self->{'featio'}->next_feature(); > return @list; > } > > sub PRINT { > my $self = shift; > ! $self->{'featio'}->write_feature(@_); > } > > 1; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri Apr 1 04:10:07 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Apr 1 04:11:33 2005 Subject: [Bioperl-l] pubmed In-Reply-To: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu> Message-ID: *please* people always post the code or ideally a small snippet that demonstrates what you were trying to do, and post the result and if it's not an exception why it is not the result you expected. DO NOT just say 'blah doesn't work for me'. Whenever someone needs to guess what you probably did and what you probably mean you are wasting other people's time. The GI# you have has multiple refs with one having a pubmed ID and none having a medline ID. So, the one ref that has a pubmed ID should return it from $ref->pubmed() but without any code snippet it is impossible to tell what you actually did and what therefore might be the problem. -hilmar On Thursday, March 31, 2005, at 03:15 PM, Qunfeng wrote: > Hi there, > > http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html > > I am not very familiar with BioPerl. 
I tried to follow the example > showing in the above page to retrieve pubmed ID under each Reference > tag , i.e., $value->pubmed(), but it doesn't work for me for the seq > gi#56961711. The authors() works for me. Appreciate any suggestions. > > Qunfeng > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From iak13000 at gmail.com Fri Apr 1 04:46:20 2005 From: iak13000 at gmail.com (Irshad Khan) Date: Fri Apr 1 04:40:57 2005 Subject: [Bioperl-l] Export a list of genes containing this family Message-ID: <7851868c050401014615151ff8@mail.gmail.com> Hi, I am working on zebra fish gene families in Ensembl and clicking on Export a list of genes containing this family takes me to MartView with "No datasets". eg. http://www.ensembl.org/Danio_rerio/familyview?family=ENSF00000002914 I have to extract members of large number of gene families or is there any module which I can use. Any help???? Irshad From palmeida at igc.gulbenkian.pt Fri Apr 1 05:12:23 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Fri Apr 1 05:06:27 2005 Subject: [Bioperl-l] Export a list of genes containing this family In-Reply-To: <7851868c050401014615151ff8@mail.gmail.com> References: <7851868c050401014615151ff8@mail.gmail.com> Message-ID: <20050401101223.GA2700@bioinf.igc.gulbenkian.pt> Hi, This is not an ideal solution, and may not work for all your families, but if you click 'Go', in the Export Data box, you get a multiple alignment of the proteins, which includes their names. You could feed that list to Ensmart to get the gene names. 
-Paulo On Fri, Apr 01, 2005 at 11:46:20AM +0200, Irshad Khan wrote: > Hi, > > I am working on zebra fish gene families in Ensembl and clicking on > Export a list of genes containing this family takes me to MartView > with "No datasets". > eg. > > http://www.ensembl.org/Danio_rerio/familyview?family=ENSF00000002914 > > I have to extract members of large number of gene families or is there > any module which I can use. > > Any help???? > > Irshad -- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt From palmeida at igc.gulbenkian.pt Fri Apr 1 05:23:11 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Fri Apr 1 05:16:36 2005 Subject: [Bioperl-l] pubmed In-Reply-To: References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu> Message-ID: <20050401102311.GB2700@bioinf.igc.gulbenkian.pt> On Fri, Apr 01, 2005 at 01:10:07AM -0800, Hilmar Lapp wrote: > *please* people always post the code or ideally a small snippet that > demonstrates what you were trying to do, I agree with that, and I would add that including the minimum working code that demonstrates what you are trying to do, may be helpful, in some cases. I only answer the most basic questions, because I'm not an expert, but I'm much more likely to help if I only have to copy and paste code to an editor, to test it, without having to type things like use Whatever::BigName::Module, and missing brackets. 
-- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt From skirov at utk.edu Fri Apr 1 08:20:57 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Apr 1 08:16:01 2005 Subject: [Bioperl-l] EntrezGene ASN parser In-Reply-To: <557D2BAA-A28B-11D9-BE99-000A959EB4C4@gmx.net> References: <557D2BAA-A28B-11D9-BE99-000A959EB4C4@gmx.net> Message-ID: <424D4AB9.6010001@utk.edu> Hilmar Lapp wrote: > > On Wednesday, March 30, 2005, at 02:31 PM, Stefan Kirov wrote: > >> I just finished a Bioperl EntrezGene Parser based on Mingyi Liu's ASN >> Gene parser. 
It creates two main objects: a Bio::Seq object which >> contains most of the data such as references, description, map >> location, etc; and a Bio::Cluster::SequenceFamily object, which >> contains the refseqs and the gene structure (through NT/NC >> annotation, represented as Bio::SeqFeature::Gene objects). > > > You added Bio::SeqFeature::Gene objects to a > Bio::Cluster::SequenceFamily instance? > > Bio::Cluster::SequenceFamily as a Bio::ClusterI should accept only > Bio::PrimarySeqI as members ... I.e., originally these clusters were > meant to hold sequences. > > I'm not sure it's a good idea to mix bags of sequences with bags of > features. > > Or I misunderstood and you meant something else? Nope. Bio::SeqFeature::Gene goes to Bio::Seq, which then goes to Bio::Cluster::SequenceFamily. Sorry, my description may have been misleading. > >> Another data I make available is the uncaptured data. So each time a >> some data is transfered from the hash which represents the parsed >> data, I am deleting the respective key. Everything else is >> concidered uncaptured. I am doing this since some records could be >> non-compliant or simply there may be new data supplied by NCBI. There >> will be naturally some data, which is not interesting, and therefore >> is not captured (a lot of redundant data in the EntrezGene). So the >> parser would act like that: >> my ($egene,$assoc_seq,$uncaptured)=$egparser->next_seq; > > > Be careful here, this is non-compliant with Bio::SeqIO which mandates > that next_seq() return a sequence object. > > You could use wantarray to determine whether to return a single object > (supposedly $egene?) or three elements, but if someone does > > my $seq = $egparser->next_seq(); > > the result should not be the scalar 3 (i.e., number of elements). Hmm, I see... So unless you ask for all the data as an array (the 3 objects), you will get only the Bio::Seq object with the immediate entrezgene data (no genomic coordinates, etc.). OK, I will change that. 
> >> There are few things I need to add (Markers and GO are not yet in >> these objects), but most of work is done. Unless somebody objects, I >> will commit the code (Bio::SeqIO::entrezgene?) when I write the >> documentation to match the standard. > > > Sounds like a good name. I suggest you commit so that interested > others (i.e., me :) can have a look. > > Also, if you have certain use cases driving your work that expect > certain things in certain places, it'd be good if you start writing > test cases that check for those things. I certainly have such a use > case as I probably indicated earlier; so if I need things in different > places than you put them it'd be good to see where changes can be made > easily and where not. I depend(ed) a lot on the LocusLink annotation > and that will be no different for its successor. > OK... I will take a look again at locuslink and try to adjust as much as possible. Once I commit the code you can tell me if there is a critical part that needs additional work or changes. >> Few notes: >> 1. It would be nice if there is Bio::Annotation::DBLink::url method. >> It makes sense (I think) since most DB links would refer also to a >> webpage. > > > Feel free to add, but don't expect e.g. bioperl-db to (de)serialize this. > >> 2. It takes now 45 minutes to parse the whole human ASN file, which >> is 4 times slower. Keeping uncaptured data slows things down a bit, >> so I will introduce -debug option. Anyway I think the speed is not >> going to be an issue. > > > What would -debug do? > > I think there should be an option to disable the keeping of what you > call uncaptured data. Also, as I said before the standard way of > calling is to ask for a sequence object, so if I know in advance that > that's all I'm ever going to do I should have the option to disable > construction of those other 2 objects you propose to return from > next_seq. 
exactly what -debug would do (-debug=>'off' as default) > > Sounds like the Entrez Gene parser is coming along without me having > to write it. I'm thrilled Stefan!! Thanks... > > -hilmar > > >> 3. Due to the cyclic reference in the GeneStructure object I am >> removing the Transcript->{parent} in the parser. This code should be >> deleted once the Transcript object is fixed. >> There are also some other minor issues, but I think I will be able to >> fix them by the end of the week. >> Please let me know what you think. >> Stefan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From brian_osborne at cognia.com Fri Apr 1 12:26:01 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Apr 1 12:20:01 2005 Subject: [Bioperl-l] Help with taxonomy db In-Reply-To: <135a991135d544.135d544135a991@fudan.edu.cn> Message-ID: J, I'm just guessing here but is "flatfile" really the correct value for 'source'? Shouldn't this be an actual file name? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of zhoujie@fudan.edu.cn Sent: Thursday, March 31, 2005 10:45 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Help with taxonomy db Hi all, Would you please help me with this error message in using local taxonomy db? 
My test code is here: #------------------------------------------------------- use Bio::DB::Taxonomy; my $db = new Bio::DB::Taxonomy(-source => 'flatfile', -nodesfile => 'nodes.dmp', -namesfile => 'names.dmp', -directory => 'index'); my $id = $db->get_taxonid('Homo sapiens'); print "id is $id for Homo sapiens\n"; #------------------------------------------------------- The code generates three files in the index directory: 'nodes','names2id','id2names'. but after that I get an error message: ------------- EXCEPTION ------------- MSG: No such file or directory index/nodes STACK Bio::DB::Taxonomy::flatfile::_db_connect c:/Perl/site/lib/Bio\DB\Taxonomy\ flatfile.pm:325 STACK Bio::DB::Taxonomy::flatfile::new c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile .pm:138 STACK Bio::DB::Taxonomy::new c:/Perl/site/lib/Bio/DB/Taxonomy.pm:104 STACK toplevel local_taxonomy_query.pl:10 -------------------------------------- I'm quite confused with this error, because the nodes file is just in there, but why "No such file"? Can anyone tell me what happening? Any suggestion is appreciated. J Z _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Fri Apr 1 12:50:44 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Apr 1 12:45:17 2005 Subject: [Bioperl-l] Help with taxonomy db In-Reply-To: References: Message-ID: The source => 'flatfile' is right you can either use 'flatfile' or 'entrez' as the source. I think there may have been bugs in one of the released versions of the module where it doesn't properly re-open the index. I know I fixed something in this after the 1.4 release but I don't remember what was the exact problem. I'd see if bioperl 1.5.0 code still gives the problem. 
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 1, 2005, at 12:26 PM, Brian Osborne wrote: > J, > > I'm just guessing here but is "flatfile" really the correct value for > 'source'? Shouldn't this be an actual file name? > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > zhoujie@fudan.edu.cn > Sent: Thursday, March 31, 2005 10:45 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Help with taxonomy db > > > Hi all, > Would you please help me with this error message in using local > taxonomy db? > > My test code is here: > #------------------------------------------------------- > use Bio::DB::Taxonomy; > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => 'nodes.dmp', > -namesfile => 'names.dmp', > -directory => 'index'); > > my $id = $db->get_taxonid('Homo sapiens'); > print "id is $id for Homo sapiens\n"; > #------------------------------------------------------- > > The code generates three files in the index > directory: 'nodes','names2id','id2names'. > > but after that I get an error message: > > ------------- EXCEPTION ------------- > MSG: No such file or directory index/nodes > STACK Bio::DB::Taxonomy::flatfile::_db_connect > c:/Perl/site/lib/Bio\DB\Taxonomy\ > flatfile.pm:325 > STACK Bio::DB::Taxonomy::flatfile::new > c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile > .pm:138 > STACK Bio::DB::Taxonomy::new c:/Perl/site/lib/Bio/DB/Taxonomy.pm:104 > STACK toplevel local_taxonomy_query.pl:10 > -------------------------------------- > > I'm quite confused with this error, because the nodes file is just in > there, but why "No such file"? > > Can anyone tell me what happening? Any suggestion is appreciated. 
>
> J Z
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu Fri Apr 1 13:01:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Apr 1 12:55:48 2005
Subject: [Bioperl-l] Turning the tree into bifurcating one
In-Reply-To: <69BA0F938FAC6A4CBEF49461720696F208DDE410@nihexchange16.nih.gov>
References: <69BA0F938FAC6A4CBEF49461720696F208DDE410@nihexchange16.nih.gov>
Message-ID:

Sure, just test if a node has more than 2 children, choose 2 of the
children and insert a new parent (a pseudo node) for them. Walk from
the root to the tips. Works for me for a star phylogeny too.

This is somewhat untested though, so it may not work in all situations,
but I hope it gets you started.

use Bio::TreeIO;
use Bio::Tree::Node;

my $treefile = shift @ARGV;    # path to the input Newick file

my $in  = Bio::TreeIO->new(-format => 'newick',
                           -file   => $treefile);
my $out = Bio::TreeIO->new(-format => 'newick');

while( my $tree = $in->next_tree ) {
    # only internal nodes can have more than 2 children
    my @internal = grep { ! $_->is_Leaf } $tree->get_nodes;
    for my $node ( @internal ) {
        my @children = $node->each_Descendent;
        # keep pairing children under a new pseudo node
        # until only 2 children remain
        while( @children > 2 ) {
            my $left  = shift @children;
            my $right = shift @children;
            my $new_node = Bio::Tree::Node->new();
            $new_node->ancestor($node);
            # move the pair from the original node to the pseudo node
            $node->remove_Descendent($right);
            $node->remove_Descendent($left);
            $new_node->add_Descendent($left);
            $new_node->add_Descendent($right);
            push @children, $new_node;
            $node->add_Descendent($new_node);
        }
    }
    $out->write_tree($tree);
}

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 30, 2005, at 12:02 PM, Babenko, Vladimir (NIH/NLM/NCBI) wrote:

> Greetings,
> Is there any possible solution to insert some pseudo-nodes into the
> tree to
> make it bifurcating?
> Some programs can deal only with bifurcating ones...
> Thank you, > Vladimir > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From mlemieux at bioinfo.ca Fri Apr 1 11:51:31 2005 From: mlemieux at bioinfo.ca (Madeleine Lemieux) Date: Fri Apr 1 12:56:14 2005 Subject: [Bioperl-l] Easy switching from wwwBlast to QBlast - Wrong CVS diff Message-ID: <8fb341904f9d69e2137400f98b1e257c@bioinfo.ca> My apologies! I'm new to CVS and did the "cvs diff" wrong for Perl.pm and RemoteBlast.pm. The LocalServerBlast.pm is straight Perl code since it's a new module. This time I (correctly, I think) did "cvs diff -aur HEAD". Madeleine -------------- next part -------------- A non-text attachment was scrubbed... Name: Perl.pm.diffHEAD Type: application/octet-stream Size: 9077 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050401/ea179271/Perl.pm-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: RemoteBlast.pm.diffHEAD Type: application/octet-stream Size: 4671 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050401/ea179271/RemoteBlast.pm-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: LocalServerBlast.pm Type: application/octet-stream Size: 16943 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050401/ea179271/LocalServerBlast-0001.obj From brian_osborne at cognia.com Fri Apr 1 15:02:29 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Apr 1 14:56:44 2005 Subject: [Bioperl-l] Help with taxonomy db In-Reply-To: Message-ID: Jason, OK, in which case the Taxonomy.pm module is wrong, it says "localfile" not "flatfile". I'll correct this .... Brian O. 
-----Original Message----- From: Jason Stajich [mailto:jason.stajich@duke.edu] Sent: Friday, April 01, 2005 12:51 PM To: Brian Osborne Cc: zhoujie@fudan.edu.cn; bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Help with taxonomy db The source => 'flatfile' is right you can either use 'flatfile' or 'entrez' as the source. I think there may have been bugs in one of the released versions of the module where it doesn't properly re-open the index. I know I fixed something in this after the 1.4 release but I don't remember what was the exact problem. I'd see if bioperl 1.5.0 code still gives the problem. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 1, 2005, at 12:26 PM, Brian Osborne wrote: > J, > > I'm just guessing here but is "flatfile" really the correct value for > 'source'? Shouldn't this be an actual file name? > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > zhoujie@fudan.edu.cn > Sent: Thursday, March 31, 2005 10:45 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Help with taxonomy db > > > Hi all, > Would you please help me with this error message in using local > taxonomy db? > > My test code is here: > #------------------------------------------------------- > use Bio::DB::Taxonomy; > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => 'nodes.dmp', > -namesfile => 'names.dmp', > -directory => 'index'); > > my $id = $db->get_taxonid('Homo sapiens'); > print "id is $id for Homo sapiens\n"; > #------------------------------------------------------- > > The code generates three files in the index > directory: 'nodes','names2id','id2names'. 
> > but after that I get an error message: > > ------------- EXCEPTION ------------- > MSG: No such file or directory index/nodes > STACK Bio::DB::Taxonomy::flatfile::_db_connect > c:/Perl/site/lib/Bio\DB\Taxonomy\ > flatfile.pm:325 > STACK Bio::DB::Taxonomy::flatfile::new > c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile > .pm:138 > STACK Bio::DB::Taxonomy::new c:/Perl/site/lib/Bio/DB/Taxonomy.pm:104 > STACK toplevel local_taxonomy_query.pl:10 > -------------------------------------- > > I'm quite confused with this error, because the nodes file is just in > there, but why "No such file"? > > Can anyone tell me what happening? Any suggestion is appreciated. > > J Z > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From lopaki at gmail.com Fri Apr 1 16:19:26 2005 From: lopaki at gmail.com (Scott Lambdin) Date: Fri Apr 1 16:17:21 2005 Subject: [Bioperl-l] INSECT Cluster and RAID Message-ID: <529e768305040113191bcbd19a@mail.gmail.com> Hello - Someone told me it is dangerous to use RAID storage with INSECT comparison clusters. Is that true? --Scott From hlapp at gmx.net Sat Apr 2 19:50:46 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Apr 2 19:44:47 2005 Subject: [Bioperl-l] pubmed In-Reply-To: <6.1.2.0.2.20050401140236.037cb060@qfdong.mail.iastate.edu> Message-ID: <6A4AE81E-A3DA-11D9-9A9E-000A959EB4C4@gmx.net> So what is the result of this script that you wouldn't have expected or that is not giving you what you need? BTW annotation objects under the tagname 'reference' are usually Bio::Annotation::Reference objects and have methods $ref->authors(), $ref->pubmed(), $ref->medline, etc. Check the POD. 
-hilmar On Friday, April 1, 2005, at 12:04 PM, Qunfeng wrote: > Hilmar and Paulo, > > I apologize for that, > > here is a snippet of my code, I must have missed something very > simple. Thanks for your help! -- Qunfeng > > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > > my $inputGBfile = $ARGV[0]; > my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile", > '-format' => 'GenBank'); > > my $seq_object; > while (1){ > eval{ > $seq_object = $seqio_object->next_seq; > }; > if($@){ > print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n"; > next; > } > last if(!defined $seq_object); > my $gi = $seq_object->primary_id; > my $anno_collection = $seq_object->annotation; > foreach my $key ( $anno_collection->get_all_annotation_keys ) { > my @annotations = $anno_collection->get_Annotations($key); > foreach my $value ( @annotations ) { > if($value->tagname eq "reference"){ > my $hash_ref = $value->hash_tree; > my $authors = $hash_ref->{'authors'}; > my $medline = $hash_ref->{'medline'}; > my $pubmed = $hash_ref->{'pubmed'}; > print STDERR > "gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n"; > } > } > } > } > > > > At 03:10 AM 4/1/2005, you wrote: >> *please* people always post the code or ideally a small snippet that >> demonstrates what you were trying to do, and post the result and if >> it's not an exception why it is not the result you expected. DO NOT >> just say 'blah doesn't work for me'. Whenever someone needs to guess >> what you probably did and what you probably mean you are wasting >> other people's time. >> >> The GI# you have has multiple refs with one having a pubmed ID and >> none having a medline ID. So, the one ref that has a pubmed ID should >> return it from $ref->pubmed() but without any code snippet it is >> impossible to tell what you actually did and what therefore might be >> the problem. 
>> >> -hilmar >> >> On Thursday, March 31, 2005, at 03:15 PM, Qunfeng wrote: >> >>> Hi there, >>> >>> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html >>> >>> I am not very familiar with BioPerl. I tried to follow the example >>> showing in the above page to retrieve pubmed ID under each Reference >>> tag , i.e., $value->pubmed(), but it doesn't work for me for the seq >>> gi#56961711. The authors() works for me. Appreciate any >>> suggestions. >>> >>> Qunfeng >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From yanfeng at csit.fsu.edu Mon Apr 4 01:05:31 2005 From: yanfeng at csit.fsu.edu (yanfeng) Date: Sun Apr 3 20:59:49 2005 Subject: [Bioperl-l] two sequences alignment question Message-ID: <00b401c538d3$eee495a0$7c22c992@yanfeng> Hi, bioperl experts, I am a bioper and perl beginner. I have two sequences. One is very long( say 16000) and the other is short( say 100) . The short one is very similar to some part of the long sequence. I want to locate the position like from 1200 to 1300 is the very similar sequence part. I tried to use bl2seq module.But failed. It maybe a simple question but I just don't know how to do it. 

#Get 2 sequences
$str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta', );
my $seq3 = $str->next_seq();
my $seq4 = $str->next_seq();

# Run bl2seq on them
$factory = Bio::Tools::StandAloneBlast->new('program' => 'blastp',
                                            'outfile' => 'bl2seq.out');
my $bl2seq_report = $factory->bl2seq($seq3, $seq4);
# Note that report is a Bio::SearchIO object

# Use AlignIO.pm to create a SimpleAlign object from the bl2seq report
$str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => 'bl2seq');
$aln = $str->next_aln();


The error message is "can't write to bl2seq.out".

From jason.stajich at duke.edu Sun Apr 3 21:56:05 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Apr 3 21:50:16 2005
Subject: [Bioperl-l] two sequences alignment question
In-Reply-To: <00b401c538d3$eee495a0$7c22c992@yanfeng>
References: <00b401c538d3$eee495a0$7c22c992@yanfeng>
Message-ID: <663ead25a2d629331b75fcfb704c977f@duke.edu>

If you just want the position, get it from the Search::HSP object.
Don't bother with AlignIO. See the SearchIO HOWTO for more information.

if( my $hit = $bl2seq->next_hit ) {
  while( my $hsp =$hit->next_hsp ) {
    print "query: ", $hsp->query->start, "-", $hsp->query->end,
          " aligned to subject: ", $hsp->hit->start, "-", $hsp->hit->end, "\n";
  }
}

On Apr 4, 2005, at 1:05 AM, yanfeng wrote:

> Hi, bioperl experts,
> I am a bioperl and perl beginner. I have two sequences. One is very
> long (say 16000) and the other is short (say 100). The short one is
> very similar to some part of the long sequence.
> I want to locate the position, e.g. that 1200 to 1300 is the very
> similar sequence part.
> I tried to use the bl2seq module, but failed. It may be a simple question
> but I just don't know how to do it.
>
> #Get 2 sequences
> $str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta', );
> my $seq3 = $str->next_seq();
> my $seq4 = $str->next_seq();
>
> # Run bl2seq on them
> $factory = Bio::Tools::StandAloneBlast->new('program' => 'blastp',
>                                             'outfile' => 'bl2seq.out');
> my $bl2seq_report = $factory->bl2seq($seq3, $seq4);
> # Note that report is a Bio::SearchIO object
>
> # Use AlignIO.pm to create a SimpleAlign object from the bl2seq report
> $str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => 'bl2seq');
> $aln = $str->next_aln();
>
>
> The error message is "can't write to bl2seq.out".

From millerj at bcm.tmc.edu Mon Apr 4 00:26:49 2005
From: millerj at bcm.tmc.edu (Jonathan Miller)
Date: Mon Apr 4 10:22:22 2005
Subject: [Bioperl-l] Bio::Index:EMBL on embl flatfiles
Message-ID:

I have the following task to perform in large
quantities. I can do it successfully with
files from ncbi, in fasta and genbank format.

For various reasons, I would prefer to do it
with embl format annotation files, rather than
genbank.

I first used formatdb to create a blast index
for Apis_mellifera.AMEL1.1.mar.dna.contig.fa .

I blast my sequence against this file,
and obtain the expected hit, and then I want to find
annotation for this hit, in embl format
flatfiles (Apis_mellifera.0.dat, etc.) with bioperl.

To do this, I have to first make an index with
bioperl, using Bio::Index::EMBL .

Then I need to use "fetch" within bioperl.
The problem is, that "fetch" within bioperl
doesn't seem to know how to use the fasta
headers to find the sequence in the embl flatfile.

There is probably a simple solution to this
that everyone working with bioperl and embl
flatfiles knows, but I don't know what it
is.

From suhast at iitk.ac.in Mon Apr 4 08:58:42 2005
From: suhast at iitk.ac.in (Suhas Tikole)
Date: Mon Apr 4 10:22:23 2005
Subject: [Bioperl-l] Bio::LocationI Interface
Message-ID: <33232.172.28.124.21.1112619522.squirrel@nwebmail.iitk.ac.in>

Dear Sir,

I am unable to understand how to use the Bio::LocationI interface.

How do I create an object for it? And which objects are acceptable for
its methods?

Please suggest a solution and point me to an online reference, if any.

Thank You

-- Suhas Tikole.

***************************************************
From :
Mr. Suhas S. Tikole.
D-307 / Hall - V, ( MT - BSBE ),
Indian Institute of Technology, Kanpur.
Kanpur - 208016.
INDIA.
Phone No: (0512) - 2597315 , 2597115 (R)
***************************************************

From jason.stajich at gmail.com Mon Apr 4 10:21:26 2005
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon Apr 4 10:22:24 2005
Subject: [Bioperl-l] Re: BIo::LocationI Interface
In-Reply-To: <33226.172.28.124.21.1112619447.squirrel@nwebmail.iitk.ac.in>
References: <33226.172.28.124.21.1112619447.squirrel@nwebmail.iitk.ac.in>
Message-ID: <9e0f00eec93e2df92a44a8df83126dc8@bioperl.org>

Please post your questions to the mailing list.

To answer the question: you don't create something with the interface
directly, you want to use an implementing class. Try
Bio::Location::Simple for example.

-jason
--
Jason Stajich
jason.stajich-at-gmail.com or jason-at-bioperl.org
http://jason.open-bio.org

On Apr 4, 2005, at 8:57 AM, Suhas Tikole wrote:

> Dear Sir,
>
> I am unable to understand how to use the Bio::LocationI interface.
>
> How do I create an object for it? And which objects are acceptable
> for its methods?
>
> Please suggest a solution and point me to an online reference, if
> any.
>
> Thank You
>
> -- Suhas Tikole.
>
> ***************************************************
> From :
> Mr. Suhas S. Tikole.
> D-307 / Hall - V, ( MT - BSBE ),
> Indian Institute of Technology, Kanpur.
> Kanpur - 208016.
> INDIA.
> Phone No: (0512) - 2597315 , 2597115 (R)
> ***************************************************
>

From tembe at bioanalysis.org Mon Apr 4 12:18:15 2005
From: tembe at bioanalysis.org (Waibhav Tembe)
Date: Mon Apr 4 12:14:18 2005
Subject: [Bioperl-l] Sorting BLAST Output
Message-ID: <425168C7.80105@bioanalysis.org>

Hello List,

I was wondering if there is an easy way to "sort" the hits in a
blast output based on something other than the default sort key (E
value/bits). For example, I would like to sort the hits based on the
number of mismatches or the length of the alignments reported. I
considered the blast2table utility, but I would like to retain all the
details in the BLAST output.

Thanks.

From brian_osborne at cognia.com Mon Apr 4 13:45:38 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Apr 4 13:39:35 2005
Subject: [Bioperl-l] Bio::Index:EMBL on embl flatfiles
In-Reply-To:
Message-ID:

Jonathan,

Is there some identifier in common between the fasta entries and the EMBL
entries? If so what you want to be able to do is to create your EMBL indices
based on this key, but the current Bio::Index::EMBL doesn't do this. If you
want to wait a couple of days I can modify EMBL.pm so it can create this
sort of custom index, or you can try to modify EMBL.pm yourself. If you look
at its sister, Genbank.pm, you'll see that the modifications are not
difficult.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jonathan
Miller
Sent: Monday, April 04, 2005 12:27 AM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Bio::Index:EMBL on embl flatfiles

I have the following task to perform in large
quantities. I can do it successfully with
files from ncbi, in fasta and genbank format.

For various reasons, I would prefer to do it
with embl format annotation files, rather than
genbank.

I first used formatdb to create a blast index
for Apis_mellifera.AMEL1.1.mar.dna.contig.fa .

I blast my sequence against this file,
and obtain the expected hit, and then I want to find
annotation for this hit, in embl format
flatfiles (Apis_mellifera.0.dat, etc.) with bioperl.

To do this, I have to first make an index with
bioperl, using Bio::Index::EMBL .

Then I need to use "fetch" within bioperl.
The problem is, that "fetch" within bioperl
doesn't seem to know how to use the fasta
headers to find the sequence in the embl flatfile.

There is probably a simple solution to this
that everyone working with bioperl and embl
flatfiles knows, but I don't know what it
is.

From jason.stajich at duke.edu Mon Apr 4 13:57:25 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Apr 4 13:51:29 2005
Subject: [Bioperl-l] Re: Thank you for your reply.
In-Reply-To: <1112636538.42517c7a36ed8@email.csit.fsu.edu>
References: <00b401c538d3$eee495a0$7c22c992@yanfeng>
	<663ead25a2d629331b75fcfb704c977f@duke.edu>
	<1112636538.42517c7a36ed8@email.csit.fsu.edu>
Message-ID: <3bcfe9f0a848c18dd67b84d95a706396@duke.edu>

Please do not send these questions directly to me - send them to the
bioperl mailing list where other people can help. I do not have time
to help everyone on an individual basis.

I think you need to try some code and post that before just asking
people to do your work for you. I give an EXAMPLE of how to start this
to illustrate how easy it is to use Bioperl to grab what you want from
the report file. If you decide to use BLAST or BLAT or other algorithms
you don't need to change your script, only the SearchIO format.

Here is simple code that would probably achieve something using FASTA
nucleotide-to-nucleotide searching. If you are searching protein-coding
sequence, a more appropriate approach is something more sophisticated,
such as a translated search.

use Bio::SearchIO;
use strict;
use warnings;
my $fh;
# pass query and mitochondria fasta files on the cmd-line in that order
my ($qfile,$mitofile) = @ARGV;

open($fh, "fasta34 -E 0.01 -m 9 -d 0 $qfile $mitofile |");
my $searchio = Bio::SearchIO->new(-format => 'fasta', -fh => $fh);
if( my $r = $searchio->next_result ) {
  # only want the BEST hit in this SIMPLE example
  if ( my $hit = $r->next_hit ) {
    # only one HSP per hit for FASTA, change for BLAST
    if( my $hsp = $hit->next_hsp ) {
      print "location is ", $hsp->hit->location->to_FTstring(), " ",
            $hsp->hit->length, " nt long\n";
    }
  }
}

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 4, 2005, at 1:42 PM, Yanfeng Shi wrote:

> Hi, Jason
> Thank you for your reply. I still can't figure it out.
> I have a mitochondrial genome sequence and one short sequence from
> another species' mitochondrial genome. I want to find the position in
> my mitochondrial genome which is similar to that cds sequence.
>
>
> --
> Yanfeng Shi
> School of Computational Science
> FSU,TLH,FL,32306
> Tel:850-645-0302(W)
>
> the sequence I want to compare with my mitochondrial sequence
> GTTAATGTAGCTTAAACAATAAAGCAAGGCACTGAAAATGCCTAGATGAGTGTATTAACTCCATAAACAT
> ATAGGTCTGGTCCCAGCCTTCCTATTAGCCTTTAATAGACTTACACATGCAAGCATCCACGCCCCAGTGA
> AAATGCCCTCCAAGTTAATAAGACCAAGAGGAGCTGGTATCAAGCACACATCCGTAGCTCACGACACCTT
> GCTCAGCCACACCCCCACGGGAAACAGCAGTGATAAAAATTAAGCCATAAACGAAAGTTTGACTAAGCCA
> TATTGATTAGGGTTGGTAAATTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAGTTAATAGGCACA
> CGGCGTAAAGCGTGTTAAAGCACCACCCCAAATAAAGCTAAATTTCAATTAAGCTGTAAAAAGCCATAAT
> TGCAACAAAAATAAATAACGAAAGTAACTTTACAATC
>
>
> my mitochondrial sequence.
>> > GTTAATGTAGCTTAAACAATAAAGCAAGGCACTGAAAATGCCTAGATGAGTGTATTAACTCCATAAACAT > ATAGGTCTGGTCCCAGCCTTCCTATTAGCCTTTAATAGACTTACACATGCAAGCATCCACGCCCCAGTGA > AAATGCCCTCCAAGTTAATAAGACCAAGAGGAGCTGGTATCAAGCACACATCCGTAGCTCACGACACCTT > GCTCAGCCACACCCCCACGGGAAACAGCAGTGATAAAAATTAAGCCATAAACGAAAGTTTGACTAAGCCA > TATTGATTAGGGTTGGTAAATTTCGTGCCAGCCACCGCGGTCATACGATTAACCCAAGTTAATAGGCACA > CGGCGTAAAGCGTGTTAAAGCACCACCCCAAATAAAGCTAAATTTCAATTAAGCTGTAAAAAGCCATAAT > TGCAACAAAAATAAATAACGAAAGTAACTTTACAATCGCTGAAACACGATAGCTAGGACCCAAACTGGGA > TTAGATACCCCACTATGCCTAGCCCTAAACACAAATAGTTTTATAAACAAAACTATTCGCCAGAGTACTA > CCGGCAATAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATACCCTTCTAGAGGAGCCTGTTCTATAAT > CGATAAACCCCGATAGACCTCACCATTCCTCGCTAATACAGTCTATATACCGCCATCTTCAGCAAACCCT > AAAAAGGAGCAAAAGTAAGCGCAATCATAGTACATAAAAACGTTAGGTCAAGGTGTAACCTATGGAATGG > AAAGAAATGGGCTACATTTTCTAATCCAAGAACAACTCAATACGAAAATTATTATGAAATTAATAATTAA > AGGAGGATTTAGCAGTAAACTAAGAATAGAGTGCTTAGTTGAATTAGGCCATGAAGCACGCACACACCGC > CCGTCACCCTCCTCAAGTAAGTACAATATACTCAAACTTATTTATATATATTAATCATATGAGAGGAGAC > AAGTCGTAACAAGGTAAGCATACTGGAAAGTGTGCTTGGATAAACCAAGATATAGCTTAAATAAAGCACC > TAGGTTTACACCTAGAAGATTTCACTACACCACGAATATCTTGAACCAATTCTAGCCCATAAATCTACTC > ACACTAAATTACCAATTTATTATAAATAAAACATTTATCTACCATTAAAAGTATAGGAGATAGAAATTTT > AACATGGCGCTATAGAGACAGTACCGTAAGGGAACGATGAAAGAAAAAAATCAAAGTACAAAAAAGCAAA > GATTATCCCTTGTACCTTTTGCATAATGAGTTAACTAGTAAAAACTTAACAAAATGAATTTTAGCTAAGT > ACCCCGAAACCAGACGAGCTACTTATGAACAATTTATAAAGAACTAACTCATCTATGTAGCAAAATAGTG > AGAAGATTTGTAAGTAGAGGTGAAACGCCTAACGAGCCTGGTGATAGCTGGTTGTCCAGAAAATGAATGT > TAGTTCAGCTTTAAAAATACCGAAAATATAAACAAATTATAATGTATTTTTAAAAGTTAGTCTAAAAGGG > TACAGCCTTTTAGAAATGGATATAACCTTAATTAGAGAGTAAAATTTAATATATATCATAGTAGGCTTAA > AAGCAGCCACCAATTAAGAAAGCGTTAAAGCTCAACGACAAAACAATATTAATTTCAACAACAAATAATT > AACTCCTAACCTAATACTGGACTAATCTATTAAGAATAGAAGAAATAATGTTAATATGAGTAACAAGAAA > TATTTCTCCCTGCATAAGTTTAAGTCAGTATCTGATATTATTCTGACTATTAACAGCAAAATAAGAATAG > TCCACCCATAAATAACTTATTTATTATACTGTTAATCCAACACAGGAGTGCACTCAGGGAAAGATTAAAA > 
GAAGTAAAAGGAACTCGGCAAACACCAAACCCCGCCTGTTTACCAAAAACATCACCTCTAGCATTACTAG > TATTAGAGGCACTGCCTGCCCAGTGACAACCGTTAAACGGCCGCGGTATCCTGACCGTGCAAAGGTAGCA > TAATCACTTGTTCTCTAAATAAGGACTTGTATGAATGGCCACACGAGGGTTTTACTGTCTCTTACTTCTA > ATCAGTGAAATTGACCTTCCCGTGAAGAGGCGGGAATATATTAATAAGACGAGAAGACCCTATGGAGCTT > TAACTACTTAGCCCAAAGAAACAAAATTTATTTCTAAGGAAACAACAACATTCTCTATGGGTTAACAGCT > TTGGTTGGGGTGACCTCGGAGAATAAAAAATCCTCCGAGCGATTTTAAAGACTAGACCCACAAGTCAAAT > CACATAATCGCTTATTGATCCAAAAAATTGATCAACGGAACAAGTTACCCTAGGGATAACAGCGCAATCC > TATTCAAGAGTCCATATCGACAATAGGGTTTACGACCTCGATGTTGGATCAGGACATCCTGATGGTGCAA > CCGCTATCAAAGGTTCGTTTGTCAACGATTAAAGTCCTACGTGATCTGAGTTCAGACCGGAGTAATCCAG > GTCGGTTTCTATCTATTATGTATTTCTCCTAGTACGAAAGGACCAGAGAAATAAGGCCAACTTCAAACAA > GCGCCTTAAATTGATTAATGATATTATCTTAATTAACTCTACAAATAAACCCTACCCTAGAAAAGGGTTT > TGTTAAGGTGGCAGAGCCCGGTAATTGCGTAAAACTTAAAACTTTATAATCAGAGATTCAAATCCTCTCC > TTAACAAAATGTTTATAATTAATATTCTAATACTAATTATCCCTATCCTATTAGCCGTAGCATTCCTTAC > ACTAGTAGAACGAAAAGTACTAGGCTATATACAATTTCGAAAAGGCCCAAATGTTGTAGGCCCCTACGGT > CTGCTCCAACCTATTGCAGATGCCATCAAACTCTTTATCAAAGAACCACTACGACCCGCTACATCTTCAA > TCTCAATATTTATTTTAGCCCCTATCCTAGCCCTAAGTTTAGCTCTAACTATATGAATTCCCCTACCCAT > ACCACATCCTCTCATCAACATAAATCTAGGAGTCCTATTTATACTGGCAATATCAAGCCTGGCTGTGTAT > TCCATCCTCTGATCAGGCTGAGCCTCCATTCTAAATTATGCACTAATCGGAGCCCTACGAGCAGTGGCAC > AAACAATCTCATATGAAGTAACGCTAGCCATCATCCTACTATCAGTTCTTCTAATAAATGGATCTTTTAC > TCTTTCCACTTTAATCACTACACAAGAACAAGTATGACTTATTTTCCCAGCATGACCCTTAGCAATAATA > TGATTTATCTCAACATTAGCAGAAACAAATCGTGCTCCCTTCGATCTCACCGAAGGCGAATCAGAACTAG > TCTCTGGCTTTAATGTAGAATACGCAGCAGGACCATTCGCCCTATTTTTCATAGCAGAATATGCAAACAT > TATTATAATGAATATTTTCACAACAACTTTATTCCTCGGAGCATTCCACAACCCGATCTTACCAGAACTC > TACACAATCAACTTTACTATTAAATCATTATTGTTAACAATTTTCTTCCTATGAATTCGAGCATCTTATC > CTCGATTTCGCTATGACCAACTAATACATTTATTATGAAAAAATTTCCTACCCCTTACACTAGCCCTATG > TATATGACATGTATCACTACCCATTCTCCTATCAAGCATTCCCCCACAAACGTAAGAAATATGTCTGACA > AAAGAGTTACTTTGATAGAGTAAATAATAGAGGTTCAAGCCCTCTTATTTCTAGAACTATAGGAGTTGAA > 
CCTACTCTCAAGAATCCAAAACTCTTTGTGCTCCCAATTACACCAAATTCTAATAGTAAGGTCAGCTAAT > TAAGCTATCGGGCCCATACCCCGAAAATGTTGGTTTATATCCTTCCCGTACTAATAAACCCAATTATCTT > TATTATTATTCTATCAACACTAATACTAGGCACTATTATTGTTATAATTAGCTCCCATTGATTACTTGTC > TGAATCGGATTTGAAATAAATATGCTCGCCATCATCCCTATTATAATAAAGAAACACAATCCACGAGCCA > CAGAAGCATCCACCAAATATTTTTTAACCCAATCTACGGCCTCAATATTACTAATAATAGCCGTCATTAT > TAATCTAATATTCTCAGGCCAATGAACCGTAATAAAATTATTTAACCCAATGGCATCCATACTTATAACA > ATAGCCCTCACTATAAAACTGGGAATAGCCCCATTTCACTTCTGAGTCCCAGAAGTAACACAAGGCATCC > CTCTATCATCAGGCCTAATCCTACTCACGTGACAAAAGTTAGCACCTATATCAGTACTTTACCAAATTTT > TCCATCCATTAACCTGAATATAATCTTAACTATTTCTATTTTATCCATTATAATTGGAGGCTGAGGAGGA > CTAAACCAAACGCAACTACGAAAAATCATAGCATATTCATCAATTGCCCACATAGGCTGAATAACAGCAG > TCCTGCCATATAACCCCACAATAACACTACTAAACCTGATTATTTATATCATCATAACCTCCACTATATT > CTCACTATTTATAGCTAACTCAGCTACTACCACTCTATCACTATCACACACTTGAAATAAAATACCTGTA > ATATCTACTCTAATCCTTGTAACCCTCCTATCAATAGGAGGACTTCCCCCACTATCAGGATTTATGCCAA > AATGAATAATTATCCAAGAAATAACAAAAAATGATAGCCTCATTCTACCTACCCTTATAGCAATCACAGC > ACTCTTAAACTTATACTTTTACATACGACTCACATATTCCACCGCACTAACAATATTTCCTTCTGTAAAT > AATATAAAAATAAAATGACAATACTCCACTACAAAACAAATAATTCTTCTACCCACAATAGTTATTTTAT > CTACTATGCTACTACCACTCACACCAATCCTATCAGTACTAGAATAGGAGTTTAGGTTAACCTAGACCAA > GAGCCTTCAAAGCCCTAAGCAAGTATGATATACTTAACTCCTGATAAGGACTGCAAGACCATATCTTACA > TCAATTGAATGCAAATCAACCACTTTAATTAAGCTAAATCCTCACTAGATTGGTGGGCTCCACCCCCACG > AAACTTTAGTTAACAGCTAAATACCCTAATCAACTGGCTTCAATCTACTTCTCCCGCCGCGAAGAAAAAA > AAGGCGGGAGAAGCCCCGGCAGAGTTTGAAGCTGCTTCTTTGAATTTGCAATTCAACATGAAATTTCACC > ACAGGACCTGGTAAAAAGAGGATTATAAACCTCTATCTTTAGATTTACAGTCTAATGCTTCACTCAGCCA > TCTTACCTATGTTCATTAACCGCTGATTATTTTCAACTAACCATAAAGACATCGGCACCCTGTACCTACT > ATTCGGTGCCTGAGCAGGCATAGTAGGGACAGCCCTAAGCCTATTAATTCGTGCTGAACTGGGTCAACCG > GGAACCCTACTTGGAGATGATCAAATTTATAACGTAATTGTAACCGCACATGCATTCGTAATAATTTTCT > TTATAGTAATACCCATTATAATTGGAGGATTCGGTAATTGACTGGTTCCTTTAATAATTGGTGCTCCAGA > CATAGCATTCCCTCGAATAAATAACATAAGCTTTTGGCTTCTCCCACCCTCTTTCCTTCTACTTCTAGCA > 
TCATCTATAGTTGAAGCTGGCGCAGGAACAGGCTGAACTGTATATCCCCCTCTAGCTGGTAATCTGGCCC > ATGCAGGAGCTTCAGTAGATCTAACTATTTTTTCTTTACACCTGGCAGGCGTTTCTTCAATTTTAGGGGC > TATTAACTTTATTACAACAATTATTAATATGAAACCTCCTGCCATATCACAATATCAAACCCCTCTATTC > GTGTGATCCGTACTAATTACCGCTGTATTGCTACTTCTCTCACTCCCTGTACTAGCAGCCGGAATTACAA > TACTATTAACAGATCGAAATTTAAATACAACTTTTTTTGACCCAGCAGGAGGTGGAGACCCTATTCTGTA > CCAACACCTGTTCTGATTCTTTGGCCACCCTGAAGTATATATTCTTATTTTACCCGGCTTTGGTATAATT > TCTCACATCGTAACATACTACTCAGGAAAAAAAGAACCATTTGGATATATGGGAATGGTCTGAGCCATAA > TATCAATCGGATTTTTAGGATTTATCGTATGGGCTCACCACATGTTCACAGTTGGAATAGACGTTGACAC > ACGAGCCTATTTCACATCAGCTACCATGATTATTGCTATCCCAACTGGAGTAAAAGTCTTTAGCTGATTG > GCTACACTTCATGGAGGCAATATCAAATGATCACCTGCTATAATATGAGCTCTAGGCTTTATTTTCCTCT > TTACAGTTGGAGGCCTGACCGGCATTGTTCTTGCCAACTCCTCTCTTGATATTGTCCTCCACGACACATA > TTATGTAGTTGCACACTTTCACTACGTATTATCAATAGGAGCCGTATTCGCTATCATAGGAGGATTTGTT > CACTGATTTCCGCTATTCTCAGGATATACCCTCAACAATACATGAGCCAAAATTCATTTCGTAATCATAT > TTGTAGGTGTAAATATAACCTTTTTCCCACAACATTTTCTAGGATTGTCTGGCATACCACGACGCTACTC > TGACTACCCAGATGCATATACAATATGAAATACCATCTCATCCATAGGCTCATTTATTTCCCTAACAGCA > GTTATACTAATAATTTTTATTATCTGAGAGGCATTTGCATCTAAGCGAGAAGTCCTAACCGTAGAGCTGA > CAACAACAAACCTAGAGTGACTAAACGGATGTCCTCCACCATATCACACATTTGAAGAACCTACATACGT > CAATTTAAAATAAGAAAGGAAGGAATCGAACCCTCCATAGCTGGTTTCAAGCCAACATCATAACCACTAT > GTCTTTCTCAATTAATGAGGTTTTAGTAAAATATTATATAACTTTGTCAAAGTTAAGTTACAGGTGAAAA > CCCCGTATGCCTCATATGGCTTATCCCATACAATTAGGCTTCCAAGATGCAACGTCACCTATTATAGAAG > AATTACTACATTTTCATGATCACACACTAATAATTGTTTTTCTAATTAGCTCACTAGTACTATACGTTAT > TTCATTAATGCTAACGACAAAATTAACTCACACTAGTACAATAGATGCCCAAGAGGTGGAGACAATCTGA > ACCATTCTACCAGCTATTATTCTAATCCTAATTGCCCTCCCTTCCTTGCGAATCTTGTACATGATAGATG > AGATTAATAATCCATCTCTCACAGTAAAAACCATAGGACATCAATGATACTGAAGCTATGAATATACAGA > CTATGAAGACCTAAGCTTTGATTCCTATATGATTCCTACATCAGAATTAAAACCTGGAGAACTACGACTA > CTGGAAGTGGATAACCGAGTTGTTCTACCAATAGAAATAACAATTCGAATATTAGTCTCTTCTGAAGACG > TATTACACTCCTGAGCCGTACCTTCCCTAGGACTGAAAACAGACGCAATCCCAGGCCGCCTAAACCAAAC > 
AACTCTTATATCGACTCGACCAGGTCTCTACTACGGACAATGCTCTGAGATCTGCGGATCGGATCACAGC > TTCATACCTATTGTCCTTGAACTAGTTCCACTAAAATATTTTGAAAAATGATCTGCATCAATATTATAAA > ATCATTAAGAAGCTAAAATAGCACTAGCCTTTTAAGCTAGAGACTGAGGGCACAATTACCCTCCTTGATG > AAATGCCACAACTAGACACATCCACATGACTCATTATAATTATATCAATATTCCTAGTTCTATTCATCAT > TTTCCAATTAAAAATTTCAAAACACAATTTCTACTTTAATCCAGAAACCCTACCAACCAAAGCACAAAAA > CAAAACACCCCTTGAGAAACGAAATGAACGAAAATCTATTTGCCTCTTTTATTACCCCAATAATTCTAGG > CCTTCCGCTTGCCACCCTAGTTGTTATATTTCCTAGTCTATTATTCCCAACATCAAATCGTCTAGTAAAT > AACCGCCTTATTTCCCTCCAACAATGAGCACTCCAACTTGTATCAAAACAAATAATAGGTATTCATAATA > CTAAAGGACAAACATGGACATTAATACTTATATCCCTAATCTTATTCATCGGATCCACAAATCTCCTGGG > CCTATTACCTCACTCATTTACACCAACTACACAATTATCAATAAATCTAGGTATGGCCATCCCCCTATGA > GCAGGAGCTGTAATCACTGGCTTCCGTAACAAGACTAAAGCATCACTTGCCCACTTTCTCCCCCAAGGAA > CACCAACCCCATTGATTCCTATACTAATTATTATTGAGACTATTAGTCTTTTTATTCAACCAATTGCCTT > AGCTGTACGACTAACAGCTAATATTACTGCAGGACACCTGCTGATTCACCTAATTGGAGGAGCCACACTT > GCACTAATAAGCATCAGTACTACAACAGCCCTCATTACATTTATTATTCTAGTACTACTTACAATTCTTG > AGTTCGCAGTAGCCATAATCCAAGCCTACGTATTTACTCTTCTAGTCAGCCTCTACCTGCATGACAACAC > ATAATGACACACCAAACCCATGCTTACCACATAGTCAATCCAAGTCCCTGACCTCTAACAGGAGCTCTAT > CAGCCCTACTGATAACTTCTGGCTTAATCATATGATTTCACTTTAACTCAATTATTCTACTAACACTTGG > CCTAACAACAAATATACTTACAATATATCAATGATGACGAGACATTATCCGAGAAAGTACCTTTCAAGGA > CACCACACTCCAACCGTCCAAAAAGGCCTCCGCTATGGGATGATCCTTTTTATTATTTCTGAAGTCTTAT > TCTTCACTGGATTCTTCTGGGCATTTTATCACTCAAGCCTGGCCCCAACACCCGAGCTAGGCGGATGCTG > ACCTCCAACAGGCATTCATCCACTTAACCCCCTAGAAGTCCCATTACTTAATACCTCCGTCTTACTGGCT > TCAGGAGTCTCCATCACCTGAGCACACCATAGTCTTATAGAAGGAAACCGCAACCATATATTGCAGGCTC > TATTTATTACTATTGCACTCGGCGTCTATTTCACACTACTTCAAGCCTCAGAATACTATGAAGCACCTTT > CACCATTTCAGATGGAGTTTATGGCTCAACTTTTTTTGTAGCTACGGGCTTTCATGGTCTCCACGTTATC > ATTGGATCCACCTTCTTAATTGTCTGCTTTTTCCGCCAATTAAAATTTCACTTTACTTCCAGCCACCATT > TCGGCTTCGAAGCCGCTGCCTGATACTGACACTTCGTAGACGTAGTATGATTATTCCTTTACGTATCCAT > CTATTGATGAGGCTCATATTCTTTTAGTATTAACCAGTACAACTGACTTCCAATCAGTTAGTTTCGGTAT > 
AGCCCGAAAAAGAATAATAAATCTAATACTAGCCCTTCTAACTAATTTTGCTCTAGCCTCACTACTTGTT > ATCATCGCATTCTGACTTCCTCAACTGAACGTATATTCAGAAAAAACAAGCCCATACGAATGTGGATTTG > ATCCCATGGGATCAGCTCGTCTACCTTTCTCCATAAAATTTTTCCTAGTAGCCATTACATTCCTCCTTTT > TGACCTAGAGATTGCACTCCTTCTACCATTGCCATGAGCCTCACAAACAAATAACCTAAGCACAATACTT > ACTATAGCCCTTTTTCTAATTCTACTATTAGCCGCAAGTTTAGCTTACGAATGAACCCAAAAAGGACTAG > AATGAACTGAATATGGTATGTAGTTTAAAATAAAATAAATGATTTCGACTCATTAGATTATGATTTAATT > CATAATTACCAAATGTCTCTAGTATATATAAATATCATAACAGCATTTATAGTATCCCTCGCAGGACTAT > TAATATATCGATCTCACCTCATGTCCTCTCTCCTATGTCTAGAGGGTATAATATTATCCCTATTTGTACT > AGCTACCTTAACAATCCTAAACTCACATTTCACCCTAGCGAGCATAATACCTATTATCTTATTAGTTTTC > GCAGCCTGCGAGGCAGCACTAGGACTATCCCTGCTAGTAATAGTGTCAAATACATATGGCACTGACTATG > TCCAAAATCTCAATTTACTTCAATGCTAAAGTACATTATTCCTACAATTATACTCATACCCCTGACCTGA > CTATCAAAGGGCAGTATAATTTGAATCAACTCCACAACCCACAGCCTATTAATTAGCCTCACAAGCCTTC > TCCTCATAAATCAGTTCAGTGATAATAGTCTCAACTTCTCATTAATATTCTTTTCTGACTCCCTATCAAC > ACCACTATTAATTTTAACCATATGACTTCTCCCCTTAATATTAATAGCTAGCCAACACCATTTATCAAAA > GAAAGCCTCACCCGAAAAAAACTATATATTACCATGCTAATTCTACTACAACTATTCCTAATTATAACTT > TCACTGCTATAGAACTTATTCTCTTCTATATTTTATTTGAGGCAACACTAGTCCCCACACTTATTATTAT > TACCCGATGAGGGAATCAAACAGAACGCCTAAACGCTGGCCTTTATTTCCTGTTTTACACACTAGTAGGT > TCCCTCCCACTACTAGTCGCACTAGTCTACCTCCAAAACATTACTGGATCCCTAAACTTCTTAGTGCTCC > AATACTGAATACAACCCCTATCCAGCTCCTGATCAAACGTCTTCATATGATTAGCATGCATAATAGCCTT > CATAGTAAAAATACCCTTATATGGCCTCCACCTTTGACTACCCAAAGCCCATGTAGAAGCCCCTATTGCA > GGCTCCATAGTTCTTGCAGCAATTCTACTAAAACTAGGAGGATATGGCATACTACGAATTACAACATTCC > TAAATCCACTTACCGAATTTATAGCATATCCATTTATTATATTGTCTCTATGAGGCATAATTATAACTAG > CTCAATCTGTCTCCGTCAAACAGACCTCAAGTCACTAATTGCATACTCCTCTGTTAGCCACATAGCACTC > GTTATTGTAGCCATCCTCATTCAAACACCCTGAAGCTACATAGGAGCCACAGCCCTAATGATTGCCCACG > GCCTTACCTCCTCTATACTTTTTTGCCTAGCAAATTCCGGCTATGAACGAATCCACAGTCGAACAATAAT > TTTAGCCCGAGGCCTACAAACTTTCCTTCCACTAATGGCCACCTGATGACTCTTAGCGAGCCTAACTAAT > TTAGCTCTCCCCCCAACAATCAATCTAATTGGAGAATTATTCGTAGTGATATCCTCTTTCTCATGATCCA > 
ACATTACAATCATTTTAATAGGACTAAATATAGTAATTACCGCCCTATATTCTCTCTACATACTAATTAT > AACACAACGAGGTAAATATACCCACCATATTAATAACATCTCACCCTCCTTTACGCGAGAAAACGCCCTC > ATGTCATTACATATTCTACCCTTACTTCTACTATCACTCAATCCAAAAATTATTCTAGGAACCTTGTACT > GTAAATATAGTTTAAAAAAAACATTAGATTGTGAATCTAATAATAGAAACTTATACCTTCTTATTTACCG > AAAAAGTTCATAGGAACTGCTAATTCCTATAACCCGTGTATAATAACACGGCTTTTTCGAACTTTTAGAG > GATGGTAGATATCCGTTGGTCTTAGGAATCAAAAATTGGTGCAACTCCAAATAAAAGTAATAAACCTATT > CTCTTCCTTTACACTAACCACCCTACTATTATTAATTATTCCCATCCTAACTACAAGCTCTGAAAACTAC > AAAACCTCTAATTACCCATTCTATGTAAAAACAACCATCTCATGTGCCTTTCTCATCAGCATAGTACCCA > CAATAATATTTATTCACACAGGCCAAGAAATAATTATCTCAAACTGACACTGACTTACTATTCAAACCAT > TAAACTATCACTCAGCTTTAAAATAGATTACTTCTCAATAATATTTGTCCCAGTAGCATTATTCGTCACA > TGATCCATCATGGAATTTTCAATATGATATATACACTCAGACCCTAACATTAATCAATTCTTTAAATATC > TTCTCCTATTCCTCATTACCATACTCATTCTCGTCACAGCAAATAATCTATTTCAACTATTTATTGGATG > AGAAGGCGTAGGAATCATATCATTCCTACTCATTGGATGATGACACGGACGAACAGATGCAAATACAGCA > GCCTTACAAGCAATTTTATATAACCGCATCGGCGACATTGGCTTTATTCTAGCAATAGCCTGATTCCTTA > CAAATCTTAACGCCTGAGACTTCCAACAAATTTTTATGCTAAACCCAAATGACTCTAACATACCCCTAAT > AGGCCTCGCACTAGCCGCAACCGGGAAATCCGCCCAATTCGGCTTACATCCATGACTACCCTCCGCAATA > GAAGGCCCAACTCCTGTCTCAGCATTACTCCACTCAAGCACAATAGTGGTAGCAGGAATTTTTCTACTAA > TCCGCTTTTATCCACTGACAGAAAACAACAAATTTGCACAATCTATCCTACTATGCCTAGGAGCTATCAC > CACCCTATTTACAGCAATATGTGCTCTTACCCAAAATGATATCAAAAAAATATCGCTTTTTCCACATCCA > GCCAACTCGGCCTATATAATAGTTACAATTGGTATTAATCAACCCTACTTGGCATTCCTTCACATCTGCA > CCCATGCTTTCTTCAAAGCTATACTATTTATGTGCTCCGGCTCTATTATTCATAGTTTAAATGATGAGCA > AGATATTCGAAAAATAGGAGGCCTGTTCAAAACTATACCATTTACTACAACAGCCCTAATTATTGGCAGC > CTTGCACTAACAGGAATGCCTTTCCTTACCGGATTTTATTCCACAGACCTAATCATTGAAGCCGCTAATA > CGTCGTACACCAACGCCTGAGCCCTCTTAATAACGCTAATCGCCACCTCTTTCACAGCCATTTACAGCAC > CCGTATTATTTTCTTTGCACTCCTAGGACAACCTCGATTCCCAACCCTAGTCACTATTAACGAAAATAAC > CCCTTCCTAATAAATTCCATTAAACGTCTGTTAATTGGAAGTCTTTTCGCGGGATTTATTATTTCTAACA > ACATTCCCCCAACAACAATTCCTCAAATAACAATACCCCATCATCTAAAAATAATAGCTCTAGCAGTAAC > 
AATCTTGGGTTTTATTTTAGCACTAGAAATTAGCAACATAACCCAAAACCTAAAACTTAACCACCCAACA > AACACTTTCAAATTCTCTAACATACTAGGGTATTTTCCCACAATTATACACCGCCTAGCCCCTTACATAA > ACCTAACAATAAGTCAAAAATCAGCATCCTCCCTCCTAGACCTAATTTGACTCGAAAATATTTTACCAAA > AACAACTTCACTTGCCCAAGCAAAATTATCAATCATAGTCACAAGCCAAAAAGGCTTGATCAAACTGTAC > TTCCTATCTTTCCTAGTCACAATTACTATTAGCGTAATCCTATTTAATTTCCACGAGTAATTTCTATAAT > AACTACCACACCAATTAATAAAGATCACCCAGTCACAATAACTAATCAAGTACCATAACTGTATAAAGCC > GCAATCCCTATAGCCTCCTCACTAAAAAACCCAGAATCCCCTGTATCATAAATAACTCAATCCCCAAGCC > CATTAAACTGAAACACAATTTCTACCTCCTCATCCTTCAACACATAATAAACCATCGCAGCTTCCATTAA > CAAACCAGTAATAAAAGCCCCTAAAACAGCCTTACTAGATAATCAAATCTCAGGATATTGCTCCGTAGCT > ATCGCCGTTGTATAGCCAAAAACCACCATCATTCCCCCCAAGTAAATCAAAAACACCATCAAACCTAAAA > AAGACCCACCAAAATTTAATACAATACCACAACCAACCCCACCACTCACAATTAAACCCAACCCCCCATA > AATAGGCGAAGGTTTTGAAGAAAATCCCACAAAACCAAGCACAAAAATGATACTTAAAATAAATACAATG > TATGTTATCATTATTCTCGCATGGAATCTAACCACGACTAATGATATGAAAAACCATCGTTGTCATTCAA > CTACAAGAACACCAATGACAAATATTCGAAAAACCCACCCACTAATAAAAATTGTAAACAACGCATTCAT > TGACCTCCCAGCCCCATCAAACATTTCATCTTGATGAAACTTTGGCTCCCTACTAGGAATTTGCTTAATT > CTACAAATCCTCACAGGCCTATTCCTAGCAATACACTACACATCCGACACAATAACAGCATTCTCCTCAG > TTACCCATATCTGCCGAGACGTCAACTATGGCTGAATTATCCGATATATACACGCGAACGGAGCATCAAT > ATTTTTTATTTGCCTATTTATTCACGTAGGACGAGGCCTATACTATGGATCATATACCTTTCTAGAAACA > TGAAATATTGGAGTAATCCTCCTATTTACAGTTATAGCCACAGCATTCGTAGGTTATGTTCTACCATGAG > GACAAATATCATTTTGAGGAGCAACAGTTATCACTAATCTCCTTTCAGCAATTCCATATATTGGCACAAA > CTTAGTTGAATGAATCTGGGGAGGCTTTTCAGTAGATAAAGCGACCCTGACCCGATTCTTCGCCTTCCAT > TTTATTCTCCCATTTATTATCGCAGCACTCGCTATAGTCCACCTGCTCTTTCTCCACGAAACAGGATCCA > ACAACCCAACAGGAATCCCATCAGATGCAGATAAGATTCCCTTCCACCCCTACTACACCATCAAAGATAT > TCTAGGTGTCCTACTTCTAATTCTCTTCCTAATATCACTAGTATTATTCGTACCAGACCTGCTTGGAGAC > CCCGACAACTACACCCCAGCAAACCCACTCAATACACCTCCCCATATTAAGCCCGAATGATATTTCCTAT > TTGCATACGCAATTCTACGATCAATTCCTAATAAACTAGGAGGAGTCCTAGCCCTAATTTCATCCATCCT > AATCCTAATCCTTATACCCCTTCTCCATACGTCCAAACAACGTAGCATAATTTTCCGACCACTCAGTCAA > 
TGCCTATTCTGAATCCTAGTAGCAGATCTGCTAACACTTACATGAATTGGAGGACAACCAGTTGAGCACC > CCTTCATCATTATTGGACAACTAGCATCCATTCTATATTTCCTCATTATTCTTGTACTTATGCCAATCAT > CAGTACAATCGAAAATAACCTCCTAAAATGAAGACAAGTCTTTGTAGTATACTCAATACACTGGTCTTGT > AAACCAGAAAAGGAGAACAACCAACCTCCCTAAGACTCAAGGAAGAAGCCATAGCCCCACCATCAACACC > CAAAGCTGAAGTTCTATTTAAACTATTCCCTGACGCATATTAATATAGCTCCATAAAAATCAAGAACCTT > ATCAGTATTAAATTTCCAAAAACCTTTAGTAATTTAACACAGCTTTCTCACTCAACAGCCAATTTACATT > TTACATACCATTATCTACACAAGCCACATGATAAGGTATATAATCTAACCATATGCGCTTATAGTACATA > AAATTAATGTATTAGGACATATTATGTATAATAGTACATTACATTACATGCCCCATGCTTATAAGCACGT > ATATTCCATTATTTACAGTACATGGTACATATCCTTGCTTGATAGTACATAGCACATTTAAGTCAAATCA > ATTCTTGCGAACATGCGTATCCCGTCCCTTAGATCACGAGCTTAATTACCATGCCGCGTGAAACCAGCAA > CCCGCTTGGCAGGGATCCTTGTTCTCGCTCCGGGCCCATGCACCGTGGGGGTAGCTATTTAATGAACTTT > ATCAGACATCTGGTTCTTTCTTCAGGGCCATCTCACCTAAAATCGCCCACTCGTTCCTCTTAAATAAGAC > ATCTCGATGGACTAATGACTAATCAGCCCATGCTCACACATAACTGTGGTGTCATACATTTGGTATTTTT > ATTTTTTGGGGGATGCTTGGACTCAGCTATGGCCGTCTGAGGCCCCGACCCGGAGCATGAATTGTAGCTG > GACTTAACTGCATCTTGAGCATCCCCATAATGGTAGGCGCAGGGCATTGCAGTCAATGGTTACAGGACAT > AGTTATTATTTCAAGACTCAACTTTATAATCCATTATTCCCCCCCCCTCCTTATATAGTTATCACCTTTT > TTAACACGCTTTTCCCTAGATAGTATTTTAAATTTATCGCATTTTCAATACTCAAATTAGCACTCCAGAA > GGAGGTAAGTATATAAGCGCCAATTCTTCATCAATCGCACCACA From brian_osborne at cognia.com Mon Apr 4 14:34:28 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Apr 4 14:28:34 2005 Subject: [Bioperl-l] Bio::Index:EMBL on embl flatfiles In-Reply-To: Message-ID: Jonathan, I see. You're not interested in fetching by id or key, you're interested in fetching by coordinate. Modifying Bio/Index/EMBL won't help then. One idea would be that you create the fasta files used for the BLAST from the EMBL files, and make the fasta headers match the desired format. On the other hand I may not understand precisely your intent. Brian O. 
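
A minimal sketch of the coordinate bookkeeping this thread is circling around: a BLAST hit is reported in contig-local coordinates, while the EMBL entry is organised per scaffold. The plain-Perl example below uses no BioPerl calls. It builds a lookup from FT misc_feature lines shaped like the two Jonathan quotes in his reply, then converts a contig position to a scaffold position. The function names parse_contig_placements and contig_to_scaffold are hypothetical, and the reading of the format is an assumption: the misc_feature range is taken to be the contig's placement on the scaffold, the range inside /note the contig-local span, and the trailing (1) or (-1) the strand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup from contig name to its placement on the scaffold.
# ASSUMPTION: every placement is a "FT misc_feature START..END" line
# followed by a /note="contig NAME START..END(STRAND)" line, as in the
# two examples quoted in this thread.
sub parse_contig_placements {
    my (@lines) = @_;
    my (%placement, $start, $end);
    for my $line (@lines) {
        if ($line =~ /^FT\s+misc_feature\s+(\d+)\.\.(\d+)/) {
            ($start, $end) = ($1, $2);
        }
        elsif (defined $start
            && $line =~ /^FT\s+\/note="contig\s+(\S+)\s+(\d+)\.\.(\d+)\((-?1)\)"/) {
            $placement{$1} = {
                scaffold_start => $start,
                scaffold_end   => $end,
                contig_start   => $2,
                contig_end     => $3,
                strand         => $4,
            };
            ($start, $end) = (undef, undef);
        }
    }
    return \%placement;
}

# Convert a contig-local position to a scaffold position.
# Returns undef for an unknown contig or an out-of-range position.
sub contig_to_scaffold {
    my ($placement, $contig, $pos) = @_;
    my $p = $placement->{$contig} or return;
    return if $pos < $p->{contig_start} || $pos > $p->{contig_end};
    my $offset = $pos - $p->{contig_start};
    # ASSUMPTION: strand -1 means the contig is reverse-complemented
    # on the scaffold, so positions count down from the feature end.
    return $p->{strand} == 1
        ? $p->{scaffold_start} + $offset
        : $p->{scaffold_end}   - $offset;
}

my @ft = (
    'FT   misc_feature    1..1312',
    'FT                   /note="contig Contig18.1.1312 1..1312(1)"',
    'FT   misc_feature    1860..4967',
    'FT                   /note="contig Contig17.1.3108 1..3108(1)"',
);
my $placement = parse_contig_placements(@ft);

# Position 100 on Contig17.1.3108 lies 99 bases past scaffold base 1860.
print contig_to_scaffold($placement, 'Contig17.1.3108', 100), "\n";   # prints 1959
```

With the scaffold coordinate in hand, the annotation can be looked up in the scaffold's own EMBL entry, fetched by the scaffold ID rather than by the contig name from the fasta header.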

-----Original Message-----
From: Jonathan Miller [mailto:millerj@bcm.tmc.edu]
Sent: Monday, April 04, 2005 2:04 PM
To: Brian Osborne
Subject: RE: [Bioperl-l] Bio::Index:EMBL on embl flatfiles

Dear Brian,

Thank you for your reply.

Regarding an "identifier in common," this
seems to be somewhat tricky; I give a specific
example below.

Because of the use of relative, and not absolute coordinates,
within the fasta file, I will probably wait a few days for you to
write the interface, if you would be so kind; on the other hand,
you might well believe that EMBL should have formatted their
files somewhat differently, or you might suggest I go about
it an entirely different way.

Many thanks,

jm

More specifically, the goal is to BLAST a (local) fasta file; find the
sequence location, and look up its annotation in a (local) EMBL flatfile.

So, for example, for the honeybee fasta file from EMBL:

Apis_mellifera.AMEL1.1.mar.dna.contig.fa,

the fasta header of the contig where the sequence is found might be:

>Contig18.1.1312 dna:contig scaffold:AMEL1.1:Group1.1:1:1312:1

Now I want to look up the annotation in the EMBL flat files,
(for example, Apis_mellifera.0.dat), that I have indexed using
Bio::Index::EMBL.

However, the accession numbers in the EMBL flat files have the form:

scaffold:AMEL1.1:Group1.10:1:348491:1

and apparently

scaffold:AMEL1.1:Group1.1:1:1312:1

-never- appears in an EMBL flat file, although the entry:

SV   scaffold:AMEL1.1:Group1.1:1:422138:1

does appear, as does the entry within this ID:

FT   misc_feature    1..1312
FT                   /note="contig Contig18.1.1312 1..1312(1)"
FT   misc_feature    1860..4967
FT                   /note="contig Contig17.1.3108 1..3108(1)"
...etc...

however, as you can see, the coordinates in this last entry
are -local- and not absolute with respect to the scaffold
entry.

I don't know if I should be indexing differently,
searching on a different key, or what.

With NCBI fasta and GenBank flat files, this procedure
was straightforward (e.g.
no thought was required) to implement successfully.
Presumably there is an analogous interface for the
EMBL format?

On Mon, 4 Apr 2005, Brian Osborne wrote:

> Jonathan,
>
> Is there some identifier in common between the fasta entries and the EMBL
> entries? If so what you want to be able to do is to create your EMBL indices
> based on this key, but the current Bio::Index::EMBL doesn't do this. If you
> want to wait a couple of days I can modify EMBL.pm so it can create this
> sort of custom index, or you can try to modify EMBL.pm yourself. If you look
> at its sister, Genbank.pm, you'll see that the modifications are not
> difficult.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jonathan
> Miller
> Sent: Monday, April 04, 2005 12:27 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Bio::Index:EMBL on embl flatfiles
>
>
>
> I have the following task to perform in large
> quantities. I can do it successfully with
> files from ncbi, in fasta and genbank format.
>
> For various reasons, I would prefer to do it
> with embl format annotation files, rather than
> genbank.
>
> I first used formatdb to create a blast index
> for Apis_mellifera.AMEL1.1.mar.dna.contig.fa .
>
> I blast my sequence against this file,
> and obtain the expected hit, and then I want to find
> annotation for this hit, in embl format
> flatfiles (Apis_mellifera.0.dat, etc.) with bioperl.
>
> To do this, I have to first make an index with
> bioperl, using Bio::Index::EMBL .
>
> Then I need to use "fetch" within bioperl.
> The problem is, that "fetch" within bioperl
> doesn't seem to know how to use the fasta
> headers to find the sequence in the embl flatfile.
>
> There is probably a simple solution to this
> that everyone working with bioperl and embl
> flatfiles knows, but I don't know what it
> is.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

From qfdong at iastate.edu Mon Apr 4 14:57:47 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Mon Apr 4 14:51:39 2005
Subject: [Bioperl-l] pubmed
In-Reply-To: <6A4AE81E-A3DA-11D9-9A9E-000A959EB4C4@gmx.net>
References: <6.1.2.0.2.20050401140236.037cb060@qfdong.mail.iastate.edu>
 <6A4AE81E-A3DA-11D9-9A9E-000A959EB4C4@gmx.net>
Message-ID: <6.1.2.0.2.20050404135713.03959dd0@qfdong.mail.iastate.edu>

so, I tried to use

    my $authors = $hash_ref->{'authors'};
    my $medline = $hash_ref->{'medline'};
    my $pubmed = $hash_ref->{'pubmed'};

to parse out authors, medline, pubmed.

I was able to successfully parse out authors and medline but not pubmed.

Then I tried to use

    my $authors = $value->authors();
    my $medline = $value->medline();
    my $pubmed = $value->pubmed();

and I got the same thing.

Qunfeng

At 07:50 PM 4/2/2005, Hilmar Lapp wrote:
>So what is the result of this script that you wouldn't have expected or
>that is not giving you what you need?
>
>BTW annotation objects under the tagname 'reference' are usually
>Bio::Annotation::Reference objects and have methods $ref->authors(),
>$ref->pubmed(), $ref->medline, etc. Check the POD.
>
>        -hilmar
>
>On Friday, April 1, 2005, at 12:04 PM, Qunfeng wrote:
>
>>Hilmar and Paulo,
>>
>>I apologize for that,
>>
>>here is a snippet of my code, I must have missed something very simple.
>>Thanks for your help! -- Qunfeng
>>
>>#!/usr/bin/perl -w
>>use strict;
>>use Bio::SeqIO;
>>
>>my $inputGBfile = $ARGV[0];
>>my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile",
>>                                   '-format' => 'GenBank');
>>
>>my $seq_object;
>>while (1){
>>    eval{
>>        $seq_object = $seqio_object->next_seq;
>>    };
>>    if($@){
>>        print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n";
>>        next;
>>    }
>>    last if(!defined $seq_object);
>>    my $gi = $seq_object->primary_id;
>>    my $anno_collection = $seq_object->annotation;
>>    foreach my $key ( $anno_collection->get_all_annotation_keys ) {
>>        my @annotations = $anno_collection->get_Annotations($key);
>>        foreach my $value ( @annotations ) {
>>            if($value->tagname eq "reference"){
>>                my $hash_ref = $value->hash_tree;
>>                my $authors = $hash_ref->{'authors'};
>>                my $medline = $hash_ref->{'medline'};
>>                my $pubmed = $hash_ref->{'pubmed'};
>>                print STDERR
>> "gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n";
>>            }
>>        }
>>    }
>>}
>>
>>
>>
>>At 03:10 AM 4/1/2005, you wrote:
>>>*please* people always post the code or ideally a small snippet that
>>>demonstrates what you were trying to do, and post the result and if it's
>>>not an exception why it is not the result you expected. DO NOT just say
>>>'blah doesn't work for me'. Whenever someone needs to guess what you
>>>probably did and what you probably mean you are wasting other people's time.
>>>
>>>The GI# you have has multiple refs with one having a pubmed ID and none
>>>having a medline ID. So, the one ref that has a pubmed ID should return
>>>it from $ref->pubmed() but without any code snippet it is impossible to
>>>tell what you actually did and what therefore might be the problem.
>>>
>>>        -hilmar
>>>
>>>On Thursday, March 31, 2005, at 03:15 PM, Qunfeng wrote:
>>>
>>>>Hi there,
>>>>
>>>>http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
>>>>
>>>>I am not very familiar with BioPerl.
>>>>I tried to follow the example
>>>>showing in the above page to retrieve pubmed ID under each Reference
>>>>tag , i.e., $value->pubmed(), but it doesn't work for me for the seq
>>>>gi#56961711. The authors() works for me. Appreciate any suggestions.
>>>>
>>>>Qunfeng
>>>>_______________________________________________
>>>>Bioperl-l mailing list
>>>>Bioperl-l@portal.open-bio.org
>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>--
>>>-------------------------------------------------------------
>>>Hilmar Lapp                            email: lapp at gnf.org
>>>GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>>>-------------------------------------------------------------
>>
>--
>-------------------------------------------------------------
>Hilmar Lapp                            email: lapp at gnf.org
>GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>-------------------------------------------------------------
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l

From brian_osborne at cognia.com Mon Apr 4 16:07:54 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Apr 4 16:02:09 2005
Subject: [Bioperl-l] pubmed
In-Reply-To: <6.1.2.0.2.20050404135713.03959dd0@qfdong.mail.iastate.edu>
Message-ID:

Qunfeng,

Only 1 of the 5 references in the 56961711 entry has a Pubmed id, the rest
will return nothing when you try $value->pubmed.

Brian O.

From yanfeng at csit.fsu.edu Mon Apr 4 16:52:29 2005
From: yanfeng at csit.fsu.edu (Yanfeng Shi)
Date: Mon Apr 4 16:46:35 2005
Subject: [Bioperl-l] Query sequence length <= 0
Message-ID: <1112647949.4251a90da51e7@email.csit.fsu.edu>

I have a mitochondrial genome sequence and one short sequence from
another species' mitochondrial genome. I want to find the position in my
mitochondrial genome that is similar to that CDS sequence. I ran the code
below, but I got:

***[/usr/local/bin/fasta34] Query sequence length <= 0: /home/yanfeng/BioInf/Project/qu.fasta***

-------------------- WARNING ---------------------
MSG: unable to find and set query length

Why does this happen?

use strict;
use warnings;
use Bio::SearchIO;

$| = 1;
my $fasta   = "/usr/local/bin/fasta34";
my $library = "/home/yanfeng/BioInf/Project/mun_lab.fasta";
my $query   = "/home/yanfeng/BioInf/Project/qu.fasta";
my $options = "-E 0.01 -m 9 -d 0 ";
my $command = "$fasta $options $query $library";

print "Start running\n$command\n";

# read the search report through a pipe; stop if fasta34 cannot be started
open(my $fh, "$command |") or die "cannot run $command: $!";
my $searchio = Bio::SearchIO->new(-format => 'fasta', -fh => $fh);
if( my $r = $searchio->next_result ) {
    if ( my $hit = $r->next_hit ) {
        # only want the BEST hit in this SIMPLE example
        if( my $hsp = $hit->next_hsp ) {
            # only one HSP per hit for FASTA, change for BLAST
            print "location is ",
                  $hsp->hit->location->to_FTstring(), " ",
                  $hsp->hit->length, " nt long\n";
        }
    }
}

--

From qfdong at iastate.edu Mon Apr 4 17:47:07 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Mon Apr 4 17:41:11 2005
Subject: [Bioperl-l] pubmed
In-Reply-To:
References: <6.1.2.0.2.20050404135713.03959dd0@qfdong.mail.iastate.edu>
Message-ID: <6.1.2.0.2.20050404164507.022ac960@qfdong.mail.iastate.edu>

Brian,

My problem is that none of them returned a Pubmed id.

Qunfeng

At 03:07 PM 4/4/2005, Brian Osborne wrote:
>Qunfeng,
>
>Only 1 of the 5 references in the 56961711 entry has a Pubmed id, the rest
>will return nothing when you try $value->pubmed.
>
>Brian O.

From brian_osborne at cognia.com Mon Apr 4 18:18:02 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Apr 4 18:12:46 2005
Subject: [Bioperl-l] pubmed
In-Reply-To: <6.1.2.0.2.20050404164507.022ac960@qfdong.mail.iastate.edu>
Message-ID:

Qunfeng,

SeqIO parses this entry correctly, using this code:

use strict;
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ac = $seq->annotation;
for my $ref ($ac->get_Annotations('reference')) {
    print $ref->pubmed;
}

It looks like your code should work as well but since you didn't show us
your complete code using the variable $value it's hard to see where the
problem is.

Brian O.
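
Since only some references carry a PubMed ID, it can help to filter out
the ones that return nothing before printing. A minimal sketch, assuming
each object answers to ->pubmed in the way the thread describes for
Bio::Annotation::Reference; the sub name refs_with_pubmed is made up for
illustration.

```perl
use strict;
use warnings;

# Return only the reference objects that actually carry a PubMed ID.
# Assumption: every element responds to ->pubmed and returns undef or an
# empty string when no ID is present.
sub refs_with_pubmed {
    my @refs = @_;
    return grep {
        my $id = $_->pubmed;
        defined $id && length $id;
    } @refs;
}

# Possible use with the annotation collection from the message above:
#   print $_->pubmed, "\n"
#       for refs_with_pubmed($ac->get_Annotations('reference'));
```

If this prints nothing for an entry that is known to have a PubMed ID,
the parser never stored the ID, which narrows the problem to the input
file or the installed BioPerl version rather than the printing code.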

From lstein at cshl.edu Mon Apr 4 12:56:03 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Apr 4 18:51:43 2005
Subject: [Bioperl-l] drawing additional axes with GD::Graph
In-Reply-To: <2598.199.3.136.4.1110634542.squirrel@webmail.vbi.vt.edu>
References: <2598.199.3.136.4.1110634542.squirrel@webmail.vbi.vt.edu>
Message-ID: <200504041256.04254.lstein@cshl.edu>

That isn't currently supported. If you want to add this functionality,
you're more than welcome to contribute to the code base!

Lincoln

On Saturday 12 March 2005 08:35 am, Sucheta Tripathy wrote:
> Dear group,
>
> I am trying to have 2 more horizontal lines to the plot I have,
> using GD::Graph. For example the X,Y origin be 0,0 and additional
> horizontal lines at y value .78 and .84. Please suggest me
> something other than $im->line().
>
> Thanks
>
> Sucheta

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and
other time-critical topics.

From millerj at bcm.tmc.edu Mon Apr 4 13:44:51 2005
From: millerj at bcm.tmc.edu (Jonathan Miller)
Date: Mon Apr 4 18:51:45 2005
Subject: [Bioperl-l] Re: Bio::Index:EMBL on embl flatfiles
In-Reply-To:
Message-ID:

More specifically, the goal is to BLAST a (local) fasta file, find the
sequence location, and look up its annotation in a (local) EMBL flatfile.

So, for example, for the honeybee fasta file from EMBL,
Apis_mellifera.AMEL1.1.mar.dna.contig.fa, the fasta header of the contig
where the sequence is found might be:

>Contig18.1.1312 dna:contig scaffold:AMEL1.1:Group1.1:1:1312:1

Now I want to look up the annotation in the EMBL flat files (for example,
Apis_mellifera.0.dat) that I have indexed using Bio::Index::EMBL.

However, the accession numbers in the EMBL flat files have the form:

scaffold:AMEL1.1:Group1.10:1:348491:1

and apparently "scaffold:AMEL1.1:Group1.1:1:1312:1" never appears in an
EMBL flat file.

I don't know if I should be indexing differently, searching on a
different key, or what. With NCBI fasta and GenBank flat files, this
procedure was straightforward (i.e. no thought was required) to implement
successfully. Presumably there is an analogous interface for the EMBL
format?

From qfdong at iastate.edu Tue Apr 5 11:33:04 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Tue Apr 5 11:29:39 2005
Subject: [Bioperl-l] pubmed
In-Reply-To:
References: <6.1.2.0.2.20050404164507.022ac960@qfdong.mail.iastate.edu>
Message-ID: <6.1.2.0.2.20050405102233.0209eec0@qfdong.mail.iastate.edu>

Brian,

Thanks for your kind help. The following is my complete code. It reads a
GenBank file as input and prints out authors and medline, but not pubmed.

I also tried your code and it doesn't work for me either.

I tried your code on two of my machines:

machine 1).
installed bioperl-1.2.2
$ uname -a
Linux 2.4.20-8bigmem #1 SMP Thu Mar 13 17:32:29 EST 2003 i686 i686 i386 GNU/Linux

machine 2).
installed bioperl-1.4
$ uname -a
Linux 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux

------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $inputGBfile = $ARGV[0];
my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile",
                                   '-format' => 'GenBank');

my $seq_object;
while (1){
    eval{
        $seq_object = $seqio_object->next_seq;
    };
    if($@){
        print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n";
        next;
    }
    last if(!defined $seq_object);
    my $gi = $seq_object->primary_id;
    my $anno_collection = $seq_object->annotation;
    foreach my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        foreach my $value ( @annotations ) {
            if($value->tagname eq "reference"){
                my $authors = $value->authors();
                my $medline = $value->medline();
                my $pubmed = $value->pubmed();
                print "gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n";
            }#end if
        }#end inner for
    }#end outer for
}#end while
------------------------------------------------------------

At 05:18 PM 4/4/2005, Brian Osborne wrote:
>Qunfeng,
>
>SeqIO parses this entry correctly, using this code:
>
>use strict;
>use Bio::DB::GenBank;
>
>my $db = new Bio::DB::GenBank;
>my $seq = $db->get_Seq_by_id(56961711);
>my $ac = $seq->annotation;
>for my $ref ($ac->get_Annotations('reference')) {
>    print $ref->pubmed;
>}
>
>It looks like your code should work as well but since you didn't show us
>your complete code using the variable $value it's hard to see where the
>problem is.
>
>
>Brian O.
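
One way to tell whether the input file or the installed parser is at
fault is to read the flat file directly and list which REFERENCE blocks
contain a PUBMED line. A minimal sketch in plain Perl, assuming the usual
GenBank layout with REFERENCE starting in column 1 and an indented PUBMED
line beneath it; the sub name pubmed_ids_by_reference is made up for
illustration and is not a BioPerl function.

```perl
use strict;
use warnings;

# Collect the PubMed ID (if any) written under each REFERENCE block of a
# GenBank flat file, working on the raw text instead of a parser.
sub pubmed_ids_by_reference {
    my ($text) = @_;
    my (@refs, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^REFERENCE\s+(\d+)/) {
            # start of a new reference block
            $current = { number => $1, pubmed => undef };
            push @refs, $current;
        }
        elsif ($line =~ /^\S/) {
            # any other top-level keyword ends the reference block
            undef $current;
        }
        elsif ($current && $line =~ /^\s+PUBMED\s+(\d+)/) {
            $current->{pubmed} = $1;
        }
    }
    return @refs;
}

# Possible use, with the file name taken from the command line:
#   local $/;                      # slurp the whole file
#   open(my $fh, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
#   for my $r (pubmed_ids_by_reference(<$fh>)) {
#       printf "reference %s: %s\n", $r->{number},
#              defined $r->{pubmed} ? $r->{pubmed} : 'no PUBMED line';
#   }
```

If this reports a PubMed ID for a reference where $value->pubmed() comes
back empty, the file is fine and the difference lies in how the installed
BioPerl version parses it.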

From millerj at bcm.tmc.edu Mon Apr 4 19:44:36 2005
From: millerj at bcm.tmc.edu (Jonathan Miller)
Date: Tue Apr 5 11:32:42 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
In-Reply-To:
Message-ID:

Below, I want write_seq to use the accession,
rather than the default.
---------------------------------------------
#!/usr/bin/perl
use Bio::SeqIO;

my $infile=shift;
$in = Bio::SeqIO->new(-file => $infile,
                      -format => 'GenBank');
my $outfile= ">" . $infile . ".fa";
$out = Bio::SeqIO->new(-file => $outfile,
                       -format => 'fasta'
                       );

while ( $seq = $in->next_seq() ) {
    print $seq->accession,"\n";
    $out->write_seq($seq);
}
------------------------------------------------
So, going to the Bio::Seq docs, I see:
---------------------------------------------------------
preferred_id_type

 Title   : preferred_id_type
 Usage   : $obj->preferred_id_type('accession')
 Function: Get/Set the preferred type of identifier to use in the ">ID"
           position for FASTA output.
 Returns : string, one of values defined in @Bio::SeqIO::fasta::SEQ_ID_TYPES.
           Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE ('display').
 Args    : string when setting.
           This must be one of values defined in
           @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values:
           accession, accession.version, display, primary
 Throws  : fatal exception if the supplied id type is not in @SEQ_ID_TYPES.
----------------------------------------------------------
but I have so far been unable to figure out how in fact to
set the SEQ_ID_TYPE .

From millerj at bcm.tmc.edu Tue Apr 5 00:29:42 2005
From: millerj at bcm.tmc.edu (Jonathan Miller)
Date: Tue Apr 5 11:32:43 2005
Subject: [Bioperl-l] Re: newbie needs help with read/write Genbank
In-Reply-To:
Message-ID:

One might have hoped this code would read and write
a Genbank file, unchanged. But no. how then?
-------------------------------------------------------

use Bio::SeqIO;

my $infile=shift;
$in = Bio::SeqIO->new(-file => $infile,
                      -format => 'GenBank'
                     );
my $outfile= ">" . $infile . ".gbk";
$out = Bio::SeqIO->new(-file => $outfile,
                       -format => 'GenBank'
                      );

while ( $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}

From walsh at cenix-bioscience.com Tue Apr 5 12:59:04 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Tue Apr 5 12:53:00 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
In-Reply-To:
References:
Message-ID: <4252C3D8.9080400@cenix-bioscience.com>

Hello,

I believe that if you set the display_id (or desc?) attribute to be the
accession, you will get only the accession in the fasta header.

while ( $seq = $in->next_seq() ) {
    print $seq->accession,"\n";
    $seq->display_id($seq->accession);
    # or ? $seq->desc($seq->accession);
    $out->write_seq($seq);
}

One of those 2 should do the trick.

Andrew

Jonathan Miller wrote:
> Below, I want write_seq to use the accession,
> rather than the default.
> ---------------------------------------------
> #!/usr/bin/perl
> use Bio::SeqIO;
>
> my $infile=shift;
> $in = Bio::SeqIO->new(-file => $infile,
> -format => 'GenBank');
> my $outfile= ">" . $infile . ".fa";
> $out = Bio::SeqIO->new(-file => $outfile,
> -format => 'fasta'
> );
>
> while ( $seq = $in->next_seq() ) {
> print $seq->accession,"\n";
> $out->write_seq($seq);
> }
> ------------------------------------------------
> So, Going to the Bio::Seq docs, I see:
> ---------------------------------------------------------
> preferred_id_type code top prev next
>
> Title : preferred_id_type
> Usage : $obj->preferred_id_type('accession')
> Function: Get/Set the preferred type of identifier to use in the ">ID"
> position
> for FASTA output.
> Returns : string, one of values defined in
> @Bio::SeqIO::fasta::SEQ_ID_TYPES.
> Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE ('display').
> Args : string when setting. This must be one of values defined in
> @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values:
> accession, accession.version, display, primary
> Throws : fatal exception if the supplied id type is not in
> @SEQ_ID_TYPES.
> ----------------------------------------------------------
> but I have so far been unable to figure out how in fact to
> set the SEQ_ID_TYPE .
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel.
+49-351-4173 137
Fax +49-351-4173 109
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From MEC at Stowers-Institute.org Tue Apr 5 13:08:38 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Tue Apr 5 13:04:11 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
Message-ID: <200504051702.j35H2SfY016138@portal.open-bio.org>

hmmm - this is new to me

let's see, on my bioperl 1.5 installation, the unix one-liner

perl -MBio::SeqIO::fasta -e 'print "@Bio::SeqIO::fasta::SEQ_ID_TYPES"'

returns

accession accession.version display primary

so, I think you should be able to add, for instance

$seq->preferred_id_type('display')

or, in your case

$seq->preferred_id_type('accession')

just before your call to write_seq

try it!

Malcolm Cook - mec@stowers-institute.org - 816-926-4449
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, MO USA

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jonathan Miller
Sent: Monday, April 04, 2005 6:45 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"

Below, I want write_seq to use the accession,
rather than the default.
---------------------------------------------
#!/usr/bin/perl
use Bio::SeqIO;

my $infile=shift;
$in = Bio::SeqIO->new(-file => $infile,
                      -format => 'GenBank');
my $outfile= ">" . $infile . ".fa";
$out = Bio::SeqIO->new(-file => $outfile,
                       -format => 'fasta'
                      );

while ( $seq = $in->next_seq() ) {
    print $seq->accession,"\n";
    $out->write_seq($seq);
}
------------------------------------------------
So, Going to the Bio::Seq docs, I see:
---------------------------------------------------------
preferred_id_type code top prev next

 Title   : preferred_id_type
 Usage   : $obj->preferred_id_type('accession')
 Function: Get/Set the preferred type of identifier to use in the ">ID" position
           for FASTA output.
 Returns : string, one of values defined in @Bio::SeqIO::fasta::SEQ_ID_TYPES.
           Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE ('display').
 Args    : string when setting. This must be one of values defined in
           @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values:
           accession, accession.version, display, primary
 Throws  : fatal exception if the supplied id type is not in @SEQ_ID_TYPES.
----------------------------------------------------------
but I have so far been unable to figure out how in fact to
set the SEQ_ID_TYPE .

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From walsh at cenix-bioscience.com Tue Apr 5 14:47:25 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Tue Apr 5 14:41:26 2005
Subject: [Bioperl-l] Sorting BLAST Output
In-Reply-To: <425168C7.80105@bioanalysis.org>
References: <425168C7.80105@bioanalysis.org>
Message-ID: <4252DD3D.5070804@cenix-bioscience.com>

Hello,

One 'easy' way to do this is to build an array of hashes with the hits
and whatever feature you are interested. It's a pure perl
implementation. I don't think the API for the Bioperl search result
object supports the sorting you want to do, but I could be wrong.
my @hashes;
for my $hit (@your_hits) {
    my $len = get_aln_len($hit);
    my $num_mis = get_num_mis($hit);
    push @hashes, { hit => $hit, len => $len, num_mis => $num_mis };
}

my @sorted = sort by_len_and_num_mis @hashes;

sub by_len_and_num_mis {
    $a->{len} <=> $b->{len} ||
    $a->{num_mis} <=> $b->{num_mis}
}

Andrew

Waibhav Tembe wrote:
> Hello List,
>
> I was wondering if there is any easy way to "sort" the hits in a blast
> output based on something other than the default sort key E value/bits.
> For example, I would like to sort the hits based on the number of
> mismatches or length of the alignments reported.
>
> I considered blast2table utility. But I would like to retain all the
> details in the BLAST output.
>
> Thanks.
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel.
+49-351-4173 137
Fax +49-351-4173 109
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From hlapp at gmx.net Tue Apr 5 14:54:42 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Apr 5 14:48:13 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
In-Reply-To: <200504051702.j35H2SfY016138@portal.open-bio.org>
Message-ID: <2BBBE2F8-A604-11D9-8121-000A959EB4C4@gmx.net>

This is a property of the SeqIO stream, so you need to set it on the
stream object:

$seqio = Bio::SeqIO->new(-format=>'fasta',-file=>"...");
$seqio->preferred_id_type('accession');

Hth,

-hilmar

On Tuesday, April 5, 2005, at 10:08 AM, Cook, Malcolm wrote:

> hmmm - this is new to me
>
> let's see, on my bioperl 1.5 installation, the unix one-liner
>
> perl -MBio::SeqIO::fasta -e 'print "@Bio::SeqIO::fasta::SEQ_ID_TYPES"'
>
> returns
>
> accession accession.version display primary
>
> so, I think you should be able to add, for instance
>
> $seq->preferred_id_type('display')
>
> or, in your case
>
> $seq->preferred_id_type('accession')
>
> just before your call to write_seq
>
> try it!
>
> Malcolm Cook - mec@stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO USA
>
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jonathan
> Miller
> Sent: Monday, April 04, 2005 6:45 PM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
>
>
>
> Below, I want write_seq to use the accession,
> rather than the default.
> ---------------------------------------------
> #!/usr/bin/perl
> use Bio::SeqIO;
>
> my $infile=shift;
> $in = Bio::SeqIO->new(-file => $infile,
> -format => 'GenBank');
> my $outfile= ">" . $infile . ".fa";
> $out = Bio::SeqIO->new(-file => $outfile,
> -format => 'fasta'
> );
>
> while ( $seq = $in->next_seq() ) {
> print $seq->accession,"\n";
> $out->write_seq($seq);
> }
> ------------------------------------------------
> So, Going to the Bio::Seq docs, I see:
> ---------------------------------------------------------
> preferred_id_type code top prev next
>
> Title : preferred_id_type
> Usage : $obj->preferred_id_type('accession')
> Function: Get/Set the preferred type of identifier to use in the ">ID"
> position
> for FASTA output.
> Returns : string, one of values defined in
> @Bio::SeqIO::fasta::SEQ_ID_TYPES.
> Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE
> ('display').
> Args : string when setting. This must be one of values defined in
> @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values:
> accession, accession.version, display, primary
> Throws : fatal exception if the supplied id type is not in
> @SEQ_ID_TYPES.
> ----------------------------------------------------------
> but I have so far been unable to figure out how in fact to
> set the SEQ_ID_TYPE .
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Tue Apr 5 14:59:03 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Apr 5 14:52:33 2005
Subject: [Bioperl-l] Re: newbie needs help with read/write Genbank
In-Reply-To:
Message-ID:

Since you are not posting an exception I'm assuming there is none;
instead you observe that bioperl does not 100% round-trip.
This is a known problem that hasn't received priority high enough so
that somebody would volunteer to address it. What we've focused on in
this context is to make sure that Bioperl can read files that it writes
without barfing, not that the result will match the original 100%.

You are welcome to address this if it is important to you.

-hilmar

On Monday, April 4, 2005, at 09:29 PM, Jonathan Miller wrote:

>
> One might have hoped this code would read and write
> a Genbank file, unchanged. But no. how then?
> -------------------------------------------------------
>
> use Bio::SeqIO;
>
> my $infile=shift;
> $in = Bio::SeqIO->new(-file => $infile,
> -format => 'GenBank'
> );
> my $outfile= ">" . $infile . ".gbk";
> $out = Bio::SeqIO->new(-file => $outfile,
> -format => 'GenBank'
> );
>
> while ( $seq = $in->next_seq() ) {
> $out->write_seq($seq);
> }
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Tue Apr 5 15:11:37 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Apr 5 15:05:51 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
In-Reply-To:
Message-ID: <88D42074-A606-11D9-8121-000A959EB4C4@gmx.net>

Right, because in Genbank format the choice of what to use for locus is
predetermined (by NCBI). This property will only exist if you choose
fasta format.

-hilmar

On Tuesday, April 5, 2005, at 12:02 PM, Jonathan Miller wrote:

>
> This works, thank you very much;
> suppose however I wanted to write in "GenBank" format;
> in this case I get an error message:
>
> Can't locate object method "preferred_id_type" via package
> "Bio::SeqIO::genbank" at malcolm.pl line 13.
>
> many thanks
>
> jm
>
> On Tue, 5 Apr 2005, Hilmar Lapp wrote:
>
>> This is a property of the SeqIO stream, so you need to set it on the
>> stream object:
>>
>> $seqio = Bio::SeqIO->new(-format=>'fasta',-file=>"...");
>> $seqio->preferred_id_type('accession');
>>
>> Hth,
>>
>> -hilmar
>>
>> On Tuesday, April 5, 2005, at 10:08 AM, Cook, Malcolm wrote:
>>
>>> hmmm - this is new to me
>>>
>>> let's see, on my bioperl 1.5 installation, the unix one-liner
>>>
>>> perl -MBio::SeqIO::fasta -e 'print
>>> "@Bio::SeqIO::fasta::SEQ_ID_TYPES"'
>>>
>>> returns
>>>
>>> accession accession.version display primary
>>>
>>> so, I think you should be able to add, for instance
>>>
>>> $seq->preferred_id_type('display')
>>>
>>> or, in your case
>>>
>>> $seq->preferred_id_type('accession')
>>>
>>> just before your call to write_seq
>>>
>>> try it!
>>>
>>> Malcolm Cook - mec@stowers-institute.org - 816-926-4449
>>> Database Applications Manager - Bioinformatics
>>> Stowers Institute for Medical Research - Kansas City, MO USA
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces@portal.open-bio.org
>>> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jonathan
>>> Miller
>>> Sent: Monday, April 04, 2005 6:45 PM
>>> To: bioperl-l@bioperl.org
>>> Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
>>>
>>>
>>>
>>> Below, I want write_seq to use the accession,
>>> rather than the default.
>>> ---------------------------------------------
>>> #!/usr/bin/perl
>>> use Bio::SeqIO;
>>>
>>> my $infile=shift;
>>> $in = Bio::SeqIO->new(-file => $infile,
>>> -format => 'GenBank');
>>> my $outfile= ">" . $infile . ".fa";
>>> $out = Bio::SeqIO->new(-file => $outfile,
>>> -format => 'fasta'
>>> );
>>>
>>> while ( $seq = $in->next_seq() ) {
>>> print $seq->accession,"\n";
>>> $out->write_seq($seq);
>>> }
>>> ------------------------------------------------
>>> So, Going to the Bio::Seq docs, I see:
>>> ---------------------------------------------------------
>>> preferred_id_type code top prev next
>>>
>>> Title : preferred_id_type
>>> Usage : $obj->preferred_id_type('accession')
>>> Function: Get/Set the preferred type of identifier to use in the
>>> ">ID"
>>> position
>>> for FASTA output.
>>> Returns : string, one of values defined in
>>> @Bio::SeqIO::fasta::SEQ_ID_TYPES.
>>> Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE
>>> ('display').
>>> Args : string when setting. This must be one of values defined in
>>> @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values:
>>> accession, accession.version, display, primary
>>> Throws : fatal exception if the supplied id type is not in
>>> @SEQ_ID_TYPES.
>>> ----------------------------------------------------------
>>> but I have so far been unable to figure out how in fact to
>>> set the SEQ_ID_TYPE .
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From tc.jones at jones.tc Tue Apr 5 17:00:34 2005
From: tc.jones at jones.tc (Terry Jones)
Date: Tue Apr 5 16:55:29 2005
Subject: [Bioperl-l] Sorting BLAST Output
In-Reply-To: Your message at 20:47:25 on Tuesday, 5 April 2005
References: <425168C7.80105@bioanalysis.org> <4252DD3D.5070804@cenix-bioscience.com>
Message-ID: <16978.64626.212155.386098@terry.jones.tc>

Just a quick comment on this:

| One 'easy' way to do this is to build an array of hashes with the hits
| and whatever feature you are interested. It's a pure perl
| implementation. I don't think the API for the Bioperl search result
| object supports the sorting you want to do, but I could be wrong.
|
| my @hashes;
| for my $hit (@your_hits) {
|     my $len = get_aln_len($hit);
|     my $num_mis = get_num_mis($hit);
|     push @hashes, { hit => $hit, len => $len, num_mis => $num_mis };
| }
|
| my @sorted = sort by_len_and_num_mis @hashes;
|
| sub by_len_and_num_mis {
|     $a->{len} <=> $b->{len} ||
|     $a->{num_mis} <=> $b->{num_mis}
| }

In the end, @sorted is a sorted array of references to hashes, which is
maybe not what you were expecting. You can get at the things in that
array via e.g.,

for my $hit (@sorted){
    print "$hit->{hit}\n";
}

Andrew was likely writing nice and understandable code for you, which
is good (of course). It would be a bit faster to use an anonymous array
rather than an anonymous hash. The @hashes array is left lying around,
so you might want to undef it if you're not doing this in a subroutine.

Also, the above can be done using a Schwartzian Transform, which is
more concise and far more cryptic (see google):

my @sorted =
    map { $_->[0] }
    sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
    map { [ $_, get_aln_len($_), get_num_mis($_) ] } @your_hits;

This also leaves you with what you were likely expecting, an array of
your original hits.
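
For anyone who wants to try the transform without a BLAST report at hand, here is a minimal self-contained sketch. It assumes plain hash records in place of real hit objects: the record fields (name, len, num_mis), the sample values, and the bodies of get_aln_len and get_num_mis are all made up for illustration and are not Bioperl API. With real hits the two subs would wrap whatever calls give you the alignment length and the mismatch count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for hits: plain hashes, not Bio::Search objects.
my @your_hits = (
    { name => 'hitA', len => 120, num_mis => 3 },
    { name => 'hitB', len =>  80, num_mis => 5 },
    { name => 'hitC', len => 120, num_mis => 1 },
    { name => 'hitD', len =>  80, num_mis => 0 },
);

# Assumed accessors; swap the bodies for calls on real hit objects.
sub get_aln_len { my ($hit) = @_; return $hit->{len}; }
sub get_num_mis { my ($hit) = @_; return $hit->{num_mis}; }

# Schwartzian Transform, read from the bottom up:
#   1. decorate each hit with its two sort keys,
#   2. sort on the precomputed keys (length, then mismatches),
#   3. strip the keys again so only the original hits remain.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
    map  { [ $_, get_aln_len($_), get_num_mis($_) ] }
    @your_hits;

my @names = map { $_->{name} } @sorted;
print join(' ', @names), "\n";    # prints "hitD hitB hitC hitA"
```

Each key is computed once per hit rather than once per comparison, which is where the speed-up comes from when the accessors are expensive.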
If you're into perl programming, it's really worth getting your head
around the Schwartzian Transform. Once you understand it, it's easy to
write compact solutions to lots of problems like this. This processing
is a lot like lisp. Unfortunately, it looks like line noise.

Terry

From Peter.Robinson at t-online.de Tue Apr 5 18:09:25 2005
From: Peter.Robinson at t-online.de (Peter.Robinson@t-online.de)
Date: Tue Apr 5 18:01:23 2005
Subject: [Bioperl-l] Getting description of BLAST hits
Message-ID: <20050405220925.GA4195@anna>

Dear list,

I am trying to get a list of all BLAST hits to various sequences using
code adapted from the bioperl website. In essence, I would like to get
the descriptions from lines such as

gb|BC021898.1| Homo sapiens adaptor-related protein complex 1, s... 2956 0.0

Using the following code:

while ( my $hit = $result->next_hit ) {
    next unless ( $v > 0);
    print "\thit name is ", $hit->name, "\t",
          $hit->accession(), "\t", hit->description(),"\t";

leads to the error:

Can't locate object method "description" via package "hit" (perhaps you
forgot to load "hit"?) at blast2SpeciesList.pl line 60.

However, the documentation for Bio::Search::Hit indicates that Hit
objects should have a description() method.

What am I misunderstanding?

Thanks in advance for any tips. Just in case, I am pasting the complete
code snippet at the bottom of this mail.
Peter

#!/usr/bin/perl -w
use strict;

#Remote-blast "factory object" creation and blast-parameter initialization

use Bio::Tools::Run::RemoteBlast;

my $prog = 'blastn';
my $db = 'nr';
my $e_val= '1e-10';

my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#change a parameter
#$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
#remove a parameter
#delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

my $v = 1;
#$v is just to turn on and off the messages

my $str = Bio::SeqIO->new(-file=>'testSeqs.fa' , '-format' => 'fasta' );

while (my $input = $str->next_seq()){
  #Blast a sequence against a database:

  #Alternatively, you could pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');

  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "."
          if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\t",
              $result->query_accession(),"\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\t",
                $hit->accession(), "\t", hit->description(),"\t";
          while( my $hsp = $hit->next_hsp ) {
            print "\te-val is ", $hsp->evalue, "\n";
            last;
          }
        }
      }
    }
  }
}

From jason.stajich at duke.edu Tue Apr 5 22:18:35 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Apr 5 22:13:09 2005
Subject: [Bioperl-l] Getting description of BLAST hits
In-Reply-To: <20050405220925.GA4195@anna>
References: <20050405220925.GA4195@anna>
Message-ID:

what does

print ref($hit), "\n";

say? What version of Bioperl?

On Apr 5, 2005, at 6:09 PM, Peter.Robinson@t-online.de wrote:

> Dear list,
>
> I am trying to get a list of all BLAST hits to various sequences using
> code adapted from the bioperl website. In essence, I would like to get
> the descriptions
> from lines such as
>
>
> gb|BC021898.1| Homo sapiens adaptor-related protein complex 1, s...
> 2956 0.0
>
>
>
> Using the following code:
>
> while ( my $hit = $result->next_hit ) {
> next unless ( $v > 0);
> print "\thit name is ", $hit->name, "\t",
> $hit->accession(), "\t", hit->description(),"\t";
>
>
> leads to the error:
>
> Can't locate object method "description" via package "hit" (perhaps
> you forgot to load "hit"?) at blast2SpeciesList.pl line 60.
>
>
> However, the documentation for Bio::Search::Hit indicates that Hit
> objects should have a description() method.
>
> What am I misunderstanding?
>
> Thanks in advance for any tips. Just in case, I am pasting the
> complete code snippet at the bottom of this mail.
>
> Peter
>
>
> #!/usr/bin/perl -w
> use strict;
>
>
> #Remote-blast "factory object" creation and blast-parameter
> initialization
>
> use Bio::Tools::Run::RemoteBlast;
>
> my $prog = 'blastn';
> my $db = 'nr';
> my $e_val= '1e-10';
>
> my @params = ( '-prog' => $prog,
> '-data' => $db,
> '-expect' => $e_val,
> '-readmethod' => 'SearchIO' );
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #change a parameter
> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens
> [ORGN]';
> #remove a parameter
> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
>
> my $v = 1;
> #$v is just to turn on and off the messages
>
> my $str = Bio::SeqIO->new(-file=>'testSeqs.fa' , '-format' => 'fasta'
> );
>
> while (my $input = $str->next_seq()){
> #Blast a sequence against a database:
>
> #Alternatively, you could pass in a file with many
> #sequences rather than loop through sequence one at a time
> #Remove the loop starting 'while (my $input = $str->next_seq())'
> #and swap the two lines below for an example of that.
> my $r = $factory->submit_blast($input);
> #my $r = $factory->submit_blast('amino.fa');
>
> print STDERR "waiting..." if( $v > 0 );
> while ( my @rids = $factory->each_rid ) {
> foreach my $rid ( @rids ) {
> my $rc = $factory->retrieve_blast($rid);
> if( !ref($rc) ) {
> if( $rc < 0 ) {
> $factory->remove_rid($rid);
> }
> print STDERR "."
if ( $v > 0 );
> sleep 5;
> } else {
> my $result = $rc->next_result();
> #save the output
> my $filename = $result->query_name()."\.out";
> $factory->save_output($filename);
> $factory->remove_rid($rid);
> print "\nQuery Name: ", $result->query_name(), "\t",
> $result->query_accession(),"\n";
> while ( my $hit = $result->next_hit ) {
> next unless ( $v > 0);
> print "\thit name is ", $hit->name, "\t",
> $hit->accession(), "\t", hit->description(),"\t";
>
> while( my $hsp = $hit->next_hsp ) {
> print "\te-val is ", $hsp->evalue, "\n";
> last;
> }
> }
> }
> }
> }
> }
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From millerj at bcm.tmc.edu Tue Apr 5 15:02:16 2005
From: millerj at bcm.tmc.edu (Jonathan Miller)
Date: Tue Apr 5 22:13:34 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
In-Reply-To: <2BBBE2F8-A604-11D9-8121-000A959EB4C4@gmx.net>
Message-ID:

This works, thank you very much;
suppose however I wanted to write in "GenBank" format;
in this case I get an error message:

Can't locate object method "preferred_id_type" via package
"Bio::SeqIO::genbank" at malcolm.pl line 13.

many thanks

jm

On Tue, 5 Apr 2005, Hilmar Lapp wrote:

> This is a property of the SeqIO stream, so you need to set it on the
> stream object:
>
> $seqio = Bio::SeqIO->new(-format=>'fasta',-file=>"...");
> $seqio->preferred_id_type('accession');
>
> Hth,
>
> -hilmar
>
> On Tuesday, April 5, 2005, at 10:08 AM, Cook, Malcolm wrote:
>
> > hmmm - this is new to me
> >
> > let's see, on my bioperl 1.5 installation, the unix one-liner
> >
> > perl -MBio::SeqIO::fasta -e 'print "@Bio::SeqIO::fasta::SEQ_ID_TYPES"'
> >
> > returns
> >
> > accession accession.version display primary
> >
> > so, I think you should be able to add, for instance
> >
> > $seq->preferred_id_type('display')
> >
> > or, in your case
> >
> > $seq->preferred_id_type('accession')
> >
> > just before your call to write_seq
> >
> > try it!
> >
> > Malcolm Cook - mec@stowers-institute.org - 816-926-4449
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, MO USA
> >
> >
> >
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jonathan
> > Miller
> > Sent: Monday, April 04, 2005 6:45 PM
> > To: bioperl-l@bioperl.org
> > Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
> >
> >
> >
> > Below, I want write_seq to use the accession,
> > rather than the default.
> > ---------------------------------------------
> > #!/usr/bin/perl
> > use Bio::SeqIO;
> >
> > my $infile=shift;
> > $in = Bio::SeqIO->new(-file => $infile,
> > -format => 'GenBank');
> > my $outfile= ">" . $infile . ".fa";
> > $out = Bio::SeqIO->new(-file => $outfile,
> > -format => 'fasta'
> > );
> >
> > while ( $seq = $in->next_seq() ) {
> > print $seq->accession,"\n";
> > $out->write_seq($seq);
> > }
> > ------------------------------------------------
> > So, Going to the Bio::Seq docs, I see:
> > ---------------------------------------------------------
> > preferred_id_type code top prev next
> >
> > Title : preferred_id_type
> > Usage : $obj->preferred_id_type('accession')
> > Function: Get/Set the preferred type of identifier to use in the ">ID"
> > position
> > for FASTA output.
> > Returns : string, one of values defined in
> > @Bio::SeqIO::fasta::SEQ_ID_TYPES.
> > Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE
> > ('display').
> > Args : string when setting. This must be one of values defined in
> > @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values:
> > accession, accession.version, display, primary
> > Throws : fatal exception if the supplied id type is not in
> > @SEQ_ID_TYPES.
> > ----------------------------------------------------------
> > but I have so far been unable to figure out how in fact to
> > set the SEQ_ID_TYPE .
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>

From skirov at utk.edu Tue Apr 5 15:06:47 2005
From: skirov at utk.edu (Stefan A Kirov)
Date: Tue Apr 5 22:13:35 2005
Subject: [Bioperl-l] Entrezgene parser: new parser available from sourceforge (attached too) (fwd)
Message-ID:

As often happens, NCBI introduced some small, but interesting changes
to their ASN entrezgene format. Therefore Mingyi had to change the
underlying low level parser. Anyone who uses his parser directly will
have to update. This will also delay the release of the first version
of the bioperl entrezgene parser, which I anticipated to be on
Thursday. I still hope I will commit the code on Friday.

Stefan

---------- Forwarded message ----------
Date: Tue, 05 Apr 2005 13:35:57 -0400
From: Mingyi Liu
To: Stefan A Kirov
Subject: new parser available from sourceforge (attached too)

Hi, Stefan,

I attached the new version to this email. Unfortunately as expected,
the new version is much slower due to the use of lookahead regexes
(needed to accommodate the meaningless/buggy changes NCBI introduced).
About 30% slower in my test. Still can't find a good reason why they
introduced those 3 different types of changes. Can't be one bug.
Anyways.

Thanks for letting me know so early!

Mingyi

-------------- next part --------------
=head1 NAME

GI::Parser::EntrezGene - Regular expression-based Perl Parser for NCBI
Entrez Gene.

=head1 SYNOPSIS

  use GI::Parser::EntrezGene;

  my $parser = GI::Parser::EntrezGene->new();
  open(IN, "Homo_sapiens") || die "can't open the Entrez Gene human genome ASN.1 file! -- $!\n";
  $/ = "Entrezgene ::= {";
  while(<IN>) {
    chomp;
    next unless /\S/;
    # parse the entry
    my $text = (/^\s*Entrezgene ::= ({.*)/si)? $1 : "{" . $_;
    my $value = $parser->parse($text, 2); # $value contains data structure for the
    # record being parsed.
    # 2 indicates the recommended
    # trimming mode of the data structure
  }

=head1 PREREQUISITE

GI::Parser::EntrezGene requires a utility module GI::Parser::Util,
which can be downloaded at the same location as this module
( http://sourceforge.net/projects/egparser/ ).

=head1 INSTALLATION

Put EntrezGene.pm, Util.pm into your perl module directory (for
example, if your Perl modules are located in
/usr/lib/perl5/site_perl/5.6.1, then you should put EntrezGene.pm,
Util.pm into /usr/lib/perl5/site_perl/5.6.1/GI/Parser directory).

=head1 DESCRIPTION

GI::Parser::EntrezGene is a regular expression-based Perl Parser for
NCBI Entrez Gene genome databases
( http://www.ncbi.nih.gov/entrez/query.fcgi?db=gene ). It parses an
ASN.1-formatted Entrez Gene record and returns a data structure that
contains all data items from the gene record.

As of March 7th, 2005, the parser version 1.0 was tested on Entrez Gene
human, mouse and rat genome annotation files (which took around 660,
520, 195 seconds respectively to parse on one 2.4 GHz Intel Xeon
processor).

Note that the addition of validation and error reporting in 1.03 slows
parser down to needing around 12 minutes (instead of the previous 11
minutes) to parse Human genome. V1.03 can process the "All_Data" file
that contains all EntrezGene genomes in 98 minutes.

=head1 SEE ALSO

The parse_entrez_gene_example.pl is a very important and complete demo
on using this module to extract all data items from Entrez Gene
records. Do check it out in the package (included since an update to
V1.04 release)! In fact, this script took me about 3-4 times more time
to make for my project than the parser itself. Note that the included
example script was edited to leave out project-specific stuff.

For details on various parsers I generated for Entrez Gene, example
scripts that use/benchmark the modules, please see
http://sourceforge.net/projects/egparser/ or refer to our paper on them
(see CITATION section).
Note that GI::Parser::EntrezGene is the fastest module in the bunch.
GI::Parser::EntrezGenePRD is the slowest (by far! when parsing long
records).

=head1 AUTHOR

Dr. Mingyi Liu

=head1 COPYRIGHT

The GI::Parser::EntrezGene module and its related modules and scripts are
copyright (c) 2005 Mingyi Liu, GPC Biotech AG and Altana Research
Institute. All rights reserved. I created these modules when working on a
collaboration project between these two companies. Therefore a special
thanks to the two companies for allowing the release of the code into
public domain. You may use and distribute them under the terms of the GNU
General Public License (GPL, http://www.gnu.org/copyleft/gpl.html ).

=head1 CITATION

Mingyi Liu and Andrei Grigoriev (2005) "Fast Parsers for Entrez Gene"
Bioinformatics. Submitted (status subject to change until publication
journal/issue is finalized).

=head1 OPERATING SYSTEMS SUPPORTED

Any OS that Perl runs on.

=head1 CHANGE LOG

=over

=item * version 1.05: added support to parse the NCBI 4/5/2005 download,
which inexplicably added a useless space before ',' on all lines, broke
some lines into two yet condensed others (brackets) to one line. This
unfortunately slows down my parser because I have to use lookahead
regexes to fix the parser for this weird new format. I also fixed a minor
mistake in the error reporting function

=item * version 1.04: added attempt at opening large file (2 GB) on Perl
that does not support it; added 'file' option to new(); added file name
in error reporting message; updated documentation

=item * version 1.03: added validating capability such that anything that
does not conform to the current NCBI Entrez Gene ASN.1 format raises an
error and stops the program. Position of the offending data item is
reported.
=item * version 1.02: added input_file function that accepts filename
input, and next_seq function that returns the next record

=item * version 1.01: unescaped double quote escapes in double quoted
strings

=item * version 1.0: released

=back

=head1 METHODS

=cut

package GI::Parser::EntrezGene;

use strict;
use Carp qw(carp croak);
use GI::Parser::Util;
use vars qw ($VERSION);
$VERSION = '1.05';

=head2 new

  Parameters: maxerrstr => 20 (optional) - maximum number of characters after
              offending element, used by error reporting, default is 20
              file => $filename (optional) - name of the file to be parsed.
              call next_seq to parse!
  Example:    my $parser = GI::Parser::EntrezGene->new();
  Function:   Instantiate a parser object
  Returns:    Object reference
  Notes:

=cut

sub new
{
  my $class = shift;
  $class = ref($class) if(ref($class));
  my $self = { maxerrstr => 20, @_ };
  bless $self, $class;
  $self->input_file($self->{file}) if($self->{file});
  return $self;
}

=head2 maxerrstr

  Parameters: $maxerrstr (optional) - maximum number of characters after
              offending element, used by error reporting, default is 20
  Example:    $parser->maxerrstr(20);
  Function:   get/set maxerrstr.
  Returns:    maxerrstr.
  Notes:

=cut

sub maxerrstr
{
  my ($self, $value) = @_;
  # only set when a positive value is passed in; a plain get passes nothing
  $self->{maxerrstr} = $value if defined $value && $value > 0;
  return $self->{maxerrstr};
}

=head2 parse

  Parameters: $string that contains Entrez Gene record,
              $trimopt (optional) that specifies how the data structure
              returned should be trimmed. 2 is recommended
              $noreset (optional) that specifies that line number should not
              be reset
  Example:    my $value = $parser->parse($text, 2);
  Function:   Takes in a string representing Entrez Gene record, parses the
              record and returns a data structure.
  Returns:    A data structure containing all data items from the Entrez
              Gene record.
  Notes:      DEPRECATED as external function!!! Do not call this function
              directly! $string should not contain 'EntrezGene ::=' at
              beginning!
              For details on how to use the $trimopt data trimming option
              please see comment for the GI::Parser::Util::compactds
              method. An option of 2 is recommended.

=cut

sub parse
{
  my ($self, $input, $compact, $noreset) = @_;
  $input || croak "must have input!\n";
  $self->{input} = $input;
  $self->{filename} = "input string" unless $self->{filename};
  $self->{linenumber} = 1 unless $self->{linenumber} && $noreset;
  $self->{depth} = 0;
  my $result;
  eval {
    $result = $self->_parse(); # no need to reset $self->{depth} or linenumber
  };
  if($@)
  {
    if($@ !~ /^Data Error:/)
    {
      croak "non-conforming data broke parser on line $self->{linenumber} in $self->{filename}\n".
            "possible cause includes randomly inserted brackets in input file before line $self->{linenumber}\n".
            "first $self->{maxerrstr} (or till end of input) characters including the non-conforming data:\n" .
            substr($self->{input}, pos($self->{input}), $self->{maxerrstr}) . "\nRaw error mesg: $@\n";
    }
    else { die $@ }
  }
  compactds($result, $compact) if($compact && defined $result);
  return $result;
}

=head2 input_file

  Parameters: $filename for file that contains Entrez Gene record(s)
  Example:    $parser->input_file($filename);
  Function:   Takes in name of a file containing Entrez Gene records.
              opens the file and stores file handle
  Returns:    none.
  Notes:      Attempts to open file larger than 2 GB even on Perl that does
              not support 2 GB file (accomplished by calling "cat" and
              piping output. On OS that does not have "cat" error message
              will be displayed)

=cut

sub input_file
{
  my ($self, $filename) = @_;
  # in case user's Perl system can't handle large file. Assuming Unix, otherwise raise error
  open($self->{fh}, $filename) || ($! =~ /too large/i && open($self->{fh}, "cat $filename |")) || croak "can't open $filename! -- $!\n";
  $self->{filename} = $filename;
}

=head2 next_seq

  Parameters: $trimopt (optional) that specifies how the data structure
              returned should be trimmed.
              2 is recommended
  Example:    my $value = $parser->next_seq(2);
  Function:   Use the file handle generated by input_file, parses the next
              record and returns a data structure.
  Returns:    A data structure containing all data items from the Entrez
              Gene record.
  Notes:      Must pass in a filename through new() or input_file() first!
              For details on how to use the $trimopt data trimming option
              please see comment for the GI::Parser::Util::compactds
              method. An option of 2 is recommended.

=cut

sub next_seq
{
  my ($self, $compact) = @_;
  local $/ = "Entrezgene ::= {"; # set record separator
  $self->{fh} || croak "you must pass in a file name through new() or input_file() first before calling next_seq!\n";
  # loop (rather than a bare if) so that 'next' is legal and the empty
  # chunk before the first record separator is skipped
  while(defined(my $record = readline($self->{fh})))
  {
    chomp $record;
    next unless $record =~ /\S/;
    my $tmp = ($record =~ /^\s*Entrezgene ::= (\{.*)/si)? $1 : "{" . $record; # get rid of the 'Entrezgene ::= ' at the beginning of Entrez Gene record
    return $self->parse($tmp, $compact, 1); # 1 specifies no resetting line number
  }
  return; # no more records
}

# NCBI's Apr 05, 2005 format change forced much usage of lookahead, which will
# surely slow the parser down. But can't code efficiently without it.
sub _parse
{
  my ($self, $flag) = @_;
  my $data;
  while(1)
  {
    # changing orders of regex if/elsif statements made little difference.
    # current order is close to optimal
    if($self->{input} =~ /\G[ \t]*,?[ \t]*\n/cg) # cleanup leftover
    {
      $self->{linenumber}++;
      next;
    }
    if($self->{input} =~ /\G[ \t]*}/cg)
    {
      if(!($self->{depth}--) && $self->{input} =~ /\S/)
      {
        croak "Data Error: extra (mismatched) '}' found on line $self->{linenumber} in $self->{filename}!\n";
      }
      return $data
    }
    elsif($self->{input} =~ /\G[ \t]*\{/cg)
    {
      $self->{depth}++;
      push(@$data, $self->_parse())
    }
    elsif($self->{input} =~ /\G[ \t]*([\w-]+)(\s*)/cg)
    {
      my ($id, $lines) = ($1, $2);
      # we're prepared for NCBI to make the format even worse:
      $self->{linenumber} += $lines =~ s/[\r\n]+//g;
      my $tmp;
      if(($self->{input} =~ /\G"((?:[^"]|"")*)"(?=\s*[,}])/cg && ++$tmp) ||
         $self->{input} =~ /\G([\w-]+)(?=\s*[,}])/cg)
      {
        my $value = $1;
        if($tmp) # slight speed optimization, not really necessary since regex is fast enough
        {
          $value =~ s/""/"/g;
          $self->{linenumber} += $value =~ s/[\r\n]+//g;
        }
        if(ref($data->{$id})) { push(@{$data->{$id}}, $value) } # hash value is not a terminal (or has multiple values), create array to avoid multiple same-keyed hash values overwriting each other
        elsif($data->{$id}) { $data->{$id} = [$data->{$id}, $value] } # hash value has a second terminal value now!
        else { $data->{$id} = $value } # the first terminal value
      }
      elsif($self->{input} =~ /\G\{/cg)
      {
        $self->{depth}++;
        push(@{$data->{$id}}, $self->_parse());
      }
      elsif($self->{input} =~ /\G(?=[,}])/cg)
      {
        push(@$data, $id)
      }
      else # must be "id value value" format
      {
        $self->{depth}++;
        push(@{$data->{$id}}, $self->_parse(1))
      }
      if($flag)
      {
        if(!($self->{depth}--) && $self->{input} =~ /\S/)
        {
          croak "Data Error: extra (mismatched) '}' found on line $self->{linenumber} in $self->{filename}!\n";
        }
        return $data;
      }
    }
    elsif($self->{input} =~ /\G[ \t]*"((?:[^"]|"")*)"(?=\s*[,}])/cg)
    {
      my $value = $1;
      $value =~ s/""/"/g;
      $self->{linenumber} += $value =~ s/[\r\n]+//g;
      push(@$data, $value)
    }
    else # end of input
    {
      my ($pos, $len) = (pos($self->{input}), length($self->{input}));
      if($pos != $len && $self->{input} =~ /\G\s*\S/cg) # problem with parsing, must be non-conforming data
      {
        croak "Data Error: non-conforming data found on line $self->{linenumber} in $self->{filename}!\n" .
              "first $self->{maxerrstr} (or till end of input) characters including the non-conforming data:\n" .
              substr($self->{input}, $pos, $self->{maxerrstr}) . "\n";
      }
      elsif($self->{depth} > 0)
      {
        croak "Data Error: missing '}' found at end of input in $self->{filename}!";
      }
      elsif($self->{depth} < 0)
      {
        croak "Data Error: extra (mismatched) '}' found at end of input in $self->{filename}!";
      }
      return $data;
    }
  }
}

1;

From jason.stajich at duke.edu Tue Apr 5 22:39:32 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Apr 5 22:33:22 2005
Subject: [Bioperl-l] newbie needs help with "preferred_id_type"
In-Reply-To:
References:
Message-ID: <565d34ddaab8535f77d0a304b76a13d6@duke.edu>

Don't call it when writing to genbank format... it only has meaning if
you are writing to fasta format.

What was the question? Do you want to control what is in the LOCUS field
and/or you want to make sure the ACCESSION is set in the Genbank file
when reading from a FASTA file?
You have access to these in the display_id --> LOCUS accession_number -> ACCESSION -jason On Apr 5, 2005, at 3:02 PM, Jonathan Miller wrote: > > This works, thank you very much; > suppose however I wanted to write in "GenBank" format; > in this case I get an error message: > > Can't locate object method "preferred_id_type" via package > "Bio::SeqIO::genbank" at malcolm.pl line 13. > > many thanks > > jm > > On Tue, 5 Apr 2005, Hilmar Lapp wrote: > >> This is a property of the SeqIO stream, so you need to set it on the >> stream object: >> >> $seqio = Bio:SeqIO->new(-format=>'fasta',-file=>"> $seqio->preferred_id_type('accession'); >> >> Hth, >> >> -hilmar >> >> On Tuesday, April 5, 2005, at 10:08 AM, Cook, Malcolm wrote: >> >>> hmmm - this is new to me >>> >>> let's see, on my bioperl 1.5 installation, the unix one-liner >>> >>> perl -MBio::SeqIO::fasta -e 'print >>> "@Bio::SeqIO::fasta::SEQ_ID_TYPES"' >>> >>> returns >>> >>> accession accession.version display primary >>> >>> so, I think you should be able to add, for instance >>> >>> $seq->preferred_id_type('display') >>> >>> or, in your case >>> >>> $seq->preferred_id_type('accession') >>> >>> just before your call to write_seq >>> >>> try it! >>> >>> Malcolm Cook - mec@stowers-institute.org - 816-926-4449 >>> Database Applications Manager - Bioinformatics >>> Stowers Institute for Medical Research - Kansas City, MO USA >>> >>> >>> >>> -----Original Message----- >>> From: bioperl-l-bounces@portal.open-bio.org >>> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jonathan >>> Miller >>> Sent: Monday, April 04, 2005 6:45 PM >>> To: bioperl-l@bioperl.org >>> Subject: [Bioperl-l] newbie needs help with "preferred_id_type" >>> >>> >>> >>> Below, I want write_seq to use the accession, >>> rather than the default. 
>>> --------------------------------------------- >>> #!/usr/bin/perl >>> use Bio::SeqIO; >>> >>> my $infile=shift; >>> $in = Bio::SeqIO->new(-file => $infile, >>> -format => 'GenBank'); >>> my $outfile= ">" . $infile . ".fa"; >>> $out = Bio::SeqIO->new(-file => $outfile, >>> -format => 'fasta' >>> ); >>> >>> while ( $seq = $in->next_seq() ) { >>> print $seq->accession,"\n"; >>> $out->write_seq($seq); >>> } >>> ------------------------------------------------ >>> So, Going to the Bio::Seq docs, I see: >>> --------------------------------------------------------- >>> preferred_id_type code top prev next >>> >>> Title : preferred_id_type >>> Usage : $obj->preferred_id_type('accession') >>> Function: Get/Set the preferred type of identifier to use in the >>> ">ID" >>> position >>> for FASTA output. >>> Returns : string, one of values defined in >>> @Bio::SeqIO::fasta::SEQ_ID_TYPES. >>> Default = $Bio::SeqIO::fasta::DEFAULT_SEQ_ID_TYPE >>> ('display'). >>> Args : string when setting. This must be one of values defined in >>> @Bio::SeqIO::fasta::SEQ_ID_TYPES. Allowable values: >>> accession, accession.version, display, primary >>> Throws : fatal exception if the supplied id type is not in >>> @SEQ_ID_TYPES. >>> ---------------------------------------------------------- >>> but I have so far been unable to figure out how in fact to >>> set the SEQ_ID_TYPE . >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jhpark98 at dreamwiz.com Tue Apr 5 22:59:55 2005
From: jhpark98 at dreamwiz.com (=?ks_c_5601-1987?B?udrB2Mf8?=)
Date: Tue Apr 5 22:53:44 2005
Subject: [Bioperl-l] Blast Connection Error
Message-ID:

Hello.

Until two weeks ago I could run remote BLAST searches against the NCBI
nr DB using the bioperl module, but for the past week I have not been
able to. I had set the interval time to 5 seconds. In my opinion, NCBI
is no longer allowing BLAST searches against the NCBI nr DB. If you have
had a similar experience, please let me know. I usually run BLAST
searches for 50-100 sequences at one time.

The error message looks like this:

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
Content-Length: 336
Content-Type: application/x-www-form-urlencoded

DATABASE=ecoli&COMPOSITION_BASED_STATISTICS=off&QUERY=%3E1OSA%3A_+CALMODULIN+-+CHAIN+_+%0AAEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLSLMARKMKEQDSEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEKLTDDEVDEMIREADIDGDGHINYEEFVRMMVSK&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastp

An Error Occurred


500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') From Peter.Robinson at t-online.de Wed Apr 6 01:47:54 2005 From: Peter.Robinson at t-online.de (Peter.Robinson@t-online.de) Date: Wed Apr 6 01:39:37 2005 Subject: [Bioperl-l] Getting description of BLAST hits In-Reply-To: References: <20050405220925.GA4195@anna> Message-ID: <20050406054754.GA3662@anna> On Tue, Apr 05, 2005 at 10:18:35PM -0400, Jason Stajich wrote: > what does > print ref($hit), "\n"; > say? ref($hit)= Bio::Search::Hit::BlastHit I am using Bioperl 1.4, perl 5.8, Debian linux (sarge, 2.6) Thanks, Peter > > What version of Bioperl? > > On Apr 5, 2005, at 6:09 PM, Peter.Robinson@t-online.de wrote: > > >Dear list, > > > >I am trying to get a list of all BLAST hits to various sequences using > >code adapted from the bioperl website. In essence, I would like to get > >the descriptions > >from lines such as > > > > > >gb|BC021898.1| Homo sapiens adaptor-related protein complex 1, s... > >2956 0.0 > > > > > > > >Using the following code: > > > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\t", > > $hit->accession(), "\t", hit->description(),"\t"; > > > > > >leads to the error: > > > >Can't locate object method "description" via package "hit" (perhaps > >you forgot to load "hit"?) at blast2SpeciesList.pl line 60. > > > > > >However, the documentation for Bio::Search::Hit indicates that Hit > >objects should have a description() method. > > > >What am I misunderstanding? > > > >Thanks in advance for any tips. Just in case, I am pasting the > >complete code snippet at the bottom of this mail. 
> > > >Peter > > > > > >#!/usr/bin/perl -w > >use strict; > > > > > >#Remote-blast "factory object" creation and blast-parameter > >initialization > > > >use Bio::Tools::Run::RemoteBlast; > > > >my $prog = 'blastn'; > >my $db = 'nr'; > >my $e_val= '1e-10'; > > > >my @params = ( '-prog' => $prog, > > '-data' => $db, > > '-expect' => $e_val, > > '-readmethod' => 'SearchIO' ); > > > >my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > >#change a paramter > >#$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > >[ORGN]'; > >#remove a parameter > >#delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > > >my $v = 1; > >#$v is just to turn on and off the messages > > > >my $str = Bio::SeqIO->new(-file=>'testSeqs.fa' , '-format' => 'fasta' > >); > > > >while (my $input = $str->next_seq()){ > > #Blast a sequence against a database: > > > > #Alternatively, you could pass in a file with many > > #sequences rather than loop through sequence one at a time > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > #and swap the two lines below for an example of that. > > my $r = $factory->submit_blast($input); > > #my $r = $factory->submit_blast('amino.fa'); > > > > print STDERR "waiting..." if( $v > 0 ); > > while ( my @rids = $factory->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > $factory->remove_rid($rid); > > } > > print STDERR "." 
if ( $v > 0 ); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save the output > > my $filename = $result->query_name()."\.out"; > > $factory->save_output($filename); > > $factory->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\t", > > $result->query_accession(),"\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\t", > > $hit->accession(), "\t", hit->description(),"\t"; > > > > while( my $hsp = $hit->next_hsp ) { > > print "\te-val is ", $hsp->evalue, "\n"; > > last; > > } > > } > > } > > } > > } > >} > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From muratem at eng.uah.edu Wed Apr 6 03:17:21 2005 From: muratem at eng.uah.edu (Mike Muratet) Date: Wed Apr 6 03:12:57 2005 Subject: [Bioperl-l] Bio::Seq::PrimedSeq::new exception Message-ID: Greetings I am using Bio::Tools::Run::Primer3 inside a script to design primers. It will run primer3 just fine, but trying to get the results inside the script yields: ------------- EXCEPTION ------------- MSG: The target_sequence must be a Bio::Seq to create this object. 
STACK Bio::Seq::PrimedSeq::new /usr/local/lib/perl5/site_perl/5.8.0/Bio/Seq/PrimedSeq.pm:232
STACK Bio::Tools::Primer3::next_primer /usr/local/lib/perl5/site_perl/5.8.0/Bio/Tools/Primer3.pm:331
STACK toplevel ./processBlast.pl:318

I am using the method thus:

my $genomic_seqobj = Bio::PrimarySeq->new ( -seq              => $genomic_seq,
                                            -id               => $orf_accession,
                                            -accession_number => $orf_accession,
                                            -alphabet         => "dna",
                                            -is_circular      => 0 );
my $primer3nested = Bio::Tools::Run::Primer3->new(-seq     => $genomic_seqobj,
                                                  -outfile => $orf_accession.".nested.out",
                                                  -path    => "/usr/local/bin/");
$primer3nested->add_targets('PRIMER_NUM_RETURN' => 1);
my $results = $primer3nested->run;
my $primer = $results->next_primer();

I can use Bio::Tools::Primer3 to read the file back in and everything
works just fine.

The offending line from Bio::Seq::PrimedSeq is:

232 if (! ref($self->{target_sequence}) ||
233     ! $self->{target_sequence}->isa('Bio::SeqI') ) {
234     $self->throw("The target_sequence must be a Bio::Seq to create this object.");

It appears to me that the object is created in Bio::Tools::Run::Primer3
in the run method as:

396 # convert the results to individual results
397 $self->{results_obj}=new Bio::Tools::Primer3;
398 $self->{results_obj}->_set_variable('results',$self->{results});
399 $self->{results_obj}->_set_variable('seqobject',$self->{seqobject});
400 $self->{results_separated}= $self->{results_obj}->_separate();
401 return $self->{results_obj};

So it looks to me that this should work if you pass the constructor a
valid child of Bio::SeqI (and PrimarySeq is, is it not?).

Can anyone help me see what I'm missing?
Cheers Mike From Marc.Logghe at devgen.com Wed Apr 6 05:07:51 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Wed Apr 6 05:04:52 2005 Subject: [Bioperl-l] Getting description of BLAST hits Message-ID: It is only a typo: $hit->accession(), "\t", hit->description(),"\t"; And perl barfs with Can't locate object method "description" via package "hit" And that makes sense because it should be: $hit->accession(), "\t", $hit->description(),"\t"; ~~~~ HTH, ML > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > Peter.Robinson@t-online.de > Sent: Wednesday, April 06, 2005 7:48 AM > To: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] Getting description of BLAST hits > > On Tue, Apr 05, 2005 at 10:18:35PM -0400, Jason Stajich wrote: > > what does > > print ref($hit), "\n"; > > say? > > ref($hit)= Bio::Search::Hit::BlastHit > I am using Bioperl 1.4, perl 5.8, Debian linux (sarge, 2.6) > > Thanks, > Peter > > > > > > What version of Bioperl? > > > > On Apr 5, 2005, at 6:09 PM, Peter.Robinson@t-online.de wrote: > > > > >Dear list, > > > > > >I am trying to get a list of all BLAST hits to various sequences > > >using code adapted from the bioperl website. In essence, I > would like > > >to get the descriptions from lines such as > > > > > > > > >gb|BC021898.1| Homo sapiens adaptor-related protein > complex 1, s... > > >2956 0.0 > > > > > > > > > > > >Using the following code: > > > > > > while ( my $hit = $result->next_hit ) { > > > next unless ( $v > 0); > > > print "\thit name is ", $hit->name, "\t", > > > $hit->accession(), "\t", hit->description(),"\t"; > > > > > > > > >leads to the error: > > > > > >Can't locate object method "description" via package "hit" > (perhaps > > >you forgot to load "hit"?) at blast2SpeciesList.pl line 60. > > > > > > > > >However, the documentation for Bio::Search::Hit indicates that Hit > > >objects should have a description() method. 
> > > > > >What am I misunderstanding? > > > > > >Thanks in advance for any tips. Just in case, I am pasting the > > >complete code snippet at the bottom of this mail. > > > > > >Peter > > > > > > > > >#!/usr/bin/perl -w > > >use strict; > > > > > > > > >#Remote-blast "factory object" creation and blast-parameter > > >initialization > > > > > >use Bio::Tools::Run::RemoteBlast; > > > > > >my $prog = 'blastn'; > > >my $db = 'nr'; > > >my $e_val= '1e-10'; > > > > > >my @params = ( '-prog' => $prog, > > > '-data' => $db, > > > '-expect' => $e_val, > > > '-readmethod' => 'SearchIO' ); > > > > > >my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > >#change a paramter > > >#$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo > > >sapiens [ORGN]'; #remove a parameter #delete > > >$Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > > > > >my $v = 1; > > >#$v is just to turn on and off the messages > > > > > >my $str = Bio::SeqIO->new(-file=>'testSeqs.fa' , '-format' > => 'fasta' > > >); > > > > > >while (my $input = $str->next_seq()){ > > > #Blast a sequence against a database: > > > > > > #Alternatively, you could pass in a file with many #sequences > > > rather than loop through sequence one at a time #Remove the loop > > > starting 'while (my $input = $str->next_seq())' > > > #and swap the two lines below for an example of that. > > > my $r = $factory->submit_blast($input); #my $r = > > > $factory->submit_blast('amino.fa'); > > > > > > print STDERR "waiting..." if( $v > 0 ); > > > while ( my @rids = $factory->each_rid ) { > > > foreach my $rid ( @rids ) { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) { > > > if( $rc < 0 ) { > > > $factory->remove_rid($rid); > > > } > > > print STDERR "." 
if ( $v > 0 ); > > > sleep 5; > > > } else { > > > my $result = $rc->next_result(); > > > #save the output > > > my $filename = $result->query_name()."\.out"; > > > $factory->save_output($filename); > > > $factory->remove_rid($rid); > > > print "\nQuery Name: ", $result->query_name(), "\t", > > > $result->query_accession(),"\n"; > > > while ( my $hit = $result->next_hit ) { > > > next unless ( $v > 0); > > > print "\thit name is ", $hit->name, "\t", > > > $hit->accession(), "\t", hit->description(),"\t"; > > > > > > while( my $hsp = $hit->next_hsp ) { > > > print "\te-val is ", $hsp->evalue, "\n"; > > > last; > > > } > > > } > > > } > > > } > > > } > > >} > > > > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From brian_osborne at cognia.com Wed Apr 6 07:32:36 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Apr 6 08:36:09 2005 Subject: [Bioperl-l] pubmed In-Reply-To: <6.1.2.0.2.20050405102233.0209eec0@qfdong.mail.iastate.edu> Message-ID: Qunfeng, Your code works with the attached file, give it a try (I'm using bioperl-live, by the way). Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Qunfeng Sent: Tuesday, April 05, 2005 11:33 AM To: Brian Osborne; Bioperl Subject: RE: [Bioperl-l] pubmed Brian, Thanks for your kind help. The following is my complete code. It reads a GenBank file as input and prints out authors, medline but not pubmed. I also tried your code and it doesn't work for me either. 
I tried your code with two of my machines machine 1). installed bioperl-1.2.2 $ uname -a Linux 2.4.20-8bigmem #1 SMP Thu Mar 13 17:32:29 EST 2003 i686 i686 i386 GNU/Linux machine 2). installed bioperl-1.4 $ uname -a Linux 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux ---------------------------------------------------------------------------- ---------------------------------------- #!/usr/bin/perl -w use strict; use Bio::SeqIO; my $inputGBfile = $ARGV[0]; my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile", '-format' => 'GenBank'); my $seq_object; while (1){ eval{ $seq_object = $seqio_object->next_seq; }; if($@){ print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n"; next; } last if(!defined $seq_object); my $gi = $seq_object->primary_id; my $anno_collection = $seq_object->annotation; foreach my $key ( $anno_collection->get_all_annotation_keys ) { my @annotations = $anno_collection->get_Annotations($key); foreach my $value ( @annotations ) { if($value->tagname eq "reference"){ my $authors = $value->authors(); my $medline = $value->medline(); my $pubmed = $value->pubmed(); print "gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n"; }#end if }#end inner for }#end outer for }#end while ---------------------------------------------------------------------------- --------------------------------------------------------- At 05:18 PM 4/4/2005, Brian Osborne wrote: >Qunfeng, > >SeqIO parses this entry correctly, using this code: > >use strict; >use Bio::DB::GenBank; > >my $db = new Bio::DB::GenBank; >my $seq = $db->get_Seq_by_id(56961711); >my $ac = $seq->annotation; >for my $ref ($ac->get_Annotations('reference')) { > print $ref->pubmed; >} > >It looks like your code should work as well but since you didn't show us >your complete code using the variable $value it's hard to see where the >problem is. > > >Brian O. 
> >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Qunfeng >Sent: Monday, April 04, 2005 5:47 PM >To: Bioperl >Subject: RE: [Bioperl-l] pubmed > > >Brain, > >My problem is that none of them returned a Pubmed id. > >Qunfeng > >At 03:07 PM 4/4/2005, Brian Osborne wrote: > >Qunfeng, > > > >Only 1 of the 5 references in the 56961711 entry has a Pubmed id, the rest > >will return nothing when you try $value->pubmed. > > > >Brian O. > > > >-----Original Message----- > >From: bioperl-l-bounces@portal.open-bio.org > >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Qunfeng > >Sent: Monday, April 04, 2005 2:58 PM > >To: Bioperl > >Subject: Re: [Bioperl-l] pubmed > > > > > >so, I tried to use > > my $authors = $hash_ref->{'authors'}; > > my $medline = $hash_ref->{'medline'}; > > my $pubmed = $hash_ref->{'pubmed'}; > > > >to parse out authors, medline, pubmed. > > > >I was able to successfully parse out authors and medline but not pubmed. > > > >Then I tried to use > > > > my $authors = $value->authors(); > > my $medline = $value->medline(); > > my $pubmed = $value->pubmed(); > > > >and I got the same thing. > > > >Qunfeng > > > >At 07:50 PM 4/2/2005, Hilmar Lapp wrote: > > >So what is the result of this script that you wouldn't have expected or > > >that is not giving you what you need? > > > > > >BTW annotation objects under the tagname 'reference' are usually > > >Bio::Annotation::Reference objects and have methods $ref->authors(), > > >$ref->pubmed(), $ref->medline, etc. Check the POD. > > > > > > -hilmar > > > > > >On Friday, April 1, 2005, at 12:04 PM, Qunfeng wrote: > > > > > >>Hilmar and Paulo, > > >> > > >>I apologize for that, > > >> > > >>here is a snippet of my code, I must have missed something very simple. > > >>Thanks for your help! 
-- Qunfeng > > >> > > >>#!/usr/bin/perl -w > > >>use strict; > > >>use Bio::SeqIO; > > >> > > >>my $inputGBfile = $ARGV[0]; > > >>my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile", > > >> '-format' => 'GenBank'); > > >> > > >>my $seq_object; > > >>while (1){ > > >> eval{ > > >> $seq_object = $seqio_object->next_seq; > > >> }; > > >> if($@){ > > >> print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n"; > > >> next; > > >> } > > >> last if(!defined $seq_object); > > >> my $gi = $seq_object->primary_id; > > >> my $anno_collection = $seq_object->annotation; > > >> foreach my $key ( $anno_collection->get_all_annotation_keys ) { > > >> my @annotations = $anno_collection->get_Annotations($key); > > >> foreach my $value ( @annotations ) { > > >> if($value->tagname eq "reference"){ > > >> my $hash_ref = $value->hash_tree; > > >> my $authors = $hash_ref->{'authors'}; > > >> my $medline = $hash_ref->{'medline'}; > > >> my $pubmed = $hash_ref->{'pubmed'}; > > >> print STDERR > > >> "gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n"; > > >> } > > >> } > > >> } > > >>} > > >> > > >> > > >> > > >>At 03:10 AM 4/1/2005, you wrote: > > >>>*please* people always post the code or ideally a small snippet that > > >>>demonstrates what you were trying to do, and post the result and if >it's > > >>>not an exception why it is not the result you expected. DO NOT just say > > >>>'blah doesn't work for me'. Whenever someone needs to guess what you > > >>>probably did and what you probably mean you are wasting other people's > >time. > > >>> > > >>>The GI# you have has multiple refs with one having a pubmed ID and none > > >>>having a medline ID. So, the one ref that has a pubmed ID should return > > >>>it from $ref->pubmed() but without any code snippet it is impossible to > > >>>tell what you actually did and what therefore might be the problem. 
> > >>> > > >>> -hilmar > > >>> > > >>>On Thursday, March 31, 2005, at 03:15 PM, Qunfeng wrote: > > >>> > > >>>>Hi there, > > >>>> > > >>>>http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html > > >>>> > > >>>>I am not very familiar with BioPerl. I tried to follow the example > > >>>>showing in the above page to retrieve pubmed ID under each Reference > > >>>>tag , i.e., $value->pubmed(), but it doesn't work for me for the seq > > >>>>gi#56961711. The authors() works for me. Appreciate any >>> > >suggestions. > > >>>> > > >>>>Qunfeng > > >>>>_______________________________________________ > > >>>>Bioperl-l mailing list > > >>>>Bioperl-l@portal.open-bio.org > > >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >>>-- > > >>>------------------------------------------------------------- > > >>>Hilmar Lapp email: lapp at gnf.org > > >>>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > >>>------------------------------------------------------------- > > >> > > >-- > > >------------------------------------------------------------- > > >Hilmar Lapp email: lapp at gnf.org > > >GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 > > >------------------------------------------------------------- > > > > > > > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: NM_000266.gb Type: application/octet-stream Size: 6704 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050406/1cd3a766/NM_000266.obj From qfdong at iastate.edu Wed Apr 6 12:40:10 2005 From: qfdong at iastate.edu (Qunfeng) Date: Wed Apr 6 12:35:47 2005 Subject: [Bioperl-l] pubmed In-Reply-To: References: <6.1.2.0.2.20050405102233.0209eec0@qfdong.mail.iastate.edu> Message-ID: <6.1.2.0.2.20050406113813.033c9860@qfdong.mail.iastate.edu> Well, it doesn't work here. I have given up on $value->pubmed and decided to simply parsed out the pubmed id from $value->location Thanks again for all your help! Qunfeng At 06:32 AM 4/6/2005, Brian Osborne wrote: >Qunfeng, > >Your code works with the attached file, give it a try (I'm using >bioperl-live, by the way). > >Brian O. 
> >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Qunfeng >Sent: Tuesday, April 05, 2005 11:33 AM >To: Brian Osborne; Bioperl >Subject: RE: [Bioperl-l] pubmed > > >Brian, > >Thanks for your kind help. The following is my complete code. It reads a >GenBank file as input and prints out authors, medline but not pubmed. I >also tried your code and it doesn't work for me either. > >I tried your code with two of my machines > >machine 1). installed bioperl-1.2.2 >$ uname -a >Linux 2.4.20-8bigmem #1 SMP Thu Mar 13 17:32:29 EST 2003 i686 i686 i386 >GNU/Linux > >machine 2). installed bioperl-1.4 >$ uname -a >Linux 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 >i386 GNU/Linux > >---------------------------------------------------------------------------- >---------------------------------------- >#!/usr/bin/perl -w >use strict; >use Bio::SeqIO; > >my $inputGBfile = $ARGV[0]; >my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile", > '-format' => >'GenBank'); > >my $seq_object; >while (1){ > eval{ > $seq_object = $seqio_object->next_seq; > }; > if($@){ > print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n"; > next; > } > last if(!defined $seq_object); > my $gi = $seq_object->primary_id; > my $anno_collection = $seq_object->annotation; > foreach my $key ( $anno_collection->get_all_annotation_keys ) { > my @annotations = $anno_collection->get_Annotations($key); > foreach my $value ( @annotations ) { > if($value->tagname eq "reference"){ > my $authors = $value->authors(); > my $medline = $value->medline(); > my $pubmed = $value->pubmed(); > print >"gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n"; > }#end if > }#end inner for > }#end outer for >}#end while >---------------------------------------------------------------------------- >--------------------------------------------------------- > >At 05:18 PM 4/4/2005, Brian Osborne wrote: > >Qunfeng, > > > 
>SeqIO parses this entry correctly, using this code: > > > >use strict; > >use Bio::DB::GenBank; > > > >my $db = new Bio::DB::GenBank; > >my $seq = $db->get_Seq_by_id(56961711); > >my $ac = $seq->annotation; > >for my $ref ($ac->get_Annotations('reference')) { > > print $ref->pubmed; > >} > > > >It looks like your code should work as well but since you didn't show us > >your complete code using the variable $value it's hard to see where the > >problem is. > > > > > >Brian O. > > > >-----Original Message----- > >From: bioperl-l-bounces@portal.open-bio.org > >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Qunfeng > >Sent: Monday, April 04, 2005 5:47 PM > >To: Bioperl > >Subject: RE: [Bioperl-l] pubmed > > > > > >Brain, > > > >My problem is that none of them returned a Pubmed id. > > > >Qunfeng > > > >At 03:07 PM 4/4/2005, Brian Osborne wrote: > > >Qunfeng, > > > > > >Only 1 of the 5 references in the 56961711 entry has a Pubmed id, the >rest > > >will return nothing when you try $value->pubmed. > > > > > >Brian O. > > > > > >-----Original Message----- > > >From: bioperl-l-bounces@portal.open-bio.org > > >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Qunfeng > > >Sent: Monday, April 04, 2005 2:58 PM > > >To: Bioperl > > >Subject: Re: [Bioperl-l] pubmed > > > > > > > > >so, I tried to use > > > my $authors = $hash_ref->{'authors'}; > > > my $medline = $hash_ref->{'medline'}; > > > my $pubmed = $hash_ref->{'pubmed'}; > > > > > >to parse out authors, medline, pubmed. > > > > > >I was able to successfully parse out authors and medline but not pubmed. > > > > > >Then I tried to use > > > > > > my $authors = $value->authors(); > > > my $medline = $value->medline(); > > > my $pubmed = $value->pubmed(); > > > > > >and I got the same thing. > > > > > >Qunfeng > > > > > >At 07:50 PM 4/2/2005, Hilmar Lapp wrote: > > > >So what is the result of this script that you wouldn't have expected or > > > >that is not giving you what you need? 
> > > > > > > >BTW annotation objects under the tagname 'reference' are usually > > > >Bio::Annotation::Reference objects and have methods $ref->authors(), > > > >$ref->pubmed(), $ref->medline, etc. Check the POD. > > > > > > > > -hilmar > > > > > > > >On Friday, April 1, 2005, at 12:04 PM, Qunfeng wrote: > > > > > > > >>Hilmar and Paulo, > > > >> > > > >>I apologize for that, > > > >> > > > >>here is a snippet of my code, I must have missed something very >simple. > > > >>Thanks for your help! -- Qunfeng > > > >> > > > >>#!/usr/bin/perl -w > > > >>use strict; > > > >>use Bio::SeqIO; > > > >> > > > >>my $inputGBfile = $ARGV[0]; > > > >>my $seqio_object = Bio::SeqIO->new('-file' => "$inputGBfile", > > > >> '-format' => 'GenBank'); > > > >> > > > >>my $seq_object; > > > >>while (1){ > > > >> eval{ > > > >> $seq_object = $seqio_object->next_seq; > > > >> }; > > > >> if($@){ > > > >> print STDERR "EXCEPTION FOUND; SKIP THIS OBJECT\n"; > > > >> next; > > > >> } > > > >> last if(!defined $seq_object); > > > >> my $gi = $seq_object->primary_id; > > > >> my $anno_collection = $seq_object->annotation; > > > >> foreach my $key ( $anno_collection->get_all_annotation_keys ) >{ > > > >> my @annotations = >$anno_collection->get_Annotations($key); > > > >> foreach my $value ( @annotations ) { > > > >> if($value->tagname eq "reference"){ > > > >> my $hash_ref = $value->hash_tree; > > > >> my $authors = $hash_ref->{'authors'}; > > > >> my $medline = $hash_ref->{'medline'}; > > > >> my $pubmed = $hash_ref->{'pubmed'}; > > > >> print STDERR > > > >> "gi=$gi\nauthors=$authors\nmedline=$medline\npubmed=$pubmed\n\n"; > > > >> } > > > >> } > > > >> } > > > >>} > > > >> > > > >> > > > >> > > > >>At 03:10 AM 4/1/2005, you wrote: > > > >>>*please* people always post the code or ideally a small snippet that > > > >>>demonstrates what you were trying to do, and post the result and if > >it's > > > >>>not an exception why it is not the result you expected. 
DO NOT just >say > > > >>>'blah doesn't work for me'. Whenever someone needs to guess what you > > > >>>probably did and what you probably mean you are wasting other >people's > > >time. > > > >>> > > > >>>The GI# you have has multiple refs with one having a pubmed ID and >none > > > >>>having a medline ID. So, the one ref that has a pubmed ID should >return > > > >>>it from $ref->pubmed() but without any code snippet it is impossible >to > > > >>>tell what you actually did and what therefore might be the problem. > > > >>> > > > >>> -hilmar > > > >>> > > > >>>On Thursday, March 31, 2005, at 03:15 PM, Qunfeng wrote: > > > >>> > > > >>>>Hi there, > > > >>>> > > > >>>>http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html > > > >>>> > > > >>>>I am not very familiar with BioPerl. I tried to follow the example > > > >>>>showing in the above page to retrieve pubmed ID under each Reference > > > >>>>tag , i.e., $value->pubmed(), but it doesn't work for me for the seq > > > >>>>gi#56961711. The authors() works for me. Appreciate any >>> > > >suggestions. > > > >>>> > > > >>>>Qunfeng > > > >>>>_______________________________________________ > > > >>>>Bioperl-l mailing list > > > >>>>Bioperl-l@portal.open-bio.org > > > >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >>>-- > > > >>>------------------------------------------------------------- > > > >>>Hilmar Lapp email: lapp at gnf.org > > > >>>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > > >>>------------------------------------------------------------- > > > >> > > > >-- > > > >------------------------------------------------------------- > > > >Hilmar Lapp email: lapp at gnf.org > > > >GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 > > > >------------------------------------------------------------- > > > > > > > > > > > >_______________________________________________ > > > >Bioperl-l mailing list > > > >Bioperl-l@portal.open-bio.org > > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Wed Apr 6 13:40:27 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Apr 6 13:34:51 2005 Subject: [Bioperl-l] pubmed In-Reply-To: <6.1.2.0.2.20050406113813.033c9860@qfdong.mail.iastate.edu> References: <6.1.2.0.2.20050405102233.0209eec0@qfdong.mail.iastate.edu> <6.1.2.0.2.20050406113813.033c9860@qfdong.mail.iastate.edu> Message-ID: <396ff777cd174ce726226b769c381461@duke.edu> I'm pretty sure this was a bug that was fixed in more-recent versions of bioperl. I know there were some genbank parser bugfixes in there. I would be curious if it persists if you use bioperl 1.5.0 -- brian reports it working fine with bioperl-live so your solution is to upgrade at least the SeqIO/genbank.pm file. 
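If upgrading is not possible right away, the workaround Qunfeng mentions (pulling the id out of the record text) might look roughly like the sketch below. It assumes the usual GenBank flat-file layout, in which PUBMED appears as an indented sub-keyword inside a REFERENCE block. The helper name pubmed_ids and the record fragment are made up for illustration; neither is part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): collect the ids found on
# the PUBMED lines of one GenBank flat-file record, in order of appearance.
sub pubmed_ids {
    my ($record) = @_;
    my @ids;
    for my $line (split /\n/, $record) {
        # PUBMED is an indented sub-keyword of REFERENCE, e.g. "   PUBMED   1234567"
        push @ids, $1 if $line =~ /^\s+PUBMED\s+(\d+)\s*$/;
    }
    return @ids;
}

# Made-up record fragment, for illustration only.
my $record = <<'END';
REFERENCE   1  (bases 1 to 100)
  AUTHORS   Doe,J.
  TITLE     An example title
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 100)
  AUTHORS   Roe,R.
  JOURNAL   Example J. 1 (1), 1-2 (2005)
   PUBMED   1234567
END

print join(",", pubmed_ids($record)), "\n";
```

References without a PUBMED line simply contribute nothing to the list, which matches the point made earlier in the thread that only some references in an entry carry a PubMed id.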
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 6, 2005, at 12:40 PM, Qunfeng wrote: > Well, it doesn't work here. I have given up on $value->pubmed and > decided to simply parsed out the pubmed id from $value->location > > Thanks again for all your help! > > Qunfeng > > At 06:32 AM 4/6/2005, Brian Osborne wrote: >> Qunfeng, >> >> Your code works with the attached file, give it a try (I'm using >> bioperl-live, by the way). >> >> Brian O. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >
From Kary at ioc.fiocruz.br Wed Apr 6 13:42:59 2005 From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana) Date: Wed Apr 6 13:42:16 2005 Subject: [Bioperl-l] FW: Help with hmmpfam Message-ID: <8D44604203DAF9438BF9123B4A08C779B2701E@alpha.ioc.fiocruz.br> Dear All, I am working with a web design, in this, runs a "php script" that capture two information in two string (one a query sequence, another hmms model from one database) and in this I need to run hmmpfam of a "script in perl". I would like (if possible) to obtain some help with the execute this script "in perl" by system command from a "php script".
I am listing my code below and the output containing the following error (It doesn't run the perl script): (partial) output and error: aceita a descri?o dos modulos lee a base de dados factory: Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) ?ltima linha da sa?da: factory: Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) Valor de Retorno: 2 I put some "print" commands everywhere to see where I am getting the error and looks like it is not entering/printing the while results (eg: factory, search). Any help would be greatly appreciated. Thanks, Kary ************ Script: 1.- php: echo '
';

// Shows the full output of the shell command 'perl' and returns
// the last line of the output in $last_line. Stores the return value
// of the shell command in $retval.
$last_line = system('/usr/bin/perl test_1.pl', $retval);
// Showing additional information
echo '

Última linha da saída: '.$last_line.'
Valor de Retorno: '.$retval 2.- perl: $ENV{HMMPFAMDIR} = '/usr/local/bin/hmmpfam'; use lib "/usr/local/bioperl14"; use lib "/usr/local/bioperl-run-1.4"; use strict; use Bio::Tools::Run::Hmmpfam; use Bio::SearchIO; use Bio::SearchIO::Writer::HTMLResultWriter; use Bio::SearchIO::Writer::TextResultWriter; use Bio::SearchIO::Writer::HSPTableWriter; use Bio::SearchIO::Writer::ResultTableWriter; use Bio::SeqIO; print "aceita a descri?o dos modulos\n"; my @params = ('DB' => '/home/kary/public_html/MGE-Tryp_Mobile_Genetic_Elements_14_01_05/meus_modelos_hmms.hmm', 'E' => 0.0001); print "lee a base de dados\n"; my $factory = Bio::Tools::Run::Hmmpfam->new(@params); print "factory: $factory"; #any old protein fasta file my $search = $factory->run('/home/kary/public_html/MGE-Tryp_Mobile_Genetic_Elements_14_01_05/sequencia_fasta_1_tn.txt'); print "search and run: $search\n"; my $writer = Bio::SearchIO::Writer::HSPTableWriter->new( -columns => [qw( query_name hit_name score expect query_description )] ); my $out = Bio::SearchIO->new( -writer => $writer, -file => ">searchio.out" ); while (my $result = $search->next_result()) { $out->write_result($result); } print "fin"; From Peter.Robinson at t-online.de Wed Apr 6 15:36:07 2005 From: Peter.Robinson at t-online.de (Peter.Robinson@t-online.de) Date: Wed Apr 6 15:27:47 2005 Subject: [Bioperl-l] Getting description of BLAST hits In-Reply-To: References: Message-ID: <20050406193607.GA3652@anna> On Wed, Apr 06, 2005 at 11:07:51AM +0200, Marc Logghe wrote: > It is only a typo: > $hit->accession(), "\t", hit->description(),"\t"; > > And perl barfs with Can't locate object method "description" via package > "hit" > And that makes sense because it should be: > $hit->accession(), "\t", $hit->description(),"\t"; > ~~~~ > > HTH, > ML > Thanks, I guess I should really stop programming late at night ;-} Peter From MEC at Stowers-Institute.org Wed Apr 6 16:42:40 2005 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Wed Apr 6 16:46:44 2005 
Subject: [Bioperl-l] patch to FeatureIO.pm for tied interface Message-ID: <200504062045.j36KjSfY002536@portal.open-bio.org> Hi Hilmar, Nope - I never did have an account - at least that I knew of. I made the request once I think but it never went through the right channels. I have a few other patches in the wings, a Bio::SeqIO::strider.pm module, and a script or two for contrib, that I'll submit if and when I get privs.... Also, putting some tests in on FeatureIO that exercise the tied hash would probably be a good idea.... hmmm. Cheers, Malcolm -----Original Message----- From: Hilmar Lapp [mailto:hlapp@gmx.net] Sent: Friday, April 01, 2005 2:55 AM To: Cook, Malcolm Cc: bioperl-l@portal.open-bio.org; chris dagdigian Subject: Re: [Bioperl-l] patch to FeatureIO.pm for tied interface On Thursday, March 31, 2005, at 01:41 PM, Cook, Malcolm wrote: > bioperlers, > > The following patch to bioperl-live makes up for what was probably a > copy and paste error and lets FeatureIO work with tied handle interface > too. > > I would be happy to have write access to cvs repository for this and > other such patches as discovered.... Sure, if Chris happens to read this? Didn't you once have an account? Or did I only dream that ;) -hilmar > > Cheers, > > Malcolm Cook > > > Index: FeatureIO.pm > =================================================================== > RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO.pm,v > retrieving revision 1.8 > diff -c -r1.8 FeatureIO.pm > *** FeatureIO.pm 18 Jan 2005 05:22:11 -0000 1.8 > --- FeatureIO.pm 31 Mar 2005 21:34:33 -0000 > *************** > *** 507,526 **** > > sub TIEHANDLE { > my ($class,$val) = @_; > ! return bless {'seqio' => $val}, $class; > } > > sub READLINE { > my $self = shift; > ! return $self->{'seqio'}->next_seq() unless wantarray; > my (@list, $obj); > ! push @list, $obj while $obj = $self->{'seqio'}->next_seq(); > return @list; > } > > sub PRINT { > my $self = shift; > ! 
$self->{'seqio'}->write_seq(@_); > } > > 1; > --- 507,526 ---- > > sub TIEHANDLE { > my ($class,$val) = @_; > ! return bless {'featio' => $val}, $class; > } > > sub READLINE { > my $self = shift; > ! return $self->{'featio'}->next_feature() unless wantarray; > my (@list, $obj); > ! push @list, $obj while $obj = $self->{'featio'}->next_feature(); > return @list; > } > > sub PRINT { > my $self = shift; > ! $self->{'featio'}->write_feature(@_); > } > > 1; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lisa at ey3.com Wed Apr 6 22:40:56 2005 From: lisa at ey3.com (Lisa) Date: Wed Apr 6 22:30:52 2005 Subject: [Bioperl-l] RE: Problem running clustalw.pl for the first time. Message-ID: <42549DB8.9070307@ey3.com> Hello, I am new to BioPerl and have had the modules installed by our system admin (They appear to be installed correctly) and I have complied the clustalw and it works from cmd line. 
But I can't run clustalw.pl; it produces the Perl error message below:
------------------------------------------------------------
Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC (@INC contains: /home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/ /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3 /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl .) at clustalw.pl line 39.
BEGIN failed--compilation aborted at clustalw.pl line 39.
------------------------------------------------------------
------------------------------------------------------------
This is the code from clustalw.pl
------------------------------------------------------------
#!/usr/bin/perl
# PROGRAM :
# PURPOSE : Demonstrate possible uses of Bio::Tools::Run::Alignment::Clustalw.pm
# AUTHOR : Peter Schattner schattner@alum.mit.edu
# CREATED : Oct 06 2000
# REVISION : $Id: clustalw.pl,v 1.1 2003/07/07 18:20:58 bosborne Exp $ .
# Modify (and un-comment) the following line as required to point to clustalw program directory on your system BEGIN { $ENV{CLUSTALDIR} = '/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/'; } use lib "/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/"; use Getopt::Long; use Bio::Tools::Run::Alignment::Clustalw; use Bio::SimpleAlign; use Bio::AlignIO; use Bio::SeqIO; use strict; ------------------------------------------------------------ I would like some information or suggestions for what I can do to get this to work. Cheers Lisa From cain at cshl.edu Wed Apr 6 22:50:25 2005 From: cain at cshl.edu (Scott Cain) Date: Wed Apr 6 22:44:20 2005 Subject: [Bioperl-l] RE: Problem running clustalw.pl for the first time. In-Reply-To: <42549DB8.9070307@ey3.com> References: <42549DB8.9070307@ey3.com> Message-ID: <1112842226.3448.10.camel@localhost.localdomain> Hi Lisa, I think you need to shorten your use lib to: use lib "/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4"; Scott On Thu, 2005-04-07 at 03:40 +0100, Lisa wrote: > Hello, > > I am new to BioPerl and have had the modules installed by our system > admin (They appear to be installed correctly) > and I have complied the clustalw and it works from cmd line. 
But I cant > run clustalw.pl it produces a typical perl error message below : > > ------------------------------------------------------------ > Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC (@INC contains: > /home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/ > /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3 > /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 > /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 > /usr/lib/perl5/site_perl > /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 > /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 > /usr/lib/perl5/vendor_perl .) at clustalw.pl line 39. BEGIN > failed--compilation aborted at clustalw.pl line 39. > ------------------------------------------------------------ > > ------------------------------------------------------------ > This is the code from clustalw.pl > ------------------------------------------------------------ > #!/usr/bin/perl > > # PROGRAM : > # PURPOSE : Demonstrate possible uses of > Bio::Tools::Run::Alignment::Clustalw.pm > # AUTHOR : Peter Schattner schattner@alum.mit.edu > # CREATED : Oct 06 2000 > # REVISION : $Id: clustalw.pl,v 1.1 2003/07/07 18:20:58 bosborne Exp $ > . 
> > # Modify (and un-comment) the following line as required to point to > clustalw program directory on your system > BEGIN { > $ENV{CLUSTALDIR} = > '/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/'; > } > > use lib > "/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/"; > use Getopt::Long; > use Bio::Tools::Run::Alignment::Clustalw; > use Bio::SimpleAlign; > use Bio::AlignIO; > use Bio::SeqIO; > use strict; > > ------------------------------------------------------------ > > > I would like some information or suggestions for what I can do to get > this to work. > > Cheers > Lisa > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From brian_osborne at cognia.com Thu Apr 7 08:09:12 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Apr 7 08:03:32 2005 Subject: [Bioperl-l] biosql.html Message-ID: Hilmar, I've updated biosql.html in biosql-schema. Postgres installation in Cygwin was no longer the 2 minute exercise it was a while back but it's still pretty easy, biosql installation was as easy as ever. Brian O. From muratem at eng.uah.edu Thu Apr 7 09:27:47 2005 From: muratem at eng.uah.edu (Mike Muratet) Date: Thu Apr 7 09:24:27 2005 Subject: [Bioperl-l] Bio::Search::Hit Message-ID: Greetings I have use the 'significance' method in Bio::Search::Hit (and other objects) to parse blast results into a mysql database. I just went back to the documentation to determine if it returns 'e' or 'p' and it says only that it returns one or the other. Does anyone know for sure which one it is (given that they are numerically equivalent for small values)? 
Thanks

mike

From jason.stajich at duke.edu Thu Apr 7 09:47:22 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Apr 7 09:41:51 2005
Subject: [Bioperl-l] Bio::Search::Hit
In-Reply-To:
References:
Message-ID:

It should give you whatever is in your BLAST report - only one is
reported in the Hit table at the top of the report....

Default WU-BLAST settings gives you p-values for Hit table while
NCBI-BLAST gives you E-values. For HSPs, one can get p-value with p()
and evalue with evalue().

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 7, 2005, at 9:27 AM, Mike Muratet wrote:

> Greetings
>
> I have use the 'significance' method in Bio::Search::Hit (and other
> objects) to parse blast results into a mysql database. I just went
> back to
> the documentation to determine if it returns 'e' or 'p' and it says
> only
> that it returns one or the other. Does anyone know for sure which one
> it
> is (given that they are numerically equivalent for small values)?
>
> Thanks
>
> mike
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From muratem at eng.uah.edu Thu Apr 7 09:51:03 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Thu Apr 7 09:44:56 2005
Subject: [Bioperl-l] Bio::Search::Hit
In-Reply-To:
Message-ID:

On Thu, 7 Apr 2005, Jason Stajich wrote:

> It should give you whatever is in your BLAST report - only one is
> reported in the Hit table at the top of the report....
>
> Default WU-BLAST settings gives you p-values for Hit table while
> NCBI-BLAST gives you E-values. For HSPs, one can get p-value with p()
> and evalue with evalue().
>
>
> -jason

Jason

Thanks. I wasn't aware of the difference.
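The remark in the original question that 'e' and 'p' are numerically equivalent for small values follows from the usual BLAST statistics relation P = 1 - exp(-E). A minimal plain-Perl sketch (no BioPerl needed; the sub name p_from_e is made up for illustration and is not a BioPerl method) shows where the two begin to diverge:

```perl
use strict;
use warnings;

# Convert a BLAST E-value to the corresponding P-value using
# P = 1 - exp(-E). Hypothetical helper, not part of BioPerl.
# Note: for extremely small E the naive formula loses precision
# to floating-point cancellation.
sub p_from_e {
    my ($e) = @_;
    return 1 - exp(-$e);
}

# Print E next to P for a range of values; they agree closely
# below roughly 0.01 and separate as E approaches and exceeds 1.
for my $e (1e-5, 0.01, 0.1, 1, 10) {
    printf "E = %-6g  P = %.6g\n", $e, p_from_e($e);
}
```

Since significance() passes through whatever the report's hit table carries, it may be safer when loading a database to record which kind of value (E or P) each report used alongside the number itself.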
mike > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Apr 7, 2005, at 9:27 AM, Mike Muratet wrote: > > > Greetings > > > > I have use the 'significance' method in Bio::Search::Hit (and other > > objects) to parse blast results into a mysql database. I just went > > back to > > the documentation to determine if it returns 'e' or 'p' and it says > > only > > that it returns one or the other. Does anyone know for sure which one > > it > > is (given that they are numerically equivalent for small values)? > > > > Thanks > > > > mike > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From cain at cshl.edu Thu Apr 7 09:55:44 2005 From: cain at cshl.edu (Scott Cain) Date: Thu Apr 7 09:49:58 2005 Subject: [Bioperl-l] RE: Problem running clustalw.pl for the first time. In-Reply-To: <4254B3BA.6060201@ey3.com> References: <42549DB8.9070307@ey3.com> <1112842226.3448.10.camel@localhost.localdomain> <4254B3BA.6060201@ey3.com> Message-ID: <1112882145.5166.1.camel@localhost.localdomain> Lisa, Sorry, I don't know anything about clustal--hopefully that error message will be meaningful to someone :-) Scott On Thu, 2005-04-07 at 05:14 +0100, Lisa wrote: > Dear Scott, > Thanks for responding and so quickly. > I tried as you stated and it worked!!!! > However I got the following different error > Beginning parameter-varying example... 
> Performing alignment with ktuple = 1 > > ------------- EXCEPTION ------------- > MSG: Clustalw call ( align -infile=/tmp/hdhgOZFMXb/SBKpkT1RWy - > output=gcg -ktuple=1 -outfile=/tmp/hdhgOZFMXb/HpOjRcDK5T >/dev/null > 2>/dev/null) crashed: 32512 > > STACK > Bio::Tools::Run::Alignment::Clustalw::_run /home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4//Bio/Tools/Run/Alignment/Clustalw.pm:557 > STACK > Bio::Tools::Run::Alignment::Clustalw::align /home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4//Bio/Tools/Run/Alignment/Clustalw.pm:473 > STACK main::vary_params clustalw.pl:111 > STACK toplevel clustalw.pl:88 > > -------------------------------------- > I am trying to run it all through virtual hosting (with Bio Perl > installed by host and clustalw complied and running) but I'm not > running as root. > Do you know if this will cause any problems? > I am not even able to view the files written to the tmp dir but I can > tell that they are > there.http://www.bioplanet.com/planetforums/viewthread.php?tid=2597 > Can you recommend anywhere to find out more info about this and other > execptions? > Many Thanks > Lisa > > > Scott Cain wrote: > > Hi Lisa, > > > > I think you need to shorten your use lib to: > > > > use lib > > "/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4"; > > > > Scott > > > > On Thu, 2005-04-07 at 03:40 +0100, Lisa wrote: > > > > > Hello, > > > > > > I am new to BioPerl and have had the modules installed by our system > > > admin (They appear to be installed correctly) > > > and I have complied the clustalw and it works from cmd line. 
But I cant > > > run clustalw.pl it produces a typical perl error message below : > > > > > > ------------------------------------------------------------ > > > Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC (@INC contains: > > > /home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/ > > > /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3 > > > /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi > > > /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi > > > /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi > > > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi > > > /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 > > > /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 > > > /usr/lib/perl5/site_perl > > > /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi > > > /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi > > > /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi > > > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi > > > /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 > > > /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 > > > /usr/lib/perl5/vendor_perl .) at clustalw.pl line 39. BEGIN > > > failed--compilation aborted at clustalw.pl line 39. > > > ------------------------------------------------------------ > > > > > > ------------------------------------------------------------ > > > This is the code from clustalw.pl > > > ------------------------------------------------------------ > > > #!/usr/bin/perl > > > > > > # PROGRAM : > > > # PURPOSE : Demonstrate possible uses of > > > Bio::Tools::Run::Alignment::Clustalw.pm > > > # AUTHOR : Peter Schattner schattner@alum.mit.edu > > > # CREATED : Oct 06 2000 > > > # REVISION : $Id: clustalw.pl,v 1.1 2003/07/07 18:20:58 bosborne Exp $ > > > . 
> > > > > > # Modify (and un-comment) the following line as required to point to > > > clustalw program directory on your system > > > BEGIN { > > > $ENV{CLUSTALDIR} = > > > '/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/'; > > > } > > > > > > use lib > > > "/home/httpd/vhosts/xxxx/cgi-bin/modules/bioperl-run-1.4/Bio/Tools/Run/Alignment/"; > > > use Getopt::Long; > > > use Bio::Tools::Run::Alignment::Clustalw; > > > use Bio::SimpleAlign; > > > use Bio::AlignIO; > > > use Bio::SeqIO; > > > use strict; > > > > > > ------------------------------------------------------------ > > > > > > > > > I would like some information or suggestions for what I can do to get > > > this to work. > > > > > > Cheers > > > Lisa > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From skirov at utk.edu Thu Apr 7 13:50:48 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Apr 7 13:44:34 2005 Subject: [Bioperl-l] Entrez Gene parser Message-ID: <425572F8.4030508@utk.edu> I commited the entrez gene parser code so people can start playing with it if they like and send me any notes, requests and bugs. I know that sometimes specie description can break the code and I am checking if this is due to the GI::Parser::EntrezGene or it is my fault. Things that are still on my To do list (in this order): 1. convert the seq record to allow back compatibility with the previous locuslink parser. The tags entrezgene uses are different from those ll had, so some renaming needs to be done. 2. STS Markers are not in the record yet 3. Optimize the code to speed it up where possible 4. Check the specie bug 5. 
Fix the cycle reference in Transcript and remove the undef $transcript->{parent} from the parser code Please let me know if you have any notes or requests! Negative comments appreciated as well. Thanks! Stefan From mlemieux at bioinfo.ca Thu Apr 7 13:54:29 2005 From: mlemieux at bioinfo.ca (Madeleine Lemieux) Date: Thu Apr 7 13:48:12 2005 Subject: [Bioperl-l] Blast Connection Error Message-ID: I tried exactly the same query as you described both using RemoteBlast.pm and from the NCBI website and got the same response from both: no significant alignments. I did *not* get the connection error you reported. I'm not sure what you mean by "interval time". Maybe that's the problem? Or maybe it was a temporary connection problem. Madeleine > Hello. > > Until last two week, I could remote-blast against NCBI nr DB using > bioperl > module. But right now, I can not remote-blast since a week. > > I had set up interval time as 5 second. In my opinion, NCBI is not > allowed > to blast search against NCBI nr DB. If you have experience like me, > then > let me know. > > Usually I do blast search for 50-100 sequences at one time. > > > > The error message is as like this > > -------------------- WARNING --------------------- > > MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > > User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4 > > Content-Length: 336 > > Content-Type: application/x-www-form-urlencoded > > > > DATABASE=ecoli&COMPOSITION_BASED_STATISTICS=off&QUERY=%3E1OSA%3A_+CALMO > DULIN > +- > +CHAIN+_+%0AAEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG > NGTID > FPEFLSLMARKMKEQDSEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEKLTDDEVDEMIREADIDGDGH > INYEE > FVRMMVSK&EXPECT=1e- > 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&CDD_SEARCH=of > f&PRO > GRAM=blastp > > > > > > An Error Occurred > > > >


> > 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname > 'www.ncbi.nlm.nih.gov') > > > > > From brian_osborne at cognia.com Thu Apr 7 14:26:56 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Apr 7 14:24:20 2005 Subject: [Bioperl-l] Entrez Gene parser In-Reply-To: <425572F8.4030508@utk.edu> Message-ID: Stefan, I'm concerned about people using next_seq() in the standard way: $so = $eg->next_seq; They'll be expecting a Seq object but getting that "uncaptured" information, yes? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Stefan Kirov Sent: Thursday, April 07, 2005 1:51 PM To: Bioperl Subject: [Bioperl-l] Entrez Gene parser I commited the entrez gene parser code so people can start playing with it if they like and send me any notes, requests and bugs. I know that sometimes specie description can break the code and I am checking if this is due to the GI::Parser::EntrezGene or it is my fault. Things that are still on my To do list (in this order): 1. convert the seq record to allow back compatibility with the previous locuslink parser. The tags entrezgene uses are different from those ll had, so some renaming needs to be done. 2. STS Markers are not in the record yet 3. Optimize the code to speed it up where possible 4. Check the specie bug 5. Fix the cycle reference in Transcript and remove the undef $transcript->{parent} from the parser code Please let me know if you have any notes or requests! Negative comments appreciated as well. Thanks! Stefan _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Thu Apr 7 14:57:25 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Apr 7 14:51:09 2005 Subject: [Bioperl-l] Entrez Gene parser In-Reply-To: References: Message-ID: <42558295.1020605@utk.edu> No. That was also Hilmar's concern. 
So I followed his advice to use wantarray, so if you do

$so = $eg->next_seq;

you get only one Bio::Seq object, but if you do

my ($gene,$gstruct,$uncapt)=$eg->next_seq;

you get everything. Actually this is in the documentation, though it is
pretty sketchy right now.
Stefan

Brian Osborne wrote:

>Stefan,
>
>I'm concerned about people using next_seq() in the standard way:
>
>$so = $eg->next_seq;
>
>They'll be expecting a Seq object but getting that "uncaptured" information,
>yes?
>
>Brian O.
>
>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org
>[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Stefan Kirov
>Sent: Thursday, April 07, 2005 1:51 PM
>To: Bioperl
>Subject: [Bioperl-l] Entrez Gene parser
>
>
>I commited the entrez gene parser code so people can start playing with
>it if they like and send me any notes, requests and bugs. I know that
>sometimes specie description can break the code and I am checking if
>this is due to the GI::Parser::EntrezGene or it is my fault.
>Things that are still on my To do list (in this order):
>1. convert the seq record to allow back compatibility with the previous
>locuslink parser. The tags entrezgene uses are different from those ll
>had, so some renaming needs to be done.
>2. STS Markers are not in the record yet
>3. Optimize the code to speed it up where possible
>4. Check the specie bug
>5. Fix the cycle reference in Transcript and remove the undef
>$transcript->{parent} from the parser code
>Please let me know if you have any notes or requests!
>Negative comments appreciated as well.
>Thanks!
>Stefan
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164 USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From brian_osborne at cognia.com Thu Apr 7 21:11:11 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Apr 7 21:05:35 2005
Subject: [Bioperl-l] Query sequence length <= 0
In-Reply-To: <1112647949.4251a90da51e7@email.csit.fsu.edu>
Message-ID:

Yanfeng,

On my computer, running Cygwin/Windows and bioperl-live, your script won't
work because without the fasta "-Q" option the script will simply stop,
waiting forever for user input. Of course, it could be the file contents
themselves since I'm using mine, not yours. At any rate, this works:

use strict;
use Bio::SearchIO;

my $fh;
my $fasta = "/usr/local/bin/fasta34";
my $library = "hahu.aa"; # modified
my $query = "hahu.aa"; # modified
my $options = "-E 0.01 -m 9 -d 0 -Q"; # modified
my $command = "$fasta $options $query $library";

print "Start running\n$command\n";

open $fh,"$command |";

my $searchio = Bio::SearchIO->new(-format => 'fasta', -fh => $fh);

if( my $r = $searchio->next_result ) {
    if ( my $hit = $r->next_hit ) {
        if( my $hsp = $hit->next_hsp ) {
            print "location is ", $hsp->hit->location->to_FTstring(), " ",
                  $hsp->hit->length, " nt long\n";
        }
    }
}

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Yanfeng Shi
Sent: Monday, April 04, 2005 4:52 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Query sequence length <= 0

I have a mitochondrial genome sequence and one short sequence of other
species mitochondrial genome. I want to find the relative position of my
mitochondrial genome which is similar to that cds sequence.
I run the code below. But I got

***[/usr/local/bin/fasta34] Query sequence length <= 0:
/home/yanfeng/BioInf/Project/qu.fasta***
-------------------- WARNING ---------------------
MSG: unable to find and set query length

Why does this happen?

$|=1;
my $fasta = "/usr/local/bin/fasta34";
my $library = "/home/yanfeng/BioInf/Project/mun_lab.fasta";
my $query = "/home/yanfeng/BioInf/Project/qu.fasta";
my $options = "-E 0.01 -m 9 -d 0 ";
my $command = "$fasta $options $query $library";
print "Start running\n$command\n";
open($fh,"$command |");
my $searchio = Bio::SearchIO->new(-format => 'fasta', -fh => $fh);
if( my $r = $searchio->next_result ) {
    if ( my $hit = $r->next_hit ) { # only want the BEST hit in this SIMPLE example
        if( my $hsp = $hit->next_hsp ) { # only one HSP per hit for FASTA,change for BLAST
            print "location is ", $hsp->hit->location->to_FTstring(), " ",
                  $hsp->hit->length, " nt long\n";
        }
    }
}

--
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From letondal at pasteur.fr Fri Apr 8 09:54:32 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Fri Apr 8 09:43:50 2005
Subject: [Bioperl-l] CFP NETTAB 2005: Network Tools and Applications in Biology / Workflows management
Message-ID:

Hi,

{Please pass the word!}

NETTAB 2005: Network Tools and Applications in Biology
Workflows management: new abilities for the biological information overflow
5-7 October, 2005, Second University of Naples
Naples, Italy

Web page : http://www.nettab.org/2005
Call for Papers: http://www.nettab.org/2005/call.html

--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

From agathman at semo.edu Fri Apr 8 10:13:33 2005
From: agathman at semo.edu (Gathman, Allen)
Date: Fri Apr 8 10:11:20 2005
Subject: [Bioperl-l] Problem with Bio::DB::GFF
Message-ID: <33580922CBEEC846B473BAE124985DE00426F9B4@xchgnt.semo.edu>

Hello all -

Although this problem arises in the context
of Gbrowse, I'm pretty sure it's more of a straight bioperl question.

I'm trying to pull specific features out of a database that I'm using to
run Gbrowse, using the tools in Bio::DB::GFF. Here's a test program:

#!/usr/bin/perl

use Bio::DB::GFF;

my $db=Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                         -dsn => 'dbi:mysql:database=ccsmall;host=localhost',
                         -fasta => '/gbrowse/databases/cc'
                         );
$outfile= $ARGV[0];
open (OUT, ">$outfile");
@gfeat=$db->features(-attributes => {-Gene => 'lcc1'});
print OUT "Genes: @gfeat\n";
foreach $feat (@gfeat){
    print OUT "Start is " . $feat->start . "\n";
}
close OUT;

When I run this script, the output file says:

Genes:

And that's it. No features are returned. A gff file loaded into the
ccsmall database using load_gff.pl has this first line:

ccin_Contig34 CURATED_PRIVATE mRNA 64853 68924 . - . Gene lcc1; Eval 0.0; Lab Kues

So it would appear that the attribute -Gene => 'lcc1' should pull out
something - but obviously I'm missing something here, because it doesn't.

I wrote another script that uses the get_seq_stream method, and it pulls
out this gene, among others. I've been able to get at the 'lcc1' by
looking at $f->group for each feature in the stream:

$gefeat=$db->get_seq_stream(-type => ['mRNA:CURATED_PRIVATE']);
while ($f=$gefeat->next_seq) {
    $group=$f->group;
    if ($group='lcc1') { bla bla bla }
}

But this is sort of (well, okay, REALLY) clumsy.

So my question is, what am I doing wrong with the "features" method? I'd
really prefer to be able to get at features by name if possible. And for
that matter, if I wanted to get at them by Eval or Lab, I'm not sure what
would do that at all.
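Separately from the features() question, the filter in the get_seq_stream loop above uses `=` (assignment) where a string comparison is presumably intended, so the condition is true for every feature in the stream. A minimal plain-Perl sketch of the difference, using made-up group names rather than real Bio::DB::GFF objects (the sub names are hypothetical as well):

```perl
use strict;
use warnings;

# Hypothetical group names standing in for the values of $f->group.
my @groups = ('lcc1', 'lcc2', 'cip1');

# Buggy filter: '=' assigns 'lcc1' to $g, and the assigned value is a
# true string, so every element passes (and $g is overwritten).
sub count_with_assignment {
    my $n = 0;
    for my $name (@_) {
        my $g = $name;
        $n++ if ($g = 'lcc1');
    }
    return $n;
}

# Intended filter: 'eq' compares strings and leaves $g untouched.
sub count_with_eq {
    my $n = 0;
    for my $name (@_) {
        my $g = $name;
        $n++ if $g eq 'lcc1';
    }
    return $n;
}

printf "assignment matches %d of %d, eq matches %d of %d\n",
    count_with_assignment(@groups), scalar @groups,
    count_with_eq(@groups), scalar @groups;
```

Whether `eq` can be applied directly to the object returned by $f->group depends on how that object stringifies, which the thread does not show; treat that part as an assumption to check against the Bio::DB::GFF documentation.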
Thanks for any suggestions you can make --

-Allen

Allen Gathman
http://cstl-csm.semo.edu/gathman

From nathanhaigh at ukonline.co.uk Fri Apr 8 11:43:40 2005
From: nathanhaigh at ukonline.co.uk (Nathan Spencer Haigh)
Date: Fri Apr 8 11:37:40 2005
Subject: [Bioperl-l] obtaining nt seqs using protein GI's
Message-ID: <1112975020.4256a6acd3a21@webmail.ukonline.net>

This must have been done a million times, but i haven't got time at the
moment to find out the answer myself.

I have a list of GI's for which i would like to obtain the nucleotide
sequences (including introns); can anyone give me some pointers?

What i'm trying to do is this: i've used blastclust on ~5000 protein
sequences to identify those with high sequence identity at the aa level,
now i would like to create an alignment of the same sequences using nt and
if possible include introns to try and identify any sequence variations.

Thanks
Nathan

----------------------------------------------
This mail sent through http://www.ukonline.net

From jason.stajich at duke.edu Fri Apr 8 13:25:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Apr 8 13:19:50 2005
Subject: [Bioperl-l] obtaining nt seqs using protein GI's
In-Reply-To: <1112975020.4256a6acd3a21@webmail.ukonline.net>
References: <1112975020.4256a6acd3a21@webmail.ukonline.net>
Message-ID: <986293d1e7834f3c02015bb63e99bc5c@duke.edu>

% formatdb -o T -i nt -p F
% fastacmd -i FILEWITHGIs -d nt

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 8, 2005, at 11:43 AM, Nathan Spencer Haigh wrote:

> This must have been done a million times, but i haven't got time at
> the moment
> to find out the answer myself.
>
> I have a list of GI's for which i would like to obtain the nucleotide
> sequences
> (including introns); can anyone give me some pointers?
> > What i'm trying to do is this: i've used blastclust on ~5000 protein > sequences > to identify those with high sequence identity at the aa level, now i > would like > to create an alignment of the same sequences using nt and if possible > include > introns to try and identify any sequence variations. > > Thanks > Nathan > > ---------------------------------------------- > This mail sent through http://www.ukonline.net > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Fri Apr 8 11:35:54 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Apr 8 15:20:45 2005 Subject: [Bioperl-l] Problem with Bio::DB::GFF In-Reply-To: <33580922CBEEC846B473BAE124985DE00426F9B4@xchgnt.semo.edu> References: <33580922CBEEC846B473BAE124985DE00426F9B4@xchgnt.semo.edu> Message-ID: <200504081135.54956.lstein@cshl.edu> Hi Allen, This is one of the oddities of the GFF2 format (and something that the GFF3 format fixes). The first attribute/value pair in the ninth column becomes the primary ID for the feature, so instead of fetching by attribute, you must fetch by name: my @feature = $db->get_feature_by_name(Gene=>'lcc1'); There may be several similarly-named genes, hence the list result. Lincoln On Friday 08 April 2005 10:13 am, Gathman, Allen wrote: > Hello all - > > > > Although this problem arises in the context of Gbrowse, I'm pretty > sure it's more of a straight bioperl question. > > I'm trying to pull specific features out of a database that I'm > using to run Gbrowse, using the tools in Bio::DB::GFF. 
Here's a > test program: > > > > #!/usr/bin/perl > > > > use Bio::DB::GFF; > > my $db=Bio::DB::GFF->new(-adaptor => 'dbi::mysql', > > -dsn => > 'dbi:mysql:database=ccsmall;host=localhost', > > -fasta => '/gbrowse/databases/cc' > > ); > > $outfile= $ARGV[0]; > > open (OUT, ">$outfile"); > > @gfeat=$db->features(-attributes => {-Gene => 'lcc1'}); > > print OUT "Genes: @gfeat\n"; > > foreach $feat (@gfeat){ > > print OUT "Start is " . $feat->start . "\n"; > > } > > close OUT; > > > > When I run this script, the output file says: > > > > Genes: > > > > > > And that's it. No features are returned. A gff file loaded into > the ccsmall database using load_gff.pl has this first line: > > > > ccin_Contig34 CURATED_PRIVATE mRNA 64853 68924 . - . > Gene lcc1; Eval 0.0; Lab Kues > > > > > So it would appear that the attribute -Gene => 'lcc1' should pull > out something - but obviously I'm missing something here, because > it doesn't. > > > > I wrote another script that uses the get_seq_stream method, and it > pulls out this gene, among others. I've been able to get at the > 'lcc1' by looking at $f->group for each feature in the stream: > > > > $gefeat=$db->get_seq_stream(-type => ['mRNA:CURATED_PRIVATE']); > > while ($f=$gefeat->next_seq) { > > $group=$f->group; > > if ($group='lcc1') { bla bla bla } > > } > > > > But this is sort of (well, okay, REALLY) clumsy. > > > > So my question is, what am I doing wrong with the "features" > method? I'd really prefer to be able to get at features by name if > possible. And for that matter, if I wanted to get at them by Eval > or Lab, I'm not sure what would do that at all. > > > > Thanks for any suggestions you can make -- > > > > -Allen > > > > Allen Gathman > > http://cstl-csm.semo.edu/gathman > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050408/2dfc7357/attachment.bin From W.Kenworthy at murdoch.edu.au Fri Apr 8 02:24:08 2005 From: W.Kenworthy at murdoch.edu.au (W.Kenworthy) Date: Fri Apr 8 15:21:10 2005 Subject: [Bioperl-l] single NHX treefile from two phylip files Message-ID: <1112941448.16441.13.camel@localhost> Hi, I am trying to create a single NHX treefile from two phylip files (one distance and one with bootstrap values) to view with ATV. Is there anything in BioPerl that would help? My initial look at the tree modules doesn't look promising. How do others add BS values to their graphics (is there another/better way)? I am aware that there are some questions on the validity of doing this, but it is useful none the less. Now that I am working with larger trees more often, hand editing NH files into a combination NHX is becoming a pain! BillK -- William Kenworthy Centre for Bioinformatics and Biological Computing (CBBC) Dept. of Information Technology Division of Arts Murdoch University W.Kenworthy@murdoch.edu.au +61 (0)8 9360 6856 Mob: 0419 929 646 From jason.stajich at duke.edu Fri Apr 8 15:51:02 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Apr 8 15:45:02 2005 Subject: [Bioperl-l] single NHX treefile from two phylip files In-Reply-To: <1112941448.16441.13.camel@localhost> References: <1112941448.16441.13.camel@localhost> Message-ID: <0afcca38987bd916d2a210b89b3977a2@duke.edu> I'm assuming they have the same topology. You just need to figure out how to match internal nodes up between the trees. I do this with the get_lca() method. 
Walk up one tree (with distances) and find the internal nodes (those which aren't leaves) and then get all the tips beneath it (grep { $_->is_Leaf} $node->get_all_Descendents). Then find the LCA for these nodes in the OTHER tree. now you have found the same internal node in the other tree, get the bootstrap accordingly and copy to the first tree. I'll get around to adding these advanced examples to the howto at some point.... -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 8, 2005, at 2:24 AM, W.Kenworthy wrote: > Hi, I am trying to create a single NHX treefile from two phylip files > (one distance and one with bootstrap values) to view with ATV. Is > there > anything in BioPerl that would help? My initial look at the tree > modules doesn't look promising. > > How do others add BS values to their graphics (is there another/better > way)? I am aware that there are some questions on the validity of doing > this, but it is useful none the less. Now that I am working with > larger > trees more often, hand editing NH files into a combination NHX is > becoming a pain! > > BillK > > -- > William Kenworthy > Centre for Bioinformatics and Biological Computing (CBBC) > Dept. of Information Technology > Division of Arts > Murdoch University > W.Kenworthy@murdoch.edu.au > +61 (0)8 9360 6856 > Mob: 0419 929 646 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gnf.org Fri Apr 8 19:50:10 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Apr 8 19:43:44 2005 Subject: [Bioperl-l] Entrez Gene parser In-Reply-To: <425572F8.4030508@utk.edu> References: <425572F8.4030508@utk.edu> Message-ID: <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> Cool, thanks for your work Stefan. I'll be sure to check this out. Is Mingyi's parser that presumably you're using on CPAN already? 
-hilmar On Apr 7, 2005, at 10:50 AM, Stefan Kirov wrote: > I commited the entrez gene parser code so people can start playing > with it if they like and send me any notes, requests and bugs. I know > that sometimes specie description can break the code and I am checking > if this is due to the GI::Parser::EntrezGene or it is my fault. > Things that are still on my To do list (in this order): > 1. convert the seq record to allow back compatibility with the > previous locuslink parser. The tags entrezgene uses are different from > those ll had, so some renaming needs to be done. > 2. STS Markers are not in the record yet > 3. Optimize the code to speed it up where possible > 4. Check the specie bug > 5. Fix the cycle reference in Transcript and remove the undef > $transcript->{parent} from the parser code > Please let me know if you have any notes or requests! > Negative comments appreciated as well. > Thanks! > Stefan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Fri Apr 8 19:51:07 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Apr 8 19:44:31 2005 Subject: [Bioperl-l] biosql.html In-Reply-To: References: Message-ID: Thanks a lot Brian. This will help. -hilmar On Apr 7, 2005, at 5:09 AM, Brian Osborne wrote: > Hilmar, > > I've updated biosql.html in biosql-schema. Postgres installation in > Cygwin > was no longer the 2 minute exercise it was a while back but it's still > pretty easy, biosql installation was as easy as ever. > > Brian O. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From skirov at utk.edu Fri Apr 8 20:08:58 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Apr 8 20:03:19 2005 Subject: [Bioperl-l] Entrez Gene parser In-Reply-To: <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> References: <425572F8.4030508@utk.edu> <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> Message-ID: <42571D1A.2050201@utk.edu> Thanks Hilmar. I think it's not (I think). It is on Sourceforge though. The link is in the entrezgene.pm doc. Mingyi, could you put it in CPAN? By the way there are 'fake' Entrez Gene records that do break the parser. They are designed, as far as I know, to hold GRIF data and I don't think they should be parsed at all. Maybe I will include a filter to look for those. I will work on the parser after Tuesday, I need to work on my immigration stuff and it is not fun at all :-( . Stefan Hilmar Lapp wrote: > Cool, thanks for your work Stefan. I'll be sure to check this out. Is > Mingyi's parser that presumably you're using on CPAN already? > > -hilmar > > On Apr 7, 2005, at 10:50 AM, Stefan Kirov wrote: > >> I commited the entrez gene parser code so people can start playing >> with it if they like and send me any notes, requests and bugs. I know >> that sometimes specie description can break the code and I am >> checking if this is due to the GI::Parser::EntrezGene or it is my fault. >> Things that are still on my To do list (in this order): >> 1. convert the seq record to allow back compatibility with the >> previous locuslink parser. The tags entrezgene uses are different >> from those ll had, so some renaming needs to be done. >> 2. 
STS Markers are not in the record yet >> 3. Optimize the code to speed it up where possible >> 4. Check the specie bug >> 5. Fix the cycle reference in Transcript and remove the undef >> $transcript->{parent} from the parser code >> Please let me know if you have any notes or requests! >> Negative comments appreciated as well. >> Thanks! >> Stefan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> From skirov at utk.edu Fri Apr 8 20:12:18 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Apr 8 20:06:08 2005 Subject: [Bioperl-l] overloading of methods and ptkdb In-Reply-To: <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> References: <425572F8.4030508@utk.edu> <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> Message-ID: <42571DE2.9080002@utk.edu> By the way what is the state of the == overloading in Bio::Annotation:: (there was a discussion few weeks ago)? It breaks ptkdb and need it for certain things.... From hlapp at gnf.org Fri Apr 8 20:15:26 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Apr 8 20:08:45 2005 Subject: [Bioperl-l] SeqIO::table Message-ID: I wrote two new SeqIO-compliant streams that will return Bio::Seq objects from a table in either column-delimited ASCII text-format or contained in an Excel worksheet inside an Excel file, respectively. The table in either format is presumed to contain one seq per line (or row). The parser allows you to identify a few columns with implied semantic meaning (display_id, accession, species, sequence string). All other columns may be selectively chosen to be preserved in the annotation bundle. The motivation for this was that several comprehensive gene family publications made their data available in manually curated spreadsheets. I needed these data as a SeqIO-compliant stream, and going through an intermediary fasta file can mess up the annotation a lot. 
If anybody else is interested in this, or thinks this could be of general interest, I'll commit it to bioperl. I've enclosed the supported arguments for the SeqIO::table::new method; this will give an idea of what is configurable. The excel parser supports the same arguments and the name of the worksheet in addition.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

Named parameters supported by the proposed Bio::SeqIO::table:

  -comment           leading character(s) introducing a comment line

  -header            the number of header lines to skip; the first
                     non-comment header line will be used to obtain
                     column names; column names will be used as the
                     default tags for attaching annotation.

  -delim             the delimiter for columns as a regular expression;
                     consecutive occurrences of the delimiter will not
                     be collapsed.

  -display_id        the one-based index of the column containing the
                     display ID of the sequence

  -accession_number  the one-based index of the column containing the
                     accession number of the sequence

  -seq               the one-based index of the column containing the
                     sequence string of the sequence

  -species           the one-based index of the column containing the
                     species for the sequence record; if not a number,
                     will be used as the static species common to all
                     records

  -annotation        if provided and a scalar, a flag whether or not all
                     additional columns are to be preserved as
                     annotation; the tags used will be 'colX' if there
                     is no column header, where X is the one-based
                     column index, and otherwise the column headers will
                     be used as tags; if a reference to an array, only
                     those columns (one-based index) will be preserved
                     as annotation, tags as before; if a reference to a
                     hash, the keys are one-based column indexes to be
                     preserved, and the values are the tags under which
                     the annotation is to be attached; if not provided
                     or supplied as undef, no additional annotation will
                     be preserved.

  -trim              flag determining whether or not all values should
                     be trimmed of leading and trailing white space

From hlapp at gnf.org Fri Apr 8 20:28:24 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Apr 8 20:23:50 2005
Subject: [Bioperl-l] overloading of methods and ptkdb
In-Reply-To: <42571DE2.9080002@utk.edu>
References: <425572F8.4030508@utk.edu> <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> <42571DE2.9080002@utk.edu>
Message-ID:

I'm overdue on this, which I guess isn't a surprise. I had a conversation with Aaron, and as the first step I'll follow his suggestion to write up test cases that clearly define the required behaviour; changing the test cases to make them succeed, if they don't otherwise, will not be an option. If the refactored code can't be fixed in a reasonable manner (i.e., w/o black magic) to meet the tests, all the SeqFeatureI and related refactoring will be rolled back on the main trunk. If the tests can be met I'll still have an uneasy feeling, but given the facts I will have to give in on that part. The performance concerns are still valid, I think, but we should revisit those separately then.

If you can identify the statement(s) in ptkdb (WTHIT?) that throw the error, I can incorporate those into those tests. Or you add appropriate ones yourself.

-hilmar

On Apr 8, 2005, at 5:12 PM, Stefan Kirov wrote:

> By the way what is the state of the == overloading in
> Bio::Annotation:: (there was a discussion few weeks ago)? It breaks
> ptkdb and need it for certain things....
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From glim at mycybernet.net Sun Apr 10 11:39:00 2005 From: glim at mycybernet.net (Gerard Lim) Date: Sun Apr 10 11:51:29 2005 Subject: [Bioperl-l] Reminder: Yet Another Perl Conference in Toronto, June 27 - 29 Message-ID: Yet Another YAPC::NA 2005 Conference Reminder --------------------------------------------- YAPC::NA 2005 is Yet Another Perl Conference, North America, this year to be held in downtown Toronto, Ontario, Canada, Mon - Wed 27 - 29 June 2005. Important Dates/Deadlines ------------------------- April 18 -- deadline for paper submissions May 12 -- last day of guaranteed accommodations YAPC::NA is a grassroots, all-volunteer conference. The speaker quality is high, the participants lively, and there are many extra social activities scheduled. We expect a bit over 400 people this year, and registration is proceeding faster this year than in the past. The registration cost is USD$85. Information on registration: http://yapc.org/America/register-2005.shtml http://yapc.org/America/registration-announcement-2005.txt Direct link to registration: http://donate.perlfoundation.org/index.pl?node=registrant%20info&conference_id=423 Want to be a speaker? Deadline for proposal submission is April 18, just over 1 week from now. Go to: http://yapc.org/America/cfp-2005.shtml Need accommodations in Toronto? Go to: http://yapc.org/America/accommodations-2005.shtml If you book before May 13 you will be guaranteed a hotel space. After that getting accommodations will become progressively more difficult. Prices we have arranged are in two different price ranges: approximately US$50 for a dorm room, US$72 for a decent hotel room. All accommodations are very nearby the conference venue. This message comes from the YAPC::NA 2005 organizers in Toronto.pm, http://to.pm.org/, on behalf of The Perl Foundation, http://www.perlfoundation.org/ We look forward to seeing you in Toronto! 
If you have any questions please contact na-help@yapc.org From chandan.kr.singh at gmail.com Sun Apr 10 15:38:06 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Sun Apr 10 15:31:59 2005 Subject: [Bioperl-l] RemoteBlast by Bio::Perl Through proxy Message-ID: <2d4f320504101238265ea11c@mail.gmail.com> I am a newbie and while installing bioperl and other related softwres i had installation problems but fortunately succeeded in debugging few .I had no idea where to share those experiences but now being a member of this forum might help. Though i dont have the installation outputs i would like to share the last such experience while installind Berkeley DB .It was not able to include a file Extern.h and i found that the concerned file in /usr/lib/perl5/..../Core/Extern had a wrong extension . After setting it to Extern.h installation was possible.I could not install few other related softwares . At present i am facing this problem.The second code for blast in bptutorial(html format on bioperl site ) when run does not yield the desired result . #! 
/usr/bin/perl use Bio::Perl; $seq_object = get_sequence('swiss',"ROA1_HUMAN"); # uses the default database - nr in this case $blast_result = blast_sequence($seq_object); write_blast(">roa1.blast",$blast_result); The output after running the code is ------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 651 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3EROA1_HUMAN+Heterogeneous+nuclear+ribonucleoprotein+A1+(Helix-destabilizing+protein)+(Single-strand+binding+protein)+(hnRNP+core+protein+A1).%0ASKSESPKEPEQLRKLFIGGLSFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEVDAAMNARPHKVDGRVVEPKRAVSREDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDKIVIQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&PROGRAM=blastp An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to host) --------------------------------------------------- Submitted Blast for [ROA1_HUMAN] The env variable HTTP_PROXY or /and http_proxy is set correct .I hope somebody can help me out . Thank You . From Kary at ioc.fiocruz.br Mon Apr 11 06:37:27 2005 From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana) Date: Mon Apr 11 07:08:03 2005 Subject: FW: [Bioperl-l] FW: Help with hmmpfam Message-ID: <8D44604203DAF9438BF9123B4A08C779B2702B@alpha.ioc.fiocruz.br> Dear All, I need run a "php script" and call in it (with system command) other "script in perl/bioperl" for run hmmpfam. I would like (if possible) to obtain some help for execute this script, because recognize only perl script but at the moment of run hmmpfam (bioperl modules), it doesn't run it. I am listing my code below and the output containing the following error: (partial) output and error: aceita a descri?o dos modulos lee a base de dados factory: Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) ?ltima linha da sa?da: factory: Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) Valor de Retorno: 2 I put some "print" commands everywhere to see where I am getting the error and looks like it is not entering/printing the while results (eg: factory, search). Any help would be greatly appreciated. Thanks, Kary ************ Script: 1.- php: echo '
';

// Shows the complete output of the shell command 'perl', and returns
// the last line of the output in $last_line. Stores the return value
// of the shell command in $retval.
$last_line = system('/usr/bin/perl test_1.pl', $retval);
// Showing additional information
echo '

Última linha da saída: '.$last_line.'
Valor de Retorno: '.$retval 2.- perl: $ENV{HMMPFAMDIR} = '/usr/local/bin/hmmpfam'; use lib "/usr/local/bioperl14"; use lib "/usr/local/bioperl-run-1.4"; use strict; use Bio::Tools::Run::Hmmpfam; use Bio::SearchIO; use Bio::SearchIO::Writer::HTMLResultWriter; use Bio::SearchIO::Writer::TextResultWriter; use Bio::SearchIO::Writer::HSPTableWriter; use Bio::SearchIO::Writer::ResultTableWriter; use Bio::SeqIO; print "aceita a descri?o dos modulos\n"; my @params = ('DB' => '/home/kary/public_html/MGE-Tryp_Mobile_Genetic_Elements_14_01_05/meus_modelos_hmms.hmm', 'E' => 0.0001); print "lee a base de dados\n"; my $factory = Bio::Tools::Run::Hmmpfam->new(@params); print "factory: $factory"; #any old protein fasta file my $search = $factory->run('/home/kary/public_html/MGE-Tryp_Mobile_Genetic_Elements_14_01_05/sequencia_fasta_1_tn.txt'); print "search and run: $search\n"; my $writer = Bio::SearchIO::Writer::HSPTableWriter->new( -columns => [qw( query_name hit_name score expect query_description )] ); my $out = Bio::SearchIO->new( -writer => $writer, -file => ">searchio.out" ); while (my $result = $search->next_result()) { $out->write_result($result); } print "fin"; From Marc.Logghe at devgen.com Mon Apr 11 07:42:12 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Mon Apr 11 07:40:33 2005 Subject: [Bioperl-l] FW: Help with hmmpfam Message-ID: Hi Kary, As far as I understood it correctly, you never get to the line 'print "search and run: $search\n";', meaning your perl script crashes earlier, when you envoke $factory->run. Could you send the error message when you run the perl script from the command line, not via the php script ? Cheers, Marc > Dear All, > I need run a "php script" and call in it (with system > command) other "script in perl/bioperl" for run hmmpfam. I > would like (if possible) to obtain some help for execute this > script, because recognize only perl script but at the moment > of run hmmpfam (bioperl modules), it doesn't run it. 
> I am listing my code below and the output containing the > following error: > > > (partial) output and error: > aceita a descri?o dos modulos > lee a base de dados > factory: Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) > > ?ltima linha da sa?da: factory: > Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) Valor de Retorno: 2 > > > I put some "print" commands everywhere to see where I am > getting the error and looks like it is not entering/printing > the while results (eg: factory, search). Any help would be > greatly appreciated. > > Thanks, Kary > > ************ > > Script: > > 1.- php: > echo '
';
> 
> // Shows the complete output of the shell command 'perl', and returns
> // the last line of the output in $last_line. Stores the return value
> // of the shell command in $retval.
> $last_line = system('/usr/bin/perl test_1.pl', $retval);
> // Showing additional information
> echo '
> 
>
Última linha da saída: '.$last_line.' >
Valor de Retorno: '.$retval > 2.- perl: > $ENV{HMMPFAMDIR} = '/usr/local/bin/hmmpfam'; use lib > "/usr/local/bioperl14"; use lib "/usr/local/bioperl-run-1.4"; > > use strict; > use Bio::Tools::Run::Hmmpfam; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > use Bio::SearchIO::Writer::TextResultWriter; > use Bio::SearchIO::Writer::HSPTableWriter; > use Bio::SearchIO::Writer::ResultTableWriter; > use Bio::SeqIO; > print "aceita a descri?o dos modulos\n"; my @params = > ('DB' => > '/home/kary/public_html/MGE-Tryp_Mobile_Genetic_Elements_14_01 > _05/meus_modelos_hmms.hmm', 'E' => 0.0001); > print "lee a base de dados\n"; > my $factory = Bio::Tools::Run::Hmmpfam->new(@params); > print "factory: $factory"; > #any old protein fasta file > my $search = > $factory->run('/home/kary/public_html/MGE-Tryp_Mobile_Genetic_ > Elements_14_01_05/sequencia_fasta_1_tn.txt'); > print "search and run: $search\n"; > my $writer = Bio::SearchIO::Writer::HSPTableWriter->new( > -columns => [qw( > query_name > hit_name > score > expect > query_description > )] ); my > $out = Bio::SearchIO->new( -writer => $writer, > -file => ">searchio.out" ); > while (my $result = $search->next_result()) { > $out->write_result($result); > } > print "fin"; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From Kary at ioc.fiocruz.br Mon Apr 11 08:19:16 2005 From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana) Date: Mon Apr 11 08:27:15 2005 Subject: [Bioperl-l] FW: Help with hmmpfam Message-ID: <8D44604203DAF9438BF9123B4A08C779B2702C@alpha.ioc.fiocruz.br> Hi Marc: When I run it via shell I do not have any problems, it runs perfectly (including this "print"), and it gives me the following result(1.-) eventually it creates de outputfile: searchio.out (2.- following). In my mind, the problem is: php script is not recognizing this hmmpfam with system command. 
Thank you very much. Kary 1.- Result: [kary@vivax MGE-Tryp_Mobile_Genetic_Elements_14_01_05]$ perl -w test_1_prueba_11_04.pl aceita a descri?o dos modulos lee a base de dados factory: Bio::Tools::Run::Hmmpfam=HASH(0x9ab0f38)search and run: Bio::SearchIO::hmmer=HASH(0x9acddac) fin[kary@vivax MGE-Tryp_Mobile_Genetic_Elements_14_01_05]$ more searchio.out tr|Q8GPE3 lEwFFhiGV0 -108 1.4e-05 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 Uge97N6Hov 1229 0.0e+00 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 5o31pMTfTD 358 2.9e-106 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 S9idxzhdAK 108 4.4e-31 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 eRcC2vN3jZ 67 1.1e-18 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 AEHBfMfmWH 154 6.6e-45 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 I6zf7ixS5D 1116 0.0e+00 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 bbHl2B5rlL 172 3.2e-50 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 G4UdcJgsXe 159 1.7e-46 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 U0RpshXJqE 304 6.4e-90 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 F7aL2AcPUu 96 2.2e-27 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 6dvvm1BKtB 114 9.0e-33 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 T3HBWp9YHA 555 1.1e-165 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 TPkc6EtVLN 151 7.3e-44 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 tQAf8GgYdG 50 1.2e-15 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 2kFyqaGTYl 37 1.3e-14 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 i6l3GMokJJ 114 1.2e-32 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 LMrNsiVENo 1135 0.0e+00 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 CfazgjGKUp 1128 0.0e+00 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 klmSPIXJ0B -3 3.5e-11 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 ViPul4YYKM 0 8.0e-10 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 t9vFE4UzS9 119 2.3e-34 Tnp-IS1191 - Streptococcus thermophilus. 
tr|Q8GPE3 As7DuhEAJB 4 2.0e-11 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 IG63mb6p4n 9 1.9e-10 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 3RnaG01Zyt 1147 0.0e+00 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 r6zA7KBX3Q 62 5.1e-17 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 ZpKGsPYYvP 240 1.0e-70 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 9HnSRY75Aa 30 1.3e-12 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 2a8vzyTUP1 166 2.5e-48 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 tBC6btpD1z 44 1.7e-12 Tnp-IS1191 - Streptococcus thermophilus. tr|Q8GPE3 J3KfLxK6ot -3 2.7e-08 Tnp-IS1191 - Streptococcus thermophilus. -----Original Message----- From: [mailto:Marc.Logghe@devgen.com] Sent: Mon 4/11/2005 8:42 AM To: Kary Ann Del Carmen Soriano Ocana; bioperl-l@portal.open-bio.org Cc: Alberto M. R. Davila Subject: RE: [Bioperl-l] FW: Help with hmmpfam Hi Kary, As far as I understood it correctly, you never get to the line 'print "search and run: $search\n";', meaning your perl script crashes earlier, when you envoke $factory->run. Could you send the error message when you run the perl script from the command line, not via the php script ? Cheers, Marc > Dear All, > I need run a "php script" and call in it (with system > command) other "script in perl/bioperl" for run hmmpfam. I > would like (if possible) to obtain some help for execute this > script, because recognize only perl script but at the moment > of run hmmpfam (bioperl modules), it doesn't run it. > I am listing my code below and the output containing the > following error: > > > (partial) output and error: > aceita a descri?o dos modulos > lee a base de dados > factory: Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) > > ?ltima linha da sa?da: factory: > Bio::Tools::Run::Hmmpfam=HASH(0x8be2d44) Valor de Retorno: 2 > > > I put some "print" commands everywhere to see where I am > getting the error and looks like it is not entering/printing > the while results (eg: factory, search). 
Any help would be > greatly appreciated. > > Thanks, Kary > > ************ > > Script: > > 1.- php: > echo '
';
> 
> // Shows the complete output of the shell command 'perl', and returns
> // the last line of the output in $last_line. Stores the return value
> // of the shell command in $retval.
> $last_line = system('/usr/bin/perl test_1.pl', $retval);
> // Showing additional information
> echo '
> 
>
Última linha da saída: '.$last_line.' >
Valor de Retorno: '.$retval > 2.- perl: > $ENV{HMMPFAMDIR} = '/usr/local/bin/hmmpfam'; use lib > "/usr/local/bioperl14"; use lib "/usr/local/bioperl-run-1.4"; > > use strict; > use Bio::Tools::Run::Hmmpfam; > use Bio::SearchIO; > use Bio::SearchIO::Writer::HTMLResultWriter; > use Bio::SearchIO::Writer::TextResultWriter; > use Bio::SearchIO::Writer::HSPTableWriter; > use Bio::SearchIO::Writer::ResultTableWriter; > use Bio::SeqIO; > print "aceita a descri?o dos modulos\n"; my @params = > ('DB' => > '/home/kary/public_html/MGE-Tryp_Mobile_Genetic_Elements_14_01 > _05/meus_modelos_hmms.hmm', 'E' => 0.0001); > print "lee a base de dados\n"; > my $factory = Bio::Tools::Run::Hmmpfam->new(@params); > print "factory: $factory"; > #any old protein fasta file > my $search = > $factory->run('/home/kary/public_html/MGE-Tryp_Mobile_Genetic_ > Elements_14_01_05/sequencia_fasta_1_tn.txt'); > print "search and run: $search\n"; > my $writer = Bio::SearchIO::Writer::HSPTableWriter->new( > -columns => [qw( > query_name > hit_name > score > expect > query_description > )] ); my > $out = Bio::SearchIO->new( -writer => $writer, > -file => ">searchio.out" ); > while (my $result = $search->next_result()) { > $out->write_result($result); > } > print "fin"; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From mingyi.liu at gpc-biotech.com Mon Apr 11 09:25:43 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Mon Apr 11 09:18:30 2005 Subject: [Bioperl-l] Entrez Gene parser In-Reply-To: <42571D1A.2050201@utk.edu> References: <425572F8.4030508@utk.edu> <3405b892eb0a29125aa89cd7d3c3adbf@gnf.org> <42571D1A.2050201@utk.edu> Message-ID: <425A7AD7.6000307@gpc-biotech.com> Stefan Kirov wrote: > Thanks Hilmar. I think it's not (I think). It is on Sourceforge > though. The link is in the entrezgene.pm doc. Mingyi, could you put it > in CPAN? 
By the way there are 'fake' Entrez Gene records that do break > the parser. They are designed, as far as I know, to hold GRIF data and > I don't think they should be parsed at all. Maybe I will include a > filter to look for those. > I will work on the parser after Tuesday, I need to work on my > immigration stuff and it is not fun at all :-( . > Stefan > I'll put the parser on CPAN. BTW, what namespace do you guys think the parser should be in? Right now it's GI::Parser::EntrezGene, but this path is completely related to our internal project, so it must change. I'm also sending mail to modules@perl.org for advice. Thanks, Mingyi From mlemieux at bioinfo.ca Mon Apr 11 18:25:46 2005 From: mlemieux at bioinfo.ca (Madeleine Lemieux) Date: Mon Apr 11 18:19:42 2005 Subject: [Bioperl-l] RemoteBlast by Bio::Perl Through proxy Message-ID: Chandan, Does this error happen every time you run the code? The get_sequence works otherwise your error message would have come from EBI, not NCBI, which means your proxy is set up OK. Both get_sequence and blast_sequence eventually call the same User Agent code so if one works, the other should as well. Have you edited either Perl.pm or RemoteBlast.pm? Madeleine > I am a newbie and while installing bioperl and other related softwres > i had installation problems > but fortunately succeeded in debugging few .I had no idea where to > share those experiences > but now being a member of this forum might help. Though i dont have > the installation outputs > i would like to share the last such experience while installind > Berkeley DB .It was not able to include a file Extern.h and i found > that the concerned file in > /usr/lib/perl5/..../Core/Extern had a wrong extension . After setting > it to Extern.h installation was > possible.I could not install few other related softwares . > > At present i am facing this problem.The second code for blast in > bptutorial(html format on bioperl site ) when run does not yield the > desired result . > #! 
/usr/bin/perl > use Bio::Perl; > $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > # uses the default database - nr in this case > $blast_result = blast_sequence($seq_object); > > write_blast(">roa1.blast",$blast_result); > The output after running the code is > ------------------- WARNING --------------------- > MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 > Content-Length: 651 > Content-Type: application/x-www-form-urlencoded > > DATABASE=nr&QUERY=%3EROA1_HUMAN+Heterogeneous+nuclear+ribonucleoprotein > +A1+(Helix-destabilizing+protein)+(Single- > strand+binding+protein)+(hnRNP+core+protein+A1).%0ASKSESPKEPEQLRKLFIGGL > SFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEVDAAMNARPHKVDGRVVEPKRAVSR > EDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDKIV > IQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGG > GGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSGSN > FGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF&COM > POSITION_BASED_STATISTICS=off&EXPECT=1e > -10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&PROGRAM=blas > tp > > > An Error Occurred > >

> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to
> host)
>
> ---------------------------------------------------
> Submitted Blast for [ROA1_HUMAN]
>
> The env variable HTTP_PROXY or /and http_proxy is set correct. I hope
> somebody can help me out.
> Thank You.

From abrown at hgmp.mrc.ac.uk Tue Apr 12 04:53:50 2005
From: abrown at hgmp.mrc.ac.uk (Alex Brown)
Date: Tue Apr 12 10:53:02 2005
Subject: [Bioperl-l] Bio::AlignIO error
Message-ID: <63D77FD0-AB30-11D9-9B6D-0003938768AC@hgmp.mrc.ac.uk>

I believe I have found an error in Bio::AlignIO - I was getting unrealistic results from $aln->no_residues. I traced this to the no_residues subroutine in Bio::SimpleAlign:

sub no_residues {
    my $self = shift;
    my $count = 0;

    foreach my $seq ($self->each_seq) {
        my $str = $seq->seq();

        $count += ($str =~ s/[^A-Za-z]//g);
    }

    return $count;
}

I think line 8 of the subroutine should read:

        $count += ($str =~ s/[A-Za-z]//g);

Cheers,

Alex Brown.

From jason.stajich at duke.edu Tue Apr 12 11:10:07 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Apr 12 11:05:24 2005
Subject: [Bioperl-l] Bio::AlignIO error
In-Reply-To: <63D77FD0-AB30-11D9-9B6D-0003938768AC@hgmp.mrc.ac.uk>
References: <63D77FD0-AB30-11D9-9B6D-0003938768AC@hgmp.mrc.ac.uk>
Message-ID:

done. thanks alex.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 12, 2005, at 4:53 AM, Alex Brown wrote:

> I believe I have found an error in Bio::AlignIO - I was getting
> unrealistic results from $aln->no_residues. I traced this to the
> no_residues subroutine in Bio::SimpleAlign:
>
> sub no_residues {
>     my $self = shift;
>     my $count = 0;
>
>     foreach my $seq ($self->each_seq) {
>         my $str = $seq->seq();
>
>         $count += ($str =~ s/[^A-Za-z]//g);
>     }
>
>     return $count;
> }
>
> I think line 8 of the subroutine should read:
>
>         $count += ($str =~ s/[A-Za-z]//g);
>
> Cheers,
>
> Alex Brown.
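For anyone who wants to check the one-character fix locally, here is a small sketch in core Perl only. The sample rows and the sub names are made up for illustration, and the tr/// variant is an alternative that is not taken from the thread. It relies on s///g returning the number of substitutions, so the negated class ends up counting gaps, not residues.

```perl
use strict;
use warnings;

# The pattern from the original sub: s///g returns the number of
# substitutions made, so a negated class counts the characters that are
# NOT letters (gaps and the like).
sub count_non_letters {
    my ($str) = @_;    # copy, because s///g modifies its target
    my $n = ($str =~ s/[^A-Za-z]//g);
    return $n || 0;    # s///g returns a false value when nothing matched
}

# The corrected pattern: counts the letters that get removed.
sub count_letters {
    my ($str) = @_;
    my $n = ($str =~ s/[A-Za-z]//g);
    return $n || 0;
}

# Alternative not taken from the thread: tr/// counts without modifying.
sub count_letters_tr {
    my ($str) = @_;
    return ($str =~ tr/A-Za-z//);
}

# Made-up alignment rows, five columns each.
for my $row ('AC-GT', 'A--GT', '-----') {
    printf "%-6s negated=%d fixed=%d tr=%d\n",
        $row, count_non_letters($row), count_letters($row), count_letters_tr($row);
}
```

For each row the "negated" and "fixed" counts add up to the row length, which is a quick way to see that the two patterns split the columns between gaps and residues.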
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From skirov at utk.edu Tue Apr 12 12:43:53 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Apr 12 12:37:29 2005
Subject: [Bioperl-l] Re: Entrez gene parser code
In-Reply-To: <0IES007IOT8WD1@smtpout.cair.du.edu>
References: <0IES007IOT8WD1@smtpout.cair.du.edu>
Message-ID: <425BFAC9.9070709@utk.edu>

Colin,
When you need to ask a bioperl question, please send it to the list.
You can get the parser by installing bioperl-live:

The example below shows how to log in to the bioperl repository. To log in to other repositories simply alter the /home/repository/(project) information.

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login

When prompted, the password is 'cvs'.

Each project CVS repository can have many different packages available for download. You may need to browse the web interface for a bit to determine the packages of interest. After a successful login you may "checkout" the project package you are interested in. The following command should be executed as one line. The specific example shows how to check out the primary bioperl codebase, which is contained in the "bioperl-live" package.

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl checkout bioperl-live

then

perl Makefile.PL

and finally

make install

In order for this parser to work you need to get GI::Parser::EntrezGene from sourceforge. You can get the address for this module from the perl doc of entrezgene:

perldoc Bio::SeqIO::entrezgene

Stefan

Colin Erdman wrote:

> Hello,
>
> I was just curious where the latest version of your bioperl entrez
> gene parser code might be found?
> > > > Thanks, > > Colin Erdman > From mingyi.liu at gpc-biotech.com Tue Apr 12 12:55:41 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Tue Apr 12 12:47:55 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <425BFAC9.9070709@utk.edu> References: <0IES007IOT8WD1@smtpout.cair.du.edu> <425BFAC9.9070709@utk.edu> Message-ID: <425BFD8D.2090808@gpc-biotech.com> Stefan Kirov wrote: > In order for this parser to work you need to get > GI::Parser::Entrezgene from sourceforge. You can get the address for > this module from the perl doc of entrezgene: perldoc > Bio::SeqIO::entrezgene > Stefan > I just want to add that I will be adding GI::Parser::EntrezGene to cpan in a few days, and most likely the name space will switch to Bio::ASN1 (therefore it'd be Bio::ASN1::EntrezGene) based on PAUSE admin suggestion. Thanks, Mingyi From skirov at utk.edu Tue Apr 12 13:31:58 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Apr 12 13:25:46 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <425BFD8D.2090808@gpc-biotech.com> References: <0IES007IOT8WD1@smtpout.cair.du.edu> <425BFAC9.9070709@utk.edu> <425BFD8D.2090808@gpc-biotech.com> Message-ID: <425C060E.8050501@utk.edu> Thanks Mingyi, This is very useful. I am not sure about the namespace. Maybe we should have a namespace for low-level parsers that do not create directly bioperl objects. What other people think of this? Stefan Mingyi Liu wrote: > Stefan Kirov wrote: > >> In order for this parser to work you need to get >> GI::Parser::Entrezgene from sourceforge. You can get the address for >> this module from the perl doc of entrezgene: perldoc >> Bio::SeqIO::entrezgene >> Stefan >> > I just want to add that I will be adding GI::Parser::EntrezGene to > cpan in a few days, and most likely the name space will switch to > Bio::ASN1 (therefore it'd be Bio::ASN1::EntrezGene) based on PAUSE > admin suggestion. 
> > Thanks, > > Mingyi > > From mingyi.liu at gpc-biotech.com Tue Apr 12 13:42:31 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Tue Apr 12 13:36:04 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <425C060E.8050501@utk.edu> References: <0IES007IOT8WD1@smtpout.cair.du.edu> <425BFAC9.9070709@utk.edu> <425BFD8D.2090808@gpc-biotech.com> <425C060E.8050501@utk.edu> Message-ID: <425C0887.9000004@gpc-biotech.com> Stefan Kirov wrote: > Thanks Mingyi, > This is very useful. I am not sure about the namespace. Maybe we > should have a namespace for low-level parsers that do not create > directly bioperl objects. What other people think of this? > Stefan > Yeah, Bio::ASN1 would be somewhat limited to just ASN.1 based files. Maybe Bio::Tools::EntrezGene would be better since Bio::Tools seems to be used to contain the parsers for some special formats. Mingyi From hlapp at gnf.org Tue Apr 12 14:05:19 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Apr 12 13:58:20 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <425C0887.9000004@gpc-biotech.com> References: <0IES007IOT8WD1@smtpout.cair.du.edu> <425BFAC9.9070709@utk.edu> <425BFD8D.2090808@gpc-biotech.com> <425C060E.8050501@utk.edu> <425C0887.9000004@gpc-biotech.com> Message-ID: <28aedbdf396dce4134a9d718cb59c297@gnf.org> If it has to be under the top-level Bio:: namespace then Bio::ASN1 doesn't sound like a bad choice at all. Under Bio::Tools:: you'd usually expect modules that consume or produce bioperl objects, so I'd rather not use it for your purpose. My $0.02. -hilmar On Apr 12, 2005, at 10:42 AM, Mingyi Liu wrote: > Stefan Kirov wrote: > >> Thanks Mingyi, >> This is very useful. I am not sure about the namespace. Maybe we >> should have a namespace for low-level parsers that do not create >> directly bioperl objects. What other people think of this? >> Stefan >> > Yeah, Bio::ASN1 would be somewhat limited to just ASN.1 based files. 
> Maybe Bio::Tools::EntrezGene would be better since Bio::Tools seems to > be used to contain the parsers for some special formats. > > Mingyi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Tue Apr 12 14:07:51 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Apr 12 14:02:15 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <425C0887.9000004@gpc-biotech.com> References: <0IES007IOT8WD1@smtpout.cair.du.edu> <425BFAC9.9070709@utk.edu> <425BFD8D.2090808@gpc-biotech.com> <425C060E.8050501@utk.edu> <425C0887.9000004@gpc-biotech.com> Message-ID: <7ec26b84923f775093ffb3be9b1dafd4@duke.edu> Just beware that tools is grab-bag that people have a hard time negotiating already... -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 12, 2005, at 1:42 PM, Mingyi Liu wrote: > Stefan Kirov wrote: > >> Thanks Mingyi, >> This is very useful. I am not sure about the namespace. Maybe we >> should have a namespace for low-level parsers that do not create >> directly bioperl objects. What other people think of this? >> Stefan >> > Yeah, Bio::ASN1 would be somewhat limited to just ASN.1 based files. > Maybe Bio::Tools::EntrezGene would be better since Bio::Tools seems to > be used to contain the parsers for some special formats. > > Mingyi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From sjmiller at email.arizona.edu Tue Apr 12 14:12:16 2005 From: sjmiller at email.arizona.edu (Susan J. 
Miller)
Date: Tue Apr 12 14:07:10 2005
Subject: [Bioperl-l] bioperl1.5 generate_cigar_string
In-Reply-To: <200504111320.j3BDKvfX013639@portal.open-bio.org>
References: <200504111320.j3BDKvfX013639@portal.open-bio.org>
Message-ID: <425C0F80.805@email.arizona.edu>

I am using bioperl 1.5 on a Solaris 8 system. I'm trying to use the generate_cigar_string method in the following script, but I can't get it to work. My input file is a normal blast output (no -m option). Can someone tell me if I'm doing something incorrectly?

=======================================================================
#!/usr/local/bin/perl -w

use lib '/export/home/susanjo/bioperl-1.5.0';
use Bio::SearchIO;
use Bio::Search::HSP::GenericHSP;

$fil = shift(@ARGV);
$srchio = new Bio::SearchIO ('-format' => 'blast', '-file' => $fil);

# Look at the results for each Query
while ($result = $srchio->next_result) {

    # Look at each subject hit
    while ($hit = $result->next_hit) {
        $hitnam = $hit->name;
        $acc = $hit->accession;
        print "ACC $acc HIT $hitnam\n";

        # Within a hit there are one or more HSPs (High Scoring Pairs)
        while ($hsp = $hit->next_hsp) {

            $Qseq = $hsp->query_string;
            $Sseq = $hsp->hit_string;
            $cigar = Bio::Search::HSP::GenericHSP::generate_cigar_string($Qseq, $Sseq);
            #print "CIGAR: $cigar\n";

        }
    }
}
========================================================================

The error message is:
Use of uninitialized value in split at /export/home/susanjo/bioperl-1.5.0/Bio/Search/HSP/GenericHSP.pm line 1269, line 717.
Can't locate object method "throw" via package "tgctgcgttctccatcgatgccatatttgaccccaccccgacattttcccctcgcgaaagggttgaggtaggaagcgcacgcgcaaccactcccttgacccaaccaccaacaaaaaggctcgcccaagcaaacacataaaccaccgccgccgccttttggagagaaccaactatggaggcccacatcttccaacgaacaaaggaccgagcaccgccgctacgaacgagtctgaatttttccatcgtgtttcccaagcacctcgactgcagcaaccatgccccgccgaacggaccgttggagagggggctcccgggcaacgagcttccgaaccccatacaaaaaacacgctaggggaaataataacaaaggatgggacaagcacaagcatcgcttgcaagtgcacgaaacaacaagcgtgaactcaagataggcaactaaaaatgttgcgagtctctttcgcaagaggcaccaccccgacaaagcatcaaacccgctcacctagagacacaccacacgttcctagattacatacaatcgaaacccacacacaacccaaacgaaaaaacaaaacagccctgctatcaatgatcctaccgcaggttcacctacggaaaaccttgttacgacttcta" at /export/home/susanjo/bioperl-1.5.0/Bio/Search/HSP/GenericHSP.pm line 1271, line 717. I also tried changing my script like this: $cigar = Bio::Search::HSP::GenericHSP->generate_cigar_string($Qseq, $Sseq); but then the error is: Can't use string ("Bio::Search::HSP::GenericHSP") as a HASH ref while "strict refs" in use at /export/home/susanjo/bioperl-1.5.0/Bio/Search/HSP/GenericHSP.pm line 1275, line 717. Am I using this incorrectly? Thanks, Susan J. Miller Biotechnology Computing Facility University of Arizona From mingyi.liu at gpc-biotech.com Tue Apr 12 14:13:00 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Tue Apr 12 14:07:12 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <7ec26b84923f775093ffb3be9b1dafd4@duke.edu> References: <0IES007IOT8WD1@smtpout.cair.du.edu> <425BFAC9.9070709@utk.edu> <425BFD8D.2090808@gpc-biotech.com> <425C060E.8050501@utk.edu> <425C0887.9000004@gpc-biotech.com> <7ec26b84923f775093ffb3be9b1dafd4@duke.edu> Message-ID: <425C0FAC.70205@gpc-biotech.com> Jason Stajich wrote: > Just beware that tools is grab-bag that people have a hard time > negotiating already... > OK. Seems Bio::ASN1 is better than Bio::Tools. Any other suggestions? I'll be registering the namespace tomorrow if Bio::ASN1 is OK with most. 
Thanks, Mingyi From cerdman2 at du.edu Tue Apr 12 14:23:02 2005 From: cerdman2 at du.edu (Colin Erdman) Date: Tue Apr 12 14:16:56 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <425BFD8D.2090808@gpc-biotech.com> Message-ID: <0IEU00683J2KRJ@smtpout.cair.du.edu> I am between Linux installs right now and actually running win32 with the ActiveState Perl install... How does one add the cvs.open-bio.org repository to the PPM console list to search through it and install the bioperl-live packages etc? I don't see a comparable cvs command within it. This is all new to me and I appreciate the help! Thanks, Colin -----Original Message----- From: Mingyi Liu [mailto:mingyi.liu@gpc-biotech.com] Sent: Tuesday, April 12, 2005 10:56 AM To: Stefan Kirov Cc: Colin Erdman; Bioperl list Subject: Re: [Bioperl-l] Re: Entrez gene parser code Stefan Kirov wrote: > In order for this parser to work you need to get > GI::Parser::Entrezgene from sourceforge. You can get the address for > this module from the perl doc of entrezgene: perldoc > Bio::SeqIO::entrezgene > Stefan > I just want to add that I will be adding GI::Parser::EntrezGene to cpan in a few days, and most likely the name space will switch to Bio::ASN1 (therefore it'd be Bio::ASN1::EntrezGene) based on PAUSE admin suggestion. Thanks, Mingyi From skirov at UTK.EDU Tue Apr 12 14:30:24 2005 From: skirov at UTK.EDU (Stefan Kirov) Date: Tue Apr 12 14:39:45 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <0IEU00683J2KRJ@smtpout.cair.du.edu> References: <0IEU00683J2KRJ@smtpout.cair.du.edu> Message-ID: <425C13C0.40301@utk.edu> You can use WinCVS to get the source code. You cannot add CVS repository as a PPM, they are not compatible. You can try cygwin or manually putting the bioperl code (which will be a major pain) in the right places (C:\Perl\site\lib is the usual I think). 
Stefan Colin Erdman wrote: >I am between Linux installs right now and actually running win32 with the >ActiveState Perl install... How does one add the cvs.open-bio.org repository >to the PPM console list to search through it and install the bioperl-live >packages etc? I don't see a comparable cvs command within it. > >This is all new to me and I appreciate the help! >Thanks, >Colin > >-----Original Message----- >From: Mingyi Liu [mailto:mingyi.liu@gpc-biotech.com] >Sent: Tuesday, April 12, 2005 10:56 AM >To: Stefan Kirov >Cc: Colin Erdman; Bioperl list >Subject: Re: [Bioperl-l] Re: Entrez gene parser code > >Stefan Kirov wrote: > > > >>In order for this parser to work you need to get >>GI::Parser::Entrezgene from sourceforge. You can get the address for >>this module from the perl doc of entrezgene: perldoc >>Bio::SeqIO::entrezgene >>Stefan >> >> >> >I just want to add that I will be adding GI::Parser::EntrezGene to cpan >in a few days, and most likely the name space will switch to Bio::ASN1 >(therefore it'd be Bio::ASN1::EntrezGene) based on PAUSE admin suggestion. > >Thanks, > >Mingyi > > > > > -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From jason.stajich at duke.edu Tue Apr 12 14:48:42 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Apr 12 14:42:25 2005 Subject: [Bioperl-l] bioperl1.5 generate_cigar_string In-Reply-To: <425C0F80.805@email.arizona.edu> References: <200504111320.j3BDKvfX013639@portal.open-bio.org> <425C0F80.805@email.arizona.edu> Message-ID: <003b4ddb77ce4031f7573d09c344210e@duke.edu> You should be able to just call: my $cigarstring = $hsp->cigar_string. it takes care of delegating to generate_cigar_string for you. 
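The reason the instance call works where the other two forms fail comes down to what Perl passes as the first argument. A minimal sketch, using a made-up package Demo::HSP rather than the real Bio::Search::HSP::GenericHSP (the method body is a simplification assumed for illustration, not the BioPerl implementation), shows what lands in $self under each calling style:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Demo::HSP;    # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Reports what the method received as its invocant and as the hit string.
sub describe_call {
    my ($self, $qstr, $hstr) = @_;
    my $invocant = ref($self) ? 'object' : "plain string '$self'";
    my $hit      = defined $hstr ? "'$hstr'" : 'undef';
    return "self=$invocant hit=$hit";
}

package main;

my ($q, $h) = ('ATG-C', 'ATGGC');

# 1. Plain function call: $q is taken as $self, and the hit string
#    slot is left undefined because every argument shifts by one.
print Demo::HSP::describe_call($q, $h), "\n";

# 2. Class-method call: $self is the package name, a plain string
#    that cannot be used as a hash-based object.
print Demo::HSP->describe_call($q, $h), "\n";

# 3. Instance-method call: $self is the blessed object.
my $hsp = Demo::HSP->new;
print $hsp->describe_call($q, $h), "\n";
```

The first form is consistent with the "uninitialized value in split" warning and with "throw" being looked up in a package named after the query sequence; the second is consistent with the "as a HASH ref" error.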
Otherwise you can call it explicitly ( the previous will cache the call for you ) $hsp->generate_cigar_string($Qseq, $Sseq); You need to have instantiated an HSP object to call this method because it further delegates to other methods (_sub_cigar_string). -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 12, 2005, at 2:12 PM, Susan J. Miller wrote: > I am using bioperl 1.5 on a Solaris 8 system. I'm trying to use the > generate_cigar_string method in the following script, but I can't get > it to work. My input file is a normal blast output (no -m option). > Can someone tell me if I'm doing something incorrectly? > > ======================================================================= > #!/usr/local/bin/perl -w > > use lib '/export/home/susanjo/bioperl-1.5.0'; > use Bio::SearchIO; > use Bio::Search::HSP::GenericHSP; > > $fil = shift(@ARGV); > $srchio = new Bio::SearchIO ('-format' => 'blast', '-file' => $fil); > > # Look at the results for each Query > while ($result = $srchio->next_result) { > > # Look at each subject hit > while ($hit = $result->next_hit) { > $hitnam = $hit->name; > $acc = $hit->accession; > print "ACC $acc HIT $hitnam\n"; > > # Within a hit there are one or more HSPs (High Scoring Pairs) > while ($hsp = $hit->next_hsp) { > > $Qseq = $hsp->query_string; > $Sseq = $hsp->hit_string; > $cigar = > Bio::Search::HSP::GenericHSP::generate_cigar_string($Qseq, $Sseq); > #print "CIGAR: $cigar\n"; > > } > } > } > ======================================================================= > = > > The error message is: > Use of uninitialized value in split at > /export/home/susanjo/bioperl-1.5.0/Bio/Search/HSP/GenericHSP.pm line > 1269, line 717. 
> Can't locate object method "throw" via package > "tgctgcgttctccatcgatgccatatttgaccccaccccgacattttcccctcgcgaaagggttgaggta > ggaagcgcacgcgcaaccactcccttgacccaaccaccaacaaaaaggctcgcccaagcaaacacataaac > caccgccgccgccttttggagagaaccaactatggaggcccacatcttccaacgaacaaaggaccgagcac > cgccgctacgaacgagtctgaatttttccatcgtgtttcccaagcacctcgactgcagcaaccatgccccg > ccgaacggaccgttggagagggggctcccgggcaacgagcttccgaaccccatacaaaaaacacgctaggg > gaaataataacaaaggatgggacaagcacaagcatcgcttgcaagtgcacgaaacaacaagcgtgaactca > agataggcaactaaaaatgttgcgagtctctttcgcaagaggcaccaccccgacaaagcatcaaacccgct > cacctagagacacaccacacgttcctagattacatacaatcgaaacccacacacaacccaaacgaaaaaac > aaaacagccctgctatcaatgatcctaccgcaggttcacctacggaaaaccttgttacgacttcta" at > /export/home/susanjo/bioperl-1.5.0/Bio/Search/HSP/GenericHSP.pm line > 1271, line 717. > > > I also tried changing my script like this: > $cigar = Bio::Search::HSP::GenericHSP->generate_cigar_string($Qseq, > $Sseq); > > but then the error is: > Can't use string ("Bio::Search::HSP::GenericHSP") as a HASH ref while > "strict refs" in use at > /export/home/susanjo/bioperl-1.5.0/Bio/Search/HSP/GenericHSP.pm line > 1275, line 717. > > > Am I using this incorrectly? > > > Thanks, > > Susan J. Miller > Biotechnology Computing Facility > University of Arizona > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From sjmiller at email.arizona.edu Tue Apr 12 18:32:48 2005 From: sjmiller at email.arizona.edu (Susan J. Miller) Date: Tue Apr 12 18:26:44 2005 Subject: [Bioperl-l] SearchIO::Writer::GbrowseGFF In-Reply-To: <003b4ddb77ce4031f7573d09c344210e@duke.edu> References: <200504111320.j3BDKvfX013639@portal.open-bio.org> <425C0F80.805@email.arizona.edu> <003b4ddb77ce4031f7573d09c344210e@duke.edu> Message-ID: <425C4C90.2020102@email.arizona.edu> Jason Stajich wrote: > You should be able to just call: > my $cigarstring = $hsp->cigar_string. 
> > it takes care of delegating to generate_cigar_string for you. Thank you, Jason - that's exactly what I need. I have another (semi-related) question: why doesn't SearchIO write_result output the cigar string when the output_format is 'GbrowseGFF'? Thanks, Susan J. Miller Biotechnology Computing Facility University of Arizona From mlemieux at bioinfo.ca Tue Apr 12 20:08:44 2005 From: mlemieux at bioinfo.ca (Madeleine Lemieux) Date: Tue Apr 12 20:03:25 2005 Subject: [Bioperl-l] RemoteBlast by Bio::Perl Through proxy In-Reply-To: <2d4f3205041211533a6a32c8@mail.gmail.com> References: <2d4f3205041211533a6a32c8@mail.gmail.com> Message-ID: <055ecfb115370dc1924c4ceff464c8d5@bioinfo.ca> Chandan, In your own best interest, you should post your responses to the list. That way you get the benefit of a large talent pool rather than limiting yourself to what little I know. You can try getting a fresh copy of RemoteBlast.pm from the bioperl website just in case you unintentionally changed something other than that one line. But it seems to me as if you're timing out before the NCBI server has time to respond in which case I'm afraid I don't know how to help you. If you do traceroute www.ncbi.nlm.nih.gov, do you reach the server in a reasonable amount of time? Madeleine > hi Madeleine Lemieux > Thanks for ur reply . Yes the error happens everytime i run the code > and you deduced correctly that the get_sequence works > fine . Yes i did try to appened one line to RemoteBlast after making > a few attempts to run the code but it did not work and i removed it . > Do you suggest installing the modules again ? > > chandan > > On Apr 12, 2005 3:55 AM, Madeleine Lemieux wrote: >> Chandan, >> >> Does this error happen every time you run the code? The get_sequence >> works otherwise your error message would have come from EBI, not NCBI, >> which means your proxy is set up OK. 
Both get_sequence and >> blast_sequence eventually call the same User Agent code so if one >> works, the other should as well. Have you edited either Perl.pm or >> RemoteBlast.pm? >> >> Madeleine >> >>> I am a newbie and while installing bioperl and other related softwres >>> i had installation problems >>> but fortunately succeeded in debugging few .I had no idea where to >>> share those experiences >>> but now being a member of this forum might help. Though i dont have >>> the installation outputs >>> i would like to share the last such experience while installind >>> Berkeley DB .It was not able to include a file Extern.h and i found >>> that the concerned file in >>> /usr/lib/perl5/..../Core/Extern had a wrong extension . After setting >>> it to Extern.h installation was >>> possible.I could not install few other related softwares . >>> >>> At present i am facing this problem.The second code for blast in >>> bptutorial(html format on bioperl site ) when run does not yield the >>> desired result . >>> #! 
/usr/bin/perl >>> use Bio::Perl; >>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>> # uses the default database - nr in this case >>> $blast_result = blast_sequence($seq_object); >>> >>> write_blast(">roa1.blast",$blast_result); >>> The output after running the code is >>> ------------------- WARNING --------------------- >>> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi >>> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 >>> Content-Length: 651 >>> Content-Type: application/x-www-form-urlencoded >>> >>> DATABASE=nr&QUERY=%3EROA1_HUMAN+Heterogeneous+nuclear+ribonucleoprote >>> in >>> +A1+(Helix-destabilizing+protein)+(Single- >>> strand+binding+protein)+(hnRNP+core+protein+A1).%0ASKSESPKEPEQLRKLFIG >>> GL >>> SFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEVDAAMNARPHKVDGRVVEPKRAV >>> SR >>> EDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDK >>> IV >>> IQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSR >>> GG >>> GGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSG >>> SN >>> FGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF&C >>> OM >>> POSITION_BASED_STATISTICS=off&EXPECT=1e >>> -10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&PROGRAM=bl >>> as >>> tp >>> >>> >>> An Error Occurred >>> >>>


>>> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to >>> host) >>> >>> >>> >>> --------------------------------------------------- >>> Submitted Blast for [ROA1_HUMAN] >>> >>> >>> The env variable HTTP_PROXY or /and http_proxy is set correct .I hope >>> somebody can >>> help me out . >>> Thank You . >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > From smarkel at scitegic.com Tue Apr 12 20:23:32 2005 From: smarkel at scitegic.com (Scott Markel) Date: Tue Apr 12 20:18:27 2005 Subject: [Bioperl-l] why does Bio::Tools::Run::Hmmer return a SearchIO object for hmmalign? Message-ID: <425C6684.7020609@scitegic.com> I'm curious as to why Bio::Tools::Run::Hmmer returns a SearchIO object when the program is hmmalign. I would have expected an AlignIO object since the result of an hmmalign execution is a Stockholm formatted alignment file. I expect that I'm missing something obvious, but I don't see it. An online search, including the mailing list archive, didn't help. Scott -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From jason.stajich at duke.edu Tue Apr 12 21:10:41 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Apr 12 21:04:51 2005 Subject: [Bioperl-l] SearchIO::Writer::GbrowseGFF In-Reply-To: <425C4C90.2020102@email.arizona.edu> References: <200504111320.j3BDKvfX013639@portal.open-bio.org> <425C0F80.805@email.arizona.edu> <003b4ddb77ce4031f7573d09c344210e@duke.edu> <425C4C90.2020102@email.arizona.edu> Message-ID: <2088998be63761a70db8070c539e8f89@duke.edu> There is no reason - feel free to modify the code and CC changes to Mark and the list. I think it should be a customizable option not required. 
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 12, 2005, at 6:32 PM, Susan J. Miller wrote: > Jason Stajich wrote: >> You should be able to just call: >> my $cigarstring = $hsp->cigar_string. >> it takes care of delegating to generate_cigar_string for you. > > Thank you, Jason - that's exactly what I need. > > I have another (semi-related) question: why doesn't SearchIO > write_result output the cigar string when the output_format is > 'GbrowseGFF'? > > > Thanks, > > Susan J. Miller > Biotechnology Computing Facility > University of Arizona > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Tue Apr 12 21:11:42 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Apr 12 21:05:21 2005 Subject: [Bioperl-l] why does Bio::Tools::Run::Hmmer return a SearchIO object for hmmalign? In-Reply-To: <425C6684.7020609@scitegic.com> References: <425C6684.7020609@scitegic.com> Message-ID: <37a1bc0dd995d0978cd7ef91fa216495@duke.edu> That is curious. Is it even a proper searchIO object - I can't imagine it would work? I assume you know what it would take in Tools::Run::HMMER to do something different for hmmalign runs. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 12, 2005, at 8:23 PM, Scott Markel wrote: > I'm curious as to why Bio::Tools::Run::Hmmer returns a > SearchIO object when the program is hmmalign. I would > have expected an AlignIO object since the result of an > hmmalign execution is a Stockholm formatted alignment file. > > I expect that I'm missing something obvious, but I don't > see it. An online search, including the mailing list > archive, didn't help. > > Scott > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel@scitegic.com > SciTegic Inc. 
mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From smarkel at scitegic.com Tue Apr 12 21:24:43 2005 From: smarkel at scitegic.com (Scott Markel) Date: Tue Apr 12 21:23:03 2005 Subject: [Bioperl-l] why does Bio::Tools::Run::Hmmer return a SearchIO object for hmmalign? In-Reply-To: <37a1bc0dd995d0978cd7ef91fa216495@duke.edu> References: <425C6684.7020609@scitegic.com> <37a1bc0dd995d0978cd7ef91fa216495@duke.edu> Message-ID: <425C74DB.8010201@scitegic.com> Jason, It doesn't really work. It gives a SearchIO object with no hits. I'll take a shot at modifying Bio::Tools::Run::Hmmer to handle hmmalign differently. Scott Jason Stajich wrote: > That is curious. Is it even a proper searchIO object - I can't imagine > it would work? I assume you know what it would take in Tools::Run::HMMER > to do something different for hmmalign runs. > > -jason > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Apr 12, 2005, at 8:23 PM, Scott Markel wrote: > >> I'm curious as to why Bio::Tools::Run::Hmmer returns a >> SearchIO object when the program is hmmalign. I would >> have expected an AlignIO object since the result of an >> hmmalign execution is a Stockholm formatted alignment file. >> >> I expect that I'm missing something obvious, but I don't >> see it. An online search, including the mailing list >> archive, didn't help. >> >> Scott >> >> -- >> Scott Markel, Ph.D. >> Principal Bioinformatics Architect email: smarkel@scitegic.com >> SciTegic Inc. mobile: +1 858 205 3653 >> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253 >> San Diego, CA 92123 fax: +1 858 279 8804 >> USA web: http://www.scitegic.com >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From mingyi.liu at gpc-biotech.com Wed Apr 13 09:42:31 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Wed Apr 13 09:36:16 2005 Subject: [Bioperl-l] Bio::ASN1::EntrezGene available on CPAN now In-Reply-To: <0IEU00683J2KRJ@smtpout.cair.du.edu> References: <0IEU00683J2KRJ@smtpout.cair.du.edu> Message-ID: <425D21C7.3000703@gpc-biotech.com> FYI, the low-level parser for Entrez Gene (V1.06) used by Bioperl's Entrez Gene parser is now available on both CPAN (http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.06/) and sourceforge (http://sourceforge.net/projects/egparser). The two sites have the exact same package and will continue to be the same for future releases. However, sourceforge also contains an older release (V1.05) that has some more content, despite having an older parser. Please let me know if you have any questions. Thanks, Mingyi From mseewald at gmail.com Wed Apr 13 11:41:35 2005 From: mseewald at gmail.com (Michael Seewald) Date: Wed Apr 13 11:35:08 2005 Subject: [Bioperl-l] Help with parsing (BLAST/exonerate) for DAS server Message-ID: Dear Bioperl developers, I have posed the following questions already to the Ensembl helpdesk and the Ensembl-developers. Unfortunately, nobody could help me there (or was too busy..). I would really appreciate your help or any pointers to documentation/public examples. I would like to set up a DAS server in order to add sequences (+their annotation) to the Ensembl ContigView. 
I have BLAST results of 1) short genomic sequences BLASTed vs. the chromosomes ( e.g.Homo_sapiens.NCBI35.nov.dna.chromosome.21.fa ) and 2) fragment transcript sequences (e.g. single Affy probes) BLASTed vs. Ensembl transcripts (e.g. Homo_sapiens.NCBI35.nov.cdna.fa). Now, I would like to parse and format them into something that can be displayed using a DAS server like LDAS. * Are there already scripts available that support the parsing of BLAST results and help to prepare them for the DAS server? Could you point me to the source (e.g. in the Ensembl source)? * BLASTing versus the chromosomes and displaying sequences in the Contigview seems to be straightforward because the BLAST hits can be taken right away. What would you recommend, if I have transcript fragments, which could be the result of splicing? How can I properly deal with intron/exon boundaries? Is it a good idea to use Exonerate instead of BLAST here (or maybe in general for the alignment)? Thanks & kind regards, Michael Seewald -- Dr. Michael Seewald Bioinformatics Bayer HealthCare AG From mark_lambrecht at yahoo.com Wed Apr 13 05:09:49 2005 From: mark_lambrecht at yahoo.com (Mark Lambrecht) Date: Wed Apr 13 12:24:00 2005 Subject: [Bioperl-l] Entrez Gene ASN.1 solution Message-ID: <20050413090950.73933.qmail@web30615.mail.mud.yahoo.com> We have developed our own interface to the NCBI Entrez Gene ASN.1 flat files. We needed this internally to replace the bioperl LocusLink parser. Because we have used so many great bioperl code over the last years, we had hoped that people can benefit from our work. This system has already proven its value , at least for us. 
The module consists of the following objects:

  => Bio::_GeneData.pm : abstract engine for parsing "type blocks" within the NCBI ASN.1 files
  => Bio::Gene.pm : Entrez Gene object (replaces the Bioperl sequence object that is normally returned by an IO object); it keeps only relevant data and can easily be extended to map additional needed data using the GeneData engine
  => Bio::GeneIO.pm : iterator derived from RootIO (similar to the SeqIO objects); implements the next_gene method

Subdirectory Index with:

  => Bio::Index::EntrezGene.pm : object with the capability to index and consult an ASN.1 file; inherits from Bio::Index::Abstract

Test scripts will be committed too:

  => a few small test records (with extension asn1)
  => t_gene_indexer.pl : test file to index an ASN.1 file and return an example record

     #example:
     my $file = "gene_hs.asn1";

     my $inx = Bio::Index::EntrezGene->new( '-filename' => $file.".inx", '-write_flag' => 'WRITE');
     $inx->make_index("/usr/local/datasets/ncbi/gene/$file");

  => testGene.pl : tests a Gene object for return of the appropriate data fields

     #example for only extracting track info from the asn1 file;
     #this is a dynamic way of choosing which data to parse
     my $track_info = new Bio::Gene::GeneTrack;

     $track_info->geneid(1);
     $gene->type('test_type');
     $gene->track_info($track_info);
     print "dump:\n".Dumper($gene)."\n";

Stefan Kirov and Mingyi Liu have produced similar solutions (which we didn't test); we believe that ours is different because it is an all-in-one lightweight Entrez Gene ASN.1 parser that only captures essential data (thereby making it rather fast). We deliberately chose not to map the data onto a Seq object. At the same time, a bioperl-compliant indexer has been written. We hope that this code can somehow be useful.

We will commit the code to bioperl cvs if people agree, as soon as we obtain a login.
  Kris Ulens (bioinformatics software developer)
  Mark Lambrecht (scientist bioinformatics)

Galapagos Genomics
http://www.galapagosgenomics.com

From skirov at utk.edu Wed Apr 13 13:05:25 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Apr 13 12:58:59 2005
Subject: [Bioperl-l] Entrez Gene ASN.1 solution
In-Reply-To: <20050413090950.73933.qmail@web30615.mail.mud.yahoo.com>
References: <20050413090950.73933.qmail@web30615.mail.mud.yahoo.com>
Message-ID: <425D5155.20906@utk.edu>

Could you please post a description of the Entrez Gene object? I am also not very happy with creating a Bio::Seq object, as I don't think this object should be a "one size fits all" solution, so I am very curious to see what your design is. I find the indexing very useful for a particular group of people (actually we discussed this before and agreed it is a good idea). I think having two parsers for the same format is OK for bioperl, so I don't see any reason for your parser not to be in Bioperl.
Stefan

Mark Lambrecht wrote:

>We have developed our own interface to the NCBI
>Entrez Gene ASN.1 flat files. We needed this
>internally to replace the bioperl LocusLink parser.
>Because we have used so many great bioperl code over
>the last years, we had hoped that people can benefit
>from our work. This system has already proven its
>value , at least for us.
>
>The module consists of the following objects:
>
> => Bio::_GeneData.pm : abstract engine for
>parsing "type blocks"
> within the NCBI ASN.1 files
> => Bio::Gene.pm :Entrez Gene object (replaces the
>Bioperl sequence
> object that is normally returned by an IO object) and
>only keeps
> relevant data, can easily be extended to map
>additional needed data
> using the GeneData engine
> => Bio::GeneIO.pm : iterator derived from RootIO
>(similar to the
> SeqIO objects); implements next_gene method.
> > subdirectory Index with > => Bio::Index::EntrezGene.pm : object with >capability to index and > consult an ASN.1 File, inherits from >Bio::Index::Abstract > > test scripts will be committed too : > => few small test records (with extension asn1) > => t_gene_indexer.pl : test file to index asn.1 >file and return > an example record > > #example: > my $file = "gene_hs.asn1"; > > my $inx = Bio::Index::EntrezGene->new( >'-filename' => > $file.".inx", '-write_flag' => 'WRITE'); > >$inx->make_index("/usr/local/datasets/ncbi/gene/$file"); > => testGene.pl : tests a Gene objects for return >of appropriate > data fields > > #example for only extracting track info from >the asn1 file, > this is a dynamic way of choosing which data to parse > my $track_info = new Bio::Gene::GeneTrack; > > $track_info->geneid(1); > $gene->type('test_type'); > $gene->track_info($track_info); > print "dump:\n".Dumper($gene)."\n"; > >Stefan Kirov and Mingyi Liu have produced similar >solutions (wich we didn't test); we believe that ours >is different because it is a all-in-one lightweight >Entrez Gene ASN1 parser that will only capture >essential data (thereby making it rather fast). We >deliberately didn't choose to map the data on a Seq >object. At the same time, a bioperl-compliant indexer >has been written. >We hope that this code can somehow be useful. > >We will commit the code to bioperl cvs if people >agree, as soon as we obtain a login. > > Kris Ulens (bioinformatics software developer) > Mark Lambrecht (scientist bioinformatics) > >Galapagos Genomics >http://www.galapagosgenomics.com > > > >__________________________________ >Yahoo! Mail Mobile >Take Yahoo! Mail with you! Check email on your mobile phone. >http://mobile.yahoo.com/learn/mail >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From brian_osborne at cognia.com Wed Apr 13 13:56:03 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Apr 13 13:49:56 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: <0IEU00683J2KRJ@smtpout.cair.du.edu> Message-ID: Colin, If you'd like a command-line environment like some sort of Unix install Cygwin (www.cygwin.com). No need to install everything, just click the "View" button in the main installation window and select and install the minimum, something like gcc, binutils, cvs, openssh, make, Perl. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Colin Erdman Sent: Tuesday, April 12, 2005 2:23 PM To: 'Mingyi Liu'; 'Stefan Kirov' Cc: 'Bioperl list' Subject: RE: [Bioperl-l] Re: Entrez gene parser code I am between Linux installs right now and actually running win32 with the ActiveState Perl install... How does one add the cvs.open-bio.org repository to the PPM console list to search through it and install the bioperl-live packages etc? I don't see a comparable cvs command within it. This is all new to me and I appreciate the help! Thanks, Colin -----Original Message----- From: Mingyi Liu [mailto:mingyi.liu@gpc-biotech.com] Sent: Tuesday, April 12, 2005 10:56 AM To: Stefan Kirov Cc: Colin Erdman; Bioperl list Subject: Re: [Bioperl-l] Re: Entrez gene parser code Stefan Kirov wrote: > In order for this parser to work you need to get > GI::Parser::Entrezgene from sourceforge. 
You can get the address for > this module from the perl doc of entrezgene: perldoc > Bio::SeqIO::entrezgene > Stefan > I just want to add that I will be adding GI::Parser::EntrezGene to cpan in a few days, and most likely the name space will switch to Bio::ASN1 (therefore it'd be Bio::ASN1::EntrezGene) based on PAUSE admin suggestion. Thanks, Mingyi _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Wed Apr 13 14:43:35 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed Apr 13 14:37:23 2005 Subject: [Bioperl-l] How to obtain the chromosome from an Accession Number Message-ID: <000501c54058$b2f41ed0$c910880a@GOLHARMOBILE1> Hi all, I have a bunch of accession numbers from different organisms, and I'm trying to determine the chromosome of their respective organism that they occur on. Can I do this with BioPerl? Ryan From cerdman2 at du.edu Wed Apr 13 15:16:45 2005 From: cerdman2 at du.edu (Colin Erdman) Date: Wed Apr 13 15:12:17 2005 Subject: [Bioperl-l] Entrez Gene ASN.1 solution Message-ID: <0IEW00ITRG82A5@smtpout.cair.du.edu> I would very much like to give this a shot. My current project is being held up because of ASN.1 parsing issues. BioPerl is new to me, and an intuitive parsing module would be great. Mark I posted the CVS instructions that I received from Stefan Kirov yesterday (below). Stefan: I have been using your entrezgene parser from bioperl-live and I like it a lot. I am just trying to figure out a few things. 1) How do I select only genes from the entrezgene seq object that are current (have an official symbol, description etc)? I saved that output for all non-pseudogene Homo sapiens Chr21 genes (~370) in ASN format and I need to sift through that file. The only problem is that there seem to be all the 'removed' and withdrawn genes in the file so there are more than the 370 'current' actually being parsed etc. 
2) How do I access the accession number lists and references for the gene objects? Is this just included in the uncaptured data, or what Bio::Seq object or annotation object do I use to get to them?

Thanks again as always!

Colin

Stefan Kirov wrote:
>>>>
The example below shows how to login to the bioperl repository. To login to other repositories simply alter the /home/repository/(project) information.

    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login

when prompted, the password is 'cvs'

(4) Each project CVS repository can have many different packages available for download. You may need to browse the web interface for a bit to determine the packages of interest. After a successful login you may "checkout" the project package you are interested in.

The following command should be executed as one line. The specific example shows how to check out the primary bioperl codebase which is contained in the "bioperl-live" package.

    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl checkout bioperl-live

then perl Makefile.PL and finally make install

In order for this parser to work you need to get GI::Parser::Entrezgene from sourceforge. You can get the address for this module from the perl doc of entrezgene: perldoc Bio::SeqIO::entrezgene
Stefan

From skirov at utk.edu Wed Apr 13 15:55:09 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Apr 13 15:50:57 2005
Subject: [Bioperl-l] Entrez Gene ASN.1 solution
In-Reply-To: <0IEW00ITRG82A5@smtpout.cair.du.edu>
References: <0IEW00ITRG82A5@smtpout.cair.du.edu>
Message-ID: <425D791D.4040807@utk.edu>

Colin,

$seq->{_status} will give you the status of the gene (reviewed, withdrawn, etc.). It is not supposed to be accessed like that, but this parser is not finished yet, so it may move to become part of the annotation (and most probably will).
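The filtering Colin asks about in question 1 could be sketched roughly as follows. This is only an illustration under assumptions: it takes the status to be a plain string stored under the _status key, as described above, and treats 'withdrawn' as one of its literal values, which has not been checked against the parser. The helper name drop_by_status and the toy records are hypothetical; plain hash references stand in for the sequence objects.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the objects whose _status is NOT in the
# caller-supplied list of unwanted statuses (compared case-insensitively).
# Objects without any _status are kept, since nothing is known about them.
sub drop_by_status {
    my ($unwanted, @seqs) = @_;
    my %skip;
    $skip{ lc($_) } = 1 for @$unwanted;
    return grep {
        my $status = $_->{_status};
        !( defined $status && $skip{ lc($status) } );
    } @seqs;
}

# Toy stand-ins for parsed records; the names and statuses are made up.
my @seqs = (
    { name => 'geneA', _status => 'reviewed'  },
    { name => 'geneB', _status => 'withdrawn' },
    { name => 'geneC' },                          # no status recorded
);

print "$_->{name}\n" for drop_by_status(['withdrawn'], @seqs);
```

Since the message warns that _status may later move into the annotation, the hash access is the part most likely to need changing.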
$seq->desc will naturally give you the description (or summary).

To get the references do:

    my $ann = $seqobj->annotation();
    foreach my $ref ( $ann->get_Annotations('reference') ) {
        print "Reference ",$ref->pubmed,"\n";   # or $ref->medline
    }

To get the accession numbers you will have to get both the Bio::Seq and the Bio::Cluster object. In the latter you will have Bio::Seq objects, and you should get the accession_number or the id (accession number is actually the gi).
Next week I will submit some significant changes plus more documentation, so you may want to wait a bit...
Stefan

Colin Erdman wrote:

>I would very much like to give this a shot. My current project is being held up because of ASN.1 parsing issues. BioPerl is new to me, and an intuitive parsing module would be great. Mark I posted the CVS instructions that I received from Stefan Kirov yesterday (below).
>
>Stefan: I have been using your entrezgene parser from bioperl-live and I like it a lot. I am just trying to figure out a few things. 1) How do I select only genes from the entrezgene seq object that are current (have an official symbol, description etc)? I saved that output for all non-pseudogene Homo sapiens Chr21 genes (~370) in ASN format and I need to sift through that file. The only problem is that there seem to be all the 'removed' and withdrawn genes in the file so there are more than the 370 'current' actually being parsed etc.
>
> 2) How do I access the accession number lists and references for the gene objects? Is this just included in the uncaptured data or what Bio::Seq object or annotation object to I use to get to them?
>
>Thanks again as always!
>
>Colin
>
>Stefan Kirov Wrote:
>
>>>>>
>
>The example below shows how to login to the bioperl repository. To login
>to other repositories simply alter the /home/repository/(project)
>information.
> > > > *cvs -d :pserver:cvs at cvs.open-bio.org :/home/repository/bioperl login* > > /when prompted, the password is 'cvs'/ > > > > > >(4) Each project CVS repository can have many different packages > >available for download. You may need to browse the web interface for a > >bit to determine the packages of interest. After a successful login you > >may "checkout" the project package you are interested in. > > > >The following command should be executed as one line. The specific > >example shows how to check out the primary bioperl codebase which is > >contained in the "*bioperl-live*" package. > > > >* cvs -d :pserver:cvs at cvs.open-bio.org :/home/repository/bioperl checkout > >bioperl-live > >then perl Makefile.PL and finally make install > >In order for this parser to work you need to get GI::Parser::Entrezgene > >from sourceforge. You can get the address for this module from the perl > >doc of entrezgene: perldoc Bio::SeqIO::entrezgene > >Stefan > -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From cerdman2 at du.edu Wed Apr 13 17:35:11 2005 From: cerdman2 at du.edu (Colin Erdman) Date: Wed Apr 13 17:29:18 2005 Subject: [Bioperl-l] Entrez Gene ASN.1 solution In-Reply-To: <425D7FDB.4040509@gpc-biotech.com> Message-ID: <0IEW00LR8MMRB3@smtpout.cair.du.edu> Thanks Mingyi. I was indeed looking for the 'live' status. The $seqobj->{_status} gives me the RefSeq status for the gene (I think) which will keep me entertained enough for the moment... Please excuse my slips and overlaps in terminology. I am a mere undergrad still soaking up all of the molecular biology and bioinformatics/programming fundamentals! 
Stefan: How would one go about accessing the Cluster object for the entrezgene to get at the accession numbers? I am familiar with doing this for Unigene files. Is it similar? Would I just be able to use:

    $stream = Bio::ClusterIO->new('-file' => $file, '-format' => "entrezgene");

to initially access the entrezgene objects and then break those down into Seq objects for acc# etc. retrieval?

Many Thanks,
Colin

-----Original Message-----
From: Mingyi Liu [mailto:mingyi.liu@gpc-biotech.com]
Sent: Wednesday, April 13, 2005 2:24 PM
To: Stefan Kirov
Cc: Colin Erdman
Subject: Re: [Bioperl-l] Entrez Gene ASN.1 solution

Colin, I just want to add that if you're looking for the Entrez Gene status ('live', 'secondary', or 'discontinued') in track-info, I believe it's in uncaptured data.

Stefan, could you provide access to the Entrez Gene status ({'track-info'}->[0]->{status}) in your next update too?

Thanks,
Mingyi

From sdavis2 at mail.nih.gov Wed Apr 13 21:07:23 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Apr 13 21:08:33 2005
Subject: [Bioperl-l] How to obtain the chromosome from an Accession Number
In-Reply-To: <000501c54058$b2f41ed0$c910880a@GOLHARMOBILE1>
References: <000501c54058$b2f41ed0$c910880a@GOLHARMOBILE1>
Message-ID:

Ryan,

What kind of accession? GenBank? If so, then you can use the UCSC genome table browser (http://genome.ucsc.edu/cgi-bin/hgTables) to upload your list of all accessions and look them up in their mapping tables (mRNA and EST and knownGene). You would need to upload your whole list to each of the possible organisms and then compile the final list from the species-specific lists. Of course, these tables are all available as tab-delimited text for download from UCSC, so you could grab whichever you like and do it on your own machine. Note that some accessions will not map to any chromosome of any organism and some will map to many chromosomes/locations in the organism from which they came.
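The "do it on your own machine" route could look something like the sketch below, which uses plain Perl rather than any BioPerl module. It is an illustration under assumptions: the function name chromosomes_by_accession is hypothetical, the accession and chromosome column positions are parameters because they differ between tables and must be checked against the table actually downloaded, and the toy rows are made up.

```perl
use strict;
use warnings;

# Build a map of accession -> sorted list of distinct chromosomes from a
# tab-delimited table. Column positions are zero-based and supplied by the
# caller. Lines starting with '#' and blank lines are skipped.
sub chromosomes_by_accession {
    my ($fh, $acc_col, $chr_col) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;
        my @fields = split /\t/, $line;
        next unless defined $fields[$acc_col] && defined $fields[$chr_col];
        $seen{ $fields[$acc_col] }{ $fields[$chr_col] } = 1;
    }
    my %result;
    for my $acc (keys %seen) {
        $result{$acc} = [ sort keys %{ $seen{$acc} } ];
    }
    return \%result;
}

# Toy table held in memory; a real run would open the downloaded file.
my $table = join '', map { "$_\n" } (
    "#accession\tchrom",
    "AB000001\tchr21",
    "AB000002\tchr1",
    "AB000002\tchrX",
);
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $map = chromosomes_by_accession($fh, 0, 1);
close $fh;

# An accession may map to one, several, or no chromosomes.
for my $acc (qw(AB000001 AB000002 AB000003)) {
    my $chroms = $map->{$acc};
    print "$acc\t", ($chroms ? join(',', @$chroms) : 'unmapped'), "\n";
}
```

Keeping a list per accession rather than a single value reflects the note above that some accessions map to many locations and some to none.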
Sean

On Apr 13, 2005, at 2:43 PM, Ryan Golhar wrote:

> Hi all,
>
> I have a bunch of accession numbers from different organisms, and I'm
> trying to determine the chromosome of their respective organism that
> they occur on. Can I do this with BioPerl?
>
> Ryan

From venancio at ime.usp.br Wed Apr 13 17:10:07 2005
From: venancio at ime.usp.br (Thiago Motta Venancio)
Date: Wed Apr 13 21:28:14 2005
Subject: [Bioperl-l] Remote::Blast
Message-ID: <425D8AAF.6040903@ime.usp.br>

Hi all.
I am using the Remote::Blast module.
The script was running OK, but it stopped because of a 500 error and gave a timeout for www.ncbi.nih.go:80.
Later it came back, but the vast majority of sequences returned no matches, although some of them really do have matches.
Any ideas?
Thanks in advance
Thiago

From sdavis2 at mail.nih.gov Wed Apr 13 21:32:12 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Apr 13 21:32:14 2005
Subject: [Bioperl-l] How to obtain the chromosome from an Accession Number
In-Reply-To:
References: <000501c54058$b2f41ed0$c910880a@GOLHARMOBILE1>
Message-ID: <4CE97961-2212-11B2-AAEE-000393CFE1C4@mail.nih.gov>

And, I forgot to mention, I don't think there is a way to do this directly in bioperl, although perl is likely to be useful in this process.

Sean

On Jan 6, 1970, at 4:44 AM, Sean Davis wrote:

> Ryan,
>
> What kind of accession? GenBank? If so, then you can use the UCSC
> genome table browser (http://genome.ucsc.edu/cgi-bin/hgTables) to
> upload your list of all accessions and look them up in their mapping
> tables (mRNA and EST and knownGene). You would need to upload your
> whole list to each of the possible organisms and then compile the
> final list from the species-specific lists.
Of course, these tables > are all available as tab-delimited text for download from UCSC, so you > could grab whichever you like and do it on your own machine. Note > that some accessions will not map to any chromosome of any organism > and some will map to many chromosomes/locations in the organism from > which they came. > > Sean > > On Apr 13, 2005, at 2:43 PM, Ryan Golhar wrote: > >> Hi all, >> >> I have a bunch of accession numbers from different organisms, and I'm >> trying to determine the chromosome of their respective organism that >> they occur on. Can I do this with BioPerl? >> >> Ryan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jjmail at mac.com Thu Apr 14 00:26:25 2005 From: jjmail at mac.com (jjmail@mac.com) Date: Thu Apr 14 00:20:22 2005 Subject: [Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO Message-ID: <9460183.1113452785586.JavaMail.jjmail@mac.com> Question 1: I am brand new to bioperl and the related projects so please forgive my ignorance on this. I have a large list of protein names and I would like to use bioperl to get the corresponding Gene Ontology (GO) information for each protein. So far I have installed bioperl, BioSQL, and bioperl-db and uploaded the taxonomy and GO information into BioSQL. I am having a really hard time figuring out how to get the GO information out of the database. If anyone knows the right doc to read or has a simple example program that I could see that would be really useful. 
Question 2:

I have collected protein expression data for various states and I would like to cluster the data based on GO information for a start and then if possible use bioperl's ability to analyze mRNA array data to analyze the protein data. Does this seem reasonable? Where should I start looking to figure out how to do this?

Thank you,

--Jamie
___________________
Jamie Sherman
Grad Student, UW

From hlapp at gmx.net Thu Apr 14 01:47:04 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Apr 14 01:40:22 2005
Subject: [Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO
In-Reply-To: <9460183.1113452785586.JavaMail.jjmail@mac.com>
Message-ID:

On Wednesday, April 13, 2005, at 09:26 PM, jjmail@mac.com wrote:

> I have a large list of protein names and I would like to use bioperl
> to get the corresponding Gene Ontology (GO) information for each
> protein.
>
> So far I have installed bioperl, BioSQL, and bioperl-db and uploaded
> the taxonomy and GO information into BioSQL. I am having a really hard
> time figuring out how to get the GO information out of the database.

I'm not sure I understand what you're trying to do. If you loaded the GO ontology into biosql then that will not give you term to protein associations. You would need to load the proteins as well and have annotation for them that references the GO terms they are associated with. There is also a level of devil in the details because currently no bioperl SeqIO parser except the LocusLink parser (and hopefully the Entrez Gene parser already or soon) will give you the GO term associations as appropriate Bio::Annotation::OntologyTerm annotation.

If you want to use UniProt as the protein data source then the terms would end up as dbxrefs. I can post a simple SQL script that will convert those into term associations, but the point is that this won't happen magically.
If all you're trying to do is look up ontology terms based on some identifying property, like the identifier, then you can do this in bioperl-db using the same mechanism as for sequences:

    my $db = Bio::DB::BioDB->new(...blah...);
    my $term = Bio::Ontology::Term->new(-identifier => 'GO:123456');
    my $adp = $db->get_object_adaptor($term);
    my $dbterm = $adp->find_by_unique_key($term);
    # on success $dbterm is-a persistent Bio::Ontology::TermI

If none of this helps you will need to be more specific on your approach and what you want to achieve.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From chandan.kr.singh at gmail.com Thu Apr 14 02:31:00 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu Apr 14 02:27:01 2005
Subject: [Bioperl-l] Remote::Blast
In-Reply-To: <425D8AAF.6040903@ime.usp.br>
References: <425D8AAF.6040903@ime.usp.br>
Message-ID: <2d4f3205041323315941ec9a@mail.gmail.com>

Hi all

I am facing the same problem and have already posted it on this forum. In my case, most of the time the 500 error is

    500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to host)

Sometimes I also get the 500 error as

    500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: timeout)

Hello Thiago Motta Venancio. Are you using a proxy server to connect? I somehow suspect that it has something to do with the proxy. Of late I have been trying to trace the bug and found that the subroutine UserAgent::_need_proxy returns an undef value in my case. I would like to know more about this subroutine and request all who know about it to please respond.

Thanks
Chandan

On 4/14/05, Thiago Motta Venancio wrote:
> Hi all.
> I am using the Remote::Blast module.
> The script was running ok, but it become out because of a 500 error and
> gaves a timeout www.ncbi.nih.go:80.
> Later, it came back, but the vast majority of sequences returned no > matches, some of them are not really no matches. > Any lights? > Thanks in advance > Thiago > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From chandan.kr.singh at gmail.com Thu Apr 14 02:37:23 2005 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Thu Apr 14 02:30:56 2005 Subject: [Bioperl-l] RemoteBlast by Bio::Perl Through proxy In-Reply-To: <055ecfb115370dc1924c4ceff464c8d5@bioinfo.ca> References: <2d4f3205041211533a6a32c8@mail.gmail.com> <055ecfb115370dc1924c4ceff464c8d5@bioinfo.ca> Message-ID: <2d4f3205041323375c4f822b@mail.gmail.com> Hi all As already mentioned i get error 500 while using Remoteblast through Bio::Perl module .Most of the time the error is 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to host) Sometimes the error is 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: timeout) (refer to my first mail for details ) Of late I have been trying to trace out the bug and found that The subroutine UserAgent ::_need_proxy returns an undef value in my case . I would like to know more about this subroutine and request all who know about it to please respond . Thanks Chandan On 4/13/05, Madeleine Lemieux wrote: > Chandan, > > In your own best interest, you should post your responses to the list. > That way you get the benefit of a large talent pool rather than > limiting yourself to what little I know. > > You can try getting a fresh copy of RemoteBlast.pm from the bioperl > website just in case you unintentionally changed something other than > that one line. > > But it seems to me as if you're timing out before the NCBI server has > time to respond in which case I'm afraid I don't know how to help you. > If you do traceroute www.ncbi.nlm.nih.gov, do you reach the server in a > reasonable amount of time? 
> > Madeleine > > > hi Madeleine Lemieux > > Thanks for ur reply . Yes the error happens everytime i run the code > > and you deduced correctly that the get_sequence works > > fine . Yes i did try to appened one line to RemoteBlast after making > > a few attempts to run the code but it did not work and i removed it . > > Do you suggest installing the modules again ? > > > > chandan > > > > On Apr 12, 2005 3:55 AM, Madeleine Lemieux wrote: > >> Chandan, > >> > >> Does this error happen every time you run the code? The get_sequence > >> works otherwise your error message would have come from EBI, not NCBI, > >> which means your proxy is set up OK. Both get_sequence and > >> blast_sequence eventually call the same User Agent code so if one > >> works, the other should as well. Have you edited either Perl.pm or > >> RemoteBlast.pm? > >> > >> Madeleine > >> > >>> I am a newbie and while installing bioperl and other related softwres > >>> i had installation problems > >>> but fortunately succeeded in debugging few .I had no idea where to > >>> share those experiences > >>> but now being a member of this forum might help. Though i dont have > >>> the installation outputs > >>> i would like to share the last such experience while installind > >>> Berkeley DB .It was not able to include a file Extern.h and i found > >>> that the concerned file in > >>> /usr/lib/perl5/..../Core/Extern had a wrong extension . After setting > >>> it to Extern.h installation was > >>> possible.I could not install few other related softwares . > >>> > >>> At present i am facing this problem.The second code for blast in > >>> bptutorial(html format on bioperl site ) when run does not yield the > >>> desired result . > >>> #! 
/usr/bin/perl > >>> use Bio::Perl; > >>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > >>> # uses the default database - nr in this case > >>> $blast_result = blast_sequence($seq_object); > >>> > >>> write_blast(">roa1.blast",$blast_result); > >>> The output after running the code is > >>> ------------------- WARNING --------------------- > >>> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > >>> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 > >>> Content-Length: 651 > >>> Content-Type: application/x-www-form-urlencoded > >>> > >>> DATABASE=nr&QUERY=%3EROA1_HUMAN+Heterogeneous+nuclear+ribonucleoprote > >>> in > >>> +A1+(Helix-destabilizing+protein)+(Single- > >>> strand+binding+protein)+(hnRNP+core+protein+A1).%0ASKSESPKEPEQLRKLFIG > >>> GL > >>> SFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEVDAAMNARPHKVDGRVVEPKRAV > >>> SR > >>> EDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDK > >>> IV > >>> IQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSR > >>> GG > >>> GGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSG > >>> SN > >>> FGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF&C > >>> OM > >>> POSITION_BASED_STATISTICS=off&EXPECT=1e > >>> -10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&PROGRAM=bl > >>> as > >>> tp > >>> > >>> > >>> An Error Occurred > >>> > >>>


> >>> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to > >>> host) > >>> > >>> > >>> > >>> --------------------------------------------------- > >>> Submitted Blast for [ROA1_HUMAN] > >>> > >>> > >>> The env variable HTTP_PROXY or /and http_proxy is set correct .I hope > >>> somebody can > >>> help me out . > >>> Thank You . > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > From jjmail at mac.com Thu Apr 14 02:44:17 2005 From: jjmail at mac.com (Jamie Sherman) Date: Thu Apr 14 02:37:52 2005 Subject: [Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO In-Reply-To: References: Message-ID: > >> I have a large list of protein names and I would like to use bioperl >> to get the corresponding Gene Ontology (GO) information for each >> protein. >> >> So far I have installed bioperl, BioSQL, and bioperl-db and uploaded >> the taxonomy and GO information into BioSQL. I am having a really >> hard time figuring out how to get the GO information out of the >> database. > > I'm not sure I understand what you're trying to do. If you loaded the > GO ontology into biosql then that will not give you term to protein > associations. You would need to load the proteins as well and have > annotation for them that references the GO terms they are associated > with. There is also a level of devil in the details because currently > no bioperl SeqIO parser except the LocusLink parser (and hopefully the > Entrez Gene parser already or soon) will give you the GO term > associations as appropriate Bio::Annotation::OntologyTerm annotation. > This helps me a lot. I realized I had to get the GO terms associated with the protein and I had seen them in the swissprot annotations so I figured I'd have to parse them out of the annotations but I'll try LocusLink and Entrez Gene and see if that makes it easier. 
I thought that the GO information in BioSQL might contain associated gene lists too but apparently not. > If you want to use UniProt as the protein data source then the terms > would end up as dbxrefs. I can post a simple SQL script that will > convert those into term associations, but the point is that this won't > happen magically. > > If all you're trying to do is lookup ontology terms based on some > identifying property, like the identifier, then you can do this in > bioperl-db using the same mechanism as for sequences: > > my $db = Bio::DB::BioDB->new(...blah...); > my $term = Bio::Ontology::Term->new(-identifier => 'GO:123456'); > my $adp = $db->get_object_adaptor($term); > my $dbterm = $adp->find_by_unique_key($term); > # on success $dbterm is-a persistent Bio::Ontology::TermI > > If none of this helps you will need to be more specific on your > approach and what you want to achieve. > > -hilmar I think using one of the approaches you outline I should be able to get the GO information so thanks a bunch. The second part of the question is more about after I can collect all the information into the program and associate it with the protein expression data what is the best way to manage that information to take advantage of clustering abilities of bioperl. Should I load them into BioSQL and if so where do I look for documentation to learn the interface to BioSQL. I noticed a lot of the perldoc pages in the Bio::DB:*** seemed to be fairly sparse. 
Thanks Again, --Jamie From hlapp at gmx.net Thu Apr 14 03:00:09 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Apr 14 02:53:10 2005 Subject: [Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO In-Reply-To: Message-ID: On Wednesday, April 13, 2005, at 11:44 PM, Jamie Sherman wrote: > The second part of the question is more about after I can collect all > the information into the program and associate it with the protein > expression data what is the best way to manage that information to > take advantage of clustering abilities of bioperl. I can't comment on that part as I don't know about those modules. > Should I load them into BioSQL and if so where do I look for > documentation to learn the interface to BioSQL. I noticed a lot of the > perldoc pages in the Bio::DB:*** seemed to be fairly sparse. The vanilla version of biosql doesn't store expression data. (Some of) the Bio::DB::BioSQL PODs (e.g., BasePersistentAdaptor) you will want to read if you want to become more sophisticated on how to query the database using the bioperl object model instead of SQL. If you want to understand more of the schema then you should check out what's in the doc directory in the biosql-schema repository, specifically the ERD and the schema-overview.txt document. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From sdavis2 at mail.nih.gov Thu Apr 14 02:55:56 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Apr 14 02:55:58 2005 Subject: [Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO In-Reply-To: <9460183.1113452785586.JavaMail.jjmail@mac.com> References: <9460183.1113452785586.JavaMail.jjmail@mac.com> Message-ID: <877B184A-223F-11B2-8158-000393CFE1C4@mail.nih.gov> On Apr 14, 2005, at 12:26 AM, jjmail@mac.com wrote: > Question 1: > > I am brand new to bioperl and the related projects so please forgive > my ignorance on this. I have a large list of protein names and I would > like to use bioperl to get the corresponding Gene Ontology (GO) > information for each protein. > > So far I have installed bioperl, BioSQL, and bioperl-db and uploaded > the taxonomy and GO information into BioSQL. I am having a really hard > time figuring out how to get the GO information out of the database. > If anyone knows the right doc to read or has a simple example program > that I could see that would be really useful. > I see that Hilmar took a stab at answering your question on the details of GO and BioSQL. > Question 2: > > I have collected protein expression data for various states and I > would like to cluster the data based on GO information for a start and > then if possible use bioperl's ability to analyze mRNA array data to > analyze the protein data. Does this seem reasonable? Where should I > start looking to figure out how to do this? > This may reflect a bit of my own bias, but if you are looking at expression (as in arrays, etc.), then I think the better tool to spend time with is called BioConductor. It is a collection of tools written for the R programming language (which you can install). Using bioconductor, you can use the annotation building package (AnnBuilder) to make an annotation package for all of the genes in your experiment. 
The annotation package you create contains the GO information, biologic pathways, chromosome locations, etc. Then you can use any one of dozens of normalization and analysis or clustering methods to cluster based on whatever you like, including some GO-based clustering. Perl is just not the most natural tool for doing high-level, vectorized math. BioConductor is built just for exploring data like array data (or other high-throughput data). Check out the site (http://www.bioconductor.org). There is also an email list for bioconductor.

Sean

From mark_lambrecht at yahoo.com Thu Apr 14 08:11:49 2005
From: mark_lambrecht at yahoo.com (Mark Lambrecht)
Date: Thu Apr 14 08:09:24 2005
Subject: [Bioperl-l] Entrez Gene ASN.1 solution
In-Reply-To: 6667
Message-ID: <20050414121149.73572.qmail@web30605.mail.mud.yahoo.com>

Hi Stefan,

Thanks for your response. I will contribute the code to cvs as soon as I obtain a cvs account with write permissions.

Mingyi suggested that the indexer (Bio::Index::EntrezGene.pm) could be used to also return your seq-derived object, which I feel is a good idea. In this way, your ASN.1 object will have an indexer attached to it. The indexer could then alternatively return a Bio::Seq derived object or a Bio::Gene object.

The Bio::Gene object has a number of different methods implemented to return pieces of data, such as

    $gene->get_urls;

but it is generic and could easily be changed to return different or more data from the Entrez Gene ASN1 record. Each part of the ASN.1 is parsed into separate small objects, such as Gene::GeneCommentary, Gene::GeneTrack, ... So retrieving the gene id is done by

    $gene->get_gene_id()

or

    $gene->get_genetrack->get_gene_id();

These getter methods are autoloaded by _GeneData.pm, so if a new piece of data needs to be accessed, no new method needs to be implemented.
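[Editorial sketch] The autoloaded getters described above could work roughly as follows. This is only an illustration of Perl's AUTOLOAD mechanism under stated assumptions: the package name Demo::GeneData and its fields are hypothetical stand-ins, not the actual Bio::_GeneData code, which is not shown in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real Bio::_GeneData implementation.
package Demo::GeneData;

our $AUTOLOAD;

sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

# Define DESTROY explicitly so object destruction never reaches AUTOLOAD.
sub DESTROY { }

# Any call to an undefined get_<field> method lands here and is answered
# from the object's hash, so new fields need no new getter code.
sub AUTOLOAD {
    my ($self) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;    # strip the package qualifier
    if ($name =~ /^get_(\w+)$/ && ref($self) && exists $self->{$1}) {
        return $self->{$1};
    }
    die "No such method: $name\n";
}

package main;

# Illustrative values only.
my $gene = Demo::GeneData->new(gene_id => 1246500, type => 'test_type');
print $gene->get_gene_id, "\n";
print $gene->get_type, "\n";
```

Adding a new key to the hash passed to new() is enough to make the matching get_ method available; calling a getter for a field that does not exist dies with an error instead of silently returning undef.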
Regards,
Mark

=============================================
Kris Ulens
bioinformatics software developer
tel.: 0032 (0) 486 683 532
e-mail: fantom@earthling.net

Mark Lambrecht, PhD
K.U.Leuven, Faculty of Applied Bioscience and Engineering
tel.: 0032 (0) 495 944 125
e-mail: mark@lambrecht.com
        mark.lambrecht@biw.kuleuven.be

--- Stefan Kirov wrote:

> Could you please post a description of the Entrez Gene object? I am also
> not very happy with creating Bio::Seq object as I don't think this
> object should be "one size fits all" solution, so I am very curious to
> see what is your design.
> I find the indexing very useful for a particular group of people
> (actually we discussed this before and agreed it is a good idea).
> I think having two parsers for the same format is OK for bioperl so I
> don't see any reason for your parser not to be in Bioperl.
> Stefan
>
> Mark Lambrecht wrote:
>
> >We have developed our own interface to the NCBI
> >Entrez Gene ASN.1 flat files. We needed this
> >internally to replace the bioperl LocusLink parser.
> >Because we have used so much great bioperl code over
> >the last years, we had hoped that people can benefit
> >from our work. This system has already proven its
> >value, at least for us.
> >
> >The module consists of the following objects:
> >
> >  => Bio::_GeneData.pm : abstract engine for parsing "type blocks"
> >     within the NCBI ASN.1 files
> >  => Bio::Gene.pm : Entrez Gene object (replaces the Bioperl sequence
> >     object that is normally returned by an IO object) and only keeps
> >     relevant data, can easily be extended to map additional needed data
> >     using the GeneData engine
> >  => Bio::GeneIO.pm : iterator derived from RootIO (similar to the
> >     SeqIO objects); implements next_gene method.
> >
> >  subdirectory Index with
> >  => Bio::Index::EntrezGene.pm : object with capability to index and
> >     consult an ASN.1 File, inherits from Bio::Index::Abstract
> >
> >  test scripts will be committed too :
> >  => few small test records (with extension asn1)
> >  => t_gene_indexer.pl : test file to index asn.1 file and return
> >     an example record
> >
> >     #example:
> >     my $file = "gene_hs.asn1";
> >     my $inx = Bio::Index::EntrezGene->new(
> >         '-filename'   => $file.".inx",
> >         '-write_flag' => 'WRITE');
> >     $inx->make_index("/usr/local/datasets/ncbi/gene/$file");
> >
> >  => testGene.pl : tests a Gene object for return of appropriate
> >     data fields
> >
> >     #example for only extracting track info from the asn1 file,
> >     #this is a dynamic way of choosing which data to parse
> >     my $track_info = new Bio::Gene::GeneTrack;
> >     $track_info->geneid(1);
> >     $gene->type('test_type');
> >     $gene->track_info($track_info);
> >     print "dump:\n".Dumper($gene)."\n";
> >
> >Stefan Kirov and Mingyi Liu have produced similar
> >solutions (which we didn't test); we believe that ours
> >is different because it is an all-in-one lightweight
> >Entrez Gene ASN1 parser that will only capture
> >essential data (thereby making it rather fast). We
> >deliberately didn't choose to map the data on a Seq
> >object. At the same time, a bioperl-compliant indexer
> >has been written.
> >We hope that this code can somehow be useful.
> >
> >We will commit the code to bioperl cvs if people
> >agree, as soon as we obtain a login.
> >
> > Kris Ulens (bioinformatics software developer)
> > Mark Lambrecht (scientist bioinformatics)
> >
> >Galapagos Genomics
> >http://www.galapagosgenomics.com
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Stefan Kirov, Ph.D.
> University of Tennessee/Oak Ridge National Laboratory
> 5700 bldg, PO BOX 2008 MS6164
> Oak Ridge TN 37831-6164
> USA
> tel +865 576 5120
> fax +865-576-5332
> e-mail: skirov@utk.edu
> sao@ornl.gov
>
> "And the wars go on with brainwashed pride
> For the love of God and our human rights
> And all these things are swept aside"
>

From mark_lambrecht at yahoo.com Thu Apr 14 08:17:52 2005
From: mark_lambrecht at yahoo.com (Mark Lambrecht)
Date: Thu Apr 14 08:12:05 2005
Subject: [Bioperl-l] Entrez Gene ASN.1 solution
In-Reply-To: 6667
Message-ID: <20050414121752.17594.qmail@web30601.mail.mud.yahoo.com>

Hi Mingyi,

The indexer is based on simple regex parsing, and could easily be used to plug in other existing ASN.1 parsing routines, such as yours. As you suggest, we could add an option to choose between populating bioperl Seq objects and populating the newly proposed Gene objects with the data. I'll have a look at this, but it will probably not make it into our first commit to the bioperl cvs repository.

Cheers,
Mark

--- Mingyi Liu wrote:

> Hi, Mark,
>
> Nice job on making the Entrez Gene parser! After developing our own
> parser, we know it's a big undertaking. :) Look forward to seeing your
> code on web.
>
> I have one question about your indexer, which is a nice idea and suits
> Bioperl well. For your indexer, did you use your parser to do the
> indexing or just a simple regex? If it's the latter case, & if you
> could add an option that lets the user get Bioperl objects from Stefan's
> parser module, then that'd be great!
> If it is not easy to fit into your
> structure, would you please let me know?
>
> Thanks!
>
> Mingyi

From hota.fin at freemail.hu Thu Apr 14 09:38:35 2005
From: hota.fin at freemail.hu (Horvath Tamas)
Date: Thu Apr 14 08:31:48 2005
Subject: [Bioperl-l] Bio::Graphics::Panel
Message-ID: <425E725B.9020306@freemail.hu>

I gather information about certain sequences from several different files, and I want to display the gathered information. Therefore I create a Bio::Seq object, add features to it, and then I try to display it. Here's the code that is supposed to do that:

    my @features = ();
    my $id = $record->{SEQ_ID};

    # I had to type in a fake sequence, otherwise the script stopped, complaining
    # that the 'abc' couldn't be guessed
    $seqobj = Bio::Seq->new(
        -seq => 'ATCTGATTAGGCTAGCATAATTTGGCATGCATGCATGCATCGACTAGCATCGATCAGATCGAGCATCGATCAGCATCGATC',
        -id => $id,
        -accession_number => $id,
    );

    push( @features,
          new Bio::SeqFeature::Generic(-start   => $record->{L_TIR_START},
                                       -end     => $record->{L_TIR_END},
                                       -primary => 'repeat_L',
                                       -source  => 'internal' ) );

    foreach $exon (@{$record->{EXON_LIST}}) {
        push( @features,
              new Bio::SeqFeature::Generic(-start   => $exon->{START},
                                           -end     => $exon->{END},
                                           -primary => 'exon',
                                           -source  => 'internal' ) );
    }

    push( @features,
          new Bio::SeqFeature::Generic(-start   => $record->{R_TIR_START},
                                       -end     => $record->{R_TIR_END},
                                       -primary => 'repeat_R',
                                       -source  => 'internal' ) );

    foreach $feat (@features) { $seqobj->add_SeqFeature($feat); }

    my %glypher = ( repeat_L => 'arrow', exon => 'generic', repeat_R => 'arrow' );

    # would this work after all? I know I can mix features, but can I mix
    # connectors as well?
    my %connector = ( repeat_L => 'none', exon => 'hat', repeat_R => 'none' );

    # the following add_track function will cause the error
    $panel->add_track($seqobj,
                      -glyph     => \%glypher,
                      -bgcolor   => 'green',
                      -connector => \%connector,
                      -label     => 0,
    );
    print "track finished\n";

The error is:

    Can't locate object method "seq_id" via package "Bio::Seq" at /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Feature.pm line 269, line 191.

Can you suggest what the problem might be, or some other way you would do the same thing (displaying the info)?

Thanks,
Hota

From cerdman2 at du.edu Thu Apr 14 11:03:06 2005
From: cerdman2 at du.edu (Colin Erdman)
Date: Thu Apr 14 10:56:51 2005
Subject: [Bioperl-l] Very basic Perl/BioPerl Help
Message-ID: <0IEX00GMDZ5AF7@smtpout.cair.du.edu>

Hello all,

I certainly pounded away at this one last night, I thought this part would be easy, but after spending so much time getting my Entrez gene data parsed etc my brain was a bit rubbery.

What I am trying to do is take either A) Two fasta files with refseq/genbank data OR B) Two text files with 1 accession# per line and compare them, outputting only those fasta seqs or accession #'s that are not present in both.

So is it easier to just use perl somehow to compare the two raw acc# text files?

Or should I keep them as FASTA seqs and compare using Bio::Seq objs somehow?

The idea is to update a list of Chromosome 21 genes last revised in 2003 by comparing those accession numbers in our list with all of those accession #'s that I pulled from an entrezgene 21[CHR] AND Homo sapiens[ORGN] NOT pseudogene query and then saved the output as an ASN.1 file. I have all the accession #'s.

I just will need to match up those accession #'s NOT currently in our list with the appropriate Entrez Genes using gene2accession, but I am not sure how to do that either. I am assuming using a hash, but they have been steep for me in terms of learning curve, but I'd like to learn them now, I will just need some intuitive support.

Thanks all!
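[Editorial sketch] Option B above (keep only the accession numbers that are not present in both lists) can be sketched with two lookup hashes in plain Perl. The subroutine name not_in_both and the accession numbers are made up for illustration; this is one possible approach, not code posted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the items that occur in only one of the two lists
# (the symmetric difference), in first-seen order, without repeats.
sub not_in_both {
    my ($list_a, $list_b) = @_;
    my %in_a = map { $_ => 1 } @$list_a;
    my %in_b = map { $_ => 1 } @$list_b;
    my %seen;
    return grep { !$seen{$_}++ && !($in_a{$_} && $in_b{$_}) }
           (@$list_a, @$list_b);
}

# Made-up accession numbers, purely for illustration.
my @old_list = qw(NM_000001.1 NM_000002.1 NM_000003.1);
my @new_list = qw(NM_000002.1 NM_000003.1 NM_000004.1);

print "$_\n" for not_in_both(\@old_list, \@new_list);
```

To run this against two files with one accession per line, each list could be filled with something like chomp(my @old_list = <$fh>) after opening the file. The hash lookups compare whole strings, so an accession is never matched as a substring of a longer one.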
Colin From skirov at utk.edu Thu Apr 14 11:29:42 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Apr 14 11:25:16 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> Message-ID: <425E8C66.9030304@utk.edu> Colin, case B is pretty straightforward by using unix diff. Stefan Colin Erdman wrote: >Hello all, > > > >I certainly pounded away at this one last night, I thought this part would >be easy, but after spending so much time getting my Entrez gene data parsed >etc my brain was a bit rubbery. > > > >What I am trying to do is take either A) Two fasta files with refseq/genbank >data OR B) Two text files with 1 accession# per line and compare them, >outputting only those fasta seqs or accession #'s that are not present in >both. > > So is it easier to just use perl somehow to compare the two raw >acc# text files? > > Or should I keep them as FASTA seqs and compare using Bio::Seq >objs somehow? > > > >The idea is to update a list of Chromosome 21 genes last revised in 2003 by >comparing those accession numbers in our list with all of those accession >#'s that I pulled from an entrezgene 21[CHR] AND Homo sapiens[ORGN] NOT >pseudogene query and then saved the output as an ASN.1 file. I have all the >accession #'s. > > > >I just will need to match up those accession #'s NOT currently in our list >with the appropriate Entrez Genes using gene2accession, but I am not sure >how to do that either. I am assuming using a hash, but they have been steep >for me in terms of learning curve, but I'd like to learn them now, I will >just need some intuitive support. > > > >Thanks all! > >Colin > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From sdavis2 at mail.nih.gov Thu Apr 14 11:32:33 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Apr 14 11:26:30 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> Message-ID: <85a2d264d2ac39f09735b9774de6623d@mail.nih.gov> On Apr 14, 2005, at 11:03 AM, Colin Erdman wrote: > Hello all, > > > > I certainly pounded away at this one last night, I thought this part > would > be easy, but after spending so much time getting my Entrez gene data > parsed > etc my brain was a bit rubbery. > > > > What I am trying to do is take either A) Two fasta files with > refseq/genbank > data OR B) Two text files with 1 accession# per line and compare them, > outputting only those fasta seqs or accession #'s that are not present > in > both. > > So is it easier to just use perl somehow to compare the > two raw > acc# text files? > Colin, If you load your text files as one array for each file, you can easily do what I think you are asking by looking here: http://www.unix.org.ua/orelly/perl/cookbook/ch04_08.htm > I just will need to match up those accession #'s NOT currently in our > list > with the appropriate Entrez Genes using gene2accession, but I am not > sure > how to do that either. I am assuming using a hash, but they have been > steep > for me in terms of learning curve, but I'd like to learn them now, I > will > just need some intuitive support. Yep. Hash will do it. 
Read in your file grabbing the appropriate columns and putting them in a hash like:

    my %acc2genehash;
    while (my $line = <>) {    # reads the file(s) named on the command line
        my @params = split(/\t/, $line);
        # key on the accession (field 5), store the gene id (field 1),
        # so that the lookup below works
        $acc2genehash{$params[5]} = $params[1];
    }

Then you can do:

    print $acc2genehash{'AAD12597.1'}

which will give you 1246500, the gene id of that accession (from the first line of gene2accession). I haven't tested the above code, and you still need to do file loading, etc., but I hope you get the point.

Sean

From chandan.kr.singh at gmail.com Thu Apr 14 11:32:51 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu Apr 14 11:26:58 2005
Subject: [Bioperl-l] RemoteBlast by Bio::Perl Through proxy
In-Reply-To: <2d4f3205041323375c4f822b@mail.gmail.com>
References: <2d4f3205041211533a6a32c8@mail.gmail.com> <055ecfb115370dc1924c4ceff464c8d5@bioinfo.ca> <2d4f3205041323375c4f822b@mail.gmail.com>
Message-ID: <2d4f3205041408325e719d6d@mail.gmail.com>

Hi to all,

I found a reason why get_sequence works and blast_sequence does not. The fact is that for Bio::Perl::blast_sequence the subroutine LWP::UserAgent::env_proxy() is not called, while for Bio::Perl::get_sequence it is called, and so Bio::Perl::get_sequence works fine. I have run two more programs and am 100% sure of what I'm talking about.

I will be very thankful if somebody could help me out.
Thanks in advance.
Chandan

On 4/14/05, CHANDAN SINGH wrote:
> Hi all
> As already mentioned i get error 500 while using Remoteblast through
> Bio::Perl module .Most of the time the error is
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to host)
> Sometimes the error is
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: timeout)
> (refer to my first mail for details )
>
> Of late I have been trying to trace out the bug and found that
> The subroutine UserAgent ::_need_proxy returns an undef value
> in my case .
> I would like to know more about this subroutine and request all who
> know about it to please respond .
> Thanks > Chandan > > On 4/13/05, Madeleine Lemieux wrote: > > Chandan, > > > > In your own best interest, you should post your responses to the list. > > That way you get the benefit of a large talent pool rather than > > limiting yourself to what little I know. > > > > You can try getting a fresh copy of RemoteBlast.pm from the bioperl > > website just in case you unintentionally changed something other than > > that one line. > > > > But it seems to me as if you're timing out before the NCBI server has > > time to respond in which case I'm afraid I don't know how to help you. > > If you do traceroute www.ncbi.nlm.nih.gov, do you reach the server in a > > reasonable amount of time? > > > > Madeleine > > > > > hi Madeleine Lemieux > > > Thanks for ur reply . Yes the error happens everytime i run the code > > > and you deduced correctly that the get_sequence works > > > fine . Yes i did try to appened one line to RemoteBlast after making > > > a few attempts to run the code but it did not work and i removed it . > > > Do you suggest installing the modules again ? > > > > > > chandan > > > > > > On Apr 12, 2005 3:55 AM, Madeleine Lemieux wrote: > > >> Chandan, > > >> > > >> Does this error happen every time you run the code? The get_sequence > > >> works otherwise your error message would have come from EBI, not NCBI, > > >> which means your proxy is set up OK. Both get_sequence and > > >> blast_sequence eventually call the same User Agent code so if one > > >> works, the other should as well. Have you edited either Perl.pm or > > >> RemoteBlast.pm? > > >> > > >> Madeleine > > >> > > >>> I am a newbie and while installing bioperl and other related softwres > > >>> i had installation problems > > >>> but fortunately succeeded in debugging few .I had no idea where to > > >>> share those experiences > > >>> but now being a member of this forum might help. 
Though i dont have > > >>> the installation outputs > > >>> i would like to share the last such experience while installind > > >>> Berkeley DB .It was not able to include a file Extern.h and i found > > >>> that the concerned file in > > >>> /usr/lib/perl5/..../Core/Extern had a wrong extension . After setting > > >>> it to Extern.h installation was > > >>> possible.I could not install few other related softwares . > > >>> > > >>> At present i am facing this problem.The second code for blast in > > >>> bptutorial(html format on bioperl site ) when run does not yield the > > >>> desired result . > > >>> #! /usr/bin/perl > > >>> use Bio::Perl; > > >>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > > >>> # uses the default database - nr in this case > > >>> $blast_result = blast_sequence($seq_object); > > >>> > > >>> write_blast(">roa1.blast",$blast_result); > > >>> The output after running the code is > > >>> ------------------- WARNING --------------------- > > >>> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > > >>> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 > > >>> Content-Length: 651 > > >>> Content-Type: application/x-www-form-urlencoded > > >>> > > >>> DATABASE=nr&QUERY=%3EROA1_HUMAN+Heterogeneous+nuclear+ribonucleoprote > > >>> in > > >>> +A1+(Helix-destabilizing+protein)+(Single- > > >>> strand+binding+protein)+(hnRNP+core+protein+A1).%0ASKSESPKEPEQLRKLFIG > > >>> GL > > >>> SFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEVDAAMNARPHKVDGRVVEPKRAV > > >>> SR > > >>> EDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVDK > > >>> IV > > >>> IQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSR > > >>> GG > > >>> GGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSG > > >>> SN > > >>> FGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF&C > > >>> OM > > >>> POSITION_BASED_STATISTICS=off&EXPECT=1e > > >>> -10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&PROGRAM=bl > 
> >>> as > > >>> tp > > >>> > > >>> > > >>> An Error Occurred > > >>> > > >>>

An Error Occurred

> > >>> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to > > >>> host) > > >>> > > >>> > > >>> > > >>> --------------------------------------------------- > > >>> Submitted Blast for [ROA1_HUMAN] > > >>> > > >>> > > >>> The env variable HTTP_PROXY or /and http_proxy is set correct .I hope > > >>> somebody can > > >>> help me out . > > >>> Thank You . > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l@portal.open-bio.org > > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > > > > > > From cerdman2 at du.edu Thu Apr 14 11:41:10 2005 From: cerdman2 at du.edu (Colin Erdman) Date: Thu Apr 14 11:34:46 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: <85a2d264d2ac39f09735b9774de6623d@mail.nih.gov> Message-ID: <0IEY00GIS0WQF7@smtpout.cair.du.edu> Sean and Stefan, FANTASTIC, I did have my accessions loaded up as arrays and this was EXACTLY what I was looking for (the array comparison). The diff capability in unix/linux will be nice when I get the final code going on the production server... just on win32 for now. Thanks very much! Colin -----Original Message----- From: Sean Davis [mailto:sdavis2@mail.nih.gov] Sent: Thursday, April 14, 2005 9:33 AM To: Colin Erdman Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Very basic Perl/BioPerl Help On Apr 14, 2005, at 11:03 AM, Colin Erdman wrote: > Hello all, > > > > I certainly pounded away at this one last night, I thought this part > would > be easy, but after spending so much time getting my Entrez gene data > parsed > etc my brain was a bit rubbery. > > > > What I am trying to do is take either A) Two fasta files with > refseq/genbank > data OR B) Two text files with 1 accession# per line and compare them, > outputting only those fasta seqs or accession #'s that are not present > in > both. > > So is it easier to just use perl somehow to compare the > two raw > acc# text files? 
>

Colin,

If you load your text files as one array for each file, you can easily do what I think you are asking by looking here:

http://www.unix.org.ua/orelly/perl/cookbook/ch04_08.htm

> I just will need to match up those accession #'s NOT currently in our list
> with the appropriate Entrez Genes using gene2accession, but I am not sure
> how to do that either. I am assuming using a hash, but they have been steep
> for me in terms of learning curve, but I'd like to learn them now, I will
> just need some intuitive support.

Yep. Hash will do it. Read in your file grabbing the appropriate columns and putting them in a hash like:

    my %acc2genehash;
    while (my $line = <>) {    # reads the file(s) named on the command line
        my @params = split(/\t/, $line);
        # key on the accession (field 5), store the gene id (field 1),
        # so that the lookup below works
        $acc2genehash{$params[5]} = $params[1];
    }

Then you can do:

    print $acc2genehash{'AAD12597.1'}

which will give you 1246500, the gene id of that accession (from the first line of gene2accession). I haven't tested the above code, and you still need to do file loading, etc., but I hope you get the point.

Sean

From mlemieux at bioinfo.ca Thu Apr 14 11:46:16 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Thu Apr 14 11:39:47 2005
Subject: [Bioperl-l] RemoteBlast by Bio::Perl Through proxy
In-Reply-To: <2d4f3205041408325e719d6d@mail.gmail.com>
References: <2d4f3205041211533a6a32c8@mail.gmail.com> <055ecfb115370dc1924c4ceff464c8d5@bioinfo.ca> <2d4f3205041323375c4f822b@mail.gmail.com> <2d4f3205041408325e719d6d@mail.gmail.com>
Message-ID: <7b54a708fb0f19f67c5fafb05a18a589@bioinfo.ca>

Chandan,

I don't think your proxy is the problem. Blast_sequence correctly posts the request but your server's timeout is too short for the NCBI server to respond. If you compare the number of hops from you to the EBI server versus you to the NCBI server, I bet the latter is longer. Or the NCBI's server is just busier and slower to respond.

I've tested going through proxies that either require authentication or don't. In both cases blast_sequence worked fine.
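[Editorial sketch] For readers following the proxy and timeout discussion: the snippet below shows how a plain LWP::UserAgent object picks up proxy settings from the environment and how its timeout is raised. It assumes LWP is installed and uses a placeholder proxy address; it does not show how Bio::Perl's blast_sequence or RemoteBlast build their own agent, which is not confirmed anywhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder proxy address, used only if the environment does not set one.
$ENV{http_proxy} ||= 'http://proxy.example.org:3128/';

my $ua = LWP::UserAgent->new;
$ua->env_proxy;       # read http_proxy, no_proxy, etc. from the environment
$ua->timeout(300);    # seconds to wait for a response (LWP's default is 180)

printf "proxy for http: %s\n", $ua->proxy('http');
printf "timeout: %d seconds\n", $ua->timeout;
```

Printing the proxy and timeout this way is a quick check that the environment variables are actually seen by LWP before suspecting the remote server.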
I don't know enough about how to change the timeout settings to help you with that but that is where I would look. Good luck, Madeleine > Hi to all > I found a reason because of which get_sequence works and > blast_sequence does not . > The fact is that for "Bio::Perl::blast_sequence " the subroutine > LWP::UserAgent::env_proxy() is not called while for > Bio::Perl::get_sequence > it is called and so Bio::Perl::get_sequence works fine . > I have run two more programs and am 100% sure of what i'm talking > about . > I will be very thankful if somebody could help me out . > Thanks in advance . > Chandan > > On 4/14/05, CHANDAN SINGH wrote: >> Hi all >> As already mentioned i get error 500 while using Remoteblast through >> Bio::Perl module .Most of the time the error is >> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to >> host) >> Sometimes the error is >> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: timeout) >> (refer to my first mail for details ) >> >> Of late I have been trying to trace out the bug and found that >> The subroutine UserAgent ::_need_proxy returns an undef >> value >> in my case . >> I would like to know more about this subroutine and request all who >> know about it to please respond . >> Thanks >> Chandan >> >> On 4/13/05, Madeleine Lemieux wrote: >>> Chandan, >>> >>> In your own best interest, you should post your responses to the >>> list. >>> That way you get the benefit of a large talent pool rather than >>> limiting yourself to what little I know. >>> >>> You can try getting a fresh copy of RemoteBlast.pm from the bioperl >>> website just in case you unintentionally changed something other than >>> that one line. >>> >>> But it seems to me as if you're timing out before the NCBI server has >>> time to respond in which case I'm afraid I don't know how to help >>> you. >>> If you do traceroute www.ncbi.nlm.nih.gov, do you reach the server >>> in a >>> reasonable amount of time? 
>>> >>> Madeleine >>> >>>> hi Madeleine Lemieux >>>> Thanks for ur reply . Yes the error happens everytime i run the >>>> code >>>> and you deduced correctly that the get_sequence works >>>> fine . Yes i did try to appened one line to RemoteBlast after making >>>> a few attempts to run the code but it did not work and i removed it >>>> . >>>> Do you suggest installing the modules again ? >>>> >>>> chandan >>>> >>>> On Apr 12, 2005 3:55 AM, Madeleine Lemieux >>>> wrote: >>>>> Chandan, >>>>> >>>>> Does this error happen every time you run the code? The >>>>> get_sequence >>>>> works otherwise your error message would have come from EBI, not >>>>> NCBI, >>>>> which means your proxy is set up OK. Both get_sequence and >>>>> blast_sequence eventually call the same User Agent code so if one >>>>> works, the other should as well. Have you edited either Perl.pm or >>>>> RemoteBlast.pm? >>>>> >>>>> Madeleine >>>>> >>>>>> I am a newbie and while installing bioperl and other related >>>>>> softwres >>>>>> i had installation problems >>>>>> but fortunately succeeded in debugging few .I had no idea where to >>>>>> share those experiences >>>>>> but now being a member of this forum might help. Though i dont >>>>>> have >>>>>> the installation outputs >>>>>> i would like to share the last such experience while installind >>>>>> Berkeley DB .It was not able to include a file Extern.h and i >>>>>> found >>>>>> that the concerned file in >>>>>> /usr/lib/perl5/..../Core/Extern had a wrong extension . After >>>>>> setting >>>>>> it to Extern.h installation was >>>>>> possible.I could not install few other related softwares . >>>>>> >>>>>> At present i am facing this problem.The second code for blast in >>>>>> bptutorial(html format on bioperl site ) when run does not yield >>>>>> the >>>>>> desired result . >>>>>> #! 
/usr/bin/perl >>>>>> use Bio::Perl; >>>>>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>>>> # uses the default database - nr in this case >>>>>> $blast_result = blast_sequence($seq_object); >>>>>> >>>>>> write_blast(">roa1.blast",$blast_result); >>>>>> The output after running the code is >>>>>> ------------------- WARNING --------------------- >>>>>> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi >>>>>> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 >>>>>> Content-Length: 651 >>>>>> Content-Type: application/x-www-form-urlencoded >>>>>> >>>>>> DATABASE=nr&QUERY=%3EROA1_HUMAN+Heterogeneous+nuclear+ribonucleopr >>>>>> ote >>>>>> in >>>>>> +A1+(Helix-destabilizing+protein)+(Single- >>>>>> strand+binding+protein)+(hnRNP+core+protein+A1).%0ASKSESPKEPEQLRKL >>>>>> FIG >>>>>> GL >>>>>> SFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEVDAAMNARPHKVDGRVVEPK >>>>>> RAV >>>>>> SR >>>>>> EDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDS >>>>>> VDK >>>>>> IV >>>>>> IQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFG >>>>>> GSR >>>>>> GG >>>>>> GGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGG >>>>>> GSG >>>>>> SN >>>>>> FGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRR >>>>>> F&C >>>>>> OM >>>>>> POSITION_BASED_STATISTICS=off&EXPECT=1e >>>>>> -10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&PROGRAM >>>>>> =bl >>>>>> as >>>>>> tp >>>>>> >>>>>> >>>>>> An Error Occurred >>>>>> >>>>>>

>>>>>> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: No route to >>>>>> host) >>>>>> >>>>>> >>>>>> >>>>>> --------------------------------------------------- >>>>>> Submitted Blast for [ROA1_HUMAN] >>>>>> >>>>>> >>>>>> The env variable HTTP_PROXY or /and http_proxy is set correct .I >>>>>> hope >>>>>> somebody can >>>>>> help me out . >>>>>> Thank You . >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>> >>> >> > From skirov at utk.edu Thu Apr 14 11:44:47 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Apr 14 11:40:22 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> Message-ID: <425E8FEF.9080205@utk.edu> Sorry Colin, I was thinking of sort/diff but this may not work as there will be insertions/deletions... You can just use perl to cycle through both lists:
my $f1=shift;
my $f2=shift;
open (F1,$f1)||die;
open (F2,$f2)||die;
my @accn1=<F1>;
my @accn2=<F2>;
my @unique1;
foreach my $accn (@accn1) {
    push @unique1,$accn unless (grep(/$accn/,@accn2));
}
Sorry for the confusion Stefan Colin Erdman wrote: >Hello all, > > > >I certainly pounded away at this one last night, I thought this part would >be easy, but after spending so much time getting my Entrez gene data parsed >etc my brain was a bit rubbery. > > > >What I am trying to do is take either A) Two fasta files with refseq/genbank >data OR B) Two text files with 1 accession# per line and compare them, >outputting only those fasta seqs or accession #'s that are not present in >both. > > So is it easier to just use perl somehow to compare the two raw >acc# text files? > > Or should I keep them as FASTA seqs and compare using Bio::Seq >objs somehow?
> > > >The idea is to update a list of Chromosome 21 genes last revised in 2003 by >comparing those accession numbers in our list with all of those accession >#'s that I pulled from an entrezgene 21[CHR] AND Homo sapiens[ORGN] NOT >pseudogene query and then saved the output as an ASN.1 file. I have all the >accession #'s. > > > >I just will need to match up those accession #'s NOT currently in our list >with the appropriate Entrez Genes using gene2accession, but I am not sure >how to do that either. I am assuming using a hash, but they have been steep >for me in terms of learning curve, but I'd like to learn them now, I will >just need some intuitive support. > > > >Thanks all! > >Colin > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From ewijaya at singnet.com.sg Thu Apr 14 07:58:11 2005 From: ewijaya at singnet.com.sg (Edward WIJAYA) Date: Thu Apr 14 12:41:22 2005 Subject: [Bioperl-l] Parse::RecDescent in BioPerl and bioinformatics In-Reply-To: <425E8FEF.9080205@utk.edu> References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> <425E8FEF.9080205@utk.edu> Message-ID: Hi, I am new in BioPerl and Bioinformatics. And lately I've been encountering a module that is known to be very powerful: "Parse::RecDescent" But I want to know if this module has also been extensively used in BioPerl and bioinformatics in general? If so, in what problem is this module useful? I hope my question is not far to OT. Thanks so much for your time. Hope to hear from you again. 
Regards, Edward WIJAYA From tcj25 at cam.ac.uk Thu Apr 14 11:49:29 2005 From: tcj25 at cam.ac.uk (Terry Jones) Date: Thu Apr 14 13:56:24 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: Your message at 09:03:06 on Thursday, 14 April 2005 References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> Message-ID: <16990.37129.109552.597508@terry.jones.tc> | What I am trying to do is take either A) Two fasta files with | refseq/genbank data OR B) Two text files with 1 accession# per line | and compare them, outputting only those fasta seqs or accession #'s | that are not present in both. | | So is it easier to just use perl somehow to compare the two raw acc# | text files? If your files do not contain repeat lines, you can do this from the raw acc# text files in various ways. If you're using some form of UNIX, you can do this on the command line:
$ cat file1 file2 | sort | uniq -c | egrep '^ *1 ' | cut -f2 | sort
Note that there's a TAB in the egrep expression (between the 1 and the '). Another way is to use comm
$ sort file1 > file1.sorted
$ sort file2 > file2.sorted
$ comm -3 file1.sorted file2.sorted
You can guarantee that your input files do not have duplicates via
$ sort -u -i file1 > file1.sorted
This is all outside perl. In perl you could do something like
open(F1, "file1") || die "could not open file1 ($!)";
open(F2, "file2") || die "could not open file2 ($!)";
my %names;
while (<F1>){
    chomp;
    $names{$_}++;
}
while (<F2>){
    chomp;
    $names{$_}++;
}
close(F1) || die "could not close file1 ($!)";
close(F2) || die "could not close file2 ($!)";
my @not_in_both = grep { $names{$_} == 1 } keys %names;
Again, this relies on names only being present once in each file. You could code around this requirement in your perl if you wanted, by doing more checking of the input.
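One way to code around the once-per-file requirement is sketched below. It is only a sketch, assuming one accession per line in each file; the helper names read_names and not_in_both are made up for this illustration and are not part of Perl or BioPerl. Each file gets its own hash, so a name repeated inside one file is not mistaken for a name shared by both.

```perl
use strict;
use warnings;

# Hypothetical helper: read one name per line from a file,
# dropping line endings and skipping blank lines.
sub read_names {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "could not open $path ($!)";
    my @names;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        push @names, $line;
    }
    close($fh) or die "could not close $path ($!)";
    return @names;
}

# Hypothetical helper: return, sorted, the names that occur in exactly
# one of the two lists. Duplicates within a single list are collapsed
# first by using one hash per list.
sub not_in_both {
    my ($list1, $list2) = @_;
    my (%seen1, %seen2);
    $seen1{$_} = 1 for @$list1;
    $seen2{$_} = 1 for @$list2;
    my @only1 = grep { !$seen2{$_} } keys %seen1;
    my @only2 = grep { !$seen1{$_} } keys %seen2;
    my @result = sort(@only1, @only2);
    return @result;
}

# Usage: perl not_in_both.pl file1 file2
if (@ARGV == 2) {
    my @names1 = read_names($ARGV[0]);
    my @names2 = read_names($ARGV[1]);
    print "$_\n" for not_in_both(\@names1, \@names2);
}
```

Because the lookups are hash based, this covers both directions (file1 minus file2 and file2 minus file1) in one pass over each list and avoids regexp matching on the names altogether.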
Terry From golharam at umdnj.edu Thu Apr 14 14:13:09 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu Apr 14 14:10:11 2005 Subject: [Bioperl-l] How to obtain the chromosome from an Accession Number In-Reply-To: <4CE97961-2212-11B2-AAEE-000393CFE1C4@mail.nih.gov> Message-ID: <000c01c5411d$9d497060$9200a8c0@GOLHARMOBILE1> I found a way to go it using Entrez Gene and the NCBI uetils tools. It returns the information in ASN.1 format. I see from another thread a parser for this is being worked on. Thanks for your help, ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam@umdnj.edu -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sean Davis Sent: Tuesday, January 06, 1970 4:50 AM To: Sean Davis Cc: 'Bioperl List' Subject: Re: [Bioperl-l] How to obtain the chromosome from an Accession Number And, I forgot to mention, I don't think there is a way to do this directly in bioperl, although perl is likely to be useful in this process. Sean On Jan 6, 1970, at 4:44 AM, Sean Davis wrote: > Ryan, > > What kind of accession? GenBank? If so, then you can use the UCSC > genome table browser (http://genome.ucsc.edu/cgi-bin/hgTables) to > upload your list of all accessions and look them up in their mapping > tables (mRNA and EST and knownGene). You would need to upload your > whole list to each of the possible organisms and then compile the > final list from the species-specific lists. Of course, these tables > are all available as tab-delimited text for download from UCSC, so you > could grab whichever you like and do it on your own machine. Note > that some accessions will not map to any chromosome of any organism > and some will map to many chromosomes/locations in the organism from > which they came. 
> > Sean > > On Apr 13, 2005, at 2:43 PM, Ryan Golhar wrote: > >> Hi all, >> >> I have a bunch of accession numbers from different organisms, and I'm >> trying to determine the chromosome of their respective organism that >> they occur on. Can I do this with BioPerl? >> >> Ryan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu Apr 14 14:43:28 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Apr 14 14:37:59 2005 Subject: [Bioperl-l] Parse::RecDescent in BioPerl and bioinformatics In-Reply-To: References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> <425E8FEF.9080205@utk.edu> Message-ID: It has been used in a couple of modules but it is too slow in practice. some list traffic about it in the past might help. http://www.google.com/search? q=site%3Abioperl.org+%2Bpipermail+RecDescent -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 14, 2005, at 7:58 AM, Edward WIJAYA wrote: > Hi, > > I am new in BioPerl and Bioinformatics. > And lately I've been encountering a module that is known > to be very powerful: "Parse::RecDescent" > > But I want to know if this module has also been > extensively used in BioPerl and bioinformatics in general? > If so, in what problem is this module useful? > > I hope my question is not far to OT. > Thanks so much for your time. > Hope to hear from you again. 
> > Regards, > Edward WIJAYA > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Jianjun.Wang at invitrogen.com Thu Apr 14 12:45:37 2005 From: Jianjun.Wang at invitrogen.com (Wang, Jianjun) Date: Thu Apr 14 14:42:24 2005 Subject: [Bioperl-l] Bio::DB::Genbank not working after installation Message-ID: <0BF025DD64B01342A4F6552D5E452F0C2BEE5F@FRD01EXCMBX01.ads.invitrogen.net> I installed perl and bioperl lately on a Linux box. But Bio::DB::Genbank couldn't work. Looks like it's because the LWP/UserAgent package is missing somehow. This is reproduced with another Linux machine. Both followed recommended procedures from Perl and BioPerl. Could anyone shed some light on this? Thanks. -Jianjun Wang From tcj25 at cam.ac.uk Thu Apr 14 12:12:59 2005 From: tcj25 at cam.ac.uk (Terry Jones) Date: Thu Apr 14 14:43:33 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: Your message at 11:44:47 on Thursday, 14 April 2005 References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> <425E8FEF.9080205@utk.edu> Message-ID: <16990.38539.668844.762878@terry.jones.tc>
| my $f1=shift;
| my $f2=shift;
| open (F1,$f1)||die;
| open (F2,$f2)||die;
| my @accn1=<F1>;
| my @accn2=<F2>;
| my @unique1;
| foreach my $accn (@accn1) {
|     push @unique1,$accn unless (grep(/$accn/,@accn2));
| }
This also just does one half (F1-F2). You need to add F2-F1 to it. T From tcj25 at cam.ac.uk Thu Apr 14 12:08:44 2005 From: tcj25 at cam.ac.uk (Terry Jones) Date: Thu Apr 14 14:43:35 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: Your message at 09:41:10 on Thursday, 14 April 2005 References: <85a2d264d2ac39f09735b9774de6623d@mail.nih.gov> <0IEY00GIS0WQF7@smtpout.cair.du.edu> Message-ID: <16990.38284.594198.203821@terry.jones.tc> | FANTASTIC, I did have my accessions loaded up as arrays and this was | EXACTLY what I was looking for (the array comparison).
Note that you want what's in A and not B, as well as what's in B and not A. The referenced web page just does one of these. Terry From mingyi.liu at gpc-biotech.com Thu Apr 14 14:52:34 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Thu Apr 14 14:44:36 2005 Subject: [Bioperl-l] Parse::RecDescent in BioPerl and bioinformatics In-Reply-To: References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> <425E8FEF.9080205@utk.edu> Message-ID: <425EBBF2.6020803@gpc-biotech.com> Hi, Edward, I've written an article comparing using Parse::RecDescent, Parse::Yapp, Perl-byacc and just plain regex approach in writing a parser for Entrez Gene. You can take a look at it at http://egparser.sourceforge.net/ I also think Parse::RecDescent is not widely used in Bioperl or Bioinfo. Its performance is part of the reason why one might be wary of using it in Bioinfo to process a lot of data. Mingyi From jason.stajich at duke.edu Thu Apr 14 15:04:19 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Apr 14 14:57:59 2005 Subject: [Bioperl-l] Bio::DB::Genbank not working after installation In-Reply-To: <0BF025DD64B01342A4F6552D5E452F0C2BEE5F@FRD01EXCMBX01.ads.invitrogen.net> References: <0BF025DD64B01342A4F6552D5E452F0C2BEE5F@FRD01EXCMBX01.ads.invitrogen.net> Message-ID: you should install LWP::UserAgent since it is missing. -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 14, 2005, at 12:45 PM, Wang, Jianjun wrote: > I installed perl and bioperl lately on a Linux box. But > Bio::DB::Genbank > couldn't work. Looks like it's because the LWP/UserAgent package is > missing > somehow. This is reproduced with another Linux machine. Both followed > recommended procedures from Perl and BioPerl. Could anyone shed some > light > on this? Thanks. 
> > -Jianjun Wang > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From tcj25 at cam.ac.uk Thu Apr 14 15:28:48 2005 From: tcj25 at cam.ac.uk (Terry Jones) Date: Thu Apr 14 15:29:25 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: Your message at 11:44:47 on Thursday, 14 April 2005 References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> <425E8FEF.9080205@utk.edu> Message-ID: <16990.50288.724627.299811@terry.jones.tc> | foreach my $accn (@accn1) { | push @unique1,$accn unless (grep(/$accn/,@accn2)); | } Just one more comment on this. You'd be safer doing foreach my $accn (@accn1) { push @unique1,$accn unless (grep(/^\Q$accn\E$/,@accn2)); } To guard against false substring matches (with the ^ and $) and against the chance that $accn contains a special regexp character (via \Q and \E). Sorry if this sounds picky... I'm just trying to help a little. Things like this can be a pain to track down when/if they finally bite. Terry From skirov at utk.edu Thu Apr 14 15:53:32 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Apr 14 15:49:07 2005 Subject: [Bioperl-l] Very basic Perl/BioPerl Help In-Reply-To: <16990.50288.724627.299811@terry.jones.tc> References: <0IEX00GMDZ5AF7@smtpout.cair.du.edu> <425E8FEF.9080205@utk.edu> <16990.50288.724627.299811@terry.jones.tc> Message-ID: <425ECA3C.8000907@utk.edu> Terry, It is safer, but I don't think this is highly likely. I usually do /\b$accn\b/, which should be sufficient. Anyway it was an example that can be used as a start point.. Stefan Terry Jones wrote: >| foreach my $accn (@accn1) { >| push @unique1,$accn unless (grep(/$accn/,@accn2)); >| } > >Just one more comment on this. 
You'd be safer doing > >foreach my $accn (@accn1) { > push @unique1,$accn unless (grep(/^\Q$accn\E$/,@accn2)); >} > >To guard against false substring matches (with the ^ and $) and >against the chance that $accn contains a special regexp character (via >\Q and \E). > >Sorry if this sounds picky... I'm just trying to help a little. Things >like this can be a pain to track down when/if they finally bite. > >Terry > > > From skirov at utk.edu Thu Apr 14 16:03:28 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Apr 14 15:56:56 2005 Subject: [Bioperl-l] Entrez parser Message-ID: <425ECC90.8080004@utk.edu> Hilmar, can you give it a try now? my $file=shift; my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene', -locuslink=>'convert'); my $seq=$eio->next_seq; due to a small glitch in Bio::ASN1::EntrezGene, first record is empty. Mingyi knows about that. Stefan From barry.moore at genetics.utah.edu Thu Apr 14 16:32:48 2005 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu Apr 14 16:26:59 2005 Subject: [Bioperl-l] Re: Entrez gene parser code In-Reply-To: References: Message-ID: <425ED370.6000606@genetics.utah.edu> Colin- Did you want to install a new ppm repository on your Windows box so you can install bioperl with ppm? If so, this is not via CVS. You want to run the following commands from you ppm prompt. rep add Bioperl http://bioperl.org/DIST/ rep add Kobes http://theoryx5.uwinnipeg.ca/ppms/ rep add Bribes http://www.bribes.org/perl/ppm You can then search for bioperl and install the version you want. Nathan Haigh (usually on this list) was preparing a bioperl 1.5 ppm. Not sure if it made it onto the website yet, but 1.4 is there and that install works as expected. BTW, this will only install bioperl core. 
-------------------------------------------------------------------------------------------------------------------------------------------------
Installing Bioperl on Windows
=============================

1) Quick Instructions for the Impatient
2) Bioperl on Windows
3) Perl on Windows
4) BioPerl on Windows
5) Beyond the Core
6) BioPerl and Cygwin
7) Cygwin Tips
8) Example Script

This installation guide was written by Barry Moore, Nathan Haigh and other Bioperl authors based on the original work of Paul Boutros. Please report problems and/or fixes to the bioperl mailing list, bioperl-l@bioperl.org

1) Quick instructions for the impatient, lucky, or experienced user.
=====================================================================

Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
Run the ActivePerl Installer (accepting all defaults is fine).
Open a command prompt (Menus Start->Run and type cmd) and run the PPM shell (C:\>ppm).
Add three new PPM repositories with the following commands:
ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
ppm> rep add Bribes http://www.Bribes.org/perl/ppm
Install Bioperl with the following commands:
ppm> search Bioperl
This returns a numbered list of packages with corresponding version numbers etc. with "Bioperl" in their name.
ppm> install <number>
Where <number> corresponds to the relevant package and version from the numbered list obtained above.
Go to http://www.bioperl.org and start reading documentation or try the example script at the end of this file.

2) Bioperl
======================

Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid in the task of writing Perl code to deal with sequence data in a myriad of ways. Bioperl provides objects for various types of sequence data and their associated features and annotations.
It provides interfaces for analysis of these sequences with a wide variety of external programs (BLAST, fasta, clustalw and EMBOSS to name just a few). It provides interfaces to various types of databases both remote (GenBank, EMBL etc) and local (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. And finally with its associated documentation and mailing list Bioperl represents a community of bioinformatics professionals working in Perl who are committed to supporting both development of Bioperl and the new users who are drawn to the project. While most bioinformatics and computational biology applications are developed in Unix/Linux environments, more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment. Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of the functionality of Bioperl is available with this type of install. Much of the heavy lifting in bioinformatics is done by programs originally developed in lower level languages like C and Pascal (e.g. BLAST, clustalw, Staden etc). Bioperl simply acts as a wrapper for running and parsing output from these external programs. Some of those programs (BLAST for example) are ported to Windows. These can be installed and work quite happily with BioPerl in the native Windows environment. Some external programs such as Staden and the EMBOSS suite of programs can not be installed on Windows at all, and therefore any part of Bioperl that interacts with these packages either won't work or can't be installed at all. If you have a fairly simple project in mind, want to start using Bioperl quickly, only have access to a computer running Windows, and/or don't mind bumping up against some limitations then Bioperl on Windows may be a good place for you to start. 
For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great. Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also. Be aware that most if not all of the Bioperl developers are working in some type of a UNIX environment (Linux, OSX, Cygwin). If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses - simply because no one knows the answer to your Windows specific problem. If this is or becomes a problem for you then you are better off working in some type of UNIX like environment. One solution to this problem that will keep you working on a Windows machine is to install Cygwin, a UNIX emulation environment for Windows. A number of Bioperl users are using this approach successfully and it is discussed in more detail below.

3) Perl on Windows
===================

There are a couple of ways of installing Perl on a Windows machine. The most common and easiest is to get the most recent build from ActiveState. ActiveState is a software company (http://www.activestate.com) that provides free builds of Perl for Windows users. The current (December 2004) build is ActivePerl 5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just fine). To install ActivePerl on Windows: Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/ Run the ActivePerl Installer (accepting all defaults is fine). You can also build Perl yourself (which requires a C compiler) or download one of the other binary distributions. The Perl source for building it yourself is available from CPAN (http://www.cpan.org), as are a few other binary distributions that are alternatives to ActiveState.
This approach is not recommended unless you have specific reasons for doing so and know what you're doing. If that's the case you probably don't need to be reading this guide. Cygwin is a UNIX emulation environment for Windows and comes with its own copy of Perl. Information on Cygwin and Bioperl is found below.

4) BioPerl on Windows
======================

Perl is a programming language that has been extended a lot by the addition of external modules. These modules work with the core language to extend the functionality of Perl. Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency. You can't install module X unless you have already installed module Y. Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl - if you've installed Perl then these modules are already installed. Other modules are freely available from CPAN, but you'll have to install them yourself if you want to use them. BioPerl has such dependencies. Bioperl is actually a large collection of Perl modules (over 1000 currently) and these modules are split into six groups. These six groups are:

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions and
                     the Staden package.
bioperl-db           Using bioperl with BioSQL and local relational
                     databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user
                     interface to some Bioperl functions.

The Bioperl core is what most new users will want to start with. Bioperl (the core) and the Perl modules that it depends on can be easily installed with PPM.
PPM (Programmer's Package Manager, formerly known as the Perl Package Manager) is an ActivePerl utility for installing Perl modules on systems using ActivePerl. The PPM commands shown in this document are for PPM version 3; if you use PPM version 2 the commands you require will be different. PPM will look online (you have to be connected to the internet of course) for files (these files end with .ppd) that tell it how to install the modules you want and what other modules your new modules depend on. It will then download and install your modules and all dependent modules for you. These .ppd files are stored online in PPM repositories. ActiveState maintains the largest PPM repository and when you installed ActivePerl PPM was installed with directions for using the ActiveState repositories. Unfortunately the ActiveState repositories are far from complete and other ActivePerl users maintain their own PPM repositories to fill in the gaps. Installing Bioperl will require you to direct PPM to look in three new repositories. You do this by opening a Windows command prompt, typing ppm to start the PPM shell and then typing the following three commands:
ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
ppm> rep add Bribes http://www.Bribes.org/perl/ppm
Once PPM knows where to look for Bioperl and its dependencies you simply tell PPM to search for packages with Bioperl in their name, and then which of these to install. This is done with the following commands:
ppm> search Bioperl
This returns a numbered list of packages with corresponding version numbers etc. with "Bioperl" in their name.
ppm> install <number>
Where <number> corresponds to the relevant package and version from the numbered list obtained above.

5) Beyond the Core
===================

You may find that you want some of the features of other Bioperl groups like bioperl-run or bioperl-db.
There are currently no PPM packages for installing these parts of Bioperl (but check this by doing a Bioperl search at the PPM shell):
ppm> search bioperl
If they are not present, you will have to install these manually from source. For this you will need a Windows version of the program make called nmake (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe). You will also want to have a willingness to experiment. You'll have to read the installation documents for each component that you want to install, and use nmake where the instructions call for make. You will have to determine from the installation documents what dependencies are required and you will have to get them, read their documentation and install them first. The details of this are beyond the scope of this guide. Read the documentation. Search Google. Try your best, and if you get stuck consult with others on the bioperl mailing list.

6) BioPerl and Cygwin
=====================

Cygwin is a UNIX emulator and shell environment available free at www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that installation of Bioperl is easier within Cygwin than within Windows, but these may be users with UNIX backgrounds. One advantage of using Bioperl in Cygwin is that all the external modules are available through CPAN and most if not all external programs can be installed and run, so many of the limitations of Bioperl on Windows are circumvented. To get Bioperl running first install the basic Cygwin package as well as the Cygwin Perl, make, and gcc packages. Clicking the "View" button in the upper right of the installer enables you to see details on the various packages. Then follow the BioPerl installation instructions for UNIX in BioPerl's INSTALL file. Note that expat comes with Cygwin (it's used by the module XML::Parser).
One known issue is that DBD::mysql can be tricky to install in Cygwin and this module is required for the bioperl-db, Biosql, and bioperl-pipeline external packages. Fortunately there are some good instructions online: http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin. Also, set the environment variable TMPDIR, because programs like BLAST and clustalw need a place to create temporary files. e.g.:
setenv TMPDIR e:/cygwin/tmp # csh, tcsh
export TMPDIR=e:/cygwin/tmp # sh, bash
Note that this is not a syntax that Cygwin understands, which would be something like "/cygdrive/e/cygwin/tmp". This is the syntax that a Perl module expects on Windows. If this variable is not set correctly you'll see errors like this when you run Bio::Tools::Run::StandAloneBlast:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
STACK: Error::throw ..........

7) Cygwin Tips
===============

The easiest way to install MySQL is to use the Windows binaries available at www.mysql.com. Note that Windows does not have sockets, so you need to force the MySQL connections to use TCP/IP instead. Do this by using the "-h" option from the command-line:
>mysql -h 127.0.0.1 -u blip -pblop biosql
Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses a host. For example, if your databases are installed locally:
alias mysql 'mysql -h 127.0.0.1'
If you're trying to use some application or resource "outside" of Cygwin and you're having a problem remember that Cygwin's path syntax may not be the correct one. Cygwin understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but the external resource may want 'E:/cygwin/home/jacky'. So your *rc files may end up with paths written in these different syntaxes, depending. If you can, install Cygwin on a drive or partition that's NTFS-formatted, not FAT32-formatted.
When you install Cygwin on a FAT32 partition you will not be able to set permissions and ownership correctly. In most situations this probably won't make any difference but there may be occasions where this is a problem. If you want to use BLAST we recommend that the Windows binary be obtained from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will be named something like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions in README.bls. Although we've recommended using the BLAST and MySQL binaries you should be able to compile just about everything else from source code using Cygwin's gcc. You'll notice when you're installing Cygwin that many different libraries are also available (gd, jpeg, etc.).

8) Example Script
=================

#!/usr/bin/perl

#A short script to demonstrate how to download sequences from GenBank and access
#the sequence and some associated annotations using Bioperl.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed

#Get some sequence IDs either like below, or read in from a file. Note that
#this sample script works with the accession numbers below (at least at the time
#it was written). If you add different accession numbers, and you get errors,
#you may be calling for something that the sequence doesn't have. You'll have
#to add your own error trapping code to handle that.
my @ids = ('K03160', 'AB039327', 'BC035972');

#Create the GenBank database object to read from the database.
my $gb = new Bio::DB::GenBank();

#Create a sequence stream to pass the sequences from the database to the program.
my $seqio = $gb->get_Stream_by_id(\@ids);

#Loop over all of the sequences that you requested.
while (my $seq = $seqio->next_seq) {

    #Here is how you get methods directly from the RichSeq object. Replace
    #'display_name' with any other method in Table 2 that can be called on
    #either the RichSeq object directly, or the PrimarySeq object which it has
    #inherited.
    print "Display Name: ", $seq->display_name,"\n";
    print "Sequence Date: ",$seq->get_dates,"\n";

    #Here is how to access the classification data from the species object.
    my $species = $seq->species;
    print "Species :", $species->common_name,"\n";
    my @class = $species->classification;
    print "Classification: @class\n";

    #Here is a general way to call things that are stored as a Bio::SeqFeature::
    #Generic object. Replace 'source' with any other of the "major" headings in
    #the feature table (e.g gene, CDS, etc.) and replace 'organism' with any of
    #the tag values found under that heading (mol_type, locus_tag, gene, etc.)
    my @source_feats = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures();
    my $source_feat = shift @source_feats;
    my @mol_type = $source_feat->get_tag_values('mol_type');
    print "Molecule Type: @mol_type\n";

    #Here is a general way to call things that are stored as some type of a
    #Bio::Annotation object. This includes reference information, and comments.
    #Replace reference with 'comment' to get the comment, and replace
    #$ref->authors with $ref->title (or location, medline, etc.) to get other
    #reference categories
    my $ann = $seq->annotation();
    my @references = ($ann->get_Annotations('reference'));
    my $ref = shift @references;
    my ($title, $authors, $location, $pubmed, $reference);
    if (defined $ref) {
        $authors = $ref->authors;
        print "Authors: $authors\n";
    }

    print "Sequence: \n", $seq->seq, "\n\n";
}

Brian Osborne wrote: >Colin, > >If you'd like a command-line environment like some sort of Unix install >Cygwin (www.cygwin.com). No need to install everything, just click the >"View" button in the main installation window and select and install the >minimum, something like gcc, binutils, cvs, openssh, make, Perl. > >Brian O.
> >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Colin Erdman >Sent: Tuesday, April 12, 2005 2:23 PM >To: 'Mingyi Liu'; 'Stefan Kirov' >Cc: 'Bioperl list' >Subject: RE: [Bioperl-l] Re: Entrez gene parser code > > >I am between Linux installs right now and actually running win32 with the >ActiveState Perl install... How does one add the cvs.open-bio.org repository >to the PPM console list to search through it and install the bioperl-live >packages etc? I don't see a comparable cvs command within it. > >This is all new to me and I appreciate the help! >Thanks, >Colin > >-----Original Message----- >From: Mingyi Liu [mailto:mingyi.liu@gpc-biotech.com] >Sent: Tuesday, April 12, 2005 10:56 AM >To: Stefan Kirov >Cc: Colin Erdman; Bioperl list >Subject: Re: [Bioperl-l] Re: Entrez gene parser code > >Stefan Kirov wrote: > > > >>In order for this parser to work you need to get >>GI::Parser::Entrezgene from sourceforge. You can get the address for >>this module from the perl doc of entrezgene: perldoc >>Bio::SeqIO::entrezgene >>Stefan >> >> >> >I just want to add that I will be adding GI::Parser::EntrezGene to cpan >in a few days, and most likely the name space will switch to Bio::ASN1 >(therefore it'd be Bio::ASN1::EntrezGene) based on PAUSE admin suggestion. > >Thanks, > >Mingyi > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Barry Moore Dept. 
of Human Genetics
University of Utah
Salt Lake City, UT

From mingyi.liu at gpc-biotech.com Thu Apr 14 16:36:34 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Thu Apr 14 16:30:05 2005
Subject: [Bioperl-l] Entrez parser
In-Reply-To: <425ECC90.8080004@utk.edu>
References: <425ECC90.8080004@utk.edu>
Message-ID: <425ED452.5050104@gpc-biotech.com>

Stefan Kirov wrote:

> due to a small glitch in Bio::ASN1::EntrezGene, first record is empty.
> Mingyi knows about that.

Yes, I just released a new version of Bio::ASN1::EntrezGene that fixes this bug. The new release also fixes a minor line-number counting bug that happens when a user parses multiple files using one parser object. But the bigger changes in this release include:

Added a fast indexer. (It would take some effort for Mark to add a different type of object to the return value of their indexer, so I decided to take advantage of the excellent bioperl index code base and develop one. This indexer indexes the human file in 21 seconds on one Xeon 2.4 GHz CPU.) The return value of the indexer can be either the Bio::Seq object produced by Stefan's entrezgene.pm or the data hash produced by Bio::ASN1::EntrezGene. Since this indexer lives as Bio::ASN1::EntrezGene::Indexer.pm there will be no conflict with Mark's indexer (Bio::Index::EntrezGene). It is useful for those who want to retrieve Stefan's Entrez Gene Bio::Seq objects through an indexer.

Added test scripts

Added new convenience methods (rawdata() and fh())

File handles are now accepted too (by new() and fh(); new() now also accepts '-file', '-fh', 'fh' in addition to 'file')

Updated documentation

The new version is available at http://sourceforge.net/projects/egparser already. CPAN's not refreshed yet.

Best,
Mingyi

From khan at cshl.edu Thu Apr 14 17:20:01 2005
From: khan at cshl.edu (Khan, Sohail)
Date: Thu Apr 14 17:14:36 2005
Subject: [Bioperl-l] (no subject)
Message-ID:

Dear Members,

I am new to Bioperl, Perl and Bioinformatics in general.
Is it possible to download the whole gene Name ,sequence (human), with gene name, chrom number?? Second. Is it possible to somehow get Sau3aI fragments of human sequence and their location? Thanks.........Any help is appreciated. Sohail. From palmeida at igc.gulbenkian.pt Thu Apr 14 18:16:22 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Thu Apr 14 18:12:07 2005 Subject: [Bioperl-l] (no subject) In-Reply-To: References: Message-ID: <3128.84.90.13.209.1113516982.squirrel@webmail.igc.gulbenkian.pt> Hi, I don't fully understand the first question, but perhaps you can use Ensmart ( http://www.ensembl.org/Multi/martview ) to get the gene names, sequences and chromosome numbers. For the second one, you can try Bio::Restriction , a BioPerl module ( http://doc.bioperl.org/releases/bioperl-1.4/Bio/Restriction/Analysis.html ) Good luck, Paulo > Dear Members, > > I am new to Bioperl, Perl and Bioinformatics in general. > > Is it possible to download the whole gene Name ,sequence (human), with > gene name, chrom number?? > Second. Is it possible to somehow get Sau3aI fragments of human sequence > and their location? > > Thanks.........Any help is appreciated. > > > > Sohail. From sallyli97 at yahoo.com Thu Apr 14 16:16:25 2005 From: sallyli97 at yahoo.com (Sally Li) Date: Thu Apr 14 19:14:41 2005 Subject: [Bioperl-l] how to use Storable.pm to save the object? Message-ID: <20050414201625.29346.qmail@web53610.mail.yahoo.com> Hi, there, Let's say we have an object which is SimpleAlign $aln How can we store this object using Storable.pm in a specific directory? I have difficulty to understand the doc in module Storable.pm. Any help will be appreciated! Thanks! Sally __________________________________ Do you Yahoo!? Yahoo! Small Business - Try our new resources site! 
http://smallbusiness.yahoo.com/resources/ From ewijaya at singnet.com.sg Fri Apr 15 01:19:54 2005 From: ewijaya at singnet.com.sg (Edward Wijaya) Date: Fri Apr 15 01:15:01 2005 Subject: [Bioperl-l] Array call in Bio::AlignIO Message-ID: <1113542394.425f4efaf36f7@arrowana.singnet.com.sg> Hi, Is there a way to create object "$in" from an array and in plain format instead of filehandle (DATA or FILE) in fasta format. -- Regards, Edward WIJAYA SINGAPORE __BEGIN__ #!/usr/bin/perl -w use strict; use Bio::AlignIO; # How can I change this method call so that instead of FH # it takes data from an array, like this: # @array = qw /AAAAAAAAAAAAAAA AAAAAAAGGAAACCA/; my $in = Bio::AlignIO->new(-fh => \*DATA, -format => 'fasta'); while ( my $aln = $in->next_aln() ) { print "IDENTITY : ", $aln->percentage_identity, "\n"; print "CONSENSUS: ", $aln->consensus_string(28), "\n"; } __DATA__ >Seq1 AAAAAAAAAAAAAAA >Seq2 AAAAAAAGGAAACCA From brian_osborne at cognia.com Fri Apr 15 04:02:58 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Apr 15 03:56:28 2005 Subject: [Bioperl-l] Bio::DB::Genbank not working after installation In-Reply-To: <0BF025DD64B01342A4F6552D5E452F0C2BEE5F@FRD01EXCMBX01.ads.invitrogen.net> Message-ID: Jianjun, You didn't install the Bioperl Bundle, which contains LWP::UserAgent. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wang, Jianjun Sent: Thursday, April 14, 2005 12:46 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Bio::DB::Genbank not working after installation I installed perl and bioperl lately on a Linux box. But Bio::DB::Genbank couldn't work. Looks like it's because the LWP/UserAgent package is missing somehow. This is reproduced with another Linux machine. Both followed recommended procedures from Perl and BioPerl. Could anyone shed some light on this? Thanks. 
-Jianjun Wang _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From whs at ebi.ac.uk Fri Apr 15 04:15:05 2005 From: whs at ebi.ac.uk (Will Spooner) Date: Fri Apr 15 04:10:54 2005 Subject: [Bioperl-l] how to use Storable.pm to save the object? In-Reply-To: <20050414201625.29346.qmail@web53610.mail.yahoo.com> Message-ID: Hi Sally, If you mean the Bio::Root::Storable module, then you need to modify SimpleAlign.pm to include Bio::Root::Storable in its @ISA array. You will then be able to call the 'store' method directly on a SimpleAlign object. This can be done within a script, e.g. --- #!/usr/bin/perl use strict; use Bio::Root::Storable; BEGIN{ require Bio::SimpleAlign; unshift @Bio::SimpleAlign::ISA, 'Bio::Root::Storable'; } my $align = Bio::SimpleAlign->new(-verbose=>1); my $token = $align->store; my $align_copy = Bio::SimpleAlign->retrieve($token); warn( "ORIG $align\n"); warn( "COPY $align_copy\n" ); exit; --- Will On Thu, 14 Apr 2005, Sally Li wrote: > Hi, there, > > Let's say we have an object which is SimpleAlign > > $aln > > How can we store this object using Storable.pm in a > specific directory? I have difficulty to understand > the doc in module Storable.pm. > > Any help will be appreciated! > > Thanks! > > Sally > > > > __________________________________ > Do you Yahoo!? > Yahoo! Small Business - Try our new resources site! > http://smallbusiness.yahoo.com/resources/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From golharam at umdnj.edu Fri Apr 15 11:13:42 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri Apr 15 11:07:16 2005 Subject: [Bioperl-l] NCBI eutils BioPerl Module Message-ID: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1> Is there a BioPerl module for interfacing with NCBI's eutils Entrez system? 
I'm specifically looking for an interface to esearch, efetch, and egquery. If there isn't, I'd be willing to help write one... ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam@umdnj.edu From jason.stajich at duke.edu Fri Apr 15 11:46:04 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Apr 15 11:39:40 2005 Subject: [Bioperl-l] NCBI eutils BioPerl Module In-Reply-To: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1> References: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1> Message-ID: <6973d3ac2bb72147d5eeaf5263d3a473@duke.edu> See Bio::DB::GenBank and Bio::DB::Query::GenBank -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 15, 2005, at 11:13 AM, Ryan Golhar wrote: > Is there a BioPerl module for interfacing with NCBI's eutils Entrez > system? > > I'm specifically looking for an interface to esearch, efetch, and > egquery. > > If there isn't, I'd be willing to help write one... > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam@umdnj.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Fri Apr 15 11:41:44 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Apr 15 11:46:51 2005 Subject: [Bioperl-l] NCBI eutils BioPerl Module In-Reply-To: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1> Message-ID: There are the Bio::DB::GenBank, Bio::DB::GenPept and the Bio::DB::Query::GenBank modules. They may constitute only a subset of what you're interested in. -hilmar On Friday, April 15, 2005, at 08:13 AM, Ryan Golhar wrote: > Is there a BioPerl module for interfacing with NCBI's eutils Entrez > system? 
> > I'm specifically looking for an interface to esearch, efetch, and > egquery. > > If there isn't, I'd be willing to help write one... > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam@umdnj.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From rob at salmonella.org Fri Apr 15 11:59:59 2005 From: rob at salmonella.org (Rob Edwards) Date: Fri Apr 15 11:53:25 2005 Subject: [Bioperl-l] NCBI eutils BioPerl Module In-Reply-To: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1> References: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1> Message-ID: There is also some functionality in Bio::DB::Biblio::eutils but some of this should probably be placed elsewhere as eutils is not just for pubmed searches. Rob On Apr 15, 2005, at 8:13 AM, Ryan Golhar wrote: > Is there a BioPerl module for interfacing with NCBI's eutils Entrez > system? > > I'm specifically looking for an interface to esearch, efetch, and > egquery. > > If there isn't, I'd be willing to help write one... 
> > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam@umdnj.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From mcipriano at mbl.edu Fri Apr 15 13:45:47 2005 From: mcipriano at mbl.edu (Michael Cipriano) Date: Fri Apr 15 13:39:26 2005 Subject: [Bioperl-l] Error in bp_genbank2gff3 error Message-ID: <425FFDCB.2080909@mbl.edu> Not sure if this was fixed yet, but it is broken in my version #$Id: genbank2gff3.PLS,v 1.5 2005/01/17 19:48:57 cjm Exp $; Line 248 $fa_outfile =~ s/gff/fa/; if the string "gff" is in your --outfile, this creates problems It should be changed to $fa_outfile =~ s/gff$/fa/; -Michael Cipriano From brian_osborne at cognia.com Fri Apr 15 14:01:38 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Apr 15 13:55:01 2005 Subject: [Bioperl-l] Error in bp_genbank2gff3 error In-Reply-To: <425FFDCB.2080909@mbl.edu> Message-ID: Michael, Fixed. Brian O. 
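
For anyone wondering why the one-character change matters: the unanchored pattern rewrites the first "gff" it finds anywhere in the --outfile value, while the anchored one only touches a trailing "gff". The following is a minimal sketch in plain Perl, not code from bp_genbank2gff3 itself; the two helper subs and the sample file names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; neither exists in
# bp_genbank2gff3. Each one mimics a version of "Line 248".

sub fa_name_unanchored {
    my ($name) = @_;
    $name =~ s/gff/fa/;     # replaces the FIRST "gff" anywhere in the name
    return $name;
}

sub fa_name_anchored {
    my ($name) = @_;
    $name =~ s/gff$/fa/;    # replaces "gff" only at the very end of the name
    return $name;
}

# Made-up output paths: one harmless, one with "gff" in a directory name.
for my $outfile ('genome.gff', 'my_gff_files/genome.gff') {
    printf "%-26s unanchored: %-26s anchored: %s\n",
        $outfile, fa_name_unanchored($outfile), fa_name_anchored($outfile);
}
```

With the second name, the unanchored version turns the directory into my_fa_files and leaves the .gff extension alone, so the FASTA output would land on top of the GFF output path's sibling rather than in genome.fa.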
-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Michael Cipriano
Sent: Friday, April 15, 2005 1:46 PM
To: 'Bioperl List'
Subject: [Bioperl-l] Error in bp_genbank2gff3 error

Not sure if this was fixed yet, but it is broken in my version

#$Id: genbank2gff3.PLS,v 1.5 2005/01/17 19:48:57 cjm Exp $;

Line 248
$fa_outfile =~ s/gff/fa/;

if the string "gff" is in your --outfile, this creates problems

It should be changed to
$fa_outfile =~ s/gff$/fa/;

-Michael Cipriano
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From agathman at semo.edu Fri Apr 15 15:48:06 2005
From: agathman at semo.edu (Gathman, Allen)
Date: Fri Apr 15 15:44:54 2005
Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Message-ID: <33580922CBEEC846B473BAE124985DE004270355@xchgnt.semo.edu>

Hi all --

I'm using exonerate to align ESTs with a set of genomic contigs, and I'm trying to figure out the best way to pull information out of the output. I wrote a little test script to see what Bio::SearchIO would get me:

use Bio::SearchIO;
open (OUT, ">$ARGV[1]");

my $searchobj=new Bio::SearchIO ( -format => 'exonerate',
                                  -file => $ARGV[0]);

while (my $result=$searchobj->next_result() ) {
    print OUT "query: " . $result->query_name(). "\n";
    my @params=$result->available_parameters;
    print OUT "params:@params\n";
    my @stats=$result->available_statistics;
    print OUT "stats:@stats\n";
    while (my $hit=$result->next_hit() ) {
        print OUT "hitstart: " . $hit->start('hit') . "\n";
        while (my $hsp=$hit->next_hsp() ) {
            print OUT "hspsstart: " . $hsp->start('hit') . "\n";
        } # end hsp
    } # end hit
} # end result
close OUT;

There aren't any parameters or stats returned. $hit->start works fine, and $hsp->start works, but the hsps are the individual matches; if there's a one-nucleotide gap, that separates two hsps, just as a real intron would.

It appears that there should be some methods or arguments applicable here beyond those for the generic Hit and HSP objects, but I don't know what they are. I've gone through the documentation for Bio::SearchIO::exonerate, but I don't see what I'm looking for. For instance, the VULGAR line in the exonerate output distinguishes between introns and gaps - is there some way to pull them out separately in Bio::SearchIO?

In short, I want to be able to ignore small gaps and define start and end points of exons, marking 5' and 3' splice junctions. I'd appreciate any help on how to get at these.

Thanks --

Allen

Allen Gathman
http://cstl-csm.semo.edu/gathman

From allenday at ucla.edu Fri Apr 15 16:05:09 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Apr 15 15:58:39 2005
Subject: [Bioperl-l] NCBI eutils BioPerl Module
In-Reply-To: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1>
References: <003701c541cd$b5ac3470$9200a8c0@GOLHARMOBILE1>
Message-ID:

yes, look at Bio::DB::Biblio::eutils

-allen

On Fri, 15 Apr 2005, Ryan Golhar wrote:

> Is there a BioPerl module for interfacing with NCBI's eutils Entrez
> system?
>
> I'm specifically looking for an interface to esearch, efetch, and
> egquery.
>
> If there isn't, I'd be willing to help write one...
>
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
>
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam@umdnj.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu Fri Apr 15 16:33:17 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Apr 15 16:27:30 2005
Subject: [Bioperl-l] methods, etc.
for Bio::SearchIO on exonerate output
In-Reply-To: <33580922CBEEC846B473BAE124985DE004270355@xchgnt.semo.edu>
References: <33580922CBEEC846B473BAE124985DE004270355@xchgnt.semo.edu>
Message-ID: <4b409bbe418db5e85e235560c04775f5@duke.edu>

Allen -

I gave up on really trying to turn VULGAR into proper HSP objects representing exons and am just parsing the GFF from Guy's showtargetgff output. His GFF has all of what you want, except that reverse strand features need to have their coordinates flipped (start=seqlength - end; end=seqlength - start + 1).

Seems like we should be able to do it alone by interpreting only the proper intron states, and not all the gaps as we do now.

I ended up having to grab the query and hit sequence lengths as well, to correct for reverse strand sequences so the coordinates can be on the forward strand. I think we could fix the parser to do this, but that would maybe make the output gene-centric instead of SearchIO form. Not sure.

In short, I'm not sure the exonerate parser works too well for what you want right now. For me, the pipeline is the GFF that exonerate pumps out, which is then munged into GFF2 for loading into DB::GFF and Gbrowse.

I can send that script for munging over if it helps.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 15, 2005, at 3:48 PM, Gathman, Allen wrote:

> Hi all --
>
>
>
> I'm using exonerate to align ESTs with a set of genomic contigs, and
> I'm
> trying to figure out the best way to pull information out of the
> output. I
> wrote a little test script to see what Bio::SearchIO would get me:
>
> use Bio::SearchIO;
> open (OUT, ">$ARGV[1]");
>
> my $searchobj=new Bio::SearchIO ( -format => 'exonerate',
> -file => $ARGV[0]);
>
> while (my $result=$searchobj->next_result() ) {
> print OUT "query: " . $result->query_name().
"\n"; > my @params=$result->available_parameters; > print OUT "params:@params\n"; > my @stats=$result->available_statistics; > print OUT "stats:@stats\n"; > while (my $hit=$result->next_hit() ) { > print OUT "hitstart: " . $hit->start('hit') . "\n"; > while (my $hsp=$hit->next_hsp() ) { > print OUT "hspsstart: " . $hsp->start('hit') . "\n"; > } # end hsp > } # end hit > } # end result > close OUT; > > > There aren't any parameters or stats returned. $hit->start works > fine, and > $hsp->start works, but the hsps are the individual matches; if there's > a > one-nucleotide gap, that separates two hsps, just as a real intron > would. > > > > It appears that there should be some methods or arguments applicable > here > beyond those for the generic Hit and HSP objects, but I don't know > what they > are. I've gone through the documentation for > Bio::SearchIO::exonerate, but > I don't see what I'm looking for. For instance, the VULGAR line in the > exonerate output distinguishes between introns and gaps - is there > some way > to pull them out separately in Bio::SearchIO? > > > > In short, I want to be able to ignore small gaps and define start and > end > points of exons, marking 5' and 3' splice junctions. I'd appreciate > any > help on how to get at these. > > > > Thanks -- > > > > Allen > > > > Allen Gathman > > http://cstl-csm.semo.edu/gathman > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From agathman at semo.edu Fri Apr 15 16:47:40 2005 From: agathman at semo.edu (Allen Gathman) Date: Fri Apr 15 16:38:50 2005 Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output In-Reply-To: <33580922CBEEC846B473BAE124985DE001CF322E@xchgnt.semo.edu> Message-ID: <33580922CBEEC846B473BAE124985DE0030BCF41@xchgnt.semo.edu> Duh. 
When in doubt, read the documentation -- I was so busy trying to figure out how to do it with Bioperl, I overlooked the GFF output option from exonerate. Thanks -- and yes, if you can send me your GFF-munging script I'd like to see it -- at least we can wind up doing stuff in a consistent way then. Allen Gathman http://cstl-csm.semo.edu/gathman > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Friday, April 15, 2005 3:33 PM > To: Gathman, Allen > Cc: Guy Slater; 'bioperl-l@bioperl.org' > Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate > output > > Allen - > > I gave up to really trying to turn VULGAR into proper HSP objects > representing exons and just parsing the GFF from Guy's showtargetgff > output. His GFF has all of what you want except that reverse strand > features need to have their coordinates flipped (start=seqlength - end; > end=seqlength - start + 1). > > Seems like we should be able to do it alone by only interpreting the > proper intron states and not all the gaps as we do. > > I ended up having to grab the query and hit sequence lengths as well to > correct for reverse strand sequences so the coordinates can be on the > forward strand. I think we could fix it to do this but maybe would > make the output gene-centric instead of SearchIO form. Not sure. > > In short I'm not sure the exonerate parser works too well for what you > want right now - for me, the exonerate generation of GFF which is then > munged into GFF2 that exonerate will pump out as my pipeline for > loading into DB::GFF and Gbrowse. > > I can send that script for munging over if it helps. > > -jason > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Apr 15, 2005, at 3:48 PM, Gathman, Allen wrote: > > > Hi all -- > > > > > > > > I'm using exonerate to align ESTs with a set of genomic contigs, and > > I'm > > trying to figure out the best way to pull information out of the > > output. 
I > > wrote a little test script to see what Bio::SearchIO would get me: > > > > use Bio::SearchIO; > > open (OUT, ">$ARGV[1]"); > > > > my $searchobj=new Bio::SearchIO ( -format => 'exonerate', > > -file => $ARGV[0]); > > > > while (my $result=$searchobj->next_result() ) { > > print OUT "query: " . $result->query_name(). "\n"; > > my @params=$result->available_parameters; > > print OUT "params:@params\n"; > > my @stats=$result->available_statistics; > > print OUT "stats:@stats\n"; > > while (my $hit=$result->next_hit() ) { > > print OUT "hitstart: " . $hit->start('hit') . "\n"; > > while (my $hsp=$hit->next_hsp() ) { > > print OUT "hspsstart: " . $hsp->start('hit') . "\n"; > > } # end hsp > > } # end hit > > } # end result > > close OUT; > > > > > > There aren't any parameters or stats returned. $hit->start works > > fine, and > > $hsp->start works, but the hsps are the individual matches; if there's > > a > > one-nucleotide gap, that separates two hsps, just as a real intron > > would. > > > > > > > > It appears that there should be some methods or arguments applicable > > here > > beyond those for the generic Hit and HSP objects, but I don't know > > what they > > are. I've gone through the documentation for > > Bio::SearchIO::exonerate, but > > I don't see what I'm looking for. For instance, the VULGAR line in the > > exonerate output distinguishes between introns and gaps - is there > > some way > > to pull them out separately in Bio::SearchIO? > > > > > > > > In short, I want to be able to ignore small gaps and define start and > > end > > points of exons, marking 5' and 3' splice junctions. I'd appreciate > > any > > help on how to get at these. 
> >
> >
> >
> > Thanks --
> >
> >
> >
> > Allen
> >
> >
> >
> > Allen Gathman
> >
> > http://cstl-csm.semo.edu/gathman
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >

From jjmail at mac.com Fri Apr 15 20:24:18 2005
From: jjmail at mac.com (Jamie Sherman)
Date: Fri Apr 15 20:18:16 2005
Subject: [Bioperl-l] Bio::DB::GenPept - get_Seq_by_id
Message-ID: <5bafbfc62261fd817333f9f2db22abe8@mac.com>

I'm getting really odd behavior when I use get_Seq_by_id to retrieve from GenPept. I'm trying to retrieve by name, where the name is like 'ROA1_HUMAN'. When I have a name that starts with a letter it works great, but for names that start with a number it returns junk. Is there a workaround for this, or am I doing something wrong? Can I create a Bio::DB::GenPept->new( arg to specify search type )?

Thanks,
--Jamie

Program:

#!/usr/bin/perl -w

use Bio::DB::GenPept;
$sp = Bio::DB::GenPept->new;

# worked $query = 'AAP1_YEAST';
# worked $query = "ROA1_HUMAN";
$query = "2AAA_YEAST"; #doesn't work?

$seq = $sp->get_Seq_by_id($query);
print $seq->desc . "\n";
print $seq->primary_id . "\n";

Output:

[AAP1_YEAST]
Alanine/arginine aminopeptidase.
728771

[ROA1_HUMAN]
Heterogeneous nuclear ribonucleoprotein A1 (Helix-destabilizing protein) (Single-strand binding protein) (hnRNP core protein A1).
133254

[2AAA_YEAST]
B.taurus DNA sequence 1 from patent application EP0238993.
2

It is using 2 as the ID number. How do I escape this?

From goshng at naver.com Sat Apr 16 09:13:51 2005
From: goshng at naver.com (Sang Chul Choi)
Date: Sat Apr 16 09:09:28 2005
Subject: [Bioperl-l] Fetching a DNA sequence corresponding to a protein
Message-ID: <42610F8F.000001.09788@i5a010>

I'm Sang Chul Choi, ...

I'm trying to fetch a DNA sequence coding for a protein from the public databases. I started with a PDB file and I want the corresponding DNA sequence coding for the protein sequence in the PDB file. Let me explain my strategy; I'm wondering if there is a better way to do this.

Let's say that I have the PDB file for "1g84", and I want a DNA sequence coding for the protein. To do that, I found that there are lines starting with "DBREF" in the PDB file. DBREF says where I can track down the corresponding DNA sequence. The following lines are the PDB id, 1g84, plus the chain I want (chain "A" in this case, so 1g84 + a), and the DBREF line from the PDB file named pdb1g84.ent.

1g84a
DBREF 1G84 A 1 105 SWS P01854 EPC_HUMAN 106 210

In detail, this DBREF says that the flat file of the protein sequence is from the SwissProt database, that its id is "P01854", and that chain A of protein structure 1G84 runs from position 106 to 210 in the protein sequence of that flat file. So, I used the Bioperl object "Bio::DB::SwissProt" to get the flat file. And in the flat file, I easily parsed the CDS line using methods of the Bioperl object:

CDS             1..574
                /gene="IGHE"
                /coded_by="join(L00021.1:57..495,L00022.1:98..406,
                L00022.1:614..934,L00022.1:1021..1344,
                L00022.1:1428..1759)"

This CDS feature says where I can actually get the DNA sequence. So, now I'm using "Bio::DB::GenBank" to get the DNA sequence.

Thank you very much for reading this far.
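
The last step, going from the /coded_by value to a list of pieces to fetch, can be sketched in plain Perl. This is only a minimal sketch, not Bioperl code: parse_coded_by is a hypothetical helper written for this example, and it assumes every segment has the simple form ACCESSION.VERSION:START..END (no complement() segments, no fuzzy positions).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): split a /coded_by value into
# its segments. Assumes each segment looks like ACCESSION.VERSION:START..END;
# complement(...) and fuzzy positions such as <57 are NOT handled here.
sub parse_coded_by {
    my ($coded_by) = @_;
    my @segments;
    while ($coded_by =~ /([A-Za-z_0-9]+)(?:\.(\d+))?:(\d+)\.\.(\d+)/g) {
        push @segments, {
            accession => $1,
            version   => $2,
            start     => $3,
            end       => $4,
        };
    }
    return @segments;
}

# The /coded_by value from the CDS feature quoted above.
my $coded_by = 'join(L00021.1:57..495,L00022.1:98..406,L00022.1:614..934,'
             . 'L00022.1:1021..1344,L00022.1:1428..1759)';

my @segments = parse_coded_by($coded_by);
my $total    = 0;
for my $s (@segments) {
    my $len = $s->{end} - $s->{start} + 1;
    $total += $len;
    print join("\t", $s->{accession}, $s->{start}, $s->{end}, $len), "\n";
}
print "Total coding length: $total\n";
```

For this record the five segments add up to 1725 nucleotides, which is 575 codons: the 574 residues of the CDS plus a stop codon, so the arithmetic is at least consistent. Each accession would then still have to be fetched (for instance with get_Seq_by_acc on a Bio::DB::GenBank object) and cut with subseq; that part needs network access and is left out of the sketch.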
Sincerely Yours,
Sang Chul Choi, from Raleigh, North Carolina

From jason.stajich at duke.edu Sat Apr 16 12:12:38 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Apr 16 12:07:26 2005
Subject: [Bioperl-l] Bio::DB::GenPept - get_Seq_by_id
In-Reply-To: <5bafbfc62261fd817333f9f2db22abe8@mac.com>
References: <5bafbfc62261fd817333f9f2db22abe8@mac.com>
Message-ID: <5c0642bcd730fb05dd38f3cd6a6ec122@duke.edu>

If you specify -verbose => 1 when initializing an object you will often see debugging statements.

    $sp = Bio::DB::GenPept->new(-verbose => 1);

You will see that the URLs generated to fetch the sequence are proper (we're not truncating the id or anything).

After playing around, it looks like we have to put the ID in quotes if it starts with a number; otherwise the server assumes it is a gi number. I think this might be an NCBI shortcut?

In the short term you should just do this to your id strings to quote them if they start with a number.

    $id = "\"$id\"" if $id =~ /^\d/;

We'll add code in the modules to detect and fix these automatically -- quoting GI numbers doesn't seem to cause problems so maybe we should quote every id?

If you are only querying swissprot data you might find Bio::DB::SwissProt useful as well.

Bio::DB::NCBIHelper was updated in CVS to quote the ids before making the URL for the query.

I also put some fixes into CVS which better parse swissprot fields from DBSOURCE (in Bio/SeqIO/genbank), although it is always better to get this from the original swissprot records, as there is some munging in the transfer process.

-jason

On Apr 15, 2005, at 8:24 PM, Jamie Sherman wrote:

> I'm getting really odd behavior when I user get_Seq_by_id to retrieve
> from GenPept.
I'm trying to retrieve by name where name is like > 'ROA1_HUMAN". When I have a name that starts with a Letter it works > great but for names that start with a number it returns junk. Is there > a work around for this or am I doing something wrong? Can I create a > Bio::DB::GenPept->new( arg to specify search type )? > Thanks, > --Jamie > > > Program: > #!/usr/bin/perl -w > > use Bio::DB::GenPept; > $sp = Bio::DB::GenPept->new; > > # worked $query = 'AAP1_YEAST'; > # worked $query = "ROA1_HUMAN"; > $query = "2AAA_YEAST"; #doesn't work? > > $seq = $sp->get_Seq_by_id($query); > print $seq->desc . "\n"; > print $seq->primary_id . "\n"; > > > Output: > [AAP1_YEAST] > Alanine/arginine aminopeptidase. > 728771 > > [ROA1_HUMAN] > Heterogeneous nuclear ribonucleoprotein A1 (Helix-destabilizing > protein) (Single-strand binding protein) (hnRNP core protein A1). > 133254 > > [2AAA_YEAST] > B.taurus DNA sequence 1 from patent application EP0238993. > 2 > > It is using 2 as the ID number, How do I escape this? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From sgoegel at gmail.com Sat Apr 16 13:12:50 2005 From: sgoegel at gmail.com (sgoegel@gmail.com) Date: Sat Apr 16 13:07:38 2005 Subject: [Bioperl-l] Problem writing sequence features for GenBank/GenPept sequences (file: Bio/SeqIO/genbank.pm) Message-ID: <200504161212.51000.sgoegel@gmail.com> write_sequence does not write genbank sequences correctly. Specifically, features appear as shown below. I traced the problem write_seq in SeqIO::genbank.pm, and then to the $self->_print_GenBank_FTHelper($fth); line within the foreach my $fth ( @fth ) loop. Here's what I'm actually doing within my script: use Bio::Perl; use Bio::DB::GenBank; use Bio::DB::GenPept; use Bio::Seq; use Bio::SeqIO; # --- cut --- my $seq = $gp->get_Seq_by_gi($id); # where $gp is a Bio::DB::GenPept object and $id is a valid gi num # ... 
write_sequence(">$filename", "genbank", $seq);

This results in a mostly-correct looking genpept output file, but the feature section looks like so:

FEATURES             Location/Qualifiers
     source          1..194
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x87eecbc)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0x87eee00)"
     gene            1..194
                     /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87eefe0)"
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f0728)"
     Protein         1..194
                     /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f07ac)"
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f0908)"
                     /product="Bio::Annotation::SimpleValue=HASH(0x87f0914)"
     Region          15..42
                     /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f0a04)"
                     /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1c4c) "
                     /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1cb8)
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f1d00)"
                     /note="Bio::Annotation::SimpleValue=HASH(0x87f1d48)"
     Region          50..99
                     /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f1dd8)"
                     /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1ed4) "
                     /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1f40)
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f2e2c)"
                     /note="Bio::Annotation::SimpleValue=HASH(0x87f2e74)"
     Site            118
                     /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f2f04)"
                     /evidence=Bio::Annotation::SimpleValue=HASH(0x87f3000)
                     /gene="Bio::Annotation::SimpleValue=HASH(0x87f306c)"
                     /site_type="Bio::Annotation::SimpleValue=HASH(0x87f30fc)"
                     /note="Bio::Annotation::SimpleValue=HASH(0x87f30b4)"

!!

As I said, I traced this problem to write_seq in the Bio/SeqIO/genbank.pm file, which calls $self->_print_GenBank_FTHelper($fth);

This happened for all of about 4000 nucleotide sequences I downloaded in genbank format, as well as a protein sequence I tested.

I fixed the problem with the following modification to _print_GenBank_FTHelper: (modified lines marked with ### comments)

sub _print_GenBank_FTHelper {
    my ($self,$fth) = @_;

    if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
        $fth->warn("$fth is not a FTHelper class. Attempting to print, but there could be tears!");
    }
    if( defined $fth->key && $fth->key eq 'CONTIG' ) {
        $self->_show_dna(0);
        $self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
                                         ' 'x12,$fth->loc,"\,\|\$",80);
    } else {
        $self->_write_line_GenBank_regex(sprintf("     %-16s",$fth->key),
                                         " "x21,
                                         $fth->loc,"\,\|\$",80);
    }

    foreach my $tag ( keys %{$fth->field} ) {
        foreach my $value ( @{$fth->field->{$tag}} ) {
            ####### I added the next 3 lines.
            if (ref ($value) =~ /Bio::Annotation::SimpleValue/) {
                $value = $value->value;
            }

## --- cut ---

I don't know whether my fix is sufficient or more should be changed to maintain coherence... however, my fix seems to work.

From goshng at naver.com Sat Apr 16 14:39:25 2005
From: goshng at naver.com (Sang Chul Choi)
Date: Sat Apr 16 14:32:57 2005
Subject: [Bioperl-l] For a SwissProt protein to get a DNA sequence
Message-ID: <42615BDD.000001.23142@nhn427>

I want to get a DNA sequence coding for a swissprot protein. My swissprot ID is Q16637. I used Bio::DB::SwissProt to get the flat file. Then, there were several db_xref (dblinks annotation) links including

U43883 U43876 U43877 U43878 U43880 U43881 U43882 U18423 U80017 AC005031 U21914 BC000908 BC015308 BC062723 BC070242

So, I tried to fetch and parse the first one, U43883. When I run the code below I get the following warnings. I think the reason for these warnings is that U43882 and U43880 are not in the currently fetched U43883 flat file, while the spliced_seq method in the last line of code, $nuc_seq_str = $f->spliced_seq->seq;, seems to be parsing the CDS feature. How can I put these DNA sequences together without any warnings?
Thank you,

Sang Chul Choi

CDS join(U43876.1:608..688,U43877.1:104..175,
U43878.1:118..237,U43879.1:84..284,U43880.1:69..221,
U43881.1:103..198,U43882.1:53..163,209..259)

CODE

my $gbseq = $gb->get_Seq_by_acc($xrefid);

# Get the CDS information
my @features = $gbseq->all_SeqFeatures;

# sort features by their primary tags
for my $f (@features) {
    my $tag = $f->primary_tag;
    if ( $tag eq "CDS" ) {
        $nuc_seq_str = $f->spliced_seq->seq;

-------------------- WARNING ---------------------
MSG: cannot get remote location for U43882.1 without a valid Bio::DB::RandomAccessI database handle (like Bio::DB::GenBank)
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: cannot get remote location for U43880.1 without a valid Bio::DB::RandomAccessI database handle (like Bio::DB::GenBank)
---------------------------------------------------

From hlapp at gmx.net Sat Apr 16 15:21:52 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 15:17:48 2005
Subject: [Bioperl-l] For a SwissProt protein to get a DNA sequence
In-Reply-To: <42615BDD.000001.23142@nhn427>
Message-ID:

As the warning states, $feature->spliced_seq() also accepts a module implementing an interface for fetching sequences by random access (i.e., Bio::DB::RandomAccessI). So, depending on where the sequences are to be fetched from, instantiate the corresponding module from the Bio::DB::* collection and pass it to spliced_seq().

Also, check out the POD of the method to make sure you're calling it correctly.

-hilmar

On Saturday, April 16, 2005, at 11:39 AM, Sang Chul Choi wrote:
>
> I want to get a DNA sequence coding a swissprot protein. My swissprot ID
> is Q16637. I used Bio::DB::SwissProt to get the flat file.
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Apr 16 15:23:59 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 15:18:17 2005
Subject: [Bioperl-l] Problem writing sequence features for GenBank/GenPept sequences (file: Bio/SeqIO/genbank.pm)
In-Reply-To: <200504161212.51000.sgoegel@gmail.com>
Message-ID: <151A898D-AEAD-11D9-8911-000A959EB4C4@gmx.net>

This is a known problem in the 1.5.0 release. Do one of the following:

- downgrade to 1.4.0 (preferably using the 1.4 branch CVS head, but depending on what you want to do the stock 1.4.0 may do fine), or
- upgrade to the main trunk CVS head.

Hth,

-hilmar

On Saturday, April 16, 2005, at 10:12 AM, sgoegel@gmail.com wrote:
> write_sequence does not write genbank sequences correctly.
> Specifically, features appear as shown below.
>
> I traced the problem to write_seq in SeqIO::genbank.pm,
> and then to the $self->_print_GenBank_FTHelper($fth); line within the
> foreach my $fth ( @fth ) loop.
>
> Here's what I'm actually doing within my script:
> use Bio::Perl;
> use Bio::DB::GenBank;
> use Bio::DB::GenPept;
> use Bio::Seq;
> use Bio::SeqIO;
> # --- cut ---
> my $seq = $gp->get_Seq_by_gi($id); # where $gp is a Bio::DB::GenPept object and $id is a valid gi num
> # ...
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Sat Apr 16 20:31:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sat Apr 16 20:25:48 2005
Subject: [Bioperl-l] dependency XML::DOM::XPath
Message-ID: <19C63B9E-AED8-11D9-8911-000A959EB4C4@gnf.org>

The SeqIO::interpro parser requires this as a dependency but it's not listed in the Makefile.PL. Should it be added?

Is the SeqIO::interpro parser 'in production' or still undergoing major changes? Is anybody using it?

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Sat Apr 16 20:35:38 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sat Apr 16 20:29:11 2005
Subject: [Bioperl-l] failing tests and dubious warnings
Message-ID: <9F1F30C0-AED8-11D9-8911-000A959EB4C4@gnf.org>

This is using a fresh checkout of the cvs HEAD of the main trunk. The following tests fail for me on Mac OSX 10.2.8 and perl 5.6.0.

Could somebody on another platform (likely many ;) please check whether this is my setup that tickles it or whether it can be reproduced on other platforms. I'd rather not start hunting down peculiar problems restricted to perl 5.6.0 or something related to it.

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/BioFetch_DB.t                    27    1   3.70%  8
t/Index.t              79 20224    47    0   0.00%  ??
t/LocationFactory.t               179    2   1.12%  178-179
t/PopGen.t                         85    2   2.35%  82-83

Details:

t/BioFetch_DB.t: probably a time out, I won't bother for now. (even though I can reproduce this on every run!)

t/Index.t:

t/Index......................ok 18/47
------------- EXCEPTION -------------
MSG: Can't open 'DB_File' dbm file 'Wibbl4' : Inappropriate file type or format
STACK Bio::Index::Abstract::open_dbm blib/lib/Bio/Index/Abstract.pm:385
STACK Bio::Index::Abstract::new blib/lib/Bio/Index/Abstract.pm:149
STACK Bio::Index::AbstractSeq::new blib/lib/Bio/Index/AbstractSeq.pm:91
STACK toplevel t/Index.t:112
--------------------------------------
t/Index......................dubious
        Test returned status 79 (wstat 20224, 0x4f00)
        after all the subtests completed successfully

Does this suggest anything to anyone? I've never had this test fail before.

t/LocationFactory.t:

t/LocationFactory............ok 177/179Use of uninitialized value in substitution (s///) at blib/lib/Bio/Factory/FTLocationFactory.pm line 169.
Argument "complement(315036" isn't numeric in numeric gt (>) at blib/lib/Bio/Factory/FTLocationFactory.pm line 254.
Argument "complement(315036" isn't numeric in numeric gt (>) at blib/lib/Bio/Location/Atomic.pm line 101.
Argument "complement(315036" isn't numeric in numeric eq (==) at blib/lib/Bio/Location/Simple.pm line 315.
t/LocationFactory............NOK 178Argument "complement(314652" isn't numeric in numeric gt (>) at blib/lib/Bio/Factory/FTLocationFactory.pm line 254.
Argument "complement(314652" isn't numeric in numeric gt (>) at blib/lib/Bio/Location/Atomic.pm line 101.
Argument "complement(314652" isn't numeric in numeric eq (==) at blib/lib/Bio/Location/Simple.pm line 315.
t/LocationFactory............FAILED tests 178-179
        Failed 2/179 tests, 98.88% okay

t/PopGen.t:

# Test 82 got: '89' (t/PopGen.t at line 432)
#    Expected: '90'
not ok 83
# Test 83 got: 'NA07000' (t/PopGen.t at line 433)
#    Expected: 'NA06994'

Also, the following tests clutter the screen with various warnings that don't look entirely innocuous:

t/FeatureIO..................ok 15/22
-------------------- WARNING ---------------------
MSG: '##feature-ontology' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: '##attribute-ontology' directive handling not yet implemented
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: '##source-ontology' directive handling not yet implemented
---------------------------------------------------
t/FeatureIO..................ok 19/22
-------------------- WARNING ---------------------
MSG: Bio::SeqFeature::Annotated=HASH(0x863674) does not implement Bio::SeqFeatureI, ignoring.
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Bio::SeqFeature::Annotated=HASH(0x855174) does not implement Bio::SeqFeatureI, ignoring.
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Bio::SeqFeature::Annotated=HASH(0x8607c0) does not implement Bio::SeqFeatureI, ignoring.
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Bio::SeqFeature::Annotated=HASH(0x85f82c) does not implement Bio::SeqFeatureI, ignoring.
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Bio::SeqFeature::Annotated=HASH(0x8639f8) does not implement Bio::SeqFeatureI, ignoring.
---------------------------------------------------
t/FeatureIO..................ok

t/masta......................ok 3/16Use of uninitialized value in addition (+) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 222, line 10.
Use of uninitialized value in addition (+) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 222, line 10.
Use of uninitialized value in addition (+) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 222, line 10.
Use of uninitialized value in numeric eq (==) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 226, line 10.
# these are repeated many more times but cut here
masta......................ok 14/16Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Matrix/PSM/IO/masta.pm line 193, line 63.
Use of uninitialized value in addition (+) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 222, line 63.
Use of uninitialized value in addition (+) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 222, line 63.
Use of uninitialized value in addition (+) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 222, line 63.
Use of uninitialized value in numeric eq (==) at blib/lib/Bio/Matrix/PSM/SiteMatrix.pm line 226, line 63.
# these are repeated many more times but cut here
t/masta......................ok

t/primer3....................ok 10/10Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Seq/PrimedSeq.pm line 267, line 103.
Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Seq/PrimedSeq.pm line 267, line 103.
Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Seq/PrimedSeq.pm line 267, line 103.
Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Seq/PrimedSeq.pm line 267, line 103.
Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Seq/PrimedSeq.pm line 267, line 103.
Use of uninitialized value in pattern match (m//) at blib/lib/Bio/Seq/PrimedSeq.pm line 267, line 103.
t/primer3....................ok

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Apr 16 22:06:20 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Apr 16 21:59:53 2005
Subject: [Bioperl-l] Re: Bio::OntologyIO choking on current GO
Message-ID: <4A6416B9-AEE5-11D9-8911-000A959EB4C4@gmx.net>

If you have Graph.pm installed with v0.5 or higher and have been unable to parse the current gene ontology with bioperl, you may want to give this a shot; otherwise don't read any further.

FYI for those who are concerned, I added Nat Goodman's adaptor code that makes Graph.pm 0.5+ work with the SimpleGOEngine module. It works with the 0.2x version of Graph, but I've been unable yet to test with 0.5.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From brian_osborne at cognia.com Sun Apr 17 10:46:57 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sun Apr 17 10:40:14 2005
Subject: [Bioperl-l] failing tests and dubious warnings
In-Reply-To: <9F1F30C0-AED8-11D9-8911-000A959EB4C4@gnf.org>
Message-ID:

Hilmar,

Cygwin 5.1, Perl 5.8.6.

Index.t and LocationFactory.t pass, I don't see the errors you see.
I see the same errors as you with PopGen.t. I also see those same warnings with masta.t, FeatureIO, and primer3.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp
Sent: Saturday, April 16, 2005 8:36 PM
To: Bioperl
Subject: [Bioperl-l] failing tests and dubious warnings

This is using a fresh checkout of the cvs HEAD of the main trunk. The following tests fail for me on Mac OSX 10.2.8 and perl 5.6.0.

From jason.stajich at duke.edu Sun Apr 17 10:58:02 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Apr 17 10:51:30 2005
Subject: [Bioperl-l] failing tests and dubious warnings
In-Reply-To:
References:
Message-ID: <4106c060a4b295348372e4143b080e5b@duke.edu>

On Apr 17, 2005, at 10:46 AM, Brian Osborne wrote:
> Hilmar,
>
> Cygwin 5.1, Perl 5.8.6.
>
> Index.t and LocationFactory.t pass, I don't see the errors you see.
>
> I see the same errors as you with PopGen.t. I also see those same warnings
> with masta.t, FeatureIO, and primer3.
>

popgen error is due to something I changed in tree objects I'll fix the test soon.

> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp
> Sent: Saturday, April 16, 2005 8:36 PM
> To: Bioperl
> Subject: [Bioperl-l] failing tests and dubious warnings
>
>
> This is using a fresh checkout of the cvs HEAD of the main trunk. The
> following tests fail for me on Mac OSX 10.2.8 and perl 5.6.0.

From brian_osborne at cognia.com Sun Apr 17 11:00:00 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sun Apr 17 10:53:15 2005
Subject: [Bioperl-l] SeqIO::table
In-Reply-To:
Message-ID:

Hilmar,

Yes, this is a good idea, like the existing 'tab' format but with more information.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp
Sent: Friday, April 08, 2005 8:15 PM
To: Bioperl
Subject: [Bioperl-l] SeqIO::table

I wrote two new SeqIO-compliant streams that will return Bio::Seq objects from a table in either column-delimited ASCII text-format or contained in an Excel worksheet inside an Excel file, respectively. The table in either format is presumed to contain one seq per line (or row). The parser allows you to identify a few columns with implied semantic meaning (display_id, accession, species, sequence string).
All other columns may be selectively chosen to be preserved in the annotation bundle.

The motivation for this was that several comprehensive gene family publications made their data available in manually curated spreadsheets. I needed these data as a SeqIO-compliant stream, and going through an intermediary FASTA file can mess up the annotation a lot.

If anybody else is interested in this, or thinks this could be of general interest, I'll commit it to bioperl. I've enclosed the supported arguments for the SeqIO::table::new method; this will give an idea of what is configurable. The Excel parser supports the same arguments and, in addition, the name of the worksheet.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

Named parameters supported by the proposed Bio::SeqIO::table:

  -comment           leading character(s) introducing a comment line

  -header            the number of header lines to skip; the first
                     non-comment header line will be used to obtain
                     column names; column names will be used as the
                     default tags for attaching annotation.

  -delim             the delimiter for columns as a regular expression;
                     consecutive occurrences of the delimiter will not
                     be collapsed.

  -display_id        the one-based index of the column containing the
                     display ID of the sequence

  -accession_number  the one-based index of the column containing the
                     accession number of the sequence

  -seq               the one-based index of the column containing the
                     sequence string of the sequence

  -species           the one-based index of the column containing the
                     species for the sequence record; if not a number,
                     will be used as the static species common to all
                     records

  -annotation        if provided and a scalar, a flag whether or not all
                     additional columns are to be preserved as
                     annotation; the tags used will be 'colX' if there
                     is no column header (where X is the one-based
                     column index), and otherwise the column headers
                     will be used as tags; if a reference to an array,
                     only those columns (one-based index) will be
                     preserved as annotation, tags as before; if a
                     reference to a hash, the keys are one-based column
                     indexes to be preserved, and the values are the
                     tags under which the annotation is to be attached;
                     if not provided or supplied as undef, no additional
                     annotation will be preserved.

  -trim              flag determining whether or not all values should
                     be trimmed of leading and trailing white space

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From aldaihani at hotmail.co.uk Sun Apr 17 17:04:48 2005
From: aldaihani at hotmail.co.uk (badr al-daihani)
Date: Sun Apr 17 16:58:29 2005
Subject: [Bioperl-l] UniGene
In-Reply-To:
Message-ID:

Hi folks

Would you please tell me how to retrieve the UniGene number of a gene, given its GenBank accession number?

Best regards

Badr

_________________________________________________________________
It's fast, it's easy and it's free. Get MSN Messenger today!
http://www.msn.co.uk/messenger

From sdavis2 at mail.nih.gov Sun Apr 17 19:05:57 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun Apr 17 18:59:04 2005
Subject: [Bioperl-l] UniGene
References:
Message-ID: <000601c543a2$0581e910$5179f345@WATSON>

Badr,

The simplest way is to go to the ftp site for UniGene:

ftp://ftp.ncbi.nih.gov/repository/UniGene

Get the file for the organism you are interested in; its name ends in .gb_cid_lid. For example, the first few lines of Cre.gb_cid_lid look like:

AY171232  3219  -
AY171231  3003  -
AY171230  4370  -
AY171229  2671  -
AY184800  2793  -
AY184799  6486  -
AY184798  206   -
AY184797  3607  -
AY184796  2281  -
AY177787  2380  -
AY212923  4329  -
AB091079  3370  -

The first column contains GenBank accessions. The second contains UniGene cluster IDs. So, for the first GenBank accession, AY171232, the UniGene accession is Cre.3219 (you have to prepend the 2- or 3-letter organism code). You can read these into a hash that is keyed on the GenBank accession, with a value that is a reference to an array of UniGene cluster IDs for that accession (a GenBank accession can be in multiple UniGene clusters).

Hope that helps.
Sean

----- Original Message -----
From: "badr al-daihani"
To:
Sent: Sunday, April 17, 2005 5:04 PM
Subject: [Bioperl-l] UniGene

> Hi folks
>
> would you please tell me how to retrieve the unigene number of a gene
> (UniGene)
> knowing the GenBankaccession number ?
>
>
> Best regards
>
> Badr
>
> _________________________________________________________________
> It's fast, it's easy and it's free. Get MSN Messenger today!
> http://www.msn.co.uk/messenger > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From brian_osborne at cognia.com Mon Apr 18 08:46:37 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Apr 18 08:41:07 2005 Subject: [Bioperl-l] Entrezgene parser: new parser available from sourceforge (attached too) (fwd) In-Reply-To: Message-ID: Stefan, Unless I've missed something it seems that someone needs to add a test for this new parser, the best place would be t/SeqIO.t. Could you add a mini-Entrez Gene file to t/data as a first step? I'm assuming that such a thing exists... Thanks again, Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Stefan A Kirov Sent: Tuesday, April 05, 2005 3:07 PM To: Bioperl Subject: [Bioperl-l] Entrezgene parser: new parser available from sourceforge (attached too) (fwd) As often happens, NCBI introduced some small, but interesting changes to their ASN entrezgene format. Therefore Mingyi had to change the underlying low level parser. Anyone who uses his parser direcly will have to update. This will also delay the release of the first version of the bioperl entrezgene parser, which I anticipated to be on Thursday. I still hope I will commit the code on Friday. Stefan ---------- Forwarded message ---------- Date: Tue, 05 Apr 2005 13:35:57 -0400 From: Mingyi Liu To: Stefan A Kirov Subject: new parser available from sourceforge (attached too) Hi, Stefan, I attached the new version to this email. Unfortunately as expected, the new version is much slower due to the use of lookahead regexes (needed to accomodate the meaningless/buggy changes NCBI introduced). About 30% slower in my test. Still can't find a good reason why they introduced those 3 different types of changes. Can't be one bug. Anyways. Thanks for letting me know so early! 
Mingyi

From hota.fin at freemail.hu Mon Apr 18 10:54:43 2005
From: hota.fin at freemail.hu (Horvath Tamas)
Date: Mon Apr 18 09:47:09 2005
Subject: [Bioperl-l] Bio::Graphics::Glyph::triangle
Message-ID: <4263CA33.7020909@freemail.hu>

I have some problems with the triangle glyph. When I don't specify any orientation for the glyph, it stretches nicely. However, if I do specify one (E or W orientation), it only draws isosceles triangles. Can I overcome this problem?

Hota

From hota.fin at freemail.hu Mon Apr 18 11:09:50 2005
From: hota.fin at freemail.hu (Horvath Tamas)
Date: Mon Apr 18 10:06:52 2005
Subject: [Bioperl-l] Bio::Graphics::Glyph::triangle
Message-ID: <4263CDBE.7000003@freemail.hu>

I have some problems with the triangle glyph. When I don't specify any orientation for the glyph, it stretches nicely. However, if I do specify one (E or W orientation), it only draws isosceles triangles. Can I overcome this problem?

Hota

Actually, I found the problem. The original code in 'sub draw_component' is:

  elsif($orient eq 'W'){$vx1=$x2;$vy1=$y1;$vx2=$x2;$vy2=$y2;$vx3=$x2-$p;$vy3=$ymid;}
  elsif($orient eq 'E'){$vx1=$x1;$vy1=$y1;$vx2=$x1;$vy2=$y2;$vx3=$x1+$p;$vy3=$ymid;}

The $p has to be changed to ($q*2).

Then it creates nicely stretched triangles. However, it might be more convenient to use another type of glyph, like:

  ||||>

but I don't know how to create it. Do we have this kind of glyph?

Hota

From cain at cshl.edu Mon Apr 18 10:38:51 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Apr 18 10:35:50 2005
Subject: [Bioperl-l] Bio::Graphics::Glyph::triangle
In-Reply-To: <4263CDBE.7000003@freemail.hu>
References: <4263CDBE.7000003@freemail.hu>
Message-ID: <1113835131.5116.8.camel@localhost.localdomain>

Hota,

I'm not sure what your ASCII graphics are trying to represent. To me, it looks like you want a segments glyph and then to specify the strandedness so that it points either E or W. That would produce a rectangle with a triangle on the end.
Scott On Mon, 2005-04-18 at 17:09 +0200, Horvath Tamas wrote: > I have some problems with the triangle glyph. When I don't specify any > orientation to the glyph, it stretches nicely. However, if I do specify, > it only drows isoceles triangles. (with E or W orientation). Can I > overcome this problem? > > Hota > > Actually, I found the problem: > the original codein the 'sub draw_component': > > elsif($orient eq > 'W'){$vx1=$x2;$vy1=$y1;$vx2=$x2;$vy2=$y2;$vx3=$x2-$p;$vy3=$ymid;} > elsif($orient eq > 'E'){$vx1=$x1;$vy1=$y1;$vx2=$x1;$vy2=$y2;$vx3=$x1+$p;$vy3=$ymid;} > > the $p has to be changed to ($q*2) > > Then it creates nicely stretched triangles. However, it might be more > convenient to use an other type of glyph, like: > > ||||> > > but I don't know how to create it. Do we have this kind of glyph? > > Hota > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From skirov at utk.edu Mon Apr 18 10:47:44 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Apr 18 10:41:42 2005 Subject: [Bioperl-l] Entrezgene parser: new parser available from sourceforge (attached too) (fwd) In-Reply-To: References: Message-ID: <4263C890.4050408@utk.edu> Brian, I will put the tests, but I am a bit busy this week. I will try to do it on Friday or during the weekend. Stefan Brian Osborne wrote: >Stefan, > >Unless I've missed something it seems that someone needs to add a test for >this new parser, the best place would be t/SeqIO.t. Could you add a >mini-Entrez Gene file to t/data as a first step? I'm assuming that such a >thing exists... > >Thanks again, > >Brian O. 
> > >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org >[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Stefan A >Kirov >Sent: Tuesday, April 05, 2005 3:07 PM >To: Bioperl >Subject: [Bioperl-l] Entrezgene parser: new parser available from >sourceforge (attached too) (fwd) > > >As often happens, NCBI introduced some small, but interesting changes to >their ASN entrezgene format. Therefore Mingyi had to change the underlying >low level parser. Anyone who uses his parser direcly will have to update. >This will also delay the release of the first version of the bioperl >entrezgene parser, which I anticipated to be on Thursday. I still hope I >will commit the code on Friday. >Stefan > >---------- Forwarded message ---------- >Date: Tue, 05 Apr 2005 13:35:57 -0400 >From: Mingyi Liu >To: Stefan A Kirov >Subject: new parser available from sourceforge (attached too) > >Hi, Stefan, > >I attached the new version to this email. Unfortunately as expected, >the new version is much slower due to the use of lookahead regexes >(needed to accomodate the meaningless/buggy changes NCBI introduced). >About 30% slower in my test. Still can't find a good reason why they >introduced those 3 different types of changes. Can't be one bug. > >Anyways. Thanks for letting me know so early! > >Mingyi > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From sdavis2 at mail.nih.gov Mon Apr 18 11:45:48 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon Apr 18 11:39:14 2005 Subject: [Bioperl-l] UniGene References: Message-ID: <004201c5442d$b10a0c50$5179f345@WATSON> You can look into using eutils at NCBI, but there is not a bioperl solution for this as far as I know. If you have many (1000's of genbanks), eutils may be slow. If you want to make it easier to update the file, just include Net::FTP (available on CPAN) to grab the file for you each time you run the script. Sean ----- Original Message ----- From: "badr al-daihani" To: Sent: Monday, April 18, 2005 11:42 AM Subject: Re: [Bioperl-l] UniGene > Dear Sean Davis > Thank you for your reply. > your solution is fine. but the problem, this means that I have to update > my file from time to time. Is any other dynamic solution which allow me to > connect to GenBank and fetch the UniGene using GenBank Accession > > > Best regards > > Badr > > > > > >>From: "Sean Davis" >>To: "badr al-daihani" , >>Subject: Re: [Bioperl-l] UniGene >>Date: Sun, 17 Apr 2005 19:05:57 -0400 >> >>Badr, >> >>The simplest way is to go to the ftp site for unigene: >> >>ftp://ftp.ncbi.nih.gov/repository/UniGene >> >>Get the file for the organism you are interested that ends in .gb_cid_lid. >>Just choose the file for the organism of interest. For example, the first >>few lines of Cre.gb_cid_lid look like: >> >>AY171232 3219 - >>AY171231 3003 - >>AY171230 4370 - >>AY171229 2671 - >>AY184800 2793 - >>AY184799 6486 - >>AY184798 206 - >>AY184797 3607 - >>AY184796 2281 - >>AY177787 2380 - >>AY212923 4329 - >>AB091079 3370 - >> >>The first column contains genbank accessions. 
The second contains unigene >>cluster ids. So, for the first genbank AY171232, the unigene accession is >>Cre.3219 (You have to prepend the 2- or 3- letter organism code). You can >>read these into a hash that is keyed on the genbank accession with a value >>that is the reference to an array of unigene cluster IDs for each genbank >>(a genbank can be in multiple unigene clusters). >> >>Hope that helps. >>Sean >> >>----- Original Message ----- From: "badr al-daihani" >> >>To: >>Sent: Sunday, April 17, 2005 5:04 PM >>Subject: [Bioperl-l] UniGene >> >> >>>Hi folks >>> >>>would you please tell me how to retrieve the unigene number of a gene >>>(UniGene) >>>knowing the GenBankaccession number ? >>> >>> >>>Best regards >>> >>>Badr >>> >>>_________________________________________________________________ >>>It's fast, it's easy and it's free. Get MSN Messenger today! >>>http://www.msn.co.uk/messenger >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _________________________________________________________________ > It's fast, it's easy and it's free. Get MSN Messenger today! > http://www.msn.co.uk/messenger > From hlapp at gmx.net Mon Apr 18 13:01:08 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon Apr 18 12:54:37 2005 Subject: [Bioperl-l] Entrezgene parser: new parser available from sourceforge (attached too) (fwd) In-Reply-To: Message-ID: <7597B3D9-B02B-11D9-A2D2-000A959EB4C4@gmx.net> I'd rather put it into its own test just to keep things easier for now. Once the parser and API has settled it can still be appended to the SeqIO.t. 
-hilmar On Monday, April 18, 2005, at 05:46 AM, Brian Osborne wrote: > Stefan, > > Unless I've missed something it seems that someone needs to add a test > for > this new parser, the best place would be t/SeqIO.t. Could you add a > mini-Entrez Gene file to t/data as a first step? I'm assuming that > such a > thing exists... > > Thanks again, > > Brian O. > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Stefan A > Kirov > Sent: Tuesday, April 05, 2005 3:07 PM > To: Bioperl > Subject: [Bioperl-l] Entrezgene parser: new parser available from > sourceforge (attached too) (fwd) > > > As often happens, NCBI introduced some small, but interesting changes > to > their ASN entrezgene format. Therefore Mingyi had to change the > underlying > low level parser. Anyone who uses his parser direcly will have to > update. > This will also delay the release of the first version of the bioperl > entrezgene parser, which I anticipated to be on Thursday. I still hope > I > will commit the code on Friday. > Stefan > > ---------- Forwarded message ---------- > Date: Tue, 05 Apr 2005 13:35:57 -0400 > From: Mingyi Liu > To: Stefan A Kirov > Subject: new parser available from sourceforge (attached too) > > Hi, Stefan, > > I attached the new version to this email. Unfortunately as expected, > the new version is much slower due to the use of lookahead regexes > (needed to accomodate the meaningless/buggy changes NCBI introduced). > About 30% slower in my test. Still can't find a good reason why they > introduced those 3 different types of changes. Can't be one bug. > > Anyways. Thanks for letting me know so early! 
> > Mingyi > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From skirov at utk.edu Mon Apr 18 14:02:18 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Apr 18 13:56:00 2005 Subject: [Bioperl-l] Entrezgene parser: new parser available from sourceforge (attached too) (fwd) In-Reply-To: <7597B3D9-B02B-11D9-A2D2-000A959EB4C4@gmx.net> References: <7597B3D9-B02B-11D9-A2D2-000A959EB4C4@gmx.net> Message-ID: <4263F62A.10600@utk.edu> OK. Makes sense. Stefan Hilmar Lapp wrote: > I'd rather put it into its own test just to keep things easier for > now. Once the parser and API has settled it can still be appended to > the SeqIO.t. > > -hilmar > > On Monday, April 18, 2005, at 05:46 AM, Brian Osborne wrote: > >> Stefan, >> >> Unless I've missed something it seems that someone needs to add a >> test for >> this new parser, the best place would be t/SeqIO.t. Could you add a >> mini-Entrez Gene file to t/data as a first step? I'm assuming that >> such a >> thing exists... >> >> Thanks again, >> >> Brian O. >> >> >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Stefan A >> Kirov >> Sent: Tuesday, April 05, 2005 3:07 PM >> To: Bioperl >> Subject: [Bioperl-l] Entrezgene parser: new parser available from >> sourceforge (attached too) (fwd) >> >> >> As often happens, NCBI introduced some small, but interesting changes to >> their ASN entrezgene format. Therefore Mingyi had to change the >> underlying >> low level parser. Anyone who uses his parser direcly will have to >> update. 
>> This will also delay the release of the first version of the bioperl >> entrezgene parser, which I anticipated to be on Thursday. I still hope I >> will commit the code on Friday. >> Stefan >> >> ---------- Forwarded message ---------- >> Date: Tue, 05 Apr 2005 13:35:57 -0400 >> From: Mingyi Liu >> To: Stefan A Kirov >> Subject: new parser available from sourceforge (attached too) >> >> Hi, Stefan, >> >> I attached the new version to this email. Unfortunately as expected, >> the new version is much slower due to the use of lookahead regexes >> (needed to accomodate the meaningless/buggy changes NCBI introduced). >> About 30% slower in my test. Still can't find a good reason why they >> introduced those 3 different types of changes. Can't be one bug. >> >> Anyways. Thanks for letting me know so early! >> >> Mingyi >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From smarkel at scitegic.com Mon Apr 18 15:14:00 2005 From: smarkel at scitegic.com (Scott Markel) Date: Mon Apr 18 15:08:43 2005 Subject: [Bioperl-l] why does Bio::Tools::Run::Hmmer return a SearchIO object for hmmalign? In-Reply-To: <425C74DB.8010201@scitegic.com> References: <425C6684.7020609@scitegic.com> <37a1bc0dd995d0978cd7ef91fa216495@duke.edu> <425C74DB.8010201@scitegic.com> Message-ID: <426406F8.4050604@scitegic.com> Here are my proposed code changes for Bio::Tools::Run::Hmmer. 

11c11
< #run hmmpfam|hmmalign|hmmsearch
---
> #run hmmpfam|hmmsearch
37a38,45
> #align one or more sequences to an existing hmm using hmmalign
> my $factory = Bio::Tools::Run::Hmmer->new('program'=>'hmmalign','hmm'=>'model.hmm');
>
> # Pass the factory a Bio::Seq object or a file name
>
> # returns a Bio::AlignIO object
> my $aio = $factory->run($seq);
>
216c224
< if($self->program_name=~/hmmpfam|hmmsearch|hmmalign/){
---
> if($self->program_name=~/hmmpfam|hmmsearch/){
220a229,241
> elsif($self->program_name=~/hmmalign/){
> # add "-q" to make sure the banner is suppressed; note that options go
> # after hmmalign and before the file names
> my @tokens = split(/\s+/, $str);
> $str = shift(@tokens) . " -q " . join(" ", @tokens);
>
> open(HMM,"$str |") || $self->throw("HMMER call ($str) crashed: $?\n");
>
> # warning: mismatch if user has set the format using "--outformat"
> my $alignio = Bio::AlignIO->new(-fh=>\*HMM,-format=>"stockholm");
>
> return $alignio;
> }

Would someone please verify that my changes are acceptable and then do the CVS update?

Scott

Scott Markel wrote:

> Jason,
>
> It doesn't really work. It gives a SearchIO object with no
> hits.
>
> I'll take a shot at modifying Bio::Tools::Run::Hmmer to handle
> hmmalign differently.
>
> Scott
>
> Jason Stajich wrote:
>
>> That is curious. Is it even a proper searchIO object - I can't
>> imagine it would work? I assume you know what it would take in
>> Tools::Run::HMMER to do something different for hmmalign runs.
>>
>> -jason
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>> On Apr 12, 2005, at 8:23 PM, Scott Markel wrote:
>>
>>> I'm curious as to why Bio::Tools::Run::Hmmer returns a
>>> SearchIO object when the program is hmmalign. I would
>>> have expected an AlignIO object since the result of an
>>> hmmalign execution is a Stockholm formatted alignment file.
>>>
>>> I expect that I'm missing something obvious, but I don't
>>> see it.
An online search, including the mailing list >>> archive, didn't help. >>> >>> Scott >>> >>> -- >>> Scott Markel, Ph.D. >>> Principal Bioinformatics Architect email: smarkel@scitegic.com >>> SciTegic Inc. mobile: +1 858 205 3653 >>> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 >>> San Diego, CA 92123 fax: +1 858 279 8804 >>> USA web: http://www.scitegic.com >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From smarkel at scitegic.com Mon Apr 18 15:17:10 2005 From: smarkel at scitegic.com (Scott Markel) Date: Mon Apr 18 15:10:56 2005 Subject: [Bioperl-l] possible filehandle out of scope bug between Bio::AlignIO and Bio::Tools::Run::Hmmer Message-ID: <426407B6.5060603@scitegic.com> Note: The context for this message assumes the code change to Bio::Tools::Run::Hmmer that I just sent to the mailing list. I'm running BioPerl-1.4 on both Windows XP (Perl 5.8.0) and cygwin (Perl 5.8.5). I get the same behavior when I use BioPerl-1.5. When I run the following code, I get the error message Can't call method "consensus_string" on an undefined value at runHmmAlign.pl line 14. If I change $factory to $::factory, so that it doesn't go out of scope when the subroutine is done, then everything is fine. My Perl debugging skills aren't what they should be, so I'm not sure how to verify the following, but it looks like the destructor for Bio::Tools::Run::Hmmer clobbers the filehandle in Bio::AlignIO. Similar code involving Bio::Tools::Run::Hmmer and Bio::SearchIO (for hmmsearch) does not have this problem. 
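
A minimal, BioPerl-free sketch of the kind of failure suspected above. It assumes (this is not confirmed anywhere in the thread) that the factory's destructor closes a filehandle it shares with the reader object it returned. Demo::Factory and Demo::Reader are hypothetical stand-ins, not BioPerl classes, and an in-memory filehandle stands in for the pipe to hmmalign:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a run factory that owns a filehandle.
package Demo::Factory;

sub new {
    my ($class, $text) = @_;
    # In-memory filehandle; stands in for the pipe opened to the program.
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
    return bless { fh => $fh }, $class;
}

# Hand the same filehandle to a reader object.
sub run {
    my ($self) = @_;
    return Demo::Reader->new($self->{fh});
}

# Assumed behaviour: the factory closes its handle when it is destroyed.
sub DESTROY {
    my ($self) = @_;
    close($self->{fh}) if $self->{fh};
}

# Hypothetical stand-in for the stream object returned to the caller.
package Demo::Reader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

sub next_line {
    my ($self) = @_;
    my $fh = $self->{fh};
    no warnings 'closed';    # reading a closed handle just yields undef here
    return scalar readline($fh);
}

package main;

# The factory is lexical to the sub, so it is destroyed on return.
sub run_scoped {
    my $factory = Demo::Factory->new("line one\n");
    return $factory->run();
}

my $line_scoped = run_scoped()->next_line();

# Keeping the factory alive in the caller's scope avoids the early close.
my $kept_factory = Demo::Factory->new("line one\n");
my $line_kept    = $kept_factory->run()->next_line();

print "scoped factory: ", (defined $line_scoped ? $line_scoped : "nothing read\n");
print "kept factory:   ", (defined $line_kept   ? $line_kept   : "nothing read\n");
```

If the real cause is something else, this sketch only shows why moving the factory to a longer-lived variable could mask such a problem; it says nothing about what Bio::Tools::Run::Hmmer actually does.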
I checked the bug list, but didn't find anything for AlignIO and filehandle.

Scott

==============================
use strict;
use warnings;

use Bio::Tools::Run::Hmmer;

my $hmmFile = shift;
my $sequenceFile = shift;

my $in = Bio::SeqIO->new(-file => $sequenceFile , -format => "fasta");
my $sequence = $in->next_seq();

my $hmmResults = runHmmAlign($hmmFile, $sequence);
my $alignment = $hmmResults->next_aln();
my $consensusString = $alignment->consensus_string();
print("$consensusString\n");

sub runHmmAlign
{
    my ($hmmFile, $sequence) = @_;

    my $hmmResults;

    eval
    {
        my $factory = Bio::Tools::Run::Hmmer->new("program" => "hmmalign",
                                                  "hmm" => $hmmFile);
        $hmmResults = $factory->run($sequence);
    };

    if ($@)
    {
        die("hmmalign failed: $@\n");
    }

    return $hmmResults;
}
==============================

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From iluminati at earthlink.net Mon Apr 18 16:40:21 2005
From: iluminati at earthlink.net (iluminati@earthlink.net)
Date: Mon Apr 18 16:34:14 2005
Subject: [Bioperl-l] UniGene
In-Reply-To: <010a01c54419$96883ff0$5179f345@WATSON>
References: <000601c543a2$0581e910$5179f345@WATSON> <4263B1BE.80906@earthlink.net> <010a01c54419$96883ff0$5179f345@WATSON>
Message-ID: <42641B35.3030002@earthlink.net>

Sean,

Thanks for the heads-up on Unigene vs. Gene_ID. I'll definitely take a look-see on that.

The main thing I'm looking for is a means to get general info on what family a gene is in. I keep noticing certain kinds of genes with similar characteristics, and all I want is a good handle on the ballpark these genes are in. Most of the dirty work has been done in characterizing them. I need a means for me to check what gene is in which family, and UniGene seems like the easiest way to do so.
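
The lookup described earlier in this thread (a hash keyed on GenBank accession whose values are references to arrays of UniGene cluster IDs) might be sketched as follows. The rows are the first few from the Cre.gb_cid_lid excerpt quoted in the thread; splitting on whitespace and reading from an in-memory list rather than the downloaded file are assumptions made to keep the sketch self-contained:

```perl
use strict;
use warnings;

# Rows copied from the Cre.gb_cid_lid excerpt quoted in this thread.
# Assumption: columns are whitespace-separated (accession, cluster ID, third field).
my @rows = (
    "AY171232 3219 -",
    "AY171231 3003 -",
    "AY171230 4370 -",
    "AY171229 2671 -",
);

# Organism code taken from the file name (Cre.gb_cid_lid).
my $organism = 'Cre';

# GenBank accession => reference to an array of UniGene accessions.
my %unigene_for;
for my $row (@rows) {
    my ($accession, $cluster_id) = split ' ', $row;
    next unless defined $cluster_id;    # skip blank or malformed rows
    push @{ $unigene_for{$accession} }, "$organism.$cluster_id";
}

my $query    = 'AY171232';
my @clusters = @{ $unigene_for{$query} || [] };
print "$query => @clusters\n";    # prints "AY171232 => Cre.3219"
```

An array reference is used as the value because, as noted in the thread, one GenBank accession can belong to more than one UniGene cluster; for a real file the loop body would stay the same and only the source of the rows would change.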
Todd Sean Davis wrote: > Todd, > > Probably the easiest way if you start with refseq ids (like NM_000022) > is to map refseq to gene_id and then gene_id to unigene--are you > starting with refseq ID's? The files to do so are located in the > directory: > > ftp://ftp.ncbi.nih.gov/gene/DATA/ > > The files are pretty self-explanatory. gene2refseq and gene2unigene > will get you what you want. They are simply tab-delimited files. > Just make a hash keyed by refseq with value gene_id and then a hash > with gene_id as key and a reference to an array of unigene ids as > value (there can be more than one--or none--unigene id for each gene_id). > > What are you ultimately trying to do? There isn't much information > associated with unigene ids anymore, and the unigene ids are only > stable for about a month anymore. Any thought you would want to do > your analysis using gene_id instead of unigene? > > Sean > > P.S. If there isn't a reason for this NOT to go back to the bioperl > list, it is generally OK (and VERY much encouraged) to post back to > the list, even on these "by-the-way" questions. > > ----- Original Message ----- From: > To: "Sean Davis" > Sent: Monday, April 18, 2005 9:10 AM > Subject: Re: [Bioperl-l] UniGene > > >> Sean: >> >> I have an interesting question. I have a bunch of genes for which I >> have RefGene ID numbers, plus a couple of tables with which I can >> find either their KnownGene or GNFAtlas equivalents. Is there some >> way I can get the Unigene IDs? I'd love to know if there's some easy >> way for me to start looking at Unigene families. Thanks in advance. >> >> Todd Graham >> >> Sean Davis wrote: >> >>> Badr, >>> >>> The simplest way is to go to the ftp site for unigene: >>> >>> ftp://ftp.ncbi.nih.gov/repository/UniGene >>> >>> Get the file for the organism you are interested that ends in >>> .gb_cid_lid. Just choose the file for the organism of interest. 
For >>> example, the first few lines of Cre.gb_cid_lid look like: >>> >>> AY171232 3219 - >>> AY171231 3003 - >>> AY171230 4370 - >>> AY171229 2671 - >>> AY184800 2793 - >>> AY184799 6486 - >>> AY184798 206 - >>> AY184797 3607 - >>> AY184796 2281 - >>> AY177787 2380 - >>> AY212923 4329 - >>> AB091079 3370 - >>> >>> The first column contains genbank accessions. The second contains >>> unigene cluster ids. So, for the first genbank AY171232, the >>> unigene accession is Cre.3219 (You have to prepend the 2- or 3- >>> letter organism code). You can read these into a hash that is keyed >>> on the genbank accession with a value that is the reference to an >>> array of unigene cluster IDs for each genbank (a genbank can be in >>> multiple unigene clusters). >>> >>> Hope that helps. >>> Sean >>> >>> ----- Original Message ----- From: "badr al-daihani" >>> >>> To: >>> Sent: Sunday, April 17, 2005 5:04 PM >>> Subject: [Bioperl-l] UniGene >>> >>> >>>> Hi folks >>>> >>>> would you please tell me how to retrieve the unigene number of a >>>> gene (UniGene) >>>> knowing the GenBankaccession number ? >>>> >>>> >>>> Best regards >>>> >>>> Badr >>>> >>>> _________________________________________________________________ >>>> It's fast, it's easy and it's free. Get MSN Messenger today! 
>>>> http://www.msn.co.uk/messenger >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > From jason.stajich at duke.edu Mon Apr 18 16:52:19 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Apr 18 16:45:58 2005 Subject: [Bioperl-l] possible filehandle out of scope bug between Bio::AlignIO and Bio::Tools::Run::Hmmer In-Reply-To: <426407B6.5060603@scitegic.com> References: <426407B6.5060603@scitegic.com> Message-ID: <51dd287fb67a79b9472322a80051f01c@duke.edu> i think it has more to do with not passing -q into hmmalign which causes AlignIO to barf since the first few lines are header from hmmalign. I checked in your changes with some tweaks. It seems to work for me -- added a test and all. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 18, 2005, at 3:17 PM, Scott Markel wrote: > Note: The context for this message assumes the code change > to Bio::Tools::Run::Hmmer that I just sent to the mailing > list. > > I'm running BioPerl-1.4 on both Windows XP (Perl 5.8.0) and > cygwin (Perl 5.8.5). I get the same behavior when I use > BioPerl-1.5. > > When I run the following code, I get the error message > > Can't call method "consensus_string" on an undefined value > at runHmmAlign.pl line 14. > > If I change $factory to $::factory, so that it doesn't go out > of scope when the subroutine is done, then everything is fine. > > My Perl debugging skills aren't what they should be, so I'm > not sure how to verify the following, but it looks like the > destructor for Bio::Tools::Run::Hmmer clobbers the filehandle > in Bio::AlignIO. 
Similar code involving Bio::Tools::Run::Hmmer > and Bio::SearchIO (for hmmsearch) does not have this problem. > > I checked the bug list, but didn't find anything for AlignIO > and filehandle. > > Scott > > ============================== > use strict; > use warnings; > > use Bio::Tools::Run::Hmmer; > > my $hmmFile = shift; > my $sequenceFile = shift; > > my $in = Bio::SeqIO->new(-file => $sequenceFile , -format => "fasta"); > my $sequence = $in->next_seq(); > > my $hmmResults = runHmmAlign($hmmFile, $sequence); > my $alignment = $hmmResults->next_aln(); > my $consensusString = $alignment->consensus_string(); > print("$consensusString\n"); > > sub runHmmAlign > { > my ($hmmFile, $sequence) = @_; > > my $hmmResults; > > eval > { > my $factory = Bio::Tools::Run::Hmmer->new("program" => > "hmmalign", > "hmm" => > $hmmFile); > $hmmResults = $factory->run($sequence); > }; > > if ($@) > { > die("hmmalign failed: $@\n"); > } > > return $hmmResults; > } > ============================== > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel@scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From smarkel at scitegic.com Mon Apr 18 16:59:21 2005 From: smarkel at scitegic.com (Scott Markel) Date: Mon Apr 18 16:52:42 2005 Subject: [Bioperl-l] possible filehandle out of scope bug between Bio::AlignIO and Bio::Tools::Run::Hmmer In-Reply-To: <51dd287fb67a79b9472322a80051f01c@duke.edu> References: <426407B6.5060603@scitegic.com> <51dd287fb67a79b9472322a80051f01c@duke.edu> Message-ID: <42641FA9.60006@scitegic.com> Jason, I'm pretty sure I was using -q. 
I specifically add that option at the beginning of the hmmalign elsif block in Bio::Tool::Run::Hmmer's _run subroutine. While I was debugging, the contents of the filehandle were just the Stockholm formatted alignment file - no hmmalign header. Scott Scott Jason Stajich wrote: > i think it has more to do with not passing -q into hmmalign which causes > AlignIO to barf since the first few lines are header from hmmalign. > > I checked in your changes with some tweaks. It seems to work for me -- > added a test and all. > > -jason > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Apr 18, 2005, at 3:17 PM, Scott Markel wrote: > >> Note: The context for this message assumes the code change >> to Bio::Tools::Run::Hmmer that I just sent to the mailing >> list. >> >> I'm running BioPerl-1.4 on both Windows XP (Perl 5.8.0) and >> cygwin (Perl 5.8.5). I get the same behavior when I use >> BioPerl-1.5. >> >> When I run the following code, I get the error message >> >> Can't call method "consensus_string" on an undefined value >> at runHmmAlign.pl line 14. >> >> If I change $factory to $::factory, so that it doesn't go out >> of scope when the subroutine is done, then everything is fine. >> >> My Perl debugging skills aren't what they should be, so I'm >> not sure how to verify the following, but it looks like the >> destructor for Bio::Tools::Run::Hmmer clobbers the filehandle >> in Bio::AlignIO. Similar code involving Bio::Tools::Run::Hmmer >> and Bio::SearchIO (for hmmsearch) does not have this problem. >> >> I checked the bug list, but didn't find anything for AlignIO >> and filehandle. 
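
The mechanism suspected in this thread (an object removing its temporary files when it goes out of scope while a returned reader still needs them) can be reproduced in plain Perl. The sketch below is only an illustration: `Demo::Factory` and `Demo::LazyReader` are hypothetical stand-ins, and nothing here claims to show how Bio::Tools::Run::Hmmer or Bio::AlignIO are actually implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical stand-in for a "factory" that owns a temporary directory.
package Demo::Factory;

sub new {
    my ($class) = @_;
    # File::Temp->newdir removes the directory when the object is destroyed.
    return bless { tmpdir => File::Temp->newdir() }, $class;
}

sub run {
    my ($self) = @_;
    my $path = $self->{tmpdir}->dirname . "/result.txt";
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} "ALIGNMENT\n";
    close $out;
    # The reader only remembers the path; nothing has been read yet.
    return Demo::LazyReader->new($path);
}

# Hypothetical stand-in for a parser that opens its input lazily.
package Demo::LazyReader;

sub new {
    my ($class, $path) = @_;
    return bless { path => $path }, $class;
}

sub next_record {
    my ($self) = @_;
    open my $in, '<', $self->{path} or return;   # file may be gone already
    my $line = <$in>;
    chomp $line if defined $line;
    return $line;
}

package main;

sub run_scoped {
    my $factory = Demo::Factory->new;    # destroyed when this sub returns
    return $factory->run;
}

our $kept_factory;
sub run_kept {
    $kept_factory = Demo::Factory->new;  # package variable outlives the sub
    return $kept_factory->run;
}

my $lost = run_scoped()->next_record;
my $kept = run_kept()->next_record;
print "factory out of scope: ", (defined $lost ? $lost : "nothing to read"), "\n";
print "factory kept alive:   ", (defined $kept ? $kept : "nothing to read"), "\n";
```

Keeping the factory in a package variable, as in the `$::factory` workaround, simply delays the cleanup until the reader has been used.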
>> >> Scott >> >> ============================== >> use strict; >> use warnings; >> >> use Bio::Tools::Run::Hmmer; >> >> my $hmmFile = shift; >> my $sequenceFile = shift; >> >> my $in = Bio::SeqIO->new(-file => $sequenceFile , -format => "fasta"); >> my $sequence = $in->next_seq(); >> >> my $hmmResults = runHmmAlign($hmmFile, $sequence); >> my $alignment = $hmmResults->next_aln(); >> my $consensusString = $alignment->consensus_string(); >> print("$consensusString\n"); >> >> sub runHmmAlign >> { >> my ($hmmFile, $sequence) = @_; >> >> my $hmmResults; >> >> eval >> { >> my $factory = Bio::Tools::Run::Hmmer->new("program" => >> "hmmalign", >> "hmm" => $hmmFile); >> $hmmResults = $factory->run($sequence); >> }; >> >> if ($@) >> { >> die("hmmalign failed: $@\n"); >> } >> >> return $hmmResults; >> } >> ============================== >> >> -- >> Scott Markel, Ph.D. >> Principal Bioinformatics Architect email: smarkel@scitegic.com >> SciTegic Inc. mobile: +1 858 205 3653 >> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 >> San Diego, CA 92123 fax: +1 858 279 8804 >> USA web: http://www.scitegic.com >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From jason.stajich at duke.edu Mon Apr 18 17:26:27 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Apr 18 17:23:01 2005 Subject: [Bioperl-l] possible filehandle out of scope bug between Bio::AlignIO and Bio::Tools::Run::Hmmer In-Reply-To: <42641FA9.60006@scitegic.com> References: <426407B6.5060603@scitegic.com> <51dd287fb67a79b9472322a80051f01c@duke.edu> <42641FA9.60006@scitegic.com> Message-ID: <5f2462380c4b68b4ceac7e47010b11ca@duke.edu> The errors I get seem to be with the sequence file not being filled prior to running as I am getting FATAL errors from hmmalign - but I'm really not sure why this would happen. FATAL: Failed to read any sequences from file /tmp/yHY9rwASfV/SyE1fpOHIM Can't call method "consensus_string" on an undefined value at /home/jason/scott_hmmrun.pl line 14. I don't have time to really debug - but you are probably right that a cleanup method is getting called, I am just not sure how. -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 18, 2005, at 4:59 PM, Scott Markel wrote: > Jason, > > I'm pretty sure I was using -q. I specifically add that > option at the beginning of the hmmalign elsif block in > Bio::Tool::Run::Hmmer's _run subroutine. While I was > debugging, the contents of the filehandle were just the > Stockholm formatted alignment file - no hmmalign header. > > Scott > > Scott > > Jason Stajich wrote: > >> i think it has more to do with not passing -q into hmmalign which >> causes AlignIO to barf since the first few lines are header from >> hmmalign. >> I checked in your changes with some tweaks. It seems to work for me >> -- added a test and all. 
>> -jason >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> On Apr 18, 2005, at 3:17 PM, Scott Markel wrote: >>> Note: The context for this message assumes the code change >>> to Bio::Tools::Run::Hmmer that I just sent to the mailing >>> list. >>> >>> I'm running BioPerl-1.4 on both Windows XP (Perl 5.8.0) and >>> cygwin (Perl 5.8.5). I get the same behavior when I use >>> BioPerl-1.5. >>> >>> When I run the following code, I get the error message >>> >>> Can't call method "consensus_string" on an undefined value >>> at runHmmAlign.pl line 14. >>> >>> If I change $factory to $::factory, so that it doesn't go out >>> of scope when the subroutine is done, then everything is fine. >>> >>> My Perl debugging skills aren't what they should be, so I'm >>> not sure how to verify the following, but it looks like the >>> destructor for Bio::Tools::Run::Hmmer clobbers the filehandle >>> in Bio::AlignIO. Similar code involving Bio::Tools::Run::Hmmer >>> and Bio::SearchIO (for hmmsearch) does not have this problem. >>> >>> I checked the bug list, but didn't find anything for AlignIO >>> and filehandle. 
>>> >>> Scott >>> >>> ============================== >>> use strict; >>> use warnings; >>> >>> use Bio::Tools::Run::Hmmer; >>> >>> my $hmmFile = shift; >>> my $sequenceFile = shift; >>> >>> my $in = Bio::SeqIO->new(-file => $sequenceFile , -format => >>> "fasta"); >>> my $sequence = $in->next_seq(); >>> >>> my $hmmResults = runHmmAlign($hmmFile, $sequence); >>> my $alignment = $hmmResults->next_aln(); >>> my $consensusString = $alignment->consensus_string(); >>> print("$consensusString\n"); >>> >>> sub runHmmAlign >>> { >>> my ($hmmFile, $sequence) = @_; >>> >>> my $hmmResults; >>> >>> eval >>> { >>> my $factory = Bio::Tools::Run::Hmmer->new("program" => >>> "hmmalign", >>> "hmm" => >>> $hmmFile); >>> $hmmResults = $factory->run($sequence); >>> }; >>> >>> if ($@) >>> { >>> die("hmmalign failed: $@\n"); >>> } >>> >>> return $hmmResults; >>> } >>> ============================== >>> >>> -- >>> Scott Markel, Ph.D. >>> Principal Bioinformatics Architect email: smarkel@scitegic.com >>> SciTegic Inc. mobile: +1 858 205 3653 >>> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 >>> San Diego, CA 92123 fax: +1 858 279 8804 >>> USA web: http://www.scitegic.com >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> > > -- > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel@scitegic.com > SciTegic Inc. mobile: +1 858 205 3653 > 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253 > San Diego, CA 92123 fax: +1 858 279 8804 > USA web: http://www.scitegic.com > From smarkel at scitegic.com Mon Apr 18 17:36:46 2005 From: smarkel at scitegic.com (Scott Markel) Date: Mon Apr 18 17:34:11 2005 Subject: [Bioperl-l] possible filehandle out of scope bug between Bio::AlignIO and Bio::Tools::Run::Hmmer In-Reply-To: <5f2462380c4b68b4ceac7e47010b11ca@duke.edu> References: <426407B6.5060603@scitegic.com> <51dd287fb67a79b9472322a80051f01c@duke.edu> <42641FA9.60006@scitegic.com> <5f2462380c4b68b4ceac7e47010b11ca@duke.edu> Message-ID: <4264286E.9060608@scitegic.com> Jason, Okay. I'll stick with my workaround of giving the HMMER factory package scope. If you get time later to look at the filehandle issue, please let me know. I'm happy to help, but feel out of my depth doing this one alone. Thanks for taking care of the CVS check-in for hmmalign execution. Scott Jason Stajich wrote: > The errors I get seem to be with the sequence file not being filled > prior to running as I am getting FATAL errors from hmmalign - but I'm > really not sure why this would happen. > > FATAL: Failed to read any sequences from file /tmp/yHY9rwASfV/SyE1fpOHIM > Can't call method "consensus_string" on an undefined value at > /home/jason/scott_hmmrun.pl line 14. > > I don't have time to really debug - but you are probably right that a > cleanup method is getting called, I am just not sure how. > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Apr 18, 2005, at 4:59 PM, Scott Markel wrote: > >> Jason, >> >> I'm pretty sure I was using -q. I specifically add that >> option at the beginning of the hmmalign elsif block in >> Bio::Tool::Run::Hmmer's _run subroutine. While I was >> debugging, the contents of the filehandle were just the >> Stockholm formatted alignment file - no hmmalign header. 
>> >> Scott >> >> Scott >> >> Jason Stajich wrote: >> >>> i think it has more to do with not passing -q into hmmalign which >>> causes AlignIO to barf since the first few lines are header from >>> hmmalign. >>> I checked in your changes with some tweaks. It seems to work for me >>> -- added a test and all. >>> -jason >>> -- >>> Jason Stajich >>> jason.stajich at duke.edu >>> http://www.duke.edu/~jes12/ >>> On Apr 18, 2005, at 3:17 PM, Scott Markel wrote: >>> >>>> Note: The context for this message assumes the code change >>>> to Bio::Tools::Run::Hmmer that I just sent to the mailing >>>> list. >>>> >>>> I'm running BioPerl-1.4 on both Windows XP (Perl 5.8.0) and >>>> cygwin (Perl 5.8.5). I get the same behavior when I use >>>> BioPerl-1.5. >>>> >>>> When I run the following code, I get the error message >>>> >>>> Can't call method "consensus_string" on an undefined value >>>> at runHmmAlign.pl line 14. >>>> >>>> If I change $factory to $::factory, so that it doesn't go out >>>> of scope when the subroutine is done, then everything is fine. >>>> >>>> My Perl debugging skills aren't what they should be, so I'm >>>> not sure how to verify the following, but it looks like the >>>> destructor for Bio::Tools::Run::Hmmer clobbers the filehandle >>>> in Bio::AlignIO. Similar code involving Bio::Tools::Run::Hmmer >>>> and Bio::SearchIO (for hmmsearch) does not have this problem. >>>> >>>> I checked the bug list, but didn't find anything for AlignIO >>>> and filehandle. 
>>>> >>>> Scott >>>> >>>> ============================== >>>> use strict; >>>> use warnings; >>>> >>>> use Bio::Tools::Run::Hmmer; >>>> >>>> my $hmmFile = shift; >>>> my $sequenceFile = shift; >>>> >>>> my $in = Bio::SeqIO->new(-file => $sequenceFile , -format => "fasta"); >>>> my $sequence = $in->next_seq(); >>>> >>>> my $hmmResults = runHmmAlign($hmmFile, $sequence); >>>> my $alignment = $hmmResults->next_aln(); >>>> my $consensusString = $alignment->consensus_string(); >>>> print("$consensusString\n"); >>>> >>>> sub runHmmAlign >>>> { >>>> my ($hmmFile, $sequence) = @_; >>>> >>>> my $hmmResults; >>>> >>>> eval >>>> { >>>> my $factory = Bio::Tools::Run::Hmmer->new("program" => >>>> "hmmalign", >>>> "hmm" => >>>> $hmmFile); >>>> $hmmResults = $factory->run($sequence); >>>> }; >>>> >>>> if ($@) >>>> { >>>> die("hmmalign failed: $@\n"); >>>> } >>>> >>>> return $hmmResults; >>>> } >>>> ============================== >>>> >>>> -- >>>> Scott Markel, Ph.D. >>>> Principal Bioinformatics Architect email: smarkel@scitegic.com >>>> SciTegic Inc. mobile: +1 858 205 3653 >>>> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 >>>> San Diego, CA 92123 fax: +1 858 279 8804 >>>> USA web: http://www.scitegic.com >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >> >> -- >> Scott Markel, Ph.D. >> Principal Bioinformatics Architect email: smarkel@scitegic.com >> SciTegic Inc. mobile: +1 858 205 3653 >> 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 >> San Diego, CA 92123 fax: +1 858 279 8804 >> USA web: http://www.scitegic.com >> > > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 
253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From dlondon at ebi.ac.uk Tue Apr 19 05:16:30 2005 From: dlondon at ebi.ac.uk (Darin London) Date: Tue Apr 19 07:04:40 2005 Subject: [Bioperl-l] Re: BOSC 2005 In-Reply-To: <20050120175859.GA7254@parrot.ebi.ac.uk> References: <20050120175859.GA7254@parrot.ebi.ac.uk> Message-ID: <20050419091628.GN17377@parrot.ebi.ac.uk> {Please pass the word!} SECOND CALL FOR SPEAKERS The 6th annual Bioinformatics Open Source Conference (BOSC'2005) is organized by the not-for-profit Open Bioinformatics Foundation. The meeting will take place June 23-24, 2005 in Detroit, Michigan, USA, and is one of several Special Interest Group (SIG) meetings occurring in conjunction with the 13th International Conference on Intelligent Systems for Molecular Biology. see http://www.iscb.org/ismb2005 for more information. Because of the power of many Open Source bioinformatics packages in use by the Research Community today, it is not too presumptuous to say that the work of the Open Source Bioinformatics Community represents the cutting edge of Bioinformatics in general. This has been repeatedly demonstrated by the quality of presentations at previous BOSC conferences. This year, at BOSC 2005, we want to continue this tradition of excellence, while presenting this message to a wider part of the Research Community. Please, pass this message on to anyone you know that is interested in Bioinformatics software. BOSC PROGRAM & CONTACT INFO * Web: http://www.open-bio.org/bosc2005/ * Online Registration: https://www.cteusa.com/iscb4/ * Email: bosc@open-bio.org FEES * Corporate : $195 ($245 after May 16th) * Academic : $170 ($220 after May 16th) * Student : $145 ($195 after May 16th) SPEAKERS & ABSTRACTS WANTED The program committee is currently seeking abstracts for talks at BOSC 2005. 
BOSC is a great opportunity for you to tell the community about your use, development, or philosophy of open source software development in bioinformatics. The committee will select several submitted abstracts for 25-minute talks and others for shorter "lightning" talks. Accepted abstracts will be published on the BOSC web site. If you are interested in speaking at BOSC 2005, please send us before April 26, 2005: * an abstract (no more than a few paragraphs) * a URL for the project page, if applicable * information about the open source license used for your software or your release plans. Abstracts will be accepted for submission until April 26, 2005. Abstracts chosen for presentation will be announced May 12, 2005 (before the ISMB Early Registration Deadline). LIGHTNING-TALK SPEAKERS WANTED! The program committee is currently seeking speakers for the lightning talks at BOSC 2005. Lightning talks are quick - only five minutes long - and a great opportunity for you to give people a quick summary of your open source project, code, idea, or vision of the future. If you are interested in giving a lightning talk at BOSC 2005, please send us: * a brief title and summary (one or two lines) * a URL for the project page, if applicable * information about the open source license used for your software or your release plans. We will accept entries on-line until BOSC starts, but space for demos and lightning talks is limited.
References: <4263CDBE.7000003@freemail.hu> <1113835131.5116.8.camel@localhost.localdomain> <4263D8EF.4020800@freemail.hu>
Message-ID: <1113919387.5116.86.camel@localhost.localdomain>

Hi Hota,

You should always respond to the whole mailing list, especially since I
don't have a solution for you. :-)

From your description, it seems to me that the best thing would be to
have a new glyph along the lines of the processed_transcript glyph, where
you would provide the ends and the components in a way that they could be
aggregated together, and then the glyph magically renders all the parts.
The trickiest thing about this is that you really need two levels of
aggregation: the transposon as a whole and the individual genes on the
transposon. I don't know how to do that or if it is currently possible.

If you think this accurately describes a solution to your problem,
perhaps we should put a feature request into bugzilla.

Scott

On Mon, 2005-04-18 at 17:57 +0200, Horvath Tamas wrote:
> Scott Cain wrote:
>
> >Hota,
> >
> >I'm not sure what your ascii graphics are trying to represent. To me,
> >it looks like you want a segments glyph and then specify the stranded
> >nest to point either E or W. That would produce a rectangle with a
> >triangle on the end.
> >
> >Scott
> >
>
> Yes, you've got it right but the use of segments glyph is far less
> straightforward than the triangle glyph. Here's my problem:
>
> I want to display transposons. They have terminal inverted repeats
> (TIRs) and 1 or 2 proteins coded. So the track should look like:
> TIR connector encoded protein with 'hat' connectors 2nd
> prot. TIR
> |||||>-------------||||||||***||||||||||||||||||||*****|||||||||||||||||||||||-----|||||||||||****|||||||||||----<|||||||
>
> But I don't know how to do this using the segments glyph for the TIRs. I
> also can't use 2 different connectors... actually I don't know why.
>
> Hota
>
> PS.
thanks for the fast reply > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From ewijaya at singnet.com.sg Wed Apr 20 05:26:56 2005 From: ewijaya at singnet.com.sg (Edward Wijaya) Date: Wed Apr 20 06:35:50 2005 Subject: [Bioperl-l] Getting Alignment Score from Bio::Tools::Run::Alignment::TCoffee Message-ID: <1113989216.42662060c9baa@flounder.singnet.com.sg> Hi, For the above TCoffee alignment module Is there a way to: 1. get the alignment score (any method for this) 2. Pass input sequence as an array (not fasta file) e.g. input is "@seq = qw(ATCCGGA AGGGGAA)" etc. Below is my current working program. Thanks and hope to hear from you again.
Regards, Edward WIJAYA __BEGIN__ #!/usr/bin/perl -w use strict; use Bio::AlignIO; use Data::Dumper; BEGIN {$ENV{TCOFFEEDIR} = '/home/edward/MyBioTool/T-COFFEE_distribution_Version_2.03/bin/'; } use Bio::Tools::Run::Alignment::TCoffee; my $inputfilename = "hm02r.fasta"; my $outfile = "tcoffee.out"; my @params = ( '-in' => $inputfilename, '-outfile' => $outfile, ); my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); my $aln = $factory->align($inputfilename); __END__ From skirov at utk.edu Wed Apr 20 09:03:13 2005 From: skirov at utk.edu (Stefan Kirov) Date: Wed Apr 20 08:56:48 2005 Subject: [Bioperl-l] Re: Bioperl MEME parsing information In-Reply-To: <1113973878.4265e47605833@webmail.iu.edu> References: <1113973878.4265e47605833@webmail.iu.edu> Message-ID: <42665311.6050708@utk.edu> Sumit, First ALWAYS send your questions to bioperl list, not to me directly. What you need is in the docs for: Bio::Matrix::PSM::InstanceSite Bio::Matrix::PSM::SiteMatrix Bio::Matrix::PSM::PsmI so do perldoc Bio::Matrix::PSM::InstanceSite or look at the web description at http://doc.bioperl.org/releases/bioperl-1.4/ Sumit Middha wrote: >Hi, >I would like to know the source for all possible methods related with parsing of >a MEME output. For instance > >my $psmIO = new Bio::Matrix::PSM::IO(-format=>'meme', > -file=>$file); > while (my $psm = $psmIO -> next_psm) { > my $instances = $psm -> instances; > my $pid = $psm -> id; > my $c = $psm -> IUPAC; >........ > > >So what all options do I have, for instance retrieving the 'Motifs in block >format' sequences, etc. > > You need to look at Bio::Matrix::PSM::InstanceSite and work with the $instances. 
>Thanks, >Sumit > > Stefan From jason.stajich at duke.edu Wed Apr 20 10:39:31 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Apr 20 10:33:11 2005 Subject: [Bioperl-l] Getting Alignment Score from Bio::Tools::Run::Alignment::TCoffee In-Reply-To: <946e3d7aab6e21613160cde83e39987c@bioperl.org> References: <1113989216.42662060c9baa@flounder.singnet.com.sg> <946e3d7aab6e21613160cde83e39987c@bioperl.org> Message-ID: <0d179c06f88be83f9786201a433880ad@duke.edu> We don't try and parse it. You're welcome to add code in Bio::AlignIO::clustalw which would do this. It would be very simple to just parse this out of the header and there is already a score method in Bio::SimpleAlign to store this so that in the end score would be available through $aln->score. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ > On Apr 20, 2005, at 5:26 AM, Edward Wijaya wrote: > >> Hi, >> For the above TCoffee alignment module >> >> Is there a way to: >> 1. get the alignment score (any method for this) >> 2. Pass input sequence as an array (not fasta file) >> e.g. input is "@seq = qw(ATCCGGA AGGGGAA)" etc. >> >> Below is my current working program. >> >> Thanks and hope to hear from you again. 
>> Regards, >> Edward WIJAYA >> >> __BEGIN__ >> #!/usr/bin/perl -w >> use strict; >> use Bio::AlignIO; >> use Data::Dumper; >> >> BEGIN {$ENV{TCOFFEEDIR} = >> '/home/edward/MyBioTool/T-COFFEE_distribution_Version_2.03/bin/'; } >> use Bio::Tools::Run::Alignment::TCoffee; >> >> my $inputfilename = "hm02r.fasta"; >> my $outfile = "tcoffee.out"; >> my @params = ( >> '-in' => $inputfilename, >> '-outfile' => $outfile, >> ); >> my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params); >> my $aln = $factory->align($inputfilename); >> >> __END__ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > From avilella at gmail.com Tue Apr 19 13:04:52 2005 From: avilella at gmail.com (Albert Vilella) Date: Wed Apr 20 10:34:17 2005 Subject: [Bioperl-l] help needed Was:Re: [Gmod-gbrowse] negative numbers in xyplot In-Reply-To: <200504011134.05290.lstein@cshl.edu> References: <1112342836.8125.19.camel@localhost.localdomain> <200504011134.05290.lstein@cshl.edu> Message-ID: <1113930293.8942.31.camel@localhost.localdomain> (below) > > I submitted a feature request some minutes ago to GBrowse about > > negative numbers in xyplot, but then, scanning a little bit through > > the mailing list, I understood that this seems to be > > Bio::Graphics::xyplot.pm related, not GBrowse. I resubmitted (now > > logged in) as Bio::Graphics related. > Well, the same group of people work on gbrowse and the bioperl glyphs, > so either way your bug report is most appreciated. Would someone on > the mailing list like to volunteer to fix this bug? It should be a > very simple one to handle. Maybe you can implement log coordiantes > as well? This "negative values in xyplot" feature seems to be stuck in the "TO DO" list, so I took a look at the code to see if I could implement this feature myself. 
After messing around a little bit with the code, I could more or less
locate the place where this feature should be added, but there are a
couple of places where I need some help:

In xyplot.pm draw function:
-----
[...]
  # now seed all the parts with the information they need to draw their positions
  foreach (@parts) {
    my $s = eval {$_->feature->score};
    next unless defined $s;
    my $position = ($s-$min_score) * $scale;
    $_->{_y_position} = $bottom - $position;
  }

Right now, "$bottom" will always point to the bottom of the plot, where
the horizontal line of the xyplot graphic will be placed later, even if
$s is negative. This results in negative numbers being wrongly positioned
as score=0.

So I understand that the horizontal line should be placed according to
the presence or absence of negative numbers.

What I couldn't work out is why the horizontal lines are plotted inside
each type of graphic (_draw_histogram, _draw_boxes, _draw_line,
_draw_points), instead of outside, in their own subroutine. Is this so or
am I missing a point?

Then for negative numbers, my question would be: where should the
horizontal line be set?

Finally, about the log coordinates scale Lincoln suggested, I visualize
this as substituting $scale comparisons for a call to some log_scale
subroutine that will re-place each value log-wise. Is that correct?

Thanks in advance,

Bests,

Albert.

From MAG at Stowers-Institute.org Wed Apr 20 12:14:11 2005
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Wed Apr 20 12:09:41 2005
Subject: [Bioperl-l] Nearest neighbor from newick format tree
Message-ID: <200504201607.j3KG7TfY011376@portal.open-bio.org>

Hi,

I need to parse the phylogenetic trees in newick format to get the
nearest neighbors for a given gi.
I haven't found any such parser in the Bio::Tree module yet.
Is something like this available in any other related Bio-Perl module?

Thanks for any suggestions.
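
For the nearest-neighbour question above, one library-independent way to think about it is: parse the Newick string into nodes that know their parent, then collect the leaves under the other children of the query leaf's parent. The following is a minimal sketch and not BioPerl API: `parse_newick`, `find_leaf`, `nearest_neighbours` and the `gi` labels are hypothetical, the parser handles only simple Newick (no quoted labels or comments), and "nearest neighbour" is read here as "leaves of the sister subtree", which is only one possible interpretation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Create a node and attach it to its parent (if any).
sub new_node {
    my ($parent) = @_;
    my $node = { parent => $parent, children => [] };
    if ($parent) {
        weaken $node->{parent};    # avoid a parent<->child reference cycle
        push @{ $parent->{children} }, $node;
    }
    return $node;
}

# Parse simple Newick text into nested hashes (name, length, parent, children).
sub parse_newick {
    my ($text) = @_;
    my $root = new_node();
    my $cur  = $root;
    for my $tok ($text =~ /([(),;]|[^(),;\s]+)/g) {
        if    ($tok eq '(') { $cur = new_node($cur) }            # first child
        elsif ($tok eq ',') { $cur = new_node($cur->{parent}) }  # next sibling
        elsif ($tok eq ')') { $cur = $cur->{parent} }            # back to parent
        elsif ($tok eq ';') { last }
        else {                                                   # "name:length"
            my ($name, $len) = split /:/, $tok, 2;
            $cur->{name}   = $name if defined $name && length $name;
            $cur->{length} = $len  if defined $len;
        }
    }
    return $root;
}

# All leaves at or below a node.
sub leaves {
    my ($node) = @_;
    return $node unless @{ $node->{children} };
    return map { leaves($_) } @{ $node->{children} };
}

# Look up a leaf by its label.
sub find_leaf {
    my ($root, $name) = @_;
    my ($hit) = grep { defined $_->{name} && $_->{name} eq $name } leaves($root);
    return $hit;
}

# Leaves under the sister node(s) of the given leaf.
sub nearest_neighbours {
    my ($leaf) = @_;
    my $parent = $leaf->{parent} or return;
    return map { leaves($_) } grep { $_ != $leaf } @{ $parent->{children} };
}

# Hypothetical example tree with made-up gi labels.
my $tree = parse_newick('((gi1:0.1,gi2:0.2):0.05,(gi3:0.3,gi4:0.1):0.2);');
for my $query (qw(gi1 gi3)) {
    my @names = map { $_->{name} } nearest_neighbours(find_leaf($tree, $query));
    print "$query: @names\n";
}
```

The tree root has to stay in scope while the nodes are in use, because the parent links are weak references.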
-Manisha

From jason.stajich at duke.edu Wed Apr 20 12:25:36 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Apr 20 12:19:08 2005
Subject: [Bioperl-l] Nearest neighbor from newick format tree
In-Reply-To: <200504201607.j3KG7TfY011376@portal.open-bio.org>
References: <200504201607.j3KG7TfY011376@portal.open-bio.org>
Message-ID:

Bio::TreeIO parses trees.

If you want to find the sister nodes, you get the ancestor for a node and
then look at all the children of the ancestor node.

This finds sister species in your clade (at the same level):

my @sisters = grep { $node->internal_id != $_->internal_id } $node->ancestor->each_Descendent;

You can use get_all_Descendents to do a recursive grab of all nodes
descended from a certain node.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 20, 2005, at 12:14 PM, Goel, Manisha wrote:

> Hi,
>
> I need to parse the phylogenetic trees in newick format to get the
> nearest neighbors for a given gi.
> I haven't found any such parser in the Bio::Tree module yet.
> Is something like this available in any other related Bio-Perl module?
>
> Thanks for any suggestions.
> -Manisha
>

From ewijaya at singnet.com.sg Wed Apr 20 07:49:11 2005
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Wed Apr 20 12:31:49 2005
Subject: [Bioperl-l] Getting Alignment Score from Bio::Tools::Run::Alignment::TCoffee
In-Reply-To:
References: <1113989216.42662060c9baa@flounder.singnet.com.sg> <946e3d7aab6e21613160cde83e39987c@bioperl.org> <0d179c06f88be83f9786201a433880ad@duke.edu>
Message-ID:

Thanks so much Jason,
Works ok now.

On Wed, 20 Apr 2005 23:03:44 +0800, Jason Stajich wrote:

> Passing them to what, the tcoffee alignment factory? You need to
> construct them as Bio::Seq objects first.
> my @seqs = (Bio::Seq->new(-seq => 'ATGC', -display_id => 'one'),
>             Bio::Seq->new(-seq => 'ATGCAA', -display_id => 'two'));
> [snip]
>
> On Apr 20, 2005, at 6:07 AM, Edward WIJAYA wrote:
>
>> How about passing input sequence as an array? Is it possible?

[snip]

--
Regards,
Edward WIJAYA
SINGAPORE

From ewijaya at singnet.com.sg Wed Apr 20 08:08:04 2005
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Wed Apr 20 12:51:01 2005
Subject: [Bioperl-l] Inconsistency in Alignment Score from Bio::Tools::Run::Alignment::TCoffee versus actual command-line program
In-Reply-To:
References: <1113989216.42662060c9baa@flounder.singnet.com.sg> <946e3d7aab6e21613160cde83e39987c@bioperl.org> <0d179c06f88be83f9786201a433880ad@duke.edu>
Message-ID:

Hi,

Below is my code that does the alignment for two sequences. But how
come it gives a different result compared to the actual
command-line/web result? Zero versus 100 (note the * marked lines).

Could it be a bug in the module?

---BioPerl Result----
PAIRWISE_ALIGNMENT
[No Tree]
T-COFFEE, Version_2.03(Wed Jul 11 14:38:06 PDT 2001)
Notredame, Higgins, Heringa, JMB(302)pp205-217,2000
CPU 0 sec
SCORE 0 *
NSEQ 2
LEN 15
seq_0 AAAAAAAAAAAAAAA
seq_1 AAAAAAAAAAAAAAA
      ***************

=VERSUS=

---Command Line Result---
* CLUSTAL FORMAT for T-COFFEE Version_2.03, CPU=0.00 sec, SCORE=100, Nseq=2, Len=15

seq_0 AAAAAAAAAAAAAAA
seq_1 AAAAAAAAAAAAAAA
      ***************

---Web Result--
http://www.ch.embnet.org/wwwtmp/run_2316.html

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Seq;
BEGIN { $ENV{TCOFFEEDIR} = '/home/edward/MyBioTool/T-COFFEE_Ver2.03/bin/'; }
use Bio::Tools::Run::Alignment::TCoffee;

my @params  = ( '-outfile' => 'tcof.out', '-maxlen' => '100' );
my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);

my @array = qw (AAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAA);
my @seqs2;    # has to be declared under "use strict"
foreach ( 0..$#array ) {
    push @seqs2, (Bio::Seq->new(-seq => $array[$_], -display_id => 'seq_'.$_ ));
}

my $seq_array_ref = \@seqs2;
my $aln2 = $factory->align($seq_array_ref);   # final line that prints to STDOUT
__END__

--
Regards,
Edward WIJAYA
SINGAPORE

From golharam at umdnj.edu Wed Apr 20 12:13:41 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed Apr 20 13:07:18 2005
Subject: [Bioperl-l] New Bio::Tools::Spidey::Results module
Message-ID: <005d01c545c3$ed1eb8c0$4122db82@GOLHARMOBILE1>

Can someone please commit this new version of
Bio::Tools::Spidey::Results.pm to CVS? It's attached to this message.
I've added a method to determine if either of the mRNA ends is missing,
as determined by Spidey.

Also, I think I need to write some test scripts for this thing. Can
someone point me in the right direction as to what I should read to
write a test script? I looked at some of the test scripts and it looks
pretty straightforward.

Thanks,
Ryan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Results.pm
Type: application/octet-stream
Size: 13171 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050420/59f05015/Results.obj

From smiddha at indiana.edu Wed Apr 20 12:49:14 2005
From: smiddha at indiana.edu (Sumit Middha)
Date: Wed Apr 20 13:07:24 2005
Subject: [Bioperl-l] Re: Bioperl MEME parsing information
In-Reply-To: <42665311.6050708@utk.edu>
References: <1113973878.4265e47605833@webmail.iu.edu> <42665311.6050708@utk.edu>
Message-ID: <1114015754.4266880a495cf@webmail.iu.edu>

Hello,

I am sorry for the direct email. But thanks for your response.

Sumit

Quoting Stefan Kirov :

> Sumit,
> First ALWAYS send your questions to bioperl list, not to me directly.
> What you need is in the docs for:
>
> Bio::Matrix::PSM::InstanceSite
> Bio::Matrix::PSM::SiteMatrix
> Bio::Matrix::PSM::PsmI
>
> so do
>
> perldoc Bio::Matrix::PSM::InstanceSite
>
> or look at the web description at
> http://doc.bioperl.org/releases/bioperl-1.4/
>
> Sumit Middha wrote:
>
> >Hi,
> >I would like to know the source for all possible methods related with
> >parsing of a MEME output.
> >For instance
> >
> >my $psmIO = new Bio::Matrix::PSM::IO(-format=>'meme',
> >                                     -file=>$file);
> >       while (my $psm = $psmIO -> next_psm) {
> >           my $instances = $psm -> instances;
> >           my $pid = $psm -> id;
> >           my $c = $psm -> IUPAC;
> >........
> >
> >
> >So what all options do I have, for instance retrieving the 'Motifs in block
> >format' sequences, etc.
> >
> >
> You need to look at Bio::Matrix::PSM::InstanceSite and work with the
> $instances.
>
> >Thanks,
> >Sumit
> >
> >
> Stefan
>
>

From hlapp at gmx.net Wed Apr 20 14:01:50 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Apr 20 13:55:26 2005
Subject: [Bioperl-l] Re: Bio::DB::Query
In-Reply-To: <15a9a89705041915154d6f9fad@mail.gmail.com>
Message-ID: <45339391-B1C6-11D9-9308-000A959EB4C4@gmx.net>

Carito, the Bio::DB::Query is bioperl-db, not biojava or java for that
matter, so if this is what you're trying to use then email to the
bioperl (bioperl-l@bioperl.org) list, or for schema-related and SQL
questions the biosql (biosql-l@open-bio.org) list.

Bio::DB::Query::BioQuery will map classes to tables for you. There's no
really good HowTo document yet; there's plenty of examples though in
the respective test script t/query.t.

I don't understand your goal in modifying load_seqdatabase.pl, so
unless you elaborate on what you're trying to achieve I can't help you.

As for SQL queries and your example, you don't need bioperl-db to issue
SQL queries against biosql - just use your favorite SQL shell for your
RDBMS of choice (mysql for MySQL, psql for PostgreSQL, etc) and type in
queries there. Bioperl-db is not meant to replace or provide a SQL
shell functionality.

If this doesn't help you will need to be more specific on what you're
trying to achieve.
-hilmar

On Tuesday, April 19, 2005, at 03:15 PM, carito vargas wrote:

> Hello,
>
> I am trying to modify the script load_seqdatabase.pl and I wanted to
> do a "select MAX(identifier) from bioentry", kind of simple, but I
> found it very complicated using Bio::DB::Query
>
> I read the documentation, but I don't understand how to construct the
> query... we have to give class arguments mapped to tables??
>
> Please, if anyone can give me a clear example I would be thankful
>
> Carito Vargas
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From david.adelson at tamu.edu Wed Apr 20 15:06:30 2005
From: david.adelson at tamu.edu (David Adelson)
Date: Wed Apr 20 17:31:00 2005
Subject: [Bioperl-l] Installing BioPerl1.5/DBD/Gbrowse 1.6.2 on MacOSX 10.3.9
Message-ID:

Has anyone had any luck getting gbrowse 1.6.2 to work on Mac OSX
(10.3.9)? I am having trouble installing DBD and BioPerl1.5 from
source. I have wasted way too much time on this today, but if anyone
has any tips on getting these things to install/compile properly I
would appreciate hearing from you.

Dave Adelson, Texas A&M University

--
It is a mistake to think you can solve any major problems just with
potatoes.

Douglas Adams

From jason.stajich at duke.edu Wed Apr 20 17:52:03 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Apr 20 17:45:29 2005
Subject: [Bioperl-l] Installing BioPerl1.5/DBD/Gbrowse 1.6.2 on MacOSX 10.3.9
In-Reply-To:
References:
Message-ID: <3779c97246beec478735792bc0438be1@duke.edu>

Can you provide more specific information about the problems? What kind
of compilation error messages, etc. are you getting?

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 20, 2005, at 3:06 PM, David Adelson wrote:

> Has anyone had any luck getting gbrowse 1.6.2 to work on Mac OSX
> (10.3.9)?
> I am having trouble installing DBD and BioPerl1.5 from source. I have
> wasted way too much time on this today, but if anyone has any tips on
> getting these things to install/compile properly I would appreciate
> hearing from you.
>
> Dave Adelson, Texas A&M University
>
> --
> It is a mistake to think you can solve any major problems just with
> potatoes.
>
> Douglas Adams
>

From harris at cshl.org Thu Apr 21 20:56:38 2005
From: harris at cshl.org (Todd Harris)
Date: Thu Apr 21 20:50:13 2005
Subject: [Bioperl-l] Installing BioPerl1.5/DBD/Gbrowse 1.6.2 on MacOSX 10.3.9
In-Reply-To:
Message-ID:

Hi David -

I've been running GBrowse day in and day out on Mac OS X. If you are
having problems installing DBD, I suspect that you need to install the
header files for MySQL.

Todd

> On 4/20/05 1:06 PM, David Adelson wrote:

> Has anyone had any luck getting gbrowse 1.6.2 to work on Mac OSX (10.3.9)?
> I am having trouble installing DBD and BioPerl1.5 from source. I have
> wasted way too much time on this today, but if anyone has any tips on
> getting these things to install/compile properly I would appreciate hearing
> from you.
>
> Dave Adelson, Texas A&M University

From jrm at compbio.dundee.ac.uk Fri Apr 22 04:58:44 2005
From: jrm at compbio.dundee.ac.uk (Jon manning)
Date: Fri Apr 22 04:52:43 2005
Subject: [Bioperl-l] Bio::Matrix::IO issue
Message-ID: <1114160324.18953.7.camel@tick.compbio.dundee.ac.uk>

Hi all,

I have a perl module that imports Bio::Matrix::IO and uses it several
times.
I have two scripts that call my module, a test script and something
else I'm working on- the test works fine, but with the other I'm
getting this error:

"Bio::Root::IO" is not exported by the Bio::Root::IO module
Can't continue after import errors
at /usr/lib/perl5/site_perl/5.8.5/Bio/Matrix/IO.pm line 78
BEGIN failed--compilation aborted
at /usr/lib/perl5/site_perl/5.8.5/Bio/Matrix/IO.pm line 78.
........

I don't want to post lots of code, it's all a bit complicated, and the
crash doesn't point at a particular use of Bio::Matrix::IO, but does
anyone have any idea what causes this sort of error?

Thanks,

Jon

From jrm at compbio.dundee.ac.uk Fri Apr 22 05:21:46 2005
From: jrm at compbio.dundee.ac.uk (Jon manning)
Date: Fri Apr 22 05:16:31 2005
Subject: [Bioperl-l] Bio::Matrix::IO issue
In-Reply-To: <1114160324.18953.7.camel@tick.compbio.dundee.ac.uk>
References: <1114160324.18953.7.camel@tick.compbio.dundee.ac.uk>
Message-ID: <1114161706.18953.14.camel@tick.compbio.dundee.ac.uk>

Funny- it was because I had 'use Bio::Matrix::IO' in my test module,
but not the other one- though I'm not using it directly in my script.

Anyway, apologies, the gaps in my perl knowledge are making themselves
known!

Jon

On Fri, 2005-04-22 at 09:58 +0100, Jon manning wrote:
> Hi all,
>
> I have a perl module that imports Bio::Matrix::IO and uses it several
> times. I have two scripts that call my module, a test script and
> something else I'm working on- the test works fine, but with the other
> I'm getting this error:
>
> "Bio::Root::IO" is not exported by the Bio::Root::IO module
> Can't continue after import errors
> at /usr/lib/perl5/site_perl/5.8.5/Bio/Matrix/IO.pm line 78
> BEGIN failed--compilation aborted
> at /usr/lib/perl5/site_perl/5.8.5/Bio/Matrix/IO.pm line 78.
> ........
>
> I don't want to post lots of code, it's all a bit complicated, and the
> crash doesn't point at a particular use of Bio::Matrix::IO, but does
> anyone have any idea what causes this sort of error?
>
> Thanks,
>
> Jon
>

From whs at ebi.ac.uk Fri Apr 22 08:31:19 2005
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri Apr 22 08:25:12 2005
Subject: [Bioperl-l] Help with parsing (BLAST/exonerate) for DAS server
In-Reply-To:
Message-ID:

Hi Michael,

What you have is a Bio::Search::Result object generated from your blast
report, and you would like to use this to generate a GFF file to load
into LDAS? Whilst there is nothing like this in the Ensembl code, I do
see a Bio::SearchIO::Writer::GbrowseGFF module that may suit your
needs.

If you have blasted against the Ensembl cdna file but want the
chromosome location of the alignments, then you can parse the blast
report into EnsemblHSP/Hit/Result objects. These extend BioPerl's
GenericHSP/Hit/Result to exactly this end, and can be implemented in
the manner described by the SearchIO howto;

http://bioperl.org/HOWTOs/SearchIO/extending.html

The code can be found in the 'modules' project of Ensembl's CVS
repository (modules/Bio/Search/HSP/EnsemblHSP.pm etc).

All the best,

Will

On Wed, 13 Apr 2005, Michael Seewald wrote:

> Dear Bioperl developers,
>
> I have posed the following questions already to the Ensembl helpdesk and the
> Ensembl-developers. Unfortunately, nobody could help me there (or was too
> busy..). I would really appreciate your help or any pointers to
> documentation/public examples.
>
> I would like to set up a DAS server in order to add sequences (+their
> annotation) to the Ensembl ContigView. I have BLAST results of
> 1) short genomic sequences BLASTed vs. the chromosomes
> (e.g. Homo_sapiens.NCBI35.nov.dna.chromosome.21.fa)
> and
> 2) fragment transcript sequences (e.g. single Affy probes) BLASTed vs.
> Ensembl transcripts (e.g. Homo_sapiens.NCBI35.nov.cdna.fa).
> Now, I would like to parse and format them into something that can be
> displayed using a DAS server like LDAS.
>
> * Are there already scripts available that support the parsing of BLAST
> results and help to prepare them for the DAS server? Could you point me to
> the source (e.g. in the Ensembl source)?
>
> * BLASTing versus the chromosomes and displaying sequences in the Contigview
> seems to be straightforward because the BLAST hits can be taken right away.
> What would you recommend, if I have transcript fragments, which could be the
> result of splicing? How can I properly deal with intron/exon boundaries? Is
> it a good idea to use Exonerate instead of BLAST here (or maybe in general
> for the alignment)?
>
> Thanks & kind regards,
> Michael Seewald
>
> --
> Dr. Michael Seewald
> Bioinformatics
> Bayer HealthCare AG
>

From david.adelson at tamu.edu Fri Apr 22 10:48:27 2005
From: david.adelson at tamu.edu (David Adelson)
Date: Fri Apr 22 11:31:54 2005
Subject: [Bioperl-l] Installing BioPerl1.5/DBD/Gbrowse 1.6.2 on MacOSX 10.3.9
In-Reply-To: <0e5e7a1c33fcf1390124350a5d9a3ee5@mail.nih.gov>
Message-ID:

Dear all,

Thanks for your replies.

Gbrowse from memory adaptor is still not working fully. I get no
graphics (problems with GD, libgd; see below).

Gbrowse from mysql is not working. Originally I got this message:

====
Could not open database.
------------- EXCEPTION -------------
MSG: Unable to load mysql adaptor: Can't locate
Bio/DB/GFF/Adaptor/mysql.pm in @INC (@INC contains:
/System/Library/Perl/5.8.1/darwin-thread-multi-2level
/System/Library/Perl/5.8.1
/Library/Perl/5.8.1/darwin-thread-multi-2level
/Library/Perl/5.8.1
/Library/Perl
/Network/Library/Perl/5.8.1/darwin-thread-multi-2level
/Network/Library/Perl/5.8.1
/Network/Library/Perl
.) at (eval 19) line 1.
STACK Bio::DB::GFF::new /Library/Perl/5.8.1/Bio/DB/GFF.pm:643
STACK (eval) /Library/Perl/5.8.1/darwin-thread-multi-2level/Bio/Graphics/Browser/Util.pm:158
STACK Bio::Graphics::Browser::Util::open_database /Library/Perl/5.8.1/darwin-thread-multi-2level/Bio/Graphics/Browser/Util.pm:158
STACK main::get_settings /usr/local/apache2/cgi-bin/gbrowse:516
STACK toplevel /usr/local/apache2/cgi-bin/gbrowse:135
====

I was able to copy mysql.pm up one level in the directory hierarchy and
as a result no longer get this error. Now I get:

=====
An internal error has occurred
Could not open database.
Can't locate object method "new" via package
"Bio::DB::GFF::Adaptor::mysql" at /Library/Perl/5.8.1/Bio/DB/GFF.pm
line 649.
Please contact this site's maintainer (you@example.com) for assistance.
=====

This message, while deliciously ironic (last line) seems much less
tractable to me.

Suggestions made include the possibility that the mysql header files
are not present. They are in fact present at:

fisher.tamu.edu [/usr/local/mysql/include] % ls
config-os2.h  m_string.h   my_dir.h         my_sys.h       mysql_time.h     sql_state.h
config.h      md5.h        my_getopt.h      my_time.h      mysql_version.h  sslopt-case.h
errmsg.h      merge.h      my_global.h      my_tree.h      mysqld_error.h   sslopt-longopts.h
ft_global.h   my_aes.h     my_handler.h     my_xml.h       mysys_err.h      sslopt-vars.h
hash.h        my_alarm.h   my_list.h        myisam.h       nisam.h          t_ctype.h
heap.h        my_alloc.h   my_net.h         myisammrg.h    queues.h         thr_alarm.h
help_end.h    my_base.h    my_no_pthread.h  myisampack.h   raid.h           thr_lock.h
help_start.h  my_bitmap.h  my_nosys.h       mysql.h        rijndael.h       typelib.h
keycache.h    my_config.h  my_pthread.h     mysql_com.h    sha1.h           violite.h
m_ctype.h     my_dbug.h    my_semaphore.h   mysql_embed.h  sql_common.h

So, my DBD problem is probably still a problem, but much less clear to
me now.

As far as the graphics issue is concerned, I thought I had identified
the problem(s). The problem with GD is that it could not find gd.h,
which was not surprising since I could not get libgd to compile.
It seemed that the header file for the fontconfig library was not part
of my X11 installation (Apple X11), which means libgd would not compile
(I got this feedback from correspondence with the author of gd at
Boutell.com). After installing the X11 SDK I can now compile libgd AND
GD!!! However, there were some warnings during the compile of libgd
and the upshot is I still can't get any graphics:-( .

Jason Stajich (I think) made the comment in an e-mail to me that I had
to be careful about the library versions for gd. When I compiled gd I
did get some warnings about library versions, but nothing fatal
happened. (see attachment for output from gd and GD compiles)

In the mean time, sorry to keep beating this horse, but it is not quite
dead yet. Any additional suggestions will be gratefully accepted.

Dave

Versions of stuff being used:

Bundle Bundle::BioPerl (C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz)
bioperl-1.5.0 from source
gd-2.0.33
zlib-1.2.2
libpng-1.2.8-config
jpeg-6b
GD-2.23
DBD-mysql-2.9006
MySQL 4.1.8-standard

fisher.tamu.edu% uname -a
Darwin fisher.tamu.edu 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh powerpc

fisher.tamu.edu % sw_vers
ProductName:    Mac OS X
ProductVersion: 10.3.9
BuildVersion:   7W98

On 04/22/2005 05:02 AM, "Sean Davis" wrote:

> Ditto from me.
>
> Sean
>
> On Apr 21, 2005, at 8:56 PM, Todd Harris wrote:
>
>> Hi David -
>>
>> I've been running GBrowse day in and day out on Mac OS X. If you are
>> having problem installing DBD, I suspect that you need to install the
>> header files for MySQL.
>>
>> Todd
>>
>>> On 4/20/05 1:06 PM, David Adelson wrote:
>>
>>> Has anyone had any luck getting gbrowse 1.6.2 to work on Mac OSX
>>> (10.3.9)?
>>> I am having trouble installing DBD and BioPerl1.5 from source.
>>> I have wasted way too much time on this today, but if anyone has
>>> any tips on getting these things to install/compile properly I
>>> would appreciate hearing from you.
>>>
>>> Dave Adelson, Texas A&M University
>>
>

--
There's no trick to being a humorist when you have the whole government
working for you.

Will Rogers (1879 - 1935)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gd_compile.txt
Type: application/octet-stream
Size: 112137 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050422/8c8b5d56/gd_compile-0001.obj

From jason.stajich at duke.edu Fri Apr 22 14:23:53 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Apr 22 14:48:37 2005
Subject: [Bioperl-l] Re: [Bioperl-guts-l] Re: how to use Storable.pm to save the object?
In-Reply-To: <20050422180724.74977.qmail@web53601.mail.yahoo.com>
References: <20050422180724.74977.qmail@web53601.mail.yahoo.com>
Message-ID: <6b84521c795426f806ff6dc9fbee1964@duke.edu>

I thought Will posted an answer to your query:
http://portal.open-bio.org/pipermail/bioperl-l/2005-April/018732.html

Is there a good reason why a flatfile format like msf, clustal, etc
won't work for your purpose, or is Bio::SimpleAlign just an example?

--jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 22, 2005, at 2:07 PM, Sally Li wrote:

> Hi,
>
> I have waited for a few days since I sent a message to
> the discussion group. But there is no response. Maybe
> the question is too hard or some other issues? So I
> have to submit to the technical group, which is not
> appropriate. Sorry!
>
> Your help is appreciated!
>
> Sally.
>
> --- Sally Li wrote:
>> Hi, there,
>>
>> Let's say we have an object which is SimpleAlign
>>
>> $aln
>>
>> How can we store this object using Storable.pm in a
>> specific directory? I have difficulty understanding
>> the doc in module Storable.pm.
>>
>> Any help will be appreciated!
>>
>> Thanks!
>>
>> Sally
>>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>

From Kary at ioc.fiocruz.br Fri Apr 22 15:58:40 2005
From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana)
Date: Fri Apr 22 15:57:34 2005
Subject: [Bioperl-l] Temporary name of hmmbuild model
Message-ID: <8D44604203DAF9438BF9123B4A08C779B2704E@alpha.ioc.fiocruz.br>

Dear all:

I am working with hmmbuild and it is working very well, but the problem
is that it generates the model name as a temporary name, when it should
use the same name as the original file. This is a bit of a problem
because afterwards I work with hmmpfam, and the output file is very
confusing because of the temporary names.

The hmmbuildpfam output listed below shows in the second column the
temporary/processing names, and not the real "description name"...
those names are generated when we build the HMM models with
hmmbuild/hmmcalibrate (maybe they are extracted from temporary files
and then stored in our database like that, or, as I suspect, it is
reading the clustalw alignment as a temporary file)... not sure how we
could catch the real names.. any further tips would be greatly
appreciated.
[kary@vivax MGE-Tryp_Mobile_Genetic_Elements_14_01_05]$ more searchio.out
tr|Q8GPE3  lEwFFhiGV0  -108  1.4e-05   Tnp-IS1191 - Streptococcus thermophilus.
tr|Q8GPE3  Uge97N6Hov  1229  0.0e+00   Tnp-IS1191 - Streptococcus thermophilus.
tr|Q8GPE3  5o31pMTfTD  358   2.9e-106  Tnp-IS1191 - Streptococcus thermophilus.
tr|Q8GPE3  S9idxzhdAK  108   4.4e-31   Tnp-IS1191 - Streptococcus thermophilus.

And this is the hmmbuild script.

#!/usr/bin/perl -w
#'/usr/local/bin/hmmbuild';
use lib "/usr/local/bioperl14";
use lib "/usr/local/bioperl-run-1.4";

#use Bio::Tools::Run::WrapperBase;
use Bio::Tools::Run::Hmmer;
#use Bio::Tools::Run::Alignment::Clustalw;
use strict;

my $outfile;
open (LOG, ">>./log_file_hmmerbuild");
my $in;
my @tree;
my $street;

my $dirname = "/home/kary/public_html/inserir_dados/trypanosoma_5B_0603_CLUSTALW_files_teste";
open (DH, "find $dirname |") or die "Cannot open $dirname: $!";

while ($in = <DH>) {
    chomp ($in);

    for ($in =~ /\_(aln)/) {
        @tree = split(/[\/]/, $in);
        my $aln = pop @tree;
        my $street = join "/", @tree,"\n";
        chomp ($street);

        print LOG "File $street$aln launched and ";
        &hmm_build($in);
    }
}

######################################################################
sub hmm_build {
    my($aln) = @_;
    print "Your input file is: $aln\n";
    my $name_outfile = "$aln";
    $name_outfile =~ s/_aln//;
    my $outfile = "$name_outfile.hmm";
    print LOG "Your output file is: $outfile\n";

    #build a hmm using hmmbuild
    my $aio = Bio::AlignIO->new(-file=>"$in",-format=>'clustalw') or die print "Error for open the file";
    $aln = $aio->next_aln;
    my $factory = Bio::Tools::Run::Hmmer->new('program'=>'hmmbuild','hmm'=>$outfile);
    $factory->run($aln);
    &hmm_calibrate($in);
}
######################################################################
sub hmm_calibrate {
    my($aln) = @_;
    my $name_outfile = "$aln";
    $name_outfile =~ s/.aln//;
    my $outfile = "$name_outfile.hmm";
    print LOG "Your output file is: $outfile\n";
    #calibrate the hmm
    my $factory = Bio::Tools::Run::Hmmer->new('program'=>'hmmcalibrate','hmm'=>$outfile);
    $factory->run();
}
######################################################################

Kindest regards,
Kary

From jason.stajich at duke.edu Fri Apr 22 17:53:54 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Apr 22 17:47:30 2005
Subject: [Bioperl-l] Temporary name of hmmbuild model
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779B2704E@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779B2704E@alpha.ioc.fiocruz.br>
Message-ID: <395df2a931509746b4a5e6daafaae7a0@duke.edu>

You will want to either write out the alignment to a file first (with a
name that makes sense) and pass that filename in when you run
$hmmbuild->run(), or pass in the -n option to hmmbuild to explicitly
apply the name. I don't know if the current version of Tools::HMMer
supports -n, but I've updated the code recently in CVS in the
bioperl-run package and you might give it a try.

-jason

On Apr 22, 2005, at 3:58 PM, Kary Ann Del Carmen Soriano Ocana wrote:

> Dear all:
> I am working with hmmbuild, it is working very well, but the problem
> is, that it is generating the name model as a temporary name, and it
> should generate it with the same name of the origin file.
> This is a bit problem because after this I have worked with hmmpfam
> and the outfile is very confuse, for the temporary name.
> The hmmbuildpfam output listed below shows in the second column the
> temporary/processing names, and not the real "description
> name"... those names are generated when we build the HMM models with
> hmmbuild/hmmcalibrate (maybe they are extracted from temporary files
> then stored in our database like that or In my mind it is reading the
> clustalw alignment as temporary)... not sure how we could catch
> the real names.. any further tips would be greatly appreciated.
>
> [kary@vivax MGE-Tryp_Mobile_Genetic_Elements_14_01_05]$ more searchio.out
> tr|Q8GPE3  lEwFFhiGV0  -108  1.4e-05   Tnp-IS1191 - Streptococcus thermophilus.
> tr|Q8GPE3  Uge97N6Hov  1229  0.0e+00   Tnp-IS1191 - Streptococcus thermophilus.
> tr|Q8GPE3  5o31pMTfTD  358   2.9e-106  Tnp-IS1191 - Streptococcus thermophilus.
> tr|Q8GPE3  S9idxzhdAK  108   4.4e-31   Tnp-IS1191 - Streptococcus thermophilus.
>
> And this is the script of hmmbuild.
> #!/usr/bin/perl -w
> #'/usr/local/bin/hmmbuild';
> use lib "/usr/local/bioperl14";
> use lib "/usr/local/bioperl-run-1.4";
>
> #use Bio::Tools::Run::WrapperBase;
> use Bio::Tools::Run::Hmmer;
> #use Bio::Tools::Run::Alignment::Clustalw;
> use strict;
>
> my $outfile;
> open (LOG, ">>./log_file_hmmerbuild");
> my $in;
> my @tree;
> my $street;
>
> my $dirname = "/home/kary/public_html/inserir_dados/trypanosoma_5B_0603_CLUSTALW_files_teste";
> open (DH, "find $dirname |") or die "Cannot open $dirname: $!";
>
> while ($in = <DH>) {
>     chomp ($in);
>
>     for ($in =~ /\_(aln)/) {
>         @tree = split(/[\/]/, $in);
>         my $aln = pop @tree;
>         my $street = join "/", @tree,"\n";
>         chomp ($street);
>
>         print LOG "File $street$aln launched and ";
>         &hmm_build($in);
>     }
> }
>
> ######################################################################
> sub hmm_build {
>     my($aln) = @_;
>     print "Your input file is: $aln\n";
>     my $name_outfile = "$aln";
>     $name_outfile =~ s/_aln//;
>     my $outfile = "$name_outfile.hmm";
>     print LOG "Your output file is: $outfile\n";
>
>     #build a hmm using hmmbuild
>     my $aio = Bio::AlignIO->new(-file=>"$in",-format=>'clustalw') or die print "Error for open the file";
>     $aln = $aio->next_aln;
>     my $factory = Bio::Tools::Run::Hmmer->new('program'=>'hmmbuild','hmm'=>$outfile);
>     $factory->run($aln);
>     &hmm_calibrate($in);
> }
> ######################################################################
> sub hmm_calibrate {
>     my($aln) = @_;
>     my $name_outfile = "$aln";
>     $name_outfile =~ s/.aln//;
>     my $outfile = "$name_outfile.hmm";
>     print LOG "Your output file is: $outfile\n";
>     #calibrate the hmm
>     my $factory = Bio::Tools::Run::Hmmer->new('program'=>'hmmcalibrate','hmm'=>$outfile);
>     $factory->run();
> }
> ######################################################################
>
>
> Kindest regards,
> Kary
>

From skirov at utk.edu Fri Apr 22 23:17:48 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Apr 22 23:13:05 2005
Subject: [Bioperl-l] Cluster::SequenceFamily question
Message-ID: <4269BE5C.4090309@utk.edu>

Shawn,

Currently get_members can use only organism-related data as criteria;
is that right, or am I missing something? If this is the case, is it
possible to add other criteria as well, for example -authority,
-namespace, etc.?

Thanks!
Stefan

From hlapp at gnf.org Sat Apr 23 21:41:31 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sat Apr 23 21:35:03 2005
Subject: [Bioperl-l] nested joins
Message-ID:

If I understand things somewhat correctly, then the following regexp is
used to deal with nested joins (bug#1674):

    $re = qr{
              \(
              (?:
                 (?> [^()]+ )    # Non-parens without backtracking
               |
                 (??{ $re })     # Group with matching parens
              )*
              \)
            }x;

This uses 2 advanced perlre features, which, despite being perfectly
well documented in perl 5.6.0, behave (match) differently between perl
5.6.0 and later versions. (The irony seems to be that the expression
itself appears verbatim in perlre as an example - already in 5.6.0!)

I have tested 5.6.1 on linux and the expression matches correctly
there.
Maybe this is also a platform issue, but I don't have any other platform than Mac OSX 10.2 that still uses 5.6.0. I've included a scriptlet at the end with which people can test on their platform. This difference in behaviour is most likely the reason why the LocationFactory test fails on 5.6.0 but succeeds on later versions of perl. There's a couple of options we have: a) Require perl 5.6.1 in the Makefile.PL, and abandon support for 5.6.0. b) Remove support for nested joins in location strings. c) Branch in the respective piece of code depending on perl version and don't use the regex construct above if perl version is 5.6.0 or less, with the understanding that nested joins are not supported in perl 5.6.0. (BTW this is not supported at all in versions 5.005 and lower, so the requiring 5.005 in Makefile.PL should certainly be revised.) I'm a bit ambivalent on this as nested joins shouldn't really exist and unless I'm mistaken only existed in Genbank temporarily as allegedly they have been fixed now by NCBI staff. So, I'm a bit worried that we're incurring issues while spending efforts on how to best solve a non-existent problem. OTOH, it appears that the only two tests failing in 5.6.0 are the nested locations, so maybe no code changes are necessary in order to properly support all location strings in 5.6.0 except nested joins? If this is true the easiest solution would be to skip the two tests if perl is 5.6.0 or lower. Any opinions, comments, or pieces of advice appreciated. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- To verify behaviour, use the following scriptlet on your platform: my $re; $re = qr{ \( (?: (?> [^()]+ ) # Non-parens without backtracking | (??{ $re }) # Group with matching parens )* \) }x; my $oparg = 'join(11..21,join(100..300,complement(150..230)))'; while( $oparg =~ s/(join|order|bond)$re//ig ) { print "match: \$oparg ='$oparg', \$\& = '$&'\n"; } When run through perl -w it outputs Use of uninitialized value in substitution (s///) at re.pl line 12. under perl 5.6.0 (which is wrong) and match: $oparg ='', $& = 'join(11..21,join(100..300,complement(150..230)))' under perl 5.6.1+ (which is correct). From amackey at pcbi.upenn.edu Sun Apr 24 07:50:48 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Sun Apr 24 07:44:39 2005 Subject: [Bioperl-l] nested joins In-Reply-To: References: Message-ID: <478a880c57fd6bf5830b9e8b86bda2fa@pcbi.upenn.edu> Thanks for the detective work; I'd vote for this simplest option for now. -Aaron On Apr 23, 2005, at 9:41 PM, Hilmar Lapp wrote: > OTOH, it appears that the only two tests failing in 5.6.0 are the > nested locations, so maybe no code changes are necessary in order to > properly support all location strings in 5.6.0 except nested joins? If > this is true the easiest solution would be to skip the two tests if > perl is 5.6.0 or lower. > -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. 
University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697

From jason.stajich at duke.edu Sun Apr 24 10:21:56 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Apr 24 10:15:32 2005
Subject: [Bioperl-l] Re: nested joins
In-Reply-To:
References:
Message-ID:

This won't solve your problem, but I've fixed bug #1765 for nested joins
as well now, all with regexps so this closes bug:
http://bugzilla.open-bio.org/show_bug.cgi?id=1765

I think the RE solution is a little more elegant so I've stayed with
it. It does require re-ordering the sub-locations based on the input
string since the RE pulls out the groups first and then the non-joined
sections second.

Here is the code which captures the section (the $re is the same as
one hilmar is listing below):

    # lets capture and remove all the sections which are groups
    while( $oparg =~ s/(join|order|bond)$re//ig ) {
        push @sections, $&;
    }
    push @sections, split(/,/,$oparg) if length($oparg);
    # because we don't necessarily process the string in-order
    # as we are pulling the data from the string out for
    # groups first, then pulling out data, comma delimited
    # I am re-sorting the sections based on their position
    # in the original string, using the index function to figure
    # out their position in the string
    # --jason
    # resort based on input order, schwartzian style!
    @sections = map  { shift @$_ }
                sort { $a->[1] <=> $b->[1] }
                map  { [$_, index($oparg_orig, $_)] } @sections;

-jason

On Apr 23, 2005, at 9:41 PM, Hilmar Lapp wrote:

> If I understand things somewhat correctly, then the following regexp
> is used to deal with nested joins (bug#1674):
>
>     $re = qr{
>              \(
>              (?:
>                 (?> [^()]+ )    # Non-parens without backtracking
>               |
>                 (??{ $re })     # Group with matching parens
>              )*
>              \)
>             }x;
>
> This uses 2 advanced perlre features, which, despite being perfectly
> well documented in perl 5.6.0 behaves (matches) differently between
> perl 5.6.0 and later versions. (The irony seems to be that the
> expression itself appears verbatim in perlre as an example - already
> in 5.6.0!)
>
> I have tested 5.6.1 on linux and the expression matches correctly
> there. Maybe this is also a platform issue, but I don't have any other
> platform than Mac OSX 10.2 that still uses 5.6.0.
>
> I've included a scriptlet at the end with which people can test on
> their platform.
>
> This difference in behaviour is most likely the reason why the
> LocationFactory test fails on 5.6.0 but succeeds on later versions of
> perl.
>
> There's a couple of options we have:
>
> a) Require perl 5.6.1 in the Makefile.PL, and abandon support for
>    5.6.0.
> b) Remove support for nested joins in location strings.
> c) Branch in the respective piece of code depending on perl version
>    and don't use the regex construct above if perl version is 5.6.0 or
>    less, with the understanding that nested joins are not supported in
>    perl 5.6.0.
>
> (BTW this is not supported at all in versions 5.005 and lower, so the
> requiring 5.005 in Makefile.PL should certainly be revised.)
>
> I'm a bit ambivalent on this as nested joins shouldn't really exist
> and unless I'm mistaken only existed in Genbank temporarily as
> allegedly they have been fixed now by NCBI staff. So, I'm a bit
> worried that we're incurring issues while spending efforts on how to
> best solve a non-existent problem.
>
> OTOH, it appears that the only two tests failing in 5.6.0 are the
> nested locations, so maybe no code changes are necessary in order to
> properly support all location strings in 5.6.0 except nested joins? If
> this is true the easiest solution would be to skip the two tests if
> perl is 5.6.0 or lower.
>
> Any opinions, comments, or pieces of advice appreciated.
>
> -hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
> To verify behaviour, use the following scriptlet on your platform:
>
>     my $re;
>     $re = qr{
>              \(
>              (?:
>                 (?> [^()]+ )    # Non-parens without backtracking
>               |
>                 (??{ $re })     # Group with matching parens
>              )*
>              \)
>             }x;
>     my $oparg = 'join(11..21,join(100..300,complement(150..230)))';
>     while( $oparg =~ s/(join|order|bond)$re//ig ) {
>         print "match: \$oparg ='$oparg', \$\& = '$&'\n";
>     }
>
> When run through perl -w it outputs
>
>     Use of uninitialized value in substitution (s///) at re.pl line 12.
>
> under perl 5.6.0 (which is wrong) and
>
>     match: $oparg ='', $& =
>     'join(11..21,join(100..300,complement(150..230)))'
>
> under perl 5.6.1+ (which is correct).
>

From hlapp at gmx.net Sun Apr 24 20:04:12 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Apr 24 19:57:41 2005
Subject: [Bioperl-l] survey: perl version
Message-ID: <8E153CC6-B51D-11D9-AE48-000A959EB4C4@gmx.net>

(Trigger for this survey is that we have at least one regex construct
in the code that is unsupported on perl 5.005 and earlier, i.e., perl
actually complains about the construct itself, yet this has not been
reported before AFAIK.)

For coding and installation requirements it'd be good to know which
perl version people are using bioperl with. Specifically, is anybody
still using bioperl under perl 5.005 or earlier. Please also respond if
you use perl 5.6.0.

If the impression is that nobody depends on perl 5.005 anymore then we
may as well decide to increment the required version of perl from the
next release on. (It is 5.005 right now.)

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cain at cshl.edu Sun Apr 24 22:51:03 2005 From: cain at cshl.edu (Scott Cain) Date: Sun Apr 24 22:44:29 2005 Subject: [Bioperl-l] survey: perl version In-Reply-To: <8E153CC6-B51D-11D9-AE48-000A959EB4C4@gmx.net> References: <8E153CC6-B51D-11D9-AE48-000A959EB4C4@gmx.net> Message-ID: <1114397464.3461.1.camel@localhost.localdomain> The only people I can think of the use a really old perl are the Harvard Flybase folks: do you guys use Bioperl? Scott On Sun, 2005-04-24 at 17:04 -0700, Hilmar Lapp wrote: > (Trigger for this survey is that we have at least one regex construct > in the code that is unsupported on perl 5.005 and earlier, i.e., perl > actually complains about the construct itself, yet this has not been > reported before AFAIK.) > > For coding and installation requirements it'd be good to know which > perl version people are using bioperl with. Specifically, is anybody > still using bioperl under perl 5.005 or earlier. Please also respond if > you use perl 5.6.0. > > If the impression is that nobody depends on perl 5.005 anymore then we > may as well decide to increment the required version of perl from the > next release on. (It is 5.005 right now.) > > -hilmar -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From mark_lambrecht at yahoo.com Mon Apr 25 05:45:27 2005 From: mark_lambrecht at yahoo.com (Mark Lambrecht) Date: Mon Apr 25 05:38:43 2005 Subject: [Bioperl-l] Entrez Gene solution : code available Message-ID: <20050425094527.60855.qmail@web30606.mail.mud.yahoo.com> The below mail has been sent to the bioperl list April 13th. In order that people can use and test the code, we have decided to make it available. 
You can download the code using the following link : http://users.pandora.be/akasha/BioPerl.tar.gz Kris Ulens bioinformatics software developer, Galapagos Genomics tel.: + 32 (0) 486 683 532 e-mail: fantom@earthling.net Mark Lambrecht, PhD scientist bioinformatics, Galapagos Genomics and K.U.Leuven, Faculty of Applied Bioscience and Engineering tel.: + 32 (0) 495 944 125 e-mail: mark_lambrecht@yahoo.com ===================================== Mail to bioperl-l list April 13th : We have developed our own interface to the NCBI Entrez Gene ASN.1 flat files. We needed this internally to replace the bioperl LocusLink parser. Because we have used so many great bioperl code over the last years, we had hoped that people can benefit from our work. This system has already proven its value , at least for us. The module consists of the following objects: => Bio::_GeneData.pm : abstract engine for parsing "type blocks" within the NCBI ASN.1 files => Bio::Gene.pm :Entrez Gene object (replaces the Bioperl sequence object that is normally returned by an IO object) and only keeps relevant data, can easily be extended to map additional needed data using the GeneData engine => Bio::GeneIO.pm : iterator derived from RootIO (similar to the SeqIO objects); implements next_gene method. 
subdirectory Index with => Bio::Index::EntrezGene.pm : object with capability to index and consult an ASN.1 File, inherits from Bio::Index::Abstract test scripts will be committed too : => few small test records (with extension asn1) => t_gene_indexer.pl : test file to index asn.1 file and return an example record #example: my $file = "gene_hs.asn1"; my $inx = Bio::Index::EntrezGene->new( '-filename' => $file.".inx", '-write_flag' => 'WRITE'); $inx->make_index("/usr/local/datasets/ncbi/gene/$file"); => testGene.pl : tests a Gene objects for return of appropriate data fields #example for only extracting track info from the asn1 file, this is a dynamic way of choosing which data to parse my $track_info = new Bio::Gene::GeneTrack; $track_info->geneid(1); $gene->type('test_type'); $gene->track_info($track_info); print "dump:\n".Dumper($gene)."\n"; Stefan Kirov and Mingyi Liu have produced similar solutions (wich we didn't test); we believe that ours is different because it is a all-in-one lightweight Entrez Gene ASN1 parser that will only capture essential data (thereby making it rather fast). We deliberately didn't choose to map the data on a Seq object. At the same time, a bioperl-compliant indexer has been written. We hope that this code can somehow be useful. We will commit the code to bioperl cvs if people agree, as soon as we obtain a login. Kris Ulens (bioinformatics software developer) Mark Lambrecht (scientist bioinformatics) Galapagos Genomics http://www.galapagosgenomics.com __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo!
Mail has the best spam protection around http://mail.yahoo.com From sam.kalat at gmail.com Mon Apr 25 08:43:52 2005 From: sam.kalat at gmail.com (Sam Kalat) Date: Mon Apr 25 08:37:27 2005 Subject: [Bioperl-l] Installing BioPerl1.5/DBD/Gbrowse 1.6.2 on MacOSX 10.3.9 In-Reply-To: References: <0e5e7a1c33fcf1390124350a5d9a3ee5@mail.nih.gov> Message-ID: <9d86df2b05042505432d8c86ed@mail.gmail.com> I recently started to move some of my bioinformatics work from a Linux machine to a Mac. I can't help you with gbrowse yet, but I might be able to help with some of its prerequisites like GD. I compiled GD from source, and found that in its initial state my machine did not have the JPEG and PNG libraries. I installed these through fink, although in hindsight I would have probably done them by hand. My config.log reminds me that this is what I used for my configure step with GD (libgd): $ ./configure --with-jpeg=/sw --disable-gif The --with-jpeg directing to /sw is because that is the root folder for fink, so YMMV. I recall being very confused about some errors relating to GIF. I was never sure if the GIF libraries were missing due to its patent nonsense or what, and I had not encountered that error before on Linux or Solaris. But the --disable-gif was enough for me. I can't say if I have the same errors as you, your compile messages are encoded in some non-ascii format and I'm too lazy to figure out how to convert it. Subsequently when I installed the GD perl module under CPAN, I had to use 'look GD' and run the make steps by hand, to tell it that libgd had installed in /usr/local/lib rather than /usr/lib. At least I think that's what I did, all I actually wrote down was that I was having problems with paths. I also installed perl from source so that I could compile additional modules that we need, that in turn require perl headers that don't come by default with the mac. YMMV there too, but it is something to watch out for, and then you'll have two perls on your hands. 
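A small check along these lines may help once two perls are installed on one machine. It is only a sketch and not part of GD or gbrowse: it reports which perl binary is running and where, if anywhere, that perl finds each module. The helper name locate_module is made up for this example, and the module list is simply taken from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if this perl
# cannot load it. (Hypothetical helper, not a BioPerl function.)
sub locate_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # GD::Image -> GD/Image.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

print "perl binary : $^X\n";
print "perl version: $]\n";
print "search path :\n", map { "  $_\n" } @INC;

# Module names taken from this thread; adjust the list as needed.
for my $module (qw(GD DBD::mysql Bio::DB::GFF::Adaptor::mysql)) {
    my $path = locate_module($module);
    printf "%-30s %s\n", $module,
        defined $path ? $path : 'NOT FOUND by this perl';
}
```

Run it with the same perl that the CGI script's #! line points at, since that is the perl gbrowse will actually use.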
Good luck, Sam Kalat On 4/22/05, David Adelson wrote: > Dear all, > > Thanks for your replies. > > Gbrowse from memory adaptor is still not working fully. I get no graphics > (problems with GD, libgd; see below). > > Gbrowse from mysql is not working. Originally I got this message: > ==== > Could not open database. > > ------------- EXCEPTION ------------- > MSG: Unable to load mysql adaptor: Can't locate Bio/DB/GFF/Adaptor/mysql.pm > in @INC (@INC contains: > /System/Library/Perl/5.8.1/darwin-thread-multi-2level > /System/Library/Perl/5.8.1 /Library/Perl/5.8.1/darwin-thread-multi-2level > /Library/Perl/5.8.1 /Library/Perl > /Network/Library/Perl/5.8.1/darwin-thread-multi-2level > /Network/Library/Perl/5.8.1 /Network/Library/Perl .) at (eval 19) line 1. > > STACK Bio::DB::GFF::new /Library/Perl/5.8.1/Bio/DB/GFF.pm:643 > STACK (eval) > /Library/Perl/5.8.1/darwin-thread-multi-2level/Bio/Graphics/Browser/Util.pm: > 158 > STACK Bio::Graphics::Browser::Util::open_database > /Library/Perl/5.8.1/darwin-thread-multi-2level/Bio/Graphics/Browser/Util.pm: > 158 > STACK main::get_settings /usr/local/apache2/cgi-bin/gbrowse:516 > STACK toplevel /usr/local/apache2/cgi-bin/gbrowse:135 > ==== > > I was able to copy mysql.pm up one level in the directory hierarchy and as a > result no longer get this error. Now I get: > ===== > An internal error has occurred > > Could not open database. > > Can't locate object method "new" via package "Bio::DB::GFF::Adaptor::mysql" > at /Library/Perl/5.8.1/Bio/DB/GFF.pm line 649. > > Please contact this site's maintainer (you@example.com) for assistance. > ===== > > This message, while deliciously ironic (last line) seems much less tractable > to me. Suggestions made include the possibility that the mysql header files > are not present. 
They are in fact present at: > > fisher.tamu.edu [/usr/local/mysql/include] % ls > config-os2.h m_string.h my_dir.h my_sys.h > mysql_time.h sql_state.h > config.h md5.h my_getopt.h my_time.h > mysql_version.h sslopt-case.h > errmsg.h merge.h my_global.h my_tree.h > mysqld_error.h sslopt-longopts.h > ft_global.h my_aes.h my_handler.h my_xml.h > mysys_err.h sslopt-vars.h > hash.h my_alarm.h my_list.h myisam.h > nisam.h t_ctype.h > heap.h my_alloc.h my_net.h myisammrg.h > queues.h thr_alarm.h > help_end.h my_base.h my_no_pthread.h myisampack.h > raid.h thr_lock.h > help_start.h my_bitmap.h my_nosys.h mysql.h > rijndael.h typelib.h > keycache.h my_config.h my_pthread.h mysql_com.h > sha1.h violite.h > m_ctype.h my_dbug.h my_semaphore.h mysql_embed.h > sql_common.h > > So, my DBD problem is probably still a problem, but much less clear to me > now. > > As far as the graphics issue is concerned, I thought I had identified the > problem(s). The problem with GD is that it could not find gd.h which was > not surprising since I could not get libgd to compile. It seemed that the > header file for fontconfig library was not part of my X11 installation > (Apple X11) which means libgd would not compile (I got this feedback from > correspondence with the author of gd at Boutell.com). After installing the > X11 SDK I can now compile lidgd AND GD!!! However, there were some warnings > during the compile of libgd and the upshot is I still can't get any > graphics:-( . Jason Stavich (I think) made the comment in an e-mail to me > that I had to be careful about the library versions for gd. When I compiled > gd I did get some warnings about library versions, but nothing fatal > happened. (see attachment for output from gd and GD compiles) > > In the mean time, sorry to keep beating this horse, but it is not quite dead > yet. Any additional suggestions will be gratefully accepted. 
> > Dave > > Versions of stuff being used: > Bundle Bundle::BioPerl (C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz) > bioperl-1.5.0 from source > gd-2.0.33 > zlib-1.2.2 > libpng-1.2.8-config > jpeg-6b > GD-2.23 > DBD-mysql-2.9006 > MySQL 4.1.8-standard > > fisher.tamu.edu% uname -a > Darwin fisher.tamu.edu 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 > 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh > powerpc > fisher.tamu.edu % sw_vers > ProductName: Mac OS X > ProductVersion: 10.3.9 > BuildVersion: 7W98 > > On 04/22/2005 05:02 AM, "Sean Davis" wrote: > > > Ditto from me. > > > > Sean > > > > On Apr 21, 2005, at 8:56 PM, Todd Harris wrote: > > > >> Hi David - > >> > >> I've been running GBrowse day in and day out on Mac OS X. If you are > >> having > >> problem installing DBD, I suspect that you need to install the header > >> files > >> for MySQL. > >> > >> Todd > >> > >>> On 4/20/05 1:06 PM, David Adelson wrote: > >> > >>> Has anyone had any luck getting gbrowse 1.6.2 to work on Mac OSX > >>> (10.3.9)? > >>> I am having trouble installing DBD and BioPerl1.5 from source. I have > >>> wasted way too much time on this today, but if anyone has any tips on > >>> getting these things to install/compile properly I would appreciate > >>> hearing > >>> from you. > >>> > >>> Dave Adelson, Texas A&M University > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > There's no trick to being a humorist when you have the whole government > working for you. 
> Will Rogers (1879 - 1935) > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From zhou at morgan.harvard.edu Mon Apr 25 09:43:50 2005 From: zhou at morgan.harvard.edu (Pinglei Zhou) Date: Mon Apr 25 09:40:05 2005 Subject: [Bioperl-l] survey: perl version Message-ID: <200504251343.j3PDhoV26039@hershel.flybase.harvard.edu> > >The only people I can think of the use a really old perl are the Harvard >Flybase folks: do you guys use Bioperl? Unfortunately, our system still use Perl 5.005. Yes, we use Bioperl. Pinglei >On Sun, 2005-04-24 at 17:04 -0700, Hilmar Lapp wrote: >> (Trigger for this survey is that we have at least one regex construct >> in the code that is unsupported on perl 5.005 and earlier, i.e., perl >> actually complains about the construct itself, yet this has not been >> reported before AFAIK.) >> >> For coding and installation requirements it'd be good to know which >> perl version people are using bioperl with. Specifically, is anybody >> still using bioperl under perl 5.005 or earlier. Please also respond if >> you use perl 5.6.0. >> >> If the impression is that nobody depends on perl 5.005 anymore then we >> may as well decide to increment the required version of perl from the >> next release on. (It is 5.005 right now.) >> >> -hilmar >-- >------------------------------------------------------------------------ >Scott Cain, Ph. D. cain@cshl.org >GMOD Coordinator (http://www.gmod.org/) 216-392-3087 >Cold Spring Harbor Laboratory From jason.stajich at duke.edu Mon Apr 25 11:57:51 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Apr 25 11:51:08 2005 Subject: [Bioperl-l] Re: How to extend a class in bioperl? 
In-Reply-To: <20050425134903.97954.qmail@web53604.mail.yahoo.com>
References: <20050425134903.97954.qmail@web53604.mail.yahoo.com>
Message-ID: <2065fdfc8ffb6b853aea0fdc5572268e@duke.edu>

You don't want those enclosing {}

This is the minimum it takes to extend an object

    package AnTree;
    use vars qw(@ISA);
    use Bio::Tree::Tree;
    @ISA = qw(Bio::Tree::Tree);

    # your routines here

    1;

If you aren't adding any initialization arguments to the object you
don't need to implement the 'new' function.

If you want TreeIO to create AnTrees instead of Bio::Tree::Tree you
need to just do this

    # initialize Bio::TreeIO like normal
    my $treeio = Bio::TreeIO->new(-format => 'newick',
                                  -file   => 'filenamehere');
    # reset the Tree object type that is created
    # (the name must match the package declared above)
    $treeio->_eventHandler->treetype('AnTree');

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 25, 2005, at 9:49 AM, Sally Li wrote:

> Hi, there,
>
> Thanks to Jason, Will and Brian for the help in
> storage of an object.
>
> Now, I have a technical question. It may be a general
> OO programming one. How to make a parent instance into
> a child instance so that it has the methods of the
> child class?
>
> Attached are three files for the extension of
> Bio::Tree::Tree class (AnTree.pm file), the test
> script (AnTreeProcessV.pl) and the tree file
> (testTree). My goal in this example is to make the
> instance $myTree, which is an instance of Tree, have
> the methods of AnTree so that I can use the methods
> (like, getUnsortedNodes ()). As you see, the AnTree.pm
> has the same attributes as Bio::Tree::Tree but more
> methods.
>
> This is the way I would like to use to extend a class.
> It is easy to manage the code (in my opinion). I
> don't have to use the same name as Bio::Tree::Tree,
> which I used to modify a class in Bioperl.
>
> Thank you for help!
>
> Sally
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo!
Mail has the best spam protection around > http://mail.yahoo.com From jason.stajich at duke.edu Mon Apr 25 13:48:24 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Apr 25 13:41:59 2005 Subject: [Bioperl-l] Re: How to extend a class in bioperl? In-Reply-To: <20050425171453.77100.qmail@web53604.mail.yahoo.com> References: <20050425171453.77100.qmail@web53604.mail.yahoo.com> Message-ID: <35ed2b9ea10875d6ddeda51c840d849b@duke.edu> See Bio::Tree::Node and Bio::Tree::NodeNHX as examples of adding new methods to a base class. See the score function for example in Bio::Tree::Tree as to how to have a getter/setter method. -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 25, 2005, at 1:14 PM, Sally Li wrote: > Hi, Jason, > > Thank you for your help! It works! > > If I need to add some attributes to the AnTree.pm, > what is the easy way? > > Thank you! > > Sally > > --- Jason Stajich wrote: >> You don't want those enclosing {} >> >> This is the minimum it takes to extend an object >> >> package AnTree; >> use vars qw(@ISA); >> use Bio::Tree::Tree; >> @ISA = qw(Bio::Tree::Tree); >> >> # your routines here >> >> 1; >> >> >> If you aren't adding any initialization arguments to >> the object you >> don't need to implement the 'new' function. >> >> If you want TreeIO to create AnTrees instead >> Bio::Tree::Tree you need >> to just do this >> # initialize Bio::TreeIO like normal >> my $treeio = Bio::TreeIO->new(-format => 'newick', >> -file => >> 'filenamehere'); >> # reset the Tree object type that is created >> > $treeio->_eventHandler->treetype('Bio::Tree::AnTree'); >> >> >> -jason >> -- >> Jason Stajich >> jason.stajich at duke.edu >> http://www.duke.edu/~jes12/ >> >> On Apr 25, 2005, at 9:49 AM, Sally Li wrote: >> >>> Hi, there, >>> >>> Thanks to Jason, Will and Brian for the help in >>> storage of an object. >>> >>> Now, I have a techical question. It may be a >> general >>> OO programming one. 
How to make a parent instance >> into >>> a child instance so that it has the methods of the >>> child class? >>> >>> Attached are three files for the extension of >>> Bio::Tree::Tree class (AnTree.pm file), the test >>> script (AnTreeProcessV.pl) and the tree file >>> (testTree). My goal in this example is to make the >>> instance $myTree, which is an instance of Tree, >> have >>> the methods of AnTree so that I can use the >> methods >>> (like, getUnsortedNodes ()). As you see, the >> AnTree.pm >>> have the same attributes as Bio::Tree::Tree but >> more >>> methods. >>> >>> This is the way I would like to use to extend a >> class. >>> It is easy to manage the codes (in my opinion). I >>> don't have to use the same name as >> Bio::Tree::Tree, >>> which I used to modify a class in Bioperl. >>> >>> Thank you for help! >>> >>> Sally >>> >>> >>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >> >> >> > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com From russo at morgan.harvard.edu Mon Apr 25 09:54:54 2005 From: russo at morgan.harvard.edu (Susan Russo) Date: Mon Apr 25 16:31:45 2005 Subject: [Bioperl-l] survey: perl version Message-ID: <200504251354.j3PDss329221@sang.flybase.harvard.edu> >Unfortunately, our system still use Perl 5.005. We're in the process of upgrading to 5.8 - hope to have all modules (including bioperl) functioning in <1 month. Susan From jason.stajich at duke.edu Tue Apr 26 11:03:34 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Apr 26 10:56:57 2005 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] AffyDB a module to handle Affymetrix annotation tables and more... 
Message-ID: <3fa7958119b0864fef80cc38efe96dd5@duke.edu> I think Allen is the main developer for the microarray stuff so I guess it is up to him to figure out how to incorporate the code. I'm sure we'd like to have more people contributing to that project so hopefully he can point you in the right direction for bringing your work into it. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ Begin forwarded message: > From: rambaldi > Date: April 12, 2005 11:12:56 AM EDT > To: bioperl-guts-l@bioperl.org > Cc: Subject: [Bioperl-guts-l] AffyDB a module to handle Affymetrix > annotation tables and more... > > Hi, I am a young researcher and Perl developer of the IFOM-FIRC > Institute of Cancer Research (Milan) > In the last month i have worked on a Perl module to handle Affymetrix > Probesets Informations and store it in a Relational Database > > Actually I have an account at PAUSE as RAMBALDI but i would like to add > my modules to the BioPerl Bundle > Under CPAN search i FIND actualy only 2 packages > > bioperl-microarray-0.1 > Bio-Affymetrix-0.3 > > that work with Affymetrix Mass Fille. > > > My Package handle information about probesets design, alignments oligos > ecc... and generate new data about single probes position over RefSeq > and Genome. > > Actually a web-interface to my modules are avaible at > > http://bio.ifom-firc.it/AffyDB/ > > Just type 'random' into the INPUT BOX to get a random list of 10 > probesets > > I would like to integrate my modules in the BioPerl distribution > > I have used OOPerl and the template of 'Mastering Perl for > Bioinformatics (Gene.pm)' > > Actualy i have 2 modules called: > > AffyDB.pm > > parse annotation tables, insert tables into a Entity Relationship mySQL > database, retrive informations > > Probeset.pm > > Really similar to Gene.pm, is an OOPerl module to create an object > (Probeset) and handle informations about design, alignments, ecc... 
> > And some perl CGI scripts to generate the web interface. > > Perl DOCs about the 2 modules are available (html version) at > > http://bio.ifom-firc.it/AffyDB/src/AffyDB.html > > http://bio.ifom-firc.it/AffyDB/src/Probeset.html > > I post IT one month AGO at the bioperl-macroarray list and one person > ask to reuse my module, > i would like to include mu modules into the BioPerl distribution... > > Is It possible? > > What I have to do? > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l > From jason.stajich at duke.edu Tue Apr 26 16:11:06 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Apr 26 16:04:53 2005 Subject: [Bioperl-l] updates, new code, invitation to contribute Message-ID: <7116ec3a8afda4837accb9239483e9b8@duke.edu> Some updates and requests. I'd appreciate if people could start to report in more summaries of what changes they have committed in addition to what we see in the bioperl-guts commit list it is useful to have a summary of what's going on from the authors. In the interest of making the next release easy, please try and add something to the Changes file to describe any major additions to the API, squashed bug, or more than formatting changes. I think we are probably going to need to do at least one or two more dev releases under the 1.5 series before we can think about a 1.6 so this log of changes will help us determine what will actually be new in a release. === Some updates from me. Bio::Tools::Phylo::PAML I just added RST parsing so one can parse ancestral sequences in as well as per site ancestral codon probabilities at each node in the tree. Last month I also added parsing of branch-specific parameters when those constraint models are applied in codeml/baseml. You can get these values by walking up the tree and getting the tag/values associated with each internal node. 
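As a rough illustration of walking up the tree, here is a minimal sketch. It assumes BioPerl is installed and uses only Bio::Tree::Node calls (add_Descendent, ancestor, add_tag_value, get_all_tags, get_tag_values). The tree is built by hand rather than parsed from PAML output, the tag name 'omega' is made up for the example (the real tag names depend on what the parser stores), and tags_toward_root is a hypothetical helper.

```perl
use strict;
use warnings;
use Bio::Tree::Node;

# Hand-built three-node lineage standing in for a tree parsed from PAML.
my $root  = Bio::Tree::Node->new(-id => 'root');
my $inner = Bio::Tree::Node->new(-id => 'node1');
my $leaf  = Bio::Tree::Node->new(-id => 'seqA');
$root->add_Descendent($inner);
$inner->add_Descendent($leaf);

# 'omega' is an invented tag name, for illustration only.
$inner->add_tag_value('omega', 0.25);
$root->add_tag_value('omega', 0.10);

# Collect [node id, tag, value] triples from a node's ancestors,
# nearest ancestor first. (Hypothetical helper.)
sub tags_toward_root {
    my ($node) = @_;
    my @found;
    while (my $anc = $node->ancestor) {
        for my $tag ($anc->get_all_tags) {
            push @found, [$anc->id, $tag, $_] for $anc->get_tag_values($tag);
        }
        $node = $anc;
    }
    return @found;
}

printf "%s\t%s\t%s\n", @$_ for tags_toward_root($leaf);
```

Starting from a leaf keeps the walk linear; to visit every internal node instead, one could loop over the tree's nodes and skip those for which is_Leaf is true.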
I've updated the SYNOPSIS to have more examples of these things and as well most things are also shown in the t/PAML.t tests. I'm working on including these examples in the PAML HOWTO. Bio::Factory::FTLocationFactory Hierarchical parsing works even for joins of joins now. This is using a regexp solution which seems to be suitably fast but has the problem of not working on perls < 5.6.1. scripts I fixed a bug in fastam9_to_table which allows it to parse TFASTX m9 output properly now. bioperl-run I've tried to expand the cmd-line options that several tools support - Bio::Tools::Run::HMMER. Better default option handling in Run::Phylo::PAML::Codeml === Bug fixing help! There are a lot of bugs sitting in the bugzilla queue. I realize it may not seem easy for someone to jump in and work on, but that is how you learn the toolkit (it's how I learned to do things on this project). Even if you simply try out to see if the bug is reproducible that is helpful. Right now feels like me and a few other folks against this massive wall of things to do and so we're probably only going to fix the things we are feeling particularly excited about. So we need more volunteers to do things like take a look at these bugs. We also need people to help read the documentation, synopsis, etc and report if it is inconsistent and (preferably) help us fix it. I have encountered several new converts to Bioperl who are constantly stymied by not being able to get the Synopsis to work as they expected. Of course these folks also feel to shy to email the list and tell us something is wrong, and then it never gets fixed.... So really, help out, let us know when something is wrong. If you want to help out (and get your name in shiny ASCII code in the AUTHORS file) contribute a couple of fixes through bugzilla, if you are doing it right we'll give you an account and you start helping even faster. === Future stuff Lots of people, and I mean, lots of people cannot seem to get bioperl installed. 
Sometimes this is all I hear from newbies when I meet folks or teach a class. I think this is really due mostly to the dependancies, and probably because people don't know how to use UNIX and/or understand where perl modules go. But people building RPMs complain, the new OSX users, and of course windows users always seem to have a lot of trouble getting things working. I think this is more due to the dependancies than Bioperl itsself. I think it may be worth considering moving stuff that has lots of dependancies out of the main core code and into separate installable packages. IO::String is not a large dependancy, but LWP starts to be add any of the XML modules, Graph, etc and it seems to be too large of a hurdle for many new folks. Maybe anything which depends on code linked to an external C library would be a candidate. I think more discussion is warranted for sure, this would not be an easy thing for us to undertake. But in the end it would produce a slimmed down core set of pure-perl modules that would be easy to use out of the box. The other alternative is to make a slimmed down bioperl-lite which is just and export and subset of bioperl modules which are pure-perl and have little or no dependancies... This would make it easier for people who don't need all the extra bells and whistles. I think it doesn't necessarily split by directory namespace however, for example SeqIO has some XML dependent modules which (under this proposal) get moved into a separate package. === At any rate, that is all food for thought and perhaps can be discussed by folks over the summer at the upcoming meetings and hackathons. 
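As a rough illustration of the dependency question above, here is a minimal pure-Perl sketch that reports which optional modules can be loaded on a given machine. The helper name module_available and the module list are assumptions made for this example only; they are not part of Bioperl, and which modules count as "optional" is exactly what is under discussion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): returns 1 if the named
# module can be loaded with require, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Illustrative list only, taken from the modules named in this thread.
my @optional = qw(IO::String LWP::UserAgent XML::Parser GD Graph);

for my $module (@optional) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

A report like this could help a new user decide whether a failed dependency matters for the modules they actually plan to use.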
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ From navin_elango at yahoo.co.uk Tue Apr 26 20:32:27 2005 From: navin_elango at yahoo.co.uk (Navin Elango) Date: Tue Apr 26 20:25:37 2005 Subject: [Bioperl-l] Mapping Coordinates In-Reply-To: <7116ec3a8afda4837accb9239483e9b8@duke.edu> Message-ID: <20050427003227.61742.qmail@web25005.mail.ukl.yahoo.com> Hi All, I am trying to map coordinates in the query of a blast alignment to that of the subject. I am using Bio::Coordinate::utils. I am having problems when the match is in the minus strand of the subject. I used blast2seq in NCBI to blast the following 2 sequences sequence 1(query) AAAAAcaaacttttatgctctgcttcccttttaaacctaaattccaatttcagatcatctctctcaagttcatagttccaTTTTTTTTTT sequence2(subject) TTTTTTTTTTTTTTTtggaactatgaacttgagagagatgatctgaaattggaatttaggtttaaaagggaagcagagcataaaagtttgAAAAAAAAAA The lower case letters in the second sequence is the reverse complement of the lower case letters in the first sequence. The BLAST output is as follows. Note, I have given some dummy name to the subject. ======================================= BLASTN 2.2.10 [Oct-19-2004] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. 
RID: 1114363842-27559-121522116853.BLASTQ2 Query= (90 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 3,067,237 sequences; 13,823,778,485 total letters Score E Sequences producing significant alignments: (bits) Value ref|NT_005120.15|Hs2_5277 Homo sapiens chromosome 2 genomic contig 2.103e+04 0.0 ALIGNMENTS >ref|NT_005120.15|Hs2_5277 Homo sapiens chromosome 2 genomic contig Length = 100 Score = 144 bits (75), Expect = 0.0 Identities = 75/75 (100%) Strand = Plus / Minus Query: 6 caaacttttatgctctgcttcccttttaaacctaaattccaatttcagatcatctctctc 65 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 90 caaacttttatgctctgcttcccttttaaacctaaattccaatttcagatcatctctctc 31 Query: 66 aagttcatagttcca 80 ||||||||||||||| Sbjct: 30 aagttcatagttcca 16 CPU time: 0.01 user secs. 0.01 sys. secs 0.02 total secs. Lambda K H 1.33 0.621 1.12 Gapped Lambda K H 1.33 0.621 1.12 =========================================== I want the region in the subject sequence which corresponds to region 10 - 20 in the query sequence. When I mapped 10 and 20 , using the coordinate mapper (code given below), I get 20 and 30 respectively, which seems to be wrong. How do I get the region in suject corresponding to region 10-20 ie cttttatgctc in query. The code I am using is as follows #building a mapper object, with the query sequence as the reference sequence. my $mapper = Bio::Coordinate::utils->from_align($hspAlignment,1); my $startPositionObject = Bio::Location::Simple->new (-start => $queryStartPosition, -end => $queryStartPosition ); my $res = $mapper->map($startPositionObject); my @returnArray; $subjectStart = $res->match->start; my $endPositionObject = Bio::Location::Simple->new (-start => $queryEndPosition, -end => $queryEndPosition ); my $endRes = $mapper->map($endPositionObject); $subjectEnd = $endRes->match->start; Please let me know if there is any other method to approach this problem. 
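For an ungapped HSP such as the one above, the expected answer can be checked by hand with plain arithmetic. The following is a minimal sketch that does not use Bio::Coordinate at all; the helper query_to_subject and the hash keys are hypothetical names for this example, and it assumes there are no gaps in the alignment.

```perl
use strict;
use warnings;

# Hypothetical helper: map a query position onto the subject for an
# UNGAPPED hit. sstart/send are the low and high subject coordinates;
# strand is 1 for Plus/Plus and -1 for Plus/Minus.
sub query_to_subject {
    my ($qpos, %hsp) = @_;
    return if $qpos < $hsp{qstart} || $qpos > $hsp{qend};
    my $offset = $qpos - $hsp{qstart};
    return $hsp{strand} < 0
        ? $hsp{send} - $offset      # minus strand: count down from the high end
        : $hsp{sstart} + $offset;   # plus strand: count up from the low end
}

# Values taken from the BLAST report above (Query 6..80, Sbjct 90..16).
my %hsp = (qstart => 6, qend => 80, sstart => 16, send => 90, strand => -1);

my ($s1, $s2) = map { query_to_subject($_, %hsp) } (10, 20);
my ($lo, $hi) = sort { $a <=> $b } ($s1, $s2);
print "query 10-20 -> subject $lo-$hi (minus strand)\n";
# prints: query 10-20 -> subject 76-86 (minus strand)
```

Subject positions 76-86 read gagcataaaag on the plus strand, which is the reverse complement of cttttatgctc, so this is the region the mapper would be expected to return.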
I see that there is a function for getting the position in the alignment based on a position in the sequence. Is there a function to do the reverse? Thanks, Navin. From brian_osborne at cognia.com Tue Apr 26 21:09:26 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Apr 26 21:03:10 2005 Subject: [Bioperl-l] RE: Installing BioPerl on Windows In-Reply-To: <41B76EE0.6080800@genetics.utah.edu> Message-ID: Barry, Done, replaced old INSTALL.WIN with this text. Brian O. -----Original Message----- From: Barry Moore [mailto:barry.moore@genetics.utah.edu] Sent: Wednesday, December 08, 2004 4:15 PM To: Jason Stajich Cc: Bioperl List; Brian Osborne Subject: Installing BioPerl on Windows Jason, Brian, Others- A recent message to the bioperl list suggests that new Windows users are still having problems installing Bioperl on Windows. This is not necessary because it's actually quite easy to install Bioperl 1.4. I had a look at the INSTALL.WIN document and I think that while it has been updated a bit, it is starting to suffer from fragmented editing over a long period of time. All the information that you need is there, but it doesn't really fit together too well anymore, and there is still some outdated and conflicting information present. Since new Windows users are often the least likely to be experienced programmers and also likely to have little Unix experience it may also need to be written with that in mind, providing more explanation for how things are done. I've taken a crack at this and rewritten INSTALL.WIN with a longer (perhaps too long) introduction to Bioperl, and updated installation instructions for Bioperl 1.4. In fact I think that the file name INSTALL.WIN should probably be changed, as that is a filename that is intuitive only to someone who has done a lot of installing from source. Installing_Bioperl_on_Windows.txt may be a more obvious filename to new Windows users.
If you think it looks useful please feel free to post it on the Bioperl web site as a replacement for or in addition to the current INSTALL.WIN. I'll be happy to try to keep this document up to date, but I'll need one of the developers to put it on the site for me. Finally, I didn't touch the Cygwin sections of the previous INSTALL.WIN document because I have no experience with it, so I'll have to assume that it is accurate and let others contribute any fixes necessary there. Let me know if I've made any errors or omissions that need to be corrected. Barry

==================================================================================

Installing Bioperl on Windows
=============================

1) Quick Instructions for the impatient
2) Bioperl on Windows
3) Perl on Windows
4) BioPerl on Windows
5) Beyond the Core
6) BioPerl in Cygwin
7) Cygwin tips

This installation guide was written by Barry Moore and other Bioperl authors based on the original work of Paul Boutros. Please report problems and/or fixes to the bioperl mailing list, bioperl-l@bioperl.org

1) Quick instructions for the impatient, lucky, or experienced user.
=====================================================================

Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/

Run the ActivePerl Installer (accepting all defaults is fine).

Open a command prompt (Menus Start->Run and type cmd) and run the ppm shell (C:\>ppm).

Add two new ppm repositories with the following commands:

  ppm> rep add Bioperl http://bioperl.org/DIST
  ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Install Bioperl-1.4.

Go to http://www.bioperl.org and start reading documentation or try the example script at the end of this file.

2) Bioperl on Windows
======================

Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid in the task of writing perl code to deal with sequence data in a myriad of ways.
Bioperl provides objects for various types of sequence data and their associated features and annotations. It provides interfaces for analysis of these sequences with a wide variety of external programs (BLAST, fasta, clustalw and EMBOSS to name just a few). It provides interfaces to various types of databases both remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. And finally with its associated documentation and mailing list Bioperl represents a community of bioinformatics professionals working in perl who are committed to supporting both development of Bioperl and the new users who are drawn to the project. While most bioinformatics and computational biology applications are developed in Unix/Linux environments, more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment. Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of the functionality of Bioperl is available with this type of install. Much of the heavy lifting in bioinformatics is done by programs originally developed in lower level languages like C and Pascal (e.g. BLAST, clustalw, Staden etc.). Bioperl simply acts as a wrapper for running and parsing output from these external programs. Some of those programs (BLAST for example) are ported to Windows. These can be installed and work quite happily with BioPerl in the native Windows environment. Others, such as clustalw, have Windows ports, however the BioPerl developer who wrote the interface used Unix specific system calls to interact with these programs and so these wrappers will not work in the Windows environment. 
And finally some external programs such as Staden and the EMBOSS suite of programs can not be installed on Windows at all, and therefore any part of Bioperl that interacts with these packages either won't work or can't be installed at all. If you have a fairly simple project in mind, want to start using Bioperl quickly, only have access to a computer running Windows, and/or don't mind bumping up against some limitations then Bioperl on Windows may be a good place for you to start. For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great. Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also. Be aware that most if not all of the Bioperl developers are working in some type of a Unix environment (Linux, OSX, Cygwin). If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses, simply because no one knows the answer to your Windows-specific problem. If this is or becomes a problem for you then you are better off working in some type of Unix-like environment. One solution to this problem that will keep you working on a Windows machine is to install Cygwin, a Unix emulation environment for Windows. A number of Bioperl users are using this approach successfully and it is discussed more below. 3) Perl on Windows =================== There are a couple of ways of installing Perl on a Windows machine. The most common and easiest is to get the most recent build from ActiveState. ActiveState is a software company (http://www.activestate.com) that provides free builds of Perl for Windows users. The current (December 2004) build is ActivePerl 5.8.4.810 (ActivePerl 5.6.1.638 is also available and should work just fine).
To install ActivePerl on Windows: Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/ Run the ActivePerl Installer (accepting all defaults is fine). You can also build Perl yourself (which requires a C compiler) or download one of the other binary distributions. The Perl source for building it yourself is available from CPAN (http://www.cpan.org), as are a few other binary distributions that are alternatives to ActiveState. This approach is not recommended unless you have specific reasons for doing so and know what you're doing. If that's the case you probably don't need to be reading this guide. Cygwin is a Unix emulation environment for Windows and comes with its own copy of Perl. Information on Cygwin and Bioperl is found below.

4) BioPerl on Windows
======================

Perl is a programming language that has been extended a lot by the addition of external modules. These modules work with the core language to extend the functionality of Perl. Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency. You can't install module X unless you have already installed module Y. Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl; if you've installed Perl then these modules are already installed. Other modules are freely available from CPAN, but you'll have to install them yourself if you want to use them. BioPerl has such dependencies. Bioperl is actually a large collection of perl modules (over 1000 currently) and these modules are split into six groups. These six groups are:

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions and the Staden package.
bioperl-db           Using bioperl with BioSQL and local relational databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user interface to some Bioperl functions.

The Bioperl core is what most new users will want to start with. Bioperl 1.4 (the core) and the Perl modules that it depends on can be easily installed with ppm. PPM (Programmer's Package Manager) is an ActivePerl utility for installing Perl modules on systems using ActivePerl. PPM will look online (you have to be connected to the internet of course) for files (these files end with .ppd) that tell it how to install the modules you want and what other modules your new modules depend on. It will then download and install your modules and all dependent modules for you. These .ppd files are stored online in ppm repositories. ActiveState maintains the largest ppm repository, and when you installed ActivePerl, ppm was installed with directions for using the ActiveState repositories. Unfortunately the ActiveState repositories are far from complete and other ActivePerl users maintain their own ppm repositories to fill in the gaps. Installing Bioperl will require you to direct ppm to look in two new repositories. You do this by opening a Windows command prompt, typing ppm to start the ppm shell and then typing the following two commands:

  ppm> rep add Bioperl http://bioperl.org/DIST
  ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms

Once ppm knows where to look for Bioperl and its dependencies you simply tell ppm to install it. This is done with the command:

  ppm> install Bioperl-1.4

5) Beyond the Core
===================

You may find that you want some of the features of other Bioperl groups like bioperl-run or bioperl-db. There are currently no ppm packages for installing these parts of Bioperl. You will have to install these manually from source.
For this you will need a Windows version of the program make called nmake (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe). You will also want to have a willingness to experiment. You'll have to read the installation documents for each component that you want to install, and use nmake where the instructions call for make. You will have to determine from the installation documents what dependencies are required and you will have to get them, read their documentation and install them first. The details of this are beyond the scope of this guide. Read the documentation. Search Google. Try your best, and if you get stuck consult with others on the bioperl mailing list. 6) BioPerl in Cygwin ===================== Cygwin is a Unix emulator and shell environment available free at www.cygwin.com. BioPerl runs well within Cygwin. Some users claim that installation of Bioperl is easier within Cygwin than within Windows, but these may be users with Unix backgrounds. One advantage of using Bioperl in Cygwin is that all the external modules are available through CPAN, and most if not all external programs can be installed and run, so many of the limitations of Bioperl on Windows are circumvented. To get Bioperl running first install the basic Cygwin package as well as the Cygwin Perl, make, and gcc packages. Clicking the "View" button in the upper right of the installer enables you to see details on the various packages. Then follow the BioPerl installation instructions for Unix in BioPerl's INSTALL file. Note that expat comes with Cygwin (it's used by the module XML::Parser). One known issue is that DBD::mysql can be tricky to install in Cygwin and this module is required for the bioperl-db, Biosql, and bioperl-pipeline external packages. Fortunately there are some good instructions online: http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin.
Also, set the environment variable TMPDIR, since programs like BLAST and clustalw need a place to create temporary files, e.g.:

  setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
  export TMPDIR=e:/cygwin/tmp     # sh, bash

Note that this is not a syntax that Cygwin understands, which would be something like "/cygdrive/e/cygwin/tmp". This is the syntax that a Perl module expects on Windows. If this variable is not set correctly you'll see errors like this when you run Bio::Tools::Run::StandAloneBlast:

  ------------- EXCEPTION: Bio::Root::Exception -------------
  MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
  STACK: Error::throw
  ..........

7) Cygwin tips
===============

The easiest way to install Mysql is to use the Windows binaries available at www.mysql.com. Note that Windows does not have sockets, so you need to force the Mysql connections to use TCP/IP instead. Do this by using the "-h" option from the command-line:

  >mysql -h 127.0.0.1 -u blip -pblop biosql

Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses a host. For example, if your databases are installed locally:

  alias mysql 'mysql -h 127.0.0.1'

If you're trying to use some application or resource "outside" of Cygwin and you're having a problem remember that Cygwin's path syntax may not be the correct one. Cygwin understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when referring to the E: drive) but the external resource may want 'E:/cygwin/home/jacky'. So your *rc files may end up with paths written in these different syntaxes, depending.

If you can, install Cygwin on a drive or partition that's NTFS-formatted, not FAT32-formatted. When you install Cygwin on a FAT32 partition you will not be able to set permissions and ownership correctly. In most situations this probably won't make any difference but there may be occasions where this is a problem.
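The difference between the two path syntaxes described in the tips above can be sketched in a few lines of Perl. This is only an illustration: the function name cygdrive_to_windows is made up for this example, it handles only paths under /cygdrive/<letter>/, and it knows nothing about other Cygwin mount points. Cygwin's own cygpath utility is the more complete tool for this conversion.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite /cygdrive/<letter>/rest as <LETTER>:/rest.
# Any path that does not start with /cygdrive/<letter> is returned unchanged.
sub cygdrive_to_windows {
    my ($path) = @_;
    if ($path =~ m{^/cygdrive/([A-Za-z])(/.*)?$}) {
        my $rest = defined $2 ? $2 : '/';
        return uc($1) . ':' . $rest;
    }
    return $path;
}

print cygdrive_to_windows('/cygdrive/e/cygwin/tmp'), "\n";    # E:/cygwin/tmp
print cygdrive_to_windows('/home/jacky'), "\n";               # /home/jacky (unchanged)
```

Note that a path such as /home/jacky is left alone here, because resolving it to a Windows path requires knowing where Cygwin itself is installed.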
If you want to use BLAST we recommend that the Windows binary be obtained from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will be named something like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions in README.bls. Although we've recommended using the BLAST and MySQL binaries you should be able to compile just about everything else from source code using Cygwin's gcc. You'll notice when you're installing Cygwin that many different libraries are also available (gd, jpeg, etc.). From hlapp at gmx.net Wed Apr 27 04:05:49 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Apr 27 03:59:01 2005 Subject: [Bioperl-l] updates, new code, invitation to contribute In-Reply-To: <7116ec3a8afda4837accb9239483e9b8@duke.edu> Message-ID: <2B104696-B6F3-11D9-B08A-000A959EB4C4@gmx.net> On Tuesday, April 26, 2005, at 01:11 PM, Jason Stajich wrote: > Bio::Factory::FTLocationFactory > Hierarchical parsing works even for joins of joins now. This is > using a regexp solution which seems to be suitably fast but has the > problem of not working on perls < 5.6.1. > It won't be difficult to make the test conditional on the perl version, but I wonder what happens if a perl 5.005 or lower loads the module and needs to compile the expression. Haven't had a chance to investigate yet. > [...] > Bug fixing help! > There are a lot of bugs sitting in the bugzilla queue. I realize it > may not seem easy for someone to jump in and work on, but that is how > you learn the toolkit (it's how I learned to do things on this > project). What would also help I think if people can write up small short test cases that expose the bug. > [...] > I think it may be worth considering moving stuff that has lots of > dependancies out of the main core code and into separate installable > packages. IO::String is not a large dependancy, but LWP starts to be > add any of the XML modules, Graph, etc and it seems to be too large of > a hurdle for many new folks. 
Maybe anything which depends on code > linked to an external C library would be a candidate. In a way I agree but what I'm not sure about is how separating out modules with compiled dependencies will help if you want those modules anyway. If you don't want them then failure to install those dependencies should not be a cause to worry, and in fact one should then not even bother to try to install them. So what is happening when someone "fails" to install bioperl? Did that person try to install all dependencies and then gave up over the ensuing hassle? Maybe we need to better document which dependencies are worth installing and which aren't unless you are certain you'll use a particular set of modules or functionality. Maybe it's the Bundle that's trying to do too good of a job by including virtually all dependencies? Maybe there should be different bundles for different intended uses? How would separating out modules with compiled dependencies help an RPM maintainer if she wants to build an RPM for the entire bioperl anyway (in either one or a set of RPMs)? Installing bioperl is indeed a PITA, I've done it too many times. The main pain, however, in my experience comes from LWP, GD, and sometimes expat. None of these are needed for much of the core functionality, but based on list traffic many users out there do want GBrowse and/or Bio::DB::* modules, for which you'll have to have them. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From tcj25 at cam.ac.uk Wed Apr 27 04:32:02 2005 From: tcj25 at cam.ac.uk (Terry Jones) Date: Wed Apr 27 04:29:10 2005 Subject: [Bioperl-l] updates, new code, invitation to contribute In-Reply-To: Your message at 01:05:49 on Wednesday, 27 April 2005 References: <7116ec3a8afda4837accb9239483e9b8@duke.edu> <2B104696-B6F3-11D9-B08A-000A959EB4C4@gmx.net> Message-ID: <17007.19970.696403.347464@terry.jones.tc> | It won't be difficult to make the test conditional on the perl | version, but I wonder what happens if a perl 5.005 or lower loads the | module and needs to compile the expression. Haven't had a chance to | investigate yet. You could put the re in a string. Then, if the perl version is late enough, match via /$re/o Or you could put code into a string and conditionally use eval. Terry From james.wasmuth at ed.ac.uk Wed Apr 27 05:19:02 2005 From: james.wasmuth at ed.ac.uk (James Wasmuth) Date: Wed Apr 27 05:16:57 2005 Subject: [Bioperl-l] updates, new code, invitation to contribute In-Reply-To: <2B104696-B6F3-11D9-B08A-000A959EB4C4@gmx.net> References: <2B104696-B6F3-11D9-B08A-000A959EB4C4@gmx.net> Message-ID: <426F5906.9040108@ed.ac.uk> > So what is happening when someone "fails" to install bioperl? Did that > person try to install all dependencies and then gave up over the > ensuing hassle? When I first installed BioPerl through CPAN it seemed that the whole of CPAN distribution was loading onto my computer. Okay, it wasn't but it took an age. I can see how people not too used to dealing with Perl modules could think something has gone wrong. Perhaps a "Don't Panic in large friendly letters" should appear as its being installed. Seriously though, new users just need to be told to be brave. anyway my 2p worth -james p.s. > There are a lot of bugs sitting in the bugzilla queue. I'll start having a look at some over the weekend. 
I've been silent due to other commitments, but could do with the distraction Hilmar Lapp wrote: > > On Tuesday, April 26, 2005, at 01:11 PM, Jason Stajich wrote: > >> Bio::Factory::FTLocationFactory >> Hierarchical parsing works even for joins of joins now. This is >> using a regexp solution which seems to be suitably fast but has the >> problem of not working on perls < 5.6.1. >> > > It won't be difficult to make the test conditional on the perl > version, but I wonder what happens if a perl 5.005 or lower loads the > module and needs to compile the expression. Haven't had a chance to > investigate yet. > >> [...] >> Bug fixing help! >> There are a lot of bugs sitting in the bugzilla queue. I realize it >> may not seem easy for someone to jump in and work on, but that is how >> you learn the toolkit (it's how I learned to do things on this project). > > > What would also help I think if people can write up small short test > cases that expose the bug. > >> [...] >> I think it may be worth considering moving stuff that has lots of >> dependancies out of the main core code and into separate installable >> packages. IO::String is not a large dependancy, but LWP starts to be >> add any of the XML modules, Graph, etc and it seems to be too large >> of a hurdle for many new folks. Maybe anything which depends on code >> linked to an external C library would be a candidate. > > > In a way I agree but what I'm not sure about is how separating out > modules with compiled dependencies will help if you want those modules > anyway. If you don't want them then failure to install those > dependencies should not be a cause to worry, and in fact one should > then not even bother to try to install them. > > So what is happening when someone "fails" to install bioperl? Did that > person try to install all dependencies and then gave up over the > ensuing hassle? 
Maybe we need to better document which dependencies > are worth installing and which aren't unless you are certain you'll > use a particular set of modules or functionality. > > Maybe it's the Bundle that's trying to do too good of a job by > including virtually all dependencies? Maybe there should be different > bundles for different intended uses? > > How would separating out modules with compiled dependencies help an > RPM maintainer if she wants to build an RPM for the entire bioperl > anyway (in either one or a set of RPMs)? > > Installing bioperl is indeed a PITA, I've done it too many times. The > main pain, however, in my experience comes from LWP, GD, and sometimes > expat. None of these are needed for much of the core functionality, > but based on list traffic many users out there do want GBrowse and/or > Bio::DB::* modules, for which you'll have to have them. > > -hilmar -- "Until man duplicates a blade of grass, nature can laugh at his so-called scientific knowledge...." --Thomas Edison Blaxter Nematode Genomics Group | Institute of Evolutionary Biology | Ashworth Laboratories, KB | tel: +44 131 650 7403 University of Edinburgh | web: www.nematodes.org Edinburgh | EH9 3JT | UK | From kvddrift at earthlink.net Wed Apr 27 07:19:48 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed Apr 27 07:36:05 2005 Subject: [Bioperl-l] Bioperl on Mac OS X Message-ID: Hi, Just as a reminder, bioperl can be installed on Mac OS X using the fink package manager (http://fink.sf.net). Fink will take care of all packages that are needed for bioperl, including GD. Most other perl modules that bioperl uses are also available through fink, so there won't be much loss of functionality. Currently version 1.4 is available, but I am (being the package maintainer) working on getting 1.5 in fink soon. cheers, - Koen. 
From brian_osborne at cognia.com Wed Apr 27 08:26:21 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Apr 27 08:20:46 2005 Subject: [Bioperl-l] New Bio::Tools::Spidey::Results module In-Reply-To: <005d01c545c3$ed1eb8c0$4122db82@GOLHARMOBILE1> Message-ID: Ryan, Commited. Yes, writing a test script is a good idea and it's straightforward. Use any *.t file in the t directory as an example, and the spidey input file will go in t/data. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Ryan Golhar Sent: Wednesday, April 20, 2005 12:14 PM To: 'Bioperl List' Subject: [Bioperl-l] New Bio::Tools::Spidey::Results module Can someone please commit this new version of Bio::Tools::Spidey::Results.pm to CVS? Its attached to this message. I've added a method to determine if either of the mRNA ends are missing as determined by Spidey. Also, I think I need to write some test scripts for this thing. Can someone point me in the right direction as to what I should read to write a test script? I looked at some of the test scripts and it looks pretty straight forward. Thanks, Ryan From nathanhaigh at ukonline.co.uk Wed Apr 27 09:00:05 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Wed Apr 27 08:53:18 2005 Subject: [Bioperl-l] updates, new code, invitation to contribute In-Reply-To: <7116ec3a8afda4837accb9239483e9b8@duke.edu> Message-ID: Once I have my PhD out of the way I can start helping debugging and doing some of the document/synopsis checks, but till then, I'll chip in when I can. I'll also help where I can RE Bioperl on Windows. I think the best approach to helping newbies is to ensure that the Bioperl homepage has a clear "newbie" area that covers the installation, and a few of the more commonly used modules. I know when I started with Perl/Bioperl I found it difficult to find/know where documentation was and how to get at the synopses. 
A "newbie" area should contain some help on where to find documentation, where to ask for help etc and some example code that can be downloaded and run without any tinkering, so they can start to get a feel for how things work. Nath -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich Sent: 26 April 2005 21:11 To: Bioperl list Subject: [Bioperl-l] updates, new code, invitation to contribute Some updates and requests. I'd appreciate if people could start to report in more summaries of what changes they have committed in addition to what we see in the bioperl-guts commit list it is useful to have a summary of what's going on from the authors. In the interest of making the next release easy, please try and add something to the Changes file to describe any major additions to the API, squashed bug, or more than formatting changes. I think we are probably going to need to do at least one or two more dev releases under the 1.5 series before we can think about a 1.6 so this log of changes will help us determine what will actually be new in a release. === Some updates from me. Bio::Tools::Phylo::PAML I just added RST parsing so one can parse ancestral sequences in as well as per site ancestral codon probabilities at each node in the tree. Last month I also added parsing of branch-specific parameters when those constraint models are applied in codeml/baseml. You can get these values by walking up the tree and getting the tag/values associated with each internal node. I've updated the SYNOPSIS to have more examples of these things and as well most things are also shown in the t/PAML.t tests. I'm working on including these examples in the PAML HOWTO. Bio::Factory::FTLocationFactory Hierarchical parsing works even for joins of joins now. This is using a regexp solution which seems to be suitably fast but has the problem of not working on perls < 5.6.1. 
scripts
I fixed a bug in fastam9_to_table which allows it to parse TFASTX m9 output properly now.

bioperl-run
I've tried to expand the cmd-line options that several tools support - Bio::Tools::Run::HMMER. Better default option handling in Run::Phylo::PAML::Codeml

=== Bug fixing help!

There are a lot of bugs sitting in the bugzilla queue. I realize it may not seem easy for someone to jump in and work on them, but that is how you learn the toolkit (it's how I learned to do things on this project). Even if you simply check whether a bug is reproducible, that is helpful. Right now it feels like me and a few other folks against this massive wall of things to do, and so we're probably only going to fix the things we are feeling particularly excited about. So we need more volunteers to do things like take a look at these bugs.

We also need people to help read the documentation, synopsis, etc. and report if it is inconsistent and (preferably) help us fix it. I have encountered several new converts to Bioperl who are constantly stymied by not being able to get the Synopsis to work as they expected. Of course these folks also feel too shy to email the list and tell us something is wrong, and then it never gets fixed.... So really, help out, let us know when something is wrong.

If you want to help out (and get your name in shiny ASCII code in the AUTHORS file), contribute a couple of fixes through bugzilla; if you are doing it right we'll give you an account and you can start helping even faster.

=== Future stuff

Lots of people, and I mean lots of people, cannot seem to get bioperl installed. Sometimes this is all I hear from newbies when I meet folks or teach a class. I think this is really due mostly to the dependencies, and probably because people don't know how to use UNIX and/or understand where perl modules go. But people building RPMs complain, the new OSX users, and of course Windows users always seem to have a lot of trouble getting things working.
I think this is more due to the dependencies than Bioperl itself.

I think it may be worth considering moving stuff that has lots of dependencies out of the main core code and into separate installable packages. IO::String is not a large dependency, but LWP starts to be; add any of the XML modules, Graph, etc. and it seems to be too large a hurdle for many new folks. Maybe anything which depends on code linked to an external C library would be a candidate. I think more discussion is warranted for sure; this would not be an easy thing for us to undertake. But in the end it would produce a slimmed down core set of pure-perl modules that would be easy to use out of the box.

The other alternative is to make a slimmed down bioperl-lite which is just an export of a subset of bioperl modules which are pure-perl and have few or no dependencies... This would make it easier for people who don't need all the extra bells and whistles. I think it doesn't necessarily split by directory namespace however; for example SeqIO has some XML dependent modules which (under this proposal) get moved into a separate package.

===
At any rate, that is all food for thought and perhaps can be discussed by folks over the summer at the upcoming meetings and hackathons.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From limericksean at gmail.com Wed Apr 27 10:33:14 2005
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Wed Apr 27 10:26:46 2005
Subject: [Bioperl-l] retrieving sequences by ID
Message-ID: <462784640504270733591dfcfd@mail.gmail.com>

Hi all,
This is probably an easy problem for ye, but one I'm having difficulty with none the less.
I'm trying to extract only sequences from a fasta file (containing ~38,000 sequences) containing a specific ID in the header line e.g.
return only the sequence for header containing 'ABCD12346' from:

>ABCD12345|followed by the rest of the description
acgtacgtgttttgggccctttaaa.....
>ABCD12346|description
acgtacgtgttttgggccctttaaa.....
>ABCD12347|description
acgtacgtgttttgggccctttaaa.....
...

The specific ID's are contained in a list(~20,000) which I want to loop through.
This is what I have done so far w/out any luck:

############################

use strict;
use lib "/usr/lib/perl5/site_perl/5.8.1/";
use Bio::SeqIO;

my (@ids)=@_
my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
                              '-file' => "$fastafile");
my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
                               '-file' => "$outfile");
for ($i=0;$i<=scalar(@ids);$i++){
    while ($sequence = $seq_in->next_seq){
        if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
            $seq_out->write_seq($sequence);
        }
    }
}
exit;

#############################

This has so far returned a list of 3 fasta headers and the program then finishes without errors.
I'd like to know where I'm going wrong and if possible, how I could improve on things to prevent memory usage/speed it up.

Thanks in advance,
Sean O'Keeffe.

From raoul.bonnal at itb.cnr.it Wed Apr 27 11:01:34 2005
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Wed Apr 27 10:55:42 2005
Subject: [Bioperl-l] retrieving sequences by ID
In-Reply-To: <462784640504270733591dfcfd@mail.gmail.com>
References: <462784640504270733591dfcfd@mail.gmail.com>
Message-ID: <1114614094.18592.5.camel@localhost>

Hi Sean,

> I'd like to know where I'm going wrong and if possible, how I could
> improve on things to prevent memory usage/speed it up.

I think you could create a hash from your fasta db, associating each id with its seq obj. Then retrieve the sequences you need simply by looking up their ids in the hash.
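
A minimal sketch of the hash idea, with one change that should be stated plainly: it hashes the ~20,000 wanted IDs rather than the ~38,000 sequence objects, so the fasta file is read only once and no sequences are held in memory. The sub names filter_by_id and main, the three command-line file names, and the assumption that the ID list is a text file with one ID per line are illustrative and not from the original post. It also assumes the wanted ID is the part of display_id before the first '|'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every record whose ID is a key of %$wanted from $in to $out.
# $in must provide next_seq() and $out must provide write_seq();
# Bio::SeqIO streams provide both. Returns the number of records written.
sub filter_by_id {
    my ($in, $out, $wanted) = @_;
    my $written = 0;
    while (my $seq = $in->next_seq) {
        # For a header such as ">ABCD12346|description", display_id is
        # "ABCD12346|description"; keep only the part before the first '|'.
        my ($id) = split /\|/, $seq->display_id;
        next unless defined $id && exists $wanted->{$id};
        $out->write_seq($seq);
        $written++;
    }
    return $written;
}

# Hypothetical driver: perl filter_by_id.pl ids.txt in.fasta out.fasta
sub main {
    my ($id_file, $fasta_file, $out_file) = @_;
    die "usage: $0 ids.txt in.fasta out.fasta\n" unless defined $out_file;

    # Loaded here so that filter_by_id can be reused without BioPerl.
    require Bio::SeqIO;

    # Read the wanted IDs (assumed: one per line) into a hash for O(1) lookup.
    open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
    my %wanted;
    while (my $line = <$fh>) {
        chomp $line;
        $wanted{$line} = 1 if length $line;
    }
    close $fh;

    my $in  = Bio::SeqIO->new(-format => 'fasta', -file => $fasta_file);
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => ">$out_file");

    my $written = filter_by_id($in, $out, \%wanted);
    print STDERR "wrote $written sequences\n";
}

# Only run the driver when arguments are given.
main(@ARGV) if @ARGV;
```

Because the lookup is a hash test per record, the run time grows with the size of the fasta file rather than with (number of IDs) x (number of sequences), which is what the nested loop in the original script would cost even once its other problems are fixed.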
RJP

From MAG at Stowers-Institute.org Wed Apr 27 11:11:47 2005
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Wed Apr 27 11:05:55 2005
Subject: [Bioperl-l] Protdist output to generic matrix
Message-ID: <200504271505.j3RF4vfY020299@portal.open-bio.org>

Hi All,

I am new to Bioperl and was trying to parse my ProtDist (phylip) output file to display it as a matrix.
I think my file is being read in by Bio::Tools::Phylo::Phylip::ProtDist, but how do I write it back as a simple matrix?
I am kind of mixed up about "write_matrix" or "print_matrix".

Thanks for any help.
-Manisha

From diriano at rz.uni-potsdam.de Wed Apr 27 11:18:10 2005
From: diriano at rz.uni-potsdam.de (Diego Riano)
Date: Wed Apr 27 11:12:02 2005
Subject: [Bioperl-l] retrieving sequences by ID
In-Reply-To: <462784640504270733591dfcfd@mail.gmail.com>
References: <462784640504270733591dfcfd@mail.gmail.com>
Message-ID: <1114615090.10537.7.camel@molbio21.bio.uni-potsdam.de>

Hi Sean

I do something similar; here is how I am doing it:

Your ids would be in @pre_list. I just remove the trailing \n from each one and join them into a single string. Matching against the string is faster than looping through the list, but I am not sure how it would behave with >20000 ids.

######################################################################
use Bio::SeqIO;

# @pre_list holds the wanted ids, one per element, possibly with a trailing \n
my @list=();
foreach my $seq (@pre_list){
    chomp $seq;
    push @list,$seq;
}
my $list=join("\t",@list);   # all wanted ids in one tab-separated string
my $output="outputfile";
my $in  = Bio::SeqIO->new(-file => "$ARGV[0]" , '-format' => 'Fasta');
my $out = Bio::SeqIO->new(-file => ">$output" , '-format' => 'Fasta');
while(my $seq = $in->next_seq()){
    my $seqid=$seq->id;
    # \Q...\E so that characters such as '|' in the id are matched literally
    if ($list=~/\b\Q$seqid\E\b/){
        $out->write_seq($seq);
    }
}
######################################################################

I hope this helps,

diego

On Wed, 2005-04-27 at 16:33, Sean O'Keeffe wrote:
> Hi all,
> This is probably an easy problem for ye, but one I'm having difficulty
> with none the
> less.
> I'm trying to extract only sequences from a fasta file (containing > ~38,000 sequences) > containing a specific ID in the header line e.g. > return only the sequence for header containing 'ABCD12346' from: > >ABCD12345|followed by the rest of the description > acgtacgtgttttgggccctttaaa..... > >ABCD12346|description > acgtacgtgttttgggccctttaaa..... > >ABCD12347|description > acgtacgtgttttgggccctttaaa..... > ... > > The specific ID's are contained in a list(~20,000) which I want to loop through. > This is what I have done so far w/out any luck: > > ############################ > > use strict; > use lib "/usr/lib/perl5/site_perl/5.8.1/"; > use Bio::SeqIO; > > my (@ids)=@_ > my $seq_in = Bio::SeqIO->new( '-format' => "fasta", > '-file' => "$fastafile"); > my $seq_out = Bio::SeqIO->new( '-format' => "fasta", > '-file' => "$outfile"); > for ($i=0;$i<=scalar(@ids);$i++){ > while ($sequence = $seq_in->next_seq){ > if ($sequence->display_id =~ /^>$ids[$i](.*$)/){ > $seq_out->write_seq($sequence); > } > } > } > exit; > > ############################# > > This has so far returned a list of 3 fasta headers and the programs > then finishes without errors. > I'd like to know where I'm going wrong and if possible, how I could > improve on things to prevent memory usage/speed it up. > > Thanks in advance, > Sean O'Keeffe. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- _______________________________________ Diego Mauricio Riano Pachon Biologist Institute of Biology and Biochemistry Potsdam University Karl-Liebknecht-Str. 
24-25
Haus 20
14476 Golm
Germany
Tel:+49 331 977 2809
http://www.geocities.com/dmrp.geo/

From khoueiry at ibdm.univ-mrs.fr Wed Apr 27 11:20:26 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed Apr 27 11:13:06 2005
Subject: [Bioperl-l] retrieving sequences by ID
In-Reply-To: <462784640504270733591dfcfd@mail.gmail.com>
References: <462784640504270733591dfcfd@mail.gmail.com>
Message-ID: <1114615226.22472.2.camel@DavidLinux>

Hi,

First of all try to index your fasta file...

use Bio::Index::Fasta;

# indexing the fasta file
my $type = $ENV{'BIOPERL_INDEX_TYPE'};
if( $type ) {
    $Bio::Index::Abstract::USE_DBM_TYPE = $type;
}
# the name you will give to the index
my $index = Bio::Index::Fasta->new(-filename   => $tmp_dir."index.idx",
                                   -write_flag => 1);
# the fasta file you want to index
$index->make_index("yourfastafile.fasta");

foreach my $id (@ids) {
    # ...
    # fetch the sequence with the specified ID from the indexed file
    my $seq = $index->fetch($id);
    $seq_out->write_seq($seq);
    # ...
}

On Wednesday, 27 April 2005 at 15:33 +0100, Sean O'Keeffe wrote:
> Hi all,
> This is probably an easy problem for ye, but one I'm having difficulty
> with none the
> less.
> I'm trying to extract only sequences from a fasta file (containing
> ~38,000 sequences)
> containing a specific ID in the header line e.g.
> return only the sequence for header containing 'ABCD12346' from:
> >ABCD12345|followed by the rest of the description
> acgtacgtgttttgggccctttaaa.....
> >ABCD12346|description
> acgtacgtgttttgggccctttaaa.....
> >ABCD12347|description
> acgtacgtgttttgggccctttaaa.....
> ...
>
> The specific ID's are contained in a list(~20,000) which I want to loop through.
> This is what I have done so far w/out any luck:
>
> ############################
>
> use strict;
> use lib "/usr/lib/perl5/site_perl/5.8.1/";
> use Bio::SeqIO;
>
> my (@ids)=@_
> my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
> '-file' => "$fastafile");
> my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
> '-file' => "$outfile");
> for ($i=0;$i<=scalar(@ids);$i++){
> while ($sequence = $seq_in->next_seq){
> if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
> $seq_out->write_seq($sequence);
> }
> }
> }
> exit;
>
> #############################
>
> This has so far returned a list of 3 fasta headers and the programs
> then finishes without errors.
> I'd like to know where I'm going wrong and if possible, how I could
> improve on things to prevent memory usage/speed it up.
>
> Thanks in advance,
> Sean O'Keeffe.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jason.stajich at duke.edu Wed Apr 27 11:23:14 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Apr 27 11:17:10 2005
Subject: [Bioperl-l] Protdist output to generic matrix
In-Reply-To: <200504271505.j3RF4vfY020299@portal.open-bio.org>
References: <200504271505.j3RF4vfY020299@portal.open-bio.org>
Message-ID: <00a875dd72ce85b89237d4436e9d84bc@duke.edu>

Bio::Matrix::IO is what you want. Specifically:

my $out = Bio::Matrix::IO->new(-format => 'phylip',
                               -file   => ">filename.phy");
$out->write_matrix($matrix);

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 27, 2005, at 11:11 AM, Goel, Manisha wrote:

> Hi All,
>
> I am a new to Bioperl and was trying to parse my ProtDist (phylip)
> output file to display it as a matrix.
> I think my file is being read in by
> Bio::Tools::Phylo::Phylip::ProtDist,
> but how do I write it back as a simple matrix ..
> I am kind of mixed up about "write_matrix" or "print_matrix" ..
>
> Thanks for any help.
> -Manisha > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Wed Apr 27 11:26:53 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Apr 27 11:20:15 2005 Subject: [Bioperl-l] retrieving sequences by ID In-Reply-To: <462784640504270733591dfcfd@mail.gmail.com> References: <462784640504270733591dfcfd@mail.gmail.com> Message-ID: <8dd1828d86e894f1145de3709d69f338@duke.edu> Bio::DB::Fasta and Bio::Index::Fasta both do this for you. You might need to provide your own ID parsing header if you only want the bit before the '|' be part of the id. These modules build a persistent index so they remove the need to re-read the file on each run of your application. http://bioperl.org/HOWTOs/Beginners/indexing.html -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 27, 2005, at 10:33 AM, Sean O'Keeffe wrote: > Hi all, > This is probably an easy problem for ye, but one I'm having difficulty > with none the > less. > I'm trying to extract only sequences from a fasta file (containing > ~38,000 sequences) > containing a specific ID in the header line e.g. > return only the sequence for header containing 'ABCD12346' from: >> ABCD12345|followed by the rest of the description > acgtacgtgttttgggccctttaaa..... >> ABCD12346|description > acgtacgtgttttgggccctttaaa..... >> ABCD12347|description > acgtacgtgttttgggccctttaaa..... > ... > > The specific ID's are contained in a list(~20,000) which I want to > loop through. 
> This is what I have done so far w/out any luck: > > ############################ > > use strict; > use lib "/usr/lib/perl5/site_perl/5.8.1/"; > use Bio::SeqIO; > > my (@ids)=@_ > my $seq_in = Bio::SeqIO->new( '-format' => "fasta", > '-file' => "$fastafile"); > my $seq_out = Bio::SeqIO->new( '-format' => "fasta", > '-file' => "$outfile"); > for ($i=0;$i<=scalar(@ids);$i++){ > while ($sequence = $seq_in->next_seq){ > if ($sequence->display_id =~ /^>$ids[$i](.*$)/){ > $seq_out->write_seq($sequence); > } > } > } > exit; > > ############################# > > This has so far returned a list of 3 fasta headers and the programs > then finishes without errors. > I'd like to know where I'm going wrong and if possible, how I could > improve on things to prevent memory usage/speed it up. > > Thanks in advance, > Sean O'Keeffe. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From allenday at ucla.edu Wed Apr 27 14:31:33 2005 From: allenday at ucla.edu (Allen Day) Date: Wed Apr 27 14:25:05 2005 Subject: [Bioperl-l] updates, new code, invitation to contribute In-Reply-To: <2B104696-B6F3-11D9-B08A-000A959EB4C4@gmx.net> References: <2B104696-B6F3-11D9-B08A-000A959EB4C4@gmx.net> Message-ID: > How would separating out modules with compiled dependencies help an RPM > maintainer if she wants to build an RPM for the entire bioperl anyway > (in either one or a set of RPMs)? It wouldn't help if you wanted to install the entire toolkit, and who doesn't? Splitting up the distributable would just mean more specfiles -- or more specfile subsections at the very least -- at no gain. 
Anyway, the RPMs for bioperl, bioperl-pedigree, and bioperl-run are all available for FC2 here: http://www.biopackages.net/ -Allen From sokeeff at tcd.ie Tue Apr 26 13:50:04 2005 From: sokeeff at tcd.ie (sokeeff@tcd.ie) Date: Wed Apr 27 14:46:20 2005 Subject: [Bioperl-l] retrieving sequences by ID Message-ID: <1114537804.426e7f4c8ce91@mymail.tcd.ie> Hi all, Probably an easy problem for ye, but one I'm having difficulty with none the less. I'm trying to extract only sequences from a fasta file(~38,000 sequences) containing a specific ID in the header files e.g. return only the sequence for header containing 'ABCD12346' from: >ABCD12345|description acgtacgtgttttgggccctttaaa..... >ABCD12346|description acgtacgtgttttgggccctttaaa..... >ABCD12347|description acgtacgtgttttgggccctttaaa..... ... The ID's are contained in a list(~20,000) which I want to loop through. This is what I have done so far w/out any luck: ############################ use strict; use lib "/usr/lib/perl5/site_perl/5.8.1/"; use Bio::SeqIO; my (@ids)=@_ my $seq_in = Bio::SeqIO->new( '-format' => "fasta", '-file' => "$fastafile"); my $seq_out = Bio::SeqIO->new( '-format' => "fasta", '-file' => "$outfile"); for ($i=0;$i<=scalar(@ids);$i++){ while ($sequence = $seq_in->next_seq){ if ($sequence->display_id =~ /^>$ids[$i](.*$)/){ $seq_out->write_seq($sequence); } } } exit; ############################# This returns a small list of fasta headers. I'd like to know if it's the right way to go about the task, where I'm going wrong and if possible, how I could improve on things to prevent memory usage/speed it up. Thanks in advance, Sean O'Keeffe. From lupey+ at pitt.edu Wed Apr 27 09:24:44 2005 From: lupey+ at pitt.edu (Paul Cantalupo) Date: Wed Apr 27 14:46:23 2005 Subject: [Bioperl-l] How to specify alignment view in RemoteBlast Message-ID: <426F929C.4060806@pitt.edu> Hello, I'm using BioPerl 1.4 and I'd like to start using Bio::Tools::Run::RemoteBlast to automate my Blast procedures. 
Blast allows several alignment views such as Pairwise which is the default. I want to use the 'Hit Table' alignment view so that I can get the output in tabular format. How do I specify this in Bio::Tools::Run::RemoteBlast? Thank you, Paul Cantalupo From skirov at utk.edu Wed Apr 27 14:59:10 2005 From: skirov at utk.edu (Stefan Kirov) Date: Wed Apr 27 14:52:23 2005 Subject: [Bioperl-l] Bio::Ontology::Term references quetsion Message-ID: <426FE0FE.3060804@utk.edu> Is it true that Bio::Ontology::Term add_references method checks for title? If so there is a discrepancy with Bio::Annotation::Reference, as this object does not require a title upon creation. I don't think this should be such requirement in Bio::Ontology::Term unless I am missing something. Also the documentation does not say there is such requirement... Stefan From homann at wi.mit.edu Wed Apr 27 16:33:34 2005 From: homann at wi.mit.edu (Oliver Homann) Date: Wed Apr 27 16:26:54 2005 Subject: [Bioperl-l] Having no luck installing bioperl in Fedora Core 3 with Perl 5.8.5 Message-ID: <000c01c54b68$62bf3fb0$7007e6a9@salamander> Hi, We just recently purchased a new workstation, and I've been trying with absolutely no success to install bioperl. I've followed the instructions at bioperl.org, attempting several techniques (installing Bundle::BioPerl or bioperl-1.4.tar.gz via CPAN or installing the latter manually), and continue to encounter a litany of errors. I've pasted a slightly truncated version of the output from my latest attempt to install Bundle::BioPerl via CPAN... I'm hoping someone can offer me some insight into the nature of my problem. Thanks! Oliver #####STARTING CPAN GIVES ME THIS ERROR MESSAGE: # perl -MCPAN -e shell Undefined value assigned to typeglob at (eval 17) line 15, line 11. 
Warning [/etc/inputrc line 11]: Invalid variable `mark-symlinked-directories' #####TRYING TO INSTALL THE BUNDLE GIVES ME THIS: cpan> install "Bundle::BioPerl" Running install for module XML::DOM Running make for T/TJ/TJMATHER/XML-DOM-1.43.tar.gz Is already unwrapped into directory /root/.cpan/build/XML-DOM-1.43 Has already been processed within this session Running make test PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/dom_jp_attr.........FAILED tests 3, 9, 12, 14, 19, 22 Failed 6/23 tests, 73.91% okay t/dom_jp_cdata........FAILED test 3 Failed 1/3 tests, 66.67% okay t/dom_jp_example......ok t/dom_jp_minus........FAILED test 2 Failed 1/2 tests, 50.00% okay t/dom_jp_modify.......FAILED test 16 Failed 1/16 tests, 93.75% okay t/dom_jp_print........FAILED tests 2-3 Failed 2/3 tests, 33.33% okay Failed Test Stat Wstat Total Fail Failed List of Failed t/dom_jp_attr.t 23 6 26.09% 3 9 12 14 19 22 t/dom_jp_cdata.t 3 1 33.33% 3 t/dom_jp_minus.t 2 1 50.00% 2 t/dom_jp_modify.t 16 1 6.25% 16 t/dom_jp_print.t 3 2 66.67% 2-3 Failed 5/21 test scripts, 76.19% okay. 11/129 subtests failed, 91.47% okay. make: *** [test_dynamic] Error 255 /usr/bin/make test -- NOT OK Running make install make test had returned bad status, won't install without force Running install for module GD Running make for L/LD/LDS/GD-2.19.tar.gz Is already unwrapped into directory /root/.cpan/build/GD-2.19 Has already been processed within this session Running make test PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/GD..........Can't load './blib/arch/auto/GD/GD.so' for module GD: ./blib/arch/auto/GD/GD.so: undefined symbol: gdImageGifAnimAddPtr at /usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230. at t/GD.t line 13 Compilation failed in require at t/GD.t line 13. BEGIN failed--compilation aborted at t/GD.t line 13. 
t/GD..........dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-10
        Failed 10/10 tests, 0.00% okay
t/Polyline....Can't load '/root/.cpan/build/GD-2.19/blib/arch/auto/GD/GD.so' for module GD: /root/.cpan/build/GD-2.19/blib/arch/auto/GD/GD.so: undefined symbol: gdImageGifAnimAddPtr at /usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230.
 at /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45
Compilation failed in require at /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45.
BEGIN failed--compilation aborted at /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45.
Compilation failed in require at t/Polyline.t line 10.
BEGIN failed--compilation aborted at t/Polyline.t line 10.
t/Polyline....dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED test 1
        Failed 1/1 tests, 0.00% okay

Failed Test   Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/GD.t         255 65280    10   19 190.00%  1-10
t/Polyline.t   255 65280     1    2 200.00%  1
Failed 2/2 test scripts, 0.00% okay. 11/11 subtests failed, 0.00% okay.

make: *** [test_dynamic] Error 255
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

Bundle summary: The following items in bundle Bundle::BioPerl had installation problems:
  XML::DOM GD

From cain at cshl.edu Wed Apr 27 16:44:43 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Apr 27 16:37:52 2005
Subject: [Bioperl-l] Having no luck installing bioperl in Fedora Core 3 with Perl 5.8.5
In-Reply-To: <000c01c54b68$62bf3fb0$7007e6a9@salamander>
References: <000c01c54b68$62bf3fb0$7007e6a9@salamander>
Message-ID: <1114634683.6376.49.camel@localhost.localdomain>

Hi Oliver,

My advice to you, as someone who has been working and developing on FC3 for several months is to not use it.
I've had many problems with it, some related to SELinux (until I gave up on it and turned it off) and other things as well (including apache and mod_perl). My suggestion would be to use FC2, especially since there is an rpm for bioperl 1.5 built by Allen Day available on biopackages.net.

Scott

On Wed, 2005-04-27 at 13:33 -0700, Oliver Homann wrote:
> Hi,
>
> We just recently purchased a new workstation, and I've been trying with absolutely no success to install bioperl. I've followed the instructions at bioperl.org, attempting several techniques (installing Bundle::BioPerl or bioperl-1.4.tar.gz via CPAN or installing the latter manually), and continue to encounter a litany of errors. I've pasted a slightly truncated version of the output from my latest attempt to install Bundle::BioPerl via CPAN... I'm hoping someone can offer me some insight into the nature of my problem.
>
> Thanks!
> Oliver
>
> #####STARTING CPAN GIVES ME THIS ERROR MESSAGE:
>
>
> # perl -MCPAN -e shell
> Undefined value assigned to typeglob at (eval 17) line 15, line 11.
> Warning [/etc/inputrc line 11]: > Invalid variable `mark-symlinked-directories' > > > #####TRYING TO INSTALL THE BUNDLE GIVES ME THIS: > > > cpan> install "Bundle::BioPerl" > > > Running install for module XML::DOM > Running make for T/TJ/TJMATHER/XML-DOM-1.43.tar.gz > Is already unwrapped into directory /root/.cpan/build/XML-DOM-1.43 > Has already been processed within this session > Running make test > > > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > > > t/dom_jp_attr.........FAILED tests 3, 9, 12, 14, 19, 22 > Failed 6/23 tests, 73.91% okay > t/dom_jp_cdata........FAILED test 3 > Failed 1/3 tests, 66.67% okay > t/dom_jp_example......ok > t/dom_jp_minus........FAILED test 2 > Failed 1/2 tests, 50.00% okay > t/dom_jp_modify.......FAILED test 16 > Failed 1/16 tests, 93.75% okay > t/dom_jp_print........FAILED tests 2-3 > Failed 2/3 tests, 33.33% okay > > > Failed Test Stat Wstat Total Fail Failed List of Failed > t/dom_jp_attr.t 23 6 26.09% 3 9 12 14 19 22 > t/dom_jp_cdata.t 3 1 33.33% 3 > t/dom_jp_minus.t 2 1 50.00% 2 > t/dom_jp_modify.t 16 1 6.25% 16 > t/dom_jp_print.t 3 2 66.67% 2-3 > Failed 5/21 test scripts, 76.19% okay. 11/129 subtests failed, 91.47% okay. > > > make: *** [test_dynamic] Error 255 > /usr/bin/make test -- NOT OK > Running make install > make test had returned bad status, won't install without force > > > Running install for module GD > Running make for L/LD/LDS/GD-2.19.tar.gz > Is already unwrapped into directory /root/.cpan/build/GD-2.19 > Has already been processed within this session > Running make test > > > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > t/GD..........Can't load './blib/arch/auto/GD/GD.so' for module GD: > ./blib/arch/auto/GD/GD.so: undefined symbol: gdImageGifAnimAddPtr at > /usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230. 
> at t/GD.t line 13 > Compilation failed in require at t/GD.t line 13. > BEGIN failed--compilation aborted at t/GD.t line 13. > t/GD..........dubious > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED tests 1-10 > Failed 10/10 tests, 0.00% okay > t/Polyline....Can't load > '/root/.cpan/build/GD-2.19/blib/arch/auto/GD/GD.so' for module GD: > /root/.cpan/build/GD-2.19/blib/arch/auto/GD/GD.so: undefined symbol: > gdImageGifAnimAddPtr at > /usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230. > at /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45 > Compilation failed in require at > /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45. > BEGIN failed--compilation aborted at > /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45. > Compilation failed in require at t/Polyline.t line 10. > BEGIN failed--compilation aborted at t/Polyline.t line 10. > t/Polyline....dubious > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED test 1 > Failed 1/1 tests, 0.00% okay > > > Failed Test Stat Wstat Total Fail Failed List of Failed > -------------------------------------------------------------------------- > -----t/GD.t 255 65280 10 19 190.00% 1-10 > t/Polyline.t 255 65280 1 2 200.00% 1 > Failed 2/2 test scripts, 0.00% okay. 11/11 subtests failed, 0.00% okay. > > > make: *** [test_dynamic] Error 255 > /usr/bin/make test -- NOT OK > Running make install > make test had returned bad status, won't install without force > > > Bundle summary: The following items in bundle Bundle::BioPerl had > installation problems: > XML::DOM GD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From skirov at utk.edu Wed Apr 27 18:56:15 2005 From: skirov at utk.edu (Stefan Kirov) Date: Wed Apr 27 18:49:31 2005 Subject: [Bioperl-l] Having no luck installing bioperl in Fedora Core 3 with Perl 5.8.5 In-Reply-To: <000c01c54b68$62bf3fb0$7007e6a9@salamander> References: <000c01c54b68$62bf3fb0$7007e6a9@salamander> Message-ID: <4270188F.1080100@utk.edu> Oliver, I have seen the same error for XML::DOM and I remember I fixed it by updating another package, but I don't remember which one. Try to install it through the usual .configure/make stuff. You will get a warning telling you which module may need an update. As for GD see this post http://www.issociate.de/board/post/153088/problem_installing_GD-2.19.html. Hope it helps, Stefan Oliver Homann wrote: >Hi, > >We just recently purchased a new workstation, and I've been trying with absolutely no success to install bioperl. I've followed the instructions at bioperl.org, attempting several techniques (installing Bundle::BioPerl or bioperl-1.4.tar.gz via CPAN or installing the latter manually), and continue to encounter a litany of errors. I've pasted a slightly truncated version of the output from my latest attempt to install Bundle::BioPerl via CPAN... I'm hoping someone can offer me some insight into the nature of my problem. > >Thanks! >Oliver > >#####STARTING CPAN GIVES ME THIS ERROR MESSAGE: > > ># perl -MCPAN -e shell >Undefined value assigned to typeglob at (eval 17) line 15, line 11. 
>Warning [/etc/inputrc line 11]: > Invalid variable `mark-symlinked-directories' > > >#####TRYING TO INSTALL THE BUNDLE GIVES ME THIS: > > >cpan> install "Bundle::BioPerl" > > >Running install for module XML::DOM >Running make for T/TJ/TJMATHER/XML-DOM-1.43.tar.gz > Is already unwrapped into directory /root/.cpan/build/XML-DOM-1.43 > Has already been processed within this session >Running make test > > >PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" >"test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > > >t/dom_jp_attr.........FAILED tests 3, 9, 12, 14, 19, 22 > Failed 6/23 tests, 73.91% okay >t/dom_jp_cdata........FAILED test 3 > Failed 1/3 tests, 66.67% okay >t/dom_jp_example......ok >t/dom_jp_minus........FAILED test 2 > Failed 1/2 tests, 50.00% okay >t/dom_jp_modify.......FAILED test 16 > Failed 1/16 tests, 93.75% okay >t/dom_jp_print........FAILED tests 2-3 > Failed 2/3 tests, 33.33% okay > > >Failed Test Stat Wstat Total Fail Failed List of Failed >t/dom_jp_attr.t 23 6 26.09% 3 9 12 14 19 22 >t/dom_jp_cdata.t 3 1 33.33% 3 >t/dom_jp_minus.t 2 1 50.00% 2 >t/dom_jp_modify.t 16 1 6.25% 16 >t/dom_jp_print.t 3 2 66.67% 2-3 >Failed 5/21 test scripts, 76.19% okay. 11/129 subtests failed, 91.47% okay. > > >make: *** [test_dynamic] Error 255 > /usr/bin/make test -- NOT OK >Running make install > make test had returned bad status, won't install without force > > >Running install for module GD >Running make for L/LD/LDS/GD-2.19.tar.gz > Is already unwrapped into directory /root/.cpan/build/GD-2.19 > Has already been processed within this session >Running make test > > >PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" >"test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >t/GD..........Can't load './blib/arch/auto/GD/GD.so' for module GD: >./blib/arch/auto/GD/GD.so: undefined symbol: gdImageGifAnimAddPtr at >/usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230. > at t/GD.t line 13 >Compilation failed in require at t/GD.t line 13. 
>BEGIN failed--compilation aborted at t/GD.t line 13. >t/GD..........dubious > Test returned status 255 (wstat 65280, 0xff00) >DIED. FAILED tests 1-10 > Failed 10/10 tests, 0.00% okay >t/Polyline....Can't load >'/root/.cpan/build/GD-2.19/blib/arch/auto/GD/GD.so' for module GD: >/root/.cpan/build/GD-2.19/blib/arch/auto/GD/GD.so: undefined symbol: >gdImageGifAnimAddPtr at >/usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230. > at /root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45 >Compilation failed in require at >/root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45. >BEGIN failed--compilation aborted at >/root/.cpan/build/GD-2.19/blib/lib/GD/Polyline.pm line 45. >Compilation failed in require at t/Polyline.t line 10. >BEGIN failed--compilation aborted at t/Polyline.t line 10. >t/Polyline....dubious > Test returned status 255 (wstat 65280, 0xff00) >DIED. FAILED test 1 > Failed 1/1 tests, 0.00% okay > > >Failed Test Stat Wstat Total Fail Failed List of Failed >-------------------------------------------------------------------------- >-----t/GD.t 255 65280 10 19 190.00% 1-10 >t/Polyline.t 255 65280 1 2 200.00% 1 >Failed 2/2 test scripts, 0.00% okay. 11/11 subtests failed, 0.00% okay. > > >make: *** [test_dynamic] Error 255 > /usr/bin/make test -- NOT OK >Running make install > make test had returned bad status, won't install without force > > >Bundle summary: The following items in bundle Bundle::BioPerl had >installation problems: > XML::DOM GD >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From brian_osborne at cognia.com  Wed Apr 27 20:54:29 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Apr 27 20:47:26 2005
Subject: [Bioperl-l] retrieving sequences by ID
In-Reply-To: <462784640504270733591dfcfd@mail.gmail.com>
Message-ID:

Sean,

Definitely take a look at indexing; it's described in a few places,
including the Beginner's HOWTO
(http://bioperl.org/HOWTOs/Beginners/indexing.html). It is easy and much
faster than SeqIO as a means of retrieving sequences.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Sean O'Keeffe
Sent: Wednesday, April 27, 2005 10:33 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] retrieving sequences by ID

Hi all,
This is probably an easy problem for ye, but one I'm having difficulty
with nonetheless.
I'm trying to extract from a fasta file (containing ~38,000 sequences)
only those sequences that have a specific ID in the header line, e.g.
return only the sequence for the header containing 'ABCD12346' from:

>ABCD12345|followed by the rest of the description
acgtacgtgttttgggccctttaaa.....
>ABCD12346|description
acgtacgtgttttgggccctttaaa.....
>ABCD12347|description
acgtacgtgttttgggccctttaaa.....
...

The specific IDs are contained in a list (~20,000) which I want to loop through.
This is what I have done so far w/out any luck:

############################

use strict;
use lib "/usr/lib/perl5/site_perl/5.8.1/";
use Bio::SeqIO;

my (@ids)=@_
my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
                              '-file' => "$fastafile");
my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
                               '-file' => "$outfile");
for ($i=0;$i<=scalar(@ids);$i++){
    while ($sequence = $seq_in->next_seq){
        if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
            $seq_out->write_seq($sequence);
        }
    }
}
exit;

#############################

This has so far returned a list of 3 fasta headers, and the program then
finishes without errors. I'd like to know where I'm going wrong and, if
possible, how I could improve on things to reduce memory usage and speed
it up.

Thanks in advance,
Sean O'Keeffe.
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From homann at wi.mit.edu  Wed Apr 27 23:09:53 2005
From: homann at wi.mit.edu (Oliver Homann)
Date: Wed Apr 27 23:03:22 2005
Subject: [Bioperl-l] Trouble with fetching a sequence from indexed fasta file
References: <000c01c54b68$62bf3fb0$7007e6a9@salamander>
	<1114634683.6376.49.camel@localhost.localdomain>
Message-ID: <000a01c54b9f$c225af20$7007e6a9@salamander>

Hello everyone,

First off, thanks to those of you that advised me about bioperl
installation. I was eventually able to install using the Fedora core 2
version at biolinux.com (it seems to work so far).

I'm writing now in the hopes that someone can help resolve a bit of
beginner's confusion that is currently plaguing me. Specifically, I'm
trying to fetch sequences from a large indexed fasta file using the code
posted below. I've left out everything but the relevant section. The
part that has me puzzled is that my string of IDs that I'm using to
fetch the sequences only works when entered manually into the fetch() method.
In other words, the block below doesn't work, nor does the version
without the extra $id = 'orf...' line. These approaches yield the error
message "Can't call method seq on an undefined value". However, it DOES
work if I replace fetch($id) with fetch ('orf19.1168.prot_[A.aaa]').
Could someone please explain to me why this is the case? I suspect I'm
missing something fundamental here...

Many thanks!
Oliver

my $inx = Bio::Index::Fasta->new('/home/oliver/Sequences/proteins.idx');

foreach my $id(@idlist) {

    print "$id \n";

    $id = 'orf19.1168.prot_[A_aaa]';

    my $seq_obj = $inx->fetch($id);

    my $sequence = $seq_obj->seq;

    print "$sequence \n";

}

From walsh at cenix-bioscience.com  Thu Apr 28 01:38:48 2005
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Thu Apr 28 01:32:00 2005
Subject: [Bioperl-l] Trouble with fetching a sequence from indexed fasta file
In-Reply-To: <000a01c54b9f$c225af20$7007e6a9@salamander>
References: <000c01c54b68$62bf3fb0$7007e6a9@salamander>
	<1114634683.6376.49.camel@localhost.localdomain>
	<000a01c54b9f$c225af20$7007e6a9@salamander>
Message-ID: <427076E8.5090402@cenix-bioscience.com>

Hello Oliver,

The error message "Can't call method seq on an undefined value" means
that the return value of $inx->fetch($id) is undefined. You are then
trying to call the method 'seq' on this undefined value, which is a
run-time error.

This likely means that one of the ids that you are trying to find is not
in your proteins.idx file. You could print the accession at the top of
the for loop to find out which one it is. Or you could use this:

my $seq_obj = $inx->fetch($id);

if (! $seq_obj) {
    print "$id was not retrievable\n";
    # put in code to handle this case
}
else {
    # now you can go ahead and print the sequence
    my $sequence = $seq_obj->seq;
    ...
}

HTH,

Andrew

Oliver Homann wrote:
> Hello everyone,
>
> First off, thanks to those of you that advised me about bioperl
> installation.
> I was eventually able to install using the Fedora core 2
> version at biolinux.com (it seems to work so far).
>
> I'm writing now in the hopes that someone can help resolve a bit of
> beginner's confusion that is currently plaguing me. Specifically, I'm
> trying to fetch sequences from a large indexed fasta file using the code
> posted below. I've left out everything but the relevant section. The
> part that has me puzzled is that my string of IDs that I'm using to
> fetch the sequences only works when entered manually into the fetch()
> method. In other words, the block below doesn't work, nor does the
> version without the extra $id = 'orf...' line. These approaches yield
> the error message "Can't call method seq on an undefined value".
> However, it DOES work if I replace fetch($id) with fetch
> ('orf19.1168.prot_[A.aaa]'). Could someone please explain to me why
> this is the case? I suspect I'm missing something fundamental here...
>
> Many thanks!
> Oliver
>
> my $inx = Bio::Index::Fasta->new('/home/oliver/Sequences/proteins.idx');
>
> foreach my $id(@idlist) {
>
>     print "$id \n";
>
>     $id = 'orf19.1168.prot_[A_aaa]';
>
>     my $seq_obj = $inx->fetch($id);
>
>     my $sequence = $seq_obj->seq;
>
>     print "$sequence \n";
>
> }
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel.
+49-351-4173 137
Fax +49-351-4173 109
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From hota.fin at freemail.hu  Thu Apr 28 08:24:44 2005
From: hota.fin at freemail.hu (Horvath Tamas)
Date: Thu Apr 28 07:43:28 2005
Subject: [Bioperl-l] different label colours
Message-ID: <4270D60C.3010308@freemail.hu>

I'm trying to use different label colours within a single track, but
passing a 'sub {}' to the '-fontcolor' option does not work. Is there a
solution? If not yet, where should I look in the code to implement it?

Hota

PS.:

-fontcolor => sub { my $feature = shift;
                    return 'red'    if $feature->primary_tag =~ /mudr/i;
                    return 'blue'   if $feature->primary_tag =~ /zn_finger/i;
                    return 'orange' if $feature->primary_tag =~ /repeat/i;
                    return 'green'  if $feature->primary_tag eq 'exon';
                  },

This is what it looks like, but the label color is consistently black
(though if I explicitly use -fontcolor => 'green', the label is indeed
green).

From crabtree at tigr.org  Thu Apr 28 09:24:12 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Thu Apr 28 09:17:32 2005
Subject: [Bioperl-l] Trouble with fetching a sequence from indexed fasta file
Message-ID:

Hi Oliver-

> fetch($id) with fetch ('orf19.1168.prot_[A.aaa]').  Could

Your Perl code (see below) says "A_aaa", not "A.aaa", which would
explain the discrepancy (unless the "A.aaa" was a typo in your e-mail):

>     $id = 'orf19.1168.prot_[A_aaa]';

If this is the case then perhaps there was a similar typo/error in your
original array of ids, @idlist?
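
One way to check for this, as a minimal sketch only: compare each entry
of @idlist against the set of IDs you know are in the index, and print
the mismatches between quotes so that stray characters or whitespace
become visible. The helper name find_unknown_ids, the %known hash and
the sample IDs are made up for illustration and are not part of BioPerl;
how the known IDs are collected (for example from the FASTA header
lines) is left open.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return every ID in @$ids
# that is not a key of %$known.
sub find_unknown_ids {
    my ($ids, $known) = @_;
    return grep { !exists $known->{$_} } @$ids;
}

# Invented example data: one ID with '.' instead of '_', one with a
# trailing space.
my %known  = map { $_ => 1 } ('seq_1[A_b]', 'seq_2[A_b]');
my @idlist = ('seq_1[A_b]', 'seq_1[A.b]', 'seq_2[A_b] ');

# Quote each suspect ID so that trailing whitespace is easy to spot.
for my $id (find_unknown_ids(\@idlist, \%known)) {
    print "not in index: '$id'\n";
}
```

Any ID reported here is one for which a lookup by that exact string
would be expected to fail.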
Jonathan

From scannedr at tcd.ie  Thu Apr 28 05:53:22 2005
From: scannedr at tcd.ie (Devin Scannell)
Date: Thu Apr 28 10:17:15 2005
Subject: [Bioperl-l] retrieving sequences by ID
In-Reply-To: <1114537804.426e7f4c8ce91@mymail.tcd.ie>
References: <1114537804.426e7f4c8ce91@mymail.tcd.ie>
Message-ID: <60b53fefd492a8a420ded3774b1cbee5@tcd.ie>

Hi Sean,

an easy way to do this is to (literally) use Bio::DB::Fasta

best,
Devin

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::Fasta;
use Bio::SeqIO;

# Bio::DB::Fasta indexes the fasta file so sequences can be looked up by ID
my $db      = Bio::DB::Fasta->new('your_fasta_file');
# the leading '>' opens the output file for writing
my $seq_out = Bio::SeqIO->new(-format => 'fasta', -file => '>new_file');
my @names   = @_;    # the list of IDs to extract

foreach my $name (@names) {
    my $seq_obj = $db->get_Seq_by_id($name);
    # skip IDs that are not present in the fasta file
    $seq_out->write_seq($seq_obj) if defined $seq_obj;
}

On 26 Apr 2005, at 18:50, sokeeff@tcd.ie wrote:

> Hi all,
> Probably an easy problem for ye, but one I'm having difficulty with
> none the less.
> I'm trying to extract only sequences from a fasta file(~38,000
> sequences) containing a specific ID in the header files e.g.
> return only the sequence for header containing 'ABCD12346' from:
> >ABCD12345|description
> acgtacgtgttttgggccctttaaa.....
> >ABCD12346|description
> acgtacgtgttttgggccctttaaa.....
> >ABCD12347|description
> acgtacgtgttttgggccctttaaa.....
> ...
>
> The ID's are contained in a list(~20,000) which I want to loop through.
> This is what I have done so far w/out any luck:
>
> ############################
>
> use strict;
> use lib "/usr/lib/perl5/site_perl/5.8.1/";
> use Bio::SeqIO;
>
> my (@ids)=@_
> my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
>                               '-file' => "$fastafile");
> my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
>                                '-file' => "$outfile");
> for ($i=0;$i<=scalar(@ids);$i++){
>     while ($sequence = $seq_in->next_seq){
>         if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
>             $seq_out->write_seq($sequence);
>         }
>     }
> }
> exit;
>
> #############################
>
> This returns a small list of fasta headers.
> I'd like to know if it's the right way to go about the task, where
> I'm going wrong and if possible, how I could improve on things to
> prevent memory usage/speed it up.
>
> Thanks in advance,
> Sean O'Keeffe.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From crabtree at tigr.org  Thu Apr 28 11:22:52 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Thu Apr 28 11:16:07 2005
Subject: [Bioperl-l] different label colours
Message-ID:

Hi Hota-

This should work. Why don't you try inserting the following line in
your anonymous sub (after "my $feature = shift;") and then tell us what
(if anything) shows up on STDERR when you run your script:

print STDERR "tag='", $feature->primary_tag, "'\n";

Jonathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
> Horvath Tamas
> Sent: Thursday, April 28, 2005 8:25 AM
> To: Bioperl
> Subject: [Bioperl-l] different label colours
>
>
> I'm trying to use different label colours in one single track, but the
> 'sub {}' does not work for the '-fontcolor' option. Is there a solution?
> If not yet, where should I look over the code, to implement it?
>
> Hota
>
> PS.:
>
> -fontcolor => sub { my $feature = shift;
>                     return 'red' if $feature->primary_tag =~ /mudr/i;
>                     return 'blue' if $feature->primary_tag =~ /zn_finger/i;
>                     return 'orange' if $feature->primary_tag =~ /repeat/i;
>                     return 'green' if $feature->primary_tag eq 'exon';
>                   },
> this is how it looks like, but the label color is consistently black
> (though if I explicitly use -fontcolor => 'green' then the label is
> green indeed)
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From hlapp at gmx.net  Fri Apr 29 01:20:07 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Apr 29 01:13:20 2005
Subject: [Bioperl-l] Bio::Ontology::Term references quetsion
In-Reply-To: <426FE0FE.3060804@utk.edu>
Message-ID: <59B1A3F6-B86E-11D9-B77B-000A959EB4C4@gmx.net>

What makes you think it does? I can't see anything in the code that
would enforce this. Do you receive an error if you add a reference
without a title? If so, what is the stack trace?

	-hilmar

On Wednesday, April 27, 2005, at 11:59 AM, Stefan Kirov wrote:

> Is it true that Bio::Ontology::Term add_references method checks for
> title? If so there is a discrepancy with Bio::Annotation::Reference,
> as this object does not require a title upon creation. I don't think
> this should be such requirement in Bio::Ontology::Term unless I am
> missing something. Also the documentation does not say there is such
> requirement...
> Stefan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From hota.fin at freemail.hu  Fri Apr 29 10:33:17 2005
From: hota.fin at freemail.hu (Horvath Tamas)
Date: Fri Apr 29 09:25:01 2005
Subject: [Bioperl-l] different label colours
In-Reply-To:
References:
Message-ID: <427245AD.8070203@freemail.hu>

Crabtree, Jonathan wrote:

>Hi Hota-
>
>This should work. Why don't you try inserting the following line in
>your anonymous sub (after "my $feature = shift;") and then tell us what
>(if anything) shows up on STDERR when you run your script:
>
>print STDERR "tag='", $feature->primary_tag, "'\n";
>
>Jonathan
>
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces@portal.open-bio.org
>>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
>>Horvath Tamas
>>Sent: Thursday, April 28, 2005 8:25 AM
>>To: Bioperl
>>Subject: [Bioperl-l] different label colours
>>
>>
>>I'm trying to use different label colours in one single track, but the
>>'sub {}' does not work for the '-fontcolor' option. Is there a solution?
>>If not yet, where should I look over the code, to implement it?
>> >>Hota >> >>PS.: >> >>-fontcolor => sub { my $feature = shift; >> return 'red' if >>$feature->primary_tag =~ /mudr/i; >> return 'blue' if >>$feature->primary_tag =~ /zn_finger/i; >> return 'orange' if >>$feature->primary_tag =~ /repeat/i; >> return 'green' if >>$feature->primary_tag eq 'exon'; >> }, >>this is how it looks like, but the label color is consistently black >>(though if I explicitly use -fontcolor => 'green' then the label is >>green indeed) >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-> bio.org/mailman/listinfo/bioperl-l >> >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > Sorry, it's pretty messed up, but anyway, it looks like: (at font color, it is always '') tag_at_glyph='mudr_exon' tag_at_glyph='mudr_exon' tag_at_glyph='' tag_at_glyph='zn_finger_exon' tag_at_glyph='zn_finger_exon'tag_at_connector=' ' tag_at_connector='' tag_at_connector='repeat_L' tag_at_strand_arrow='repeat_L' tag_at_bgcolor='repeat_L' tag_at_connector='repeat_R' tag_at_strand_arrow='repeat_R' tag_at_bgcolor='repeat_R' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='last_exon' tag_at_strand_arrow='last_exon' tag_at_bgcolor='last_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' 
tag_at_bgcolor='zn_finger_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='repeat_L' tag_at_strand_arrow='repeat_L' tag_at_bgcolor='repeat_L' tag_at_connector='repeat_R' tag_at_strand_arrow='repeat_R' tag_at_bgcolor='repeat_R' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='last_exon' tag_at_strand_arrow='last_exon' tag_at_bgcolor='last_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='repeat_L' tag_at_strand_arrow='repeat_L' tag_at_bgcolor='repeat_L' tag_at_connector='repeat_R' tag_at_strand_arrow='repeat_R' tag_at_bgcolor='repeat_R' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='last_exon' tag_at_strand_arrow='last_exon' tag_at_bgcolor='last_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='zn_finger_exon' 
tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='repeat_L' tag_at_strand_arrow='repeat_L' tag_at_bgcolor='repeat_L' tag_at_connector='repeat_R' tag_at_strand_arrow='repeat_R' tag_at_bgcolor='repeat_R' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='last_exon' tag_at_strand_arrow='last_exon' tag_at_bgcolor='last_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='repeat_L' tag_at_strand_arrow='repeat_L' tag_at_bgcolor='repeat_L' tag_at_connector='repeat_R' tag_at_strand_arrow='repeat_R' tag_at_bgcolor='repeat_R' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='last_exon' tag_at_strand_arrow='last_exon' tag_at_bgcolor='last_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='mudr_exon' 
tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='repeat_L' tag_at_strand_arrow='repeat_L' tag_at_bgcolor='repeat_L' tag_at_connector='repeat_R' tag_at_strand_arrow='repeat_R' tag_at_bgcolor='repeat_R' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='exon' tag_at_strand_arrow='exon' tag_at_bgcolor='exon' tag_at_connector='last_exon' tag_at_strand_arrow='last_exon' tag_at_bgcolor='last_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_connector='mudr_exon' tag_at_strand_arrow='mudr_exon' tag_at_bgcolor='mudr_exon' tag_at_fontcolor='' tag_at_connector='' tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' tag_at_bgcolor='zn_finger_exon' tag_at_connector='zn_finger_exon' 
tag_at_strand_arrow='zn_finger_exon'
tag_at_bgcolor='zn_finger_exon'
tag_at_fontcolor=''

From crabtree at tigr.org  Fri Apr 29 09:50:22 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Fri Apr 29 09:43:31 2005
Subject: [Bioperl-l] different label colours
Message-ID:

Hota-

That's interesting. I suspect that the problem is actually not in your
-fontcolor subroutine, but somewhere else in your script. Can you show
us the rest of the code? Either your labeled features aren't getting
assigned a primary_tag correctly, or perhaps the primary_tag value is
being erased somehow. For example, maybe one of your other subroutines
is accidentally invoking primary_tag as a setter, not a getter, as in

$feature->primary_tag('')

or

$feature->primary_tag(undef)

Jonathan

> -----Original Message-----
> From: Horvath Tamas [mailto:hota.fin@freemail.hu]
> Sent: Friday, April 29, 2005 10:33 AM
> To: Crabtree, Jonathan
> Cc: Bioperl
> Subject: Re: [Bioperl-l] different label colours
>
>
> Crabtree, Jonathan wrote:
>
> >Hi Hota-
> >
> >This should work. Why don't you try inserting the following line in
> >your anonymous sub (after "my $feature = shift;") and then tell us what
> >(if anything) shows up on STDERR when you run your script:
> >
> >print STDERR "tag='", $feature->primary_tag, "'\n";
> >
> >Jonathan
> >
> >
> >
> >>-----Original Message-----
> >>From: bioperl-l-bounces@portal.open-bio.org
> >>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
> >>Horvath Tamas
> >>Sent: Thursday, April 28, 2005 8:25 AM
> >>To: Bioperl
> >>Subject: [Bioperl-l] different label colours
> >>
> >>
> >>I'm trying to use different label colours in one single track, but the
> >>'sub {}' does not work for the '-fontcolor' option. Is there a solution?
> >>If not yet, where should I look over the code, to implement it?
> >> > >>Hota > >> > >>PS.: > >> > >>-fontcolor => sub { my $feature = shift; > >> return 'red' if > >>$feature->primary_tag =~ /mudr/i; > >> return 'blue' if > >>$feature->primary_tag =~ /zn_finger/i; > >> return 'orange' if > >>$feature->primary_tag =~ /repeat/i; > >> return 'green' if > >>$feature->primary_tag eq 'exon'; > >> }, > >>this is how it looks like, but the label color is > consistently black > >>(though if I explicitly use -fontcolor => 'green' then the label is > >>green indeed) > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l@portal.open-bio.org > >>http://portal.open-> bio.org/mailman/listinfo/bioperl-l > >> > >> > >> > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > Sorry, it's pretty messed up, but anyway, it looks like: (at > font color, > it is always '') > > tag_at_glyph='mudr_exon' > tag_at_glyph='mudr_exon' > tag_at_glyph='' > tag_at_glyph='zn_finger_exon' > tag_at_glyph='zn_finger_exon'tag_at_connector=' > ' > tag_at_connector='' > tag_at_connector='repeat_L' > tag_at_strand_arrow='repeat_L' > tag_at_bgcolor='repeat_L' > tag_at_connector='repeat_R' > tag_at_strand_arrow='repeat_R' > tag_at_bgcolor='repeat_R' > tag_at_fontcolor='' > tag_at_connector='' > tag_at_connector='' > tag_at_connector='' > tag_at_connector='exon' > tag_at_strand_arrow='exon' > tag_at_bgcolor='exon' > tag_at_connector='exon' > tag_at_strand_arrow='exon' > tag_at_bgcolor='exon' > tag_at_connector='last_exon' > tag_at_strand_arrow='last_exon' > tag_at_bgcolor='last_exon' > tag_at_fontcolor='' > tag_at_connector='' > tag_at_connector='mudr_exon' > tag_at_strand_arrow='mudr_exon' > tag_at_bgcolor='mudr_exon' > tag_at_connector='mudr_exon' > tag_at_strand_arrow='mudr_exon' > tag_at_bgcolor='mudr_exon' > tag_at_fontcolor='' > tag_at_connector='' > tag_at_connector='zn_finger_exon' 
tag_at_bgcolor='exon' > tag_at_connector='last_exon' > tag_at_strand_arrow='last_exon' > tag_at_bgcolor='last_exon' > tag_at_fontcolor='' > tag_at_connector='' > tag_at_connector='mudr_exon' > tag_at_strand_arrow='mudr_exon' > tag_at_bgcolor='mudr_exon' > tag_at_connector='mudr_exon' > tag_at_strand_arrow='mudr_exon' > tag_at_bgcolor='mudr_exon' > tag_at_fontcolor='' > tag_at_connector='' > tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' > tag_at_bgcolor='zn_finger_exon' > tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' > tag_at_bgcolor='zn_finger_exon' > tag_at_fontcolor='' > >

From n.haigh at sheffield.ac.uk Fri Apr 29 04:29:02 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Apr 29 09:48:33 2005
Subject: [Bioperl-l] BLAST results format
Message-ID: <001401c54c95$800d23b0$81922d50@bmbpc196>

Is it possible to use Bio::Tools::Run::RemoteBlast to simply run the BLAST and retrieve the results in the text/html format as output from the NCBI servers? I'd like to present the users of my program with the same format that they would see when doing the search themselves from a web browser.

Also, does anyone know of any Perl modules that are able to parse and display html code - what I mean is something like a lightweight web browser? Or something that can interpret the html code to format the text, rather than the ability to actually browse the web and fetch new pages using hyperlinks.

Cheers
Nathan

From lupey+ at pitt.edu Fri Apr 29 09:25:00 2005
From: lupey+ at pitt.edu (Paul Cantalupo)
Date: Fri Apr 29 09:48:35 2005
Subject: [Bioperl-l] Windows BLAST problems under Cygwin
Message-ID: <427235AC.4050704@pitt.edu>

Hello,

I am running BioPerl 1.4 on Windows 2000 under Cygwin (therefore, I use Perl that comes with Cygwin; not Windows Perl). I am trying to run a standalone blast. I installed the Windows version of BLAST as recommended by the BioPerl installation instructions.
My script (see localblast.pl below) takes an input sequence file (see test.fa below) and performs a blastp. Running the script with the following command line gives this error:

$ localblast.pl test.fa
[NULL_Caption] FATAL ERROR: blast: Unable to open input file /tmp/4lkjmTjRio

------------- EXCEPTION -------------
MSG: blastall call crashed: 256 /usr/local/blast/blastall -p blastp -d "/ecoli.nt" -i /tmp/4lkjmTjRio -o /tmp/llctIvZlC6

STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run/StandAloneBlast.pm:732
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run/StandAloneBlast.pm:680
STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run/StandAloneBlast.pm:536
STACK toplevel ./localblast.pl:17

--------------------------------------

Notice that blast is unable to open the input file /tmp/4lkjmTjRio (which the StandAloneBlast library created). Next, I tried to run blastall directly from the command line with a file in the /tmp directory, but it gave me the same error: 'unable to open input file'. But blastall does execute properly if I use an input file that is in the current directory (using a relative path name in the -i option). If I set the -i option to any absolute reference for a file, like /home/lupey/fasta.fa, it fails and the error is the same: 'Unable to open input file'.

So, why does BioPerl suggest using the Windows version of Blast if it can't open files using absolute references to files, especially when the StandAloneBlast library places the input file in the /tmp directory? What solution can I employ to fix this?
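
The failure with absolute paths can be illustrated with a small sketch. A native Windows blastall knows nothing about Cygwin's mount table, so a POSIX-style path such as /tmp/4lkjmTjRio does not name any file it can see. The helper below, cygwin_to_win, is hypothetical (it is not part of BioPerl), and the Cygwin root c:/cygwin is an assumption that will differ between installations. On a real Cygwin system the cygpath utility (for example, cygpath -m) performs this translation from the actual mount table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: map a Cygwin POSIX path to
# the drive-letter form that a native Windows program can open.
# $root is the ASSUMED Cygwin install directory (default 'c:/cygwin').
sub cygwin_to_win {
    my ($path, $root) = @_;
    $root = 'c:/cygwin' unless defined $root;

    # /cygdrive/e/data/x.fa -> e:/data/x.fa
    if ($path =~ m{^/cygdrive/([A-Za-z])(/.*)?$}) {
        my $rest = defined $2 ? $2 : '/';
        return lc($1) . ':' . $rest;
    }

    # Other absolute POSIX paths live under the Cygwin root:
    # /tmp/abc -> c:/cygwin/tmp/abc
    return $root . $path if $path =~ m{^/};

    # Relative paths are left alone; they already work for blastall.
    return $path;
}

print cygwin_to_win('/tmp/4lkjmTjRio'), "\n";
print cygwin_to_win('/cygdrive/e/cygwin/tmp'), "\n";
print cygwin_to_win('test.fa'), "\n";
```

Under these assumptions the temp file would have to be handed to blastall as c:/cygwin/tmp/4lkjmTjRio rather than /tmp/4lkjmTjRio.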

Thank you,

Paul

#localblast.pl

#!/usr/bin/perl

use strict;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], -format => 'fasta');
my $query = $Seq_in->next_seq();

my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                    'database' => 'ecoli.nt',
                                                    _READMETHOD => "Blast"
                                                   );
my $blast_report = $factory->blastall($query);
my $result = $blast_report->next_result;

while( my $hit = $result->next_hit()) {
    print "\thit name: ", $hit->name(), " significance: ",
          $hit->significance(), "\n";}

test.fa:
>Test
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT

From jason.stajich at duke.edu Fri Apr 29 10:04:53 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Apr 29 09:58:05 2005
Subject: [Bioperl-l] Windows BLAST problems under Cygwin
In-Reply-To: <427235AC.4050704@pitt.edu>
References: <427235AC.4050704@pitt.edu>
Message-ID:

blast may have crashed because you didn't set the BLASTDIR - notice your path to the ecoli db is /ecoli.nt which is probably incorrect.

You can vary which dir it uses for tempfiles by setting TEMPDIR environment variable.

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 29, 2005, at 9:25 AM, Paul Cantalupo wrote:

> Hello,
>
> I am running BioPerl 1.4 on Windows 2000 under Cygwin (therefore, I
> use Perl that comes with Cygwin; not Windows Perl). I am trying to run
> a standalone blast.
I installed the Windows version of BLAST as > recommended by the BioPerl installation instructions. My script (see > localblast.pl below) takes an input sequence file (see test.fa below) > and performs a blastp. By running the script with the following > command line, I get the this error: > > $ localblast.pl test.fa > [NULL_Caption] FATAL ERROR: blast: Unable to open input file > /tmp/4lkjmTjRio > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 /usr/local/blast/blastall -p blastp > -d "/ecoli.nt" -i > /tmp/4lkjmTjRio -o /tmp/llctIvZlC6 > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > /usr/lib/perl5/site_perl/5.8/Bio/Tools/Ru > n/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > /usr/lib/perl5/site_perl/5.8/B > io/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run > /StandAloneBlast.pm:536 > STACK toplevel ./localblast.pl:17 > > -------------------------------------- > > > Notice that blast is unable to open the input file /tmp/4lkjmTjRio > (which the library StandAloneBlast created). Next, I tried to run > blastall directly from the commandline, with a file in the /tmp > directory but it gave me the same error: 'unable to open input file'. > But blastall does execute properly I use an input file that is in the > current directory (using a relative path name in the -i option). But > if I set the -i option to any absolute reference for a file like > /home/lupey/fasta.fa, it fails and the error is the same: 'Unable to > open input file'. > > So, why does BioPerl suggest using the Windows version of Blast if it > can't open files using absolute references to files especially when > the StandAloneBlast library places the inputfile in the /tmp > directory? What solution can I employ to fix this? 
> > Thank you, > > Paul > > > > #localblast.pl > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > 'database' => > 'ecoli.nt', > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), " significance: ", > $hit->significance(), "\n";} > > > > test.fa: > >Test > AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC > TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA > TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC > ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG > CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA > GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC > AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG > AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From hota.fin at freemail.hu Fri Apr 29 11:06:52 2005 From: hota.fin at freemail.hu (Horvath Tamas) Date: Fri Apr 29 09:58:31 2005 Subject: [Bioperl-l] different label colours In-Reply-To: References: Message-ID: <42724D8C.6030406@freemail.hu> Crabtree, Jonathan wrote: >Hota- > >That's interesting. I suspect that the problem is actually not in your >-fontcolor subroutine, but somewhere else in your script. Can you show >us the rest of the code? Either your labeled features aren't getting >assigned a primary_tag correctly, or perhaps the primary_tag value is >being erased somehow. 
For example, maybe one of your other subroutines >is accidentally invoking primary_tag as a setter, not a getter, as in >$feature->primary_tag('') or $feature->primary_tag(undef) > >Jonathan > > > >>-----Original Message----- >>From: Horvath Tamas [mailto:hota.fin@freemail.hu] >>Sent: Friday, April 29, 2005 10:33 AM >>To: Crabtree, Jonathan >>Cc: Bioperl >>Subject: Re: [Bioperl-l] different label colours >> >> >>Crabtree, Jonathan wrote: >> >> >> >>>Hi Hota- >>> >>>This should work. Why don't you try inserting the following line in >>>your anonymous sub (after "my $feature = shift;") and then >>> >>> >>tell us what >> >> >>>(if anything) shows up on STDERR when you run your script: >>> >>>print STDERR "tag='", $feature->primary_tag, "'\n"; >>> >>>Jonathan >>> >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces@portal.open-bio.org >>>>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of >>>>Horvath Tamas >>>>Sent: Thursday, April 28, 2005 8:25 AM >>>>To: Bioperl >>>>Subject: [Bioperl-l] different label colours >>>> >>>> >>>>I'm trying to use different label colours in one single >>>>track, but the >>>>'sub {}' does not work for the '-fontcolor' option. Is there >>>>a solution? >>>>If not yet, where should I look over the code, to implement it? 
>>>> >>>>Hota >>>> >>>>PS.: >>>> >>>>-fontcolor => sub { my $feature = shift; >>>> return 'red' if >>>>$feature->primary_tag =~ /mudr/i; >>>> return 'blue' if >>>>$feature->primary_tag =~ /zn_finger/i; >>>> return 'orange' if >>>>$feature->primary_tag =~ /repeat/i; >>>> return 'green' if >>>>$feature->primary_tag eq 'exon'; >>>> }, >>>>this is how it looks like, but the label color is >>>> >>>> >>consistently black >> >> >>>>(though if I explicitly use -fontcolor => 'green' then the label is >>>>green indeed) >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-> bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>> >>> >>Sorry, it's pretty messed up, but anyway, it looks like: (at >>font color, >>it is always '') >> >>tag_at_glyph='mudr_exon' >>tag_at_glyph='mudr_exon' >>tag_at_glyph='' >>tag_at_glyph='zn_finger_exon' >>tag_at_glyph='zn_finger_exon'tag_at_connector=' >>' >>tag_at_connector='' >>tag_at_connector='repeat_L' >>tag_at_strand_arrow='repeat_L' >>tag_at_bgcolor='repeat_L' >>tag_at_connector='repeat_R' >>tag_at_strand_arrow='repeat_R' >>tag_at_bgcolor='repeat_R' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='last_exon' >>tag_at_strand_arrow='last_exon' >>tag_at_bgcolor='last_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_fontcolor='' >>tag_at_connector='' 
>>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='repeat_L' >>tag_at_strand_arrow='repeat_L' >>tag_at_bgcolor='repeat_L' >>tag_at_connector='repeat_R' >>tag_at_strand_arrow='repeat_R' >>tag_at_bgcolor='repeat_R' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='last_exon' >>tag_at_strand_arrow='last_exon' >>tag_at_bgcolor='last_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='repeat_L' >>tag_at_strand_arrow='repeat_L' >>tag_at_bgcolor='repeat_L' >>tag_at_connector='repeat_R' >>tag_at_strand_arrow='repeat_R' >>tag_at_bgcolor='repeat_R' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='last_exon' >>tag_at_strand_arrow='last_exon' >>tag_at_bgcolor='last_exon' 
>>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='repeat_L' >>tag_at_strand_arrow='repeat_L' >>tag_at_bgcolor='repeat_L' >>tag_at_connector='repeat_R' >>tag_at_strand_arrow='repeat_R' >>tag_at_bgcolor='repeat_R' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='last_exon' >>tag_at_strand_arrow='last_exon' >>tag_at_bgcolor='last_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='repeat_L' >>tag_at_strand_arrow='repeat_L' >>tag_at_bgcolor='repeat_L' >>tag_at_connector='repeat_R' >>tag_at_strand_arrow='repeat_R' >>tag_at_bgcolor='repeat_R' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' 
>>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='last_exon' >>tag_at_strand_arrow='last_exon' >>tag_at_bgcolor='last_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='repeat_L' >>tag_at_strand_arrow='repeat_L' >>tag_at_bgcolor='repeat_L' >>tag_at_connector='repeat_R' >>tag_at_strand_arrow='repeat_R' >>tag_at_bgcolor='repeat_R' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' 
>>tag_at_connector='exon' >>tag_at_strand_arrow='exon' >>tag_at_bgcolor='exon' >>tag_at_connector='last_exon' >>tag_at_strand_arrow='last_exon' >>tag_at_bgcolor='last_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_connector='mudr_exon' >>tag_at_strand_arrow='mudr_exon' >>tag_at_bgcolor='mudr_exon' >>tag_at_fontcolor='' >>tag_at_connector='' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_connector='zn_finger_exon' tag_at_strand_arrow='zn_finger_exon' >>tag_at_bgcolor='zn_finger_exon' >>tag_at_fontcolor='' >> >> >> >> > > > >

Here's the cycle that you may need (the code is not that clean, but... ):

foreach my $record (@$pretty) {
    my $features;
    next unless $record->{R_TIR_START}; #this is only true if the record is valid
    my $track = $panel->add_track(
        -glyph => sub {
            my $feature = shift;
            print STDERR "tag_at_glyph='", $feature->primary_tag, "'\n";
            if ($feature->primary_tag =~ /mudr/i || $feature->primary_tag =~ /zn_finger/i) { return 'generic'}
            else { return 'segments';}
        },
        -bgcolor => sub {
            my $feature = shift;
            print STDERR "tag_at_bgcolor='", $feature->primary_tag, "'\n";
            if ($feature->primary_tag =~ /exon/) {
                if ($feature->primary_tag =~ /mudr/) {return 'red';}
                elsif ($feature->primary_tag =~ /zn_finger/i) {return 'blue';}
                else {return 'green';};
            }
            else {return 'orange';}
        },
        -fgcolor => 'black',
        -connector => sub {
            my $feature = shift;
            print STDERR "tag_at_connector='", $feature->primary_tag, "'\n";
            $feature->primary_tag =~ /exon/ ? return 'hat' : return 'dashed';
        },
        -height => 15,
        -bump => 0,
        -label => 1,
        -orient => sub {
            my $feature = shift;
            print STDERR "tag_at_orient='", $feature->primary_tag, "'\n";
            $feature->primary_tag eq 'repeat_L' ? 'E' : 'W';
        },
        -fontcolor => sub {
            my $feature = shift;
            print STDERR "tag_at_fontcolor='", $feature->primary_tag, "'\n";
            return 'red' if $feature->primary_tag =~ /mudr/i;
            return 'blue' if $feature->primary_tag =~ /zn_finger/i;
            return 'orange' if $feature->primary_tag =~ /repeat/i;
            return 'green' if $feature->primary_tag eq 'exon';
        },
        -font2color => 'green',
        -point => 0,
        -strand_arrow => sub {
            my $feature = shift;
            print STDERR "tag_at_strand_arrow='", $feature->primary_tag, "'\n";
            if ($feature->primary_tag eq 'last_exon' or $feature->primary_tag =~ /repeat/i) {return 1;}
            else {return 0};
        },
        -description => sub {
            my $feature = shift;
            return unless $feature->has_tag('description');
            my ($description) = $feature->each_tag_value('description');
            return $description;
        }
    );
    print '.';

    $features = new Bio::SeqFeature::Generic (-display_name => ' ');
    $subfeature = new Bio::SeqFeature::Generic(-start => $record->{L_TIR_START},
                                               -end => $record->{L_TIR_END},
                                               -primary => 'repeat_L',
                                               -source => 'internal',
                                               -strand => 1);
    $features->add_sub_SeqFeature( $subfeature , 'EXPAND');
    $subfeature = new Bio::SeqFeature::Generic(-start => $record->{R_TIR_START},
                                               -end => $record->{R_TIR_END},
                                               -primary => 'repeat_R',
                                               -source => 'internal',
                                               -strand => -1,);
    $features->add_sub_SeqFeature( $subfeature , 'EXPAND');
    $track->add_feature($features);
    undef $features;

    my $description = $record->{SEQ_ID};
    my @starts = ();
    my @startx = ();
    my $lastend = 1;
    my $s = $record->{L_TIR_START};
    my $e = $record->{R_TIR_END};
    my $l = $record->{L_TIR_END} - $record->{L_TIR_START};
    my $ps = ${$record->{EXON_LIST}->[0]->{START}};
    my $pe = ${$record->{EXON_LIST}->[$#{$record->{EXON_LIST}}]->{START}};
    my $sc = $record->{SCORE};
    $description .= ", GW score: $sc, sequence $s - $e, TIR app.: $l, prot.: $ps - $pe ";
    $features = new Bio::SeqFeature::Generic (-display_name => ' ',
                                              -tag => { description => $description } );
    my @exonlist = @{$record->{EXON_LIST}};
    my $last_exon = pop @{$record->{EXON_LIST}};
    my @prot = ();
    my $pps = 0;
    my $ppe = 0;
    my $xs = 1;
    my $xe = 1;
    foreach $exon (@{$record->{EXON_LIST}}) {
        my $start = ${$exon->{START}};
        push @startx , $start;
        $start -= $lastend;
        push @starts , $start;
        $lastend = ${$exon->{END}};
        $pps = ${$exon->{START}};
        $ppe = ${$exon->{END}};
        $xs = $xe;
        $xe = $xs + int( ($ppe - $pps)/3);
        push(@prot , "$xs - $xe");
        $subfeature = new Bio::SeqFeature::Generic (-start => ${$exon->{START}},
                                                    -end => ${$exon->{END}},
                                                    -primary => 'exon',
                                                    -source => 'internal',
                                                    -strand => 1, );
        $features->add_sub_SeqFeature($subfeature,'EXPAND');
        my $s = ${$exon->{START}};my $e = ${$exon->{END}};print"$s - $e..";
    }
    $subfeature = new Bio::SeqFeature::Generic (-start => ${$last_exon->{START}},
                                                -end => ${$last_exon->{END}},
                                                -primary => 'last_exon',
                                                -source => 'internal',
                                                -strand => 1, );
    $pps = ${$last_exon->{START}};
    $ppe = ${$last_exon->{END}};
    $xs = $xe;
    $xe = $xs + int( ($ppe - $pps)/3);
    push(@prot , "$xs - $xe");
    my $protstat = join ( ".." , @prot);
    print "\n$protstat\n";
    $features->add_sub_SeqFeature($subfeature,'EXPAND');
    print "\n";
    $track->add_feature($features);
    undef $features;

    my $ms = $record->{DOMAINS}->{MUDR}->{START};
    my $me = $record->{DOMAINS}->{MUDR}->{END};
    print "!$ms !$me\n";
    my @mudr_exons = @{&calc_domain_exons($ms,$me,\@exonlist)};
    print 1;
    my $label = "MuDR:$ms - $me";
    $features = new Bio::SeqFeature::Generic (-display_name => $label);
    foreach $exon (@mudr_exons) {
        $subfeature = new Bio::SeqFeature::Generic (-start => $exon->{START},
                                                    -end => $exon->{END},
                                                    -primary => 'mudr_exon',
                                                    -source => 'internal',
                                                    -strand => 1, );
        $features->add_sub_SeqFeature($subfeature,'EXPAND');
        my $s = $exon->{START};my $e = $exon->{END};print"M$s - $e..";
    }
    $features->add_sub_SeqFeature($subfeature,'EXPAND');
    print "\n";
    $track->add_feature($features);
    undef $features;

    $ms = 0;
    $ms = 0;
    $ms = $record->{DOMAINS}->{Zn_finger}->{START};
    $me = $record->{DOMAINS}->{Zn_finger}->{END};
    print "!$ms !$me\n";
    @mudr_exons = @{&calc_domain_exons($ms,$me,\@exonlist)};
    print 1;
    $label = "Zn:$ms - $me";
    $features = new Bio::SeqFeature::Generic (-display_name => $label);
    foreach $exon (@mudr_exons) {
        $subfeature = new Bio::SeqFeature::Generic (-start => $exon->{START},
                                                    -end => $exon->{END},
                                                    -primary => 'zn_finger_exon',
                                                    -source => 'internal',
                                                    -strand => 1, );
        $features->add_sub_SeqFeature($subfeature,'EXPAND');
        my $s = $exon->{START};my $e = $exon->{END};print"Z$s - $e..";
    }
    $features->add_sub_SeqFeature($subfeature,'EXPAND');
    print "\n";
    $track->add_feature($features) if $ms;
    undef $features;
}

From brian_osborne at cognia.com Fri Apr 29 10:25:28 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Apr 29 10:22:57 2005
Subject: [Bioperl-l] Windows BLAST problems under Cygwin
In-Reply-To:
Message-ID:

Paul,

This is from the INSTALL.WIN file:

Directory for temporary files
=============================

Set the environmental variable TMPDIR, programs like BLAST and clustalw need a place to create temporary files. E.g.:

setenv TMPDIR e:/cygwin/tmp # csh, tcsh
export TMPDIR=e:/cygwin/tmp # sh, bash

Note that this is not the syntax that Cygwin understands, which would be something like "/cygdrive/e/cygwin/tmp", but this is the syntax that a Perl module expects on Windows. If this variable is not set correctly you'll see errors like this when you run Bio::Tools::Run::StandAloneBlast:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
STACK: Error::throw

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich
Sent: Friday, April 29, 2005 10:05 AM
To: Paul Cantalupo
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Windows BLAST problems under Cygwin

blast may have crashed because you didn't set the BLASTDIR - notice your path to the ecoli db is /ecoli.nt which is probably incorrect.

You can vary which dir it uses for tempfiles by setting TEMPDIR environment variable.
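
The temp-directory advice above can also be applied from inside a script instead of the shell. What follows is only a sketch under stated assumptions: first_usable_dir is a hypothetical helper (not a BioPerl function), e:/cygwin/tmp is simply the example value from INSTALL.WIN, and it is assumed that BioPerl picks up its temp directory from the environment when its modules are loaded, which is why TMPDIR is set before the (commented-out) require.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first candidate that exists as a
# writable directory, or the current directory as a last resort.
sub first_usable_dir {
    my @candidates = @_;
    for my $dir (@candidates) {
        return $dir if defined $dir && -d $dir && -w $dir;
    }
    return File::Spec->curdir;
}

# Prefer the Windows-style path from INSTALL.WIN (an example value);
# fall back to whatever File::Spec considers the system temp directory.
$ENV{TMPDIR} = first_usable_dir('e:/cygwin/tmp', File::Spec->tmpdir);
print "TMPDIR set to $ENV{TMPDIR}\n";

# Load the BLAST wrapper only after TMPDIR is in place. A compile-time
# 'use' would run before the assignment above, hence 'require'.
# Commented out so the sketch runs without BioPerl installed:
# require Bio::Tools::Run::StandAloneBlast;
```

If the module is pulled in with use rather than require, the assignment would need to sit in a BEGIN block placed before that use line.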
-- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Apr 29, 2005, at 9:25 AM, Paul Cantalupo wrote: > Hello, > > I am running BioPerl 1.4 on Windows 2000 under Cygwin (therefore, I > use Perl that comes with Cygwin; not Windows Perl). I am trying to run > a standalone blast. I installed the Windows version of BLAST as > recommended by the BioPerl installation instructions. My script (see > localblast.pl below) takes an input sequence file (see test.fa below) > and performs a blastp. By running the script with the following > command line, I get the this error: > > $ localblast.pl test.fa > [NULL_Caption] FATAL ERROR: blast: Unable to open input file > /tmp/4lkjmTjRio > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 /usr/local/blast/blastall -p blastp > -d "/ecoli.nt" -i > /tmp/4lkjmTjRio -o /tmp/llctIvZlC6 > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > /usr/lib/perl5/site_perl/5.8/Bio/Tools/Ru > n/StandAloneBlast.pm:732 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > /usr/lib/perl5/site_perl/5.8/B > io/Tools/Run/StandAloneBlast.pm:680 > STACK Bio::Tools::Run::StandAloneBlast::blastall > /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run > /StandAloneBlast.pm:536 > STACK toplevel ./localblast.pl:17 > > -------------------------------------- > > > Notice that blast is unable to open the input file /tmp/4lkjmTjRio > (which the library StandAloneBlast created). Next, I tried to run > blastall directly from the commandline, with a file in the /tmp > directory but it gave me the same error: 'unable to open input file'. > But blastall does execute properly I use an input file that is in the > current directory (using a relative path name in the -i option). But > if I set the -i option to any absolute reference for a file like > /home/lupey/fasta.fa, it fails and the error is the same: 'Unable to > open input file'. 
> > So, why does BioPerl suggest using the Windows version of Blast if it > can't open files using absolute references to files especially when > the StandAloneBlast library places the inputfile in the /tmp > directory? What solution can I employ to fix this? > > Thank you, > > Paul > > > > #localblast.pl > > #!/usr/bin/perl > > use strict; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], -format => 'fasta'); > my $query = $Seq_in->next_seq(); > > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > 'database' => > 'ecoli.nt', > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), " significance: ", > $hit->significance(), "\n";} > > > > test.fa: > >Test > AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC > TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA > TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC > ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG > CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA > GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC > AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG > AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From crabtree at tigr.org Fri Apr 29 10:29:20 2005 From: crabtree at tigr.org (Crabtree, Jonathan) Date: Fri Apr 29 10:23:23 2005 Subject: [Bioperl-l] different label colours Message-ID: 
Hota-

Here's your problem: there are a number of places in your code where you're doing something that looks like this:

    $features = new Bio::SeqFeature::Generic (-display_name => $label);
    foreach $exon (@mudr_exons) {
        $subfeature = new Bio::SeqFeature::Generic( ...,
                                                    -primary => 'zn_finger_exon',
                                                    ...);
        $features->add_sub_SeqFeature($subfeature, 'EXPAND');
        ...
    }

The problem with this is that your "parent" feature ($features) has no primary_tag, and by default Bioperl only calls the fontcolor subroutine on the *parent* feature, not on the child features (i.e., the exons, which *do* have valid primary tags). Here are a couple of ways that you can fix this:

1. Assign each parent feature a valid primary tag when you create it. For example:

    $features = new Bio::SeqFeature::Generic (-display_name => $label,
                                              -primary => 'zn_finger_exon');

2. Use the -all_callbacks option (see the documentation for Bio::Graphics::Panel).

Option 1 requires that you pick a single color for each parent feature, whereas option 2 will let you assign each exon its own color and/or label. Using -all_callbacks does complicate things, however, so I wouldn't use option 2 unless you really want to assign each child feature/exon its own label and/or color.

Jonathan

> -----Original Message-----
> From: Horvath Tamas [mailto:hota.fin@freemail.hu]
> Sent: Friday, April 29, 2005 11:07 AM
> To: Crabtree, Jonathan
> Cc: Bioperl
> Subject: Re: [Bioperl-l] different label colours
>
>
> Crabtree, Jonathan wrote:
>
> >Hota-
> >
> >That's interesting. I suspect that the problem is actually not in your
> >-fontcolor subroutine, but somewhere else in your script. Can you show
> >us the rest of the code? Either your labeled features aren't getting
> >assigned a primary_tag correctly, or perhaps the primary_tag value is
> >being erased somehow.
For example, maybe one of your other > subroutines > >is accidentally invoking primary_tag as a setter, not a getter, as in > >$feature->primary_tag('') or $feature->primary_tag(undef) > > > >Jonathan > > > > > > > >>-----Original Message----- > >>From: Horvath Tamas [mailto:hota.fin@freemail.hu] > >>Sent: Friday, April 29, 2005 10:33 AM > >>To: Crabtree, Jonathan > >>Cc: Bioperl > >>Subject: Re: [Bioperl-l] different label colours > >> > >> > >>Crabtree, Jonathan wrote: > >> > >> > >> > >>>Hi Hota- > >>> > >>>This should work. Why don't you try inserting the > following line in > >>>your anonymous sub (after "my $feature = shift;") and then > >>> > >>> > >>tell us what > >> > >> > >>>(if anything) shows up on STDERR when you run your script: > >>> > >>>print STDERR "tag='", $feature->primary_tag, "'\n"; > >>> > >>>Jonathan > >>> > >>> > >>> > >>> > >>> > >>>>-----Original Message----- > >>>>From: bioperl-l-bounces@portal.open-bio.org > >>>>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > >>>>Horvath Tamas > >>>>Sent: Thursday, April 28, 2005 8:25 AM > >>>>To: Bioperl > >>>>Subject: [Bioperl-l] different label colours > >>>> > >>>> > >>>>I'm trying to use different label colours in one single > track, but > >>>>the > >>>>'sub {}' does not work for the '-fontcolor' option. Is there > >>>>a solution? > >>>>If not yet, where should I look over the code, to implement it? 
> >>>> > >>>>Hota > >>>> > >>>>PS.: > >>>> > >>>>-fontcolor => sub { my $feature = shift; > >>>> return 'red' if > >>>>$feature->primary_tag =~ /mudr/i; > >>>> return 'blue' if > >>>>$feature->primary_tag =~ /zn_finger/i; > >>>> return 'orange' if > >>>>$feature->primary_tag =~ /repeat/i; > >>>> return 'green' if > >>>>$feature->primary_tag eq 'exon'; > >>>> }, > >>>>this is how it looks like, but the label color is > >>>> > >>>> > >>consistently black > >> > >> > >>>>(though if I explicitly use -fontcolor => 'green' then > the label is > >>>>green indeed) > >>>>_______________________________________________ > >>>>Bioperl-l mailing list > >>>>Bioperl-l@portal.open-bio.org > >>>>http://portal.open-> bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>>> > >>>> > >>>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l@portal.open-bio.org > >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>> > >>> > >>> > >>> > >>Sorry, it's pretty messed up, but anyway, it looks like: (at > >>font color, > >>it is always '') > >> > >>tag_at_glyph='mudr_exon' > >>tag_at_glyph='mudr_exon' > >>tag_at_glyph='' > >>tag_at_glyph='zn_finger_exon' > >>tag_at_glyph='zn_finger_exon'tag_at_connector=' > >>' > >>tag_at_connector='' > >>tag_at_connector='repeat_L' > >>tag_at_strand_arrow='repeat_L' > >>tag_at_bgcolor='repeat_L' > >>tag_at_connector='repeat_R' > >>tag_at_strand_arrow='repeat_R' > >>tag_at_bgcolor='repeat_R' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='last_exon' > >>tag_at_strand_arrow='last_exon' > >>tag_at_bgcolor='last_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' 
> >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='repeat_L' > >>tag_at_strand_arrow='repeat_L' > >>tag_at_bgcolor='repeat_L' > >>tag_at_connector='repeat_R' > >>tag_at_strand_arrow='repeat_R' > >>tag_at_bgcolor='repeat_R' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='last_exon' > >>tag_at_strand_arrow='last_exon' > >>tag_at_bgcolor='last_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='repeat_L' > >>tag_at_strand_arrow='repeat_L' > >>tag_at_bgcolor='repeat_L' > >>tag_at_connector='repeat_R' > >>tag_at_strand_arrow='repeat_R' > >>tag_at_bgcolor='repeat_R' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' 
> >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='last_exon' > >>tag_at_strand_arrow='last_exon' > >>tag_at_bgcolor='last_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='repeat_L' > >>tag_at_strand_arrow='repeat_L' > >>tag_at_bgcolor='repeat_L' > >>tag_at_connector='repeat_R' > >>tag_at_strand_arrow='repeat_R' > >>tag_at_bgcolor='repeat_R' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='last_exon' > >>tag_at_strand_arrow='last_exon' > >>tag_at_bgcolor='last_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_connector='zn_finger_exon' > 
tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='repeat_L' > >>tag_at_strand_arrow='repeat_L' > >>tag_at_bgcolor='repeat_L' > >>tag_at_connector='repeat_R' > >>tag_at_strand_arrow='repeat_R' > >>tag_at_bgcolor='repeat_R' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='last_exon' > >>tag_at_strand_arrow='last_exon' > >>tag_at_bgcolor='last_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='repeat_L' > >>tag_at_strand_arrow='repeat_L' > >>tag_at_bgcolor='repeat_L' > >>tag_at_connector='repeat_R' > >>tag_at_strand_arrow='repeat_R' > >>tag_at_bgcolor='repeat_R' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > 
>>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='exon' > >>tag_at_strand_arrow='exon' > >>tag_at_bgcolor='exon' > >>tag_at_connector='last_exon' > >>tag_at_strand_arrow='last_exon' > >>tag_at_bgcolor='last_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_connector='mudr_exon' > >>tag_at_strand_arrow='mudr_exon' > >>tag_at_bgcolor='mudr_exon' > >>tag_at_fontcolor='' > >>tag_at_connector='' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_connector='zn_finger_exon' > tag_at_strand_arrow='zn_finger_exon' > >>tag_at_bgcolor='zn_finger_exon' > >>tag_at_fontcolor='' > >> > >> > >> > >> > > > > > > > > > Here's the cycle that u may need (the code is nod that clean, > but... 
): > > foreach my $record (@$pretty) { > my $features; > next unless $record->{R_TIR_START}; #this is only true if the > record is valid > > my $track = $panel->add_track( > -glyph => sub { my $feature = shift; > print STDERR > "tag_at_glyph='", > $feature->primary_tag, "'\n"; > if ($feature->primary_tag =~ > /mudr/i || $feature->primary_tag =~ /zn_finger/i) > { return 'generic'} else { > return 'segments';} > }, > -bgcolor => sub { my $feature = shift; > print STDERR > "tag_at_bgcolor='", $feature->primary_tag, "'\n"; > if ($feature->primary_tag =~ > /exon/) { > if > ($feature->primary_tag > =~ /mudr/) {return 'red';} > elsif > ($feature->primary_tag =~ /zn_finger/i) {return 'blue';} > else {return 'green';}; > } > else {return 'orange';} > }, > -fgcolor => 'black', > -connector => sub { my $feature = shift; > print STDERR > "tag_at_connector='", $feature->primary_tag, "'\n"; > > $feature->primary_tag =~ /exon/ > ? return 'hat' : > return 'dashed'; > > }, > -height => 15, > -bump => 0, > -label => 1, > -orient => sub { my $feature = shift; > print STDERR > "tag_at_orient='", > $feature->primary_tag, "'\n"; > > $feature->primary_tag eq 'repeat_L' > ? 
'E' : 'W'; > }, > -fontcolor => sub { my $feature = shift; > print STDERR > "tag_at_fontcolor='", $feature->primary_tag, "'\n"; > return 'red' if > $feature->primary_tag =~ /mudr/i; > return 'blue' if > $feature->primary_tag =~ /zn_finger/i; > return 'orange' if > $feature->primary_tag =~ /repeat/i; > return 'green' if > $feature->primary_tag eq 'exon'; > }, > -font2color => 'green', > -point => 0, > -strand_arrow => sub { my $feature = shift; > print STDERR > "tag_at_strand_arrow='", $feature->primary_tag, "'\n"; > if > ($feature->primary_tag eq > 'last_exon' or $feature->primary_tag =~ /repeat/i) > {return 1;} else > {return 0}; > }, > -description => sub { > my $feature = shift; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > return $description; > } > ); > print '.'; > > $features = new Bio::SeqFeature::Generic (-display_name => ' '); > $subfeature = new Bio::SeqFeature::Generic(-start => > $record->{L_TIR_START}, > -end => > $record->{L_TIR_END}, > -primary => 'repeat_L', > -source => 'internal', > -strand => 1); > $features->add_sub_SeqFeature( $subfeature , 'EXPAND'); > $subfeature = new Bio::SeqFeature::Generic(-start => > $record->{R_TIR_START}, > -end => > $record->{R_TIR_END}, > -primary => 'repeat_R', > -source => 'internal', > -strand => -1,); > $features->add_sub_SeqFeature( $subfeature , 'EXPAND'); > > $track->add_feature($features); > undef $features; > my $description = $record->{SEQ_ID}; > my @starts = (); > my @startx = (); > my $lastend = 1; > my $s = $record->{L_TIR_START}; my $e = > $record->{R_TIR_END}; my $l > = $record->{L_TIR_END} - $record->{L_TIR_START}; > my $ps = ${$record->{EXON_LIST}->[0]->{START}}; > my $pe = > ${$record->{EXON_LIST}->[$#{$record->{EXON_LIST}}]->{START}}; > my $sc = $record->{SCORE}; > > $description .= ", GW score: $sc, sequence $s - $e, TIR app.: $l, > prot.: $ps - $pe "; > > $features = new Bio::SeqFeature::Generic (-display_name => ' ', > 
-tag => { > > description => $description > } > ); > my @exonlist = @{$record->{EXON_LIST}}; > my $last_exon = pop @{$record->{EXON_LIST}}; > my @prot = (); > my $pps = 0; > my $ppe = 0; > my $xs = 1; > my $xe = 1; > > foreach $exon (@{$record->{EXON_LIST}}) { > my $start = ${$exon->{START}}; > push @startx , $start; > $start -= $lastend; > push @starts , $start; > $lastend = ${$exon->{END}}; > > $pps = ${$exon->{START}}; $ppe = ${$exon->{END}}; > $xs = $xe; > $xe = $xs + int( ($ppe - $pps)/3); > push(@prot , "$xs - $xe"); > > > $subfeature = new Bio::SeqFeature::Generic (-start => > ${$exon->{START}}, > -end => > ${$exon->{END}}, > > -primary => 'exon', > -source => > 'internal', > -strand => 1, > ); > $features->add_sub_SeqFeature($subfeature,'EXPAND'); > my $s = ${$exon->{START}};my $e = > ${$exon->{END}};print"$s - $e.."; > } > $subfeature = new Bio::SeqFeature::Generic (-start => > ${$last_exon->{START}}, > -end => > ${$last_exon->{END}}, > -primary => > 'last_exon', > -source => > 'internal', > -strand => 1, > ); > $pps = ${$last_exon->{START}}; $ppe = ${$last_exon->{END}}; > $xs = $xe; > $xe = $xs + int( ($ppe - $pps)/3); > push(@prot , "$xs - $xe"); > > my $protstat = join ( ".." 
, @prot); > print "\n$protstat\n"; > $features->add_sub_SeqFeature($subfeature,'EXPAND'); > > print "\n"; > $track->add_feature($features); > undef $features; > my $ms = $record->{DOMAINS}->{MUDR}->{START}; > my $me = $record->{DOMAINS}->{MUDR}->{END}; > print "!$ms !$me\n"; > my @mudr_exons = @{&calc_domain_exons($ms,$me,\@exonlist)}; > print 1; > > my $label = "MuDR:$ms - $me"; > > $features = new Bio::SeqFeature::Generic > (-display_name => $label); > > foreach $exon (@mudr_exons) { > $subfeature = new Bio::SeqFeature::Generic (-start => > $exon->{START}, > -end => > $exon->{END}, > -primary => > 'mudr_exon', > -source => > 'internal', > -strand => 1, > ); > $features->add_sub_SeqFeature($subfeature,'EXPAND'); > my $s = $exon->{START};my $e = > $exon->{END};print"M$s - $e.."; > } > $features->add_sub_SeqFeature($subfeature,'EXPAND'); > > print "\n"; > $track->add_feature($features); > undef $features; > > $ms = 0; > $ms = 0; > > $ms = $record->{DOMAINS}->{Zn_finger}->{START}; > $me = $record->{DOMAINS}->{Zn_finger}->{END}; > print "!$ms !$me\n"; > @mudr_exons = @{&calc_domain_exons($ms,$me,\@exonlist)}; > print 1; > > $label = "Zn:$ms - $me"; > > $features = new Bio::SeqFeature::Generic > (-display_name => $label); > > foreach $exon (@mudr_exons) { > $subfeature = new Bio::SeqFeature::Generic (-start => > $exon->{START}, > -end => > $exon->{END}, > -primary => > 'zn_finger_exon', > -source => > 'internal', > -strand => 1, > ); > $features->add_sub_SeqFeature($subfeature,'EXPAND'); > my $s = $exon->{START};my $e = > $exon->{END};print"Z$s - $e.."; > } > $features->add_sub_SeqFeature($subfeature,'EXPAND'); > > print "\n"; > $track->add_feature($features) if $ms; > undef $features; > > > } > From hota.fin at freemail.hu Fri Apr 29 11:56:01 2005 From: hota.fin at freemail.hu (Horvath Tamas) Date: Fri Apr 29 10:47:43 2005 Subject: [Bioperl-l] different label colours In-Reply-To: References: Message-ID: <42725911.1050908@freemail.hu> Crabtree, Jonathan wrote: >Hota- > 
>Here's your problem: there are a number of places in your code where
>you're doing something that looks like this:
>
>    $features = new Bio::SeqFeature::Generic (-display_name => $label);
>    foreach $exon (@mudr_exons) {
>        $subfeature = new Bio::SeqFeature::Generic( ...,
>                                                    -primary => 'zn_finger_exon',
>                                                    ...);
>        $features->add_sub_SeqFeature($subfeature, 'EXPAND');
>        ...
>    }
>
>The problem with this is that your "parent" feature ($features) has no
>primary_tag. By default, however, Bioperl will only call the fontcolor
>subroutine on the *parent* feature, not the child features (i.e., the
>exons, which *do* have valid primary tags). Here are a couple of ways
>that you can fix this:
>
>1. Assign each parent feature a valid primary tag when you create it.
>For example:
>    $features = new Bio::SeqFeature::Generic (-display_name => $label,
>                                              -primary => 'zn_finger_exon');
>2. Use the -all_callbacks option (see the documentation for
>Bio::Graphics::Panel).
>
>Option 1 requires that you pick a single color for each parent
>feature, whereas option 2 will let you assign each exon its own color
>and/or label. Using -all_callbacks does complicate things, however, so I
>wouldn't use option 2 unless you really want to assign each child
>feature/exon its own label and/or color.
>
>Jonathan
>

There are some "interesting" things though. E.g., in that case it shouldn't know which feature is my 'last_exon' in the -strand_arrow option. If I understand you correctly, basically none of the colorings and other options should work, since none of my parent features has a valid -primary tag... but actually everything else works fine... except the connector, but that's another story...

>>-----Original Message-----
>>From: Horvath Tamas [mailto:hota.fin@freemail.hu]
>>Sent: Friday, April 29, 2005 11:07 AM
>>To: Crabtree, Jonathan
>>Cc: Bioperl
>>Subject: Re: [Bioperl-l] different label colours
>>
>>
>>Crabtree, Jonathan wrote:
>>
>>>Hota-
>>>
>>>That's interesting.
I suspect that the problem is actually >>> >>> >>not in your >> >> >>>-fontcolor subroutine, but somewhere else in your script. >>> >>> >>Can you show >> >> >>>us the rest of the code? Either your labeled features >>> >>> >>aren't getting >> >> >>>assigned a primary_tag correctly, or perhaps the primary_tag >>> >>> >>value is >> >> >>>being erased somehow. For example, maybe one of your other >>> >>> >>subroutines >> >> >>>is accidentally invoking primary_tag as a setter, not a getter, as in >>>$feature->primary_tag('') or $feature->primary_tag(undef) >>> >>>Jonathan >>> >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: Horvath Tamas [mailto:hota.fin@freemail.hu] >>>>Sent: Friday, April 29, 2005 10:33 AM >>>>To: Crabtree, Jonathan >>>>Cc: Bioperl >>>>Subject: Re: [Bioperl-l] different label colours >>>> >>>> >>>>Crabtree, Jonathan wrote: >>>> >>>> >>>> >>>> >>>> >>>>>Hi Hota- >>>>> >>>>>This should work. Why don't you try inserting the >>>>> >>>>> >>following line in >> >> >>>>>your anonymous sub (after "my $feature = shift;") and then >>>>> >>>>> >>>>> >>>>> >>>>tell us what >>>> >>>> >>>> >>>> >>>>>(if anything) shows up on STDERR when you run your script: >>>>> >>>>>print STDERR "tag='", $feature->primary_tag, "'\n"; >>>>> >>>>>Jonathan >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces@portal.open-bio.org >>>>>>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of >>>>>>Horvath Tamas >>>>>>Sent: Thursday, April 28, 2005 8:25 AM >>>>>>To: Bioperl >>>>>>Subject: [Bioperl-l] different label colours >>>>>> >>>>>> >>>>>>I'm trying to use different label colours in one single >>>>>> >>>>>> >>track, but >> >> >>>>>>the >>>>>>'sub {}' does not work for the '-fontcolor' option. Is there >>>>>>a solution? >>>>>>If not yet, where should I look over the code, to implement it? 
>>>>>> >>>>>>Hota >>>>>> >>>>>>PS.: >>>>>> >>>>>>-fontcolor => sub { my $feature = shift; >>>>>> return 'red' if >>>>>>$feature->primary_tag =~ /mudr/i; >>>>>> return 'blue' if >>>>>>$feature->primary_tag =~ /zn_finger/i; >>>>>> return 'orange' if >>>>>>$feature->primary_tag =~ /repeat/i; >>>>>> return 'green' if >>>>>>$feature->primary_tag eq 'exon'; >>>>>> }, >>>>>>this is how it looks like, but the label color is >>>>>> >>>>>> >>>>>> >>>>>> >>>>consistently black >>>> >>>> >>>> >>>> >>>>>>(though if I explicitly use -fontcolor => 'green' then >>>>>> >>>>>> >>the label is >> >> >>>>>>green indeed) >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l@portal.open-bio.org >>>>>>http://portal.open-> bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l@portal.open-bio.org >>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>Sorry, it's pretty messed up, but anyway, it looks like: (at >>>>font color, >>>>it is always '') >>>> >>>>tag_at_glyph='mudr_exon' >>>>tag_at_glyph='mudr_exon' >>>>tag_at_glyph='' >>>>tag_at_glyph='zn_finger_exon' >>>>tag_at_glyph='zn_finger_exon'tag_at_connector=' >>>>' >>>>tag_at_connector='' >>>>tag_at_connector='repeat_L' >>>>tag_at_strand_arrow='repeat_L' >>>>tag_at_bgcolor='repeat_L' >>>>tag_at_connector='repeat_R' >>>>tag_at_strand_arrow='repeat_R' >>>>tag_at_bgcolor='repeat_R' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='last_exon' >>>>tag_at_strand_arrow='last_exon' >>>>tag_at_bgcolor='last_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' 
>>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='repeat_L' >>>>tag_at_strand_arrow='repeat_L' >>>>tag_at_bgcolor='repeat_L' >>>>tag_at_connector='repeat_R' >>>>tag_at_strand_arrow='repeat_R' >>>>tag_at_bgcolor='repeat_R' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='last_exon' >>>>tag_at_strand_arrow='last_exon' >>>>tag_at_bgcolor='last_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='repeat_L' >>>>tag_at_strand_arrow='repeat_L' >>>>tag_at_bgcolor='repeat_L' >>>>tag_at_connector='repeat_R' >>>>tag_at_strand_arrow='repeat_R' >>>>tag_at_bgcolor='repeat_R' >>>>tag_at_fontcolor='' 
>>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='last_exon' >>>>tag_at_strand_arrow='last_exon' >>>>tag_at_bgcolor='last_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='repeat_L' >>>>tag_at_strand_arrow='repeat_L' >>>>tag_at_bgcolor='repeat_L' >>>>tag_at_connector='repeat_R' >>>>tag_at_strand_arrow='repeat_R' >>>>tag_at_bgcolor='repeat_R' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='last_exon' >>>>tag_at_strand_arrow='last_exon' >>>>tag_at_bgcolor='last_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' 
>>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='repeat_L' >>>>tag_at_strand_arrow='repeat_L' >>>>tag_at_bgcolor='repeat_L' >>>>tag_at_connector='repeat_R' >>>>tag_at_strand_arrow='repeat_R' >>>>tag_at_bgcolor='repeat_R' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='exon' >>>>tag_at_strand_arrow='exon' >>>>tag_at_bgcolor='exon' >>>>tag_at_connector='last_exon' >>>>tag_at_strand_arrow='last_exon' >>>>tag_at_bgcolor='last_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_connector='mudr_exon' >>>>tag_at_strand_arrow='mudr_exon' >>>>tag_at_bgcolor='mudr_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_connector='zn_finger_exon' >>>> >>>> >>tag_at_strand_arrow='zn_finger_exon' >> >> >>>>tag_at_bgcolor='zn_finger_exon' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='repeat_L' >>>>tag_at_strand_arrow='repeat_L' >>>>tag_at_bgcolor='repeat_L' >>>>tag_at_connector='repeat_R' >>>>tag_at_strand_arrow='repeat_R' >>>>tag_at_bgcolor='repeat_R' >>>>tag_at_fontcolor='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' >>>>tag_at_connector='' 
>>>>tag_at_connector=''
>>>>tag_at_connector=''
>>>>tag_at_connector=''
>>>>tag_at_connector=''
>>>>tag_at_connector=''
>>>>tag_at_connector=''
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='exon'
>>>>tag_at_strand_arrow='exon'
>>>>tag_at_bgcolor='exon'
>>>>tag_at_connector='last_exon'
>>>>tag_at_strand_arrow='last_exon'
>>>>tag_at_bgcolor='last_exon'
>>>>tag_at_fontcolor=''
>>>>tag_at_connector=''
>>>>tag_at_connector='mudr_exon'
>>>>tag_at_strand_arrow='mudr_exon'
>>>>tag_at_bgcolor='mudr_exon'
>>>>tag_at_connector='mudr_exon'
>>>>tag_at_strand_arrow='mudr_exon'
>>>>tag_at_bgcolor='mudr_exon'
>>>>tag_at_fontcolor=''
>>>>tag_at_connector=''
>>>>tag_at_connector='zn_finger_exon'
>>>>tag_at_strand_arrow='zn_finger_exon'
>>>>tag_at_bgcolor='zn_finger_exon'
>>>>tag_at_connector='zn_finger_exon'
>>>>tag_at_strand_arrow='zn_finger_exon'
>>>>tag_at_bgcolor='zn_finger_exon'
>>>>tag_at_fontcolor=''
>>>>
>>>
>>Here's the cycle that you may need (the code is not that clean,
>>but...
):
>>
>>foreach my $record (@$pretty) {
>>  my $features;
>>  next unless $record->{R_TIR_START}; #this is only true if the record is valid
>>
>>  my $track = $panel->add_track(
>>      -glyph => sub { my $feature = shift;
>>                      print STDERR "tag_at_glyph='", $feature->primary_tag, "'\n";
>>                      if ($feature->primary_tag =~ /mudr/i || $feature->primary_tag =~ /zn_finger/i)
>>                      { return 'generic'} else { return 'segments';}
>>                    },
>>      -bgcolor => sub { my $feature = shift;
>>                        print STDERR "tag_at_bgcolor='", $feature->primary_tag, "'\n";
>>                        if ($feature->primary_tag =~ /exon/) {
>>                            if ($feature->primary_tag =~ /mudr/) {return 'red';}
>>                            elsif ($feature->primary_tag =~ /zn_finger/i) {return 'blue';}
>>                            else {return 'green';};
>>                        }
>>                        else {return 'orange';}
>>                      },
>>      -fgcolor => 'black',
>>      -connector => sub { my $feature = shift;
>>                          print STDERR "tag_at_connector='", $feature->primary_tag, "'\n";
>>                          $feature->primary_tag =~ /exon/
>>                              ? return 'hat' : return 'dashed';
>>
>>                        },
>>      -height => 15,
>>      -bump => 0,
>>      -label => 1,
>>      -orient => sub { my $feature = shift;
>>                       print STDERR "tag_at_orient='", $feature->primary_tag, "'\n";
>>                       $feature->primary_tag eq 'repeat_L'
>>                           ? 'E' : 'W';
>>                     },
>>      -fontcolor => sub { my $feature = shift;
>>                          print STDERR "tag_at_fontcolor='", $feature->primary_tag, "'\n";
>>                          return 'red' if $feature->primary_tag =~ /mudr/i;
>>                          return 'blue' if $feature->primary_tag =~ /zn_finger/i;
>>                          return 'orange' if $feature->primary_tag =~ /repeat/i;
>>                          return 'green' if $feature->primary_tag eq 'exon';
>>                        },
>>      -font2color => 'green',
>>      -point => 0,
>>      -strand_arrow => sub { my $feature = shift;
>>                             print STDERR "tag_at_strand_arrow='", $feature->primary_tag, "'\n";
>>                             if ($feature->primary_tag eq 'last_exon' or $feature->primary_tag =~ /repeat/i)
>>                             {return 1;} else {return 0};
>>                           },
>>      -description => sub {
>>          my $feature = shift;
>>          return unless $feature->has_tag('description');
>>          my ($description) = $feature->each_tag_value('description');
>>          return $description;
>>      }
>>  );
>>  print '.';
>>
>>  $features = new Bio::SeqFeature::Generic (-display_name => ' ');
>>  $subfeature = new Bio::SeqFeature::Generic(-start => $record->{L_TIR_START},
>>                                             -end => $record->{L_TIR_END},
>>                                             -primary => 'repeat_L',
>>                                             -source => 'internal',
>>                                             -strand => 1);
>>  $features->add_sub_SeqFeature( $subfeature , 'EXPAND');
>>  $subfeature = new Bio::SeqFeature::Generic(-start => $record->{R_TIR_START},
>>                                             -end => $record->{R_TIR_END},
>>                                             -primary => 'repeat_R',
>>                                             -source => 'internal',
>>                                             -strand => -1,);
>>  $features->add_sub_SeqFeature( $subfeature , 'EXPAND');
>>
>>  $track->add_feature($features);
>>  undef $features;
>>  my $description = $record->{SEQ_ID};
>>  my @starts = ();
>>  my @startx = ();
>>  my $lastend = 1;
>>  my $s = $record->{L_TIR_START}; my $e = $record->{R_TIR_END}; my $l = $record->{L_TIR_END} - $record->{L_TIR_START};
>>  my $ps = ${$record->{EXON_LIST}->[0]->{START}};
>>  my $pe = ${$record->{EXON_LIST}->[$#{$record->{EXON_LIST}}]->{START}};
>>  my $sc = $record->{SCORE};
>>
>>  $description .= ", GW score: $sc, sequence $s - $e, TIR app.: $l, prot.: $ps - $pe ";
>>
>>  $features = new Bio::SeqFeature::Generic (-display_name => ' ',
>>                                            -tag => {
>>                                                description => $description
>>                                            }
>>                                           );
>>  my @exonlist = @{$record->{EXON_LIST}};
>>  my $last_exon = pop @{$record->{EXON_LIST}};
>>  my @prot = ();
>>  my $pps = 0;
>>  my $ppe = 0;
>>  my $xs = 1;
>>  my $xe = 1;
>>
>>  foreach $exon (@{$record->{EXON_LIST}}) {
>>      my $start = ${$exon->{START}};
>>      push @startx , $start;
>>      $start -= $lastend;
>>      push @starts , $start;
>>      $lastend = ${$exon->{END}};
>>
>>      $pps = ${$exon->{START}}; $ppe = ${$exon->{END}};
>>      $xs = $xe;
>>      $xe = $xs + int( ($ppe - $pps)/3);
>>      push(@prot , "$xs - $xe");
>>
>>
>>      $subfeature = new Bio::SeqFeature::Generic (-start => ${$exon->{START}},
>>                                                  -end => ${$exon->{END}},
>>                                                  -primary => 'exon',
>>                                                  -source => 'internal',
>>                                                  -strand => 1,
>>                                                 );
>>      $features->add_sub_SeqFeature($subfeature,'EXPAND');
>>      my $s = ${$exon->{START}};my $e = ${$exon->{END}};print"$s - $e..";
>>  }
>>  $subfeature = new Bio::SeqFeature::Generic (-start => ${$last_exon->{START}},
>>                                              -end => ${$last_exon->{END}},
>>                                              -primary => 'last_exon',
>>                                              -source => 'internal',
>>                                              -strand => 1,
>>                                             );
>>  $pps = ${$last_exon->{START}}; $ppe = ${$last_exon->{END}};
>>  $xs = $xe;
>>  $xe = $xs + int( ($ppe - $pps)/3);
>>  push(@prot , "$xs - $xe");
>>
>>  my $protstat = join ( ".." , @prot);
>>  print "\n$protstat\n";
>>  $features->add_sub_SeqFeature($subfeature,'EXPAND');
>>
>>  print "\n";
>>  $track->add_feature($features);
>>  undef $features;
>>  my $ms = $record->{DOMAINS}->{MUDR}->{START};
>>  my $me = $record->{DOMAINS}->{MUDR}->{END};
>>  print "!$ms !$me\n";
>>  my @mudr_exons = @{&calc_domain_exons($ms,$me,\@exonlist)};
>>  print 1;
>>
>>  my $label = "MuDR:$ms - $me";
>>
>>  $features = new Bio::SeqFeature::Generic (-display_name => $label);
>>
>>  foreach $exon (@mudr_exons) {
>>      $subfeature = new Bio::SeqFeature::Generic (-start => $exon->{START},
>>                                                  -end => $exon->{END},
>>                                                  -primary => 'mudr_exon',
>>                                                  -source => 'internal',
>>                                                  -strand => 1,
>>                                                 );
>>      $features->add_sub_SeqFeature($subfeature,'EXPAND');
>>      my $s = $exon->{START};my $e = $exon->{END};print"M$s - $e..";
>>  }
>>  $features->add_sub_SeqFeature($subfeature,'EXPAND');
>>
>>  print "\n";
>>  $track->add_feature($features);
>>  undef $features;
>>
>>  $ms = 0;
>>  $ms = 0;
>>
>>  $ms = $record->{DOMAINS}->{Zn_finger}->{START};
>>  $me = $record->{DOMAINS}->{Zn_finger}->{END};
>>  print "!$ms !$me\n";
>>  @mudr_exons = @{&calc_domain_exons($ms,$me,\@exonlist)};
>>  print 1;
>>
>>  $label = "Zn:$ms - $me";
>>
>>  $features = new Bio::SeqFeature::Generic (-display_name => $label);
>>
>>  foreach $exon (@mudr_exons) {
>>      $subfeature = new Bio::SeqFeature::Generic (-start => $exon->{START},
>>                                                  -end => $exon->{END},
>>                                                  -primary => 'zn_finger_exon',
>>                                                  -source => 'internal',
>>                                                  -strand => 1,
>>                                                 );
>>      $features->add_sub_SeqFeature($subfeature,'EXPAND');
>>      my $s = $exon->{START};my $e = $exon->{END};print"Z$s - $e..";
>>  }
>>  $features->add_sub_SeqFeature($subfeature,'EXPAND');
>>
>>  print "\n";
>>  $track->add_feature($features) if $ms;
>>  undef $features;
>>
>>
>>}
>>
>>
>
>

From crabtree at tigr.org Fri Apr 29 11:02:01 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Fri Apr 29 10:55:05 2005
Subject: [Bioperl-l] different label colours
Message-ID:

Hota-

>There
are some "interesting" things though. e.g. then it shouldn't know
>which is my 'last_exon' feature in the -strand_arrow option. If I
>understand you correctly, basically none of the colorings and other
>stuff should work, since none of my parent features has valid -primary
>tag... but actually everything else works fine... except the connector,
>but that's another story...

Sorry, I should have been more specific. The fact that fontcolor doesn't
get called on the child features has to do with the fact that you're
using the 'segments' glyph for all of your parent features. Here's what
the documentation in Bio::Graphics::Panel has to say on the subject:

  When you install a callback for a feature that contains subparts,
  the callback will be invoked first for the top-level feature, and
  then for each of its subparts (recursively). You should make sure
  to examine the feature's type to determine whether the option is
  appropriate.

  Some glyphs deliberately disable this recursive feature. The
  "track", "group", "transcript", "transcript2" and "segments" glyphs
  selectively disable the -bump, -label and -description options.
  This is to avoid, for example, a label being attached to each exon
  in a transcript, or the various segments of a gapped alignment
  bumping each other. You can override this behavior and force your
  callback to be invoked by providing add_track() with a true
  B<-all_callbacks> argument. In this case, you must be prepared to
  handle configuring options for the "group" and "track" glyphs.

Jonathan

From skirov at utk.edu Fri Apr 29 11:36:58 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Apr 29 11:30:52 2005
Subject: [Bioperl-l] Bio::Ontology::Term references quetsion
In-Reply-To: <59B1A3F6-B86E-11D9-B77B-000A959EB4C4@gmx.net>
References: <59B1A3F6-B86E-11D9-B77B-000A959EB4C4@gmx.net>
Message-ID: <4272549A.40603@utk.edu>

Hilmar,
I traced it and you are right to say it is not Bio::Ontology::Term.
It is because of the overload in Bio::Annotation::Reference.
It returns undef if there is no title. This makes absolutely no sense to
me. Actually both 1.4 and bioperl-live docs say:

 Args    : a hash with *optional* title, authors, location, medline,
           start and end attributes

And it acts like that. Yet when you call something like:

if ($ref) {
    ....blah blah
}
else {
    die;
}

surprisingly, your script dies. And it is really hard to find out why.
So there should be either a check for a title before creating the
object or this overloading should go away.
Stefan

Hilmar Lapp wrote:

> What makes you think it does? I can't see anything in the code that
> would enforce this. Do you receive an error if you add a reference
> without a title? If so, what is the stack trace?
>
> -hilmar
>
> On Wednesday, April 27, 2005, at 11:59 AM, Stefan Kirov wrote:
>
>> Is it true that Bio::Ontology::Term add_references method checks for
>> title? If so there is a discrepancy with Bio::Annotation::Reference,
>> as this object does not require a title upon creation. I don't think
>> there should be such a requirement in Bio::Ontology::Term unless I am
>> missing something. Also the documentation does not say there is such
>> a requirement...
>> Stefan
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>

From mlemieux at bioinfo.ca Fri Apr 29 15:44:41 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Fri Apr 29 15:37:47 2005
Subject: [Bioperl-l] RemoteBlast - hit table and html format
Message-ID: <32ebcf38504538b6ac0d184abee13ef4@bioinfo.ca>

Nathan and Paul,

The answers to your respective questions are related.

Paul:
RemoteBlast returns a Blast object that contains the results parsed from
the response retrieved from the NCBI.
The parser handles pairwise alignments perfectly but doesn't deal so
well with the tabular data that is returned when
Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'ALIGNMENT_VIEW'} = 'Tabular'
(which is how you change the view option) is specified.

To get the raw HTML returned by the NCBI, you have to intercept the
result before the parsing happens. Look for the following bit of code in
RemoteBlast.pm:

    if( $response->is_success ) {
        my $size = -s $tempfile;
        if( $size > 1000 ) {
            my $blastobj;
            if( $self->readmethod =~ /BPlite/ ) {
                $blastobj = new Bio::Tools::BPlite(-file => $tempfile);
            } else {
                $blastobj = new Bio::SearchIO(-file => $tempfile,
                                              -format => 'blast');
            }
            #save tempfile
            $self->file($tempfile);
            return $blastobj;

The file $tempfile contains the HTML before parsing. If you replace the
code in the if($size > 1000) block with whatever processing you prefer,
if any, that should do it.

Nathan:
Any graphical browser will do to look at the contents of $tempfile. If
you're not getting pairwise alignments, any browser at all will do in
fact. Just point the browser at
    file:///your_results_file
If your users are familiar with a particular browser, this might be the
most comfortable and easy solution for them. Alternatively, most recent
text editors also display HTML properly (TextEdit on Mac, Word, etc.).

Hope this helps!
Madeleine

From lupey+ at pitt.edu Fri Apr 29 11:55:09 2005
From: lupey+ at pitt.edu (Paul G Cantalupo)
Date: Sat Apr 30 12:25:40 2005
Subject: [Bioperl-l] Windows BLAST problems under Cygwin
In-Reply-To:
References:
Message-ID:

It turns out that

1) Setting the TMPDIR environment variable as per INSTALL.WIN
   instructions fixed my problem (export TMPDIR=c:/cygwin/tmp)
2) I have a serious error in my script. I am trying to BLASTP on a
   nucleotide database (ecoli.nt)! Wow, I gotta wake up.

Now it seems everything is OK.
Thank you for your comments,

Paul

Paul Cantalupo
Research Specialist/Systems Programmer
559 Crawford Hall
Department of Biological Sciences
University of Pittsburgh
Pittsburgh, PA 15260
Work: 412-624-4687
Fax: 412-624-4759

Ask me about Toastmasters: www.toastmasters.org
Midday Club Treasurer

On Fri, 29 Apr 2005, Brian Osborne wrote:

> Paul,
>
> This is from the INSTALL.WIN file:
>
> Directory for temporary files
> =============================
>
> Set the environmental variable TMPDIR, programs like BLAST and
> clustalw need a place to create temporary files. E.g.:
>
> setenv TMPDIR e:/cygwin/tmp # csh, tcsh
> export TMPDIR=e:/cygwin/tmp # sh, bash
>
> Note that this is not the syntax that Cygwin understands, which would
> be something like "/cygdrive/e/cygwin/tmp", but this is the syntax
> that a Perl module expects on Windows.
>
> If this variable is not set correctly you'll see errors like this
> when you run Bio::Tools::Run::StandAloneBlast:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
> STACK: Error::throw
>
>
> Brian O.
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich
> Sent: Friday, April 29, 2005 10:05 AM
> To: Paul Cantalupo
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Windows BLAST problems under Cygwin
>
>
> blast may have crashed because you didn't set the BLASTDIR - notice
> your path to the ecoli db is /ecoli.nt which is probably incorrect.
>
> You can vary which dir it uses for tempfiles by setting TEMPDIR
> environment variable.
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> On Apr 29, 2005, at 9:25 AM, Paul Cantalupo wrote:
>
> > Hello,
> >
> > I am running BioPerl 1.4 on Windows 2000 under Cygwin (therefore, I
> > use Perl that comes with Cygwin; not Windows Perl). I am trying to run
> > a standalone blast.
I installed the Windows version of BLAST as
> > recommended by the BioPerl installation instructions. My script (see
> > localblast.pl below) takes an input sequence file (see test.fa below)
> > and performs a blastp. By running the script with the following
> > command line, I get this error:
> >
> > $ localblast.pl test.fa
> > [NULL_Caption] FATAL ERROR: blast: Unable to open input file /tmp/4lkjmTjRio
> >
> > ------------- EXCEPTION -------------
> > MSG: blastall call crashed: 256 /usr/local/blast/blastall -p blastp -d "/ecoli.nt" -i /tmp/4lkjmTjRio -o /tmp/llctIvZlC6
> >
> > STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run/StandAloneBlast.pm:732
> > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run/StandAloneBlast.pm:680
> > STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8/Bio/Tools/Run/StandAloneBlast.pm:536
> > STACK toplevel ./localblast.pl:17
> >
> > --------------------------------------
> >
> >
> > Notice that blast is unable to open the input file /tmp/4lkjmTjRio
> > (which the library StandAloneBlast created). Next, I tried to run
> > blastall directly from the commandline, with a file in the /tmp
> > directory but it gave me the same error: 'unable to open input file'.
> > But blastall does execute properly if I use an input file that is in
> > the current directory (using a relative path name in the -i option).
> > But if I set the -i option to any absolute reference for a file like
> > /home/lupey/fasta.fa, it fails and the error is the same: 'Unable to
> > open input file'.
> >
> > So, why does BioPerl suggest using the Windows version of Blast if it
> > can't open files using absolute references to files especially when
> > the StandAloneBlast library places the inputfile in the /tmp
> > directory? What solution can I employ to fix this?
> >
> > Thank you,
> >
> > Paul
> >
> >
> >
> > #localblast.pl
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Tools::Run::StandAloneBlast;
> >
> > my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], -format => 'fasta');
> > my $query = $Seq_in->next_seq();
> >
> > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
> >                                                     'database' => 'ecoli.nt',
> >                                                     _READMETHOD => "Blast"
> >                                                     );
> > my $blast_report = $factory->blastall($query);
> > my $result = $blast_report->next_result;
> >
> > while( my $hit = $result->next_hit()) {
> >     print "\thit name: ", $hit->name(), " significance: ",
> >     $hit->significance(), "\n";}
> >
> >
> >
> > test.fa:
> > >Test
> > AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
> > TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
> > TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
> > ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
> > CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
> > GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
> > AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
> > AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
> >
>
>

From lupey+ at pitt.edu Fri Apr 29 13:30:54 2005
From: lupey+ at pitt.edu (Paul G Cantalupo)
Date: Sat Apr 30 12:25:43 2005
Subject: [Bioperl-l] Format of
BlastHit '_description' key
Message-ID:

Hello,

Why does the implementation of the '_description' key of
Bio::Search::Hit::BlastHit join all the descriptions into one line? For
example, the description lines (there are 4 total) of this BlastHit

>ref|NP_040915.1| DNA polymerase [Human adenovirus A]
 emb|CAA51882.1| DNA polymerase [Human adenovirus type 12]
 pir||DJAD12 DNA-directed DNA polymerase (EC 2.7.7.7) - human adenovirus 12
 sp|P06538|DPOL_ADE12 DNA polymerase
          Length = 1061

 Score = 2172 bits (5627), Expect = 0.0
 Identities = 1049/1061 (98%), Positives = 1049/1061 (98%)

are joined into one scalar (as displayed by the Perl debugger)

'_description' => 'DNA polymerase [Human adenovirus A] emb|CAA51882.1| DNA polymerase [Human adenovirus type 12] pir||DJAD12 DNA-directed DNA polymerase (EC 2.7.7.7) - human adenovirus 12 sp|P06538|DPOL_ADE12 DNA polymerase'

The problem is that I need to be able to parse the description lines for
the one that I want. Is it possible to reconstruct the original
description lines as outputted by BLAST?

Thank you,

Paul

Paul Cantalupo
Research Specialist/Systems Programmer
559 Crawford Hall
Department of Biological Sciences
University of Pittsburgh
Pittsburgh, PA 15260
Work: 412-624-4687
Fax: 412-624-4759

Ask me about Toastmasters: www.toastmasters.org
Midday Club Treasurer
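
One possible workaround for the question above, offered only as a rough sketch in core Perl and not as anything BioPerl itself provides: the joined description can be split again wherever whitespace is followed by an NCBI-style "db|accession|" identifier. The helper name split_description and the list of database tags are assumptions made for illustration; the tag list is incomplete, and a description whose free text happens to contain such an identifier would be split in the wrong place.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a joined BLAST hit
# description back into one entry per database record.
# Assumption: every record after the first begins with an NCBI-style
# "db|accession|" identifier such as emb|CAA51882.1| or pir||DJAD12.
sub split_description {
    my ($desc) = @_;
    # Illustrative, incomplete list of database tags (an assumption).
    my $tags = qr/(?:gi|gb|emb|dbj|pir|prf|sp|pdb|ref|tpg|tpe|tpd)/;
    # Split on whitespace that is immediately followed by "tag|...|".
    # The lookahead keeps the identifier attached to its own record.
    return split /\s+(?=$tags\|[^\s|]*\|)/, $desc;
}

# The joined scalar quoted in the message above.
my $joined = 'DNA polymerase [Human adenovirus A] '
           . 'emb|CAA51882.1| DNA polymerase [Human adenovirus type 12] '
           . 'pir||DJAD12 DNA-directed DNA polymerase (EC 2.7.7.7) - human adenovirus 12 '
           . 'sp|P06538|DPOL_ADE12 DNA polymerase';

my @records = split_description($joined);
print "$_\n" for @records;
```

For the example hit this yields four entries, matching the four description lines in the BLAST report; the first entry has no identifier because the hit name (ref|NP_040915.1|) is stored separately from the description.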