From Laurence.Amilhat at toulouse.inra.fr  Thu Jan  3 09:29:09 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 03 Jan 2008 15:29:09 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477CF135.9060104@toulouse.inra.fr>

Dear all,

I am trying to convert a newick tree into an NHX tree, so I can add the
taxid tag for each leaf.

I am using the modules: Bio::TreeIO & Bio::Tree::NodeNHX
The idea is
1) to read the newick tree
2) get the leaf, and get the corresponding taxid for it
3) add the nhx species tag
4) write the nhx tree

I was able to do the first 2 steps, and I could create a node_nhx object
and add the tag T, but I don't know how to write an NHX tree with the
node_nhx previously created...

Does anyone have an idea? Any help is welcome.

Thanks,

laurence.


Here are my code and the sample files for better understanding:
newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt

_newick2nhx.pl:_
use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;

my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' =>\$codefile);

# read the "sequence id <TAB> taxid" table
open (CODE, "< $codefile");
while (<CODE>)
{
    chomp;
    my($a, $b)=split (/\t/);
    $corresp{$a}=$b;
}

my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");

while (my $tree= $treeio->next_tree)
{
    my @nodes=$tree->get_nodes();
    foreach my $nd(@nodes)
    {
        if ($nd->is_Leaf())
        {
            my $id=$nd->id();
            print "$id TAXID ",$corresp{$id},"\n";

            # this new node is never attached to the tree
            my $nodenhx=new Bio::Tree::NodeNHX();
            $nodenhx->nhx_tag({T=>$corresp{$id}});
        }
    }
    $treeout->write_tree($tree);
}

_test_tree.nwk_:
(((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
(42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,AAEL015662:100.0):100.0,
42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
42558941:100.0);

_seq_taxid.txt:_
AAEL015662	7159
42558969	9606
42558981	10090
42558942	9606
42558970	6239
42558929	10116
42558987	9606
42558930	10116
42558943	9606
148887393	10090
42558958	10090
42558941	9606
56405380	10090
90185247	9606
66774197	6239

_And the tata resulting file:_
(((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,
(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],
(((((90185247:100.0,56405380:100.0):100.0[&&NHX],
(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],
66774197:100.0):100.0[&&NHX],AAEL015662:100.0):100.0[&&NHX],
42558970:100.0):82.0[&&NHX]):100.0[&&NHX],
(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);

-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================

From aaron.j.mackey at gsk.com  Thu Jan  3 10:12:22 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Thu, 3 Jan 2008 10:12:22 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <477CF135.9060104@toulouse.inra.fr>
Message-ID:

Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that
way, your tree's nodes are already NodeNHX's. Instead of creating a new
$nodenhx, you can use the $node variable directly from the tree ...

-Aaron

From Laurence.Amilhat at toulouse.inra.fr  Fri Jan  4 03:33:22 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Fri, 04 Jan 2008 09:33:22 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To:
References:
Message-ID: <477DEF52.20802@toulouse.inra.fr>

Thank you Aaron,

it's working now.
I've changed to species instead of taxid, so I can color the species on
my tree using the ATV viewer.

Thanks again,

Regards,

Laurence.

aaron.j.mackey at gsk.com wrote:
> Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that
> way, your tree's nodes are already NodeNHX's. Instead of creating a new
> $nodenhx, you can use the $node variable directly from the tree ...
>
> -Aaron

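Aaron's suggestion can be sketched as a small revision of newick2nhx.pl. This is only a sketch under stated assumptions, not the script Laurence ended up with: it assumes the plain newick input parses with the nhx reader (as the thread reports), that seq_taxid.txt is tab-separated, and it keeps the T (taxonomy ID) tag from the original attempt. Variable names such as %taxid_for are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;
use Getopt::Long;

my ($tree_file, $outfile, $codefile);
GetOptions(
    'f|file=s' => \$tree_file,
    'o|out=s'  => \$outfile,
    'c|code=s' => \$codefile,
) or die "Usage: $0 -f tree.nwk -o out.nhx -c seq_taxid.txt\n";

# Read the "leaf id <TAB> taxid" table into a hash.
my %taxid_for;
open my $code_fh, '<', $codefile or die "Cannot open $codefile: $!";
while (my $line = <$code_fh>) {
    chomp $line;
    my ($id, $taxid) = split /\t/, $line;
    $taxid_for{$id} = $taxid if defined $taxid;
}
close $code_fh;

# Reading with -format => 'nhx' yields Bio::Tree::NodeNHX nodes,
# so tags can be set on the tree's own nodes.
my $treein  = Bio::TreeIO->new(-format => 'nhx', -file => $tree_file);
my $treeout = Bio::TreeIO->new(-format => 'nhx', -file => ">$outfile");

while (my $tree = $treein->next_tree) {
    for my $node ($tree->get_nodes) {
        next unless $node->is_Leaf;
        my $taxid = $taxid_for{ $node->id };
        next unless defined $taxid;            # leaf missing from the table
        $node->nhx_tag({ T => $taxid });       # tag the node that is in the tree
    }
    $treeout->write_tree($tree);
}
```

The only substantive change from the original attempt is that nhx_tag() is called on the node obtained from the tree rather than on a freshly created, unattached Bio::Tree::NodeNHX. Switching the tag from T to S would give the species variant Laurence mentions.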
-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================

From hlapp at gmx.net  Sun Jan  6 22:02:32 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 6 Jan 2008 22:02:32 -0500
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To:
References:
Message-ID: <640890C9-2D34-4C70-9179-26A9EAB397D2@gmx.net>

Hi Zhihua,

you never responded to Marc's link to the Persistent Bioperl slides -
did that help?

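For readers with the same question as Zhihua (how to get a stored Bio::Seq back out of BioSQL), here is a minimal sketch of a lookup by unique key. It rests on assumptions that should be checked against the installed bioperl-db documentation: that the adaptor returned by get_object_adaptor('Bio::SeqI') accepts a template object in find_by_unique_key(), and that the unique key for sequences involves accession, version and namespace. The namespace value below is a placeholder, since the stored example never set one.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

# Connection settings copied from the question (user name as spelled
# there); adjust them to your own setup.
my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');

# Template object carrying only the attributes that identify the entry.
my $template = Bio::Seq->new(-accession_number => 'test',
                             -version          => 1);

# Assumption: the namespace the entry was stored under. 'my_namespace'
# is a placeholder, not a value taken from the thread.
$template->namespace('my_namespace');

# Ask the sequence adaptor for the matching persistent object.
my $adaptor = $dbadp->get_object_adaptor('Bio::SeqI');
my $dbseq   = $adaptor->find_by_unique_key($template);

if ($dbseq) {
    printf "%s (%s): %s\n",
        $dbseq->display_id, $dbseq->accession_number, $dbseq->seq;
}
else {
    warn "No entry matches that accession, version and namespace\n";
}
```

If the lookup comes back empty, a mismatched namespace is a likely first thing to check, because the template has to agree with what was actually written to the database.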
	-hilmar

On Dec 6, 2007, at 11:25 PM, zhihuali wrote:

>
> Hi netters,
>
> I've installed BioSQL and bioperl-db, and successfully created and
> stored a persistent object:
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp=Bio::DB::BioDB->new(-database=>'biosql',
>                               -user=>'annoymous',
>                               -dbname=>'bioseqdb');
>
> my $seqobj=Bio::Seq->new(-accession_number=>"test",
>                          -id=>"test1",
>                          -seq=>"AGCTAGCT",
>                          -version=>1);
> my $dbobj=$dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
>
> It's successful because I found corresponding rows in the bioseqdb
> tables.
>
> Now I want to retrieve the object back from the database. There's
> not much documentation available and I've tried find_by_unique_key/
> primary_key but all failed. Maybe I didn't use them correctly.
> Could anyone give me an example of how to retrieve the stored
> Bio::Seq object?
>
> Thanks a lot!
>
> Zhihua Li

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From cain.cshl at gmail.com  Mon Jan  7 12:24:02 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:24:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
Message-ID: <1199726642.6374.10.camel@frissell>

Hello,

I was trying to get bioperl-live this morning from either cvs or svn and
failed. I was wondering if something was going on with the server.

Here are the things I tried:

cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live

which resulted in this:

cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
cvs checkout: Updating bioperl-live
cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
cvs [checkout aborted]: read lock failed - giving up

Then I thought I'd try the suggested svn checkout method from the
bioperl wiki:

svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live

which resulted in

svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'

Finally, after looking at the open-bio server, I thought I'd try this:

svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live

which resulted in repeated requests for my password (which I supplied
correctly at least once out of the several requests).

So, what's up?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net  Mon Jan  7 12:36:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 7 Jan 2008 12:36:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID:

I think we are still migrating to svn. It's probably better to wait
for the announcement that everything is ready to go. (And then cvs
won't work anymore except for anonymous checkout - which should
actually continue to work while this is in progress. Have you tried
that?)

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From jason at bioperl.org  Mon Jan  7 12:43:18 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Jan 2008 09:43:18 -0800
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>

CVS r/w is locked because we are transitioning to SVN - you can still
check out via anonymous CVS on code.open-bio.org.

The SVN repository is going to be in /home/svn-repositories/bioperl, not
George's directory, but we are still monkeying around with the directory
structure. You can try a checkout, but be warned it may change a few
more times if we add another directory layer in there.

You will get requests for your password at least three times, so I
strongly suggest you use SSH keys to avoid getting prompted each time.
I don't know why you get asked three times; it is an SVN thing, and I
assume it has to make three separate requests to do a checkout.

That's what is up for now. We'll report when the final SVN migration
is done.

-jason

From cain.cshl at gmail.com  Mon Jan  7 12:57:38 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:57:38 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
References: <1199726642.6374.10.camel@frissell>
	<5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
Message-ID: <1199728658.6374.12.camel@frissell>

Hi Hilmar and Jason,

Thanks--for some reason, I thought svn was done. I'll remain anonymous
for right now (Kind of difficult to do when you announce it publicly :-)

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain.cshl at gmail.com  Mon Jan  7 13:34:25 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 13:34:25 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
Message-ID: <1199730865.6374.18.camel@frissell>

Hello,

I was wanting to implement this myself (and probably still will,
assuming it's not already there...) but I am not a Module::Build guru.
Here's what I'd like to do: add a parameter that I can pass when invoking
perl Build.PL so that the default answers will be used whenever it would
normally ask me a question, something like this:

  perl Build.PL --yes

Is this sort of thing already built into Module::Build and I can't see
it? Or can somebody suggest the best way of going about this?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Mon Jan  7 17:22:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Jan 2008 16:22:35 -0600
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <31AD254B-DABA-488D-BDA8-D690F949CC39@uiuc.edu>

I agree it would be nice. Not sure how hard it would be to implement;
maybe it would be best to have a mode of installation, say if one
wanted 'minimal' (no optional module installation, no scripts),
'full', 'dev' (assume minimal install but don't test), and so on,
falling back to the query-based approach if nothing is indicated.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk  Mon Jan  7 17:37:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Jan 2008 22:37:36 +0000
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <4782A9B0.60203@sendu.me.uk>

Scott Cain wrote:
> Is this sort of thing already built into Module::Build and I can't see
> it? Or can somebody suggest the best way of going about this?

You should ask on the Module::Build mailing list. If it already exists I
don't think it is obvious, however.

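One mechanism worth checking here: Module::Build's documented prompt() and y_n() helpers return their default answer without waiting for input when the PERL_MM_USE_DEFAULT environment variable is true. The sketch below demonstrates that behaviour in isolation. The two questions are hypothetical stand-ins, and whether every question asked by BioPerl's own Build.PL of that era goes through these helpers is an assumption, not something the thread confirms.

```perl
use strict;
use warnings;
use Module::Build;

# With PERL_MM_USE_DEFAULT set, prompt() and y_n() take their default
# answers instead of reading from the terminal.
local $ENV{PERL_MM_USE_DEFAULT} = 1;

# Hypothetical questions, standing in for whatever a Build.PL asks.
my $install_scripts = Module::Build->y_n('Install example scripts?', 'n');
my $test_db         = Module::Build->prompt('Test database name?', 'test');

# y_n() returns 1 or 0; prompt() returns the chosen string.
print "install_scripts=$install_scripts test_db=$test_db\n";
```

From the shell, the equivalent would be to set the variable for a single run, for example PERL_MM_USE_DEFAULT=1 perl Build.PL. A dedicated flag such as the --yes proposed in the thread would still need support in the build class itself.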
If your question is BioPerl related, and you're looking for a fast way
of installing BioPerl without the annoying questions, I'm sure I could
hack something into ModuleBuildBioperl.pm

From cain.cshl at gmail.com  Mon Jan  7 22:04:19 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 22:04:19 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <4782A9B0.60203@sendu.me.uk>
References: <1199730865.6374.18.camel@frissell> <4782A9B0.60203@sendu.me.uk>
Message-ID: <1199761459.6017.1.camel@frissell>

Hi Sendu,

I just hacked something up (I only needed to change a few lines--once I
figured out where everything was). I like Chris' idea though; before I
commit it back (Ha, no rush there), I'll flesh it out a little more to
give more options.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From granjeau at tagc.univ-mrs.fr  Wed Jan  9 03:30:17 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 09 Jan 2008 09:30:17 +0100
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
Message-ID: <47848619.40109@tagc.univ-mrs.fr>

Hello,

I would like to retrieve the human-reviewed annotation of SwissProt
entries; this information is in the comment section of the sequence
file. Here is an example:

CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
CC       in various types of cell motility and are ubiquitously expressed
CC       in all eukaryotic cells.
CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
CC       structural filament (F-actin) in the form of a two-stranded helix.
CC       Each actin can bind to 4 others. Found in a complex with XPO6,
CC       Ran, ACTB and PFN1. Component of a complex composed at least of
CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
CC   -!- INTERACTION:
CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.

Is there a specific method to do such a job?

Thanks much,
Samuel

-- 
Samuel GRANJEAUD          granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC      Tel: +33 (0)491 82 87 24
http://tagc.univ-mrs.fr   Fax: +33 (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique

From robfsouza at gmail.com  Wed Jan  9 08:20:08 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 11:20:08 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
Message-ID:

Hello All!

Greetings to everybody, and happy new year to those following a
Western calendar!

I'm starting a new project to store and analyze distinct sets of
sequence annotation data which are related in a way suitable for
representation in a directed (e.g.
transcript splicing) or undirected (e.g. gene product interaction) graph.
Analysis will require frequent queries based on interval overlaps,
feature neighbourhood, annotation and, most importantly, feature
relationships and stored paths.

At first, I thought of building an entirely new database structure to
store project-specific data (e.g. alternative splicing or protein
interaction), but as I have some experience with Lincoln's
Bio::DB::SeqFeature::Store, I'm now considering extending it for the
purpose of storing graphs describing relationships among features. I'm
aware that some other bioperl-related databases, specifically BioSQL
and Chado, do have components which might be suitable for storing all
or some of these data but, since Lincoln's feature storage and interval
binning implementations in Bio::DB::SeqFeature::Store::DBI::mysql are
clean, simple and very fast, perhaps extending it in a modular way is
desirable.

A good extension to Lincoln's database could include tables like
feature_relationship and feature_path, for edges and transitive
closures (just like in BioSQL), and feature_stored_path, for exclusion
of biologically irrelevant paths in DAGs, like certain splicing
isoforms. These tables could be used to store sequence assemblies or
EST alignments efficiently, including scaffolds inferred by connecting
contigs.

Before starting, I would like to know if the BioSQL and Chado schemata
do have accelerators for querying intervals among billions of features
and feature relationships (some examples using these databases would
also help, if they show that these databases are efficient for such
tasks). If these or other databases are not as suitable as
Bio::DB::SeqFeature for feature retrieval based on interval overlap and
attributes, then again I might consider extending Bio::DB::SeqFeature
and contributing such extensions back to bioperl...

Any thoughts?

Best regards, Robson PS: sorry if anyone gets two copies of this post, but took me some time to realize my new e-mail wasn't subscribed to bioperl-l... From bix at sendu.me.uk Wed Jan 9 08:59:08 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 09 Jan 2008 13:59:08 +0000 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: Message-ID: <4784D32C.9070807@sendu.me.uk> Robson Francisco de Souza wrote: > Before starting, I would like to know if the BioSQL and Chado schemata > do have accelerators for quering intervals among billions of features > and feature relatioships (some examples using these databases would > also help, if they that these databases are efficient for such tasks). > If these or other databases are not as suitable as Bio::DB::SeqFeature > for feature retrieval based on interval overlap and attributes, I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I found that with millions of features it made a db that was too large in terms of disc space and too slow in terms of query time. I had to hack out its storage of feature objects in the db, instead generating feature objects on request from the stored attributes. Doing this turned out to be faster than simply unfreezing certain kinds of feature objects! (I also had to hack in support for retrieval by source, a patch that Lincoln hasn't gotten back to me about yet.) While I can't answer your main questions, I wish you good luck with your project and request that you keep us posted with what you achieve. 
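
A minimal sketch of the rebuild-on-request idea described in the message above, for readers who want to see the two storage styles side by side. It is not Sendu's actual patch and does not use the Bio::DB::SeqFeature::Store API: the class My::LightFeature, the column names and the helper feature_from_row are hypothetical, and only the core Storable module is assumed.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for a real feature class (illustration only).
{
    package My::LightFeature;
    sub new    { my ($class, %args) = @_; return bless {%args}, $class }
    sub seq_id { $_[0]{seq_id} }
    sub start  { $_[0]{start} }
    sub end    { $_[0]{end} }
    sub source { $_[0]{source} }
}

# Style A: serialize the whole object and store the blob in the database.
my $obj = My::LightFeature->new(
    seq_id => 'chr1', start => 100, end => 200, source => 'est2genome',
);
my $blob   = freeze($obj);   # what would be written to a BLOB column
my $thawed = thaw($blob);    # what a query would have to unfreeze

# Style B: store only plain attribute columns and build the object when
# it is requested. The row below stands in for a fetched database row.
my %row = (seq_id => 'chr1', start => 100, end => 200, source => 'est2genome');

sub feature_from_row {
    my ($row) = @_;
    return My::LightFeature->new(%$row);
}

my $built = feature_from_row(\%row);

printf "%s:%d-%d (%s)\n", map { $built->$_ } qw(seq_id start end source);
```

With plain columns, a filter such as retrieval by source can be expressed directly in the query instead of after unfreezing each object. Whether this is faster in practice depends on the feature class and the database, so it should be measured rather than assumed.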
From bosborne11 at verizon.net Wed Jan 9 09:46:42 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 09 Jan 2008 09:46:42 -0500 Subject: [Bioperl-l] Parsing SwissProt annotation in comment In-Reply-To: <47848619.40109@tagc.univ-mrs.fr> References: <47848619.40109@tagc.univ-mrs.fr> Message-ID: <3DAEDA67-B9A5-47A4-8108-0915659F1052@verizon.net> Samuel, The Feature-Annotation HOWTO addresses this specifically: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Brian O. On Jan 9, 2008, at 3:30 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello, > > I would like to retrieve the human reviewed annotation of SwissProt > entries; these information are in the comment section of the > sequence file. Here is an example: > > CC -!- FUNCTION: Actins are highly conserved proteins that are > involved > CC in various types of cell motility and are ubiquitously > expressed > CC in all eukaryotic cells. > CC -!- SUBUNIT: Polymerization of globular actin (G-actin) leads > to a > CC structural filament (F-actin) in the form of a two-stranded > helix. > CC Each actin can bind to 4 others. Found in a complex with > XPO6, > CC Ran, ACTB and PFN1. Component of a complex composed at > least of > CC ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with > XPO6. > CC -!- INTERACTION: > CC Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668; > CC P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161; > CC -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton. > > Is there a specific method to do such a job? 
>
> Thanks much,
> Samuel
>
> --
>
> Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr
> INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 24
> http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01
> http://icim.marseille.inserm.fr/proteomique
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From alexanderptok at web.de Wed Jan 9 10:34:56 2008
From: alexanderptok at web.de (Alexander Ptok)
Date: Wed, 09 Jan 2008 16:34:56 +0100
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
Message-ID: <2011210591@web.de>

Hi,

I am a beginner to BioPerl and am working through the Beginners HOWTO.
My BioPerl version is 1.4-1, running on Debian etch.

Everything in the HOWTO worked fine until the section "Retrieving
multiple sequences from a database", from which I copied the following
script:

  use Bio::DB::GenBank;
  use Bio::DB::Query::GenBank;

  $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
  $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                            -query => $query );

  $gb_obj = Bio::DB::GenBank->new;

  $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

  while ($seq_obj = $stream_obj->next_seq) {
      # do something with the sequence object
      print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
  }

If I cut the 0:3000[SLEN] clause, the query works and returns a lot of
sequences. When I alter the query to e.g. 1830[SLEN], it finds the one
sequence that has the length 1830, but I was not able to query a range
of lengths.

Does anyone know what I am doing wrong?

Greetings
A. Ptok

From cjm at fruitfly.org Wed Jan 9 11:52:21 2008
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed, 9 Jan 2008 08:52:21 -0800
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
In-Reply-To:
References:
Message-ID: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>

[cc-d to gmod-schema]

Chado does have some views and pg functions for interval-based
retrieval. AFAIK there are no accelerators for deep feature graphs,
as most Chado users have relatively shallow gene-model/SO feature
graphs. It may not be so hard to extend the cvterm code for doing this,
depending on the characteristics of your graphs (the closure of
feature neighbourhood graphs may be particularly large).

On Jan 9, 2008, at 5:20 AM, Robson Francisco de Souza wrote:

> Hello All!
>
> Greetings to everybody, and happy new year to those following a
> Western calendar!
>
> I'm starting a new project to store and analyze distinct sets of
> sequence annotation data which are related in a way suitable for
> representation in a directed (e.g. transcript splicing) or undirected
> (e.g. gene product interaction) graph. Analysis will require frequent
> queries based on interval overlaps, feature neighbourhood, annotation
> and, most importantly, feature relationships and stored paths.
>
> At first, I thought of building an entirely new database structure to
> store project-specific data (e.g. alternative splicing or protein
> interaction),
> but as I have some experience with Lincoln's
> Bio::DB::SeqFeature::Store, I'm now considering extending it for the
> purpose of storing graphs describing relationships among features.
> > I'm aware that some other bioperl related databases, specifically > BioSQL and Chado, do have components which might be suitable for > storing all or some of these data but, since Lincon's feature storage > and interval binning implementations in > Bio::DB::SeqFeature::Store::mysql are both clean, simple and very > fast, > perhaps extending it in a seemingly modular way is desirable. A good > extension to Lincon's database could include tables like > feature_relationship and feature_path, for edges and transitive > closures (just like in BioSQL) and feature_stored_path, for exclusion > of biologically irrelevant paths in DAGs, like certain splicing > isoforms. These tables could be used to store sequence assemblies or > EST alignments efficiently, including scaffolds inferred by connecting > contigs. > > Before starting, I would like to know if the BioSQL and Chado schemata > do have accelerators for quering intervals among billions of features > and feature relatioships (some examples using these databases would > also help, if they that these databases are efficient for such tasks). > If these or other databases are not as suitable as Bio::DB::SeqFeature > for feature retrieval based on interval overlap and attributes, then > again I might consider extending Bio::DB::seqFeature > and contributing such extensions back to bioperl... > > Any thoughts? > > Best regards, > Robson > > PS: sorry if anyone gets two copies of this post, but took me some > time to realize my new e-mail wasn't subscribed to bioperl-l... 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Jan 9 10:00:38 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 9 Jan 2008 09:00:38 -0600 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: <4784D32C.9070807@sendu.me.uk> References: <4784D32C.9070807@sendu.me.uk> Message-ID: On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote: > Robson Francisco de Souza wrote: >> Before starting, I would like to know if the BioSQL and Chado >> schemata >> do have accelerators for quering intervals among billions of features >> and feature relatioships (some examples using these databases would >> also help, if they that these databases are efficient for such >> tasks). >> If these or other databases are not as suitable as >> Bio::DB::SeqFeature >> for feature retrieval based on interval overlap and attributes, > > I'm using Bio::DB::SeqFeature for that purpose, but just a warning: > I found that with millions of features it made a db that was too > large in terms of disc space and too slow in terms of query time. I > had to hack out its storage of feature objects in the db, instead > generating feature objects on request from the stored attributes. > Doing this turned out to be faster than simply unfreezing certain > kinds of feature objects! Would this be Bio::SF::Annotated objects? If so I bet Storable is storing the OntologyStore object information along with the SF (which argues for refactoring the FeatureIO/Bio::SF::Annotated stuff in 1.7). Not sure what can be done about that beyond your hack, though it might be worth exploring whether one can optionally set the DB::Store to store the object instance. > (I also had to hack in support for retrieval by source, a patch that > Lincoln hasn't gotten back to me about yet.) 
> > While I can't answer your main questions, I wish you good luck with > your project and request that you keep us posted with what you > achieve. You can always try Lincoln on the GBrowse list as well. I would say go ahead and commit the patch if it isn't a big deal. chris From cjfields at uiuc.edu Wed Jan 9 13:12:55 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 9 Jan 2008 12:12:55 -0600 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: Message-ID: <128517E8-3A2A-45DD-83A0-0014863A25BC@uiuc.edu> cc'ing the gbrowse list in case Lincoln hasn't seen this. I believe the primary intent for Bio::DB::SeqFeature::Store was as a more GFF3-compatible replacement for Bio::DB::GFF (unlimited feature nesting, uses any SeqFeatureI, etc) and was streamlined for faster lookups by GBrowse. I don't think adding tables would affect performance dramatically, though maybe Lincoln would have a better idea. chris On Jan 9, 2008, at 7:20 AM, Robson Francisco de Souza wrote: > Hello All! > > Greetings for everybody and happy new year for those following an > western calendary! > > I'm starting a new project to store and analyze distinct sets of > sequence annotation data which are related in a way suitable for > representation in a directed (e.g. transcript splicing) or undirected > (e.g. gene product interaction) graph. Analysis will require frequent > queries based on interval overlaps, feature neighbourhood, annotation > and, most importantly, feature relationships and stored paths. > > At first, I thought of build an entire new database structure to store > project specific data (e.g. alternative splicing or protein > interaction), > but as I have some experience with Lincon's > Bio::DB::SeqFeature::Store, I'm now considering extending it for the > purpose of storing graphs describing relationships among features. 
> > I'm aware that some other bioperl related databases, specifically > BioSQL and Chado, do have components which might be suitable for > storing all or some of these data but, since Lincon's feature storage > and interval binning implementations in > Bio::DB::SeqFeature::Store::mysql are both clean, simple and very > fast, > perhaps extending it in a seemingly modular way is desirable. A good > extension to Lincon's database could include tables like > feature_relationship and feature_path, for edges and transitive > closures (just like in BioSQL) and feature_stored_path, for exclusion > of biologically irrelevant paths in DAGs, like certain splicing > isoforms. These tables could be used to store sequence assemblies or > EST alignments efficiently, including scaffolds inferred by connecting > contigs. > > Before starting, I would like to know if the BioSQL and Chado schemata > do have accelerators for quering intervals among billions of features > and feature relatioships (some examples using these databases would > also help, if they that these databases are efficient for such tasks). > If these or other databases are not as suitable as Bio::DB::SeqFeature > for feature retrieval based on interval overlap and attributes, then > again I might consider extending Bio::DB::seqFeature > and contributing such extensions back to bioperl... > > Any thoughts? > > Best regards, > Robson > > PS: sorry if anyone gets two copies of this post, but took me some > time to realize my new e-mail wasn't subscribed to bioperl-l... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Wed Jan 9 13:29:15 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 13:29:15 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
In-Reply-To: <2011210591@web.de>
References: <2011210591@web.de>
Message-ID: <0EB96131-7931-4FC3-802F-A8152B474A99@verizon.net>

Alexander,

I don't understand. By using the clause "0:3000[SLEN]" you are
querying for sequences in the length range of 0 to 3000.

Brian O.

On Jan 9, 2008, at 10:34 AM, Alexander Ptok wrote:

> If i cut the 0:3000[SLEN] query it works and returns a lot of
> sequences, when i alter the query to e.g. 1830[SLEN] it
> finds the one sequence that has the length 1830, but i was not able
> to query a range of lengths.

From stefan.kirov at bms.com Wed Jan 9 14:54:07 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 09 Jan 2008 14:54:07 -0500
Subject: [Bioperl-l] pairwise_kaks.PLS: verbose rquired by PAML
Message-ID: <4785265F.6020500@bms.com>

Jason,

Even with this last fix I still had problems with bp_pairwise_kaks.pl.
It turns out that verbose needs to be switched on by default for codeml
in order for the sequences to appear in the mlc file.

That being said, instead of:

  $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
      (-verbose => $verbose,
       -params  => { 'runmode' => -2,
                     'seqtype' => 1,
                   }
      );

we need this:

  $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
      (-verbose => $verbose,
       -params  => { 'runmode' => -2,
                     'seqtype' => 1,
                     'verbose' => 1,
                   }
      );

verbose can be 2 as well.

I just got this clarification from Ziheng. He also offers to change
the output so it becomes easier for us. I plan to ask him to put the
sequence in the mlc header by default.
Stefan

From robfsouza at gmail.com Wed Jan 9 19:28:25 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 22:28:25 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
In-Reply-To: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
References: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
Message-ID:

Hi,

2008/1/9, Chris Mungall :
> [cc-d to gmod-schema]
>
> Chado does have some views and pg functions for interval-based
> retrieval. AFAIK there are no accelerators for deep feature graphs,
> as most chado users have relatively shallow gene-model/SO feature
> graphs. It may not be so hard to extend cvterm code for doing this,
> depending on the characteristics of your graphs (the closure of
> feature neighbourhood graphs may be particularly large)

Great! I'm studying Chado and I will have a look at the interval
optimizations. Has any of you compared BioSQL and Chado for the
efficiency of storing and retrieving huge numbers of features and
feature graphs? As Sendu pointed out limitations in
Bio::DB::SeqFeature's schema, I'm wondering which of these platforms
(or maybe another one?) would be best suited for these tasks. For the
moment, I will either extend Sendu's hack of Lincoln's modules or adapt
the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to Chado, if
it turns out to be more efficient than the pg functions.

Best,
Robson

PS: I could not find the most recent version of gmod by following the
Download link to gmod (Chado) from GMOD's site to the SourceForge
download page. Did I miss the right link on the download site, or is
this unexpected? Is the version available at IUBio's mirror (0.003-10)
the most recent one?
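
For readers unfamiliar with interval binning, the general technique mentioned in the message above can be sketched as follows. This is a generic hierarchical binning scheme written for illustration, not the algorithm actually used by Bio::DB::SeqFeature::Store or Chado: the tier sizes, the bin naming ("tier.bin") and the in-memory %index that stands in for a database table are all assumptions, and coordinates are assumed to be non-negative integers.

```perl
use strict;
use warnings;

# Hypothetical bin widths, smallest first; each tier is 8x the previous.
my @TIERS = (1_000, 8_000, 64_000, 512_000, 4_096_000);

# Smallest bin that fully contains [start, end], named "tier.bin".
sub bin_for {
    my ($start, $end) = @_;
    for my $tier (0 .. $#TIERS) {
        my $size = $TIERS[$tier];
        my $bin  = int($start / $size);
        return "$tier.$bin" if $bin == int($end / $size);
    }
    return 'top';    # catch-all for features that fit no single bin
}

# Every bin that could hold a feature overlapping [start, end].
sub bins_overlapping {
    my ($start, $end) = @_;
    my @bins = ('top');
    for my $tier (0 .. $#TIERS) {
        my $size = $TIERS[$tier];
        push @bins, map { "$tier.$_" } int($start / $size) .. int($end / $size);
    }
    return @bins;
}

# In-memory stand-in for an indexed table: bin => [ [id, start, end], ... ]
my %index;

sub store_feature {
    my ($id, $start, $end) = @_;
    push @{ $index{ bin_for($start, $end) } }, [ $id, $start, $end ];
}

# Look only in candidate bins, then apply the exact overlap test.
sub overlapping {
    my ($start, $end) = @_;
    my @hits;
    for my $bin (bins_overlapping($start, $end)) {
        for my $f (@{ $index{$bin} || [] }) {
            push @hits, $f->[0] if $f->[1] <= $end && $f->[2] >= $start;
        }
    }
    my @sorted = sort @hits;
    return @sorted;
}

store_feature(exon1 => 1_200,   1_450);
store_feature(exon2 => 7_900,   8_300);
store_feature(gene1 => 1_000,   60_000);
store_feature(far   => 900_000, 900_500);

print join(',', overlapping(8_000, 8_100)), "\n";
```

The point of the scheme is that an overlap query touches a handful of bin keys, which a database can look up through an ordinary index, instead of comparing the query interval against every stored feature. In SQL the list from bins_overlapping would typically become an IN (...) or a set of range conditions on a bin column.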
From cain.cshl at gmail.com Wed Jan 9 22:15:29 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 09 Jan 2008 22:15:29 -0500 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org> Message-ID: <1199934929.6229.44.camel@frissell> Hi Robson, I seem to be perennially working on the 1.0 release of Chado. The schema itself is quite stable but I'm always working on the tools to make them handle more cases and be as stable as possible. For the time being, you need to get Chado from cvs; see http://www.gmod.org/wiki/index.php/Chado_-_Getting_Started#Chado_From_CVS I removed the 0.003 release from the SourceForge site because the schema in it is out of date relative to what we've been working on for the last year. Scott On Wed, 2008-01-09 at 22:28 -0200, Robson Francisco de Souza wrote: > Hi, > > 2008/1/9, Chris Mungall : > > [cc-d to gmod-schema] > > > > Chado does have some views and pg functions for interval-based > > retrieval. AFAIK there are no accelerators for deep feature graphs, > > as most chado users have relatively shallow gene-model/SO feature > > graphs. It may not be so hard to extend cvterm code for doing this, > > depending on the characteristics of your graphs (the closure of > > feature neighbourhood graphs may be particularly large) > > Great! I'm studing Chado and I will have a look at the interval optimizations. > Did any of you compared BioSQL and Chado for huge feature and feature > graph storage/retrieval efficiency? As Sendu pointed to limitations in > Bio::DB::SeqFeature's schema, I'm thinking which of these plataforms > (or maybe another one?) would be best suited for these tasks... for > the moment, I will either extend Sendu's hack of Lincon's modules or > adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to > Chado, if it turns out to be more efficient than the pg functions. 
>
> Best,
> Robson
>
> PS: I could not find the most recent version of gmod by following the
> Download link to gmod(Chado) from GMOD's site to the Sourceforge
> download page. Did I miss the right link on the download site or is
> this unexpected? Is the version available at IUBio's mirror (0.003-10)
> the most recent one?

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From bosborne11 at verizon.net Thu Jan 10 09:16:16 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 10 Jan 2008 09:16:16 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
In-Reply-To: <2013325230@web.de>
References: <2013325230@web.de>
Message-ID: <932550FF-8414-4B3E-92BB-1895FD9658AE@verizon.net>

Alexander,

OK, that is odd (meaning, this did work a while back, but it's not
clear to me what could have changed). The first thing to do is to
upgrade to BioPerl version 1.5.2. Can you do this? Version 1.4 is very
old and you could run into other problems using it.

Brian O.

On Jan 10, 2008, at 8:54 AM, Alexander Ptok wrote:

> Hallo Brian,
>
> thanks for your answer. The principle is clear, but it doesn't work
> like it should, on my computer. So maybe i should repeat what i did
> step by step.
>
> 1.
i took the following script: > > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]"; > $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', - > query => $query ); > > $gb_obj = Bio::DB::GenBank->new; > > $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > while ($seq_obj = $stream_obj->next_seq) { > # do something with the sequence object > print $seq_obj->display_id, "\t", $seq_obj->length, "\n"; > } > > and then on the terminal > > sv1494 at r04102:~/Desktop/bioperl$ perl script1.pl > sv1494 at r04102:~/Desktop/bioperl$ > > 2. i took out the 0:3000[SLEN]: > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]"; > > and then on the terminal > > sv1494 at r04102:~/Desktop/bioperl$ perl script2.pl > NM_128760 2775 > NM_125788 2874 > NM_124913 3068 > NM_124912 3117 > NM_124775 871 > NM_120360 1655 > NM_111862 2199 > NM_001036386 2734 > NM_119270 3996 > NM_105072 1656 > NM_113294 4824 > NM_180431 1673 > NM_120495 2515 > NM_120493 2050 > NM_112156 1089 > . > . > and a lot more of hits, and one can clearly see, there are some with > a lenght between 0 and 3000 > > 3. to have a look at the [SLEN] i tried another script with e.g. > 2199[SLEN] > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 2199[SLEN]"; > > on the terminal: > > sv1494 at r04102:~/Desktop/bioperl$ perl script3.pl > NM_111862 2199 > sv1494 at r04102:~/Desktop/bioperl$ > > > > It think everthing works fine, except that bioperl or maybe the > genbank doesn't understand > the range clause 0:3000, but in every documentation says i have to > do it that way. Did > i misunterstand something or is it just a problem of my computer/ > bioperl installation? > Maybe you can tell me if the script does what it is suppose to do on > your computer? > > Thanks and greetings > > Alexander Ptok >> >> Alexander, >> >> I don't understand. 
By using the clause "0:3000[SLEN] " you are
>> querying for sequences in the length range of 0 to 3000.
>>
>

From pmiguel at purdue.edu Fri Jan 11 11:22:38 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 11:22:38 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
Message-ID: <478797CE.9050202@purdue.edu>

There is no problem getting sequence from GenBank via a myriad of
methods. But as the volume of non-finished sequence in GenBank
increases, the importance of also obtaining quality values for a given
sequence increases. Some records include quality values.

I typically use bp_fetch.pl to grab a sequence from GenBank:

  bp_fetch.pl -fmt fasta net::genbank:AC207960

sends the fasta sequence to STDOUT. But bp_fetch.pl evidently wasn't
designed to pull down quals:

  bp_fetch.pl -fmt qual net::genbank:AC207960

gives:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
object to write_seq() as a parameter named "source"
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
STACK: /usr/local/perl/bin/bp_fetch.pl:313
-----------------------------------------------------------

(running under bioperl 1.5.2)

The quality values for this accession are in GenBank, as these URLs
demonstrate:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual

What is the best way to pull down these qual values?
They aren't present in "GenBank(Full)" format. They are present in an ASN.1 format. Advice would be appreciated. -- Phillip Purdue Genomics Core Facility From cjfields at uiuc.edu Fri Jan 11 12:09:40 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 11 Jan 2008 11:09:40 -0600 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? In-Reply-To: <478797CE.9050202@purdue.edu> References: <478797CE.9050202@purdue.edu> Message-ID: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> I don't think this is possible with the current setup for Bio::DB::GenBank (which the script uses). We'll have to investigate whether it is possible to retrieve this data via NCBI's eutils; if so we can try adding it in. If you want you can submit this as an enhancement request via bugzilla for tracking: http://bugzilla.open-bio.org/ chris On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > No problem getting sequence from genbank via a myriad of methods. > But as the volume of non-finished sequence in genbank increases the > importance of also obtaining quality values for a given sequence > increases. Some records include quality values. > > I typically use bp_fetch.pl to grab a sequence from genbank: > > bp_fetch.pl -fmt fasta net::genbank:AC207960 > > sends the fasta sequence to STDOUT. 
But that bp_fetch.pl wasn't > designed to pull down quals evidently: > > bp_fetch.pl -fmt qual net::genbank:AC207960 > > gives: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > object to write_seq() as a parameter named "source" > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > 5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/ > 5.8.8/Bio/SeqIO/qual.pm:205 > STACK: /usr/local/perl/bin/bp_fetch.pl:313 > ----------------------------------------------------------- > > (running under bioperl 1.5.2) > > The quality values for this accession are in genbank as these URLs > demonstrate: > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual > > What is the best way to pull down these qual values? They aren't > present in "GenBank(Full)" format. They are present in an ASN.1 > format. > > Advice would be appreciated. > > -- > Phillip > Purdue Genomics Core Facility > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MEC at stowers-institute.org Fri Jan 11 14:14:10 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 11 Jan 2008 13:14:10 -0600 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? 
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID:

Indeed, eutils is capable of this.

The following use of my ncbi_eutil script (attached) yields what you
want:

  ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

It depends on the version of NCBI_PowerScripting.pm, such as is
included in

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Chris Fields
> Sent: Friday, January 11, 2008 11:10 AM
> To: Phillip San Miguel
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual
> files from Genbank?
>
> I don't think this is possible with the current setup for
> Bio::DB::GenBank (which the script uses). We'll have to
> investigate whether it is possible to retrieve this data via
> NCBI's eutils; if so we can try adding it in. If you want
> you can submit this as an enhancement request via bugzilla
> for tracking:
>
> http://bugzilla.open-bio.org/
>
> chris
>
> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
>
> > No problem getting sequence from genbank via a myriad of methods.
> > But as the volume of non-finished sequence in genbank increases the
> > importance of also obtaining quality values for a given sequence
> > increases. Some records include quality values.
> >
> > I typically use bp_fetch.pl to grab a sequence from genbank:
> >
> > bp_fetch.pl -fmt fasta net::genbank:AC207960
> >
> > sends the fasta sequence to STDOUT.
But that bp_fetch.pl wasn't > > designed to pull down quals evidently: > > > > bp_fetch.pl -fmt qual net::genbank:AC207960 > > > > gives: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > > object to write_seq() as a parameter named "source" > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > > 5.8.8/Bio/Root/Root.pm:359 > > STACK: Bio::SeqIO::qual::write_seq > /usr/local/perl_5.8/lib/site_perl/ > > 5.8.8/Bio/SeqIO/qual.pm:205 > > STACK: /usr/local/perl/bin/bp_fetch.pl:313 > > ----------------------------------------------------------- > > > > (running under bioperl 1.5.2) > > > > The quality values for this accession are in genbank as these URLs > > demonstrate: > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 > > 4937460&dopt=fasta > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 > > 4937460&dopt=qual > > > > What is the best way to pull down these qual values? They aren't > > present in "GenBank(Full)" format. They are present in an ASN.1 > > format. > > > > Advice would be appreciated. > > > > -- > > Phillip > > Purdue Genomics Core Facility > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

From pmiguel at purdue.edu Fri Jan 11 14:33:13 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:33:13 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To:
References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: <4787C479.8070600@purdue.edu>

Hi Malcolm,

It looks like your email was (inadvertently?) redacted in some way:
there is no attachment and the last sentence is truncated. Would it be
possible to get a complete version so I can be sure I'm following you?

Thanks,
Phillip

Cook, Malcolm wrote:
> Indeed eutil is capable of this
>
> The following use of my ncbi_eutil (attached) script yields what you
> want:
>
> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual >
> AC207960.qual
>
> It depends on the version of NCBI_PowerScripting.pm , such as is
> included in
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Chris Fields
>> Sent: Friday, January 11, 2008 11:10 AM
>> To: Phillip San Miguel
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] Recommended way to download qual
>> files from Genbank?
>>
>> I don't think this is possible with the current setup for
>> Bio::DB::GenBank (which the script uses). We'll have to
>> investigate whether it is possible to retrieve this data via
>> NCBI's eutils; if so we can try adding it in.
If you want >> you can submit this as an enhancement request via bugzilla >> for tracking: >> >> http://bugzilla.open-bio.org/ >> >> chris >> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: >> >> >>> No problem getting sequence from genbank via a myriad of methods. >>> But as the volume of non-finished sequence in genbank increases the >>> importance of also obtaining quality values for a given sequence >>> increases. Some records include quality values. >>> >>> I typically use bp_fetch.pl to grab a sequence from genbank: >>> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960 >>> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't >>> designed to pull down quals evidently: >>> >>> bp_fetch.pl -fmt qual net::genbank:AC207960 >>> >>> gives: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual >>> object to write_seq() as a parameter named "source" >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ >>> 5.8.8/Bio/Root/Root.pm:359 >>> STACK: Bio::SeqIO::qual::write_seq >>> >> /usr/local/perl_5.8/lib/site_perl/ >> >>> 5.8.8/Bio/SeqIO/qual.pm:205 >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 >>> ----------------------------------------------------------- >>> >>> (running under bioperl 1.5.2) >>> >>> The quality values for this accession are in genbank as these URLs >>> demonstrate: >>> >>> >>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 >> >>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 >> >>> 4937460&dopt=fasta >>> >>> >>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 >> >>> 4937460&dopt=qual >>> >>> What is the best way to pull down these qual values? They aren't >>> present in "GenBank(Full)" format. They are present in an ASN.1 >>> format. >>> >>> Advice would be appreciated. 
>>> >>> -- >>> Phillip >>> Purdue Genomics Core Facility >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From pmiguel at purdue.edu Fri Jan 11 14:37:24 2008 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 11 Jan 2008 14:37:24 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> Message-ID: <4787C574.8020003@purdue.edu> Hi Chris, Thanks. I have submitted this as an enhancement request to bugzilla. Phillip Chris Fields wrote: > I don't think this is possible with the current setup for > Bio::DB::GenBank (which the script uses). We'll have to investigate > whether it is possible to retrieve this data via NCBI's eutils; if so > we can try adding it in. If you want you can submit this as an > enhancement request via bugzilla for tracking: > > http://bugzilla.open-bio.org/ > > chris > > On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > >> No problem getting sequence from genbank via a myriad of methods. But >> as the volume of non-finished sequence in genbank increases the >> importance of also obtaining quality values for a given sequence >> increases. Some records include quality values. >> >> I typically use bp_fetch.pl to grab a sequence from genbank: >> >> bp_fetch.pl -fmt fasta net::genbank:AC207960 >> >> sends the fasta sequence to STDOUT. 
But that bp_fetch.pl wasn't >> designed to pull down quals evidently: >> >> bp_fetch.pl -fmt qual net::genbank:AC207960 >> >> gives: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual >> object to write_seq() as a parameter named "source" >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::SeqIO::qual::write_seq >> /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205 >> STACK: /usr/local/perl/bin/bp_fetch.pl:313 >> ----------------------------------------------------------- >> >> (running under bioperl 1.5.2) >> >> The quality values for this accession are in genbank as these URLs >> demonstrate: >> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 >> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta >> >> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual >> >> >> What is the best way to pull down these qual values? They aren't >> present in "GenBank(Full)" format. They are present in an ASN.1 format. >> >> Advice would be appreciated. >> >> -- >> Phillip >> Purdue Genomics Core Facility >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From pmiguel at purdue.edu Fri Jan 11 15:46:59 2008 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 11 Jan 2008 15:46:59 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? 
In-Reply-To: References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> <4787C479.8070600@purdue.edu> Message-ID: <4787D5C3.1030308@purdue.edu>

Hi Malcolm,

Yes, that works great! Well, one caveat: if you download both the fasta and the qual files:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=fasta > AC207960.fasta
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.fasta.qual

the "primary IDs" don't match. The fasta comes out:

>gi|154937460|gb|AC207960.1|

and the qual comes out:

>AC207960.1

which seems to choke most programs that use seq and qual (e.g. cross_match), because they want the primary IDs of the seq and qual files to match. Otherwise fine, though.

Thanks,
Phillip

Cook, Malcolm wrote: > Phillip: > > Of course - mea culpa - here's the full monty.... > > Indeed NCBI's eutils can do this: > > >> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > >> > AC207960.qual > > which uses my script (attached) to wrap NCBI's eutils. > > It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip > by NCBI in their "Jul 24-27, 2007" course found at > http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html > > I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the > very beginning so that trace messages are not printed on STDOUT, such as > this echoed header: > Retrieving 1 records from nucleotide... > ... and footer: > Received records 1 - 1. > Wrote data to -. > > (otherwise they are interspersed with downloaded qual files) > > It also depends on a recent version of Getopt::Long. > > Hope it helps.
> > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > > >> -----Original Message----- >> From: Phillip San Miguel [mailto:pmiguel at purdue.edu] >> Sent: Friday, January 11, 2008 1:33 PM >> To: Cook, Malcolm >> Cc: Chris Fields; bioperl-l >> Subject: Re: [Bioperl-l] Recommended way to download qual >> files from Genbank? >> >> Hi Malcolm, >> Looks like your email was (inadvertantly?) redacted in >> some way. (No attachment and last sentence truncated.) Would >> it be possible to get a complete version so I can be sure I'm >> following you? >> Thanks, >> Phillip >> >> Cook, Malcolm wrote: >> >>> Indeed eutil is capable of this >>> >>> The following use of my ncbi_eutil (attached) script yeilds what you >>> want: >>> >>> ncbi_eutil -search db=nucleotide term=AC207960 -fetch >>> >> rettype=qual > >> >>> AC207960.qual >>> >>> It depends on the version of NCBI_PowerScripting.pm , such as is >>> included in >>> >>> Malcolm Cook >>> Database Applications Manager - Bioinformatics Stowers >>> >> Institute for >> >>> Medical Research - Kansas City, Missouri >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris >>>> Fields >>>> Sent: Friday, January 11, 2008 11:10 AM >>>> To: Phillip San Miguel >>>> Cc: bioperl-l >>>> Subject: Re: [Bioperl-l] Recommended way to download qual >>>> >> files from >> >>>> Genbank? >>>> >>>> I don't think this is possible with the current setup for >>>> Bio::DB::GenBank (which the script uses). We'll have to >>>> >> investigate >> >>>> whether it is possible to retrieve this data via NCBI's >>>> >> eutils; if so >> >>>> we can try adding it in. 
If you want you can submit this as an >>>> enhancement request via bugzilla for tracking: >>>> >>>> http://bugzilla.open-bio.org/ >>>> >>>> chris >>>> >>>> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: >>>> >>>> >>>> >>>>> No problem getting sequence from genbank via a myriad of >>>>> >> methods. >> >>>>> But as the volume of non-finished sequence in genbank >>>>> >> increases the >> >>>>> importance of also obtaining quality values for a given sequence >>>>> increases. Some records include quality values. >>>>> >>>>> I typically use bp_fetch.pl to grab a sequence from genbank: >>>>> >>>>> bp_fetch.pl -fmt fasta net::genbank:AC207960 >>>>> >>>>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't >>>>> designed to pull down quals evidently: >>>>> >>>>> bp_fetch.pl -fmt qual net::genbank:AC207960 >>>>> >>>>> gives: >>>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual >>>>> object to write_seq() as a parameter named "source" >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ >>>>> 5.8.8/Bio/Root/Root.pm:359 >>>>> STACK: Bio::SeqIO::qual::write_seq >>>>> >>>>> >>>> /usr/local/perl_5.8/lib/site_perl/ >>>> >>>> >>>>> 5.8.8/Bio/SeqIO/qual.pm:205 >>>>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 >>>>> ----------------------------------------------------------- >>>>> >>>>> (running under bioperl 1.5.2) >>>>> >>>>> The quality values for this accession are in genbank as these URLs >>>>> demonstrate: >>>>> >>>>> >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=15493746 >> >>>> 0 >>>> >>>> >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 >> >>>> 5 >>>> >>>> >>>>> 4937460&dopt=fasta >>>>> >>>>> >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 >> >>>> 5 >>>> >>>> >>>>> 4937460&dopt=qual >>>>> >>>>> What is the best way to 
pull down these qual values? They aren't >>>>> present in "GenBank(Full)" format. They are present in an ASN.1 >>>>> format. >>>>> >>>>> Advice would be appreciated. >>>>> >>>>> -- >>>>> Phillip >>>>> Purdue Genomics Core Facility >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> >>> >> >>

From MEC at stowers-institute.org Fri Jan 11 14:40:14 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:40:14 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <4787C479.8070600@purdue.edu>
References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> <4787C479.8070600@purdue.edu>
Message-ID:

Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.

It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip by NCBI in their "Jul 24-27, 2007" course found at http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the very beginning so that trace messages are not printed on STDOUT, such as this echoed header:
    Retrieving 1 records from nucleotide...
... and footer:
    Received records 1 - 1.
    Wrote data to -.
(otherwise they are interspersed with downloaded qual files)

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message----- > From: Phillip San Miguel [mailto:pmiguel at purdue.edu] > Sent: Friday, January 11, 2008 1:33 PM > To: Cook, Malcolm > Cc: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] Recommended way to download qual > files from Genbank? > > Hi Malcolm, > Looks like your email was (inadvertently?) redacted in > some way. (No attachment and last sentence truncated.) Would > it be possible to get a complete version so I can be sure I'm > following you? > Thanks, > Phillip > > Cook, Malcolm wrote: > > Indeed eutil is capable of this > > > > The following use of my ncbi_eutil (attached) script yields what you > > want: > > > > ncbi_eutil -search db=nucleotide term=AC207960 -fetch > rettype=qual > > > AC207960.qual > > > > It depends on the version of NCBI_PowerScripting.pm , such as is > > included in > > > > Malcolm Cook > > Database Applications Manager - Bioinformatics Stowers > Institute for > > Medical Research - Kansas City, Missouri > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris > >> Fields > >> Sent: Friday, January 11, 2008 11:10 AM > >> To: Phillip San Miguel > >> Cc: bioperl-l > >> Subject: Re: [Bioperl-l] Recommended way to download qual > files from > >> Genbank? > >> > >> I don't think this is possible with the current setup for > >> Bio::DB::GenBank (which the script uses). We'll have to > investigate > >> whether it is possible to retrieve this data via NCBI's > eutils; if so > >> we can try adding it in.
If you want you can submit this as an > >> enhancement request via bugzilla for tracking: > >> > >> http://bugzilla.open-bio.org/ > >> > >> chris > >> > >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > >> > >> > >>> No problem getting sequence from genbank via a myriad of > methods. > >>> But as the volume of non-finished sequence in genbank > increases the > >>> importance of also obtaining quality values for a given sequence > >>> increases. Some records include quality values. > >>> > >>> I typically use bp_fetch.pl to grab a sequence from genbank: > >>> > >>> bp_fetch.pl -fmt fasta net::genbank:AC207960 > >>> > >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't > >>> designed to pull down quals evidently: > >>> > >>> bp_fetch.pl -fmt qual net::genbank:AC207960 > >>> > >>> gives: > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > >>> object to write_seq() as a parameter named "source" > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > >>> 5.8.8/Bio/Root/Root.pm:359 > >>> STACK: Bio::SeqIO::qual::write_seq > >>> > >> /usr/local/perl_5.8/lib/site_perl/ > >> > >>> 5.8.8/Bio/SeqIO/qual.pm:205 > >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 > >>> ----------------------------------------------------------- > >>> > >>> (running under bioperl 1.5.2) > >>> > >>> The quality values for this accession are in genbank as these URLs > >>> demonstrate: > >>> > >>> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=15493746 > >> 0 > >> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 > >> 5 > >> > >>> 4937460&dopt=fasta > >>> > >>> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 > >> 5 > >> > >>> 4937460&dopt=qual > >>> > >>> What is the best way to pull down these qual values? 
They aren't > >>> present in "GenBank(Full)" format. They are present in an ASN.1 > >>> format. > >>> > >>> Advice would be appreciated. > >>> > >>> -- > >>> Phillip > >>> Purdue Genomics Core Facility > >>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> Christopher Fields > >> Postdoctoral Researcher > >> Lab of Dr. Robert Switzer > >> Dept of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > > > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncbi_eutil
Type: application/octet-stream
Size: 1854 bytes
Desc: ncbi_eutil
URL:

From cain.cshl at gmail.com Mon Jan 14 13:46:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 14 Jan 2008 13:46:39 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
Message-ID: <1200336399.6056.12.camel@frissell>

Hi all,

Last month, I got a bug report on the GBrowse bug tracker:

http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291

about a problem with dumping invalid GenBank files. GBrowse uses Bio::SeqIO::genbank to create these dumps.

In his bug report, he claims that feature names over 15 characters long are invalid, and provided an example GenBank file where a feature is named 'BAC_cloned_genomic_insert', which is over 15 characters. What I want to know is this: is this truly a restriction on the GenBank format, or is it a software problem with some other package? Do we need to fix genbank.pm? I'm perfectly willing to do it; I'm just hesitant to believe this is really a bug.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From lstein at cshl.edu Mon Jan 14 13:53:15 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 13:53:15 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <1200336399.6056.12.camel@frissell>
References: <1200336399.6056.12.camel@frissell>
Message-ID: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>

Hi Scott,

He is correct about the limitation, but we deliberately relaxed it because we were running into situations where we lost information during roundtripping from other formats into genbank.

Lincoln

On Jan 14, 2008 1:46 PM, Scott Cain wrote: > Hi all, > > Last month, I got a bug report on the GBrowse bug tracker: > > > http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 > > about a problem with dumping invalid GenBank files. GBrowse uses > Bio::SeqIO::genbank to create these dumps. > > In his bug report, he claims that feature names over 15 characters long > are invalid, and provided an example GenBank file where a feature is > named 'BAC_cloned_genomic_insert', which is over 15 characters. What I > want to know is this: is this truly a restriction on the GenBank format, > or is it a software problem with some other package? Do we need to fix > genbank.pm? I'm perfectly willing to do it; I'm just hesitant to > believe this is really a bug. > > Thanks, > Scott > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Mon Jan 14 14:35:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 Jan 2008 13:35:46 -0600
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell> <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
Message-ID:

It looks like the keys in the feature table run into the location string without an intervening space, which would probably cause havoc with roundtripping from this output. A few examples:

BAC_cloned_genomic_insert<1..>1000
combined_genscanjoin(<1..347,400..498,794..>1000)
splign_na_dbEST_ncbi<1..>1000

I would think at least a space in between the location and the key would be required for round-tripping out of genbank format.

chris

On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote: > Hi Scott, > > He is correct about the limitation, but we deliberately relaxed it > because > we were running into situations where we lost information during > roundtripping from other formats into genbank. > > Lincoln > > On Jan 14, 2008 1:46 PM, Scott Cain wrote: > >> Hi all, >> >> Last month, I got a bug report on the GBrowse bug tracker: >> >> >> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 >> >> about a problem with dumping invalid GenBank files. GBrowse uses >> Bio::SeqIO::genbank to create these dumps. >> >> In his bug report, he claims that feature names over 15 characters >> long >> are invalid, and provided an example GenBank file where a feature is >> named 'BAC_cloned_genomic_insert', which is over 15 characters.
>> What I >> want to know is this: is this truly a restriction on the GenBank >> format, >> or is it a software problem with some other package? Do we need to >> fix >> genbank.pm? I'm perfectly willing to do it; I'm just hesitant to >> believe this is really a bug. >> >> Thanks, >> Scott >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Mon Jan 14 14:46:20 2008 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Jan 2008 14:46:20 -0500 Subject: [Bioperl-l] GenBank format and feature names > 15 char In-Reply-To: References: <1200336399.6056.12.camel@frissell> <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> Message-ID: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com> That's a new bug. The version I worked on inserted a space after the name. Lincoln On Jan 14, 2008 2:35 PM, Chris Fields wrote: > It looks like the keys in the feature table run into the location > string w/o intervening space, which would probably cause havoc with > roundtripping from this output. 
A few examples: > > BAC_cloned_genomic_insert<1..>1000 > combined_genscanjoin(<1..347,400..498,794..>1000) > splign_na_dbEST_ncbi<1..>1000 > > I would think at least a space in between the location and the key > would be required for round-tripping out of genbank format. > > chris > > On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote: > > > Hi Scott, > > > > He is correct about the limitation, but we deliberately relaxed it > > because > > we were running into situations where we lost information during > > roundtripping from other formats into genbank. > > > > Lincoln > > > > On Jan 14, 2008 1:46 PM, Scott Cain wrote: > > > >> Hi all, > >> > >> Last month, I got a bug report on the GBrowse bug tracker: > >> > >> > >> > http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 > >> > >> about a problem with dumping invalid GenBank files. GBrowse uses > >> Bio::SeqIO::genbank to create these dumps. > >> > >> In his bug report, he claims that feature names over 15 characters > >> long > >> are invalid, and provided and example GenBank file where a feature is > >> named 'BAC_cloned_genomic_insert', which is over 15 characters. > >> What I > >> want to know is this: is this truly a restriction on the GenBank > >> format, > >> or is it a software problem with some other package? Do we need to > >> fix > >> genbank.pm? I'm perfectly willing to do it; I'm just hesitant to > >> believe this is really a bug. > >> > >> Thanks, > >> Scott > >> > >> -- > >> > ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. > cain.cshl at gmail.com > >> GMOD Coordinator (http://www.gmod.org/) > >> 216-392-3087 > >> Cold Spring Harbor Laboratory > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > Lincoln D. 
Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > >

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From diogoat at gmail.com Tue Jan 15 08:40:10 2008
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 Jan 2008 11:40:10 -0200
Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
Message-ID: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>

Hello,

I want to extract the protein_id and transcript from the CDS features of a genome in GenBank format, but I have one problem: when a feature in the file doesn't have the protein_id or the transcript, the script gives me this error:

------------- EXCEPTION -------------
MSG: asking for tag value that does not exist protein_id
STACK Bio::SeqFeature::Generic::get_tag_values /usr/share/perl5/Bio/SeqFeature/Generic.pm:504
STACK toplevel parser_cds.pl:25
--------------------------------------

Below I paste the script:

##############################################
use Bio::SeqIO;
use warnings;

my $infile = $ARGV[0];
my $outfile = "$infile.out";
open (OUT, ">>$outfile");

my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                             '-format' => 'Genbank');

while (my $inseq = $seq_in->next_seq) {

  for my $feat_object ($inseq->get_SeqFeatures){
    if ($feat_object->primary_tag eq "CDS"){
      print OUT $feat_object->get_tag_values('protein_id')," ";
      print OUT $feat_object->get_tag_values('translation'),"\n";
    }
  }
}
###############################################

Can somebody help me?

Thanks,
Diogo Tschoeke

From Marc.Logghe at ablynx.com Tue Jan 15 09:44:54 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Tue, 15 Jan 2008 15:44:54 +0100
Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
In-Reply-To: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>
Message-ID: <03C512635899144083CADB0EE2220189013E2BEC@alpaca.lan.ablynx.com>

Hi,

Try testing for existence first using the has_tag() method. It is provided by Bio::AnnotatableI.

print OUT $feat_object->get_tag_values('protein_id')," " if ($feat_object->has_tag('protein_id'));

HTH,
Marc

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Diogo Tschoeke > Sent: Tuesday, 15 January 2008 14:40 > To: Bioperl-list > Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS > > Hello, > > I want to extract protein_id and transcript from a CDS tag, from genome in > GenBank format but I have one problem, when the sequence in the file doesn't > have the protein_id or the transcript the script gives me this error: > > ------------- EXCEPTION ------------- > MSG: asking for tag value that does not exist protein_id > STACK Bio::SeqFeature::Generic::get_tag_values > /usr/share/perl5/Bio/SeqFeature/Generic.pm:504 > STACK toplevel parser_cds.pl:25 > -------------------------------------- > > Below I paste the script > > ############################################## > use Bio::SeqIO; > use warnings; > > my $infile = $ARGV[0]; > my $outfile = "$infile.out"; > open (OUT, ">>$outfile"); > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => 'Genbank'); > > while (my $inseq = $seq_in->next_seq) { > > for my $feat_object ($inseq->get_SeqFeatures){ > if ($feat_object->primary_tag eq "CDS"){ >
print OUT $feat_object->get_tag_values('protein_id')," "; > print OUT $feat_object->get_tag_values('translation'),"\n"; > } > } > } > ############################################### > > Can somebody help me? > > Thanks > > Diogo Tschoeke > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cuiw at ncbi.nlm.nih.gov Tue Jan 15 11:50:53 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 15 Jan 2008 11:50:53 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
References: <478797CE.9050202@purdue.edu><14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu><4787C479.8070600@purdue.edu>
Message-ID: <18C407FD4FFB424292D769FBD68C1987048E95CC@NIHCESMLBX8.nih.gov>

There is an alternative way if you can download and compile the NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/). Simply call the binary like this:

id1_fetch -fmt quality -gi 13508865

Wenwu Cui

________________________________ From: Cook, Malcolm [mailto:MEC at stowers-institute.org] Sent: Fri 1/11/2008 2:40 PM To: Phillip San Miguel Cc: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank? Phillip: Of course - mea culpa - here's the full monty.... Indeed NCBI's eutils can do this: > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual which uses my script (attached) to wrap NCBI's eutils. It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip by NCBI in their "Jul 24-27, 2007" course found at http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the very beginning so that trace messages are not printed on STDOUT, such as this echoed header: Retrieving 1 records from nucleotide... ... and footer: Received records 1 - 1. Wrote data to -.
(otherwise they are interspersed with downloaded qual files) It also depends on a recent version of Getopt::Long. Hope it helps. Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: Phillip San Miguel [mailto:pmiguel at purdue.edu] > Sent: Friday, January 11, 2008 1:33 PM > To: Cook, Malcolm > Cc: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] Recommended way to download qual > files from Genbank? > > Hi Malcolm, > Looks like your email was (inadvertently?) redacted in > some way. (No attachment and last sentence truncated.) Would > it be possible to get a complete version so I can be sure I'm > following you? > Thanks, > Phillip > > Cook, Malcolm wrote: > > Indeed eutil is capable of this > > > > The following use of my ncbi_eutil (attached) script yields what you > > want: > > > > ncbi_eutil -search db=nucleotide term=AC207960 -fetch > rettype=qual > > > AC207960.qual > > > > It depends on the version of NCBI_PowerScripting.pm , such as is > > included in > > > > Malcolm Cook > > Database Applications Manager - Bioinformatics Stowers > Institute for > > Medical Research - Kansas City, Missouri > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris > >> Fields > >> Sent: Friday, January 11, 2008 11:10 AM > >> To: Phillip San Miguel > >> Cc: bioperl-l > >> Subject: Re: [Bioperl-l] Recommended way to download qual > files from > >> Genbank? > >> > >> I don't think this is possible with the current setup for > >> Bio::DB::GenBank (which the script uses). We'll have to > investigate > >> whether it is possible to retrieve this data via NCBI's > eutils; if so > >> we can try adding it in. 
If you want you can submit this as an > >> enhancement request via bugzilla for tracking: > >> > >> http://bugzilla.open-bio.org/ > >> > >> chris > >> > >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > >> > >> > >>> No problem getting sequence from genbank via a myriad of > methods. > >>> But as the volume of non-finished sequence in genbank > increases the > >>> importance of also obtaining quality values for a given sequence > >>> increases. Some records include quality values. > >>> > >>> I typically use bp_fetch.pl to grab a sequence from genbank: > >>> > >>> bp_fetch.pl -fmt fasta net::genbank:AC207960 > >>> > >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't > >>> designed to pull down quals evidently: > >>> > >>> bp_fetch.pl -fmt qual net::genbank:AC207960 > >>> > >>> gives: > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > >>> object to write_seq() as a parameter named "source" > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > >>> 5.8.8/Bio/Root/Root.pm:359 > >>> STACK: Bio::SeqIO::qual::write_seq > >>> > >> /usr/local/perl_5.8/lib/site_perl/ > >> > >>> 5.8.8/Bio/SeqIO/qual.pm:205 > >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 > >>> ----------------------------------------------------------- > >>> > >>> (running under bioperl 1.5.2) > >>> > >>> The quality values for this accession are in genbank as these URLs > >>> demonstrate: > >>> > >>> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=15493746 > >> 0 > >> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 > >> 5 > >> > >>> 4937460&dopt=fasta > >>> > >>> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 > >> 5 > >> > >>> 4937460&dopt=qual > >>> > >>> What is the best way to pull down these qual values? 
They aren't > >>> present in "GenBank(Full)" format. They are present in an ASN.1 > >>> format. > >>> > >>> Advice would be appreciated. > >>> > >>> -- > >>> Phillip > >>> Purdue Genomics Core Facility > >>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> Christopher Fields > >> Postdoctoral Researcher > >> Lab of Dr. Robert Switzer > >> Dept of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > > > > From singhal at berkeley.edu Tue Jan 15 17:50:12 2008 From: singhal at berkeley.edu (Sonal Singhal) Date: Tue, 15 Jan 2008 14:50:12 -0800 Subject: [Bioperl-l] redundant sequences Message-ID: Hi all, I am mining a few genomes to find all the genes in a gene family, and of course multiple BLAST searches of different paralogs are returning a lot of redundant hits. I have searched the BioPerl documentation, and I cannot find an easy way to cluster and then purge redundant sequences. Any ideas? 
Cheers, sonal From MEC at stowers-institute.org Tue Jan 15 18:21:00 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 15 Jan 2008 17:21:00 -0600 Subject: [Bioperl-l] redundant sequences In-Reply-To: References: Message-ID: Cd-hit: http://bioinformatics.burnham.org/cd-hi/ Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Sonal Singhal > Sent: Tuesday, January 15, 2008 4:50 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] redundant sequences > > Hi all, > > I am mining a few genomes to find all the genes in a gene > family, and of course multiple BLAST searches of different > paralogs are returning > a lot of redundant hits. I have searched the BioPerl documentation, > and I cannot find an easy way to cluster and then purge > redundant sequences. Any ideas? > > Cheers, > sonal > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cain.cshl at gmail.com Tue Jan 15 21:24:50 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 15 Jan 2008 21:24:50 -0500 Subject: [Bioperl-l] GenBank format and feature names > 15 char In-Reply-To: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com> References: <1200336399.6056.12.camel@frissell> <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com> Message-ID: <1200450290.7276.3.camel@frissell> Hi Chris and Lincoln, I've attached my suggested patch. So, can I use svn to check it in? It only adds a space after the feature type name; I suspect that will be enough to fix the file format for most uses. Scott On Mon, 2008-01-14 at 14:46 -0500, Lincoln Stein wrote: > That's a new bug. 
The version I worked on inserted a space after the name. > > Lincoln > > On Jan 14, 2008 2:35 PM, Chris Fields wrote: > > > It looks like the keys in the feature table run into the location > > string w/o intervening space, which would probably cause havoc with > > roundtripping from this output. A few examples: > > > > BAC_cloned_genomic_insert<1..>1000 > > combined_genscanjoin(<1..347,400..498,794..>1000) > > splign_na_dbEST_ncbi<1..>1000 > > > > I would think at least a space in between the location and the key > > would be required for round-tripping out of genbank format. > > > > chris > > > > On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote: > > > > > Hi Scott, > > > > > > He is correct about the limitation, but we deliberately relaxed it > > > because > > > we were running into situations where we lost information during > > > roundtripping from other formats into genbank. > > > > > > Lincoln > > > > > > On Jan 14, 2008 1:46 PM, Scott Cain wrote: > > > > > >> Hi all, > > >> > > >> Last month, I got a bug report on the GBrowse bug tracker: > > >> > > >> > > >> > > http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 > > >> > > >> about a problem with dumping invalid GenBank files. GBrowse uses > > >> Bio::SeqIO::genbank to create these dumps. > > >> > > >> In his bug report, he claims that feature names over 15 characters > > >> long > > >> are invalid, and provided and example GenBank file where a feature is > > >> named 'BAC_cloned_genomic_insert', which is over 15 characters. > > >> What I > > >> want to know is this: is this truly a restriction on the GenBank > > >> format, > > >> or is it a software problem with some other package? Do we need to > > >> fix > > >> genbank.pm? I'm perfectly willing to do it; I'm just hesitant to > > >> believe this is really a bug. 
> > >> > > >> Thanks, > > >> Scott > > >> > > >> -- > > >> > > ------------------------------------------------------------------------ > > >> Scott Cain, Ph. D. > > cain.cshl at gmail.com > > >> GMOD Coordinator (http://www.gmod.org/) > > >> 216-392-3087 > > >> Cold Spring Harbor Laboratory > > >> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > > > > > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: genbank.pm.patch Type: text/x-patch Size: 1110 bytes Desc: not available URL: From cjfields at uiuc.edu Tue Jan 15 22:15:51 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 15 Jan 2008 21:15:51 -0600 Subject: [Bioperl-l] Subversion migration complete Message-ID: On behalf of the BioPerl core developers, I am proud to announce that the BioPerl SVN migration has been completed. 
We would like to thank everyone who helped, in particular George Hartzell and Chris Dagdigian, both of whom played instrumental roles in the CVS->SVN conversion and anonymous SVN setup for BioPerl. Anonymous SVN checkouts for bioperl-live are now possible using: svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live Developers can obtain a checkout from: svn co svn+ssh://USER@dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live/trunk bioperl-live Browsable repository: http://code.open-bio.org/svnweb/index.cgi/bioperl/ Basic instructions: http://www.bioperl.org/wiki/Using_Subversion We are still in the midst of implementing a few extra details related to SVN migration; the status on these can be viewed here: http://www.bioperl.org/wiki/CVS_to_SVN_Migration Enjoy! chris From bug-bioperl at rt.cpan.org Wed Jan 16 22:35:30 2008 From: bug-bioperl at rt.cpan.org (Chris Fields via RT) Date: Wed, 16 Jan 2008 22:35:30 -0500 Subject: [Bioperl-l] [rt.cpan.org #29533] Bio::SeqIO::interpro depends on XML::DOM::XPath In-Reply-To: References: Message-ID: Queue: bioperl Ticket On Fri Sep 21 10:28:52 2007, support at helpdesk.open-bio.org wrote: > Hi Mike, > > The proper place to submit this fix is the bioperl-l at lists.open-bio.org > mailing list or the OBF Bugzilla queue at > http://bugzilla.open-bio.org/; this RT system is mainly for sysadmin > activities rather than for tracking code changes. Would you be so kind > as to re-send your request to one of the places above? Thanks for the heads > up! :) > > Regards, > Mauricio. This has been fixed. I'll get the CPAN maintainer to close this out. From vipingjo at gmail.com Thu Jan 17 03:48:36 2008 From: vipingjo at gmail.com (viping) Date: Thu, 17 Jan 2008 16:48:36 +0800 Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree" Message-ID: <200801171648332965577@gmail.com> Hi Everyone, I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + Windows XP SP2. 
When running the example code (attached below as t.pl) from Bio\Tree\Compatible.pm, I got this error: Can't locate object method "is_compatible" via package "Bio::Tree::Tree" I replaced "$t1->is_compatible($t2)" with "is_compatible Bio::Tree::Compatible ($t1,$t2)", and the error changed: Can't locate object method "get_nodes" via package "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252, line 1. I modified Compatible.pm, changing the call to "get_nodes" like this: "get_nodes Bio::Tree::Tree($self);", and a new error arose: Can't use string ("Bio::Tree::Tree") as a HASH ref while "strict refs" in use at i:/Perl/site/lib/Bio\Tree\Tree.pm line 198, line 1. I gave up. Any help will be deeply appreciated. # this is the example script in Bio::Tree::Compatible: t.pl use Bio::Tree::Compatible; use Bio::TreeIO; my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'input.tre'); my $t1 = $input->next_tree; my $t2 = $input->next_tree; my ($incompat, $ilabels, $inodes) = $t1->is_compatible($t2); if ($incompat) { my %cluster1 = %{ $t1->cluster_representation }; my %cluster2 = %{ $t2->cluster_representation }; print "incompatible trees\n"; if (scalar(@$ilabels)) { foreach my $label (@$ilabels) { my $node1 = $t1->find_node(-id => $label); my $node2 = $t2->find_node(-id => $label); my @c1 = sort @{ $cluster1{$node1} }; my @c2 = sort @{ $cluster2{$node2} }; print "label $label"; print " cluster"; map { print " ",$_ } @c1; print " cluster"; map { print " ",$_ } @c2; print "\n"; } } if (scalar(@$inodes)) { while (@$inodes) { my $node1 = shift @$inodes; my $node2 = shift @$inodes; my @c1 = sort @{ $cluster1{$node1} }; my @c2 = sort @{ $cluster2{$node2} }; print "cluster"; map { print " ",$_ } @c1; print " properly intersects cluster"; map { print " ",$_ } @c2; print "\n"; } } } else { print "compatible trees\n"; } __END__; # this is the file 'input.tre': (((A,B)C,D),(E,F,G)); ((A,B)H,E,(J,(K)G)I); # these are the full messages I got when running it like this: "perl.exe -w 
t.pl" Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96. Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145. Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162. Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196. Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211. Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235. Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257. Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278. Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314. Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100. Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152. Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190. Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252. Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300. Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334. Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375. Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399. Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420. Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449. Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491. Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505. Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526. Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552. Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577. Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597. 
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617. Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637. Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653. Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669. Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685. Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690. Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717. Can't locate object method "is_compatible" via package "Bio::Tree::Tree" at Z:\bp\t.pl line 8, line 2. From bix at sendu.me.uk Thu Jan 17 06:18:56 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 17 Jan 2008 11:18:56 +0000 Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree" In-Reply-To: <200801171648332965577@gmail.com> References: <200801171648332965577@gmail.com> Message-ID: <478F39A0.2030508@sendu.me.uk> viping wrote: > Hi Everyone?? > > I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + > Windows XP SP2. When running example codes(attched below as t.pl) > within Bio\Tree\Compatible.pm , I got this error: > > Can't locate object method "is_compatible" via package > "Bio::Tree::Tree" > > I replaced "$t1->is_compatible($t2)" with "is_compatible > Bio::Tree::Compatible ($t1,$t2)", Yup, you had the right idea; unfortunately the synopsis code for Bio::Tree::Compatible is wrong. I've now fixed it in svn. > the error changed: Can't locate object method "get_nodes" via package > "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm > line 252, line 1. I didn't get quite that error; instead I had an issue with TreeIO: for whatever reason it is only returning one tree from your input file (ie. $t2 is undefined). I therefore got "Can't call method "get_nodes" on an undefined value [...]" Can someone look into/confirm that? 
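[Editor's note: the two error messages in this thread follow from how Perl dispatches method calls. Below is a minimal sketch of that behaviour, assuming nothing about the real BioPerl internals: Demo::Tree and Demo::Compatible are hypothetical stand-ins for Bio::Tree::Tree and Bio::Tree::Compatible, and same_labels is an invented function used only to show the three calling conventions.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::Tree::Tree and Bio::Tree::Compatible.
# Only the calling conventions are modelled, not the real algorithms.
package Demo::Tree;

sub new {
    my ($class, @labels) = @_;
    return bless { labels => [@labels] }, $class;
}

sub get_nodes {
    my ($self) = @_;
    return @{ $self->{labels} };
}

package Demo::Compatible;

# A plain function (not a method): it expects two tree objects as arguments.
sub same_labels {
    my ($t1, $t2) = @_;
    my @a = $t1->get_nodes;
    my @b = $t2->get_nodes;
    return join(',', sort @a) eq join(',', sort @b) ? 1 : 0;
}

package main;

my $t1 = Demo::Tree->new(qw(A B C));
my $t2 = Demo::Tree->new(qw(C B A));

# 1) Method call on the tree: Perl looks for same_labels in Demo::Tree
#    (and its parent classes) only, so the lookup fails.
my $method_error = eval { $t1->same_labels($t2); 1 } ? '' : $@;

# 2) Class-method call, which is what the indirect-object form
#    "same_labels Demo::Compatible ($t1, $t2)" amounts to: the string
#    'Demo::Compatible' arrives as the first argument, so get_nodes is
#    then looked up on that string instead of on a tree.
my $class_error = eval { Demo::Compatible->same_labels($t1, $t2); 1 } ? '' : $@;

# 3) Fully qualified function call: both trees arrive as the arguments.
my $result = Demo::Compatible::same_labels($t1, $t2);

print "method call failed: $method_error";
print "class call failed:  $class_error";
print "function call returned: $result\n";
```

[Only the third form hands both trees to the function, which matches the package-qualified function calls in the corrected synopsis quoted later in this thread.]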
From bix at sendu.me.uk Thu Jan 17 06:35:57 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 17 Jan 2008 11:35:57 +0000 Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree" In-Reply-To: <478F39A0.2030508@sendu.me.uk> References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk> Message-ID: <478F3D9D.6050306@sendu.me.uk> Sendu Bala wrote: >> the error changed: Can't locate object method "get_nodes" via >> package "Bio::Tree::Compatible" at >> i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252, line 1. > > I didn't get quite that error; instead I had an issue with TreeIO: > for whatever reason it is only returning one tree from your input > file (ie. $t2 is undefined). > > I therefore got "Can't call method "get_nodes" on an undefined value > [...]" > > Can someone look into/confirm that? ... Yeah, I think I'm losing my mind. The code below is 'ok' using the commented out -fh input for TreeIO, but is 'not ok' using the -file input, where the specified file contains the exact same data as __DATA__. Huh? #!/usr/bin/perl -w use strict; use warnings; use Bio::Tree::Compatible; use Bio::TreeIO; my $input = new Bio::TreeIO('-format' => 'newick', #-fh => \*DATA, -file => 'input.tre' ); my $t1 = $input->next_tree; my $t2 = $input->next_tree; if ($t2) { print "ok\n"; } else { print "not ok\n"; } __DATA__ (((A,B)C,D),(E,F,G)); ((A,B)H,E,(J,(K)G)I); From vipingjo at gmail.com Thu Jan 17 08:23:14 2008 From: vipingjo at gmail.com (viping) Date: Thu, 17 Jan 2008 21:23:14 +0800 Subject: [Bioperl-l] Can't locate object method "is_compatible" via package"Bio::Tree::Tree" References: <200801171648332965577@gmail.com>, <478F39A0.2030508@sendu.me.uk> Message-ID: <200801172123112184046@gmail.com> I got latest code modified by Sendu Bala vi SVN. It works well while "input.tre" and "t.pl" are in the same directory. Thank you, Sendu Bala. This is output: Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96. 
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145. Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162. Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196. Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211. Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235. Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257. Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278. Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314. Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100. Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152. Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190. Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252. Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300. Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334. Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375. Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399. Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420. Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449. Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491. Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505. Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526. Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552. Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577. Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597. Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617. 
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637. Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653. Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669. Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685. Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690. Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717. incompatible trees label G cluster G cluster G K cluster A B C properly intersects cluster A B H cluster A B C properly intersects cluster A B E G H I J K cluster A B C D properly intersects cluster A B H cluster A B C D properly intersects cluster A B E G H I J K cluster E F G properly intersects cluster G K cluster E F G properly intersects cluster G I J K cluster E F G properly intersects cluster A B E G H I J K cluster A B C D E F G properly intersects cluster A B H cluster A B C D E F G properly intersects cluster G K cluster A B C D E F G properly intersects cluster G I J K cluster A B C D E F G properly intersects cluster A B E G H I J K #this is latest code: use Bio::Tree::Compatible; use Bio::TreeIO; my $input = Bio::TreeIO->new('-format' => 'newick', '-file' => 'input.tre'); my $t1 = $input->next_tree; my $t2 = $input->next_tree; my ($incompat, $ilabels, $inodes) = Bio::Tree::Compatible::is_compatible($t1,$t2); if ($incompat) { my %cluster1 = %{ Bio::Tree::Compatible::cluster_representation($t1) }; my %cluster2 = %{ Bio::Tree::Compatible::cluster_representation($t2) }; print "incompatible trees\n"; if (scalar(@$ilabels)) { foreach my $label (@$ilabels) { my $node1 = $t1->find_node(-id => $label); my $node2 = $t2->find_node(-id => $label); my @c1 = sort @{ $cluster1{$node1} }; my @c2 = sort @{ $cluster2{$node2} }; print "label $label"; print " cluster"; map { print " ",$_ } @c1; print " cluster"; map { print " ",$_ } @c2; print "\n"; } } if (scalar(@$inodes)) { while (@$inodes) { 
my $node1 = shift @$inodes; my $node2 = shift @$inodes; my @c1 = sort @{ $cluster1{$node1} }; my @c2 = sort @{ $cluster2{$node2} }; print "cluster"; map { print " ",$_ } @c1; print " properly intersects cluster"; map { print " ",$_ } @c2; print "\n"; } } } else { print "compatible trees\n"; } ------------------ viping 2008-01-17 ------------------------------------------------------------- From: Sendu Bala Date: 2008-01-17 19:19:30 To: viping Cc: bioperl-l Subject: Re: [Bioperl-l] Can't locate object method "is_compatible" via package"Bio::Tree::Tree" viping wrote: > Hi Everyone?? > > I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + > Windows XP SP2. When running example codes(attched below as t.pl) > within Bio\Tree\Compatible.pm , I got this error: > > Can't locate object method "is_compatible" via package > "Bio::Tree::Tree" > > I replaced "$t1->is_compatible($t2)" with "is_compatible > Bio::Tree::Compatible ($t1,$t2)", Yup, you had the right idea; unfortunately the synopsis code for Bio::Tree::Compatible is wrong. I've now fixed it in svn. > the error changed: Can't locate object method "get_nodes" via package > "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm > line 252, line 1. I didn't get quite that error; instead I had an issue with TreeIO: for whatever reason it is only returning one tree from your input file (ie. $t2 is undefined). I therefore got "Can't call method "get_nodes" on an undefined value [...]" Can someone look into/confirm that? From cjfields at uiuc.edu Thu Jan 17 08:25:41 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 17 Jan 2008 07:25:41 -0600 Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree" In-Reply-To: <478F39A0.2030508@sendu.me.uk> References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk> Message-ID: <7BF3650B-F1D4-4F21-9C59-3AC13CA35945@uiuc.edu> Probably need to file this as a bug. 
There is a similar issue with Bio::TreeIO::nexus, but it probably isn't related unless it is using the same parsing logic: http://bugzilla.open-bio.org/show_bug.cgi?id=2356 chris On Jan 17, 2008, at 5:18 AM, Sendu Bala wrote: > viping wrote: >> Hi Everyone? >> >> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + >> Windows XP SP2. When running example codes(attched below as t.pl) >> within Bio\Tree\Compatible.pm , I got this error: >> >> Can't locate object method "is_compatible" via package >> "Bio::Tree::Tree" >> >> I replaced "$t1->is_compatible($t2)" with "is_compatible >> Bio::Tree::Compatible ($t1,$t2)", > > Yup, you had the right idea; unfortunately the synopsis code for > Bio::Tree::Compatible is wrong. > I've now fixed it in svn. > > >> the error changed: Can't locate object method "get_nodes" via package >> "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm >> line 252, line 1. > > I didn't get quite that error; instead I had an issue with TreeIO: for > whatever reason it is only returning one tree from your input file > (ie. > $t2 is undefined). > > I therefore got "Can't call method "get_nodes" on an undefined value > [...]" > > Can someone look into/confirm that? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From N.Haigh at sheffield.ac.uk Fri Jan 18 07:47:48 2008 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 18 Jan 2008 12:47:48 +0000 Subject: [Bioperl-l] Parsing Primer3 output Message-ID: <1200660468.47909ff498dd0@webmail.shef.ac.uk> I might be overlooking something, but is it possible to parse primer3 output? 
Cheers Nath From cjfields at uiuc.edu Fri Jan 18 08:27:47 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 18 Jan 2008 07:27:47 -0600 Subject: [Bioperl-l] Parsing Primer3 output In-Reply-To: <1200660468.47909ff498dd0@webmail.shef.ac.uk> References: <1200660468.47909ff498dd0@webmail.shef.ac.uk> Message-ID: <8C8BF818-FC04-42E3-9210-3FE23F92EA8F@uiuc.edu> Bio::Tools::Primer3. chris On Jan 18, 2008, at 6:47 AM, Nathan S. Haigh wrote: > I might be overlooking something, but is it possible to parse > primer3 output? > > Cheers > Nath > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hangsyin at gmail.com Sat Jan 19 13:25:59 2008 From: hangsyin at gmail.com (Hang) Date: Sat, 19 Jan 2008 10:25:59 -0800 (PST) Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl Message-ID: <14971922.post@talk.nabble.com> Hi, everyone, I met this problem when I was running this script to extract features overlaps with 4:20,000..25,000. It always responds like "Can't call method "features" on an undefined value at BIO::DB::GFF.pl line XX". ============================================================== use Bio::DB::GFF; use Bio::Tools::GFF; my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql', -dsn => 'dbi:mysql:dmel_gff:localhost', -user => 'XXXX', -pass => 'XXXX') || die "database open failed"; my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000); my @features = $segment->features(-types => ['gene', 'exon', 'intron', 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features"; print(scalar(@features)."\n"); ================================================================ I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I loaded dmel_5.4.gff into mysql database by bulk_load_gff.pl without any error. Other methods failed also. Any help will be deeply appreciated! 
Best, Jon -- View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14971922.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Sat Jan 19 22:36:44 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Sat, 19 Jan 2008 22:36:44 -0500 Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl In-Reply-To: <14971922.post@talk.nabble.com> References: <14971922.post@talk.nabble.com> Message-ID: <1200800204.6069.5.camel@frissell> Hi Jon, I think it's funny that you have "or die" on the database opening line, "or die" on the @features line, but you didn't put one on the $segment line. Try adding "or die $!" to the $segment line to see what it says; also add a 'print $segment' after you create it and before you try to get the features from it. Clearly, the problem is that $segment is not defined (that is, nothing is in it, not that the wrong thing is in it). The next trick is to find out why. My first guess, without looking at the data set, is that the arm is not really named '4'. Scott On Sat, 2008-01-19 at 10:25 -0800, Hang wrote: > Hi, everyone, > > I met this problem when I was running this script to extract features > overlaps with 4:20,000..25,000. It always responds like "Can't call method > "features" on an undefined value at BIO::DB::GFF.pl line XX". 
> ============================================================== > use Bio::DB::GFF; > use Bio::Tools::GFF; > my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql', > -dsn => > 'dbi:mysql:dmel_gff:localhost', > -user => 'XXXX', > -pass => 'XXXX') || die "database > open failed"; > > my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000); > my @features = $segment->features(-types => ['gene', 'exon', 'intron', > 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features"; > print(scalar(@features)."\n"); > > ================================================================ > I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I loaded > dmel_5.4.gff into mysql database by bulk_load_gff.pl without any error. > Other methods failed also. > > Any help will be deeply appreciated! > > Best, > Jon > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From hangsyin at gmail.com Sat Jan 19 22:49:59 2008 From: hangsyin at gmail.com (Hang) Date: Sat, 19 Jan 2008 19:49:59 -0800 (PST) Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl In-Reply-To: <1200800204.6069.5.camel@frissell> References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell> Message-ID: <14978241.post@talk.nabble.com> Hi, Scott, After adding die $!, I know something is wrong at line: "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);" my gff file is like this: ##gff-version 3 ##sequence-region 4 1 1351857 4 FlyBase transposable_element 2 611 . + . ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-; 4 repeatmasker_dummy match 2 347 . + . ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker; 4 repeatmasker_dummy match_part 2 347 2367 + . 
ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
...
...
I am really confused. Any further suggestions? Thank you!

Jon
--
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14978241.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cain.cshl at gmail.com Sat Jan 19 23:08:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 23:08:04 -0500
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl
In-Reply-To: <14978241.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell> <14978241.post@talk.nabble.com>
Message-ID: <1200802084.6069.11.camel@frissell>

Hi Jon,

Well, seeing the error message would be helpful, but without it, here are
a few things you can try:

* Remove the "sequence-region" line from the GFF file, add a line
like this:

4 FlyBase chromosome_arm 1 1351857 . . . ID=4;Name=4

and then reload the database.

* Or, you may want to consider using Bio::DB::SeqFeature::Store, since
Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
is, with three levels of features, like gene, mRNA and CDS).

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hangsyin at gmail.com Sun Jan 20 10:08:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sun, 20 Jan 2008 07:08:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200802084.6069.11.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell> <14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
Message-ID: <14982665.post@talk.nabble.com>

Hi, Scott,

I tried changing the sequence-region line to "4 FlyBase chromosome_arm 1
1351857 . . . ID=4;Name=4", but it doesn't work. "$!" didn't say anything but
"died at line 12".

So, I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
load dmel-all-r5.4.gff (from FlyBase) into a test database:
=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   => 1 );
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                         -verbose => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
I got a bunch of errors like this:
"DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
Line 1316 in mysql.pm looks like this: $sth->execute($name) or die $sth->errstr;
I checked the test database after the failed load. Only one table was
created, which is called 'meta'. I also tried 'grant all on test to
XXX at localhost' and used that -user and -pass to load the GFF; it didn't work
either.
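
A note on the "died at line 12" message: in Perl, "$!" holds the last
operating-system error, so it is usually empty when a method simply returns
undef, and die then has nothing useful to print. The sketch below shows a
more descriptive check. It is only an illustration: find_segment() and
segment_or_die() are hypothetical stand-ins for $db->segment(...), not part
of BioPerl, and the name 'chr4' is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a lookup that returns undef when nothing
# matches; it is NOT a BioPerl call.
sub find_segment {
    my ($name) = @_;
    my %known = ( 'chr4' => { name => 'chr4' } );   # made-up reference name
    return $known{$name};    # undef for unknown names; no system call involved
}

# Wrap the lookup so that a failure names what was being looked up.
sub segment_or_die {
    my ($name) = @_;
    my $segment = find_segment($name);
    defined $segment
        or die "no segment named '$name' (check the reference names in the database)\n";
    return $segment;
}

# A failed lookup now produces a message that points at the cause.
my $ok = eval { segment_or_die('4'); 1 };
print "lookup failed: $@" unless $ok;

# For comparison: with no OS error pending, "$!" is typically empty,
# so a bare "die $!" falls back to a generic message.
local $! = 0;
my $bare = eval { die $!; 1 };
print "bare die said: $@" unless $bare;
```

With a check like this, a wrong reference name shows up in the error text
itself rather than as a later "Can't call method ... on an undefined value".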

Jon
--
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14982665.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cain at cshl.edu Sun Jan 20 10:25:16 2008
From: cain at cshl.edu (Scott Cain)
Date: Sun, 20 Jan 2008 10:25:16 -0500 (EST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl
In-Reply-To: <14982665.post@talk.nabble.com>
Message-ID:

Jon,

There is a script for loading a SeqFeature database, analogous to the one
for the GFF database, though I don't know what it's called offhand (I'm not
at my normal computer right now). Be sure to read the documentation; you
will probably want to use the 'fast' option (I don't remember what it is
called either).

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator, http://www.gmod.org/               (216)392-3087
----------------------------------------------------------------------

From cjfields at uiuc.edu Sun Jan 20 12:10:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 20 Jan 2008 11:10:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl
In-Reply-To:
References:
Message-ID: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>

It's bp_seqfeature_load.pl (if you have the full bioperl core
distribution, it's in script/Bio-SeqFeature/Store). I had some
problems with the fast-loading option but it was likely just my gff
formatting; example data loaded just fine.

As for the error, you need to use the '-create' flag when initializing
a database (or wiping data from a current one):

=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
# -create => 1 initializes the tables (and wipes any existing data)
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   => 1,
                                         -create  => 1);
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                         -verbose => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ykumagai at biken.osaka-u.ac.jp Mon Jan 21 11:56:53 2008
From: ykumagai at biken.osaka-u.ac.jp (Yutaro Kumagai)
Date: Tue, 22 Jan 2008 01:56:53 +0900
Subject: [Bioperl-l] Problem with Bio::ASN1::EntrezGene::Indexer
Message-ID: <4794CED5.3070307@biken.osaka-u.ac.jp>

Hi, everyone,

I'm working with Bio::ASN1::EntrezGene::Indexer as below:

###
use Bio::ASN1::EntrezGene::Indexer;
use Bio::ASN1::EntrezGene;
use Bio::SeqIO;

my $inx = Bio::ASN1::EntrezGene::Indexer->new(-filename => 'c:/chrm/asn/entrezgene.idx');
# The index file has already been built successfully. I checked it
# by counting the number of records with $inx->count_records, etc.

my $seq1 = $inx->fetch_hash(15959);
# The ID 15969 surely exists, because I got no error message, and
# by dumping $seq1 I confirmed that $seq1 contains some data.

my $seq2 = $inx->fetch(15969);
###

However, the last method returned this error: "you must pass in a file
name or handle through new() or input_file() first before calling
next_seq! at C:/Perl/site/lib/Bio\SeqIO\entrezgene.pm line 136".
I traced the program with the debugger and found that somehow _fh() in
Bio::Index::AbstractSeq failed to pass the filehandle to fetch.

Now, I have two questions:
1) What's wrong with the above code? Is this a bug, or is it my mistake?
If it is my mistake, what did I do wrong?
2) If I can't get "fetch" to work, how can I extract the sequence
data (position in genomic contig, strand, etc.) from the data obtained by
"fetch_hash"? At the moment I don't understand the data structure that
"fetch_hash" returns...

Thank you in advance.

Yutaro Kumagai.

--
**********************************
Yutaro Kumagai
Dept. of Host Defense
Res. Inst. for Microbial Diseases
Osaka University
Japan
ykumagai at biken.osaka-u.ac.jp
**********************************

From hangsyin at gmail.com Mon Jan 21 14:22:55 2008
From: hangsyin at gmail.com (Hang)
Date: Mon, 21 Jan 2008 11:22:55 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl
In-Reply-To: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell> <14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell> <14982665.post@talk.nabble.com> <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
Message-ID: <15004412.post@talk.nabble.com>

Hi, Chris:

Following your suggestion, I added the -create flag and the GFF3Loader
started to work. Thanks a lot!
When I load dmel-all-5.4.gff into mysql with -fast, I get the following
error:
Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
If I don't use -fast, it is OK, except for the annoyingly slow speed. Do you
have any suggestions on this?
Best, Hang Chris Fields wrote: > > It's bp_seqfeature_load.pl (if you have the full bioperl core > distribution, it's in script/Bio-SeqFeature/Store). I had some > problems with the fast-loading option but it was likely just my gff > formatting; example data loaded just fine. > > As for the error, you need to use the '-create' flag when initializing > a database (or wiping data from a current one): > > ============================================================= > use Bio::DB::SeqFeature::Store; > use Bio::DB::SeqFeature::Store::GFF3Loader; > my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', > -dsn => 'dbi:mysql:test', > -user => 'root', > -pass => 'XXXXX', > -write => 1 > -create => 1); > my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => > $db, > -verbose => > 1); > $loader->load(./'dmel-all-r5.4.gff'); > ============================================================= > > chris > > On Jan 20, 2008, at 9:25 AM, Scott Cain wrote: > >> Jon, >> >> There is a script for loading a SeqFeature database just like the GFF >> database, though I don't know what it's called off hand (I'm not at my >> normal computer right now). Be sure to read the documentation and you >> will probably want to use the 'fast' option (I don't remember what >> it is >> called either). >> >> Scott >> >> >> ---------------------------------------------------------------------- >> Scott Cain, Ph. D. cain at cshl.edu >> GMOD Coordinator, http://www.gmod.org/ (216)392-3087 >> ---------------------------------------------------------------------- >> >> >> On Sun, 20 Jan 2008, Hang wrote: >> >>> >>> Hi, Scott, >>> I tried to change sequence-region line to "4 FlyBase >>> chromosome_arm 1 >>> 1351857 . . . ID=4;Name=4", it doesn't work. "$!" didn't say >>> anything but >>> "died at line 12". >>> >>> So, I went ahead with the Bio::DB::SeqFeature::Store. 
>>> Here is my code to
>>> load the dmel-all-r5.4.gff (from Flybase) to a test database:
>>> =============================================================
>>> use Bio::DB::SeqFeature::Store;
>>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>>                                          -dsn     => 'dbi:mysql:test',
>>>                                          -user    => 'root',
>>>                                          -pass    => 'XXXXX',
>>>                                          -write   => 1);
>>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>>                                                          -verbose => 1);
>>> $loader->load('./dmel-all-r5.4.gff');
>>> =============================================================
>>> I got a bunch of errors like this:
>>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
>>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
>>> The line 1316 in mysql.pm looks like this:
>>> $sth->execute($name) or die $sth->errstr;
>>> I checked the database test after the failed loading. There is only one
>>> table created, which is called 'meta'. I also tried 'grant all on test to
>>> XXX at localhost' and used that -user and -pass to load gff, it didn't
>>> work either.
>>>
>>> Jon
>>>
>>>
>>> Scott Cain-3 wrote:
>>>>
>>>> Hi Jon,
>>>>
>>>> Well, seeing the error message would be helpful, but my first guess
>>>> without is that there are a few things you can try:
>>>>
>>>> * removing the "sequence-region" line from the GFF file, adding a
>>>> line like this:
>>>>
>>>> 4 FlyBase chromosome_arm 1 1351857 . . . ID=4;Name=4
>>>>
>>>> and then reloading the database.
>>>>
>>>> * Or, you may want to consider using Bio::DB::SeqFeature::Store,
>>>> since Bio::DB::GFF3 doesn't always behave correctly with complex GFF3
>>>> (that is, with three levels of features (like gene, mRNA and CDS)).
>>>> >>>> Scott >>>> >>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote: >>>>> Hi, Scott, >>>>> >>>>> After adding die $!, I know something is wrong at line: >>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => >>>>> 25000);" >>>>> >>>>> my gff file is like this: >>>>> ##gff-version 3 >>>>> ##sequence-region 4 1 1351857 >>>>> 4 FlyBase transposable_element 2 611 . + . >>>>> ID=FBti0062890;Name=ninja-Dsim- >>>>> like >>>>> {}4829 >>>>> ;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-; >>>>> 4 repeatmasker_dummy match 2 347 . + . >>>>> ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy- >>>>> RepeatMasker; >>>>> 4 repeatmasker_dummy match_part 2 347 2367 + . >>>>> ID=:5142029_dummy;Name=:5142029;Parent=: >>>>> 1395923_repeatmasker_dummy;target_type=so;Target=D83207 >>>>> 5860 6210 +; >>>>> ... >>>>> ... >>>>> I really got confused. Any further suggestion? Thank you! >>>>> >>>>> Jon >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Scott Cain-3 wrote: >>>>>> >>>>>> Hi Jon, >>>>>> >>>>>> I think it's funny that you have "or die" on the database >>>>>> opening line, >>>>>> "or die" on the @features line, but you didn't put one on the >>>>>> $segment >>>>>> line. Try adding "or die: $!" to the $segment line to see what it >>>>> says, >>>>>> also add a 'print $segment' after you create it and before you >>>>>> try to >>>>>> get the features from it. >>>>>> >>>>>> Clearly, the problem is that $segment is not defined (that is, >>>>>> nothing >>>>>> is in it, not that the wrong thing is in it). The next trick is >>>>>> to >>>>> find >>>>>> out why. My first guess, without looking at the data set, is >>>>>> that the >>>>>> arm is not really named '4'. >>>>>> >>>>>> Scott >>>>>> >>>>>> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote: >>>>>>> Hi, everyone, >>>>>>> >>>>>>> I met this problem when I was running this script to extract >>>>>>> features >>>>>>> overlaps with 4:20,000..25,000. 
It always responds like "Can't >>>>>>> call >>>>>>> method >>>>>>> "features" on an undefined value at BIO::DB::GFF.pl line XX". >>>>>>> ============================================================== >>>>>>> use Bio::DB::GFF; >>>>>>> use Bio::Tools::GFF; >>>>>>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql', >>>>>>> -dsn => >>>>>>> 'dbi:mysql:dmel_gff:localhost', >>>>>>> -user => 'XXXX', >>>>>>> -pass => 'XXXX') || die >>>>> "database >>>>>>> open failed"; >>>>>>> >>>>>>> my $segment = $db->segment(-name => '4', -start => 20000, -end => >>>>> 25000); >>>>>>> my @features = $segment->features(-types => ['gene', 'exon', >>>>>>> 'intron', >>>>>>> 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no >>>>>>> features"; >>>>>>> print(scalar(@features)."\n"); >>>>>>> >>>>>>> ================================================================ >>>>>>> I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I >>>>>>> loaded >>>>>>> dmel_5.4.gff into mysql database by bulk_load_gff.pl without any >>>>> error. >>>>>>> Other methods failed also. >>>>>>> >>>>>>> Any help will be deeply appreciated! >>>>>>> >>>>>>> Best, >>>>>>> Jon >>>>>>> >>>>>> -- >>>>>> >>>>> ------------------------------------------------------------------------ >>>>>> Scott Cain, Ph. D. >>>>> cain at cshl.edu >>>>>> GMOD Coordinator (http://www.gmod.org/) >>>>> 216-392-3087 >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>> >>>> -- >>>> ------------------------------------------------------------------------ >>>> Scott Cain, Ph. D. 
>>>> cain at cshl.edu >>>> GMOD Coordinator (http://www.gmod.org/) >>>> 216-392-3087 >>>> Cold Spring Harbor Laboratory >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14982665.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p15004412.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
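
One way to narrow down the "Data too long for column 'attribute_value'" error reported above may be to look for unusually long attribute values in column 9 of the GFF3 file before loading it. What follows is only a minimal sketch in plain Perl; it makes no BioPerl calls. The helper name long_attributes, the script name long_attrs.pl and the default limit of 255 characters are assumptions made for illustration. Nothing in this thread confirms that attribute length is the real cause, so check the column definition in your own database before drawing conclusions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read GFF3 text from a
# filehandle and return one hash ref for every attribute value in
# column 9 that is longer than $limit characters.
sub long_attributes {
    my ($fh, $limit) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;              # sequence section follows
        next if $line =~ /^#/ || $line !~ /\S/;   # skip directives and blank lines
        my @col = split /\t/, $line;
        next unless @col >= 9;                    # not a feature line
        for my $pair (split /;/, $col[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            # Length is measured on the raw (still URL-escaped) value.
            push @hits, { line => $., tag => $tag, length => length($value) }
                if length($value) > $limit;
        }
    }
    return @hits;
}

# Usage: perl long_attrs.pl dmel-all-r5.4.gff [limit]
if (@ARGV) {
    my ($file, $limit) = @ARGV;
    $limit = 255 unless defined $limit;           # assumed threshold, adjust as needed
    open my $fh, '<', $file or die "Cannot open $file: $!";
    printf "line %d: %s (%d chars)\n", @{$_}{qw(line tag length)}
        for long_attributes($fh, $limit);
    close $fh;
}
```

If this flags only a handful of records, they can be inspected by hand or loaded separately without -fast; if it flags nothing, the cause probably lies elsewhere and a bug report is the better route.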
From cjfields at uiuc.edu Mon Jan 21 23:21:27 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 21 Jan 2008 22:21:27 -0600 Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined value at BIO::DB::GFF.pl In-Reply-To: <15004412.post@talk.nabble.com> References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell> <14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell> <14982665.post@talk.nabble.com> <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu> <15004412.post@talk.nabble.com> Message-ID: <8B1956B2-1380-4E73-8F14-F79CA5435697@uiuc.edu> I'm cc'ing this to the gbrowse list just in case Lincoln or Scott have an idea. My guess is it's a bug in the fast loader. Could you file this in bugzilla? http://bugzilla.open-bio.org/ chris On Jan 21, 2008, at 1:22 PM, Hang wrote: > > Hi, Chris: > > Following your suggestion, I added -create flag and the GFF3loader > started > to work. Thanks alot! > When I load dmel-all-5.4.gff into mysql with -fast, I had the > following > error: > Data too long for column 'attribute_value' at c:/../../../mysql.pm > line > 510 > If I don't use -fast, it is OK, except for the annoying slow speed. > Do you > have any suggestion on this? > > Best, > Hang > > > > > Chris Fields wrote: >> >> It's bp_seqfeature_load.pl (if you have the full bioperl core >> distribution, it's in script/Bio-SeqFeature/Store). I had some >> problems with the fast-loading option but it was likely just my gff >> formatting; example data loaded just fine. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org Wed Jan 23 03:14:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 23 Jan 2008 00:14:06 -0800
Subject: [Bioperl-l] [Bioperl-guts-l] [14455] bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm: fixed up the gene glyph so that it works properly with CDS-only genes
In-Reply-To: <200801222048.m0MKmhiI007977@dev.open-bio.org>
References: <200801222048.m0MKmhiI007977@dev.open-bio.org>
Message-ID: <91659EDD-B102-47C8-BF93-92576C2CF324@bioperl.org>

Lincoln --

Thank you, thank you for this fix! This takes care of the inconsistency problems I was having with GFF3 and GFF2 data. It works so much more beautifully now!

-jason

On Jan 22, 2008, at 12:48 PM, Lincoln Stein wrote:

> Revision: 14455
> Author: lstein
> Date: 2008-01-22 15:48:42 -0500 (Tue, 22 Jan 2008)
>
> Log Message:
> -----------
> fixed up the gene glyph so that it works properly with CDS-only genes
>
> Modified Paths:
> --------------
> bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
>
> Modified: bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
> ===================================================================
> --- bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm 2008-01-22 00:16:02 UTC (rev 14454)
> +++ bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm 2008-01-22 20:48:42 UTC (rev 14455)
> @@ -44,7 +44,9 @@
>
>  sub bump {
>    my $self = shift;
> -  return 1 if $self->{level} == 0; # top level bumps, other levels don't unless specified in config
> +  return 1
> +    if $self->{level} == 0
> +    && lc $self->feature->primary_tag eq 'gene'; # top level bumps, other levels don't unless specified in config
>    return $self->SUPER::bump;
>  }
>
> @@ -92,12 +94,16 @@
>  sub _subfeat {
>    my $class = shift;
>    my $feature = shift;
> -  if ($feature->primary_tag eq 'gene') {
> +  if (lc $feature->primary_tag eq 'gene') {
>      my @transcripts;
>      for my $t (qw/mRNA tRNA snRNA snoRNA miRNA ncRNA pseudogene/) {
>        push @transcripts, $feature->get_SeqFeatures($t);
>      }
>      return @transcripts;
> +  } elsif (lc $feature->primary_tag eq 'cds') {
> +    my @parts = $feature->get_SeqFeatures();
> +    return ($feature) if $class->{level} == 0 and !@parts;
> +    return @parts;
>    }
>
>    my @subparts;
>
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l


From ste.ghi at libero.it Thu Jan 24 08:42:49 2008
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 24 Jan 2008 14:42:49 +0100
Subject: [Bioperl-l] parsing ACE file
Message-ID:

Dear All,
dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?

Any suggestion about how to start is welcome...
Cheers

Stefano


From pmiguel at purdue.edu Thu Jan 24 14:06:35 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 24 Jan 2008 14:06:35 -0500
Subject: [Bioperl-l] parsing ACE file
In-Reply-To:
References:
Message-ID: <4798E1BB.2020809@purdue.edu>

Stefano Ghignone wrote:
> Dear All,
> dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?
>
> Any suggestion about how to start is welcome...
> Cheers
>
> Stefano
>
>
perl -ne 'next unless /^(?:CO|RD) /; print' acefile.ace

will give you a list of the contigs, each followed by its reads, if "acefile.ace" is a phrap ace file.

There is a bioperl module for handling phrap ace files, but I'm not sure what its current status is.
Last time I looked (probably a couple of years ago) it seemed to have been abandoned half-finished. -- Phillip From golharam at umdnj.edu Thu Jan 24 14:36:29 2008 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 24 Jan 2008 14:36:29 -0500 Subject: [Bioperl-l] Wiki inconsistency? Message-ID: <4798E8BD.7030107@umdnj.edu> Hi, I haven't used Bioperl in a while but recently started using it. I was using 1.4.0 but see on the website that 1.5.2 has been released. If I click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), I see a two versions: bioperl-1.5.2_102 and bioperl-1.5.2_100 However, If I click on the Downloads link on the left toolbar, then scroll down, I see 1.5.2 Developer Release. The tar file here points to current_core_unstable.tar.gz. Is this supposed to be this way? It seems a bit confusing. I think it might be appropriate to put all the download links in one location...just my two cents... Ryan From cjfields at uiuc.edu Thu Jan 24 15:58:25 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 24 Jan 2008 14:58:25 -0600 Subject: [Bioperl-l] Wiki inconsistency? In-Reply-To: <4798E8BD.7030107@umdnj.edu> References: <4798E8BD.7030107@umdnj.edu> Message-ID: Maybe Sendu can answer more specifically, but I believe the extra designation referred to the release candidate (of which bioperl-core was the only one with '102'). You definitely want the core package. The other ones with '100' are other bioperl-related distributions which require the core package but have additional functionality (BioSQL-related functions, wrapper modules, etc.). chris On Jan 24, 2008, at 1:36 PM, Ryan Golhar wrote: > Hi, > > I haven't used Bioperl in a while but recently started using it. I > was using 1.4.0 but see on the website that 1.5.2 has been > released. 
> If I click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2),
> I see a two versions:
>
> bioperl-1.5.2_102
>
> and
>
> bioperl-1.5.2_100
>
> However, If I click on the Downloads link on the left toolbar, then
> scroll down, I see 1.5.2 Developer Release. The tar file here
> points to current_core_unstable.tar.gz.
>
> Is this supposed to be this way? It seems a bit confusing. I think
> it might be appropriate to put all the download links in one
> location...just my two cents...
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From florent.angly at gmail.com Thu Jan 24 17:06:29 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 24 Jan 2008 14:06:29 -0800
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: <4798E1BB.2020809@purdue.edu>
References: <4798E1BB.2020809@purdue.edu>
Message-ID: <47990BE5.2010005@gmail.com>

That would be the module Bio::Assembly::IO::ace
It works fine as far as I know. To parse an assembly, use Bio::Assembly::IO:
http://doc.bioperl.org/bioperl-live/Bio/Assembly/IO.html

Regards,

Florent


Phillip San Miguel wrote:
> Stefano Ghignone wrote:
>> Dear All,
>> dealing with an assembly .ace file and a list of contigs (from
>> that assembly), how can I extract from the .ace file the read names
>> forming each listed contig? Is there any module doing this job?
>>
>> Any suggestion about how to start is welcome...
>> Cheers
>>
>> Stefano
>>
>>
> perl -ne 'next unless /^(?:CO|RD) /; print' acefile.ace
>
> will give you a list of the contigs, each followed by its reads,
> if "acefile.ace" is a phrap ace file.
>
> There is a bioperl module for handling phrap ace files, but I'm not
> sure what its current status is.
Last time I looked (probably a couple > of years ago) it seemed to have been abandoned half-finished. > From golharam at umdnj.edu Thu Jan 24 16:17:14 2008 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 24 Jan 2008 16:17:14 -0500 Subject: [Bioperl-l] GenBank updated sequence not being retrieved Message-ID: <4799005A.5030204@umdnj.edu> I'm using Bioperl 1.4 (and tried with 1.5.1). I'm trying to download GenBank sequence for which I have accession #'s. One of the sequences has been replaced with a newer version. I'm using get_Seq_by_acc, which returns the warning: -------------------- WARNING --------------------- MSG: acc (gb|XM_087386) does not exist --------------------------------------------------- If I check NCBI's website for the sequence, it has indeed been replaced by an NM_ sequence. How can I get BioPerl to retrieve the latest version of a sequence? From johan.nilsson at sh.se Thu Jan 24 17:33:42 2008 From: johan.nilsson at sh.se (Johan Nilsson) Date: Thu, 24 Jan 2008 23:33:42 +0100 Subject: [Bioperl-l] Quickest Codon Based MSA? Message-ID: <47991246.6010106@sh.se> Hello, I have a question which might not necessarily be related to Bioperl, although I do believe the expertise is available here. I have a couple of thousand FASTA files, each containing 20 CDS sequence orthologues of rather high sequence similarity. I would like to create a codon-based multiple sequence alignment for each of these FASTA files (i.e. a nucleotide sequence alignment inferred from alignment of the translated peptide sequences, to assure that no frame shifts will occur). I first tried running Dialign2, which can perform the translation/back-translation in one go, but this turned out to be far too slow. I next tried to build protein alignments using ClustalW and subsequently built the coding region alignment using EMBOSS 'tranalign', but this also was too slow. Is there any method available which significantly speeds up the codon-preserving alignment??? 
As I mentioned, the sequences to be aligned are in general very conserved, so any heuristic taking advantage of the low divergence would be very helpful! Also, is there any adjustable parameter in dialign2/dialign-T that might speed up the program when looking at highly similar sequences? Best regards /Johan Nilsson From e-just at northwestern.edu Thu Jan 24 18:07:57 2008 From: e-just at northwestern.edu (Eric Just) Date: Thu, 24 Jan 2008 17:07:57 -0600 Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago Message-ID: Hello everyone, We have an opening at dictyBase (Northwestern University in Chicago) for a Bioinformatics Software Engineer. This job involves writing and maintaining software for a genome database using Chado/OO-Perl/ Bioperl and many other state-of-the-art technologies. For more information please see: http://dictybase.org/dictybase_jobs.htm Thanks, Eric From bix at sendu.me.uk Thu Jan 24 18:16:14 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 24 Jan 2008 23:16:14 +0000 Subject: [Bioperl-l] Wiki inconsistency? In-Reply-To: <4798E8BD.7030107@umdnj.edu> References: <4798E8BD.7030107@umdnj.edu> Message-ID: <47991C3E.2010908@sendu.me.uk> Ryan Golhar wrote: > Hi, > > I haven't used Bioperl in a while but recently started using it. I was > using 1.4.0 but see on the website that 1.5.2 has been released. If I > click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), > I see a two versions: > > bioperl-1.5.2_102 > > and > > bioperl-1.5.2_100 Where do you see this older version? I did a search on the page and that term isn't found. _100 was the first version of 1.5.2 core to go out. There were then 2 minor revisions released, as detailed in the 'Updates' section of the page. > However, If I click on the Downloads link on the left toolbar, then > scroll down, I see 1.5.2 Developer Release. The tar file here points to > current_core_unstable.tar.gz. Yes, that is just an alias to bioperl-1.5.2_102, ie. 
whatever the latest version happens to be. So that people don't need to worry about the actual version, they can just have one static bookmark. > Is this supposed to be this way? It seems a bit confusing. I think it > might be appropriate to put all the download links in one > location...just my two cents... Well the primary page where all the links are found is the Downloads page. The Release_1.5.2 page is specific to 1.5.2 and will remain for historic reasons (so at some point there will be 1.5.3 or something and the appropriate links on the main Downloads page will be updated to that, but if someone specifically wants 1.5.2 they can still find the 1.5.2 downloads on its own dedicated page). From jason at bioperl.org Thu Jan 24 21:17:02 2008 From: jason at bioperl.org (Jason Stajich) Date: Thu, 24 Jan 2008 18:17:02 -0800 Subject: [Bioperl-l] Quickest Codon Based MSA? In-Reply-To: <47991246.6010106@sh.se> References: <47991246.6010106@sh.se> Message-ID: I don't know if it is faster or slower than what you have tried but the aa_to_dna_aln translates a protein alignment back to CDS. You can see example code of it in use in the pairwise_kaks script in scripts/utilities/pairwise_kaks.PLS -jason On Jan 24, 2008, at 2:33 PM, Johan Nilsson wrote: > Hello, > > I have a question which might not necessarily be related to > Bioperl, although I do believe the expertise is available here. I > have a couple of thousand FASTA files, each containing 20 CDS > sequence orthologues of rather high sequence similarity. I would > like to create a codon-based multiple sequence alignment for each > of these FASTA files (i.e. a nucleotide sequence alignment inferred > from alignment of the translated peptide sequences, to assure that > no frame shifts will occur). I first tried running Dialign2, which > can perform the translation/back-translation in one go, but this > turned out to be far too slow. 
I next tried to build protein > alignments using ClustalW and subsequently built the coding region > alignment using EMBOSS 'tranalign', but this also was too slow. > > Is there any method available which significantly speeds up the > codon-preserving alignment??? As I mentioned, the sequences to be > aligned are in general very conserved, so any heuristic taking > advantage of the low divergence would be very helpful! Also, is > there any adjustable parameter in dialign2/dialign-T that might > speed up the program when looking at highly similar sequences? > > Best regards > /Johan Nilsson > _______________________________________________ > Bioperl-l mailing list From tristan.lefebure at gmail.com Thu Jan 24 22:07:52 2008 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Thu, 24 Jan 2008 22:07:52 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree, and how to combine trees Message-ID: <200801242207.52991.tristan.lefebure@gmail.com> Hi, I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would like to merge several "one leaf taxonomic trees" into a taxonomic tree with several leafs. For example: #####BEGINNING##### #! 
/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::TreeIO;

# The taxonomic database
# You might want to switch to a different flatfile or to Entrez
my $dbh = new Bio::DB::Taxonomy(-source    => 'flatfile',
                                -directory => '/tmp',
                                -nodesfile => '/home/tristan/Documents/db/NCBI/taxonomy/nodes.dmp',
                                -namesfile => '/home/tristan/Documents/db/NCBI/taxonomy/names.dmp');

# Fetch 4 taxa for the example
my $tax_decapoda    = $dbh->get_taxon(-name => 'Decapoda');
my $tax_heteroptera = $dbh->get_taxon(-name => 'Heteroptera');
my $tax_coleoptera  = $dbh->get_taxon(-name => 'Coleoptera');
my $tax_copepoda    = $dbh->get_taxon(-name => 'Copepoda');

# Transform to tree objects
my $decapoda_tree    = new Bio::Tree::Tree(-node => $tax_decapoda);
my $heteroptera_tree = new Bio::Tree::Tree(-node => $tax_heteroptera);
my $coleoptera_tree  = new Bio::Tree::Tree(-node => $tax_coleoptera);
my $copepoda_tree    = new Bio::Tree::Tree(-node => $tax_copepoda);

# Reduce the number of nodes to the following ranks
my @ranks = qw(kingdom phylum subphylum superclass class subclass superorder order family);
$decapoda_tree->splice(-keep_rank => \@ranks);
$heteroptera_tree->splice(-keep_rank => \@ranks);
$coleoptera_tree->splice(-keep_rank => \@ranks);
$copepoda_tree->splice(-keep_rank => \@ranks);

# Print the trees
my $out = new Bio::TreeIO('-format' => 'newick', '-file' => ">four.tree");
$out->write_tree($decapoda_tree);
$out->write_tree($heteroptera_tree);
$out->write_tree($coleoptera_tree);
$out->write_tree($copepoda_tree);
#####END#######

This gives the following "trees":

(((((7524)33340)50557)6960)6656)33208;
(((((7041)33340)50557)6960)6656)33208;
((((((6683)6682)72041)6681)6657)6656)33208;
((((6830)72037)6657)6656)33208;

They are really special trees, as they contain only one leaf.
I would like to combine them and remove the 'unused' nodes to obtain something like that: ((7524,7041)33340,(6683,6830)6657)6656; or even better: ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda; Any suggestions? Thanks! -Tristan From anjan.purkayastha at gmail.com Thu Jan 24 18:32:20 2008 From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA) Date: Thu, 24 Jan 2008 18:32:20 -0500 Subject: [Bioperl-l] Question from a bioperl newbie Message-ID: hi, i recently installed bioperl on my mac-machine. tried to use it in a simple script with a "use Bio::Perl" command. however, i get an error message "Can't locate Bio/Perl.pm in @INC". the BioPerl folder is in my desktop. so i tried use: use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio"; This time it returned me another error: Undefined subroutine &main::get_sequence. so, when BioPerl is installed, which directory does it reside in.( it's not present in the .cpan/build directory.) appreciate your prompt reply. anjan -- ANJAN PURKAYASTHA, PhD. Senior Computational Biologist ========================== 1101 King Street, Suite 310, Alexandria, VA 22314. 703.518.8040 (office) 703.740.6939 (mobile) email: anjan at vbi.vt.edu; anjan.purkayastha at gmail.com http://www.vbi.vt.edu ========================== From bosborne11 at verizon.net Thu Jan 24 23:04:50 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 24 Jan 2008 23:04:50 -0500 Subject: [Bioperl-l] Question from a bioperl newbie In-Reply-To: References: Message-ID: <3B13E81A-66E1-418A-8915-9E877C2B751D@verizon.net> Anjan, use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/"; Brian O. On Jan 24, 2008, at 6:32 PM, ANJAN PURKAYASTHA wrote: > hi, > i recently installed bioperl on my mac-machine. > tried to use it in a simple script with a "use Bio::Perl" command. > however, > i get an error message "Can't locate Bio/Perl.pm in @INC". > the BioPerl folder is in my desktop. 
so i tried use: use lib > "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio"; > This time it returned me another error: Undefined subroutine > &main::get_sequence. > > so, when BioPerl is installed, which directory does it reside in. > ( it's not > present in the .cpan/build directory.) > > appreciate your prompt reply. > > anjan > > -- > ANJAN PURKAYASTHA, PhD. > Senior Computational Biologist > ========================== > > 1101 King Street, Suite 310, > Alexandria, VA 22314. > 703.518.8040 (office) > 703.740.6939 (mobile) > > email: > anjan at vbi.vt.edu; > anjan.purkayastha at gmail.com > > http://www.vbi.vt.edu > > ========================== > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From n.haigh at sheffield.ac.uk Fri Jan 25 02:32:10 2008 From: n.haigh at sheffield.ac.uk (Nathan S Haigh) Date: Fri, 25 Jan 2008 07:32:10 +0000 Subject: [Bioperl-l] Wiki inconsistency? In-Reply-To: <47991C3E.2010908@sendu.me.uk> References: <4798E8BD.7030107@umdnj.edu> <47991C3E.2010908@sendu.me.uk> Message-ID: <4799907A.9060301@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Sendu, Have you thought about using a template for the latest stable release and the latest developer release? That way, any article/link that always needs to point to the latest version simply has to include the correct template? So once a new release is made, you simply update the one template, and changes automatically propagate through the wiki - might save some wiki admin each time there's a new release. 
You could get more intricate, and use a template to show the latest version of any particular release series so you could do something like: {{latest release|series=1.5.x|full=y}} and {{latest release|series=1.4.x|full=y}} or even: {{latest release|series=stable|full=y}} and {{latest release|series=dev|full=y}} these templates could return 1.5.2_102 if the "full" param is set to something or simply 1.5.2 if the "full" param is missing. Just a thought. Nath Sendu Bala wrote: > Ryan Golhar wrote: >> Hi, >> >> I haven't used Bioperl in a while but recently started using it. I >> was using 1.4.0 but see on the website that 1.5.2 has been released. >> If I click on the link for 1.5.2 >> (http://www.bioperl.org/wiki/Release_1.5.2), I see a two versions: >> >> bioperl-1.5.2_102 >> >> and >> >> bioperl-1.5.2_100 > > Where do you see this older version? I did a search on the page and that > term isn't found. _100 was the first version of 1.5.2 core to go out. > There were then 2 minor revisions released, as detailed in the 'Updates' > section of the page. > > >> However, If I click on the Downloads link on the left toolbar, then >> scroll down, I see 1.5.2 Developer Release. The tar file here points >> to current_core_unstable.tar.gz. > > Yes, that is just an alias to bioperl-1.5.2_102, ie. whatever the latest > version happens to be. So that people don't need to worry about the > actual version, they can just have one static bookmark. > > >> Is this supposed to be this way? It seems a bit confusing. I think >> it might be appropriate to put all the download links in one >> location...just my two cents... > > Well the primary page where all the links are found is the Downloads > page. 
The Release_1.5.2 page is specific to 1.5.2 and will remain for > historic reasons (so at some point there will be 1.5.3 or something and > the appropriate links on the main Downloads page will be updated to > that, but if someone specifically wants 1.5.2 they can still find the > 1.5.2 downloads on its own dedicated page). > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHmZB69gTv6QYzVL4RAnRpAJwOyWjZXzD0UJBNFNP8H1Hrn4c66ACfRyzA NsJEZydsG+aMzNltrBw+Nx4= =kHt0 -----END PGP SIGNATURE----- From derek.fairley at belfasttrust.hscni.net Fri Jan 25 03:31:28 2008 From: derek.fairley at belfasttrust.hscni.net (Fairley, Derek) Date: Fri, 25 Jan 2008 08:31:28 -0000 Subject: [Bioperl-l] Quickest Codon Based MSA? In-Reply-To: <47991246.6010106@sh.se> Message-ID: Johan, There is currently no Bioperl-run wrapper for this program, but you might want to have a look at Codon Align 2.0 as well: http://homepage.mac.com/barryghall/CodonAlign.html Derek -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Johan Nilsson Sent: 24 January 2008 22:34 To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Quickest Codon Based MSA? Hello, I have a question which might not necessarily be related to Bioperl, although I do believe the expertise is available here. I have a couple of thousand FASTA files, each containing 20 CDS sequence orthologues of rather high sequence similarity. I would like to create a codon-based multiple sequence alignment for each of these FASTA files (i.e. a nucleotide sequence alignment inferred from alignment of the translated peptide sequences, to assure that no frame shifts will occur). 
I first tried running Dialign2, which can perform the translation/back-translation in one go, but this turned out to be far too slow. I next tried to build protein alignments using ClustalW and subsequently built the coding region alignment using EMBOSS 'tranalign', but this also was too slow. Is there any method available which significantly speeds up the codon-preserving alignment??? As I mentioned, the sequences to be aligned are in general very conserved, so any heuristic taking advantage of the low divergence would be very helpful! Also, is there any adjustable parameter in dialign2/dialign-T that might speed up the program when looking at highly similar sequences? Best regards /Johan Nilsson _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From ewijaya at gmail.com Fri Jan 25 04:26:05 2008 From: ewijaya at gmail.com (Edward Wijaya) Date: Fri, 25 Jan 2008 17:26:05 +0800 Subject: [Bioperl-l] BioPerl module to extract sequence from gene names Message-ID: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com> Dear Experts, Suppose I have the following list of gene names and Ensemble Ids. RBL1 ENSG00000080839 RB1 ENSG00000139687 CDC2 ENSG00000170312 CDC25A ENSG00000164045 CCNA2 ENSG00000145386 E2F3 ENSG00000112242 E2F2 ENSG00000007968 CDK2 ENSG00000123374 ...etc... Is there a way to extract the gene sequence from those list? And then output them in FASTA format. - Edward From bix at sendu.me.uk Fri Jan 25 05:55:50 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 25 Jan 2008 10:55:50 +0000 Subject: [Bioperl-l] Quickest Codon Based MSA? In-Reply-To: <47991246.6010106@sh.se> References: <47991246.6010106@sh.se> Message-ID: <4799C036.5060404@sendu.me.uk> Johan Nilsson wrote: > Hello, > > I have a question which might not necessarily be related to Bioperl, > although I do believe the expertise is available here. 
I have a couple > of thousand FASTA files, each containing 20 CDS sequence orthologues of > rather high sequence similarity. I would like to create a codon-based > multiple sequence alignment for each of these FASTA files (i.e. a > nucleotide sequence alignment inferred from alignment of the translated > peptide sequences, to assure that no frame shifts will occur). I first > tried running Dialign2, which can perform the > translation/back-translation in one go, but this turned out to be far > too slow. I next tried to build protein alignments using ClustalW and > subsequently built the coding region alignment using EMBOSS 'tranalign', > but this also was too slow. > > Is there any method available which significantly speeds up the > codon-preserving alignment??? As I mentioned, the sequences to be > aligned are in general very conserved, so any heuristic taking advantage > of the low divergence would be very helpful! Also, is there any > adjustable parameter in dialign2/dialign-T that might speed up the > program when looking at highly similar sequences? Do you know which is the slow part? For example, when using ClustalW, are the alignments slower than the creating the codon alignment from the protein? If ClustalW is the problem, you can try using other alignment programs famous for their speed, such as Muscle. If it's the protein->codon bit that's slow, try using other programs to do that, like Pal2Nal or the BioPerl method. 
From David.Messina at sbc.su.se Fri Jan 25 06:35:16 2008 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 25 Jan 2008 12:35:16 +0100 Subject: [Bioperl-l] BioPerl module to extract sequence from gene names In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com> References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com> Message-ID: <628aabb70801250335l2a2754efn3e73e44a9dae6a35@mail.gmail.com> Hi Edward, I don't think there's a direct BioPerl interface to Ensembl, but BioMart at Ensembl itself will get you sequences (and lots of other things if you want) given a list of Ensembl IDs. http://www.ensembl.org/biomart/martview Note that as of this writing, the Ensembl BioMart server appears to be down temporarily. If you want to be able to get Ensembl sequences from a program, there's the Ensembl API: http://www.ensembl.org/info/using/api/core/core_tutorial.html Dave From bix at sendu.me.uk Fri Jan 25 06:07:42 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 25 Jan 2008 11:07:42 +0000 Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree, and how to combine trees In-Reply-To: <200801242207.52991.tristan.lefebure@gmail.com> References: <200801242207.52991.tristan.lefebure@gmail.com> Message-ID: <4799C2FE.8080700@sendu.me.uk> Tristan Lefebure wrote: > Hi, > > I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would > like to merge several "one leaf taxonomic trees" into a taxonomic tree with > several leafs. [...] > or even better: > > ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda; The BioPerl script taxonomy2tree.pl generates: (((Decapoda,Copepoda)Crustacea,(Heteroptera,Coleoptera)Neoptera)Pancrustacea)"cellular organisms"; I think you can modify it similar to your own script to only output the classes you're interested in. 
http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/taxa/taxonomy2tree.PLS From bosborne11 at verizon.net Fri Jan 25 08:53:36 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 25 Jan 2008 08:53:36 -0500 Subject: [Bioperl-l] BioPerl module to extract sequence from gene names In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com> References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com> Message-ID: <9CE20DF3-ED5F-4432-A191-4123896E5815@verizon.net> Edward, Various approaches are discussed here: http://www.bioperl.org/wiki/Getting_Genomic_Sequences Since you have ENSEMBL ids I'd think that would be the way to go. Brian O. On Jan 25, 2008, at 4:26 AM, Edward Wijaya wrote: > Dear Experts, > > Suppose I have the following list of gene names and Ensemble Ids. > > RBL1 ENSG00000080839 > RB1 ENSG00000139687 > CDC2 ENSG00000170312 > CDC25A ENSG00000164045 > CCNA2 ENSG00000145386 > E2F3 ENSG00000112242 > E2F2 ENSG00000007968 > CDK2 ENSG00000123374 > ...etc... > > Is there a way to extract the gene sequence from those list? > And then output them in FASTA format. > > - Edward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From snoze.pa at gmail.com Fri Jan 25 18:30:56 2008 From: snoze.pa at gmail.com (snoze pa) Date: Fri, 25 Jan 2008 17:30:56 -0600 Subject: [Bioperl-l] bioperl DB error Message-ID: <10f848910801251530j6eacfcb0x81780ae312cf19c5@mail.gmail.com> Dear Users, I am using bioperl/iosql and trying to install ncbi taxonomy. But I am getting following error message. any help? thanks in advance perl load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... 
insert / update / delete taxon nodes failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568. From snoze.pa at gmail.com Fri Jan 25 18:49:28 2008 From: snoze.pa at gmail.com (snoze pa) Date: Fri, 25 Jan 2008 17:49:28 -0600 Subject: [Bioperl-l] bioseqDB error Message-ID: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com> Hi Anyone know why i am getting this error message!! Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568 From wkath83 at vbi.vt.edu Thu Jan 24 13:19:06 2008 From: wkath83 at vbi.vt.edu (Katherine Wendelsdorf) Date: Thu, 24 Jan 2008 13:19:06 -0500 (EST) Subject: [Bioperl-l] bioperl on mac Message-ID: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu> Dear one who knows, I have a macbook with Leopard OSX and I am having trouble running scripts that call for bioperl modules. Here is my history: Using Fink I installed bioperl-pm586 version 1.5.2-4 and bioperl-run-pm586 version 1.5.2-1. when I type >which bioperl-pm586 in to the command line I get nothing. Spotlight says that the path is /sw/share/bioperl-pm586. The same goes for bioperl-run-pm586. 1. I tried to run test2.pl script that was literally copied and pasted from the HOWTO manual, but it wouldnt run. The two attached docs are the script I tried to run and the output (which is nonexistant). I read something that said to "go in to" Bioperl to execute a command. I could not enter the bioperl directory when it was in the sw/shared directory so I copied the bioperl folder to the Desktop just so I could try executing the script inside bioperl. Where am I going wrong here? Should I place these folders (bioperl-pm586 and bioperl-run-pm586) somewhere else on my computer? 
Should they be in the same directory as perl (/usr/bin/perl)?

2. How do I know what modules are included in the bioperl-pm586 I downloaded? Specifically I want to use Bio::SeqIO.

3. What is the best way to download/install new modules as I need them?

Any answers you could give me for any of these questions would be greatly appreciated!

Thank you so much, kind volunteer!
-Kate
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test2.pl
URL:

From bosborne11 at verizon.net Sat Jan 26 11:14:13 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 26 Jan 2008 11:14:13 -0500
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
Message-ID:

Katherine,

Perl keeps the paths of all the module directories in its @INC array. What do you see when you do:

perl -e 'print @INC'

?

If '/sw/share/bioperl-pm586' is not in @INC then you need to put it there, perhaps by adding something like:

setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586

to the .tcshrc file in your home directory (if you use tcsh, that is; most people use bash these days, with .bashrc and 'export').

You asked some other questions. The general answer is that all the modules you'll need are in the 2 packages you've installed, and you don't need to move them from /sw.

Brian O.

On Jan 24, 2008, at 1:19 PM, Katherine Wendelsdorf wrote:

> Dear one who knows,
>
> I have a macbook with Leopard OSX and I am having trouble running
> scripts
> that call for bioperl modules.
>
> Here is my history: Using Fink I installed bioperl-pm586 version
> 1.5.2-4
> and bioperl-run-pm586 version 1.5.2-1. when I type >which bioperl-
> pm586 in
> to the command line I get nothing. Spotlight says that the path is
> /sw/share/bioperl-pm586.
The same goes for bioperl-run-pm586. > > 1. I tried to run test2.pl script that was literally copied and pasted > from the HOWTO manual, but it wouldnt run. The two attached docs are > the > script I tried to run and the output (which is nonexistant). I read > something that said to "go in to" Bioperl to execute a command. I > could > not enter the bioperl directory when it was in the sw/shared > directory so > I copied the bioperl folder to the Desktop just so I could try > executing > the script inside bioperl. Where am I going wrong here? > > Should I place these folders (bioperl-pm586 and bioperl-run-pm586) > somewhere else on my computer? Shoudl they be in the same directory as > perl (usr/bin/perl)? > > 2. How do I know what modules are included in the bioperl-pm586 I > downloaded? Specifically I want to use Bio::SeqIO. > > 3. What is the best way to download/install new modules as I need > them? > > > Any answers you coudl give me for any of these questions would be > greatly > appreciated! > > Thank you so much, kind volunteer! > - > Kate > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Sat Jan 26 15:30:11 2008 From: jason at bioperl.org (Jason Stajich) Date: Sat, 26 Jan 2008 12:30:11 -0800 Subject: [Bioperl-l] bioperl on mac In-Reply-To: References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu> Message-ID: Usually this is done by fink by adding a line to your .tcshrc (if you are running that shell) or .bash_profile or .bashrc. On my machine I have this at the top of my .bash_profile file: test -r /sw/bin/init.sh && . /sw/bin/init.sh if that is not there you need to add it to insure that all the fink tools are setup properly. On Jan 26, 2008, at 8:14 AM, Brian Osborne wrote: > Katherine, > > Perl keeps the addresses of all the module directories in its @INC > array. 
What do you see when you do: > > perl -e 'print @INC' > > ? > > If '/sw/share/bioperl-pm586' is not in @INC then you need to put it > there, perhaps by adding something like: > > setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586 > > to the .tcshrc file in your home directory (if you use tcsh that > is, most use bash, .bashrc, and 'set' these days). > > You asked some other questions, the general answer is that all the > modules you'll need are in the 2 packages you've installed, and you > don't need to move them from /sw. > > > Brian O. > > > On Jan 24, 2008, at 1:19 PM, Katherine Wendelsdorf wrote: > >> Dear one who knows, >> >> I have a macbook with Leopard OSX and I am having trouble running >> scripts >> that call for bioperl modules. >> >> Here is my history: Using Fink I installed bioperl-pm586 version >> 1.5.2-4 >> and bioperl-run-pm586 version 1.5.2-1. when I type >which bioperl- >> pm586 in >> to the command line I get nothing. Spotlight says that the path is >> /sw/share/bioperl-pm586. The same goes for bioperl-run-pm586. >> >> 1. I tried to run test2.pl script that was literally copied and >> pasted >> from the HOWTO manual, but it wouldnt run. The two attached docs >> are the >> script I tried to run and the output (which is nonexistant). I read >> something that said to "go in to" Bioperl to execute a command. I >> could >> not enter the bioperl directory when it was in the sw/shared >> directory so >> I copied the bioperl folder to the Desktop just so I could try >> executing >> the script inside bioperl. Where am I going wrong here? >> >> Should I place these folders (bioperl-pm586 and bioperl-run-pm586) >> somewhere else on my computer? Shoudl they be in the same >> directory as >> perl (usr/bin/perl)? >> >> 2. How do I know what modules are included in the bioperl-pm586 I >> downloaded? Specifically I want to use Bio::SeqIO. >> >> 3. What is the best way to download/install new modules as I need >> them? 
>> >> >> Any answers you coudl give me for any of these questions would be >> greatly >> appreciated! >> >> Thank you so much, kind volunteer! >> - >> Kate_____________________________________________ >> __ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Sat Jan 26 19:14:45 2008 From: jason at bioperl.org (Jason Stajich) Date: Sat, 26 Jan 2008 16:14:45 -0800 Subject: [Bioperl-l] a question on "move_id_to_bootstrap" usage In-Reply-To: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com> References: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com> Message-ID: <8273f6c20801261614p312886d5x562593aa0cde60da@mail.gmail.com> I'm not sure why you still have the __DATA__ block if you are reading data in from a file or are you trying to send an example of the code but forgot to specify a different input point? If you are reading from a file that looks like the tree in the __DATA__ block you notice that the bootstrap info is encoded as the branch_length, NOT the id - the move_id_to_bootstrap only moves the ID to the BOOTSTRAP. you'll have to write a custom routine or just run a simple loop on your tree to move the data to the bootstrap - it would look just the move_id_to_bootstrap except you'd use branch_length instead of id to get the data that you want to set in the bootstrap. I leave it as an exercise for the reader, but if you can't figure it out let us know. In the future please ask your questions on the mailing list as I don't have much time to answer questions individually when someone else can help. -jason On Jan 23, 2008 1:57 PM, Anand wrote: > HI Jason, > > Thanks a lot. I followed your suggestion and updated both the modules. 
> > I followed the code example on http://www.bioperl.org/wiki/HOWTO:Trees and > tried to extract bootstrap values for my tree (which is output after > seqboot, protdist, fitch and consense) > > When I try running my script, I am not able to print the bootstrap > values...and it doesn't throw any error messages. Am I missing something? > > ====START of Code==== > #!/usr/bin/perl -w > use strict; > use lib "/home/anand/myperlmodules/lib/perl5/"; > use Bio::TreeIO; > # $usage: $0 > > my $infile = shift; > > my $treeio = Bio::TreeIO->new(-format => 'newick', > -file => $infile, > -internal_node_id => 'bootstrap', > ); > > while( my $tree = $treeio->next_tree ) { > for my $node ( $tree->get_nodes ) { > printf "id: %s bootstrap: %s\n", $node->id || '', $node->bootstrap > || '', "\n"; > } > } > __END__ > ((5815_1:100.0,(((5815_5:100.0,5815_7:100.0):100.0,5815_6:100.0):97.0 > ,5815_8:100.0): > 98.0,5815_4:100.0,5815_2:100.0):100.0,5815_3:100.0); > ====END of Code==== > > Thanks in advance for your time and help, > > Anand > > PS: Just to preserve formatting, I have attached the consense_output_file > > On Jan 22, 2008 8:02 AM, Jason Stajich wrote: > > > I suspect you may want to update everything in Bio/TreeIO and Bio/ > > Tree to be safe, I'm not exactly sure what was changed - you can look > > at the commit logs to see what else changed at the time - http:// > > code.open-bio.org/. You can also use that same server to grab a > > fresh checkout of what is the current state of the code base. > > > > -jason > > On Jan 22, 2008, at 12:59 AM, Anand wrote: > > > > > Hi Jason > > > > > > I have a question on the method "move_id_to_bootstrap". From this > > > post: > > > http://portal.open-bio.org/pipermail/bioperl-guts-l/2007-May/ > > > 025718.html > > > > > > it looks like it has been added very recently. As luck would have > > > it, the > > > TreeFunctionsI.pm in my bioperl installation is missing that method. 
> > > > > > My question: What is the best method to update TreeFunctionsI.pm so > > > that it > > > can have the "move_id_to_bootstrap" method? Does it have other update > > > dependencies. > > > > > > Thanks in advance for your help and time, > > > > > > Anand > > > > > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From hlapp at duke.edu Mon Jan 28 00:27:34 2008 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 28 Jan 2008 00:27:34 -0500 Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal References: <4795292E.4030401@sdsc.edu> Message-ID: Some folks may remember that CIPRES (http://www.phylo.org) released their portal with access to remote execution of several phylogenetic tree reconstruction programs in spring last year. It took a while but they have now also built a really nice REST-based API that makes the service fully programmable instead of screen- scraping 5 pages: http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API) It should be relatively straightforward to build the equivalent of RemoteBlast on top of this. Would anyone be keen to take this on? -hilmar P.S. Sorry for the cross-posting - I thought this is relevant to both communities. When responding in a project-specific way please make sure you remove the list that is no longer pertinent. Begin forwarded message: > From: Lucie Chan > Date: January 21, 2008 6:22:22 PM EST > To: Hilmar Lapp > Cc: Mark Miller , Rutger Vos , > Terri Liebowitz , Paul Hoover , > mtholder at ku.edu > Subject: Re: REST APIs for Cipres Web Portal > Reply-To: lcchan at sdsc.edu > > Hilmar, et al., > > I just released the first version of our REST Web Services API for > job submission, and job status query, and > job result file retrieval. I'd like to get some feedbacks (issues, > problems, improvements, suggestions, etc) from you. 
For > documentation on how to access the services, check it out at: > > http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST > API" below the "CIPRES PORTAL" banner. > > Lucie > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Jan 28 01:04:46 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 28 Jan 2008 00:04:46 -0600 Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal In-Reply-To: References: <4795292E.4030401@sdsc.edu> Message-ID: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu> We can certainly add it to the to-do list; just need to sort out the details (how often to allow posts, etc). I guess we would want this in the Bio::Tools::Run namespace, same as RemoteBlast? chris On Jan 27, 2008, at 11:27 PM, Hilmar Lapp wrote: > Some folks may remember that CIPRES (http://www.phylo.org) released > their portal with access to remote execution of several phylogenetic > tree reconstruction programs in spring last year. > > It took a while but they have now also built a really nice REST- > based API that makes the service fully programmable instead of > screen-scraping 5 pages: > > http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API) > > It should be relatively straightforward to build the equivalent of > RemoteBlast on top of this. Would anyone be keen to take this on? > > -hilmar > > P.S. Sorry for the cross-posting - I thought this is relevant to > both communities. When responding in a project-specific way please > make sure you remove the list that is no longer pertinent. 
> > > Begin forwarded message: > >> From: Lucie Chan >> Date: January 21, 2008 6:22:22 PM EST >> To: Hilmar Lapp >> Cc: Mark Miller , Rutger Vos , >> Terri Liebowitz , Paul Hoover , mtholder at ku.edu >> Subject: Re: REST APIs for Cipres Web Portal >> Reply-To: lcchan at sdsc.edu >> >> Hilmar, et al., >> >> I just released the first version of our REST Web Services API for >> job submission, and job status query, and >> job result file retrieval. I'd like to get some feedbacks (issues, >> problems, improvements, suggestions, etc) from you. For >> documentation on how to access the services, check it out at: >> >> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST >> API" below the "CIPRES PORTAL" banner. >> >> Lucie >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at duke.edu Mon Jan 28 08:42:39 2008 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 28 Jan 2008 08:42:39 -0500 Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal In-Reply-To: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu> References: <4795292E.4030401@sdsc.edu> <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu> Message-ID: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu> Yep that's what I was thinking. BTW the API needs multipart/form-data encoding for input (due to file upload); I'm assuming that that's supported well in LWP but if anyone knows where to start digging for that the pointer would be appreciated. 
-hilmar On Jan 28, 2008, at 1:04 AM, Chris Fields wrote: > We can certainly add it to the to-do list; just need to sort out > the details (how often to allow posts, etc). I guess we would want > this in the Bio::Tools::Run namespace, same as RemoteBlast? > > chris > > On Jan 27, 2008, at 11:27 PM, Hilmar Lapp wrote: > >> Some folks may remember that CIPRES (http://www.phylo.org) >> released their portal with access to remote execution of several >> phylogenetic tree reconstruction programs in spring last year. >> >> It took a while but they have now also built a really nice REST- >> based API that makes the service fully programmable instead of >> screen-scraping 5 pages: >> >> http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API) >> >> It should be relatively straightforward to build the equivalent of >> RemoteBlast on top of this. Would anyone be keen to take this on? >> >> -hilmar >> >> P.S. Sorry for the cross-posting - I thought this is relevant to >> both communities. When responding in a project-specific way please >> make sure you remove the list that is no longer pertinent. >> >> >> Begin forwarded message: >> >>> From: Lucie Chan >>> Date: January 21, 2008 6:22:22 PM EST >>> To: Hilmar Lapp >>> Cc: Mark Miller , Rutger Vos , >>> Terri Liebowitz , Paul Hoover , >>> mtholder at ku.edu >>> Subject: Re: REST APIs for Cipres Web Portal >>> Reply-To: lcchan at sdsc.edu >>> >>> Hilmar, et al., >>> >>> I just released the first version of our REST Web Services API >>> for job submission, and job status query, and >>> job result file retrieval. I'd like to get some feedbacks >>> (issues, problems, improvements, suggestions, etc) from you. For >>> documentation on how to access the services, check it out at: >>> >>> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST >>> API" below the "CIPRES PORTAL" banner. 
>>>
>>> Lucie
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
>> ===========================================================
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================

From cjfields at uiuc.edu Mon Jan 28 08:50:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 07:50:08 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
References: <4795292E.4030401@sdsc.edu> <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu> <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
Message-ID:

Googled it. From http://www.issociate.de/board/post/258535/LWP_-_multipart/form-data_file_upload_from_scalar_rather_than_local_file.html :

    use LWP::UserAgent;
    use HTTP::Request::Common qw(POST);   # provides POST()

    my $ua = LWP::UserAgent->new;
    # Upload the contents of a scalar as if it were a file.
    my $response = $ua->request(
        POST $URL,
            Content_Type => 'multipart/form-data',
            Content      => [ $PARAM => [ undef, $FILENAME, Content => $CONTENTS ] ]
    );

Where $PARAM is the name of the parameter, $FILENAME is what you want to call the file, and $CONTENTS is a scalar holding the contents of the file. Could probably use HTTP::Request in there, but whatever works.

chris

On Jan 28, 2008, at 7:42 AM, Hilmar Lapp wrote:

> Yep that's what I was thinking.
> > BTW the API needs multipart/form-data encoding for input (due to > file upload); I'm assuming that that's supported well in LWP but if > anyone knows where to start digging for that the pointer would be > appreciated. > > -hilmar > > On Jan 28, 2008, at 1:04 AM, Chris Fields wrote: > >> We can certainly add it to the to-do list; just need to sort out >> the details (how often to allow posts, etc). I guess we would want >> this in the Bio::Tools::Run namespace, same as RemoteBlast? >> >> chris >> >> On Jan 27, 2008, at 11:27 PM, Hilmar Lapp wrote: >> >>> Some folks may remember that CIPRES (http://www.phylo.org) >>> released their portal with access to remote execution of several >>> phylogenetic tree reconstruction programs in spring last year. >>> >>> It took a while but they have now also built a really nice REST- >>> based API that makes the service fully programmable instead of >>> screen-scraping 5 pages: >>> >>> http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API) >>> >>> It should be relatively straightforward to build the equivalent of >>> RemoteBlast on top of this. Would anyone be keen to take this on? >>> >>> -hilmar >>> >>> P.S. Sorry for the cross-posting - I thought this is relevant to >>> both communities. When responding in a project-specific way please >>> make sure you remove the list that is no longer pertinent. >>> >>> >>> Begin forwarded message: >>> >>>> From: Lucie Chan >>>> Date: January 21, 2008 6:22:22 PM EST >>>> To: Hilmar Lapp >>>> Cc: Mark Miller , Rutger Vos , >>>> Terri Liebowitz , Paul Hoover , mtholder at ku.edu >>>> Subject: Re: REST APIs for Cipres Web Portal >>>> Reply-To: lcchan at sdsc.edu >>>> >>>> Hilmar, et al., >>>> >>>> I just released the first version of our REST Web Services API >>>> for job submission, and job status query, and >>>> job result file retrieval. I'd like to get some feedbacks >>>> (issues, problems, improvements, suggestions, etc) from you. 
For
>>>> documentation on how to access the services, check it out at:
>>>>
>>>> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST
>>>> API" below the "CIPRES PORTAL" banner.
>>>>
>>>> Lucie
>>>>
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
>>> ===========================================================
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> ===========================================================
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From shandar at nibio.go.jp Sun Jan 27 01:50:40 2008
From: shandar at nibio.go.jp (Shandar Ahmad)
Date: Sun, 27 Jan 2008 15:50:40 +0900
Subject: [Bioperl-l] PRIB 2008
Message-ID: <1201416640.31793.7.camel@boe>

******* Our apologies if you received multiple copies ***********
If you do not wish to receive PRIB 2008 related emails, please write to Madhu Chetty and CC to me at shandar at nibio.go.jp
******************************************************************

PRELIMINARY CALL FOR PAPERS AND INVITED SESSIONS
********************************************************************************************
Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB 2008)
October 15-17, 2008
Melbourne, Australia
http://www.infotech.monash.edu.au/prib08
********************************************************************************************

PRIB 2008 aims to bring together top researchers, practitioners, and students from around the world to discuss the applications of pattern recognition methods in the field of bioinformatics to solve problems in life sciences. Pattern recognition techniques of interest include: statistical, syntactic, and structural approaches, Bayesian, hidden Markov and graphical models, neural networks, fuzzy and genetic algorithms, data mining, and their hybrids.

Papers in areas of (but not limited to) bio-sequence analysis, gene and protein expression analysis, structure prediction, protein folding, docking, metabolic pathway analysis and regulatory networks, systems biology, drug design, and bioimaging are solicited for presentation at the conference. All papers will be peer reviewed, and accepted papers will be published in the conference proceedings as an edited volume in Lecture Notes in Bioinformatics by Springer. Submission of papers will be electronic and through the conference website. Proposals for special sessions and tutorials at the conference are also invited in all related areas of research. Authors of selected papers presented at the conference will also be invited for publication in Special Issues of reputable journals.

Location: Melbourne is a sophisticated city in the south-east corner of mainland Australia. It is known for its attractive sightseeing spots, great events, passion for food and wine, and fabulous scenery. Known as a style-setter, Melbourne is home to a continuous program of festivals, art exhibitions and musical extravaganzas. Warning: you might never want to go home.
For the latest information on PRIB 2008, visit the conference web site: http://www.infotech.monash.edu.au/prib08 or email the secretariat at prib2008.melb at infotech.monash.edu.au

Important Deadlines
Paper submission: 15 April 2008
Proposals for Special Sessions/Tutorials: 15 March 2008
Author notification: 15 May 2008
Camera-ready papers: 15 June 2008

Organising Committee, PRIB 2008

From snoze.pa at gmail.com Mon Jan 28 16:07:37 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Mon, 28 Jan 2008 15:07:37 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
Message-ID: <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>

I am still getting the same error message. My question is: do I need to install bioperl-db for BioSQL? When I use BioSQL and try to load the NCBI taxonomy, it works fine. But when I try to install bioperl-db, it gives me the following error message when loading the NCBI taxonomy. Any help?

Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568

From susantoroy at gmail.com Mon Jan 28 16:05:49 2008
From: susantoroy at gmail.com (Susanta Roy)
Date: Tue, 29 Jan 2008 02:35:49 +0530
Subject: [Bioperl-l] Please remove my letter from your site
Message-ID: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>

Dear Sir,
Please remove my letter, which appears at the URLs below:
http://bioperl.org/pipermail/bioperl-l/2007-December/027004.html
http://bioperl.org/pipermail/bioperl-l/2007-December.txt
http://www.nabble.com/Enquiry-about-bioperl-project-td14522622.html

It is not supposed to appear online.
Thanks in advance.

Regards
Suisanta

From cjfields at uiuc.edu Mon Jan 28 16:53:33 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 15:53:33 -0600
Subject: [Bioperl-l] Please remove my letter from your site
In-Reply-To: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
References: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
Message-ID:

Um, you posted to a public mailing list (hence the list is open to the public, for searching, indexing via Google, etc). Terms of usage are here:

http://lists.open-bio.org/mailman/listinfo/bioperl-l

with more info here:

http://www.bioperl.org/wiki/Mailing_lists

BTW, this post will also appear. C'est la vie!

chris

On Jan 28, 2008, at 3:05 PM, Susanta Roy wrote:

> Dear Sir,
> Please remove my letter appearing at your below URL:
> http://bioperl.org/pipermail/bioperl-l/2007-December/027004.html
> http://bioperl.org/pipermail/bioperl-l/2007-December.txt
> http://www.nabble.com/Enquiry-about-bioperl-project-td14522622.html
>
>
> It is not supposed to appear online.
> Thanks in advance.
>
> Regards
> Suisanta
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From snoze.pa at gmail.com Tue Jan 29 12:15:41 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 11:15:41 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com> <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com> <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
Message-ID: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>

Dear Users,
I tried a fresh installation and it seems to be working. But when I load sequences, it gives me the following warning messages. Am I doing it right, or am I missing a huge chunk of sequences? Thanks in advance
s

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","1") FKs (27,3,4)
Duplicate entry '27-3-4-1' for key 2
---------------------------------------------------
...
...
and so on

From tristan.lefebure at gmail.com Tue Jan 29 12:19:23 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Jan 2008 12:19:23 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
Message-ID: <200801291219.23172.tristan.lefebure@gmail.com>

Hello,

I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
I first looked into Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank.
My script works well for short requests, but it gives the following error with the long request:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
500 short write
Content-Type: text/plain
Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
Client-Warning: Internal response

500 short write

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:58
---------------------------------------------------------

Does that mean that we can only fetch 500 sequences at a time?
Should I split my list into fragments of 500 IDs and submit them one after the other?

Any suggestions are very welcome...
Thanks,
-Tristan


Here is the script:

##################################
use strict;
use warnings;
use Bio::DB::GenBank;
# use Bio::DB::EUtilities;
use Bio::SeqIO;
use Getopt::Long;

# 2008-01-22 T Lefebure
# I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
# The following procedure is not really good as the stream is first copied to a temporary file,
# and then re-used by BioPerl to generate the final file.

my $db = 'nucleotide';
my $format = 'genbank';
my $help = '';
my $dformat = 'gb';

GetOptions(
    'help|?' => \$help,
    'format=s' => \$format,
    'database=s' => \$db,
);


my $printhelp = "\nUsage: $0 [options]

Will download the corresponding data from GenBank. BioPerl is required.

Options:
    -h
        print this help
    -format: genbank|fasta|...
        give output format (default=genbank)
    -database: nucleotide|genome|protein|...
        define the database to search in (default=nucleotide)

The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";

# Show the help text on request or when the two file arguments are missing.
if ($help or @ARGV < 2) {
    print $printhelp;
    exit;
}

# Read the accession list (one ID per line) and drop the trailing newlines.
open my $list_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
chomp(my @list = <$list_fh>);
close $list_fh;

if ($format eq 'fasta') { $dformat = 'fasta' }

my $gb = Bio::DB::GenBank->new( -retrievaltype => 'tempfile',
                                -format => $dformat,
                                -db => $db,
                              );
my $seqio = $gb->get_Stream_by_acc(\@list);

my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
                              -format => $format,
                            );
# Copy every retrieved record to the output file, reporting its ID.
while (my $seqo = $seqio->next_seq ) {
    print $seqo->id, "\n";
    $seqout->write_seq($seqo);
}

From cjfields at uiuc.edu Tue Jan 29 13:06:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:06:08 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>

Yes, you can only retrieve ~500 sequences at a time using Bio::DB::GenBank.
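
One way to live with the ~500-sequence limit mentioned here is to split the accession list into batches before fetching. A minimal sketch in plain Perl, assuming a batch size of 500; the helper names clean_ids and batches are made up for illustration, and the BioPerl calls are left as comments so the sketch runs without network access.

```perl
use strict;
use warnings;

# Strip surrounding whitespace from each line and drop empty lines.
sub clean_ids {
    my @ids;
    for my $line (@_) {
        (my $id = $line) =~ s/^\s+|\s+$//g;
        push @ids, $id if length $id;
    }
    return @ids;
}

# Split a list of IDs into consecutive batches of at most $size entries.
# Returns a list of array references.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" if $size < 1;
    my @out;
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

# Stand-in accession list; in the script above these would come from the list file.
my @ids = clean_ids(map { "ACC$_\n" } 1 .. 1201);
my @chunks = batches(500, @ids);
printf "%d ids in %d batches\n", scalar @ids, scalar @chunks;

for my $batch (@chunks) {
    # With BioPerl, each batch could be fetched as in the original script:
    #   my $seqio = $gb->get_Stream_by_acc($batch);
    #   while (my $seq = $seqio->next_seq) { $seqout->write_seq($seq) }
    #   sleep 3;   # assumed pause between requests, not an official figure
    print scalar(@$batch), " ids, first is $batch->[0]\n";
}
```

Keeping one output Bio::SeqIO object open across batches would let all records end up in a single file.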
Both Bio::DB::GenBank and Bio::DB::EUtilities interact with NCBI's EUtilities (the former module returns Bio::Seq/Bio::SeqIO objects, the latter module returns raw data from the URL to be processed later).

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets

You can usually post more IDs using epost and then fetch the sequences by referring to the WebEnv/key combo (batch posting). I try to make this a bit easier with EUtilities; it is woefully lacking in documentation (my fault), but there is some code up on the wiki which should work.

chris

On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:

> Hello,
>
> I would like to download a large number of sequences from GenBank
> (122,146 to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and
> finally used Bio::DB::GenBank.
> My script works well for short request, but it gives the following
> error with the long request:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/
> Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/
> DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/
> 5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/
> 5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list in 500 ids framents and submit them one after
> the other?
>
> Any suggestions very welcomed...
> Thanks, > -Tristan > > > Here is the script: > > ################################## > use strict; > use warnings; > use Bio::DB::GenBank; > # use Bio::DB::EUtilities; > use Bio::SeqIO; > use Getopt::Long; > > # 2008-01-22 T Lefebure > # I tried to use Bio::DB::EUtilities without much succes and get > back to Bio::DB::GenBank. > # The following procedure is not really good as the stream is first > copied to a temporary file, > # and than re-used by BioPerl to generate the final file. > > my $db = 'nucleotide'; > my $format = 'genbank'; > my $help= ''; > my $dformat = 'gb'; > > GetOptions( > 'help|?' => \$help, > 'format=s' => \$format, > 'database=s' => \$db, > ); > > > my $printhelp = "\nUsage: $0 [options] > > Will download the corresponding data from GenBank. BioPerl is > required. > > Options: > -h > print this help > -format: genbank|fasta|... > give output format (default=genbank) > -database: nucleotide|genome|protein|... > define the database to search in (default=nucleotide) > > The full description of the options can be find at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html > \n"; > > if ($#ARGV<1) { > print $printhelp; > exit; > } > > open LIST, $ARGV[0]; > my @list = ; > > if ($format eq 'fasta') { $dformat = 'fasta' } > > my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile', > -format => $dformat, > -db => $db, > ); > my $seqio = $gb->get_Stream_by_acc(\@list); > > my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]", > -format => $format, > ); > while (my $seqo = $seqio->next_seq ) { > print $seqo->id, "\n"; > $seqout->write_seq($seqo); > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From snoze.pa at gmail.com Tue Jan 29 13:22:56 2008 From: snoze.pa at gmail.com (snoze pa) Date: Tue, 29 Jan 2008 12:22:56 -0600 Subject: [Bioperl-l] loading sequence error bioseq Message-ID: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com> Dear User, After successfully creating a database bioseqdb and loading ncbi_taxonomy successfully I am getting following error message while loading sequences into database. load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","31") FKs MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were Column 'dbname' cannot be null STACK: /usr/local/bioperl- db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620 ----------------------------------------------------------- at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl line 633 Any Idea? Thanks in advance s From cjfields at uiuc.edu Tue Jan 29 13:44:16 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 29 Jan 2008 12:44:16 -0600 Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests In-Reply-To: <479F7149.1010203@atgc.org> References: <200801291219.23172.tristan.lefebure@gmail.com> <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu> <479F7149.1010203@atgc.org> Message-ID: Forgot about that one; it's definitely a better way to do it if you have the GI/accessions. chris On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote: > you don't need to use bioperl to accomplish this task, to download > several thousand sequences based on accession ID list. > > NCBI batch Entrez can do that: > http://www.ncbi.nlm.nih.gov/sites/batchentrez > > just submit a large list of IDs, select database, and download. > > you can submit ~50,000 IDs in one file usually without problems. 
> it may not return results if a list is larger than ~100,000 IDs > > -- > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 Health Sciences Drive > Genome Center, 4-th floor, room 4302 > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email#1: akozik at atgc.org > email#2: akozik at gmail.com > web: http://www.atgc.org/ > > > > Chris Fields wrote: >> Yes, you can only retrieve ~500 sequences at a time using either >> Bio::DB::GenBank. Both Bio::DB::GenBank and Bio::DB::EUtilities >> interact with NCBI's EUtilities (the former module returns raw data >> from the URL to be processed later, the latter module returns >> Bio::Seq/Bio::SeqIO objects). >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets >> You can usually post more IDs using epost and fetch sequence >> referring to the WebEnv/key combo (batch posting). I try to make >> this a bit easier with EUtilities but it is woefully lacking in >> documentation (my fault), but there is some code up on the wiki >> which should work. >> chris >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote: >>> Hello, >>> >>> I would like to download a large number of sequences from GenBank >>> (122,146 to be exact) following a list of accession numbers. >>> I first investigated around Bio::DB::EUtilities, but got lost and >>> finally used Bio::DB::GenBank. 
>>> My script works well for short request, but it gives the following >>> error with the long request: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: WebDBSeqI Request Error: >>> 500 short write >>> Content-Type: text/plain >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT >>> Client-Warning: Internal response >>> >>> 500 short write >>> >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ >>> Root.pm:359 >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/ >>> Bio/DB/WebDBSeqI.pm:685 >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ >>> 5.8.8/Bio/DB/WebDBSeqI.pm:472 >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/ >>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361 >>> STACK: ./fetch_from_genbank.pl:58 >>> --------------------------------------------------------- >>> >>> Does that mean that we can only fetch 500 sequences at a time? >>> Should I split my list in 500 ids framents and submit them one >>> after the other? >>> >>> Any suggestions very welcomed... >>> Thanks, >>> -Tristan >>> >>> >>> Here is the script: >>> >>> ################################## >>> use strict; >>> use warnings; >>> use Bio::DB::GenBank; >>> # use Bio::DB::EUtilities; >>> use Bio::SeqIO; >>> use Getopt::Long; >>> >>> # 2008-01-22 T Lefebure >>> # I tried to use Bio::DB::EUtilities without much succes and get >>> back to Bio::DB::GenBank. >>> # The following procedure is not really good as the stream is >>> first copied to a temporary file, >>> # and than re-used by BioPerl to generate the final file. >>> >>> my $db = 'nucleotide'; >>> my $format = 'genbank'; >>> my $help= ''; >>> my $dformat = 'gb'; >>> >>> GetOptions( >>> 'help|?' => \$help, >>> 'format=s' => \$format, >>> 'database=s' => \$db, >>> ); >>> >>> >>> my $printhelp = "\nUsage: $0 [options] >>> >>> Will download the corresponding data from GenBank. BioPerl is >>> required. 
>>> >>> Options: >>> -h >>> print this help >>> -format: genbank|fasta|... >>> give output format (default=genbank) >>> -database: nucleotide|genome|protein|... >>> define the database to search in (default=nucleotide) >>> >>> The full description of the options can be find at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html >>> \n"; >>> >>> if ($#ARGV<1) { >>> print $printhelp; >>> exit; >>> } >>> >>> open LIST, $ARGV[0]; >>> my @list = ; >>> >>> if ($format eq 'fasta') { $dformat = 'fasta' } >>> >>> my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile', >>> -format => $dformat, >>> -db => $db, >>> ); >>> my $seqio = $gb->get_Stream_by_acc(\@list); >>> >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]", >>> -format => $format, >>> ); >>> while (my $seqo = $seqio->next_seq ) { >>> print $seqo->id, "\n"; >>> $seqout->write_seq($seqo); >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From akozik at atgc.org Tue Jan 29 13:32:41 2008 From: akozik at atgc.org (Alexander Kozik) Date: Tue, 29 Jan 2008 10:32:41 -0800 Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests In-Reply-To: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu> References: <200801291219.23172.tristan.lefebure@gmail.com> <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu> Message-ID: <479F7149.1010203@atgc.org> you don't need to use bioperl to accomplish this task, to download several thousand sequences based on accession ID list. NCBI batch Entrez can do that: http://www.ncbi.nlm.nih.gov/sites/batchentrez just submit a large list of IDs, select database, and download. you can submit ~50,000 IDs in one file usually without problems. it may not return results if a list is larger than ~100,000 IDs -- Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 Health Sciences Drive Genome Center, 4-th floor, room 4302 University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email#1: akozik at atgc.org email#2: akozik at gmail.com web: http://www.atgc.org/ Chris Fields wrote: > Yes, you can only retrieve ~500 sequences at a time using either > Bio::DB::GenBank. Both Bio::DB::GenBank and Bio::DB::EUtilities > interact with NCBI's EUtilities (the former module returns raw data from > the URL to be processed later, the latter module returns > Bio::Seq/Bio::SeqIO objects). > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets > > > You can usually post more IDs using epost and fetch sequence referring > to the WebEnv/key combo (batch posting). I try to make this a bit > easier with EUtilities but it is woefully lacking in documentation (my > fault), but there is some code up on the wiki which should work. 
> > chris > > On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote: > >> Hello, >> >> I would like to download a large number of sequences from GenBank >> (122,146 to be exact) following a list of accession numbers. >> I first investigated around Bio::DB::EUtilities, but got lost and >> finally used Bio::DB::GenBank. >> My script works well for short request, but it gives the following >> error with the long request: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: WebDBSeqI Request Error: >> 500 short write >> Content-Type: text/plain >> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT >> Client-Warning: Internal response >> >> 500 short write >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::DB::WebDBSeqI::_request >> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685 >> STACK: Bio::DB::WebDBSeqI::get_seq_stream >> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472 >> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc >> /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361 >> STACK: ./fetch_from_genbank.pl:58 >> --------------------------------------------------------- >> >> Does that mean that we can only fetch 500 sequences at a time? >> Should I split my list in 500 ids framents and submit them one after >> the other? >> >> Any suggestions very welcomed... >> Thanks, >> -Tristan >> >> >> Here is the script: >> >> ################################## >> use strict; >> use warnings; >> use Bio::DB::GenBank; >> # use Bio::DB::EUtilities; >> use Bio::SeqIO; >> use Getopt::Long; >> >> # 2008-01-22 T Lefebure >> # I tried to use Bio::DB::EUtilities without much succes and get back >> to Bio::DB::GenBank. >> # The following procedure is not really good as the stream is first >> copied to a temporary file, >> # and than re-used by BioPerl to generate the final file. 
>> >> my $db = 'nucleotide'; >> my $format = 'genbank'; >> my $help= ''; >> my $dformat = 'gb'; >> >> GetOptions( >> 'help|?' => \$help, >> 'format=s' => \$format, >> 'database=s' => \$db, >> ); >> >> >> my $printhelp = "\nUsage: $0 [options] >> >> Will download the corresponding data from GenBank. BioPerl is required. >> >> Options: >> -h >> print this help >> -format: genbank|fasta|... >> give output format (default=genbank) >> -database: nucleotide|genome|protein|... >> define the database to search in (default=nucleotide) >> >> The full description of the options can be find at >> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n"; >> >> if ($#ARGV<1) { >> print $printhelp; >> exit; >> } >> >> open LIST, $ARGV[0]; >> my @list = ; >> >> if ($format eq 'fasta') { $dformat = 'fasta' } >> >> my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile', >> -format => $dformat, >> -db => $db, >> ); >> my $seqio = $gb->get_Stream_by_acc(\@list); >> >> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]", >> -format => $format, >> ); >> while (my $seqo = $seqio->next_seq ) { >> print $seqo->id, "\n"; >> $seqout->write_seq($seqo); >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Tue Jan 29 16:31:47 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 29 Jan 2008 16:31:47 -0500 Subject: [Bioperl-l] loading sequence error bioseq In-Reply-To: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com> References: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com> Message-ID: This looks suspiciously like a data error. Can you please give the full command line. This should also show which format your sequences are in. -hilmar On Jan 29, 2008, at 1:22 PM, snoze pa wrote: > Dear User, > > After successfully creating a database bioseqdb and loading > ncbi_taxonomy > successfully I am getting following error message while loading > sequences > into database. > > load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc > > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, > values > were ("","31") FKs > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, > values > were > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were > > Column 'dbname' cannot be null > > STACK: /usr/local/bioperl- > db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620 > ----------------------------------------------------------- > > at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/ > load_seqdatabase.pl line > 633 > > Any Idea? 
> > Thanks in advance > s > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Jan 29 16:40:21 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 29 Jan 2008 16:40:21 -0500 Subject: [Bioperl-l] bioseqDB error In-Reply-To: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com> References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com> <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com> <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net> <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com> Message-ID: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net> This would mean that two or more seqfeatures with the same type for the same sequence exist in the input data, each with rank 1. Normally the rank will be incremented for each seqfeature of a sequence, so I'm not sure how this is happening here w/o seeing the data. -hilmar On Jan 29, 2008, at 12:15 PM, snoze pa wrote: > Dear Users, > I tried the to refresh installation and seems it is working. But > when I > loading sequences then it is giving me following warning messages. > Am i > doing alright? or i am missing huge chunk of sequences..Thanks in > advance > s > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, > values > were ("","1") FKs (27,3,4) > Duplicate entry '27-3-4-1' for key 2 > --------------------------------------------------- > ... > ... 
> and so on
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From avilella at gmail.com Wed Jan 30 04:28:34 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 30 Jan 2008 09:28:34 +0000
Subject: [Bioperl-l] fetch dna seqs from genbank protein ids
Message-ID: <358f4d650801300128q44cf95a0va11799908c4f26a0@mail.gmail.com>

Hi bioperlers,

Got a question here:

>I have a bunch of protein sequences in multi-FastA with their
>accession numbers in the header and I want to retrieve their
>corresponding nucleotide sequences and nucleotide accession numbers.
>I can't seem to find a way to do it. I am looking at eUtils on the
>NCBI site, but they only do really simple stuff.

I had a look at the fetch example scripts, and I could fetch proteins
from Genbank, but I don't see a clear connection between the protein
sequence and the DNA sequence.

Is this a DBlink? Which type?

Cheers,

Albert.

From tristan.lefebure at gmail.com Wed Jan 30 09:56:07 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 30 Jan 2008 09:56:07 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To:
References: <200801291219.23172.tristan.lefebure@gmail.com> <479F7149.1010203@atgc.org>
Message-ID: <200801300956.07849.tristan.lefebure@gmail.com>

Thank you both!

Just in case it might be useful for someone else, here are my ramblings:

1. I first tried to adapt my script and fetch 500 sequences at a time. It works, except that ~40% of the time NCBI gives the following error and my script crashes:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
[...]
The proxy server received an invalid response from an upstream server.
[...]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:68
-----------------------------------------------------------

I tried to modify the script so that when the retrieval of a 500-sequence block crashes, it continues with the other blocks, but I was unsuccessful. It probably needs some better understanding of BioPerl errors...
Here is the section of the script that was modified:
#########
my $n_seq = scalar @list;
my @aborted;

for (my $i=1; $i<=$n_seq; $i += 500) {
    print "Fetching sequences $i to ", $i+499, ": ";
    my $start = $i -1;
    my $end = $i + 500 -1;
    my @red_list = @list[$start .. $end];
    my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile',
        -format => $dformat,
        -db => $db,
    );

    my $seqio;
    unless( $seqio = $gb->get_Stream_by_acc(\@red_list)) {
        print "Aborted, resubmit latter\n";
        push @aborted, @red_list;
        next;
    }

    my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
        -format => $format,
    );
    while (my $seqo = $seqio->next_seq ) {
        # print $seqo->id, "\n";
        $seqout->write_seq($seqo);
    }
    print "Done\n";
}

if (@aborted) {
    open OUT, ">aborted_fetching.AN";
    foreach (@aborted) { print OUT $_ };
}
##########


2. So I moved to the second solution and tried batchentrez. I cut my 120,000-long AN list into 10,000-long pieces using split:
split -l 10000 full_list.AN splitted_list_

and then submitted the 13 lists one by one. I must say that I don't really like using a web interface to fetch data, and here the most annoying part is that you end up with a regular Entrez/GenBank webpage: select your format, export to file, choose a file name... and have to do it many times.
It is too prone to human and web-browser errors for my taste, but it worked.
Nevertheless there are some caveats:
- some downloaded files were incomplete (~10%) and you have to restart the download
- you can't submit several lists at the same time (otherwise the same cookie will be used and you'll end up with several identical files)

-Tristan

On Tuesday 29 January 2008 13:44:16 you wrote:
> Forgot about that one; it's definitely a better way to do it if you
> have the GI/accessions.
>
> chris
>
> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> > you don't need to use bioperl to accomplish this task, to download
> > several thousand sequences based on accession ID list.
> >
> > NCBI batch Entrez can do that:
> > http://www.ncbi.nlm.nih.gov/sites/batchentrez
> >
> > just submit a large list of IDs, select database, and download.
> >
> > you can submit ~50,000 IDs in one file usually without problems.
> > it may not return results if a list is larger than ~100,000 IDs
> >
> > --
> > Alexander Kozik
> > Bioinformatics Specialist
> > Genome and Biomedical Sciences Facility
> > 451 Health Sciences Drive
> > Genome Center, 4-th floor, room 4302
> > University of California
> > Davis, CA 95616-8816
> > Phone: (530) 754-9127
> > email#1: akozik at atgc.org
> > email#2: akozik at gmail.com
> > web: http://www.atgc.org/
> >
> > Chris Fields wrote:
> >> Yes, you can only retrieve ~500 sequences at a time using either
> >> Bio::DB::GenBank. Both Bio::DB::GenBank and Bio::DB::EUtilities
> >> interact with NCBI's EUtilities (the former module returns raw data
> >> from the URL to be processed later, the latter module returns
> >> Bio::Seq/Bio::SeqIO objects).
> >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-d
> >>atasets You can usually post more IDs using epost and fetch sequence
> >> referring to the WebEnv/key combo (batch posting).
I try to make > >> this a bit easier with EUtilities but it is woefully lacking in > >> documentation (my fault), but there is some code up on the wiki > >> which should work. > >> chris > >> > >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote: > >>> Hello, > >>> > >>> I would like to download a large number of sequences from GenBank > >>> (122,146 to be exact) following a list of accession numbers. > >>> I first investigated around Bio::DB::EUtilities, but got lost and > >>> finally used Bio::DB::GenBank. > >>> My script works well for short request, but it gives the following > >>> error with the long request: > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: WebDBSeqI Request Error: > >>> 500 short write > >>> Content-Type: text/plain > >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT > >>> Client-Warning: Internal response > >>> > >>> 500 short write > >>> > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ > >>> Root.pm:359 > >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/ > >>> Bio/DB/WebDBSeqI.pm:685 > >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ > >>> 5.8.8/Bio/DB/WebDBSeqI.pm:472 > >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/ > >>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361 > >>> STACK: ./fetch_from_genbank.pl:58 > >>> --------------------------------------------------------- > >>> > >>> Does that mean that we can only fetch 500 sequences at a time? > >>> Should I split my list in 500 ids framents and submit them one > >>> after the other? > >>> > >>> Any suggestions very welcomed... 
> >>> Thanks, > >>> -Tristan > >>> > >>> > >>> Here is the script: > >>> > >>> ################################## > >>> use strict; > >>> use warnings; > >>> use Bio::DB::GenBank; > >>> # use Bio::DB::EUtilities; > >>> use Bio::SeqIO; > >>> use Getopt::Long; > >>> > >>> # 2008-01-22 T Lefebure > >>> # I tried to use Bio::DB::EUtilities without much succes and get > >>> back to Bio::DB::GenBank. > >>> # The following procedure is not really good as the stream is > >>> first copied to a temporary file, > >>> # and than re-used by BioPerl to generate the final file. > >>> > >>> my $db = 'nucleotide'; > >>> my $format = 'genbank'; > >>> my $help= ''; > >>> my $dformat = 'gb'; > >>> > >>> GetOptions( > >>> 'help|?' => \$help, > >>> 'format=s' => \$format, > >>> 'database=s' => \$db, > >>> ); > >>> > >>> > >>> my $printhelp = "\nUsage: $0 [options] > >>> > >>> Will download the corresponding data from GenBank. BioPerl is > >>> required. > >>> > >>> Options: > >>> -h > >>> print this help > >>> -format: genbank|fasta|... > >>> give output format (default=genbank) > >>> -database: nucleotide|genome|protein|... 
> >>> define the database to search in (default=nucleotide)
> >>>
> >>> The full description of the options can be find at
> >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
> >>> \n";
> >>>
> >>> if ($#ARGV<1) {
> >>>     print $printhelp;
> >>>     exit;
> >>> }
> >>>
> >>> open LIST, $ARGV[0];
> >>> my @list = <LIST>;
> >>>
> >>> if ($format eq 'fasta') { $dformat = 'fasta' }
> >>>
> >>> my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile',
> >>>     -format => $dformat,
> >>>     -db => $db,
> >>> );
> >>> my $seqio = $gb->get_Stream_by_acc(\@list);
> >>>
> >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> >>>     -format => $format,
> >>> );
> >>> while (my $seqo = $seqio->next_seq ) {
> >>>     print $seqo->id, "\n";
> >>>     $seqout->write_seq($seqo);
> >>> }
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Jan 30 10:10:14 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 30 Jan 2008 09:10:14 -0600 Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com> References: <200801291219.23172.tristan.lefebure@gmail.com> <479F7149.1010203@atgc.org> <200801300956.07849.tristan.lefebure@gmail.com> Message-ID: <7143A650-AA84-4331-B55A-A66C3F5BBAB0@uiuc.edu> You can use an eval {} block to catch the error, then redo the loop (so you don't iterate to the next block) or use next and skip the current block if an error occurs. If you use redo then you should use a counter to exit the loop after several tries. chris On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote: > Thank you both! > > Just in case it might be usefull for someone else, here are my > ramblings: > > 1. I first tried to adapt my script and fetch 500 sequences at a > time. It works, except that ~40% of the time NCBI gives the > following error and my script crashed: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: WebDBSeqI Request Error: > [...] > The proxy server received an invalid > response from an upstream server. > [...] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ > Root.pm:359 > STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/ > DB/WebDBSeqI.pm:685 > STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ > 5.8.8/Bio/DB/WebDBSeqI.pm:472 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/ > 5.8.8/Bio/DB/NCBIHelper.pm:361 > STACK: ./fetch_from_genbank.pl:68 > ----------------------------------------------------------- > > I tried to modify the script so that when the retrieval of a 500 > sequence block crashes, it continues with the other blocks, but I > was unsuccessfull. 
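The eval {} / redo approach with a retry counter suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: fetch_in_batches, its arguments and the $flaky stub are made-up names, not part of BioPerl. In a real script the callback would wrap $gb->get_Stream_by_acc(\@batch) and the SeqIO read/write loop. The stack traces in this thread show the failure being raised through Bio::Root::Root::throw, so it has to be caught with eval; testing the return value is not enough. The slice is also clamped to the end of the list so that the last block does not pick up undefined entries.

```perl
use strict;
use warnings;

# Hypothetical helper: process @$ids in blocks of $batch_size, calling
# $fetch->(\@batch) for each block. A block whose fetch dies is retried
# (redo) up to $max_tries times, then recorded as aborted and skipped.
sub fetch_in_batches {
    my ($ids, $batch_size, $max_tries, $fetch) = @_;
    my (@fetched, @aborted);
    my $tries = 0;    # failed attempts for the current block

    for (my $start = 0; $start < @$ids; $start += $batch_size) {
        my $end = $start + $batch_size - 1;
        $end = $#{$ids} if $end > $#{$ids};    # clamp the last block
        my @batch = @{$ids}[$start .. $end];

        my @records;
        # eval returns 1 only if the fetch did not die
        my $ok = eval { @records = $fetch->(\@batch); 1 };

        if ($ok) {
            push @fetched, @records;
            $tries = 0;
            next;                              # move on to the next block
        }
        if (++$tries < $max_tries) {
            redo;                              # same block again, $start unchanged
        }
        push @aborted, @batch;                 # give up on this block
        $tries = 0;
    }
    return (\@fetched, \@aborted);
}

# Stand-in for the network call (an assumption for this demo): the block
# starting with ID3 fails once, the block starting with ID5 always fails.
my %calls;
my $flaky = sub {
    my ($batch) = @_;
    my $key = $batch->[0];
    $calls{$key}++;
    die "proxy error\n"  if $key eq 'ID3' && $calls{$key} == 1;
    die "always fails\n" if $key eq 'ID5';
    return map { "record:$_" } @$batch;
};

my @ids = map { "ID$_" } 1 .. 5;
my ($fetched, $aborted) = fetch_in_batches(\@ids, 2, 3, $flaky);
print "fetched: @$fetched\n";    # fetched: record:ID1 record:ID2 record:ID3 record:ID4
print "aborted: @$aborted\n";    # aborted: ID5
```

The aborted IDs can then be written to a file and resubmitted later, as the original script intended.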
It probably needs some better understanding of > BioPerl errors... > Here is the section of the script that was modified: > ######### > my $n_seq = scalar @list; > my @aborted; > > for (my $i=1; $i<=$n_seq; $i += 500) { > print "Fetching sequences $i to ", $i+499, ": "; > my $start = $i -1; > my $end = $i + 500 -1; > my @red_list = @list[$start .. $end]; > my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile', > -format => $dformat, > -db => $db, > ); > > my $seqio; > unless( $seqio = $gb->get_Stream_by_acc(\@red_list)) { > print "Aborted, resubmit latter\n"; > push @aborted, @red_list; > next; > } > > my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i", > -format => $format, > ); > while (my $seqo = $seqio->next_seq ) { > # print $seqo->id, "\n"; > $seqout->write_seq($seqo); > } > print "Done\n"; > } > > if (@aborted) { > open OUT, ">aborted_fetching.AN"; > foreach (@aborted) { print OUT $_ }; > } > ########## > > > 2. So I moved to the second solution and tried batchentrez. I cut my > 120,000 long AN list into 10,000 long pieces using split: > split -l 10000 full_list.AN splitted_list_ > > and then submitted the 13 lists one by one. I must say that I don't > really like using a web-interface to fetch data, and here the most > ennoying part is that you end up with a regular Entrez/GenBank > webpage: select your format, export to file, chosse file name... and > have to do it many times. > It is too much prone to human and web-browser errors for my taste, > but it worked. > Nevertheless there is some caveats: > - some downloaded files were incomplete (~10%) and you have to > restart it > - you can't submit several lists in the same time (otherwise the > same cookie will be used and you'll end up with several identical > files) > > -Tristan > > On Tuesday 29 January 2008 13:44:16 you wrote: >> Forgot about that one; it's definitely a better way to do it if you >> have the GI/accessions. 
>> >> chris >> >> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote: >>> you don't need to use bioperl to accomplish this task, to download >>> several thousand sequences based on accession ID list. >>> >>> NCBI batch Entrez can do that: >>> http://www.ncbi.nlm.nih.gov/sites/batchentrez >>> >>> just submit a large list of IDs, select database, and download. >>> >>> you can submit ~50,000 IDs in one file usually without problems. >>> it may not return results if a list is larger than ~100,000 IDs >>> >>> -- >>> Alexander Kozik >>> Bioinformatics Specialist >>> Genome and Biomedical Sciences Facility >>> 451 Health Sciences Drive >>> Genome Center, 4-th floor, room 4302 >>> University of California >>> Davis, CA 95616-8816 >>> Phone: (530) 754-9127 >>> email#1: akozik at atgc.org >>> email#2: akozik at gmail.com >>> web: http://www.atgc.org/ >>> >>> Chris Fields wrote: >>>> Yes, you can only retrieve ~500 sequences at a time using either >>>> Bio::DB::GenBank. Both Bio::DB::GenBank and Bio::DB::EUtilities >>>> interact with NCBI's EUtilities (the former module returns raw data >>>> from the URL to be processed later, the latter module returns >>>> Bio::Seq/Bio::SeqIO objects). >>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-d >>>> atasets You can usually post more IDs using epost and fetch >>>> sequence >>>> referring to the WebEnv/key combo (batch posting). I try to make >>>> this a bit easier with EUtilities but it is woefully lacking in >>>> documentation (my fault), but there is some code up on the wiki >>>> which should work. >>>> chris >>>> >>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote: >>>>> Hello, >>>>> >>>>> I would like to download a large number of sequences from GenBank >>>>> (122,146 to be exact) following a list of accession numbers. >>>>> I first investigated around Bio::DB::EUtilities, but got lost and >>>>> finally used Bio::DB::GenBank. 
>>>>> My script works well for short request, but it gives the following >>>>> error with the long request: >>>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: WebDBSeqI Request Error: >>>>> 500 short write >>>>> Content-Type: text/plain >>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT >>>>> Client-Warning: Internal response >>>>> >>>>> 500 short write >>>>> >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/ >>>>> Root/ >>>>> Root.pm:359 >>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/ >>>>> Bio/DB/WebDBSeqI.pm:685 >>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ >>>>> 5.8.8/Bio/DB/WebDBSeqI.pm:472 >>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/ >>>>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361 >>>>> STACK: ./fetch_from_genbank.pl:58 >>>>> --------------------------------------------------------- >>>>> >>>>> Does that mean that we can only fetch 500 sequences at a time? >>>>> Should I split my list in 500 ids framents and submit them one >>>>> after the other? >>>>> >>>>> Any suggestions very welcomed... >>>>> Thanks, >>>>> -Tristan >>>>> >>>>> >>>>> Here is the script: >>>>> >>>>> ################################## >>>>> use strict; >>>>> use warnings; >>>>> use Bio::DB::GenBank; >>>>> # use Bio::DB::EUtilities; >>>>> use Bio::SeqIO; >>>>> use Getopt::Long; >>>>> >>>>> # 2008-01-22 T Lefebure >>>>> # I tried to use Bio::DB::EUtilities without much succes and get >>>>> back to Bio::DB::GenBank. >>>>> # The following procedure is not really good as the stream is >>>>> first copied to a temporary file, >>>>> # and than re-used by BioPerl to generate the final file. >>>>> >>>>> my $db = 'nucleotide'; >>>>> my $format = 'genbank'; >>>>> my $help= ''; >>>>> my $dformat = 'gb'; >>>>> >>>>> GetOptions( >>>>> 'help|?' 
=> \$help,
>>>>>     'format=s' => \$format,
>>>>>     'database=s' => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]
>>>>>
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>> -h
>>>>> print this help
>>>>> -format: genbank|fasta|...
>>>>> give output format (default=genbank)
>>>>> -database: nucleotide|genome|protein|...
>>>>> define the database to search in (default=nucleotide)
>>>>>
>>>>> The full description of the options can be find at
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/
>>>>> efetchseq_help.html
>>>>> \n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>     print $printhelp;
>>>>>     exit;
>>>>> }
>>>>>
>>>>> open LIST, $ARGV[0];
>>>>> my @list = <LIST>;
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile',
>>>>>     -format => $dformat,
>>>>>     -db => $db,
>>>>> );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>     -format => $format,
>>>>> );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>     print $seqo->id, "\n";
>>>>>     $seqout->write_seq($seqo);
>>>>> }
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr.
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From snoze.pa at gmail.com Wed Jan 30 12:34:24 2008 From: snoze.pa at gmail.com (snoze pa) Date: Wed, 30 Jan 2008 11:34:24 -0600 Subject: [Bioperl-l] bioseqDB error In-Reply-To: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net> References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com> <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com> <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net> <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com> <31534016-91B3-45C0-995D-CE5A82466303@gmx.net> Message-ID: <10f848910801300934q57e5d45cpbf0e17b45640e3f9@mail.gmail.com> Hilmar, The command I am using is following load_seqdatabase.pl -host localhost -namespace bioperl -dbname bioseqdb -dbuser root -format genbank sequences.txt I have no idea why i am getting that error thanks in advance On Jan 29, 2008 3:40 PM, Hilmar Lapp wrote: > This would mean that two or more seqfeatures with the same type for > the same sequence exist in the input data, each with rank 1. > > Normally the rank will be incremented for each seqfeature of a > sequence, so I'm not sure how this is happening here w/o seeing the > data. > > -hilmar > On Jan 29, 2008, at 12:15 PM, snoze pa wrote: > > > Dear Users, > > I tried the to refresh installation and seems it is working. But > > when I > > loading sequences then it is giving me following warning messages. > > Am i > > doing alright? 
or i am missing huge chunk of sequences..Thanks in > > advance > > s > > > > -------------------- WARNING --------------------- > > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, > > values > > were ("","1") FKs (27,3,4) > > Duplicate entry '27-3-4-1' for key 2 > > --------------------------------------------------- > > ... > > ... > > and so on > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > From snoze.pa at gmail.com Wed Jan 30 13:01:46 2008 From: snoze.pa at gmail.com (snoze pa) Date: Wed, 30 Jan 2008 12:01:46 -0600 Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com> References: <200801291219.23172.tristan.lefebure@gmail.com> Message-ID: <10f848910801301001k681e1291we0ce468e96d88f57@mail.gmail.com> U can use LWP one line code to grab sequences.. On Jan 29, 2008 11:19 AM, Tristan Lefebure wrote: > Hello, > > I would like to download a large number of sequences from GenBank (122,146 > to be exact) following a list of accession numbers. > I first investigated around Bio::DB::EUtilities, but got lost and finally > used Bio::DB::GenBank. 
> My script works well for short request, but it gives the following error > with the long request: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: WebDBSeqI Request Error: > 500 short write > Content-Type: text/plain > Client-Date: Tue, 29 Jan 2008 17:22:46 GMT > Client-Warning: Internal response > > 500 short write > > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::DB::WebDBSeqI::_request > /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361 > STACK: ./fetch_from_genbank.pl:58 > --------------------------------------------------------- > > Does that mean that we can only fetch 500 sequences at a time? > Should I split my list in 500 ids framents and submit them one after the > other? > > Any suggestions very welcomed... > Thanks, > -Tristan > > > Here is the script: > > ################################## > use strict; > use warnings; > use Bio::DB::GenBank; > # use Bio::DB::EUtilities; > use Bio::SeqIO; > use Getopt::Long; > > # 2008-01-22 T Lefebure > # I tried to use Bio::DB::EUtilities without much succes and get back to > Bio::DB::GenBank. > # The following procedure is not really good as the stream is first copied > to a temporary file, > # and than re-used by BioPerl to generate the final file. > > my $db = 'nucleotide'; > my $format = 'genbank'; > my $help= ''; > my $dformat = 'gb'; > > GetOptions( > 'help|?' => \$help, > 'format=s' => \$format, > 'database=s' => \$db, > ); > > > my $printhelp = "\nUsage: $0 [options] > > Will download the corresponding data from GenBank. BioPerl is required. > > Options: > -h > print this help > -format: genbank|fasta|... > give output format (default=genbank) > -database: nucleotide|genome|protein|... 
> define the database to search in (default=nucleotide)
>
> The full description of the options can be find at
> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n
> ";
>
> if ($#ARGV<1) {
>     print $printhelp;
>     exit;
> }
>
> open LIST, $ARGV[0];
> my @list = <LIST>;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile',
>     -format => $dformat,
>     -db => $db,
> );
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>     -format => $format,
> );
> while (my $seqo = $seqio->next_seq ) {
>     print $seqo->id, "\n";
>     $seqout->write_seq($seqo);
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From snoze.pa at gmail.com Wed Jan 30 13:38:12 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:38:12 -0600
Subject: [Bioperl-l] load_seqdatabase help
Message-ID: <10f848910801301038t1ae296c2o2453728b68dc81f8@mail.gmail.com>

Dear User,

Is there an alternative way to load the following sequence into the BioSQL schema? I am trying to use load_seqdatabase.pl, but it is not working in my case and shows a number of warning/error messages. I have tried everything but am still unable to load it.

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

Any help with loading the above sequence into my bioseqdb database would be appreciated.

Thanks in advance
s

From snoze.pa at gmail.com Wed Jan 30 14:30:22 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 13:30:22 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
Message-ID: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>

Hi Hilmar,

After spending a lot of time I figured out the error. I am able to load sequences as long as they do not contain the following entry:

xrefs (non-sequence databases):

If the GenBank sequence has this entry, the load_seqdatabase.pl script crashes. I tried it on a couple of sequences and found that this line of the GenBank format is the culprit. But this line is important, as it contains a lot of information, so I am wondering how to solve this problem.

Any help?

Thanks in advance
s

From Russell.Smithies at agresearch.co.nz Wed Jan 30 14:34:44 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 31 Jan 2008 08:34:44 +1300
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com><479F7149.1010203@atgc.org> <200801300956.07849.tristan.lefebure@gmail.com>
Message-ID:

Take a look at http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi

Ebot is an interactive tool that generates a Perl script that implements an E-utility pipeline.
You can probably hack the resulting script to introduce the required BioPerly bits.

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-
> bio.org] On Behalf Of Tristan Lefebure
> Sent: Thursday, 31 January 2008 3:56 a.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::GenBank and large number of requests
>
> Thank you both!
>
> Just in case it might be usefull for someone else, here are my ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a time.
It works, > except that ~40% of the time NCBI gives the following error and my script crashed: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: WebDBSeqI Request Error: > [...] > The proxy server received an invalid > response from an upstream server. > [...] > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::DB::WebDBSeqI::_request > /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361 > STACK: ./fetch_from_genbank.pl:68 > ----------------------------------------------------------- > > I tried to modify the script so that when the retrieval of a 500 sequence block > crashes, it continues with the other blocks, but I was unsuccessfull. It probably > needs some better understanding of BioPerl errors... > Here is the section of the script that was modified: > ######### > my $n_seq = scalar @list; > my @aborted; > > for (my $i=1; $i<=$n_seq; $i += 500) { > print "Fetching sequences $i to ", $i+499, ": "; > my $start = $i -1; > my $end = $i + 500 -1; > my @red_list = @list[$start .. $end]; > my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile', > -format => $dformat, > -db => $db, > ); > > my $seqio; > unless( $seqio = $gb->get_Stream_by_acc(\@red_list)) { > print "Aborted, resubmit latter\n"; > push @aborted, @red_list; > next; > } > > my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i", > -format => $format, > ); > while (my $seqo = $seqio->next_seq ) { > # print $seqo->id, "\n"; > $seqout->write_seq($seqo); > } > print "Done\n"; > } > > if (@aborted) { > open OUT, ">aborted_fetching.AN"; > foreach (@aborted) { print OUT $_ }; > } > ########## > > > 2. So I moved to the second solution and tried batchentrez. 
I cut my 120,000 long > AN list into 10,000 long pieces using split: > split -l 10000 full_list.AN splitted_list_ > > and then submitted the 13 lists one by one. I must say that I don't really like using > a web-interface to fetch data, and here the most ennoying part is that you end up > with a regular Entrez/GenBank webpage: select your format, export to file, chosse > file name... and have to do it many times. > It is too much prone to human and web-browser errors for my taste, but it worked. > Nevertheless there is some caveats: > - some downloaded files were incomplete (~10%) and you have to restart it > - you can't submit several lists in the same time (otherwise the same cookie will be > used and you'll end up with several identical files) > > -Tristan > > On Tuesday 29 January 2008 13:44:16 you wrote: > > Forgot about that one; it's definitely a better way to do it if you > > have the GI/accessions. > > > > chris > > > > On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote: > > > you don't need to use bioperl to accomplish this task, to download > > > several thousand sequences based on accession ID list. > > > > > > NCBI batch Entrez can do that: > > > http://www.ncbi.nlm.nih.gov/sites/batchentrez > > > > > > just submit a large list of IDs, select database, and download. > > > > > > you can submit ~50,000 IDs in one file usually without problems. > > > it may not return results if a list is larger than ~100,000 IDs > > > > > > -- > > > Alexander Kozik > > > Bioinformatics Specialist > > > Genome and Biomedical Sciences Facility > > > 451 Health Sciences Drive > > > Genome Center, 4-th floor, room 4302 > > > University of California > > > Davis, CA 95616-8816 > > > Phone: (530) 754-9127 > > > email#1: akozik at atgc.org > > > email#2: akozik at gmail.com > > > web: http://www.atgc.org/ > > > > > > Chris Fields wrote: > > >> Yes, you can only retrieve ~500 sequences at a time using either > > >> Bio::DB::GenBank. 
Both Bio::DB::GenBank and Bio::DB::EUtilities > > >> interact with NCBI's EUtilities (the former module returns raw data > > >> from the URL to be processed later, the latter module returns > > >> Bio::Seq/Bio::SeqIO objects). > > >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-d > > >>atasets You can usually post more IDs using epost and fetch sequence > > >> referring to the WebEnv/key combo (batch posting). I try to make > > >> this a bit easier with EUtilities but it is woefully lacking in > > >> documentation (my fault), but there is some code up on the wiki > > >> which should work. > > >> chris > > >> > > >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote: > > >>> Hello, > > >>> > > >>> I would like to download a large number of sequences from GenBank > > >>> (122,146 to be exact) following a list of accession numbers. > > >>> I first investigated around Bio::DB::EUtilities, but got lost and > > >>> finally used Bio::DB::GenBank. > > >>> My script works well for short request, but it gives the following > > >>> error with the long request: > > >>> > > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > >>> MSG: WebDBSeqI Request Error: > > >>> 500 short write > > >>> Content-Type: text/plain > > >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT > > >>> Client-Warning: Internal response > > >>> > > >>> 500 short write > > >>> > > >>> STACK: Error::throw > > >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ > > >>> Root.pm:359 > > >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/ > > >>> Bio/DB/WebDBSeqI.pm:685 > > >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ > > >>> 5.8.8/Bio/DB/WebDBSeqI.pm:472 > > >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/ > > >>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361 > > >>> STACK: ./fetch_from_genbank.pl:58 > > >>> --------------------------------------------------------- > > >>> > > >>> Does that mean that 
we can only fetch 500 sequences at a time? > > >>> Should I split my list in 500 ids framents and submit them one > > >>> after the other? > > >>> > > >>> Any suggestions very welcomed... > > >>> Thanks, > > >>> -Tristan > > >>> > > >>> > > >>> Here is the script: > > >>> > > >>> ################################## > > >>> use strict; > > >>> use warnings; > > >>> use Bio::DB::GenBank; > > >>> # use Bio::DB::EUtilities; > > >>> use Bio::SeqIO; > > >>> use Getopt::Long; > > >>> > > >>> # 2008-01-22 T Lefebure > > >>> # I tried to use Bio::DB::EUtilities without much succes and get > > >>> back to Bio::DB::GenBank. > > >>> # The following procedure is not really good as the stream is > > >>> first copied to a temporary file, > > >>> # and than re-used by BioPerl to generate the final file. > > >>> > > >>> my $db = 'nucleotide'; > > >>> my $format = 'genbank'; > > >>> my $help= ''; > > >>> my $dformat = 'gb'; > > >>> > > >>> GetOptions( > > >>> 'help|?' => \$help, > > >>> 'format=s' => \$format, > > >>> 'database=s' => \$db, > > >>> ); > > >>> > > >>> > > >>> my $printhelp = "\nUsage: $0 [options] > > >>> > > >>> Will download the corresponding data from GenBank. BioPerl is > > >>> required. > > >>> > > >>> Options: > > >>> -h > > >>> print this help > > >>> -format: genbank|fasta|... > > >>> give output format (default=genbank) > > >>> -database: nucleotide|genome|protein|... 
> > >>> define the database to search in (default=nucleotide) > > >>> > > >>> The full description of the options can be find at > > >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html > > >>> \n"; > > >>> > > >>> if ($#ARGV<1) { > > >>> print $printhelp; > > >>> exit; > > >>> } > > >>> > > >>> open LIST, $ARGV[0]; > > >>> my @list = ; > > >>> > > >>> if ($format eq 'fasta') { $dformat = 'fasta' } > > >>> > > >>> my $gb = new Bio::DB::GenBank( -retrievaltype => 'tempfile', > > >>> -format => $dformat, > > >>> -db => $db, > > >>> ); > > >>> my $seqio = $gb->get_Stream_by_acc(\@list); > > >>> > > >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]", > > >>> -format => $format, > > >>> ); > > >>> while (my $seqo = $seqio->next_seq ) { > > >>> print $seqo->id, "\n"; > > >>> $seqout->write_seq($seqo); > > >>> } > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> Christopher Fields > > >> Postdoctoral Researcher > > >> Lab of Dr. Robert Switzer > > >> Dept of Biochemistry > > >> University of Illinois Urbana-Champaign > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at uiuc.edu Wed Jan 30 15:04:18 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:04:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: <0BA39C27-1871-441B-B2DE-F7FECF8570D7@uiuc.edu>

Sounds like a bug in the GenBank parser. Could you post a bug report
with an example sequence record and your script?

http://bugzilla.open-bio.org/

chris

On Jan 30, 2008, at 1:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending lots of time i figure out the error. I am able to load
> sequences if the sequences do not have following entry
>
> xrefs (non-sequence databases):
>
> If the Genbank sequence have this entry then script
> load_seqdatabase.pl is crashing. I try it in couple of sequences and
> found it is the culprit line genbank format. But this line is
> important as it contain lots of information... so I am wondering how
> to solve this problem
>
> Any help?
>
> Thanks in advance
> s
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Jan 30 15:42:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:42:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com> <479F7149.1010203@atgc.org> <200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <29768205-F511-4EDB-84D2-BCC36DBA92C7@uiuc.edu>

When using Bio::DB::EUtilities (from bioperl-live) this works for me:

use Bio::DB::EUtilities;

# get array of IDs somehow, in @ids

my ($start, $chunk, $last) = (0, 100, $#ids);

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'protein',
                                       -rettype => 'genbank');

my $ct    = 1; # used to denote separate files
my $tries = 0; # server attempts

# '<=' rather than '<' so that a final chunk holding a single ID is not skipped
while ($start <= $last) {
    # want seqs in chunk size of 100 (set above)
    my $end = ($start + $chunk - 1) < $last ? ($start + $chunk - 1) : $last;
    # grab slice of IDs
    my @sub = @ids[$start..$end];
    # pass to agent
    $factory->set_parameters(-id => \@sub);
    eval {
        # check server response, if good send to file
        $factory->get_Response(-file => ">seqs_$ct.gb");
    };
    # ERROR!
    if ($@) {
        $tries++;
        if ($tries <= 10) {
            warn("Server problem on attempt $tries:$@.\nTrying again...");
            redo;
        } else {
            die("Repeated server issues after $tries attempts.");
            # could warn and just skip this batch of accs using 'next'
        }
    }
    $start = $end + 1;
    $ct++;
    $tries = 0;
}

chris

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be useful for someone else, here are my
> ramblings:
>
> 1.
I first tried to adapt my script and fetch 500 sequences at a time.
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From georg.otto at tuebingen.mpg.de Thu Jan 31 04:34:31 2008
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Thu, 31 Jan 2008 10:34:31 +0100
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID:

Hi,

I succeeded with a similar task using the SeqHound database. I had a
list of > 200,000 gi numbers, but I guess it can work in a similar
fashion using accession numbers.

Here is the script:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::SeqHound;

my $sh = new Bio::DB::SeqHound();

my ($USAGE) = "$0 id_file\n\n";

unless (@ARGV) {
    print $USAGE;
    exit;
}

my $id_file = $ARGV[0];

open ID_FILE, "<$id_file" or die "error: $!";

# one FASTA writer for all records; with no -file given it prints to STDOUT
my $out = Bio::SeqIO->new(-format => 'fasta');

# one gi number per line
while (<ID_FILE>) {
    chomp;
    my $id = $_;

    # skip ids for which SeqHound returns nothing
    if (defined(my $seq_obj = $sh->get_Seq_by_gi($id))) {
        $out->write_seq($seq_obj);
    }
    else {
        next;
    }
}

Best,

Georg

Tristan Lefebure writes:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank.
> [...]

From bernd.web at gmail.com Thu Jan 31 05:48:15 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 31 Jan 2008 11:48:15 +0100
Subject: [Bioperl-l] searchio/blast
Message-ID: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>

Hi,

I noticed that the HTMLWriter output for a BLAST report may not be
correct if more than one sequence was "blasted".

After the BLAST report of the first sequence, the report ends with:

Search Parameters
Parameter Value

Search Statistics
Statistic Value

Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
Thu Jan 31 11:35:51 2008
Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $

Then the second HTML BLAST report follows.
Although a user requiring HTML output will usually blast just one
sequence, this may be nice to fix?
Also, in the HTML Writer output for FastA reports the statistics section
is empty.

An additional issue with HTMLWriter output containing more than one BLAST
report is the following:
When a sequence ID occurs more than once, the link (on the E-value)
points to the first occurrence, since it is not report specific.

In case the above is regarded as unwanted, I'd be happy to make a
concise example with code.
Best regards,
Bernd

From cjfields at uiuc.edu Thu Jan 31 07:39:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 31 Jan 2008 06:39:46 -0600
Subject: [Bioperl-l] searchio/blast
In-Reply-To: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
References: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
Message-ID:

The easiest way to take care of these (so we don't forget about them
and can track changes) is to add them as BioPerl bugs/enhancement
requests to bugzilla, along with example reports and code.

chris

On Jan 31, 2008, at 4:48 AM, Bernd Web wrote:

> Hi,
>
> I noticed that the HTMLWriter output for a BLAST report may not be
> correct if more than one sequence was "blasted".
>
> After the BLAST report of the first sequence the report is ended with:
> Search Parameters
> Parameter Value
>
> Search Statistics
> Statistic Value
>
> Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
> Thu Jan 31 11:35:51 2008
> Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37
> tseemann Exp $
>
> Then the second HTML blast report follows.
> Although maybe generally 1 sequence is blasted by a user requiring
> HTML output, this may be nice to fix?
> Also for the HTML Writer of FastA reports the statistics section is
> empty,
>
> An additional issue with HTMLWriter containing more than 1 BLAST
> report is the following:
> When a sequence ID occurs more than once, the link (on the E-value) is
> to the first occurrence since it is not report specific.
>
> In case the above is regarded as unwanted, I'd be happy to make a
> concise example with code.
>
>
> Best regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hlapp at gmx.net Thu Jan 31 08:12:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 08:12:25 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID:

On Jan 30, 2008, at 2:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending lots of time i figure out the error. I am able to load
> sequences if the sequences do not have following entry
>
> xrefs (non-sequence databases):

Is this the literal value? I am asking because I can't find this in
the file at

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

which you said was giving you grief. So does the genbank file above
now load, or how can I identify the critical line in there?

	-hilmar

>
> If the Genbank sequence have this entry then script
> load_seqdatabase.pl is crashing. I try it in couple of sequences and
> found it is the culprit line genbank format. But this line is
> important as it contain lots of information... so I am wondering how
> to solve this problem
>
> Any help?
>
> Thanks in advance
> s
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From snoze.pa at gmail.com Thu Jan 31 13:46:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 12:46:24 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To:
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>

The link I sent was related to my tutorial; I was following that website.
A typical example is the following record, which has the
"xrefs (non-sequence databases):" line.

thanks
s

LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
            protein); prM; Peptide pr; Small envelope protein M (Matrix
            protein); Envelope protein E; Non-structural protein 1 (NS1)].
ACCESSION   P27912
VERSION     P27912.1  GI:130422
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            class: standard.
            created: Aug 1, 1992.
            sequence updated: Aug 1, 1992.
            annotation updated: Jan 15, 2008.
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
            GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
            InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
            InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
            Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832,
            Pfam:PF00869, Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
            reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
            Transmembrane; Viral nucleoprotein; Virion.
SOURCE Dengue virus 1 Thailand/AHF 82-80/1980 ORGANISM Dengue virus 1 Thailand/AHF 82-80/1980 Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; Flavivirus; Dengue virus group. REFERENCE 1 (residues 1 to 792) AUTHORS Chu,M.C., O'Rourke,E.J. and Trent,D.W. TITLE Genetic relatedness among structural protein genes of dengue 1 virus strains JOURNAL J. Gen. Virol. 70 (PT 7), 1701-1712 (1989) PUBMED 2738579 REMARK NUCLEOTIDE SEQUENCE [GENOMIC RNA]. COMMENT On May 27, 2005 this sequence version replaced gi:418950. [FUNCTION] Protein C packages viral RNA to form a viral nucleocapsid, and promotes virion budding (By similarity). [FUNCTION] prM acts as a chaperone for envelope protein E during intracellular virion assembly by masking and inactivating envelope protein E fusion peptide. prM is matured in the last step of virion assembly, presumably to avoid catastrophic activation of the viral fusion peptide induced by the acidic pH of the trans-Golgi network. After cleavage by host furin, the pr peptide is released in the extracellular medium and small envelope protein M and envelope protein E homodimers are dissociated (By similarity). [FUNCTION] Envelope protein E binds cell surface receptor and is involved in membrane fusion between virion and target cell. Synthesized as an homodimer with prM which acts as a chaperone for envelope protein E. After cleavage of prM, envelope protein E dissociate from small envelope protein M and homodimerizes (By similarity). [FUNCTION] Non-structural protein 1 is slowly secreted from mammalian cells, but not from mosquito cells. Secreted form elicits protective immune response and plays an essential role in RNA replication. Soluble and membrane-associated NS1 may activate human complement and induce host vascular leakage. This effect might explain the clinical manifestations of dengue hemorrhagic fever and dengue shock syndrome (By similarity). 
[SUBUNIT] prM and envelope protein E form heterodimers in the endoplasmic reticulum and Golgi. Envelope protein E forms homodimers. NS1 forms homodimers as well as homohexamers when secreted. NS1 may interact with NS4A (By similarity). [SUBCELLULAR LOCATION] Note=The virion is assembled in the endoplasmic reticulum lumen, transported by vesicles to the Golgi, then transported again to the cell membrane where it is released outside the cell. [SUBCELLULAR LOCATION] Protein C: Virion (By similarity). [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity). [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane; Single-pass type I membrane protein (By similarity). [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane; Single-pass type I membrane protein (By similarity). [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted. Endoplasmic reticulum membrane; Peripheral membrane protein; Lumenal side (By similarity). [DOMAIN] Transmembrane domains of the small envelope protein M and envelope protein E contains an endoplasmic reticulum retention signals (By similarity). [PTM] Specific enzymatic cleavages in vivo yield mature proteins. The nascent protein C contains a C-terminal hydrophobic domain that act as a signal sequence for translocation of prM into the lumen of the ER. Mature protein C is cleaved at a site upstream of this hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by a host furin, releasing the mature small envelope protein M, and peptide pr (By similarity). [PTM] Envelope protein E and non-structural protein 1 are N-glycosylated (By similarity). 
FEATURES Location/Qualifiers source 1..792 /organism="Dengue virus 1 Thailand/AHF 82-80/1980" /specific_host="Aedes aegypti (Yellowfever mosquito)" /specific_host="Homo sapiens (Human)" /db_xref="taxon:11057" Protein 1..>792 /product="Genome polyprotein [Contains: Protein C" Region 1..101 /region_name="Topological domain" /inference="non-experimental evidence, no additional details recorded" /note="Cytoplasmic (Potential)." Region 1..100 /region_name="Mature chain" /experiment="experimental evidence, no additional details recorded" /note="Protein C. /FTId=PRO_0000037884." Region 5..114 /region_name="Flavi_capsid" /note="Flavivirus capsid protein C. Flaviviruses are small enveloped viruses with virions comprised of 3 proteins called C, M and E. Multiple copies of the C protein form the nucleocapsid, which contains the ssRNA molecule; pfam01003" /db_xref="CDD:85176" Site 100..101 /site_type="cleavage" /inference="non-experimental evidence, no additional details recorded" /note="Cleavage; by serine protease NS3 (By similarity)." Region 101..114 /region_name="Propeptide" /experiment="experimental evidence, no additional details recorded" /note="ER anchor for the protein C, removed in mature form by serine protease NS3. /FTId=PRO_0000037885." Region 102..122 /region_name="Transmembrane region" /inference="non-experimental evidence, no additional details recorded" /note="Potential." Site 114..115 /site_type="cleavage" /inference="non-experimental evidence, no additional details recorded" /note="Cleavage; by host signal peptidase (By similarity)." Region 115..280 /region_name="Mature chain" /experiment="experimental evidence, no additional details recorded" /note="prM. /FTId=PRO_0000264649." Region 115..205 /region_name="Mature chain" /experiment="experimental evidence, no additional details recorded" /note="Peptide pr. /FTId=PRO_0000264650." Region 119..204 /region_name="Flavi_propep" /note="Flavivirus polyprotein propeptide. 
The flaviviruses are small enveloped animal viruses containing a single positive strand genomic RNA. The genome encodes one large ORF a polyprotein which undergos proteolytic processing into mature viral peptide chains; pfam01570" /db_xref="CDD:65376" Region 123..238 /region_name="Topological domain" /inference="non-experimental evidence, no additional details recorded" /note="Extracellular (Potential)." Site 183 /site_type="glycosylation" /inference="non-experimental evidence, no additional details recorded" /note="N-linked (GlcNAc...) (Potential)." Site 205..206 /site_type="cleavage" /inference="non-experimental evidence, no additional details recorded" /note="Cleavage; by host furin (By similarity)." Region 206..280 /region_name="Flavi_M" /note="Flavivirus envelope glycoprotein M. Flaviviruses are small enveloped viruses with virions comprised of 3 proteins called C, M and E. The envelope glycoprotein M is made as a precursor, called prM; pfam01004" /db_xref="CDD:85177" Region 206..280 /region_name="Mature chain" /experiment="experimental evidence, no additional details recorded" /note="Small envelope protein M. /FTId=PRO_0000037886." Region 239..259 /region_name="Transmembrane region" /inference="non-experimental evidence, no additional details recorded" /note="Potential." Region 260..265 /region_name="Topological domain" /inference="non-experimental evidence, no additional details recorded" /note="Cytoplasmic (Potential)." Region 266..286 /region_name="Transmembrane region" /inference="non-experimental evidence, no additional details recorded" /note="Potential." Site 280..281 /site_type="cleavage" /inference="non-experimental evidence, no additional details recorded" /note="Cleavage; by host signal peptidase (By similarity)." Region 281..775 /region_name="Mature chain" /experiment="experimental evidence, no additional details recorded" /note="Envelope protein E. /FTId=PRO_0000037887." 
Region 281..576 /region_name="Flavi_glycoprot" /note="Flavivirus glycoprotein, central and dimerisation domains; pfam00869" /db_xref="CDD:85082" Bond bond(283,310) /bond_type="disulfide" /inference="non-experimental evidence, no additional details recorded" /note="By similarity." Region 287..725 /region_name="Topological domain" /inference="non-experimental evidence, no additional details recorded" /note="Extracellular (Potential)." Bond bond(340,401) /bond_type="disulfide" /inference="non-experimental evidence, no additional details recorded" /note="By similarity." Site 347 /site_type="glycosylation" /inference="non-experimental evidence, no additional details recorded" /note="N-linked (GlcNAc...) (Potential)." Bond bond(354,385) /bond_type="disulfide" /inference="non-experimental evidence, no additional details recorded" /note="By similarity." Bond bond(372,396) /bond_type="disulfide" /inference="non-experimental evidence, no additional details recorded" /note="By similarity." Site 433 /site_type="glycosylation" /inference="non-experimental evidence, no additional details recorded" /note="N-linked (GlcNAc...) (Potential)." Bond bond(465,565) /bond_type="disulfide" /inference="non-experimental evidence, no additional details recorded" /note="By similarity." Region 578..673 /region_name="Flavi_glycop_C" /note="Flavivirus glycoprotein, immunoglobulin-like domain; pfam02832" /db_xref="CDD:66513" Bond bond(582,613) /bond_type="disulfide" /inference="non-experimental evidence, no additional details recorded" /note="By similarity." Region 726..746 /region_name="Transmembrane region" /inference="non-experimental evidence, no additional details recorded" /note="Potential." Region 747..752 /region_name="Topological domain" /inference="non-experimental evidence, no additional details recorded" /note="Cytoplasmic (Potential)." Region 753..773 /region_name="Transmembrane region" /inference="non-experimental evidence, no additional details recorded" /note="Potential." 
Region 774..>792 /region_name="Topological domain" /inference="non-experimental evidence, no additional details recorded" /note="Extracellular (Potential)." Site 775..776 /site_type="cleavage" /inference="non-experimental evidence, no additional details recorded" /note="Cleavage; by host signal peptidase (By similarity)." Region 776..>792 /region_name="Mature chain" /experiment="experimental evidence, no additional details recorded" /note="Non-structural protein 1. /FTId=PRO_0000037888." ORIGIN 1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip 61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr 121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc 181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw 241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw 301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea 361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi 421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt 481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam 541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv 601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv 661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif 721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv 781 inwkgkelkc gs // On Jan 31, 2008 7:12 AM, Hilmar Lapp wrote: > > On Jan 30, 2008, at 2:30 PM, snoze pa wrote: > > > Hi Hilmar, > > > > After spending lots of time i figure out the error. I am able to load > > sequences if the sequences do not have following entry > > > > xrefs (non-sequence databases): > > Is this the literal value? I am asking because I can't find this in > the file at > > http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb > > which you said was giving you grief. 
So does the genbank file above > now load, or how can I identify the critical line in there? > > -hilmar > > > > If the Genbank sequence have this entry then script > > load_seqdatabase.pl is > > crashing. I try it in couple of sequences and found it is the > > culprit line > > genbank format. But this line is important as it contain lots of > > information... so I am wondering how to solve this problem > > > > Any help? > > > > Thanks in advance > > s > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > From hlapp at gmx.net Thu Jan 31 15:10:35 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 31 Jan 2008 15:10:35 -0500 Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl In-Reply-To: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com> References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com> <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com> Message-ID: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net> I see. Note that the sequence below is really a UniProt sequence that has been reformatted into GenBank format, and hence isn't in your typical GenBank sequence format (which usually lacks DBSOURCE, for example). (The joys of data integration.) If you load the same sequence from UniProt, does it still fail to parse or to load? Also, does this or does this not mean that the sequences at the link you sent load w/o error? I.e., can I close that issue report, or is there a bug in bioperl-db? -hilmar On Jan 31, 2008, at 1:46 PM, snoze pa wrote: > The link i sent was related to my tutorial. I was following that > website.
The typical example is one of the following which have > xrefs (non-sequence databases): line. > thanks > s > > LOCUS P27912 792 aa linear VRL > 15-JAN-2008 > DEFINITION Genome polyprotein [Contains: Protein C (Core protein) > (Capsid > protein); prM; Peptide pr; Small envelope protein M > (Matrix > protein); Envelope protein E; Non-structural protein 1 > (NS1)]. > ACCESSION P27912 > VERSION P27912.1 GI:130422 > DBSOURCE swissprot: locus POLG_DEN1A, accession P27912; > class: standard. > created: Aug 1, 1992. > sequence updated: Aug 1, 1992. > annotation updated: Jan 15, 2008. > xrefs: D00502.1, BAA00394.1, B32401 > xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912, > GO:0005789, InterPro:IPR011999, InterPro:IPR013754, > InterPro:IPR001122, InterPro:IPR000069, > InterPro:IPR001157, > InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA: > 2.60.98.10, > Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, > Pfam:PF00869, > Pfam:PF01004, Pfam:PF00948, Pfam:PF01570 > KEYWORDS Capsid protein; Cleavage on pair of basic residues; > Endoplasmic > reticulum; Envelope protein; Glycoprotein; Membrane; > Secreted; > Transmembrane; Viral nucleoprotein; Virion. > [...] > // > > > On Jan 31, 2008 7:12 AM, Hilmar Lapp wrote: > > On Jan 30, 2008, at 2:30 PM, snoze pa wrote: > > > Hi Hilmar, > > > > After spending lots of time i figure out the error. I am able to > load > > sequences if the sequences do not have following entry > > > > xrefs (non-sequence databases): > > Is this the literal value? I am asking because I can't find this in > the file at > > http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb > > which you said was giving you grief. So does the genbank file above > now load, or how can I identify the critical line in there?
> > -hilmar > > > > If the Genbank sequence have this entry then script > > load_seqdatabase.pl is > > crashing. I try it in couple of sequences and found it is the > > culprit line > > genbank format. But this line is important as it contain lots of > > information... so I am wondering how to solve this problem > > > > Any help? > > > > Thanks in advance > > s > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From snoze.pa at gmail.com Thu Jan 31 15:21:18 2008 From: snoze.pa at gmail.com (snoze pa) Date: Thu, 31 Jan 2008 14:21:18 -0600 Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl In-Reply-To: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net> References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com> <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com> <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net> Message-ID: <10f848910801311221q2a9f0d02x6c4600048f05adab@mail.gmail.com> Thanks Hilmar, I also thought that they were translated into GenBank format. My problem is that I have downloaded tons of sequences from NCBI in gb format. My flat file contains many sequences in this format, so I am unable to load them into a local database using the load_seqdatabase.pl script. So far I get nothing but warnings and errors. Is there any solution to this problem? Otherwise I will try to write some code to load all the sequences into the local database. But it seems it should be easy to modify the parsing code so that we can load these sequences.
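A minimal sketch for triaging a large flat file before running load_seqdatabase.pl. It reads the file one record at a time and reports which records carry the DBSOURCE line discussed in this thread, so the affected records can be counted or set aside. It uses only core Perl, not BioPerl; the script name scan_xrefs.pl, the input file name, and the output labels are assumptions made for illustration. It neither parses nor repairs the records, and it does not show whether the failure happens at the parse or the load step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical triage helper; not part of BioPerl or bioperl-db.

# True if a GenBank record contains the DBSOURCE sub-line under discussion.
sub has_nonseq_xrefs {
    my ($record) = @_;
    return $record =~ /^\s+xrefs \(non-sequence databases\):/m ? 1 : 0;
}

# Name of the record, taken from its LOCUS line (undef if there is none).
sub locus_name {
    my ($record) = @_;
    return $record =~ /^LOCUS\s+(\S+)/m ? $1 : undef;
}

# Usage (assumed file name): perl scan_xrefs.pl sequences.gb
if (@ARGV) {
    local $/ = "//\n";    # read one GenBank record per iteration
    while ( my $record = <> ) {
        my $locus = locus_name($record);
        next unless defined $locus;    # skip trailing blank lines or junk
        my $label = has_nonseq_xrefs($record)
            ? 'non-sequence xrefs'
            : 'no non-sequence xrefs';
        print "$locus\t$label\n";
    }
}
```

The record separator assumes Unix line endings, with each record terminated by a line holding only `//`. A file with Windows line endings would need converting first.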
>format (which usually lacks DBSOURCE, for example) I think that if the three-dimensional structure of the protein is known, then DBSOURCE is common in the NCBI gb format. I agree with you about the joys of integration. The link was related to the tutorial I was using, so you can close that issue. Thanks for looking into the matter. s On Jan 31, 2008 2:10 PM, Hilmar Lapp wrote: > I see. Note that the sequence below is really a UniProt sequence, that has > been reformatted into GenBank format, and hence aren't in your typical > genbank sequence format (which usually lacks DBSOURCE, for example). (The > joys of data integration.) > If you load the same sequence from UniProt, does it still fail to parse or > to load? > > Also, does it or does this not mean that sequences at the link you sent > load w/o error? I.e., can I close that issue report, or is there a bug in > bioperl-db? > > -hilmar > > On Jan 31, 2008, at 1:46 PM, snoze pa wrote: > > The link i sent was related to my tutorial. I was following that website. > The typical example is one of the following which have *xrefs > (non-sequence databases): line. > thanks > s > * > LOCUS P27912 792 aa linear VRL > 15-JAN-2008 > DEFINITION Genome polyprotein [Contains: Protein C (Core protein) (Capsid > protein); prM; Peptide pr; Small envelope protein M (Matrix > protein); Envelope protein E; Non-structural protein 1 (NS1)]. > ACCESSION P27912 > VERSION P27912.1 GI:130422 > DBSOURCE swissprot: locus POLG_DEN1A, accession P27912; > class: standard. > created: Aug 1, 1992. > sequence updated: Aug 1, 1992. > annotation updated: Jan 15, 2008.
> xrefs: D00502.1, BAA00394.1, B32401 > *xrefs (non-sequence databases):* HSSP:Q88653, SMR:P27912, > GO:0005789, InterPro:IPR011999, InterPro:IPR013754, > InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157, > InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA: > 2.60.98.10, > Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, > Pfam:PF00869, > Pfam:PF01004, Pfam:PF00948, Pfam:PF01570 > KEYWORDS Capsid protein; Cleavage on pair of basic residues; > Endoplasmic > reticulum; Envelope protein; Glycoprotein; Membrane; Secreted; > Transmembrane; Viral nucleoprotein; Virion. > SOURCE Dengue virus 1 Thailand/AHF 82-80/1980 > ORGANISM Dengue virus 1 Thailand/AHF 82-80/1980 > Viruses; ssRNA positive-strand viruses, no DNA stage; > Flaviviridae; > Flavivirus; Dengue virus group. > REFERENCE 1 (residues 1 to 792) > AUTHORS Chu,M.C., O'Rourke,E.J. and Trent,D.W. > TITLE Genetic relatedness among structural protein genes of dengue 1 > virus strains > JOURNAL J. Gen. Virol. 70 (PT 7), 1701-1712 (1989) > PUBMED 2738579 > REMARK NUCLEOTIDE SEQUENCE [GENOMIC RNA]. > COMMENT On May 27, 2005 this sequence version replaced gi:418950. > [FUNCTION] Protein C packages viral RNA to form a viral > nucleocapsid, and promotes virion budding (By similarity). > [FUNCTION] prM acts as a chaperone for envelope protein E > during > intracellular virion assembly by masking and inactivating > envelope > protein E fusion peptide. prM is matured in the last step of > virion > assembly, presumably to avoid catastrophic activation of the > viral > fusion peptide induced by the acidic pH of the trans-Golgi > network. > After cleavage by host furin, the pr peptide is released in > the > extracellular medium and small envelope protein M and envelope > protein E homodimers are dissociated (By similarity). > [FUNCTION] Envelope protein E binds cell surface receptor and > is > involved in membrane fusion between virion and target cell. 
> Synthesized as an homodimer with prM which acts as a chaperone > for > envelope protein E. After cleavage of prM, envelope protein E > dissociate from small envelope protein M and homodimerizes (By > similarity). > [FUNCTION] Non-structural protein 1 is slowly secreted from > mammalian cells, but not from mosquito cells. Secreted form > elicits > protective immune response and plays an essential role in RNA > replication. Soluble and membrane-associated NS1 may activate > human > complement and induce host vascular leakage. This effect might > explain the clinical manifestations of dengue hemorrhagic > fever and > dengue shock syndrome (By similarity). > [SUBUNIT] prM and envelope protein E form heterodimers in the > endoplasmic reticulum and Golgi. Envelope protein E forms > homodimers. NS1 forms homodimers as well as homohexamers when > secreted. NS1 may interact with NS4A (By similarity). > [SUBCELLULAR LOCATION] Note=The virion is assembled in the > endoplasmic reticulum lumen, transported by vesicles to the > Golgi, > then transported again to the cell membrane where it is > released > outside the cell. > [SUBCELLULAR LOCATION] Protein C: Virion (By similarity). > [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity). > [SUBCELLULAR LOCATION] Small envelope protein M: Virion > membrane; > Single-pass type I membrane protein (By similarity). > [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane; > Single-pass type I membrane protein (By similarity). > [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted. > Endoplasmic reticulum membrane; Peripheral membrane protein; > Lumenal side (By similarity). > [DOMAIN] Transmembrane domains of the small envelope protein M > and > envelope protein E contains an endoplasmic reticulum retention > signals (By similarity). > [PTM] Specific enzymatic cleavages in vivo yield mature > proteins. 
> The nascent protein C contains a C-terminal hydrophobic domain > that > act as a signal sequence for translocation of prM into the > lumen of > the ER. Mature protein C is cleaved at a site upstream of this > hydrophobic domain by NS3. prM is cleaved in post-Golgi > vesicles by > a host furin, releasing the mature small envelope protein M, > and > peptide pr (By similarity). > [PTM] Envelope protein E and non-structural protein 1 are > N-glycosylated (By similarity). > FEATURES Location/Qualifiers > source 1..792 > /organism="Dengue virus 1 Thailand/AHF 82-80/1980" > /specific_host="Aedes aegypti (Yellowfever mosquito)" > /specific_host="Homo sapiens (Human)" > /db_xref="taxon:11057" > Protein 1..>792 > /product="Genome polyprotein [Contains: Protein C" > Region 1..101 > /region_name="Topological domain" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cytoplasmic (Potential)." > Region 1..100 > /region_name="Mature chain" > /experiment="experimental evidence, no additional > details > recorded" > /note="Protein C. /FTId=PRO_0000037884." > Region 5..114 > /region_name="Flavi_capsid" > /note="Flavivirus capsid protein C. Flaviviruses are > small > enveloped viruses with virions comprised of 3 > proteins > called C, M and E. Multiple copies of the C protein > form > the nucleocapsid, which contains the ssRNA molecule; > pfam01003" > /db_xref="CDD:85176" > Site 100..101 > /site_type="cleavage" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cleavage; by serine protease NS3 (By > similarity)." > Region 101..114 > /region_name="Propeptide" > /experiment="experimental evidence, no additional > details > recorded" > /note="ER anchor for the protein C, removed in mature > form > by serine protease NS3. /FTId=PRO_0000037885." > Region 102..122 > /region_name="Transmembrane region" > /inference="non-experimental evidence, no additional > details recorded" > /note="Potential." 
> Site 114..115 > /site_type="cleavage" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cleavage; by host signal peptidase (By > similarity)." > Region 115..280 > /region_name="Mature chain" > /experiment="experimental evidence, no additional > details > recorded" > /note="prM. /FTId=PRO_0000264649." > Region 115..205 > /region_name="Mature chain" > /experiment="experimental evidence, no additional > details > recorded" > /note="Peptide pr. /FTId=PRO_0000264650." > Region 119..204 > /region_name="Flavi_propep" > /note="Flavivirus polyprotein propeptide. The > flaviviruses > are small enveloped animal viruses containing a > single > positive strand genomic RNA. The genome encodes one > large > ORF a polyprotein which undergos proteolytic > processing > into mature viral peptide chains; pfam01570" > /db_xref="CDD:65376" > Region 123..238 > /region_name="Topological domain" > /inference="non-experimental evidence, no additional > details recorded" > /note="Extracellular (Potential)." > Site 183 > /site_type="glycosylation" > /inference="non-experimental evidence, no additional > details recorded" > /note="N-linked (GlcNAc...) (Potential)." > Site 205..206 > /site_type="cleavage" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cleavage; by host furin (By similarity)." > Region 206..280 > /region_name="Flavi_M" > /note="Flavivirus envelope glycoprotein M. > Flaviviruses > are small enveloped viruses with virions comprised of > 3 > proteins called C, M and E. The envelope glycoprotein > M is > made as a precursor, called prM; pfam01004" > /db_xref="CDD:85177" > Region 206..280 > /region_name="Mature chain" > /experiment="experimental evidence, no additional > details > recorded" > /note="Small envelope protein M. > /FTId=PRO_0000037886." > Region 239..259 > /region_name="Transmembrane region" > /inference="non-experimental evidence, no additional > details recorded" > /note="Potential." 
> Region 260..265 > /region_name="Topological domain" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cytoplasmic (Potential)." > Region 266..286 > /region_name="Transmembrane region" > /inference="non-experimental evidence, no additional > details recorded" > /note="Potential." > Site 280..281 > /site_type="cleavage" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cleavage; by host signal peptidase (By > similarity)." > Region 281..775 > /region_name="Mature chain" > /experiment="experimental evidence, no additional > details > recorded" > /note="Envelope protein E. /FTId=PRO_0000037887." > Region 281..576 > /region_name="Flavi_glycoprot" > /note="Flavivirus glycoprotein, central and > dimerisation > domains; pfam00869" > /db_xref="CDD:85082" > Bond bond(283,310) > /bond_type="disulfide" > /inference="non-experimental evidence, no additional > details recorded" > /note="By similarity." > Region 287..725 > /region_name="Topological domain" > /inference="non-experimental evidence, no additional > details recorded" > /note="Extracellular (Potential)." > Bond bond(340,401) > /bond_type="disulfide" > /inference="non-experimental evidence, no additional > details recorded" > /note="By similarity." > Site 347 > /site_type="glycosylation" > /inference="non-experimental evidence, no additional > details recorded" > /note="N-linked (GlcNAc...) (Potential)." > Bond bond(354,385) > /bond_type="disulfide" > /inference="non-experimental evidence, no additional > details recorded" > /note="By similarity." > Bond bond(372,396) > /bond_type="disulfide" > /inference="non-experimental evidence, no additional > details recorded" > /note="By similarity." > Site 433 > /site_type="glycosylation" > /inference="non-experimental evidence, no additional > details recorded" > /note="N-linked (GlcNAc...) (Potential)." 
> Bond bond(465,565) > /bond_type="disulfide" > /inference="non-experimental evidence, no additional > details recorded" > /note="By similarity." > Region 578..673 > /region_name="Flavi_glycop_C" > /note="Flavivirus glycoprotein, immunoglobulin-like > domain; pfam02832" > /db_xref="CDD:66513" > Bond bond(582,613) > /bond_type="disulfide" > /inference="non-experimental evidence, no additional > details recorded" > /note="By similarity." > Region 726..746 > /region_name="Transmembrane region" > /inference="non-experimental evidence, no additional > details recorded" > /note="Potential." > Region 747..752 > /region_name="Topological domain" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cytoplasmic (Potential)." > Region 753..773 > /region_name="Transmembrane region" > /inference="non-experimental evidence, no additional > details recorded" > /note="Potential." > Region 774..>792 > /region_name="Topological domain" > /inference="non-experimental evidence, no additional > details recorded" > /note="Extracellular (Potential)." > Site 775..776 > /site_type="cleavage" > /inference="non-experimental evidence, no additional > details recorded" > /note="Cleavage; by host signal peptidase (By > similarity)." > Region 776..>792 > /region_name="Mature chain" > /experiment="experimental evidence, no additional > details > recorded" > /note="Non-structural protein 1. > /FTId=PRO_0000037888." 
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp wrote:
>
> > On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
> >
> > > Hi Hilmar,
> > >
> > > After spending a lot of time, I figured out the error. I am able to load
> > > sequences as long as they do not contain the following entry:
> > >
> > > xrefs (non-sequence databases):
> >
> > Is this the literal value? I am asking because I can't find this in
> > the file at
> >
> > http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
> >
> > which you said was giving you grief. So does the genbank file above
> > now load, or how can I identify the critical line in there?
> >
> > -hilmar
> >
> > > If a GenBank sequence has this entry, the script load_seqdatabase.pl
> > > crashes. I tried it on a couple of sequences and found that this line
> > > of the GenBank format is the culprit. But the line is important, as it
> > > contains a lot of information, so I am wondering how to solve this
> > > problem.
> > >
> > > Any help?
> > >
> > > Thanks in advance
> > > s
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> > ===========================================================