From lsbrath at gmail.com Wed Dec 1 14:13:43 2010
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 1 Dec 2010 14:13:43 -0500
Subject: [Bioperl-l] Problems loading BioPerl a
Message-ID:

Hello,

I am receiving the following message on my MacOSX system after I run
"./Build install":

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'

Any suggestions?

Mgavi

From cjfields at illinois.edu Wed Dec 1 14:27:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 1 Dec 2010 13:27:59 -0600
Subject: [Bioperl-l] Problems loading BioPerl a
In-Reply-To:
References:
Message-ID:

As the error message indicates, your user account doesn't have write privileges for '/usr/local/bin'. I suggest one of the following:

1) Install it locally; see the instructions on the wiki for UNIX.
2) Use 'sudo' to install it system-wide. I don't recommend that unless needed if you are using bioperl-live (or for any CPAN code, for that matter).

chris

On Dec 1, 2010, at 1:13 PM, Mgavi Brathwaite wrote:

> Hello,
>
> I am receiving the following message on my MacOSX system after I run
> "./Build install"
>
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
>
> Any suggestions?
>
> Mgavi
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From geovjames at gmail.com Thu Dec 2 03:03:52 2010
From: geovjames at gmail.com (gvj)
Date: Thu, 2 Dec 2010 00:03:52 -0800 (PST)
Subject: [Bioperl-l] how to get ID and parent from GFF
Message-ID: <30356493.post@talk.nabble.com>

Hi,

This is something simple, but I am not able to figure out what is going wrong. I have a GFF like this:

2  test  gene                      2876  3540  0.38  +  .  ID=g1
2  test  transcript                2876  3540  0.38  +  .  ID=g1.t1;Parent=g1
2  test  transcription_start_site  2876  2876  .     +  .  Parent=g1.t1
2  test  exon                      2876  3118  .     +  .  Parent=g1.t1
2  test  start_codon               3004  3006  .     +  0  Parent=g1.t1
2  test  intron                    3119  3225  1     +  .  Parent=g1.t1
2  test  CDS                       3004  3118  1     +  0  ID=g1.t1.cds;Parent=g1.t1
2  test  CDS                       3226  3329  1     +  2  ID=g1.t1.cds;Parent=g1.t1
2  test  exon                      3226  3540  .     +  .  Parent=g1.t1
2  test  stop_codon                3327  3329  .     +  0  Parent=g1.t1
2  test  transcription_end_site    3540  3540  .     +  .  Parent=g1.t1

But when I try to get the ID value as follows:

    my $gff = Bio::DB::GFF->new( -adaptor => "memory",
                                 -gff     => $ARGV[0]);

    for my $gff_gene ($gff->features("transcript")) {
        # nothing is printing, that means no Parent tag :(
        print " YES the parent is there " if( $gff_gene->has_tag('Parent') ) ;

        my ($tmp)   = $gff_gene->get_tag_values("Parent=g1");
        my ($attr)  = $gff_gene->attributes("ID");
        my @tags    = $gff_gene->get_all_tags();
        my $from_id = $gff_gene->id;

        print "the keys are:". $attr. " or $tmp the tags are :@tags the ID from : $from_id ";
    }

~~~output~~~~~~~~
keys are: or the tags are :Parent=g1 the ID from : 26

So it seems that only the Parent tag is found, and the tag name includes its value. Is this something to do with my GFF structure?

--
View this message in context: http://old.nabble.com/how-to-get-ID-and-parent-from-GFF-tp30356493p30356493.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at illinois.edu Thu Dec 2 09:53:47 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Dec 2010 08:53:47 -0600
Subject: [Bioperl-l] how to get ID and parent from GFF
In-Reply-To: <30356493.post@talk.nabble.com>
References: <30356493.post@talk.nabble.com>
Message-ID:

I'm not sure how this is done via Bio::DB::GFF, but try using Bio::DB::SeqFeature instead (which has parent-child ties). I think you need the tag value 'parent_id', but 'Parent' might work as well, with the caveat that I haven't tried this myself yet.

chris

On Dec 2, 2010, at 2:03 AM, gvj wrote:

>
> Hi,
> This is something simple but I m nt able to figure it out whats going wrong.
> I have a gff like this:
> 2 test gene 2876 3540 0.38 + . ID=g1
> 2 test transcript 2876 3540 0.38 + . ID=g1.t1;Parent=g1
> 2 test transcription_start_site 2876 2876 . + . Parent=g1.t1
> 2 test exon 2876 3118 . + . Parent=g1.t1
> 2 test start_codon 3004 3006 . + 0 Parent=g1.t1
> 2 test intron 3119 3225 1 + . Parent=g1.t1
> 2 test CDS 3004 3118 1 + 0 ID=g1.t1.cds;Parent=g1.t1
> 2 test CDS 3226 3329 1 + 2 ID=g1.t1.cds;Parent=g1.t1
> 2 test exon 3226 3540 . + . Parent=g1.t1
> 2 test stop_codon 3327 3329 . + 0 Parent=g1.t1
> 2 test transcription_end_site 3540 3540 . + . Parent=g1.t1
>
> but when I am tring to get the ID value as follow:
>
> my $gff = Bio::DB::GFF->new( -adaptor => "memory",
> -gff => $ARGV[0]);
>
> for my $gff_gene ($gff->features("transcript")) {
> print " YES the parent is there " if( $gff_gene->has_tag('Parent') ) ; # nothing is printing , That means no Parent tag :(
>
> my ($tmp) = $gff_gene->get_tag_values("Parent=g1");
> my ($attr) = $gff_gene->attributes("ID");
> my @tags = $gff_gene->get_all_tags();
> my $from_id = $gff_gene->id;
>
>
> print "the keys are:". $attr. " or $tmp the tags are :@tags the ID from : $from_id ";
> }
>
>
> ~~~output~~~~~~~~
> keys are: or the tags are :Parent=g1 the ID from : 26
>
> So it seems like that only Parent tag is found and that also along with its
> value. Is it something to deal with my gff structure??
> --
> View this message in context: http://old.nabble.com/how-to-get-ID-and-parent-from-GFF-tp30356493p30356493.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
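
For readers following the GFF thread above, here is a minimal sketch of how the ninth column of a GFF3 line splits into separate ID and Parent tags. It uses only core Perl and is not the Bio::DB::GFF or Bio::DB::SeqFeature code path discussed in the thread; the helper name parse_gff3_attributes is hypothetical, and the sketch assumes simple tag=value pairs with no URL-escaped characters.

```perl
use strict;
use warnings;

# Hypothetical helper: split a GFF3 column-9 string such as
# "ID=g1.t1;Parent=g1" into a hash of tag => [values].
# In GFF3, multiple values for one tag are comma-separated.
sub parse_gff3_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        next unless $pair =~ /\S/;                 # skip empty fragments
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;                # skip malformed pairs
        $tag =~ s/^\s+|\s+$//g;                    # trim stray whitespace
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# The transcript line from the thread, rebuilt as a tab-delimited record.
my $line = join "\t", qw(2 test transcript 2876 3540 0.38 + . ID=g1.t1;Parent=g1);
my @cols = split /\t/, $line;
my $attr = parse_gff3_attributes($cols[8]);

print "ID: $attr->{ID}[0]\n";
print "Parent: $attr->{Parent}[0]\n";
```

A tag reported as the single string "Parent=g1", as in the output quoted above, suggests the attribute column was not split on "=" by the loader in use; checking the file against a parser that expects GFF3 may be worth trying.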
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From clements at nescent.org Fri Dec 3 19:56:11 2010
From: clements at nescent.org (Dave Clements)
Date: Fri, 3 Dec 2010 16:56:11 -0800
Subject: [Bioperl-l] 2011 GMOD Spring Training, March 8-12
In-Reply-To:
References:
Message-ID:

Applications are now being accepted for the 2011 GMOD Spring Training course, a five-day hands-on school aimed at teaching new GMOD administrators how to install, configure and integrate popular GMOD components. The course will be held March 8-12 at the US National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina, as part of GMOD Americas 2011.

Links:
* http://gmod.org/wiki/2011_GMOD_Spring_Training
* http://gmod.org/wiki/GMOD_Americas_2011
* http://www.nescent.org/

These components will be covered:
* Apollo - genome annotation editor
* Chado - biological database schema
* Galaxy - workflow system
* GBrowse - genome viewer
* GBrowse_syn - synteny viewer
* GFF3 - genome annotation file format and tools
* InterMine - biological data mining system
* JBrowse - next generation genome browser
* MAKER - genome annotation pipeline
* Tripal - web front end to Chado databases

The deadline for applying is the end of Friday, January 7, 2011. Admission is competitive and is based on the strength of the application, especially the statement of interest. The 2010 school had over 60 applicants for the 25 slots. Any application received after the deadline will be automatically placed on the waiting list.

The course requires some knowledge of Linux as a prerequisite. The registration fee will be $265 (only $53 per day!). There will be a limited number of scholarships available.

This may be the only GMOD School offered in 2011. If you are interested, you are strongly encouraged to apply by January 7.

Thanks,

Dave Clements
--
http://gmod.org/wiki/GMOD_Americas_2011
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/Help_Desk_Feedback

From kris.richardson at tufts.edu Mon Dec 6 12:16:58 2010
From: kris.richardson at tufts.edu (kris richardson)
Date: Mon, 6 Dec 2010 12:16:58 -0500
Subject: [Bioperl-l] eutils help
Message-ID: <7D46D5FF-67F3-40F6-8E69-6C72DE9E7975@tufts.edu>

Dear Bioperl Users,

I am interested in generating the flanking sequences (20 nt from each side) for a list of ~500,000 SNPs from dbSNP build 132.

I tried using the Perl API variation toolset to extract this information. However, the script throws an error when it encounters many of the recently discovered SNPs (from the 1000genomes data), as this tool is still using the dbSNP 131 data.

I read that the BioPerl eUtils tool might be used to obtain this info, but I cannot find any example code in which the dbSNP data is queried. Does anyone have any pointers or examples on how one might use efetch and eUtils to obtain the flanking sequence for a list of SNP rs #s?

Thanks!

Kris

From Russell.Smithies at agresearch.co.nz Mon Dec 6 14:48:28 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 7 Dec 2010 08:48:28 +1300
Subject: [Bioperl-l] eutils help
In-Reply-To: <7D46D5FF-67F3-40F6-8E69-6C72DE9E7975@tufts.edu>
References: <7D46D5FF-67F3-40F6-8E69-6C72DE9E7975@tufts.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3313BCA11C0@exchsth.agresearch.co.nz>

I do this quite frequently, and it's usually easiest to download the flanking FASTA for the SNPs with their batch query tool (http://www.ncbi.nlm.nih.gov/SNP/batchquery.html) and then trim the sequences as required with BioPerl.

I think you'll run into problems downloading that many SNPs reliably with eUtils, and it's best to break the list up into smaller chunks.
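
The "smaller chunks" advice can be sketched in core Perl as below. This is only an illustration of the batching step: the helper name chunk_ids, the batch size, and the placeholder rs numbers are all assumptions, and no NCBI request is made here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into batches of at most
# $size entries, returned as a list of array references.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    while (@ids) {
        push @batches, [ splice @ids, 0, $size ];
    }
    return @batches;
}

my @rs_ids  = map { "rs$_" } 1 .. 7;    # placeholder IDs, not real SNPs
my @batches = chunk_ids(3, @rs_ids);

for my $i (0 .. $#batches) {
    printf "batch %d: %s\n", $i + 1, join(',', @{ $batches[$i] });
}
```

Each batch could then be submitted as its own query, with a pause between requests, so that one failed download does not invalidate the whole run.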
--Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of kris richardson > Sent: Tuesday, 7 December 2010 6:17 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] eutils help > > Dear Bioperl Users, > > I am interested in generating the flanking sequences (20 nt from each > side) from a list of ~500,000 SNPs, from the dbSNP build 132. > > I tried using the perl API variation toolset to extract this > information, however the script throws an error when it encounters many > of the recently discovered SNPs (from the 1000genomes data), as this > tool is still using the dbSNP 131 data. > > I read the bipoerl eUtils tool might be used to obtain this info, but > I can not find any example code in which the dbSNP data is queried... > Does any one have any pointers or examples on how one might use efetch > and eUtils to obtain the flanking sequence for a list of SNP rs #s? > > > Thanks! > > Kris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From biocjh at gmail.com Tue Dec 7 00:29:11 2010 From: biocjh at gmail.com (C.J.) 
Date: Tue, 7 Dec 2010 13:29:11 +0800
Subject: [Bioperl-l] extract protein sequence
Message-ID:

Dear all,

I have downloaded many polyprotein sequences from GenBank. Each polyprotein sequence contains several mature peptides, and I want to extract my target mature peptide from these sequences. Could anyone kindly tell me whether any module in BioPerl can do this?

Thanks.

--
Regards!
C.J.

From cjfields at illinois.edu Tue Dec 7 08:04:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Dec 2010 07:04:41 -0600
Subject: [Bioperl-l] extract protein sequence
In-Reply-To:
References:
Message-ID:

On Dec 6, 2010, at 11:29 PM, C.J. wrote:

> Dear all,
>
> I have download many polyprotein sequences from Genbank.
> As the polyprotein sequence contains several mature peptides.
> I want to extract my target mature peptide from these sequences.
> Anyone would be kind to tell me any model in Bioperl can do this?
> Thanks.
>
> --
> Regards!
> C.J.

You'll need to provide some example accessions to look at. My guess is, if the mature peptide is described as a feature, then yes.

chris

From cjfields at illinois.edu Tue Dec 7 12:44:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Dec 2010 11:44:01 -0600
Subject: [Bioperl-l] eutils help
In-Reply-To: <7D46D5FF-67F3-40F6-8E69-6C72DE9E7975@tufts.edu>
References: <7D46D5FF-67F3-40F6-8E69-6C72DE9E7975@tufts.edu>
Message-ID: <66C5EFC1-1F39-4729-A70E-3491768B9702@illinois.edu>

The 'brief' return type format for dbSNP gives something like this (see below). XML output (changing 'retmode' to XML instead of text) gives much more information.

    use strict;
    use warnings;
    use Bio::DB::EUtilities;

    my $term = shift;

    my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                         -db         => 'snp',
                                         -email      => 'foo@bar.org',
                                         -term       => $term,
                                         -usehistory => 'y',
                                         -retmax     => 100);

    my $hist = $eutil->next_History || die "No history returned";

    $eutil->set_parameters(-eutil   => 'efetch',
                           -history => $hist,
                           -retmode => 'text',
                           -rettype => 'brief');

    print $eutil->get_Response->content."\n";

chris

On Dec 6, 2010, at 11:16 AM, kris richardson wrote:

> Dear Bioperl Users,
>
> I am interested in generating the flanking sequences (20 nt from each side) from a list of ~500,000 SNPs, from the dbSNP build 132.
>
> I tried using the perl API variation toolset to extract this information, however the script throws an error when it encounters many of the recently discovered SNPs (from the 1000genomes data), as this tool is still using the dbSNP 131 data.
>
> I read the bipoerl eUtils tool might be used to obtain this info, but I can not find any example code in which the dbSNP data is queried... Does any one have any pointers or examples on how one might use efetch and eUtils to obtain the flanking sequence for a list of SNP rs #s?
>
>
> Thanks!
>
> Kris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From deeepersound at googlemail.com Tue Dec 7 15:00:35 2010
From: deeepersound at googlemail.com (Maxim)
Date: Tue, 7 Dec 2010 21:00:35 +0100
Subject: [Bioperl-l] Bio::DB::Fasta problem
Message-ID:

Hi,

I'm using Bio::DB::Fasta to retrieve sequences. This works like a charm (super fast!!!), for example like in the script below, where bed-files are taken as input.

My problem: as soon as the initialization of the fasta directory index is called from within an if-statement, I get an error like:

Can't call method "get_Seq_by_id" on an undefined value at extract_bed_bio_db_fasta_test.pl line 20, <INP> line 134.

The script is as simple as this; the error comes up when I use the full script (including the 2 commented-out lines):

    use Bio::Perl;
    use Bio::DB::Fasta;

    $bedFileName=@ARGV[0];
    $genome = @ARGV[1];
    #if ($genome eq "mm9") {
    my $db = Bio::DB::Fasta->new('/Users/Computing/BIODBFASTA/mm9_masked');
    #}

    open (INP, "$bedFileName");
    foreach (<INP>)
    {
    @words = split /\s+/, $_; ### 0-chr 1-start 2-end
    $chr = @words[0];$chr =~s/\s+//g;
    $start = @words[1];$start =~s/\s+//g;
    $end = @words[2];$end =~s/\s+//g;
    $fast_name = $chr . "_" . $start . "_" . $end;
    my $obj = $db->get_Seq_by_id($chr);
    $subseq = $obj->subseq($start => $end);
    print ">", "$fast_name\n";
    print "$subseq\n";
    }

What is the problem when initializing from within the if-statement?

I really appreciate all kinds of advice. I guess the reason is rather simple, but I do not get it.

Regards
Maxim

From scott at scottcain.net Tue Dec 7 15:08:56 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 7 Dec 2010 15:08:56 -0500
Subject: [Bioperl-l] Bio::DB::Fasta problem
In-Reply-To:
References:
Message-ID:

Hi Maxim,

I think you have a scope problem. If you declare the variable $db inside of the if block, it ceases to exist when exiting the block. Put the "my $db;" before the if block so that it will continue to exist.

Scott

On Tue, Dec 7, 2010 at 3:00 PM, Maxim wrote:
> Hi,
>
> I'm using Bio::DB::Fasta to retrieve sequences. This works like a charm
> (super fast!!!), for example like in the script below, where bed-files are
> taken as input.
>
> My problem: as soon as the initialization of the fasta directory index is
> called from within an if-statement, I get an error like:
>
> Can't call method "get_Seq_by_id" on an undefined value at
> extract_bed_bio_db_fasta_test.pl line 20, <INP> line 134.
>
>
> The script is as simple as this, the error comes up when I use the full
> script (including the 2 out-commented lines):
>
> use Bio::Perl;
> use Bio::DB::Fasta;
>
> $bedFileName=@ARGV[0];
> $genome = @ARGV[1];
> #if ($genome eq "mm9") {
> my $db = Bio::DB::Fasta->new('/Users/Computing/BIODBFASTA/mm9_masked');
> #}
>
> open (INP, "$bedFileName");
> foreach (<INP>)
> {
> @words = split /\s+/, $_; ### 0-chr 1-start 2-end
> $chr = @words[0];$chr =~s/\s+//g;
> $start = @words[1];$start =~s/\s+//g;
> $end = @words[2];$end =~s/\s+//g;
> $fast_name = $chr . "_" . $start . "_" . $end;
> my $obj = $db->get_Seq_by_id($chr);
> $subseq = $obj->subseq($start => $end);
> print ">", "$fast_name\n";
> print "$subseq\n";
> }
>
> What is the problem when initializing from within the if-statement?
>
> I really appreciate all kind of advice, I guess the reason is rather simple,
> but I do not get it.
> Regards
> Maxim
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                              scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)             216-392-3087
Ontario Institute for Cancer Research

From jason.stajich at gmail.com Tue Dec 7 15:55:38 2010
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 07 Dec 2010 12:55:38 -0800
Subject: [Bioperl-l] Bio::DB::Fasta problem
In-Reply-To:
References:
Message-ID: <4CFE9F4A.4090808@gmail.com>

use strict at the top of your script would have also helped catch this.

> Scott Cain December 7, 2010 12:08 PM:
>
> Hi Maxim,
>
> I think you have a scope problem. If you declare the variable $db
> inside of the if block, it ceases to exist when exiting the block.
> Put the "my $db;" before the if block so that it will continue to
> exist.
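
The scope fix described above can be sketched as follows. To keep the sketch runnable without an indexed genome directory, a hypothetical stand-in, open_index, takes the place of Bio::DB::Fasta->new; the genome name is hard-coded where the original script reads @ARGV.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::DB::Fasta->new($dir), so that this
# sketch runs without BioPerl or an indexed FASTA directory.
sub open_index {
    my ($dir) = @_;
    return { dir => $dir };
}

my $genome = 'mm9';    # would come from $ARGV[1] in the original script

# Declare $db in the enclosing scope and only assign it inside the block.
# A "my $db = ..." inside the if block would vanish at the closing brace.
my $db;
if ($genome eq 'mm9') {
    $db = open_index('/Users/Computing/BIODBFASTA/mm9_masked');
}
die "No FASTA index configured for genome '$genome'\n" unless defined $db;

print "Using index in $db->{dir}\n";
```

Without "use strict", a later $db outside the block silently refers to an unrelated, undefined package variable, which is why the original script only fails at the method call. With "use strict", the same mistake is reported at compile time.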
> > Scott > > > On Tue, Dec 7, 2010 at 3:00 PM, Maxim wrote: >> Hi, >> >> I'm using Bio::DB::Fasta to retrieve sequences. This works like a charm >> (super fast!!!), for example like in the script below, where bed-files are >> taken as input. >> >> My problem: as soon as the initialization of the fasta directory index is >> called from within an if-statement, I get an error like: >> >> Can't call method "get_Seq_by_id" on an undefined value at >> extract_bed_bio_db_fasta_test.pl line 20, line 134. >> >> >> The script is as simple as this, the error comes up when I use the full >> script (including the 2 out-commented lines): >> >> use Bio::Perl; >> use Bio::DB::Fasta; >> >> $bedFileName=@ARGV[0]; >> $genome = @ARGV[1]; >> #if ($genome eq "mm9") { >> my $db = Bio::DB::Fasta->new('/Users/Computing/BIODBFASTA/mm9_masked'); >> #} >> >> open (INP, "$bedFileName"); >> foreach () >> { >> @words = split /\s+/, $_; ### 0-chr 1-start 2-end >> $chr = @words[0];$chr =~s/\s+//g; >> $start = @words[1];$start =~s/\s+//g; >> $end = @words[2];$end =~s/\s+//g; >> $fast_name = $chr . "_" . $start . "_" . $end; >> my $obj = $db->get_Seq_by_id($chr); >> $subseq = $obj->subseq($start => $end); >> print ">", "$fast_name\n"; >> print "$subseq\n"; >> } >> >> What is the problem when initializing from within the if-statement? >> >> I really appreciate all kind of advice, I guess the reason is rather simple, >> but I do not get it. >> Regards >> Maxim >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > ------------------------------------------------------------------------ > > Maxim December 7, 2010 12:00 PM: > > Hi, > > I'm using Bio::DB::Fasta to retrieve sequences. This works like a charm > (super fast!!!), for example like in the script below, where bed-files are > taken as input. 
>
> My problem: as soon as the initialization of the fasta directory index is
> called from within an if-statement, I get an error like:
>
> Can't call method "get_Seq_by_id" on an undefined value at
> extract_bed_bio_db_fasta_test.pl line 20, <INP> line 134.
>
>
> The script is as simple as this, the error comes up when I use the full
> script (including the 2 out-commented lines):
>
> use Bio::Perl;
> use Bio::DB::Fasta;
>
> $bedFileName=@ARGV[0];
> $genome = @ARGV[1];
> #if ($genome eq "mm9") {
> my $db = Bio::DB::Fasta->new('/Users/Computing/BIODBFASTA/mm9_masked');
> #}
>
> open (INP, "$bedFileName");
> foreach (<INP>)
> {
> @words = split /\s+/, $_; ### 0-chr 1-start 2-end
> $chr = @words[0];$chr =~s/\s+//g;
> $start = @words[1];$start =~s/\s+//g;
> $end = @words[2];$end =~s/\s+//g;
> $fast_name = $chr . "_" . $start . "_" . $end;
> my $obj = $db->get_Seq_by_id($chr);
> $subseq = $obj->subseq($start => $end);
> print ">", "$fast_name\n";
> print "$subseq\n";
> }
>
> What is the problem when initializing from within the if-statement?
>
> I really appreciate all kind of advice, I guess the reason is rather simple,
> but I do not get it.
> Regards
> Maxim
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich

From miguel.pignatelli at uv.es Thu Dec 9 07:05:59 2010
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Thu, 09 Dec 2010 13:05:59 +0100
Subject: [Bioperl-l] Another Taxonomy modules to CPAN
In-Reply-To: <4CD12E99.7080701@uv.es>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es>
Message-ID: <4D00C627.40208@uv.es>

Hi all,

I uploaded these modules to CPAN a couple of weeks ago:

Bio::LITE::Taxonomy
Bio::LITE::Taxonomy::NCBI
Bio::LITE::Taxonomy::NCBI::Gi2taxid
Bio::LITE::Taxonomy::RDP

Feel free to test them and see if they fit your needs.
Remember: they are not part of BioPerl, just alternatives. See the documentation for details (not many yet, but improving).

Any feedback is highly welcome.

Regards,

M;

On 11/03/2010 10:42 AM, Miguel Pignatelli wrote:
> Hi all,
>
> I have written a couple of modules that overlap certain functionality
> with Bio::DB::Taxonomy and Bio::Taxon. I had to write them because
> certain constraints in the environment I had to run it (GRID) made
> impossible to use a bioperl based solution.
>
>
> The main features of these modules are:
>
> + No dependencies of non-standard Perl modules
> + NCBI and RDP based taxonomies supported
> + Very fast and low memory footprint -- orders of magnitude faster than
> Bioperl modules (for the tasks they are designed for --).
>
> Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon in
> completeness or integration with other tools (e.g. rest of bioperl suit)
> but they are handy for mapping very large datasets (for example blast
> results) with the NCBI or RDP Taxonomy.
>
> The modules are:
>
> Taxonomy::Base -- Finds ancestors, ranks, converts between
>                   names, ranks and IDs, etc...
>
> Taxonomy::RDP -- Reads the taxonomic tree from the RDP xml file
>
> Taxonomy::NCBI -- Reads the taxonomic tree from flat NCBI files
>                   (nodes.dmp and names.dmp)
>                   (Similar to Bio::DB::Taxonomy::flatfile)
>
> Taxonomy::NCBI::Gi2taxid -- Converts very fast and efficiently
>                             NCBI GIs to Taxids.
>                             Uses a binary lookup table.
>
> These modules are being used by several groups now -- mainly working
> with large metagenomics datasets -- and I am considering uploading them
> to CPAN, but I am not clear on where these modules should be placed there.
>
> How do you think I should name these modules? (e.g. where these modules
> should live in CPAN?) Their natural place could be under
> Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite /
> Bio::DB::Taxonomy::Lite::NCBI / etc...? Is this possible (and
> convenient) without being part of Bioperl? Any other suggestions?
>
> Thank you very much in advance,
>
> M;
>
> ----------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.bolser at gmail.com Fri Dec 10 07:40:30 2010
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 10 Dec 2010 12:40:30 +0000
Subject: [Bioperl-l] Flip a sequence (with features)
Message-ID:

Hi,

What is the best way to take a sequence covered in features and flip it?

That is, the orientation of all the features should be inverted, and their position on the sequence should be flipped, such that a feature near the start becomes a feature near the end.

I'm sure there is a trivial solution to this, but I'm not sure where to start looking!

Cheers,
Dan.

From roy.chaudhuri at gmail.com Fri Dec 10 07:53:17 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 10 Dec 2010 12:53:17 +0000
Subject: [Bioperl-l] Flip a sequence (with features)
In-Reply-To:
References:
Message-ID: <4D0222BD.8030807@gmail.com>

See Bio::SeqUtils->revcom_with_features

Cheers,
Roy.

On 10/12/2010 12:40, Dan Bolser wrote:
> Hi,
>
> What is the best way to take a sequence covered in features and flip it?
>
> i.e. the orientation of all the features should be inverted, and their
> position on the sequence should be flipped, such that a feature near
> the start becomes a feature near the end...
>
> I'm sure there is a trivial solution to this, but I'm not sure where
> to start looking!
>
>
> Cheers,
> Dan.
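
The coordinate arithmetic behind "flipping" a feature can be sketched in core Perl as below. This only illustrates the mapping for 1-based inclusive coordinates; it is not the implementation of the BioPerl method named above, and the helper flip_feature and the plain-hash feature layout are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: map a feature with 1-based inclusive coordinates
# onto the reverse complement of a sequence of length $len.
# Start and end swap roles and the strand sign is inverted.
sub flip_feature {
    my ($len, $feat) = @_;
    return {
        %$feat,
        start  => $len - $feat->{end}   + 1,
        end    => $len - $feat->{start} + 1,
        strand => -1 * $feat->{strand},
    };
}

my $len     = 100;
my $gene    = { name => 'geneA', start => 5, end => 20, strand => 1 };
my $flipped = flip_feature($len, $gene);

printf "%s: %d..%d (strand %d)\n", @{$flipped}{qw(name start end strand)};
```

A feature near the start (5..20 on a 100-residue sequence) ends up near the end (81..96) on the opposite strand, which is the behaviour asked for in the question.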
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From deeepersound at googlemail.com Sat Dec 11 13:16:07 2010
From: deeepersound at googlemail.com (Maxim)
Date: Sat, 11 Dec 2010 19:16:07 +0100
Subject: [Bioperl-l] how to parse results from Bio::Biblio
Message-ID:

Hi,

I am having problems parsing XML-like results from Bio::Biblio/Bio::Biblio::IO. I thought I would use XML::Simple, but I get an error when I attempt it as in the example script below:

    #!/usr/bin/perl -w

    use Bio::Biblio;
    use Bio::Biblio::IO;
    use XML::Simple;
    use Data::Dumper;
    use strict;

    my $biblio = new Bio::Biblio;
    my $citation = $biblio->get_by_id ('18287711');

    #print $citation;

    my $xml = new XML::Simple;

    my $data = $xml->XMLin($citation);

    # this part works, output to me looks like it should be
    print Dumper($data);
    #this part does not!
    print "Abstract: $data->{Abstract}";

The error:

could not find ParserDetails.ini in /opt/local/lib/perl5/site_perl/5.8.9/XML/SAX ### I think this is not relevant to the error as "normal" parsing of XML files works on my machine
Use of uninitialized value in concatenation (.) or string at test.pl line 21.

The line that prints the dump of var1 appears to contain something that looks like an XML file, but obviously it's not properly parsed. I guess there is a solution for this problem within BioPerl that does not require other modules (like XML::Simple), but I cannot figure out how to do this!

Maxim

From cjfields at illinois.edu Sat Dec 11 16:31:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 11 Dec 2010 15:31:51 -0600
Subject: [Bioperl-l] how to parse results from Bio::Biblio
In-Reply-To:
References:
Message-ID: <08A09F6C-42CE-4F0D-BADB-306379F039EB@illinois.edu>

On Dec 11, 2010, at 12:16 PM, Maxim wrote:

> Hi,
>
> I have problems to parse XML-like results from Bio::Biblio/Bio::Biblio::IO.
> I thought to use XML::Simple, but I get an error when I attempt to do it
> like in below example script:
>
> #!/usr/bin/perl -w
>
> use Bio::Biblio;
> use Bio::Biblio::IO;
> use XML::Simple;
> use Data::Dumper;
> use strict;
>
> my $biblio = new Bio::Biblio;
> my $citation = $biblio->get_by_id ('18287711');
>
> #print $citation;
>
> my $xml = new XML::Simple;
>
> my $data = $xml->XMLin($citation);
>
> # this part works, output to me looks like it should be
> print Dumper($data);
> #this part does not!
> print "Abstract: $data->{Abstract}";
>
>
> The error:
> could not find ParserDetails.ini in
> /opt/local/lib/perl5/site_perl/5.8.9/XML/SAX ### I think this is not
> relevant to the error as "normal" parsing of XML files works on my machine
> Use of uninitialized value in concatenation (.) or string at test.pl line
> 21.

According to the script above, you have not installed XML::SAX, or ParserDetails.ini wasn't installed correctly.

> The line that print the dump of var1 appears to contain something that looks
> like an XML file, but obviously it's not properly parsed. I guess there will
> be a solution for this problem within BioPerl without the requirement to use
> other modules (like XML::Simple), but I cannot figure out how to do this!
>
> Maxim

AFAIK Bio::Biblio was never completely implemented; the framework is there (Bio::Biblio::*), parsers exist (Bio::Biblio::IO::*), and the DB connections are made (Bio::DB::Biblio), but no progress has been made on these in quite a while. You are more than welcome to work on these if you are interested.

chris

From biocjh at gmail.com Sun Dec 12 11:45:59 2010
From: biocjh at gmail.com (C.J.)
Date: Mon, 13 Dec 2010 00:45:59 +0800
Subject: [Bioperl-l] extract protein sequence
In-Reply-To:
References:
Message-ID:

Hi Chris,

Here is an example. I want to extract the VP4 sequence from the record listed below. Could you tell me how to do this with BioPerl?

Thanks!

***********************
LOCUS       ACF74968                  69 aa            linear   VRL 02-AUG-2008
DEFINITION  polyprotein [Human enterovirus 71].
ACCESSION   ACF74968
VERSION     ACF74968.1  GI:194485377
DBSOURCE    accession EU862482.1
KEYWORDS    .
SOURCE      Human enterovirus 71
  ORGANISM  Human enterovirus 71
            Viruses; ssRNA positive-strand viruses, no DNA stage;
            Picornavirales; Picornaviridae; Enterovirus; Human enterovirus A.
REFERENCE   1  (residues 1 to 69)
  AUTHORS   Li,Y., Qian,Y., Zhu,R., Deng,J., Zhao,L., Wang,F., Liu,L., Sun,Y.,
            Chen,D., Zhang,Y., Jia,L., Ding,Y., Dong,H. and Zhang,S.
  TITLE     Sequence analysis of VP4 of Enterovirus 71 isolated in Beijing
            between 2007 and 2008
  JOURNAL   Unpublished
REFERENCE   2  (residues 1 to 69)
  AUTHORS   Li,Y., Qian,Y., Zhu,R., Deng,J., Zhao,L., Wang,F., Liu,L., Sun,Y.,
            Chen,D., Zhang,Y., Jia,L., Ding,Y., Dong,H. and Zhang,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-JUN-2008) Laboratory of Virology, Capital Institute
            of Pediatrics, Number 2, Yabao Road, Chaoyang District, Beijing
            City 100020, The People's Republic of China
COMMENT     Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..69
                     /organism="Human enterovirus 71"
                     /strain="BJ97"
                     /host="Homo sapiens"
                     /db_xref="taxon:39054"
                     /country="China: Beijing"
                     /collection_date="May-2008"
     Protein         1..>69
                     /product="polyprotein"
     mat_peptide     1..69
                     /product="VP4"
     Region          2..69
                     /region_name="Pico_P1A"
                     /note="Picornavirus coat protein (VP4); pfam02226"
                     /db_xref="CDD:145404"
     CDS             1..69
                     /coded_by="EU862482.1:1..>207"
ORIGIN
        1 mgsqvstqrs gshensnsat egstinytti nyykdsyaat agkqslkqdp dkfanpvkdi
       61 ftemaaplk
//
*******************

2010/12/7 Chris Fields :
> On Dec 6, 2010, at 11:29 PM, C.J. wrote:
>
>> Dear all,
>>
>> I have download many polyprotein sequences from Genbank.
>> As the polyprotein sequence contains several mature peptides.
>> I want to extract my target mature peptide from these sequences.
>> Anyone would be kind to tell me any model in Bioperl can do this?
>> Thanks.
>>
>> --
>> Regards!
>> C.J.
>
> You'll need to provide some example accessions to look at. My guess is, if the mature peptide is described as a feature, then yes.
>
> chris
>
>

--
Regards!
C.J.

From jordi.durban at gmail.com Sun Dec 12 15:02:49 2010
From: jordi.durban at gmail.com (Jordi Durban)
Date: Sun, 12 Dec 2010 21:02:49 +0100
Subject: [Bioperl-l] extract protein sequence
In-Reply-To:
References:
Message-ID:

Try using:

    use Bio::DB::GenBank;
    use Bio::AnnotatableI;

And something like:

    my $prot_obj = $gb->get_Seq_by_acc( $prot_id );
    $out->write_seq( $prot_obj );

It could be useful.
Hope this helps.

2010/12/12 C.J.

> Hi Chris,
>
> There is a example. I want to extract the VP4 sequence from the file
> listing blow:
> Could you tell me how to do this with bio-perl?
> Thanks!
>
> ***********************
> LOCUS       ACF74968                  69 aa            linear   VRL 02-AUG-2008
> DEFINITION  polyprotein [Human enterovirus 71].
> ACCESSION   ACF74968
> VERSION     ACF74968.1  GI:194485377
> DBSOURCE    accession EU862482.1
> KEYWORDS    .
> SOURCE      Human enterovirus 71
>   ORGANISM  Human enterovirus 71
>             Viruses; ssRNA positive-strand viruses, no DNA stage;
>             Picornavirales; Picornaviridae; Enterovirus; Human enterovirus A.
> REFERENCE   1  (residues 1 to 69)
>   AUTHORS   Li,Y., Qian,Y., Zhu,R., Deng,J., Zhao,L., Wang,F., Liu,L., Sun,Y.,
>             Chen,D., Zhang,Y., Jia,L., Ding,Y., Dong,H. and Zhang,S.
>   TITLE     Sequence analysis of VP4 of Enterovirus 71 isolated in Beijing
>             between 2007 and 2008
>   JOURNAL   Unpublished
> REFERENCE   2  (residues 1 to 69)
>   AUTHORS   Li,Y., Qian,Y., Zhu,R., Deng,J., Zhao,L., Wang,F., Liu,L., Sun,Y.,
>             Chen,D., Zhang,Y., Jia,L., Ding,Y., Dong,H. and Zhang,S.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (27-JUN-2008) Laboratory of Virology, Capital Institute
>             of Pediatrics, Number 2, Yabao Road, Chaoyang District, Beijing
>             City 100020, The People's Republic of China
> COMMENT     Method: conceptual translation.
> FEATURES Location/Qualifiers > source 1..69 > /organism="Human enterovirus 71" > /strain="BJ97" > /host="Homo sapiens" > /db_xref="taxon:39054" > /country="China: Beijing" > /collection_date="May-2008" > Protein 1..>69 > /product="polyprotein" > mat_peptide 1..69 > /product="VP4" > Region 2..69 > /region_name="Pico_P1A" > /note="Picornavirus coat protein (VP4); pfam02226" > /db_xref="CDD:145404" > CDS 1..69 > /coded_by="EU862482.1:1..>207" > ORIGIN > 1 mgsqvstqrs gshensnsat egstinytti nyykdsyaat agkqslkqdp dkfanpvkdi > 61 ftemaaplk > // > ******************* > > > 2010/12/7 Chris Fields : > > On Dec 6, 2010, at 11:29 PM, C.J. wrote: > > > >> Dear all, > >> > >> I have download many polyprotein sequences from Genbank. > >> As the polyprotein sequence contains several mature peptides. > >> I want to extract my target mature peptide from these sequences. > >> Anyone would be kind to tell me any model in Bioperl can do this? > >> Thanks. > >> > >> -- > >> Regards! > >> C.J. > > > > You'll need to provide some example accessions to look at. My guess is, > if the mature peptide is described as a feature, then yes. > > > > chris > > > > > > > > > > -- > Regards! > C.J. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jordi From jason at bioperl.org Sun Dec 12 15:10:21 2010 From: jason at bioperl.org (Jason Stajich) Date: Sun, 12 Dec 2010 12:10:21 -0800 Subject: [Bioperl-l] extract protein sequence In-Reply-To: References: Message-ID: <4D052C2D.4090905@bioperl.org> But CJ wants the subseq Something like this would get the sub-feature. 
for my $feat ( $seq->get_SeqFeatures ) {
    if( $feat->primary_tag eq 'mat_peptide' ) {
        my $subfeatseq = $seq->trunc($feat->start, $feat->end);
        my ($prod) = $feat->get_tag_values('product');
        $subfeatseq->display_id($prod);
        $out->write_seq($subfeatseq);
    }
}

See also starting around slide 23:
http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki

From cjfields at illinois.edu Sun Dec 12 18:10:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 12 Dec 2010 17:10:07 -0600
Subject: [Bioperl-l] extract protein sequence
In-Reply-To: <4D052C2D.4090905@bioperl.org>
References: <4D052C2D.4090905@bioperl.org>
Message-ID: <4E050C94-D325-4AEC-A651-92654DB66CCB@illinois.edu>

+1 on this, which uses the Bio::FeatureHolder API.
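For readers without BioPerl at hand, the coordinate handling in the mat_peptide loop can be sketched in plain Perl. This is only an illustration, assuming that trunc() takes a 1-based, inclusive start and end as feature locations do; the helper name subseq_1based is made up here and is not part of BioPerl. The sequence and coordinates are those of the ACF74968 record in this thread.

```perl
use strict;
use warnings;

# Protein sequence from the ORIGIN block of ACF74968 (69 residues).
my $polyprotein = join '', qw(
    mgsqvstqrs gshensnsat egstinytti nyykdsyaat agkqslkqdp dkfanpvkdi
    ftemaaplk
);

# Hypothetical helper: 1-based, inclusive slice of a sequence string,
# mirroring what $seq->trunc($start, $end) is assumed to return.
sub subseq_1based {
    my ($seq, $start, $end) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end > length($seq) || $start > $end;
    return substr($seq, $start - 1, $end - $start + 1);
}

my $vp4    = subseq_1based($polyprotein, 1, 69);   # mat_peptide 1..69
my $region = subseq_1based($polyprotein, 2, 69);   # Region 2..69

print ">VP4\n$vp4\n";
print ">Pico_P1A\n$region\n";
```

In this particular record the mat_peptide spans the whole entry, so the VP4 slice equals the full sequence; with a real polyprotein the same arithmetic would pick out the internal stretch.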
See also the Feature-Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

chris

From klkeysb at gmail.com Tue Dec 14 05:16:09 2010
From: klkeysb at gmail.com (Kevin L. Keys)
Date: Tue, 14 Dec 2010 11:16:09 +0100
Subject: [Bioperl-l] Suppress warnings for Bio:.LocatableSeq:.end()
Message-ID:

Greetings,

I am using BioPerl to download multiple alignments from the Ensembl databases. I keep getting the following message:

--------------------- WARNING ---------------------
MSG: In sequence $(...) residue count gives end value (####).
Overriding value [####] with value (####) for Bio::LocatableSeq::end().

where $(...) is an Ensembl transcript ID and the #'s represent digits.

I suspect that BioPerl is complaining about varying sequence lengths. For my project, this is no concern of mine since I control for it after downloading. Is there a way to suppress this warning so that it doesn't clog my terminal feed?

many thanks,
KLK

From jun.yin at ucd.ie Tue Dec 14 06:23:59 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 14 Dec 2010 11:23:59 +0000
Subject: [Bioperl-l] Suppress warnings for Bio:.LocatableSeq:.end()
In-Reply-To:
References:
Message-ID: <04b601cb9b81$6768a9b0$3639fd10$%yin@ucd.ie>

Hi, Kevin,

The warning is because Bio::LocatableSeq::end() is using this formula to check the end coordinate:

End=Start+Length-1

It is possible that the start and end assigned by Ensembl do not fit this formula.
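As a rough illustration of the End=Start+Length-1 check, here is a plain-Perl sketch. It assumes that "Length" means the number of residues, with gap characters left out, and that '-' and '.' are the gap characters; both are assumptions for this example, and expected_end is a made-up helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: expected end coordinate of an aligned sequence,
# counting only residues (gap characters '-' and '.' are skipped).
sub expected_end {
    my ($start, $aligned) = @_;
    my $residues = () = $aligned =~ /[^-.]/g;
    return $start + $residues - 1;
}

my $aligned = 'ATG--CGT.A';   # 7 residues plus 3 gap characters
my $start   = 101;
my $claimed = 110;            # end computed as if gaps were residues
my $end     = expected_end($start, $aligned);

print "expected end: $end\n"; # 101 + 7 - 1 = 107
print "claimed end $claimed does not match; this is the kind of case that would trigger the warning\n"
    if $end != $claimed;
```

If the stored end was derived from the gapped alignment length, a mismatch like this would be expected for every sequence that contains gaps.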
For turning off warnings, I found this comment from one of Chris Fields' old posts:

If you want to turn off warnings for any bioperl class, you can set the object to $obj->verbose(-1), or pass it to the constructor:

my $seqio = Bio::SeqIO->new('-file' => $infile, '-format' => 'fasta', -verbose => -1);

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From lskatz at gmail.com Fri Dec 10 18:53:38 2010
From: lskatz at gmail.com (Lee Katz)
Date: Fri, 10 Dec 2010 18:53:38 -0500
Subject: [Bioperl-l] Re (3): Status of assembly modules
Message-ID:

I am wondering if there is a way to optimize the BioPerl code for Assembly IO. Specifically, when I convert a 2.2 MB genome (~200 contigs) from a 454 ace file to a regular ace file, it takes a few hours to get through 30 contigs using the code below (I estimate more than a day to get through all of it).
Is there a way to optimize it? Converting a sequence file to another format takes a minute at most, so converting an ace file on the order of hours or days is too much. I wish I understood bioperl better but I think the best I can do is issue a challenge or a feature request: who can speed up Assembly::IO::ace?

# convert a Newbler ace to a standard ace
sub _newblerAceToAce($args){
  my($self,$args)=@_;
  my $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454');
  my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace"); #output ace
  my $numContigs=`grep -c ^CO $$args{ace454Path}`+0;
  logmsg "Converting $$args{ace454Path} (454-ace) to $$args{acePath} (ace). $numContigs contigs total.";
  while(my $contig=$ace454->next_contig){
    logmsg "Finished with ".$contig->id ." out of $numContigs";
    $ace->write_contig($contig);
  }
  return $$args{acePath};
}

Message: 3
Date: Mon, 22 Nov 2010 15:18:10 -0500
From: Lee Katz
Subject: [Bioperl-l] Re(2): Status of assembly modules
To: bioperl-l at lists.open-bio.org
Message-ID:
Content-Type: text/plain; charset=UTF-8

I figured it out (I haven't tested much though).

To whoever works on Assembly::IO::ace.pm:
I changed a regular expression on line 231 because the contig object was not initializing properly. For some reason the 454 ace file had adopted the reference assembly's ID and therefore there was a GI number followed by a pipe. The pipe was not captured with \w+. I think that the regex will be safe with \s(\S+)\s.

if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
#if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!

On Thu, Nov 18, 2010 at 12:04 PM, wrote:

> Message: 3
> Date: Wed, 17 Nov 2010 22:20:03 -0500
> From: Lee Katz
> Subject: Re: [Bioperl-l] Status of assembly modules
> To: bioperl-l at lists.open-bio.org
> Message-ID:
>
> Content-Type: text/plain; charset=UTF-8
>
> I have read on the BioPerl site that a 454 ace is not standardized due to
> its coordinate system. How can I convert it to the standard ace file?
>
> When I run this code either by using contig or assembly objects, I get an
> error.
> Can't call method "get_consensus_sequence" on an undefined value at
> Bio/Assembly/IO/ace.pm line 280, line 93349.
>
> sub _newblerAceToAce($args){
>   my($self,$args)=@_;
>   my $ace454=Bio::Assembly::IO->new(-file=>$args{ace454Path},-format=>"ace",-variant=>'454');
>   my $ace=Bio::Assembly::IO->new(-file=>">$args{acePath}",-format=>"ace");
>   #while(my $contig=$ace454->next_contig){
>   while(my $scaffold=$ace454->next_assembly){
>     print Dumper $scaffold;
>   }
>   return $args{acePath};
> }
>

From fahmidaa120 at gmail.com Thu Dec 9 07:50:13 2010
From: fahmidaa120 at gmail.com (Fahmida)
Date: Thu, 9 Dec 2010 04:50:13 -0800 (PST)
Subject: [Bioperl-l] Help Parsing FASTA Sequence File
Message-ID: <30416193.post@talk.nabble.com>

Hi,

I've several input 'score' files and their corresponding 'data' files like:

score1.txt data1.txt
score2.txt data2.txt
.... ....

score1.txt

contig00002 length=671 numreads=17 1207 0.0
contig00003 length=637 numreads=26 1205 0.0
contig00052 length=535 numreads=10 607 e-176
contig00072 length=472 numreads=46 571 e-165
contig00019 length=667 numreads=5 474 e-136

This file has several rows and five columns. Columns 1-3 are names/descriptions; column 4 (1207, 1205, etc.) and column 5 (0.0, 0.0, e-176, etc.) contain the scores.

I want to make a list of the TOP 2 names based on the column 4 score and whose column 5 score is not '0.0'. For example, for the above data the output list would be:

contig00052 length=535 numreads=10
contig00072 length=472 numreads=46

Use the above list to extract data from the 'data1.txt':

data1.txt

>contig00001 length=567 numreads=35
GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa
CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa
>contig00002 length=671 numreads=17
GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC
TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT
>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00003 length=637 numreads=26
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCAGAGCTG
>contig00072 length=472 numreads=46
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
CTCAAGcACTAGGATC
>contig00019 length=504 numreads=5
GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG
ATAACGCCT

Example Output file:

>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00072 length=472 numreads=46
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
CTCAAGcACTAGGATC

Any reply would be greatly appreciated.

From gooch at student.uchc.edu Wed Dec 15 17:00:52 2010
From: gooch at student.uchc.edu (Michael Gooch)
Date: Wed, 15 Dec 2010 17:00:52 -0500
Subject: [Bioperl-l] problems with a few packages
Message-ID: <4D093A94.3060300@student.uchc.edu>

I noticed mention of optional packages not being installed and tried to install them.

I can't install the following packages; cpan claims there is an error while running make. OS is Mac Snow Leopard.
I have installed several linux command line tools for various dependencies using macports. I couldn't possibly list them all from memory, but when I find that I need something normally found in linux that mac doesn't have, I try to get it with macports.

install Bio::Ext::Align
install Bio::SeqIO::staden::read

cpan[28]> force install Bio::Ext::Align
Running install for module 'Bio::Ext::Align'
Running make for B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
  Has already been unwrapped into directory /Users/gooch/.cpan/build/bioperl-ext-1.4-pmSEl7
  '/opt/local/bin/perl Makefile.PL' returned status 512, won't make
Running make test
  Make had some problems, won't test
Running make install
  Make had some problems, won't install

cpan[29]> force install Bio::SeqIO::staden::read
Running install for module 'Bio::SeqIO::staden::read'
Running make for B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
  Has already been unwrapped into directory /Users/gooch/.cpan/build/bioperl-ext-1.4-pmSEl7
  '/opt/local/bin/perl Makefile.PL' returned status 512, won't make
Running make test
  Make had some problems, won't test
Running make install
  Make had some problems, won't install

From cjfields at illinois.edu Wed Dec 15 19:15:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 15 Dec 2010 18:15:11 -0600
Subject: [Bioperl-l] Re (3): Status of assembly modules
In-Reply-To:
References:
Message-ID:

Lee,

You are more than welcome to look at the code to optimize it; it might be worth looking at the way scaffolds, contigs, etc. are defined within one another. I believe Florent Angly has been actively working on these modules; Florent may have thoughts on this.

chris

From cjfields at illinois.edu Wed Dec 15 19:23:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 15 Dec 2010 18:23:30 -0600
Subject: [Bioperl-l] problems with a few packages
In-Reply-To: <4D093A94.3060300@student.uchc.edu>
References: <4D093A94.3060300@student.uchc.edu>
Message-ID:

Michael,

Don't use the CPAN versions of these modules, they're no longer supported (note to self: ask Ewan to remove the old versions of bioperl off CPAN). In fact, bioperl-ext is essentially abandonware. The bioperl-ext code in github (https://github.com/bioperl/bioperl-ext) does work, but we can't make any promises re: bug fixes, as the authors of much of that code have long ago abandoned it.
chris

From florent.angly at gmail.com Wed Dec 15 23:43:23 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 16 Dec 2010 14:43:23 +1000
Subject: [Bioperl-l] Re (3): Status of assembly modules
In-Reply-To:
References:
Message-ID: <4D0998EB.8030300@gmail.com>

Hi Lee,

I am familiar with the assembly code. Could you file bug reports for these two issues on http://bugzilla.open-bio.org/ and include files to reproduce the problems?

The issue in parsing the 'CO' line appears easy to fix. The issue about your 454 contigs may be harder. You have few contigs, but I am assuming that they contain many reads.

When you're done putting up the bug reports, give me the links to them and I'll try to see what I can do about fixing the issues.

Florent

PS/ What version of BioPerl are you using? It would be good to make sure that the issues are present in the development version.

From David.Messina at sbc.su.se Thu Dec 16 00:39:05 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 16 Dec 2010 06:39:05 +0100
Subject: [Bioperl-l] problems with a few packages
In-Reply-To:
References: <4D093A94.3060300@student.uchc.edu>
Message-ID: <2F28B3DF-F69B-45C9-A770-0EFD6F49CE0A@sbc.su.se>

Hey Chris,

On Dec 16, 2010, at 1:23, Chris Fields wrote:

> (note to self: ask Ewan to remove the old versions of bioperl off CPAN).

When I looked at this a couple of months ago, as far as I could tell Ewan had marked them as deprecated, and they're now part of BackPAN instead of the main CPAN.

What I didn't do was ask the CPAN admins why these deprecated releases still show up (and preferentially) in users' CPAN queries. So perhaps that's what we should do next. (and by we I mean me unless you or someone else is motivated at the moment)

Dave

From dan.bolser at gmail.com Thu Dec 16 04:03:11 2010
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 16 Dec 2010 09:03:11 +0000
Subject: [Bioperl-l] about hawkeye
In-Reply-To: <191bb0a.11900.12ced3da3af.Coremail.zhyyjj_811@163.com>
References: <191bb0a.11900.12ced3da3af.Coremail.zhyyjj_811@163.com>
Message-ID:

Hi Yingjie,

I'm replying to the AMOS-help and BioPerl mailing lists so that your question can reach the largest audience of domain experts.
I believe there are tools within BioPerl that can convert SAM to AMOS's AFG and bank formats, however, I haven't looked into those recently.

Alternatively, you can open a SAM or an indexed BAM file in the Tablet assembly viewer:
http://bioinf.scri.ac.uk/tablet/

Finally, you can try asking your question (or searching for the answer) on the SEQanswers forum:
http://seqanswers.com/

About the broken link to the AMOS AFG format specification, recently the AMOS manual was moved onto a wiki hosted on sourceforge. Unfortunately they didn't set up a forwarding rule to allow the old URLs to keep working. When you find a broken link to the AMOS manual, try searching for the information here:
http://sourceforge.net/apps/mediawiki/amos

I think you can find what you're looking for here:
http://sourceforge.net/apps/mediawiki/amos/index.php?title=Infrastructure#File_format_specs

All the best,
Dan.

2010/12/16 ??? :
> Dear Bolser,
>
> It is my great pleasure when you see this letter. Recently I use hawkeye
> visualization tools to view my assembly results. But I have some BAM or SAM
> format files. I need to convert BAM or SAM to AFG. Do you have a tools do
> it? Or I want to develop this conversion tool. Can you help me?
>
> I can't open this link about AFG format,
> http://amos.sourceforge.net/docs/specs/. Can you send it to me?
>
> Thank you for your help!
>
> Best regards,
>
> Yingjie Zhu
> Institute of Medicinal Plant Development in Beijing, China
> 2010-12-17

From cjfields at illinois.edu Thu Dec 16 08:33:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 16 Dec 2010 07:33:15 -0600
Subject: [Bioperl-l] problems with a few packages
In-Reply-To: <2F28B3DF-F69B-45C9-A770-0EFD6F49CE0A@sbc.su.se>
References: <4D093A94.3060300@student.uchc.edu> <2F28B3DF-F69B-45C9-A770-0EFD6F49CE0A@sbc.su.se>
Message-ID: <77D4D6CD-0D4E-4CFD-A1B2-F2C491E4C6F3@illinois.edu>

On Dec 15, 2010, at 11:39 PM, Dave Messina wrote:

> Hey Chris,
>
> On Dec 16, 2010, at 1:23, Chris Fields wrote:
>
>> (note to self: ask Ewan to remove the old versions of bioperl off CPAN).
>
> When I looked at this a couple of months ago, as far as I could tell Ewan had marked them as deprecated, and they're now part of BackPAN instead of the main CPAN.
>
> What I didn't do was ask the CPAN admins why these deprecated releases still show up (and preferentially) in users' CPAN queries. So perhaps that's what we should do next. (and by we I mean me unless you or someone else is motivated at the moment)
>
> Dave

Probably a good idea to check with the CPAN folks. If Ewan didn't delete the old versions (under his user name), then they might still be indexed.

chris

From beornk at gmail.com Fri Dec 17 11:05:47 2010
From: beornk at gmail.com (Arnau Montagud)
Date: Fri, 17 Dec 2010 08:05:47 -0800 (PST)
Subject: [Bioperl-l] distances between leaf nodes
In-Reply-To: <31AA49FD0FDD466CB349ABAE75591B26@NewLife>
References: <31AA49FD0FDD466CB349ABAE75591B26@NewLife>
Message-ID: <4ce46b69-7e21-41da-8c8a-2cc65103b047@35g2000prt.googlegroups.com>

Hello,

I am new to Bioperl and found this mailing list while looking for a solution to this specific problem. I am trying to get the distances between all the leaves of a given extended newick tree.
Thanks to your script I can have a vector (@dists) with all the distances, but I would like to know from what pair of nodes are those distances from (!). Thanks! My current script is: use Bio::TreeIO; $tree = Bio::TreeIO->new( -file=>'tree', -format=>'nhx' )->next_tree; my @nodes = $tree->get_leaf_nodes; my @dists; while (my $l = shift @nodes) { foreach my $m (@nodes) { push @dists, $tree->distance( -nodes => [$l, $m] ); } } foreach (@dists) { print "$_\n"; } On 12 mar, 16:45, "Mark A. Jensen" wrote: > along with Jason's comment then you'll need to > loop through the node pairs by hand: > > my @leaves = $tree->get_leaf_nodes; > my @dists; > while (my $l = shift @leaves) { > ? foreach my $m (@leaves) { > ? ? push @dists, $tree->distance( -nodes=> [$l, $m] ); > ? } > > } > > should give you all n(n-1)/2 pairwisedistances. > > > > > > > > > > ----- Original Message ----- > From: "Jeffrey Detras" > To: > Sent: Friday, March 05, 2010 1:17 AM > Subject: [Bioperl-l]distancesbetweenleafnodes > > > Hi, > > > I am new at using the Bio::TreeIO module specifically using the newick > > format for a phylogenetic analysis. The sample_tree attached is > > Newick-formatted tree. My objective is to get all thedistancesbetweenall > > theleafnodes. I copied examples of the code from > >http://www.bioperl.org/wiki/HOWTO:Treesbut it does not tell me much (to my > > knowledge) so that I understand how to assign the right array value for the > >nodes/leaves. The message would say must provide 2 rootnodes. > > > Here is what I have right now: > > > #!/usr/bin/perl -w > > use strict; > > > my $treefile = 'sample_tree'; > > use Bio::TreeIO; > > my $treeio = Bio::TreeIO->new(-format => 'newick', > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -file => $treefile); > > > while (my $tree = $treeio->next_tree) { > > ? ? ? ? my @leaves = $tree->get_leaf_nodes; > > ? ? ? ? for (my $dist = $tree->distance(-nodes=> \@leaves)){ > > ? ? ? ? ? ? ? ? print "Distancebetweentrees is $dist\n"; > > ? ? ? ? 
} > > } > > > Thanks, > > Jeff > > --------------------------------------------------------------------------- ----- > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Fri Dec 17 11:35:57 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 17 Dec 2010 16:35:57 +0000 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: <4ce46b69-7e21-41da-8c8a-2cc65103b047@35g2000prt.googlegroups.com> References: <31AA49FD0FDD466CB349ABAE75591B26@NewLife> <4ce46b69-7e21-41da-8c8a-2cc65103b047@35g2000prt.googlegroups.com> Message-ID: <4D0B916D.2000203@gmail.com> Hi Arnau, Looks pretty simple, don't you just need to print out the ids of the leaf nodes? So your loop would be something like: while (my $l = shift @nodes) { foreach my $m (@nodes) { print join("\t", $l->id, $m->id, $tree->distance( -nodes => [$l, $m] )), "\n"; } } Cheers, Roy. On 17/12/2010 16:05, Arnau Montagud wrote: > Hello, I am new to Bioperl and looking for a solution to this specific > problem, I found this mailing list. > I am trying to know distances between all the leaves of a given > extended newick tree. Thanks to your script I can have a vector > (@dists) with all the distances, but I would like to know from what > pair of nodes are those distances from (!). > Thanks! > > My current script is: > > use Bio::TreeIO; > > $tree = Bio::TreeIO->new( > -file=>'tree', > -format=>'nhx' > )->next_tree; > > my @nodes = $tree->get_leaf_nodes; > my @dists; > while (my $l = shift @nodes) { > foreach my $m (@nodes) { > push @dists, $tree->distance( -nodes => [$l, $m] ); > } > } > > foreach (@dists) { > print "$_\n"; > } > > > On 12 mar, 16:45, "Mark A. 
Jensen" wrote: >> along with Jason's comment then you'll need to >> loop through the node pairs by hand: >> >> my @leaves = $tree->get_leaf_nodes; >> my @dists; >> while (my $l = shift @leaves) { >> foreach my $m (@leaves) { >> push @dists, $tree->distance( -nodes=> [$l, $m] ); >> } >> >> } >> >> should give you all n(n-1)/2 pairwisedistances. >> >> >> >> >> >> >> >> >> >> ----- Original Message ----- >> From: "Jeffrey Detras" >> To: >> Sent: Friday, March 05, 2010 1:17 AM >> Subject: [Bioperl-l]distancesbetweenleafnodes >> >>> Hi, >> >>> I am new at using the Bio::TreeIO module specifically using the newick >>> format for a phylogenetic analysis. The sample_tree attached is >>> Newick-formatted tree. My objective is to get all thedistancesbetweenall >>> theleafnodes. I copied examples of the code from >>> http://www.bioperl.org/wiki/HOWTO:Treesbut it does not tell me much (to my >>> knowledge) so that I understand how to assign the right array value for the >>> nodes/leaves. The message would say must provide 2 rootnodes. 
>> >>> Here is what I have right now: >> >>> #!/usr/bin/perl -w >>> use strict; >> >>> my $treefile = 'sample_tree'; >>> use Bio::TreeIO; >>> my $treeio = Bio::TreeIO->new(-format => 'newick', >>> -file => $treefile); >> >>> while (my $tree = $treeio->next_tree) { >>> my @leaves = $tree->get_leaf_nodes; >>> for (my $dist = $tree->distance(-nodes=> \@leaves)){ >>> print "Distancebetweentrees is $dist\n"; >>> } >>> } >> >>> Thanks, >>> Jeff >> >> --------------------------------------------------------------------------- ----- >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From beornk at gmail.com Fri Dec 17 11:41:19 2010 From: beornk at gmail.com (Arnau Montagud) Date: Fri, 17 Dec 2010 17:41:19 +0100 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: <4D0B916D.2000203@gmail.com> References: <31AA49FD0FDD466CB349ABAE75591B26@NewLife> <4ce46b69-7e21-41da-8c8a-2cc65103b047@35g2000prt.googlegroups.com> <4D0B916D.2000203@gmail.com> Message-ID: Thank you so much Roy! It works perfectly! Arnau 2010/12/17 Roy Chaudhuri > Hi Arnau, > > Looks pretty simple, don't you just need to print out the ids of the leaf > nodes? So your loop would be something like: > > while (my $l = shift @nodes) { > foreach my $m (@nodes) { > print join("\t", $l->id, $m->id, $tree->distance( -nodes => [$l, $m] > )), "\n"; > } > } > > Cheers, > Roy. > > On 17/12/2010 16:05, Arnau Montagud wrote: > >> Hello, I am new to Bioperl and looking for a solution to this specific >> problem, I found this mailing list. 
>> I am trying to know distances between all the leaves of a given >> extended newick tree. Thanks to your script I can have a vector >> (@dists) with all the distances, but I would like to know from what >> pair of nodes are those distances from (!). >> Thanks! >> >> My current script is: >> >> use Bio::TreeIO; >> >> $tree = Bio::TreeIO->new( >> -file=>'tree', >> -format=>'nhx' >> )->next_tree; >> >> my @nodes = $tree->get_leaf_nodes; >> my @dists; >> while (my $l = shift @nodes) { >> foreach my $m (@nodes) { >> push @dists, $tree->distance( -nodes => [$l, $m] ); >> } >> } >> >> foreach (@dists) { >> print "$_\n"; >> } >> >> >> On 12 mar, 16:45, "Mark A. Jensen" wrote: >> >>> along with Jason's comment then you'll need to >>> loop through the node pairs by hand: >>> >>> my @leaves = $tree->get_leaf_nodes; >>> my @dists; >>> while (my $l = shift @leaves) { >>> foreach my $m (@leaves) { >>> push @dists, $tree->distance( -nodes=> [$l, $m] ); >>> } >>> >>> } >>> >>> should give you all n(n-1)/2 pairwisedistances. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ----- Original Message ----- >>> From: "Jeffrey Detras" >>> To: >>> Sent: Friday, March 05, 2010 1:17 AM >>> Subject: [Bioperl-l]distancesbetweenleafnodes >>> >>> Hi, >>>> >>> >>> I am new at using the Bio::TreeIO module specifically using the newick >>>> format for a phylogenetic analysis. The sample_tree attached is >>>> Newick-formatted tree. My objective is to get all thedistancesbetweenall >>>> theleafnodes. I copied examples of the code from >>>> http://www.bioperl.org/wiki/HOWTO:Treesbut it does not tell me much (to >>>> my >>>> knowledge) so that I understand how to assign the right array value for >>>> the >>>> nodes/leaves. The message would say must provide 2 rootnodes. 
>>>> >>> >>> Here is what I have right now: >>>> >>> >>> #!/usr/bin/perl -w >>>> use strict; >>>> >>> >>> my $treefile = 'sample_tree'; >>>> use Bio::TreeIO; >>>> my $treeio = Bio::TreeIO->new(-format => 'newick', >>>> -file => $treefile); >>>> >>> >>> while (my $tree = $treeio->next_tree) { >>>> my @leaves = $tree->get_leaf_nodes; >>>> for (my $dist = $tree->distance(-nodes=> \@leaves)){ >>>> print "Distancebetweentrees is $dist\n"; >>>> } >>>> } >>>> >>> >>> Thanks, >>>> Jeff >>>> >>> >>> --------------------------------------------------------------------------- >>> ----- >>> >>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.orghttp:// >>> lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From gowthaman.ramasamy at seattlebiomed.org Fri Dec 17 19:18:40 2010 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Fri, 17 Dec 2010 16:18:40 -0800 Subject: [Bioperl-l] Parsing individual exons from EMBL file Message-ID: Hi All, I am trying to find a method to parse the individual exons/cds featutres from a multi exonic gene feature. When I try the following methods, it gives me only the outer most boundaries. (55387 and 56300 in the below example). For example...my EMBL contains... FT CDS complement(join(55387..56181,56187..56300)) FT /ID="apidb|cds_LmjF01.0200-1" FT /description="." 
FT                   /size="903"
FT                   /Parent="apidb|rna_LmjF01.0200-1"
FT                   /feature_order="115"
FT                   /product="hypothetical+protein%2C+conserved"
FT                   /Name="cds"

use Bio::SeqIO;
while (my $seqobj = $file_io->next_seq()) {
    my @features = $seqobj->all_SeqFeatures();
    foreach my $feat (@features) {
        $feat->start;
        $feat->end;
    }
}

$feat->start gives me 55387 and $feat->end gives me 56300.

Ideally I would like to get the start and end of the sub features (exon 1,
55387..56181, and exon 2, 56187..56300). When I tried to use
"sub_SeqFeature()" it did not return anything.

Any idea? I am also not sure whether my EMBL file is correctly formatted.
Any suggestions...

Any suggestion for converting EMBL to GFF3 would be appreciated. I have a
script which does that, but it fuses all the joins together and gives me
only one GFF line. Basically, I could not separate the exons.

Thanks,
Gowtham

From jason at bioperl.org  Fri Dec 17 19:32:31 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 17 Dec 2010 16:32:31 -0800
Subject: [Bioperl-l] Parsing individual exons from EMBL file
In-Reply-To: 
References: 
Message-ID: <4D0C011F.5000602@bioperl.org>

You need to operate on the sub-locations. Basically:

for my $loc ( $feature->location->each_Location ) {
    print $loc->start, "..", $loc->end, "\n";
}

But for converting to GFF3 you will want to look at the Unflattener, which
basically does this for you, and the bp_unflatten_seq.pl script, which
implements it.

As you may know by now, EMBL/GenBank records are not all consistent in how
things are annotated (how ID, product, description are used), so mapping
this to properly formatted GFF3 for GBrowse etc. can sometimes be a tedious
process.

FYI -- APIDB also provides GFF3 if you would rather...
http://tritrypdb.org/common/downloads/release-2.5/Lmajor/gff/

-jason

Gowthaman Ramasamy wrote:
> Hi All,
> I am trying to find a method to parse the individual exons/cds features from a multi exonic gene feature.
When I try the following methods, it gives me only the outer most boundaries. (55387 and 56300 in the below example). > > For example...my EMBL contains... > FT CDS complement(join(55387..56181,56187..56300)) > FT /ID="apidb|cds_LmjF01.0200-1" > FT /description="." > FT /size="903" > FT /Parent="apidb|rna_LmjF01.0200-1" > FT /feature_order="115" > FT /product="hypothetical+protein%2C+conserved" > FT /Name="cds" > > Use Bio::SeqIO; > While(my $seqobj = $file_io->next_seq()){ > My @features = $seqobj->all_SeqFeatures(); > Foreach $feat (@features){ > $feat->start; > $feat->end; > } > } > > When I use $feat->start; it gives me 55387 and $feat>end; it gives me 56300. > > Ideally I would like to get the start and end of sub features (exon 1 55387..56181) and (exon256187..56300). When when I tried to use the "sub_SeqFeature()" it does not return anything. > > Any idea? Also not sure, if I have the rightly formated EMBL file. Any suggestions... > > Any suggestion of converting EMBL to GFF3 will be appreciated. I have a script which does that. But just fuses all the joins together to give me only one GFF line. Basically, I could not separate the exons. 
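[Editor's sketch] The sub-location approach suggested in this thread can be tried without an EMBL file by building the quoted CDS location in memory. This is only a minimal sketch under assumptions: it needs BioPerl installed, the helper name exon_coords is made up for illustration, and the coordinates are copied from the FT CDS line quoted above.

```perl
use strict;
use warnings;
use Bio::Location::Simple;
use Bio::Location::Split;
use Bio::SeqFeature::Generic;

# Hypothetical helper: return [start, end, strand] for every sub-location
# of a feature. For a simple (non-join) location, each_Location returns
# the location itself, so the helper works for single-exon features too.
sub exon_coords {
    my ($feat) = @_;
    return map { [ $_->start, $_->end, $_->strand ] }
           $feat->location->each_Location;
}

# Build complement(join(55387..56181,56187..56300)) by hand.
my $split = Bio::Location::Split->new();
$split->add_sub_Location(
    Bio::Location::Simple->new(-start => 55387, -end => 56181, -strand => -1));
$split->add_sub_Location(
    Bio::Location::Simple->new(-start => 56187, -end => 56300, -strand => -1));

my $cds = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS');
$cds->location($split);

# One tab-separated line per exon: start, end, strand.
for my $exon (exon_coords($cds)) {
    print join("\t", @$exon), "\n";
}
```

With a feature read through Bio::SeqIO the same helper should apply unchanged, since start and end on the feature itself only report the outer bounds of the join.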
>
> Thanks,
> Gowtham

-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki

From asjo at koldfront.dk  Fri Dec 17 19:29:21 2010
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Sat, 18 Dec 2010 01:29:21 +0100
Subject: [Bioperl-l] Parsing individual exons from EMBL file
In-Reply-To: (Gowthaman Ramasamy's message of "Fri, 17 Dec 2010 16:18:40 -0800")
References: 
Message-ID: <87oc8jdiu6.fsf@topper.koldfront.dk>

On Fri, 17 Dec 2010 16:18:40 -0800, Gowthaman wrote:

> Foreach $feat (@features){
> $feat->start;
> $feat->end;

You probably want something like:

  foreach my $sub_location ($feat->location->each_Location) {
      say $sub_location->start;
      say $sub_location->end;
  }

in there instead of ->start and ->end on the feature object.

  Best regards,

    Adam

-- 
"Accept the mystery!"
Adam Sjøgren
asjo at koldfront.dk

From florent.angly at gmail.com  Sat Dec 18 00:26:50 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 18 Dec 2010 15:26:50 +1000
Subject: [Bioperl-l] Help Parsing FASTA Sequence File
In-Reply-To: <30416193.post@talk.nabble.com>
References: <30416193.post@talk.nabble.com>
Message-ID: <4D0C461A.9080905@gmail.com>

Hi,

You should probably start here: http://www.bioperl.org/wiki/HOWTO:SeqIO

Florent

On 09/12/10 22:50, Fahmida wrote:
> Hi,
>
> I've several input 'score' files and their corresponding 'data' files like:
> score1.txt data1.txt
> score2.txt data2.txt
> ....
> ....
>
> score1.txt
>
> contig00002 length=671 numreads=17 1207 0.0
> contig00003 length=637 numreads=26 1205 0.0
> contig00052 length=535 numreads=10 607 e-176
> contig00072 length=472 numreads=46 571 e-165
> contig00019 length=667 numreads=5 474 e-136
>
> This file has several rows and five columns. Columns 1-3 are
> names/descriptions and column 4 (1207, 1205, etc) and column 5 (0.0,0.0,
> e-176, etc).
> contain the scores. I want to make a list of TOP 2 names based
> on column 4 score and whose column 5 score is not '0.0'. For example, for
> the above data the output list would be:
>
> contig00052 length=535 numreads=10
> contig00072 length=472 numreads=46
>
> Use the above list to extract data from the 'data1.txt':
>
> data1.txt
>
> >contig00001 length=567 numreads=35
> GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa
> CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa
> >contig00002 length=671 numreads=17
> GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC
> TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT
> >contig00052 length=535 numreads=10
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
> CCCAGGTGCCGTTAGCCA
> >contig00003 length=637 numreads=26
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
> CCCAGGTGCCGTTAGCCAGAGCTG
> >contig00072 length=472 numreads=46
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
> GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
> CTCAAGcACTAGGATC
> >contig00019 length=504 numreads=5
> GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG
> ATAACGCCT
>
>
> Example Output file:
>
> >contig00052 length=535 numreads=10
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
> CCCAGGTGCCGTTAGCCA
> >contig00072 length=472 numreads=46
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
> GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
> CTCAAGcACTAGGATC
>
> Any reply would be greatly appreciated.
>

From florent.angly at gmail.com  Sun Dec 19 23:08:05 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 20 Dec 2010 14:08:05 +1000
Subject: [Bioperl-l] Re (3): Status of assembly modules
In-Reply-To: 
References: 
Message-ID: <4D0ED6A5.1050406@gmail.com>

Hi Lee,

I was able to fix the bug you reported regarding the contig IDs in the
development version of BioPerl.
For the other bug, please file a bug report at http://bugzilla.open-bio.org/ and give me the URL. Provide the file that you used so that we can reproduce the bug. Also, tell me how much memory you have on your machine, as I suspect that you may be running out of memory because of the size of your dataset and the way the Bioperl assembly modules deal with contigs and scaffolds. Thank you, Florent On 16/12/10 10:15, Chris Fields wrote: > Lee, > > You are more than welcome to look at the code to optimize it; might be worth looking athe they way scaffolds, contigs, etc are defined within one aonther. I believe Florent Angly has been actively working on these modules; Florent may have thoughts on this. > > chris > > On Dec 10, 2010, at 5:53 PM, Lee Katz wrote: > >> I am wondering if there is a way to optimize the BioPerl code for Assembly >> IO. Specifically, when I convert a 2.2 MB genome (~200 contigs) from a 454 >> ace file to a regular ace file, it takes a few hours to get through 30 >> contigs using the code below (I estimate more than a day to get through all >> of it). >> >> Is there a way to optimize it? To convert a sequence file to another format >> at most would take a minute and therefore converting an ace on the magnitude >> of hours or days is too much. I wish I understood bioperl better but I >> think the best I can do is issue a challenge or a feature request: who can >> speed up Assembly::IO::ace? >> >> # convert a Newbler ace to a standard ace >> sub _newblerAceToAce($args){ >> my($self,$args)=@_; >> my >> $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454'); >> my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace"); >> #output ace >> my $numContigs=`grep -c ^CO $$args{ace454Path}`+0; >> logmsg "Converting $$args{ace454Path} (454-ace) to $$args{acePath} (ace). >> $numContigs contigs total."; >> while(my $contig=$ace454->next_contig){ >> logmsg "Finished with ".$contig->id ." 
out of $numContigs"; >> $ace->write_contig($contig); >> } >> return $$args{acePath}; >> } >> >> >> Message: 3 >> >> Date: Mon, 22 Nov 2010 15:18:10 -0500 >> >> From: Lee Katz >> >> Subject: [Bioperl-l] Re(2): Status of assembly modules >> >> To: bioperl-l at lists.open-bio.org >> >> Message-ID: >> >> >> >> Content-Type: text/plain; charset=UTF-8 >> >> >> I figured it out (I haven't tested much though). >> >> >> To whoever works on Assembly::IO::ace.pm: >> >> I changed a regular expression on line 231 because the contig object was not >> >> initializing properly. For some reason the 454 ace file had adopted the >> >> reference assembly's ID and therefore there was a GI number followed by a >> >> pipe. The pipe was not captured with \w+. I think that the regex will be >> >> safe with \s(\S+)\s. >> >> >> if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts! >> >> #if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts! >> >> >> On Thu, Nov 18, 2010 at 12:04 PM,>> wrote: >> >>> Message: 3 >>> Date: Wed, 17 Nov 2010 22:20:03 -0500 >>> From: Lee Katz >>> Subject: Re: [Bioperl-l] Status of assembly modules >>> To: bioperl-l at lists.open-bio.org >>> Message-ID: >>> >>> Content-Type: text/plain; charset=UTF-8 >>> I have read on the BioPerl site that a 454 ace is not standardized due to >>> its coordinate system. How can I convert it to the standard ace file? >>> When I run this code either by using contig or assembly objects, I get an >>> error. >>> Can't call method "get_consensus_sequence" on an undefined value at >>> Bio/Assembly/IO/ace.pm line 280, line 93349. 
>>> sub _newblerAceToAce($args){ >>> my($self,$args)=@_; >>> my >>> $ace454=Bio::Assembly::IO->new(-file=>$args{ace454Path},-format=>"ace >> ",-variant=>'454'); >> >>> my >>> $ace=Bio::Assembly::IO->new(-file=>">$args{acePath}",-format=>"ace"); >>> #while(my $contig=$ace454->next_contig){ >>> while(my $scaffold=$ace454->next_assembly){ >>> print Dumper $scaffold; >>> } >>> return $args{acePath}; >>> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bunk at novozymes.com Tue Dec 21 07:09:32 2010 From: bunk at novozymes.com (Jacob Bunk Nielsen) Date: Tue, 21 Dec 2010 13:09:32 +0100 Subject: [Bioperl-l] OX lines of EMBL files [patch included] Message-ID: <774oa7fhtv.fsf@novozymes.com> Hi I have noticed that BioPerl does not write OX lines with NCBI TaxIDs in. I'd like this feature even though the official way seems to have a a source feature with a db_xref to a 'taxon:(\d+)' for this sort of thing. I've found and example of an OX line being used for a TaxID at ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt Please find attached a patch for reading and writing NCBI TaxIDs to EMBL files and a test that verifies that the TaxID will roundtrip being written and then read. Best regards Jacob -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Read-NCBI-TaxID-from-OX-lines-of-EMBL-formatted-DNA-.patch Type: text/x-diff Size: 5580 bytes Desc: not available URL: From jskittrell at unmc.edu Tue Dec 21 16:40:13 2010 From: jskittrell at unmc.edu (Jeff S Kittrell) Date: Tue, 21 Dec 2010 15:40:13 -0600 Subject: [Bioperl-l] sim4 parsing module Message-ID: An HTML attachment was scrubbed... 
URL: 

From cjfields at illinois.edu  Tue Dec 21 22:42:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 21 Dec 2010 21:42:44 -0600
Subject: [Bioperl-l] sim4 parsing module
In-Reply-To: 
References: 
Message-ID: 

On Dec 21, 2010, at 3:40 PM, Jeff S Kittrell wrote:

> The sim4 module does not properly parse sim4cc output. I was told by the sim4cc developers its output is the same as sim4. I don't know if someone on this could help me with fixing this issue? I can provide examples/output files where it does and does not work. I don't have the requisite Perl knowledge to fix it myself. Thanks for any help
>
> Jeff Kittrell

Jeff, yes, we would need to see example code. It might be a good idea to
submit this as a bug, with the examples and a script.

http://bugzilla.open-bio.org/

chris

From cjfields at illinois.edu  Wed Dec 22 10:38:32 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 22 Dec 2010 09:38:32 -0600
Subject: [Bioperl-l] Help Parsing FASTA Sequence File
In-Reply-To: <30416193.post@talk.nabble.com>
References: <30416193.post@talk.nabble.com>
Message-ID: <3888594D-7D55-427C-8FDF-FF1C16965991@illinois.edu>

You might want to look at Bio::DB::Fasta or Bio::Index::Fasta, or
Bio::DB::Flat (all of which index FASTA), and use SQLite or similar to
create a database for the score lookups.

chris

On Dec 9, 2010, at 6:50 AM, Fahmida wrote:

>
> Hi,
>
> I've several input 'score' files and their corresponding 'data' files like:
> score1.txt data1.txt
> score2.txt data2.txt
> ....
> ....
>
> score1.txt
>
> contig00002 length=671 numreads=17 1207 0.0
> contig00003 length=637 numreads=26 1205 0.0
> contig00052 length=535 numreads=10 607 e-176
> contig00072 length=472 numreads=46 571 e-165
> contig00019 length=667 numreads=5 474 e-136
>
> This file has several rows and five columns. Columns 1-3 are
> names/descriptions and column 4 (1207, 1205, etc) and column 5 (0.0,0.0,
> e-176, etc). contain the scores.
I want to make a list of TOP 2 names based > on column 4 score and whose column 5 score is not '0.0'. For example. for > the above data the output list would be: > > contig00052 length=535 numreads=10 > contig00072 length=472 numreads=46 > > Use the above list to extract data from the 'data1.txt': > > data1.txt > >> contig00001 length=567 numreads=35 > GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa > CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa >> contig00002 length=671 numreads=17 > GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC > TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT >> contig00052 length=535 numreads=10 > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA > CCCAGGTGCCGTTAGCCA >> contig00003 length=637 numreads=26 > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA > CCCAGGTGCCGTTAGCCAGAGCTG >> contig00072 length=472 numreads=46 > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA > GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT > CTCAAGcACTAGGATC >> contig00019 length=504 numreads=5 > GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG > ATAACGCCT > > > Example Output file: > >> contig00052 length=535 numreads=10 > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA > CCCAGGTGCCGTTAGCCA >> contig00072 length=472 numreads=46 > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA > GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT > CTCAAGcACTAGGATC > > Any reply would be greatly appreciated. > > -- > View this message in context: http://old.nabble.com/Help-Parsing-FASTA-Sequence-File-tp30416193p30416193.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
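[Editor's sketch] For files as small as the ones quoted above, the selection can also be done in plain Perl without an index or a database. The sketch below is one possible approach, not the method recommended in the thread. It assumes whitespace-separated score rows with the score in column 4 and the second score in column 5, and FASTA headers whose first word is the contig name; the sub names top_hits and extract_fasta are made up for illustration.

```perl
use strict;
use warnings;

# Return the names of the $n highest-scoring rows (column 4) whose
# column 5 is not exactly '0.0'.
sub top_hits {
    my ($score_fh, $n) = @_;
    my @rows;
    while (my $line = <$score_fh>) {
        my @col = split ' ', $line;
        next unless @col >= 5;         # skip blank or short lines
        next if $col[4] eq '0.0';      # drop rows whose column 5 is 0.0
        push @rows, { id => $col[0], score => $col[3] };
    }
    my @sorted = sort { $b->{score} <=> $a->{score} } @rows;
    $#sorted = $n - 1 if @sorted > $n; # keep only the top $n
    return map { $_->{id} } @sorted;
}

# Return the FASTA records (header plus sequence lines) whose name is in
# @$ids, in the order in which they appear in the FASTA file.
sub extract_fasta {
    my ($fasta_fh, $ids) = @_;
    my %want = map { $_ => 1 } @$ids;
    my ($keep, $out) = (0, '');
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>\s*(\S+)/) {   # header line: first word is the name
            $keep = $want{$1} ? 1 : 0;
        }
        $out .= $line if $keep;
    }
    return $out;
}

# Run as: perl select_top.pl score1.txt data1.txt
if (@ARGV == 2) {
    open my $score_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $fasta_fh, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    my @ids = top_hits($score_fh, 2);
    print extract_fasta($fasta_fh, \@ids);
}
```

For data files too large to scan once per score file, the indexed modules named above (Bio::DB::Fasta and friends) are probably the better fit; the script name select_top.pl is only a placeholder.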
From jordi.durban at gmail.com  Wed Dec 22 10:56:59 2010
From: jordi.durban at gmail.com (Jordi Durban)
Date: Wed, 22 Dec 2010 16:56:59 +0100
Subject: [Bioperl-l] Help Parsing FASTA Sequence File
In-Reply-To: <3888594D-7D55-427C-8FDF-FF1C16965991@illinois.edu>
References: <30416193.post@talk.nabble.com> <3888594D-7D55-427C-8FDF-FF1C16965991@illinois.edu>
Message-ID: 

At first sight I'd try using awk to get the column 1 entries whose column 5
isn't 0.0. Something like:

  awk '$5 != "0.0" { print $1 }' score1.txt

And once you have these identifiers you could try to get the $seq->seq()
for each matching $seq->id().
Hope this helps.

2010/12/22 Chris Fields 

> You might want to look at Bio::DB::Fasta or Bio::Index::Fasta, or
> Bio::DB::Flat (all of which index FASTA), and use SQLite or similar to
> create a database for the score lookups.
>
> chris
>
> On Dec 9, 2010, at 6:50 AM, Fahmida wrote:
>
> >
> > Hi,
> >
> > I've several input 'score' files and their corresponding 'data' files like:
> > score1.txt data1.txt
> > score2.txt data2.txt
> > ....
> > ....
> >
> > score1.txt
> >
> > contig00002 length=671 numreads=17 1207 0.0
> > contig00003 length=637 numreads=26 1205 0.0
> > contig00052 length=535 numreads=10 607 e-176
> > contig00072 length=472 numreads=46 571 e-165
> > contig00019 length=667 numreads=5 474 e-136
> >
> > This file has several rows and five columns. Columns 1-3 are
> > names/descriptions and column 4 (1207, 1205, etc) and column 5 (0.0,0.0,
> > e-176, etc). contain the scores. I want to make a list of TOP 2 names based
> > on column 4 score and whose column 5 score is not '0.0'. For example.
for > > the above data the output list would be: > > > > contig00052 length=535 numreads=10 > > contig00072 length=472 numreads=46 > > > > Use the above list to extract data from the 'data1.txt': > > > > data1.txt > > > >> contig00001 length=567 numreads=35 > > GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa > > CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa > >> contig00002 length=671 numreads=17 > > GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC > > TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT > >> contig00052 length=535 numreads=10 > > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA > > CCCAGGTGCCGTTAGCCA > >> contig00003 length=637 numreads=26 > > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA > > CCCAGGTGCCGTTAGCCAGAGCTG > >> contig00072 length=472 numreads=46 > > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA > > GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT > > CTCAAGcACTAGGATC > >> contig00019 length=504 numreads=5 > > GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG > > ATAACGCCT > > > > > > Example Output file: > > > >> contig00052 length=535 numreads=10 > > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA > > CCCAGGTGCCGTTAGCCA > >> contig00072 length=472 numreads=46 > > GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA > > GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT > > CTCAAGcACTAGGATC > > > > Any reply would be greatly appreciated. > > > > -- > > View this message in context: > http://old.nabble.com/Help-Parsing-FASTA-Sequence-File-tp30416193p30416193.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
-- 
Jordi

From deeepersound at googlemail.com Wed Dec 22 19:00:25 2010
From: deeepersound at googlemail.com (Maxim)
Date: Thu, 23 Dec 2010 01:00:25 +0100
Subject: [Bioperl-l] indexing conservation scores
Message-ID:

Hi,

Bio::DB::Fasta is a beautiful tool for fast access to sequences present in large flat text (FASTA) files and I really love it. Now I'd like to speed up the retrieval of data from large files that store conservation scores. The files that I was able to find at UCSC are in fixed-step wiggle format, like:

fixedStep chrom=chrYHet start=1 step=1
0.117
0.092
0.092
0.085
0.071
0.051
0.021
0.010
0.008
0.010
0.019
0.023
0.023
0.019
........

Does anyone see a way to use the indexing mechanism of Bio::DB::Fasta to allow retrieval of float numbers? I could reformat the wiggle file to a simple space-, tab- or comma-separated list of scores per chromosome.

Are there suggestions? Or is there indeed a module that takes care of my problem and I have just overlooked it? Or would such an approach not be considerably faster than normal Unix commands like:

    sed -n '2,5001p' chrYHet.pp

to retrieve the scores?

Maxim

From sdavis2 at mail.nih.gov Wed Dec 22 19:30:41 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 22 Dec 2010 19:30:41 -0500
Subject: [Bioperl-l] indexing conservation scores
In-Reply-To:
References:
Message-ID:

On Wed, Dec 22, 2010 at 7:00 PM, Maxim wrote:

> Hi,
>
> bio::db:fasta is a beautiful tool for fast access to sequences present in
> large flat text (fasta) files and I really love it. Now I'd like to speed up
> the retrieval of data from large files that store conservation scores.
The > files that I was able to find at UCSC have fixed step wiggle format, like > > Hi, Maxim. Have you looked at this page? http://genomewiki.ucsc.edu/index.php/Using_hgWiggle_without_a_database Sean > fixedStep chrom=chrYHet start=1 step=1 > 0.117 > 0.092 > 0.092 > 0.085 > 0.071 > 0.051 > 0.021 > 0.010 > 0.008 > 0.010 > 0.019 > 0.023 > 0.023 > 0.019 > ........ > > Does someone see a chance how to use the indexing mechanism used by > bio::db::fasta in order to allow retrieval of float numbers. I could > reformat the wiggle file to a simple space,tab or comma separated list of > scores per chromosome. > > Are there suggestions? Or is there indeed a module that takes care about my > problem and I have just overlooked it? > Or won't such an approach get considerably faster than normal unix > commands > like: > sed -n '2,5001p' chrYHet.pp > to retrieve the scores? > > > Maxim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Dec 22 19:39:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Dec 2010 18:39:12 -0600 Subject: [Bioperl-l] indexing conservation scores In-Reply-To: References: Message-ID: <9AE07256-A04B-480D-8DB9-649D4C859CBF@illinois.edu> Maybe use a tied hash using BerkeleyDB or AnyDBM_File, or DBD::SQLite? Also, maybe convert to BigWig and use Lincoln's Bio::DB::BigFile tools (note the installation process is a little tricky for this): http://search.cpan.org/~lds/Bio-BigFile-1.04/lib/Bio/DB/BigWig.pm Also, +1 to Sean's suggestion (don't rely completely on bioperl to implement everything :) chris On Dec 22, 2010, at 6:00 PM, Maxim wrote: > Hi, > > bio::db:fasta is a beautiful tool for fast access to sequences present in > large flat text (fasta) files and I really love it. Now I'd like to speed up > the retrieval of data from large files that store conservation scores. 
The > files that I was able to find at UCSC have fixed step wiggle format, like > > fixedStep chrom=chrYHet start=1 step=1 > 0.117 > 0.092 > 0.092 > 0.085 > 0.071 > 0.051 > 0.021 > 0.010 > 0.008 > 0.010 > 0.019 > 0.023 > 0.023 > 0.019 > ........ > > Does someone see a chance how to use the indexing mechanism used by > bio::db::fasta in order to allow retrieval of float numbers. I could > reformat the wiggle file to a simple space,tab or comma separated list of > scores per chromosome. > > Are there suggestions? Or is there indeed a module that takes care about my > problem and I have just overlooked it? > Or won't such an approach get considerably faster than normal unix commands > like: > sed -n '2,5001p' chrYHet.pp > to retrieve the scores? > > > Maxim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 22 22:05:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Dec 2010 21:05:24 -0600 Subject: [Bioperl-l] OX lines of EMBL files [patch included] In-Reply-To: <774oa7fhtv.fsf@novozymes.com> References: <774oa7fhtv.fsf@novozymes.com> Message-ID: <8F82EA32-728D-41B9-A890-36B3A8031232@illinois.edu> On Dec 21, 2010, at 6:09 AM, Jacob Bunk Nielsen wrote: > Hi > > I have noticed that BioPerl does not write OX lines with NCBI TaxIDs in. > I'd like this feature even though the official way seems to have a a > source feature with a db_xref to a 'taxon:(\d+)' for this sort of thing. > > I've found and example of an OX line being used for a TaxID at > ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt > > Please find attached a patch for reading and writing NCBI TaxIDs to EMBL > files and a test that verifies that the TaxID will roundtrip being > written and then read. 
>
> Best regards
>
> Jacob

Jacob,

Patch was committed:

https://github.com/bioperl/bioperl-live/commit/8b00b61d8641c636ad33e420197b9b7fc452ef42

Thanks for letting us know about this!

chris

From jskittrell at unmc.edu Thu Dec 23 16:00:04 2010
From: jskittrell at unmc.edu (Jeff S Kittrell)
Date: Thu, 23 Dec 2010 15:00:04 -0600
Subject: [Bioperl-l] Emboss factory help
Message-ID:

An HTML attachment was scrubbed...
URL:

From lskatz at gmail.com Thu Dec 23 22:06:24 2010
From: lskatz at gmail.com (Lee Katz)
Date: Thu, 23 Dec 2010 22:06:24 -0500
Subject: [Bioperl-l] Re (3): Status of assembly modules
In-Reply-To: <4D0ED6A5.1050406@gmail.com>
References: <4D0ED6A5.1050406@gmail.com>
Message-ID:

Done

On Sun, Dec 19, 2010 at 11:08 PM, Florent Angly wrote:

> Hi Lee,
>
> I was able to fix the bug you reported regarding the contig IDs in the
> development version of BioPerl.
>
> For the other bug, please file a bug report at
> http://bugzilla.open-bio.org/ and give me the URL. Provide the file that
> you used so that we can reproduce the bug. Also, tell me how much memory you
> have on your machine, as I suspect that you may be running out of memory
> because of the size of your dataset and the way the BioPerl assembly modules
> deal with contigs and scaffolds.
>
> Thank you,
>
> Florent
>
> On 16/12/10 10:15, Chris Fields wrote:
>
>> Lee,
>>
>> You are more than welcome to look at the code to optimize it; it might be
>> worth looking at the way scaffolds, contigs, etc. are defined within one
>> another. I believe Florent Angly has been actively working on these
>> modules; Florent may have thoughts on this.
>>
>> chris
>>
>> On Dec 10, 2010, at 5:53 PM, Lee Katz wrote:
>>
>>> I am wondering if there is a way to optimize the BioPerl code for Assembly
>>> IO.
Specifically, when I convert a 2.2 MB genome (~200 contigs) from a >>> 454 >>> ace file to a regular ace file, it takes a few hours to get through 30 >>> contigs using the code below (I estimate more than a day to get through >>> all >>> of it). >>> >>> Is there a way to optimize it? To convert a sequence file to another >>> format >>> at most would take a minute and therefore converting an ace on the >>> magnitude >>> of hours or days is too much. I wish I understood bioperl better but I >>> think the best I can do is issue a challenge or a feature request: who >>> can >>> speed up Assembly::IO::ace? >>> >>> # convert a Newbler ace to a standard ace >>> sub _newblerAceToAce($args){ >>> my($self,$args)=@_; >>> my >>> >>> $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454'); >>> my >>> $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace"); >>> #output ace >>> my $numContigs=`grep -c ^CO $$args{ace454Path}`+0; >>> logmsg "Converting $$args{ace454Path} (454-ace) to $$args{acePath} >>> (ace). >>> $numContigs contigs total."; >>> while(my $contig=$ace454->next_contig){ >>> logmsg "Finished with ".$contig->id ." out of $numContigs"; >>> $ace->write_contig($contig); >>> } >>> return $$args{acePath}; >>> } >>> >>> >>> Message: 3 >>> >>> Date: Mon, 22 Nov 2010 15:18:10 -0500 >>> >>> From: Lee Katz >>> >>> Subject: [Bioperl-l] Re(2): Status of assembly modules >>> >>> To: bioperl-l at lists.open-bio.org >>> >>> Message-ID: >>> >>> >>> >>> Content-Type: text/plain; charset=UTF-8 >>> >>> >>> I figured it out (I haven't tested much though). >>> >>> >>> To whoever works on Assembly::IO::ace.pm: >>> >>> I changed a regular expression on line 231 because the contig object was >>> not >>> >>> initializing properly. For some reason the 454 ace file had adopted the >>> >>> reference assembly's ID and therefore there was a GI number followed by a >>> >>> pipe. The pipe was not captured with \w+. 
I think that the regex will >>> be >>> >>> safe with \s(\S+)\s. >>> >>> >>> if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts! >>> >>> #if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts! >>> >>> >>> On Thu, Nov 18, 2010 at 12:04 PM,>> >>>> wrote: >>>> >>> >>> Message: 3 >>>> Date: Wed, 17 Nov 2010 22:20:03 -0500 >>>> From: Lee Katz >>>> Subject: Re: [Bioperl-l] Status of assembly modules >>>> To: bioperl-l at lists.open-bio.org >>>> Message-ID: >>>> >>>> Content-Type: text/plain; charset=UTF-8 >>>> I have read on the BioPerl site that a 454 ace is not standardized due >>>> to >>>> its coordinate system. How can I convert it to the standard ace file? >>>> When I run this code either by using contig or assembly objects, I get >>>> an >>>> error. >>>> Can't call method "get_consensus_sequence" on an undefined value at >>>> Bio/Assembly/IO/ace.pm line 280, line 93349. >>>> sub _newblerAceToAce($args){ >>>> my($self,$args)=@_; >>>> my >>>> $ace454=Bio::Assembly::IO->new(-file=>$args{ace454Path},-format=>"ace >>>> >>> ",-variant=>'454'); >>> >>> my >>>> $ace=Bio::Assembly::IO->new(-file=>">$args{acePath}",-format=>"ace"); >>>> #while(my $contig=$ace454->next_contig){ >>>> while(my $scaffold=$ace454->next_assembly){ >>>> print Dumper $scaffold; >>>> } >>>> return $args{acePath}; >>>> } >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- Lee Katz http://leeskatz.com From florent.angly at gmail.com Fri Dec 24 05:17:00 2010 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 24 Dec 2010 11:17:00 +0100 Subject: [Bioperl-l] Re (3): Status of assembly modules In-Reply-To: References: <4D0ED6A5.1050406@gmail.com> Message-ID: 
<4D14731C.1040907@gmail.com>

Ok, I found it. This is bug 3163:
http://bugzilla.open-bio.org/show_bug.cgi?id=3163

Again, Lee, can you add to this bug report the amount of memory that you have on your machine and also include the file that you used? (I cannot start to fix the bug if I don't have the file to reproduce it.)

Best,

Florent

From lskatz at gmail.com Fri Dec 24 09:30:27 2010
From: lskatz at gmail.com (Lee Katz)
Date: Fri, 24 Dec 2010 09:30:27 -0500
Subject: [Bioperl-l] Re (3): Status of assembly modules
In-Reply-To: <4D14731C.1040907@gmail.com>
References: <4D0ED6A5.1050406@gmail.com> <4D14731C.1040907@gmail.com>
Message-ID:

Hopefully I addressed what you asked. Bugzilla could not accept the whole file, so I just attached the first 1000 lines of it. If necessary, I will put the file online somewhere temporarily and link to it.

On Fri, Dec 24, 2010 at 5:17 AM, Florent Angly wrote:

> Ok, I found it. This is bug 3163:
> http://bugzilla.open-bio.org/show_bug.cgi?id=3163
>
> Again, Lee, can you add to this bug report the amount of memory that you
> have on your machine and also include the file that you used? (I cannot start
> to fix the bug if I don't have the file to reproduce it.)
>
> Best,
>
> Florent

-- 
Lee Katz
http://leeskatz.com

From cdavis at bcm.edu Wed Dec 29 14:46:14 2010
From: cdavis at bcm.edu (Davis, Caleb F)
Date: Wed, 29 Dec 2010 13:46:14 -0600
Subject: [Bioperl-l] fastq index
Message-ID: <3FB6C3C1E6A0F44498ACC8AB9DD66CB88B0537914D@EXCMSMBX03.ad.bcm.edu>

Hi all,

Retrieving fastq from an index with Bio::Index::Fastq is not working for me.
I try using the index creation and retrieval code as given here:
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
using the fastq sequence given here:
http://www.bioperl.org/wiki/FASTQ_sequence_format
but I get this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113
STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134
STACK: fetch_fastq_test.pl:11

The only other report of this behavior I could find is here:
http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836

I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here.

--Caleb

From cjfields at illinois.edu Wed Dec 29 15:28:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 29 Dec 2010 14:28:08 -0600
Subject: [Bioperl-l] fastq index
In-Reply-To: <3FB6C3C1E6A0F44498ACC8AB9DD66CB88B0537914D@EXCMSMBX03.ad.bcm.edu>
References: <3FB6C3C1E6A0F44498ACC8AB9DD66CB88B0537914D@EXCMSMBX03.ad.bcm.edu>
Message-ID: <509375BA-6CDA-455B-B0EC-E0C9DD44B401@illinois.edu>

On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote:

> Hi all,
>
> Retrieving fastq from an index with bio::index::fastq is not working for me.
I try using the index creation and retrieval code as given here: > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html > using the fastq sequence given here: > http://www.bioperl.org/wiki/FASTQ_sequence_format > but I get this error: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113 > STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134 > STACK: fetch_fastq_test.pl:11 > > The only other report of this behavior I could find is here: > http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836 > > I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here. > > --Caleb Caleb, Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version. If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem. I believe the current indexing scheme used for FASTQ isn't up-to-date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this. Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing, it could be much faster (my main concern initially was that it was correct). 
chris From MEC at stowers.org Wed Dec 29 19:20:26 2010 From: MEC at stowers.org (Cook, Malcolm) Date: Wed, 29 Dec 2010 18:20:26 -0600 Subject: [Bioperl-l] fastq index In-Reply-To: <509375BA-6CDA-455B-B0EC-E0C9DD44B401@illinois.edu> Message-ID: If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/ No bioperl wrapper, but, hey, that's what `system` is for Cheers, Malcolm On 12/29/10 2:28 PM, "Chris Fields" wrote: On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote: > Hi all, > > Retrieving fastq from an index with bio::index::fastq is not working for me. I try using the index creation and retrieval code as given here: > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html > using the fastq sequence given here: > http://www.bioperl.org/wiki/FASTQ_sequence_format > but I get this error: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 > STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113 > STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134 > STACK: fetch_fastq_test.pl:11 > > The only other report of this behavior I could find is here: > http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836 > > I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here. > > --Caleb Caleb, Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version. If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem. 
I believe the current indexing scheme used for FASTQ isn't up-to-date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this. Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing, it could be much faster (my main concern initially was that it was correct). chris _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 29 22:34:32 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 29 Dec 2010 21:34:32 -0600 Subject: [Bioperl-l] fastq index In-Reply-To: References: Message-ID: <9A90C056-FEEB-4755-B8C7-ABD90977DE8C@illinois.edu> May just wrap this for the indexer. Thanks for the pointer Malcolm! chris On Dec 29, 2010, at 6:20 PM, Cook, Malcolm wrote: > If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/ > > No bioperl wrapper, but, hey, that's what `system` is for > > Cheers, > > Malcolm > > > On 12/29/10 2:28 PM, "Chris Fields" wrote: > > On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote: > >> Hi all, >> >> Retrieving fastq from an index with bio::index::fastq is not working for me. 
I try using the index creation and retrieval code as given here: >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html >> using the fastq sequence given here: >> http://www.bioperl.org/wiki/FASTQ_sequence_format >> but I get this error: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357 >> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113 >> STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134 >> STACK: fetch_fastq_test.pl:11 >> >> The only other report of this behavior I could find is here: >> http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836 >> >> I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here. >> >> --Caleb > > Caleb, > > Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version. If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem. > > I believe the current indexing scheme used for FASTQ isn't up-to-date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this. Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing, it could be much faster (my main concern initially was that it was correct). 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cdavis at bcm.edu Fri Dec 31 01:47:15 2010 From: cdavis at bcm.edu (Davis, Caleb F) Date: Fri, 31 Dec 2010 00:47:15 -0600 Subject: [Bioperl-l] fastq index In-Reply-To: <9A90C056-FEEB-4755-B8C7-ABD90977DE8C@illinois.edu> References: <9A90C056-FEEB-4755-B8C7-ABD90977DE8C@illinois.edu> Message-ID: <3FB6C3C1E6A0F44498ACC8AB9DD66CB88B05379151@EXCMSMBX03.ad.bcm.edu> Thank you for the rec! Here's what I get with 1.6.1: %perl make_fq_inx_test.pl test.inx test.fastq %perl fetch_fastq_test.pl test.inx FVBWUVC01D7SUB ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: No description line parsed STACK: Error::throw STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368 STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71 STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29 STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147 STACK: fetch_fastq_test.pl:11 ----------------------------------------------------------- Is it a bug? 
--Caleb

These perl scripts are from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html

########## make_fq_inx_test.pl ###########
# Complete code for making an index for several
# fastq files
use Bio::Index::Fastq;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fastq->new(
    '-filename' => $Index_File_Name,
    '-write_flag' => 1);
$inx->make_index(@ARGV);


########## fetch_fastq_test.pl ###########
# Print out several sequences present in the index
# in Fastq format
use Bio::Index::Fastq;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fastq->new('-filename' => $Index_File_Name);
my $out = Bio::SeqIO->new('-format' => 'Fastq','-fh' => \*STDOUT);

foreach my $id (@ARGV) {
    my $seq = $inx->fetch($id); # Returns Bio::Seq::Quality object <------------------- THROW
    $out->write_seq($seq);
}

Example data--

########## test.fastq ###########
@FVBWUVC01BR7MP
GCGACCCTAGGTAGCAACCGCCGGCTTCGGCGGTAAGGTATCACTCAG
+
24<9000988:;<=<;=<44444<<=<<<>???@@@@?>=86662232
@FVBWUVC01D7NSE
GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@<
@FVBWUVC01D7SUB
TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT
+
FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757
@FVBWUVC01BFN75
TTAGAATTCAGTTTAGTGCGCTGATCTGAGTCGAGATAAAATCACCAGTACCCAAAACCAGGCGGGCTCGCCACGTTGGCTAATCCTGGTACATTTTGTAATCAATGTTCAGAAGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFDDBB:544448<<=>;899<=8889988894<<9955,,/4,,,,,811775512426766777;97668<<44944
@FVBWUVC01AYO0N
AAATTTGTGTTAGAAGGACGAGTCACCACGTACCAATAGCAACAACGATCGGTCGGACTATTCATTGTGGTGGTGACGCTC
+
IIIIIIIIIIIIIHHFF@??DA???==<=766<<11,/,,,1,,,,733977--/4444722466<;;<<<82/,,--.12
@FVBWUVC01EYPM9
GGATTACACGGGAAAGGTGCTTGTGTCCCGACAGGCTAGGATA
+
FFFFDD<<:ABAA<988:9::BA===BBBBAA??<8623425/
@FVBWUVC01BWHY4
AGGTACTACTTCTTAGTGAGACAAGTCCTGGACAGGAGCAGGTAATATT
+
HGGGDDD:555:4449==>=<<555=BBAAAA@8888894224266;..
@FVBWUVC01ELH7H
CATGAGAAGTCTTAATATTACCTCTCAGGTACCTCCTCTTAAGACACAATTACAGAAGGTGCT
+
IIIII@@??GIIIIG<<666:IFEIEIEED<==<;CE?3344IFIIIIIIIIIGC>==
@FVBWUVC01CTTAY
CTCGAGATTCTGGATCCTCATGGACAAGATGTTCTCCGGCTTAGAGAT
+
FFFFFFFFFFFFDA:88@>>>44444898==<;<62444221775557

From cjfields at illinois.edu Fri Dec 31 10:28:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 31 Dec 2010 09:28:01 -0600
Subject: [Bioperl-l] fastq index
In-Reply-To: <3FB6C3C1E6A0F44498ACC8AB9DD66CB88B05379151@EXCMSMBX03.ad.bcm.edu>
References: <9A90C056-FEEB-4755-B8C7-ABD90977DE8C@illinois.edu> <3FB6C3C1E6A0F44498ACC8AB9DD66CB88B05379151@EXCMSMBX03.ad.bcm.edu>
Message-ID: <6CCEA607-FA44-4D5D-B49B-577816E7514B@illinois.edu>

Caleb,

Yes that would be a bug. I posted this to bugzilla for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=3165

chris

On Dec 31, 2010, at 12:47 AM, Davis, Caleb F wrote:

> Thank you for the rec!
> > Here's what I get with 1.6.1: > > %perl make_fq_inx_test.pl test.inx test.fastq > %perl fetch_fastq_test.pl test.inx FVBWUVC01D7SUB > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: No description line parsed > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71 > STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29 > STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147 > STACK: fetch_fastq_test.pl:11 > ----------------------------------------------------------- > > Is it a bug? > --Caleb > > These perl scripts are from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html > > ########## make_fq_inx_test.pl ########### > # Complete code for making an index for several > # fastq files > use Bio::Index::Fastq; > use strict; > > my $Index_File_Name = shift; > my $inx = Bio::Index::Fastq->new( > '-filename' => $Index_File_Name, > '-write_flag' => 1); > $inx->make_index(@ARGV); > > > ########## fetch_fastq_test.pl ########### > # Print out several sequences present in the index > # in Fastq format > use Bio::Index::Fastq; > use strict; > > my $Index_File_Name = shift; > my $inx = Bio::Index::Fastq->new('-filename' => $Index_File_Name); > my $out = Bio::SeqIO->new('-format' => 'Fastq','-fh' => \*STDOUT); > > foreach my $id (@ARGV) { > my $seq = $inx->fetch($id); # Returns Bio::Seq::Quality object <------------------- THROW > $out->write_seq($seq); > } > > Example data-- > > ########## test.fastq ########### > @FVBWUVC01BR7MP > GCGACCCTAGGTAGCAACCGCCGGCTTCGGCGGTAAGGTATCACTCAG > + > 24<9000988:;<=<;=<44444<<=<<<>???@@@@?>=86662232 > @FVBWUVC01D7NSE > GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG > + > IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@< > @FVBWUVC01D7SUB > TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT > + > FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757 > 
@FVBWUVC01BFN75 > TTAGAATTCAGTTTAGTGCGCTGATCTGAGTCGAGATAAAATCACCAGTACCCAAAACCAGGCGGGCTCGCCACGTTGGCTAATCCTGGTACATTTTGTAATCAATGTTCAGAAGA > + > IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFDDBB:544448<<=>;899<=8889988894<<9955,,/4,,,,,811775512426766777;97668<<44944 > @FVBWUVC01AYO0N > AAATTTGTGTTAGAAGGACGAGTCACCACGTACCAATAGCAACAACGATCGGTCGGACTATTCATTGTGGTGGTGACGCTC > + > IIIIIIIIIIIIIHHFF@??DA???==<=766<<11,/,,,1,,,,733977--/4444722466<;;<<<82/,,--.12 > @FVBWUVC01EYPM9 > GGATTACACGGGAAAGGTGCTTGTGTCCCGACAGGCTAGGATA > + > FFFFDD<<:ABAA<988:9::BA===BBBBAA??<8623425/ > @FVBWUVC01BWHY4 > AGGTACTACTTCTTAGTGAGACAAGTCCTGGACAGGAGCAGGTAATATT > + > HGGGDDD:555:4449==>=<<555=BBAAAA at 8888894224266;.. > @FVBWUVC01ELH7H > CATGAGAAGTCTTAATATTACCTCTCAGGTACCTCCTCTTAAGACACAATTACAGAAGGTGCT > + > IIIII@@??GIIIIG<<666:IFEIEIEED<==<;CE?3344IFIIIIIIIIIGC>== @FVBWUVC01CTTAY > CTCGAGATTCTGGATCCTCATGGACAAGATGTTCTCCGGCTTAGAGAT > + > FFFFFFFFFFFFDA:88@>>>44444898==<;<62444221775557 > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, December 29, 2010 9:35 PM > To: Cook, Malcolm > Cc: Davis, Caleb F; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] fastq index > > May just wrap this for the indexer. Thanks for the pointer Malcolm! > > chris > > On Dec 29, 2010, at 6:20 PM, Cook, Malcolm wrote: > >> If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/ >> >> No bioperl wrapper, but, hey, that's what `system` is for >> >> Cheers, >> >> Malcolm >> >> >> On 12/29/10 2:28 PM, "Chris Fields" wrote: >> >> On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote: >> >>> Hi all, >>> >>> Retrieving fastq from an index with bio::index::fastq is not working for me. 
>>> I tried using the index creation and retrieval code as given here:
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>>> using the fastq sequence given here:
>>> http://www.bioperl.org/wiki/FASTQ_sequence_format
>>> but I get this error:
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113
>>> STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134
>>> STACK: fetch_fastq_test.pl:11
>>>
>>> The only other report of this behavior I could find is here:
>>> http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836
>>>
>>> I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here.
>>>
>>> --Caleb
>>
>> Caleb,
>>
>> Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version. If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem.
>>
>> I believe the current indexing scheme used for FASTQ isn't up to date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this. Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing; it could be much faster (my main concern initially was that it was correct).
>>
>> chris
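
For anyone hitting the same exception: while the FASTQ indexer is out of step with the parser, a plain byte-offset index is one possible stopgap. What follows is only a minimal sketch, not Bio::Index::Fastq and not BioPerl code. It assumes strictly four-line records (header, sequence, '+', quality), as in the test.fastq quoted in this thread, and the names build_fastq_index and fetch_fastq_record are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (not part of BioPerl): map each record ID to the
# byte offset of its '@' header line.
# Assumption: every record is exactly four lines, so a quality line that
# happens to start with '@' is never mistaken for a header.
sub build_fastq_index {
    my ($file) = @_;
    my %offset_of;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (1) {
        my $pos    = tell $fh;
        my $header = <$fh>;
        last unless defined $header;
        next if $header =~ /^\s*$/;    # tolerate blank lines between records
        # The ID is the first whitespace-delimited token after '@'
        $header =~ /^\@(\S+)/
            or die "Expected '\@' header at byte $pos in $file";
        my $id = $1;
        # Skip the sequence, '+' and quality lines of this record
        for (1 .. 3) {
            defined(my $line = <$fh>)
                or die "Truncated record '$id' at byte $pos in $file";
        }
        $offset_of{$id} = $pos;
    }
    close $fh;
    return \%offset_of;
}

# Hypothetical helper: return the four raw lines of one record as a single
# string, or undef if the ID is not in the index.
sub fetch_fastq_record {
    my ($file, $index, $id) = @_;
    my $pos = $index->{$id};
    return undef unless defined $pos;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    seek $fh, $pos, SEEK_SET or die "Cannot seek in $file: $!";
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        push @lines, $line;
    }
    close $fh;
    return join '', @lines;
}

# Command-line use: perl fetch_by_offset.pl test.fastq FVBWUVC01D7SUB
# (the script name is a placeholder)
if (!caller && @ARGV) {
    my ($fastq, @ids) = @ARGV;
    my $index = build_fastq_index($fastq);
    for my $id (@ids) {
        my $record = fetch_fastq_record($fastq, $index, $id);
        if (defined $record) {
            print $record;
        }
        else {
            warn "ID $id not found in $fastq\n";
        }
    }
}
```

The index here lives only in memory and is rebuilt on each run; for large files it would need to be persisted, which this sketch does not attempt. Multi-line (wrapped) FASTQ records are also out of scope.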